Research Assistant (System Admin), Yang Zhang Lab

Salary undisclosed

Job Title: Research Assistant (System Admin), Yang Zhang Lab
Posting Start Date: 20/03/2025

Job Description

Prof Yang Zhang's Lab at the School of Computing, National University of Singapore (NUS) is seeking a skilled and adaptable System Administrator with expertise in Red Hat Linux and UNIX environments. The primary responsibility of this role is to oversee the construction and management of a High-Performance Computing (HPC) cluster comprising 4,500 CPU/GPU cores, BeeGFS storage, and InfiniBand interconnects. The ideal candidate will play a critical role in ensuring the stability, security, and optimal performance of the system through proficient management and troubleshooting of the HPC infrastructure.

Role & Responsibilities
  • Oversee the management, monitoring, and maintenance of the HPC cluster and online server systems operating on Red Hat Linux and UNIX platforms.
  • Perform routine system administration tasks, including user management, access control, file system maintenance, and system backups.
  • Continuously monitor system performance and resource utilization, proactively identifying and resolving bottlenecks to maintain optimal responsiveness.
  • Troubleshoot and resolve hardware, software, and network issues, collaborating with cross-functional teams when necessary.
  • Implement and enforce security protocols to protect systems from unauthorized access, vulnerabilities, and cyber threats.
  • Plan and execute system patches, updates, and upgrades, ensuring a secure and up-to-date computing environment.
  • Investigate and respond to system alerts and incidents, performing root cause analysis and implementing preventive measures.
  • Maintain comprehensive documentation of system configurations, procedures, and troubleshooting steps for internal reference and knowledge sharing.
  • Provide technical support to end-users, assisting with system-related inquiries, issue resolution, and training as needed.
  • Participate in capacity planning and scalability assessments to ensure system resources align with both current and future requirements.
  • Work closely with vendors and third-party service providers to manage hardware and software procurement, maintenance, and support contracts.
  • Undertake any additional responsibilities related to the procurement, updating, and maintenance of the HPC cluster and network infrastructure.

Apply: Interested candidates should submit their CV along with a brief description of their experience and interest to Prof. Yang Zhang via email at [email protected].

Job Requirements
  • A Bachelor's degree or higher in Computer Science, Information Technology, or a related field. Relevant certifications such as Red Hat Certified Engineer (RHCE) are a plus.
  • Familiarity with artificial intelligence (AI) and machine learning is not mandatory but would be advantageous.
  • Expertise in CPU/GPU architectures, BeeGFS storage systems, and InfiniBand interconnects is highly desirable.
  • Proven experience as a System Administrator, with a strong focus on Red Hat Linux and UNIX environments.
  • Proficiency in shell scripting and automation to optimize system administration workflows.
  • In-depth understanding of networking concepts, protocols, and troubleshooting techniques in a mixed-platform environment.
  • Experience with Linux job schedulers such as SLURM, PBS, or similar is strongly preferred.
  • Familiarity with system monitoring and management tools, including Nagios, Zabbix, and Ansible.
  • Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex technical issues efficiently.
  • Excellent communication skills, both written and verbal, to collaborate effectively with technical and non-technical stakeholders.
  • Meticulous attention to detail, with a commitment to maintaining comprehensive documentation and accurate records.
  • Adaptability and teamwork skills, with the ability to work efficiently in a dynamic, evolving technological landscape.