Research Assistant (System Admin), Yang Zhang Lab
Salary undisclosed
Posting Start Date: 20/03/2025

Job Description

Prof Yang Zhang's Lab at the School of Computing, National University of Singapore (NUS) is seeking a skilled and adaptable System Administrator with expertise in Red Hat Linux and UNIX environments. The primary responsibility of this role is to oversee the construction and management of a High-Performance Computing (HPC) cluster comprising 4,500 CPU/GPU cores, BeeGFS storage, and InfiniBand interconnects. The ideal candidate will play a critical role in ensuring the stability, security, and optimal performance of the system through proficient management and troubleshooting of the HPC infrastructure.

Role & Responsibilities
- Oversee the management, monitoring, and maintenance of the HPC cluster and online server systems operating on Red Hat Linux and UNIX platforms.
- Perform routine system administration tasks, including user management, access control, file system maintenance, and system backups.
- Continuously monitor system performance and resource utilization, proactively identifying and resolving bottlenecks to maintain optimal responsiveness.
- Troubleshoot and resolve hardware, software, and network issues, collaborating with cross-functional teams when necessary.
- Implement and enforce security protocols to protect systems from unauthorized access, vulnerabilities, and cyber threats.
- Plan and execute system patches, updates, and upgrades, ensuring a secure and up-to-date computing environment.
- Investigate and respond to system alerts and incidents, performing root cause analysis and implementing preventive measures.
- Maintain comprehensive documentation of system configurations, procedures, and troubleshooting steps for internal reference and knowledge sharing.
- Provide technical support to end-users, assisting with system-related inquiries, issue resolution, and training as needed.
- Participate in capacity planning and scalability assessments to ensure system resources align with both current and future requirements.
- Work closely with vendors and third-party service providers to manage hardware and software procurement, maintenance, and support contracts.
- Undertake any additional responsibilities related to the procurement, updating, and maintenance of the HPC cluster and network infrastructure.
Qualifications
- A Bachelor's degree or higher in Computer Science, Information Technology, or a related field. Relevant certifications such as Red Hat Certified Engineer (RHCE) are a plus.
- Familiarity with artificial intelligence (AI) and machine learning is not mandatory but would be advantageous.
- Expertise in CPU/GPU architectures, BeeGFS storage systems, and InfiniBand interconnects is highly desirable.
- Proven experience as a System Administrator, with a strong focus on Red Hat Linux and UNIX environments.
- Proficiency in shell scripting and automation to optimize system administration workflows.
- In-depth understanding of networking concepts, protocols, and troubleshooting techniques in a mixed-platform environment.
- Experience with Linux job schedulers such as SLURM, PBS, or similar is strongly preferred and considered critical for this role.
- Familiarity with system monitoring and management tools, including Nagios, Zabbix, and Ansible.
- Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex technical issues efficiently.
- Excellent communication skills, both written and verbal, to collaborate effectively with technical and non-technical stakeholders.
- Meticulous attention to detail, with a commitment to maintaining comprehensive documentation and accurate records.
- Adaptability and teamwork skills, with the ability to work efficiently in a dynamic, evolving technological landscape.