HPC System Administrator

Salary undisclosed

Job Description: HPC System Administrator

Position Summary: We are seeking a skilled HPC System Administrator to manage and maintain high-performance computing (HPC) systems. The ideal candidate will be responsible for system administration, user support, software integration, and collaboration with research teams to optimize computational workflows.

Key Responsibilities:

1. HPC System Management and Maintenance

• Install, configure, integrate, and maintain high-performance compute clusters and associated hardware

• Monitor system performance, troubleshoot issues, and ensure security compliance

• Process and document change management procedures

2. User Support and Consultation

• Assist users with computational jobs and optimize workflows for efficient resource utilization

• Provide training sessions and resolve user issues related to HPC environments

3. Software and Application Support

• Install, configure, and maintain scientific and engineering HPC software solutions

• Support software development for parallel computing and performance optimization

4. Collaboration with Research Teams

• Understand research project requirements and recommend appropriate HPC solutions

• Assist in designing and optimizing computational workflows for researchers

5. Resource Allocation and Scheduling

• Manage resource allocation and job scheduling within the HPC environment

• Implement policies for job queuing, resource limits, and workload balancing

• Enforce operational best practices and implementation plans

6. System and Network Optimization

• Configure and maintain high-speed networks for optimal data transfer within the HPC infrastructure

• Conduct performance benchmarking and optimization efforts

7. Documentation and Reporting

• Maintain detailed system documentation, configuration guides, and user manuals

• Generate reports on system performance, resource utilization, and operational efficiency

Qualifications and Skills:

• Strong experience with HPC system administration, Linux-based environments, and cluster management tools

• Proficiency in job scheduling and resource management frameworks (e.g., Slurm, PBS, Grid Engine)

• Hands-on experience with networking protocols, security policies, and data transfer optimizations

• Familiarity with scientific computing software and parallel programming techniques

• Ability to troubleshoot complex system and application issues effectively

• Strong communication skills to collaborate with researchers and support teams

To apply, send your resume to aishwarya.revankar@reve.cloud
