HPC System Administrator

$7,000 - $12,000 / month

Job Description

We are seeking a skilled HPC System Administrator to manage and maintain high-performance computing (HPC) systems. The ideal candidate will be responsible for system administration, user support, software integration, and collaboration with research teams to optimize computational workflows.

Key Responsibilities:

1. HPC System Management and Maintenance

Install, configure, integrate, and maintain high-performance compute clusters and associated hardware

Monitor system performance, troubleshoot issues, and ensure security compliance

Carry out and document change management procedures

2. User Support and Consultation

Assist users with computational jobs and optimize workflows for efficient resource utilization

Provide training sessions and resolve user issues related to HPC environments

3. Software and Application Support

Install, configure, and maintain scientific and engineering HPC software solutions

Support software development for parallel computing and performance optimization

4. Collaboration with Research Teams

Understand research project requirements and recommend appropriate HPC solutions

Assist in designing and optimizing computational workflows for researchers

5. Resource Allocation and Scheduling

Manage resource allocation and job scheduling within the HPC environment

Implement policies for job queuing, resource limits, and workload balancing

Enforce operational best practices and implementation plans

6. System and Network Optimization

Configure and maintain high-speed networks for optimal data transfer within the HPC infrastructure

Conduct performance benchmarking and optimization efforts

7. Documentation and Reporting

Maintain detailed system documentation, configuration guides, and user manuals

Generate reports on system performance, resource utilization, and operational efficiency

Qualifications and Skills:

Strong experience with HPC system administration, Linux-based environments, and cluster management tools.

Proficiency in job scheduling and resource management frameworks (e.g., Slurm, PBS, Grid Engine).

Hands-on experience with networking protocols, security policies, and data transfer optimizations.

Familiarity with scientific computing software and parallel programming techniques.

Ability to troubleshoot complex system and application issues effectively.

Strong communication skills to collaborate with researchers and support teams.