GTS-Infrastructure Services, Site Reliability Engineer-Virtualization (Windows), Assistant Vice President

Salary undisclosed

Job Overview
We are seeking a Site Reliability Engineer specializing in Windows Platform Management. The ideal candidate will play a pivotal role in managing the bank's Windows-based infrastructure, which encompasses the entire server lifecycle, from project setup to Day-2 operations, including detection and recovery of the Windows Server fleet, both on-premises and in the cloud. He/she will provide technical guidance, troubleshoot complex systems, and ensure the bank's mission-critical operations are maintained 24/7/365.
Key Responsibilities
  • Review and resolve complex issues related to the Windows platform in collaboration with internal stakeholders and external vendors, ensuring effective communication with management on issue status.
  • Implement business continuity and disaster recovery measures and recommend improvement strategies.
  • Ensure proper configuration, implementation, maintenance, and monitoring of servers and related components.
  • Collaborate with vendors and teams to analyze and implement new tools or technology enhancements.
  • Implement Service Improvement Plans to minimize system alerts and incidents, fostering a stable environment.
  • Maintain consistency and manageability of existing infrastructure through well-documented standards and procedures.
  • Conduct root cause analysis using various tools (e.g., Fishbone, 5 Whys) and provide detailed reports and action plans.
  • Lead knowledge sharing initiatives, including technical document creation and internal knowledge sharing sessions.
  • Perform in-depth product troubleshooting and remediation as necessary.
  • Automate tasks and processes by developing PowerShell scripts or utilizing other scripting languages.
  • Mentor new engineers and help them develop intermediate-level competence in support topics.
  • Collaborate across teams to address cross-product technical issues.
Key Focus Areas
  • Windows System Administration and Configuration
  • Microsoft Failover Clustering
  • Storage Technologies and Concepts
  • Server Image Engineering and Hardening
  • Resource Performance Analysis
  • Automation and Scripting
Qualifications
  • Education: Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Experience:
    • Minimum 5 years of experience in Windows system management and troubleshooting.
    • Hands-on experience with virtualization technologies (Hyper-V, VMware and Huawei DCS).
    • Experience with automation tools such as Ansible and integration with Jenkins.
  • Certification:
    • Relevant certifications in Azure, Microsoft, Huawei and VMware are preferred.
  • Knowledge and Skills:
    • Windows System Administration.
    • Virtualization technologies, particularly VMware, Hyper-V or OpenStack (minimum 1 technology).
    • Microsoft Failover Clustering and storage technologies.
    • Excellent teamwork, interpersonal, and communication skills, with a strong sense of responsibility and attention to detail.
    • Ability to perform situational analysis and make informed decisions.
This role does not require shift work, but staff activation may be necessary during incidents.
If you are a passionate and experienced SRE looking to make a significant impact in a dynamic environment, we encourage you to apply and join our team.