Site Reliability Engineering Senior Manager
Job Description
Roles & Responsibilities
Qualifications
Requirements
Skills & Experience:
Degree in IT, Computer Science or related field
Minimum of 10 years of experience in root cause analysis (RCA), including leading discussions as a problem manager or incident commander
In-depth understanding of public, private & hybrid cloud solutions
Hands-on experience with popular CI/CD tools such as Jenkins, Nexus, SonarQube & Bitbucket
Good exposure to logging & monitoring tools such as Splunk, Dynatrace, Prometheus, Grafana & the EFK/ELK stacks
Good understanding of cloud-native technologies such as containers & Kubernetes
Develop & enhance production monitoring & management capabilities leveraging existing platforms & tools
In-depth understanding of Incident & Problem Management functions & activities
Good understanding of identity & access management (IAM)
Experience in software incident & problem management
Work with stakeholders & the command centre to troubleshoot, escalate & resolve critical site incidents
Proficiency in event management tools and platforms
Familiarity with ITIL (Information Technology Infrastructure Library) practices for Incident Management, Problem Management, Change Management & Event Management
Experience with AI/ML technologies and their application in incident analysis