
Site Reliability Engineering Senior Manager

$3,500 - $4,500 / month



Job Description


Skills & Experience:

Degree in IT, Computer Science or related field

Minimum of 10 years of experience in root cause analysis (RCA), including leading discussions as a problem manager or incident commander

In-depth understanding of public, private & hybrid cloud solutions

Hands-on experience with popular CI/CD tools such as Jenkins, Nexus, SonarQube & Bitbucket

Good exposure to logging & monitoring tools such as Splunk, Dynatrace, Prometheus, Grafana & ELF/ELK

Good understanding of cloud-native technologies such as containers & Kubernetes

Develop & enhance production monitoring & management capabilities by leveraging existing platforms & tools

In-depth understanding of Incident & Problem Management functions & activities

Good understanding of identity & access management (IAM)

Experience in software incident & problem management

Work with stakeholders & the command centre in troubleshooting, escalating & resolving critical site incidents

Proficiency in event management tools and platforms

Familiarity with ITIL (Information Technology Infrastructure Library) practices related to Incident Management, Problem Management, Change Management & Event Management

Experience with AI/ML technologies and their application in incident analysis
