Cloud Infrastructure Engineer (SRE)

Salary undisclosed

Description

In Digital Resiliency Engineering (DRE), we combine software and systems engineering to build and operate large-scale distributed systems designed and/or built by the Singapore Government. We ensure that Government services are reliable, meet expected performance and satisfy customer needs.

If you have a strong DevOps, infrastructure engineering and/or SRE background, have experience operating mission-critical production technology infrastructure at scale, and are looking for opportunities to work with a team of practitioners and leading industry experts, we welcome you to join us.

In this role, you will build central services for observability and automation of infrastructure services. You will be part of a rotation with other engineers, providing rapid response to major incidents impacting critical Government Services. You will provide technical leadership for the team and work closely with technical leads to operate highly available solutions. You will also guide other team members on managing the availability and performance of mission-critical services, building automation and monitoring solutions to prevent problem recurrence, and building automated responses for non-exceptional service conditions.

You will manage the execution of project priorities, deadlines and deliverables, and lead the design of major components, systems and features to improve the availability, scalability, latency and efficiency of services designed and built by the Government.

Key Responsibilities:

  • Build Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets and post-mortem incident processes.
  • As part of an on-call roster, ensure the reliability and performance of critical Government Services. Provide operational support and engineering for large-scale distributed systems to drive effective incident resolution.
  • Gather and analyse metrics and logs from Operating Systems and/or applications for capacity planning, performance tuning and fault isolation.
  • Build automation to manage services, infrastructure, and/or applications.
  • Improve reliability and quality of services using proactive monitoring.
  • Measure and optimize system performance, driving continuous improvement and pushing SRE practice forward.
  • Build an SRE playbook that the Whole-of-Government can leverage as a reference for SRE.
  • Identify promising and emerging technologies relevant to innovation in the Government.
  • Work in a cross-functional service team consisting of software engineers, infrastructure engineers, DevOps, and other specialists.

Requirements

Qualifications:

  • 5+ years of experience in technology operations as an Infrastructure Engineer or Site Reliability Engineer, including operating large-scale, mission-critical production systems.
  • Expertise in building and operating automated monitoring and incident detection systems, creating runbooks and running incident management processes.
  • Expertise in designing automation solutions using provisioning tools, continuous integration and continuous delivery (CI/CD) tools, and scripting languages.
  • Experience leading highly complex technical projects with multiple dependencies and stakeholders.
  • Knowledge of and experience working in an Agile development environment, with a focus on rapid, high-quality delivery.
  • Proficient in building and managing highly available and scalable IT infrastructure and/or applications, with knowledge of container and virtualization technologies.
  • Proficiency in Python, PowerShell, or Ruby.
  • Proficiency with Infrastructure as Code (IaC) tools such as SaltStack, Puppet, Terraform, or Ansible.
  • Able to work independently and deliver results within specified deadlines.
  • Ability to prioritize work and strong problem-solving skills.
  • Good verbal and written communication skills for engaging users, vendors and management.
  • Ability to communicate complex interaction concepts clearly and persuasively to different audiences and at various levels in GovTech.

Join us at Assurity Trusted Solutions and contribute to creating innovative solutions that secure and enhance digital infrastructure!

Thank you for your interest. Please note that only shortlisted candidates will be notified.

By submitting your application, you agree that your personal data may be collected, used, and disclosed by Assurity Trusted Solutions Pte. Ltd. (ATS), GovTech, and their service providers and agents in accordance with ATS’s privacy statement found at: https://www.assurity.sg/privacy.html or such other successor site.

Benefits

  • A wholly-owned subsidiary of GovTech.
  • We promote a learning culture and encourage you to grow and learn.
  • A competency framework is in place to help you grow your career with us.
  • Annual Leave Benefits with additional perks such as Family Care and Birthday Leave.
  • Contract staff enjoy the same benefits as permanent employees.