Technical Operations Lead (AWS/Cloud)

Salary undisclosed

Responsibilities:

  • Accountable for day-to-day operational activities in systems, to ensure optimum system performance and determine the system strategy for business continuity.
  • Review, implement and uphold IT policies and operations protocols.
  • Lead a team of operational engineers to work with outsourced service providers and stakeholders in delivering operational objectives, e.g. data conveyance, tool updates and maintenance, and troubleshooting system issues.
  • Review and plan for systems integration efforts to ensure the system is built in an efficient and robust manner towards a high level of maintainability, reliability and availability based on established guidelines and best practices.
  • Monitor ATFM and Service Desk operations to ensure they are functional, and assist with and triage issues.
  • Review backup solutions and ensure they are in place for data and systems.
  • Ensure the data pipeline remains operational and that data integrity and accessibility are maintained.
  • Manage application/platform and security incidents, working with internal teams and vendors to resolve issues in time to meet SLAs and escalating to higher management if necessary. Report incidents and short- and long-term resolution plans at appropriate forums.
  • Accountable for maintaining and tracking the asset inventory to ensure all assets are accounted for.
  • Ensure that Standard Operation Processes (SOPs) are properly documented and comply with audit requirements.

Requirements:

  • Formal AWS Certification.
  • Substantial technical experience in AWS (direct or related work).
  • Degree in Computer Science/Engineering, Information Technology, or a relevant discipline.
  • At least 6 years of working experience in cloud-based services, IT operations and vendor management.
  • Proactive and dedicated individual with strong leadership and multitasking capabilities.
  • Ability to build and maintain relationships with a wide array of people at both junior and senior levels
  • Experience in running incident, problem and change management processes.
  • Experience implementing processes per the ITIL framework: incident, problem and change management, and service transition.
  • Familiarity with security and access control measures to control privileged access to test and production environments.
  • Experience in networking technologies such as WAN, LAN, Network Security, Firewall rules, Load Balancers, VPNs and DNS.
  • Knowledge of disaster recovery, system backup and restore.
  • Experience with cloud-based services (e.g. AWS, including Redshift, EMR, QuickSight, Lambda, Glue, etc.) and project management tools (e.g. Atlassian, JIRA) is an added advantage.
  • Infrastructure as Code (IaC): Familiarity with Infrastructure as Code tools (e.g., Terraform, CloudFormation) for managing AWS resources.
  • Cost Management: Ability to optimize costs and manage AWS budgets effectively.
  • Database and Data Warehousing: Knowledge of database management systems, data warehousing concepts, and SQL.
  • Monitoring and Troubleshooting: Proficiency in monitoring tools (e.g., CloudWatch) and troubleshooting issues in AWS environments.
  • Candidates with work experience in any of the following will be considered favorably:
  • Experience with setting up and/or running operations for research projects and machine learning platforms will be an advantage.
  • Experience in, and the ability to navigate, the Public Agency IT/System environment is preferred.
  • Experience in data analytics systems is preferred.
  • Experience working in a Government Commercial Cloud (GCC) environment and familiarity with the relevant change control framework will be an advantage.
  • Understanding of DevOps principles and practices.
