ZainTECH

Posted 3 months ago

Open

Incident Manager

New Cairo | On-site | Full-time

AI Summary

Incident Manager responsible for end-to-end incident management in cloud and infrastructure services, coordinating cross-functional teams and driving post-incident reviews to prevent recurrence.

About this role

The Incident Manager is responsible for overseeing the end-to-end management of incidents impacting cloud and infrastructure services, including AWS, Azure, and OCI environments.

This role ensures rapid restoration of services, effective communication with stakeholders, and continuous improvement through post-incident analysis.

Responsibilities

  • Own and manage the full incident lifecycle from detection to closure.
  • Act as the central command point during major (P1/P2) incidents.
  • Coordinate cross-functional teams, including cloud, network, and infrastructure teams, as well as CSMs.
  • Ensure timely incident triage, escalation, and resolution.
  • Lead incident bridges, war rooms, and crisis calls.
  • Ensure accurate and timely communication to stakeholders and leadership.
  • Track incidents against SLAs and ensure compliance with operational targets.
  • Drive root cause analysis (RCA) and post-incident reviews (PIRs).
  • Identify recurring issues and recommend preventive and corrective actions.
  • Maintain and improve incident management processes, playbooks, and runbooks.
  • Ensure proper documentation and ticket updates in ITSM tools.
  • Support audits, reporting, and service improvement initiatives.

Requirements

  • 5+ years of experience in IT operations, cloud, or infrastructure roles.
  • 2+ years of experience in Incident or Major Incident Management.
  • ITIL Foundation or ITIL Intermediate (Incident Management) certification preferred.
  • Cloud certifications (AWS, Azure, OCI) are a plus.
  • Strong understanding of public cloud platforms (AWS, Azure, OCI) and private cloud operations.
  • Familiarity with monitoring, alerting, and logging tools.
  • Good understanding of infrastructure components (compute, storage, networking, IAM).
  • Ability to assess technical impact and prioritize incidents effectively.
  • Experience with ITSM tools (ServiceNow, Jira, Remedy, etc.).
  • Strong knowledge of ITIL Incident and Major Incident Management processes.

Skills

  • Cloud (AWS, Azure, OCI)
  • Compute / Storage / Networking / IAM
  • Crisis Management
  • Incident Bridges / War Rooms
  • Incident Management
  • ITIL Foundations/Intermediates
  • IT Operations
  • ITSM Tools (ServiceNow, Jira, Remedy)
  • Major Incident Management
  • Monitoring, Alerting, Logging Tools
  • Playbooks And Runbooks
  • Post-Incident Reviews (PIRs)
  • Private Cloud Operations
  • Root Cause Analysis (RCA)
  • SLA Tracking
  • Stakeholder Communication
