Senior Site Reliability Engineer (SRE)
Makkah, Makkah Province, Saudi Arabia | On-site | Full-time
About this role
As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the on-call rotation as part of our commitment to platform reliability.
Reliability & Incident Management
- Lead high-severity incident response and drive post-incident reviews.
- Troubleshoot complex issues across applications, infrastructure, and networks.
- Reduce MTTR through better monitoring, alerting, and diagnostic tooling.
- Participate in the on-call rotation supporting production systems.
Performance & Scalability
- Identify and resolve performance bottlenecks and scaling challenges.
- Conduct load testing and capacity planning for high-traffic scenarios.
Infrastructure & Operations
- Enhance cloud-native infrastructure, deployment processes, and automation.
- Improve resilience, fault tolerance, and recovery mechanisms across systems.
Observability
- Build and refine dashboards, alerts, metrics, logs, and traces.
- Define SLIs/SLOs and improve visibility into system behavior.
Tooling & Automation
- Develop tools that reduce operational toil and increase reliability.
- Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.
Collaboration
- Work closely with engineering teams to ensure services are robust and production-ready.
- Mentor engineers on reliability, debugging, and operational best practices.
Bonus Skills
- Background in large-scale, high-traffic systems.
- Experience with fault-tolerant design, DR, and HA patterns.
- Familiarity with SLOs, SLIs, and error budgets.
Location Preference
- Candidates located in time zones from GMT+0 to GMT+6 are preferred, to align with team collaboration and on-call coverage.
Requirements
- Strong experience with **Kubernetes**, **service mesh technologies**, and cloud platforms (**AWS, GCP, or Azure**).
- Deep understanding of **Linux**, **networking**, **distributed systems**, and **load balancing**.
- Hands-on experience with Terraform or similar Infrastructure-as-Code tools.
- Experience with observability platforms such as **Prometheus, Grafana, Loki, Mimir, Elastic**, or equivalent.
- Proficiency in scripting or programming languages such as **Bash, Python, or Go**.
- Experience with CI/CD pipelines and **GitOps** practices.
- Strong debugging, incident response, and performance analysis skills.
Skills
AWS, Azure, Bash, CI/CD Pipelines, Distributed Systems, Elastic, GCP, GitOps, Go, Grafana, Infrastructure as Code, Kubernetes, Linux, Load Balancing, Loki, Mimir, Networking, Prometheus, Python, Service Mesh Technologies, Terraform