gradion

Posted 7 days ago

Open

Site Reliability Engineer - Senior

Ho Chi Minh City - Hybrid

About this role

ABOUT US:
At Gradion, we are the strategic partner for ambitious businesses, helping them achieve breakthrough growth through Digital Innovation and Deep Tech.
With a global vision and an AI-first approach, we enable clients to reshape strategies, optimize systems, and adopt cutting-edge technologies to create sustainable value.
From AI and data to cybersecurity, robotics, and large-scale enterprise platforms, Gradion designs practical solutions that lay the foundation for the next generation of billion-dollar companies.
OUR FACTS & FIGURES:
- 23+ years of expertise building digital platforms & deep-tech solutions.
- 3 continents: Asia, Europe and Africa.
- 300+ specialists across 7 countries: Vietnam, Singapore, Thailand, Saudi Arabia, Germany, Egypt, and Australia.
- 100+ enterprise clients, including several unicorns (e.g., Alaiko, HomeToGo, Roadsurfer).
- Vietnam’s Best IT Company - recognized by ITViec for 8 consecutive years, including 2 consecutive years of ranking #1 (2024 and 2025).
- ISO 27001.

About the Role

  • Gradion is expanding its SRE team for a client with a long-term managed services contract running through 2028. You will be part of a global, follow-the-sun SRE function, responsible for platform stability, cloud infrastructure, and 24/7 incident response across European and global client time zones.

  • This role suits engineers who are technically solid, self-directed, and comfortable operating in a fast-moving, internationally distributed environment. You will go through a structured onboarding alongside commercetools' internal SRE team before taking on independent operational responsibility.

What You Will Do

  • Own platform availability: monitor, triage, and resolve incidents within defined SLA windows

  • Manage cloud infrastructure on AWS and/or GCP - provisioning, scaling, and day-to-day operations

  • Maintain and improve CI/CD pipelines and GitOps workflows

  • Operate observability systems: monitoring, logging, and alerting at production scale

  • Participate in on-call rotation as part of the global follow-the-sun coverage model

  • Configure, deploy, and manage AI tooling and MCP servers in production environments

  • Contribute to infrastructure automation, scripting, and internal tooling

  • Write clear post-incident reviews and contribute to the monthly operational report

  • Collaborate closely with engineering teams across multiple time zones

What You Bring

  • 4+ years in a DevOps / SRE / Platform Engineering role within an international team

  • Solid Kubernetes knowledge - cluster operations, troubleshooting, and configuration

  • Hands-on cloud experience with AWS and/or GCP

  • Good understanding of networking fundamentals - DNS, load balancing, firewalls, VPC

  • Scripting and automation skills (Python, Bash, or similar)

  • Experience with CI/CD tools and GitOps-based delivery

  • Working knowledge of monitoring and observability systems (Prometheus, ELK, or equivalent)

  • Fluent English (C1 minimum) - daily communication with European stakeholders is a core requirement

  • Self-directed and proactive - you ask the right questions and drive issues to resolution without waiting to be told

Nice to Have

  • Experience configuring and managing MCP servers and AI tooling in production

  • Exposure to AI enablement workflows or LLM infrastructure

  • Background supporting eCommerce or SaaS platforms

  • Familiarity with the Frontastic / commercetools Frontend ecosystem
