Cloud Infrastructure Engineer
About this role
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Cloud Infrastructure Engineer based in the United States.
This role is focused on building and operating secure, scalable, and highly available cloud-native infrastructure that powers modern software delivery across multiple engineering teams.
The position plays a key role in designing and maintaining Kubernetes-based platforms, ensuring reliability, performance, and operational excellence in production environments.
It involves managing the full lifecycle of containerized applications, from deployment pipelines and infrastructure provisioning to observability and incident response.
The role sits at the center of DevOps, SRE, and platform engineering practices, enabling teams to ship software faster and more safely.
It requires strong hands-on engineering skills combined with a deep understanding of cloud infrastructure, automation, and system reliability.
This is a high-impact opportunity to shape infrastructure standards and improve the scalability of enterprise-grade systems.
Accountabilities:
- Design, deploy, and manage production-grade Kubernetes clusters, including networking policies, RBAC, workload scheduling, and cluster security configurations.
- Build and maintain CI/CD pipelines using Infrastructure as Code and GitOps practices to ensure reliable and repeatable deployments.
- Provision and automate cloud infrastructure using tools such as Terraform or similar IaC frameworks.
- Develop and manage containerization workflows, including secure image building, versioning, and promotion across environments.
- Implement and maintain observability stacks using tools such as Prometheus, Grafana, and OpenTelemetry to ensure system health and performance visibility.
- Support performance optimization efforts including load testing, capacity planning, and system resilience validation.
- Participate in incident response, root cause analysis, and ongoing reliability engineering improvements.
- Manage and support stateful services such as databases, caching systems, and messaging platforms in production environments.
- Maintain clear and comprehensive technical documentation covering architecture, operations, and recovery procedures.
Requirements:
- 2–5+ years of experience in Cloud Infrastructure Engineering, DevOps, or Site Reliability Engineering roles.
- Strong hands-on experience operating Kubernetes in production environments.
- Proven experience building CI/CD pipelines and working with GitOps methodologies.
- Solid experience with Infrastructure as Code tools such as Terraform or equivalent solutions.
- Strong Linux administration and troubleshooting skills in production environments.
- Proficiency in Python or another scripting language for automation and tooling.
- Good understanding of networking concepts, Kubernetes security, and deployment strategies.
- Experience with observability tools and performance monitoring solutions.
- Familiarity with load testing, system tuning, and reliability engineering practices.
- Strong collaboration and communication skills, with the ability to work across engineering teams.
Benefits:
- Competitive salary range of $85,000–$100,000 depending on experience, plus bonus eligibility.
- Fully remote work within the United States with occasional travel as needed.
- Opportunity to work on modern cloud-native infrastructure at scale.
- Exposure to advanced Kubernetes, DevOps, and SRE practices in production environments.
- Collaborative engineering culture focused on reliability, automation, and continuous improvement.
- Professional growth opportunities in platform engineering and cloud architecture domains.
