Maxana

Posted 2 months ago

Open

Infrastructure Engineer

New York, New York, United States | On-site | Full-time

AI Summary

Infrastructure Engineer builds and maintains platform infrastructure for large-scale ML training, inference, and deployment; focuses on reliability, observability, and scalability in cloud-native environments.

About this role

Maxana is seeking an experienced Infrastructure Engineer for a confidential client — a fast-growing AI company. In this role you will build and maintain the platform layer supporting large-scale ML training, inference, and deployment. This is a high-impact role at the intersection of cloud infrastructure and ML systems.

Key Responsibilities

  • Build and maintain infrastructure supporting large-scale ML training and inference workloads
  • Work with GPU and compute infrastructure, distributed systems, and cloud-native platforms
  • Improve reliability, observability, and performance across the platform layer
  • Collaborate directly with senior engineers and product teams on architecture decisions
  • Own production reliability — monitoring, incident response, and proactive risk reduction
  • Develop and maintain internal tooling and automation to support engineering operations

Requirements

  • 5+ years of infrastructure or platform engineering experience in a production environment
  • Strong distributed systems background — experience with large-scale compute workloads preferred
  • Cloud-native infrastructure experience — AWS, GCP, or Azure; Docker and Kubernetes required
  • Familiarity with ML infrastructure (training pipelines, inference serving, GPU workloads) is a strong plus
  • Experience owning production reliability end to end

Benefits

  • Competitive base salary ($130,000-$240,000) + equity
  • Medical, dental, and vision
  • Flexible paid time off
  • Learning and development stipend
  • Working at the forefront of AI infrastructure at scale

Skills

AWS, Azure, Cloud-native Infrastructure, Distributed Systems, Docker, GCP, GPU Compute, GPU Workloads, High-scale ML Workloads, Inference Serving, Infrastructure Automation, Kubernetes, ML Infrastructure, Monitoring And Incident Response, Observability Tools, Production Reliability, Training Pipelines
