Senior Machine Learning Infrastructure Engineer
Santa Clara · On-site · Full-time
AI Summary
Senior ML Infrastructure Engineer designs scalable ML pipelines and distributed systems for training/inference, manages GPU clusters, and collaborates with researchers to improve platform usability.
About this role
PlusAI is a Physical AI company pioneering AI-based virtual driver software for factory-built autonomous trucks. Headquartered in Silicon Valley with operations in the United States and Europe, Plus was named by Fast Company as one of the World’s Most Innovative Companies. Partners including TRATON GROUP’s Scania, MAN, and International brands, Hyundai Motor Company, Iveco Group, Bosch, and DSV are working with Plus to accelerate the deployment of next-generation autonomous trucks. If you’re ready to make a huge impact and drive the future of autonomy, Plus is looking for talented individuals to join its fast-growing teams.
As a Senior ML Infrastructure Engineer at Plus, you will design scalable architectures that handle petabytes of data while delivering strong performance for both training and inference. You will build robust pipelines for model versioning and experiment tracking, which are essential for reproducibility across experiments, and you will manage large-scale GPU clusters. This role offers significant technical and professional growth for engineers who enjoy solving challenging problems with modern cloud-native technologies. Ideal candidates are comfortable working with Docker containers orchestrated on Kubernetes and integrated with deep learning frameworks such as PyTorch or TensorFlow. If you are eager to push the boundaries of machine learning infrastructure and contribute to cutting-edge solutions, this position is an excellent fit.
Skills
Airflow, C++, CI/CD for ML, Cloud Platforms (AWS, GCP), Data Pipelines, Distributed Systems, Docker, Experiment Tracking, GPU Clusters, Kubeflow, Kubernetes, MLflow, Model Versioning, Monitoring/Logging/Alerting, On-prem, Prefect, Python, PyTorch, QMS Compliance, SQL, TensorFlow
