Helix AI Engineer, Training Infrastructure
AI Summary
Training Infrastructure Engineer who designs, deploys, and maintains large-scale deep learning training clusters and tooling for AI researchers.
About this role
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.
Figure’s vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role is focused on managing the training cluster and on implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for large-scale deep learning systems.
Responsibilities
- Design, deploy, and maintain Figure's training clusters
- Architect and maintain scalable deep learning frameworks for training on massive robot datasets
- Work with AI researchers to train new model architectures at large scale
- Implement distributed training and parallelization strategies to reduce model development cycles
- Implement tooling for data processing, model experimentation, and continuous integration
Requirements
- Strong software engineering fundamentals
- Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
- Experience with Python and PyTorch
- Experience managing HPC clusters for deep neural network training
- Minimum of 4 years of professional, full-time experience building reliable backend systems
Bonus Qualifications
- Experience managing cloud infrastructure (AWS, Azure, GCP)
- Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
- Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)
The US base salary range for this full-time position is $150,000 to $350,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.