
DevOps/IT Engineer
AI Summary
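The DevOps/IT Engineer maintains and scales cloud infrastructure, CI/CD pipelines, containers, and AI/ML infrastructure. The role also assists with day-to-day lab operations and supports multiple engineers with daily troubleshooting.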
About this role
Key Responsibilities
Cloud Infrastructure & CI/CD
· Maintain and improve the Jenkins CI/CD infrastructure.
· Scale Jenkins with on‑demand workers using AWS ECS and Terraform.
Docker & Container Management
· Maintain and evolve custom Docker images based on NVIDIA CUDA for AMD64 (x86-64) and Jetson (ARM-based) platforms.
· Improve CI/CD caching strategies to significantly reduce Docker build times.
AI/ML Infrastructure
· Maintain IaC for training AI/ML models using Terraform and SageMaker AI.
· Optionally, integrate a dashboard for training orchestration and monitoring, such as TensorBoard or Weights & Biases.
Hardware & Lab Support
· Support lab operations by preparing, installing, and maintaining workstations and Jetson devices.
Team Support & Collaboration
· Assist engineers who are blocked by DevOps, CI/CD, IT, or cloud‑related issues.
· Optional: Build small internal web dashboards or automation tools.
Required Technical Skills
· Proficiency with AWS services (EC2, S3, IAM, ECR, VPC, autoscaling).
· Good knowledge of Terraform and Infrastructure as Code methodologies.
· Hands-on experience maintaining Jenkins CI/CD pipelines.
· Experience with C++ compilation toolchains (understanding build systems, not necessarily writing C++).
· Strong Docker knowledge.
· General IT infrastructure knowledge (networking basics, system administration, Linux environments).
· Optional but valuable:
    - Experience with NVIDIA Jetson boards (flashing, OS preparation, infrastructure validation).
    - Full-stack experience for building internal tools.
Ideal Candidate Profile
· Comfortable with in-office collaboration (5 days/week in Campbell, CA)
· Comfortable supporting multiple engineers daily, including rapid troubleshooting.
· Strong problem-solving skills and ability to autonomously improve existing systems.
· Experience working in fast-paced R&D or robotics/AI environments (preferred).
· Ability to document processes, propose improvements, and work cross-functionally with software teams.