Senior AI Inference Engineer - Model Optimization & Deployment
Foster City, CA | On-site | Full-time
AI Summary
Senior AI Inference Engineer focused on model optimization and deployment for edge devices, using quantization, pruning, and custom CUDA kernels to run multi-modal models on vehicle SoCs.
About this role
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.
As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) on power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.
In this role, you will:
Qualifications:
Bonus Qualifications:
Skills
BF16, C++, CUDA, CUDA Kernels, Custom ML OPs, DeepSpeed, Edge Deployment, FlashAttention, FP16, FP4, FP8, INT8, Latency Benchmarking, LipLoRA, Low-level Accelerator Programming, Megatron-LM, Model Conversion/compilation Pipelines, Multi-modal Models, ONNX, PagedAttention, Parity Checking, Perception Algorithms, PTQ, Python, QAT, QLoRA, Ray, Real-time Inference, TensorRT, TensorRT Plugins, Torch.compile, Torch Distributed, VLAs, VLM