Zensors

Posted 1 month ago

Open

AI/ML Infrastructure Engineer

San Francisco · On-site · Full-time

AI Summary

The ML Infrastructure Engineer focuses on accelerating training and inference for computer vision models, building efficient operators, and optimizing the end-to-end video analytics pipeline across server and edge architectures.

About this role

The AI Infrastructure team at Zensors builds the engine that powers our visual sensing platform. We provide the tools to automate the lifecycle of our AI workflow, including model development, evaluation, optimization, deployment, and monitoring across thousands of video streams.

As a Machine Learning Engineer in ML Runtime & Optimization, you will develop technologies to accelerate the training and inference of computer vision models that power smart spaces and cities.

Your responsibilities will include:

  • Optimizing Core ML Pipelines: Identifying key bottlenecks in our current video analytics pipeline and performing in-depth analysis to ensure the best possible performance on current server and edge compute architectures.

  • Cross-Stack Collaboration: Collaborating closely with AI research and platform engineering teams to optimize core parallel algorithms and influence the design of our next-generation inference infrastructure.

  • Model Acceleration: Applying advanced model optimization techniques—such as quantization (Int8/FP16), pruning, and layer fusion—to our Vision Transformers (ViTs) and CNNs to maximize throughput and minimize latency.

  • Building Efficient Operators: Working across the entire ML framework and compiler stack (e.g., PyTorch, CUDA, TensorRT, and NVIDIA DeepStream) to write libraries of custom, optimized ML operators.

  • Resource Efficiency: Reducing the compute cost per video stream to enable massive scalability of our SaaS product.

  • Data Management: Building, improving, maintaining, and operating systems to facilitate the collection, labeling, and use of visual data for ML training.

Requirements

  • B.S., M.S., or Ph.D. in Computer Science, Electrical Engineering, or a related discipline.

  • Strong programming skills in C/C++ and Python.

  • Experience with model optimization, quantization, and efficient deep learning techniques (e.g., knowledge distillation, pruning).

  • Deep understanding of GPU hardware performance, including execution models, thread hierarchy, memory/cache management, and the cost/performance trade-offs of video processing.

  • Experience with profiling and benchmarking tools (e.g., Nsight Systems, Nsight Compute) to validate performance on complex architectures.

  • Experience identifying and resolving compute and data flow bottlenecks, particularly in high-bandwidth video processing pipelines.

  • Strong communication skills and the ability to work cross-functionally between research and infrastructure teams.

Preferred Qualifications

  • Familiarity with database systems (e.g., SQL databases, Neo4j).

  • Experience in computer vision, deep learning, and Vision Transformers.

  • Experience with video processing frameworks such as NVIDIA DeepStream, DALI, or FFmpeg.

  • Familiarity with ML compilers (e.g., TVM, MLIR) or inference engines like TensorRT or ONNX Runtime.

  • Knowledge of distributed training systems or cloud-scale inference serving (e.g., Triton Inference Server).

Skills

C++, Cloud-scale Inference Serving, CUDA, DALI, Distributed Training Systems, FFmpeg, GPU Performance Analysis, High-bandwidth Video Processing, Knowledge Distillation, Layer Fusion, MLIR, Nsight Compute, Nsight Systems, NVIDIA DeepStream, ONNX Runtime, Profiling and Benchmarking, Pruning, Python, PyTorch, Quantization, TensorRT, Triton Inference Server, TVM, Video Processing Pipelines
