Lead AI Platform
About this role
Integrant is looking for game changers to join our team as a Lead AI Platform Engineer.
The Lead AI Platform Engineer is responsible for bridging AI workloads with production-grade infrastructure, with a strong focus on NVIDIA AI stack, enabling high-performance, scalable, and optimized AI systems.
This role focuses on model optimization, runtime efficiency, and GPU utilization, ensuring that AI workloads are production-ready, cost-efficient, and performant across enterprise environments.
Roles and Responsibilities:
- Translate AI/ML workloads into optimized infrastructure and deployment strategies
- Optimize model performance across GPU environments (latency, throughput, memory utilization)
- Design and implement inference and training pipelines using NVIDIA stack tools (TensorRT, Triton, NIM)
- Convert and optimize models across frameworks (PyTorch → ONNX → TensorRT)
- Analyze and resolve performance bottlenecks using profiling tools (GPU, memory, network)
- Improve GPU utilization and scheduling efficiency across clusters
- Design scalable distributed training and inference architectures
- Work closely with customers to define AI infrastructure strategies and deployment models
- Support production deployments including monitoring, rollback, and performance validation
- Conduct applied research to improve model efficiency and infrastructure utilization
- Mentor team members on AI infrastructure, optimization, and GPU systems
- Track experiments with tools such as MLflow, W&B, or Neptune, logging parameters, metrics, and artifacts for comparison
- Detect and diagnose post-deployment model degradation: concept drift, data pipeline changes, traffic pattern shifts
- Apply root cause analysis (RCA) to ML systems: isolating variables, reproducing issues
Requirements
- 8+ years of experience in AI/ML systems, HPC, and AI infrastructure
- Strong proficiency in Python
- Strong experience with GPU-based AI workloads and performance optimization
- Deep understanding of model optimization techniques (quantization, pruning, batching)
- Hands-on experience with:
  - PyTorch
  - ONNX / ONNX Runtime
  - TensorRT / TensorRT-LLM
  - Triton Inference Server
- Knowledge of CUDA, cuDNN, and GPU architecture fundamentals
- Experience with distributed systems (multi-GPU / multi-node)
- Familiarity with:
  - NCCL communication
  - NVLink / InfiniBand
  - Kubernetes or Slurm for orchestration
- Experience deploying AI models into production environments
- Ability to analyze system bottlenecks (compute, memory, network)
- Experience with profiling tools (Nsight, TensorRT profiler, etc.)
- Knowledge of cost optimization strategies for GPU workloads
- Experience with experiment tracking tools (MLflow, W&B, Neptune) for logging and comparing parameters, metrics, and artifacts
- Ability to identify post-deployment model degradation: concept drift, data pipeline changes, traffic pattern shifts
- Experience applying root cause analysis (RCA) to ML systems: isolating variables, reproducing issues
Nice to Have
- Experience with NVIDIA NIM and NGC ecosystem
- Exposure to Megatron-LM, NeMo, or large-scale LLM training/inference
- Experience with LLM optimization techniques (KV cache, batching strategies)
- Familiarity with MLOps practices and CI/CD for AI systems
- Experience in customer-facing architecture or consulting roles
- Familiarity with hybrid cloud / on-prem HPC environments
Benefits
- Salary paid in USD
- Career advancement opportunities every six months
- Supportive and friendly work environment
- Premium medical insurance (employee + family)
- English language development courses
- Interest-free loans paid over 2.5 years
- Technical development courses
- Planned overtime program (POP)
- Employment referral program
- Premium location in Maadi
- Social insurance