
Posted 10 months ago
High Performance Computing Software Engineer - Supercomputing
Abu Dhabi · On-site · Full-time
AI Summary
High Performance Computing Software Engineer to design and operate software for large-scale AI training workloads (1000+ GPUs), optimize kernel-level components, and support ML frameworks in a research computing environment.
About this role
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large-scale AI workloads.
In this role, you’ll work at the intersection of high-performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting-edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer-friendly.
Job Responsibilities
Skills & Experience
Skills
DeepSpeed, GPU Kernel Development, JAX, Kubernetes, Libfabric, Linux Kernel Internals, Megatron-LM, MPI, NCCL, PyTorch, Pyxis, RCCL, RDMA, RDMA-based Systems, SHARP, Slurm, TensorFlow, UCX