Posted 5 months ago

Runtime Engineer

Toronto, On-site, Full-time

AI Summary

A Runtime Engineer designs and builds a multi-target runtime for an AI compiler stack, implementing low-level parallelization, kernel scheduling, and performance analysis to execute efficiently across diverse hardware.

About this role

About Us

At Lemurian Labs, we're reimagining the foundations of computing to make AI accessible to everyone. Our mission is to remove the limits of scale, hardware, and cost that hold back innovation, so the people solving humanity's hardest problems can move faster.

We're building a new kind of software stack: a hardware-agnostic platform that makes every system — from a laptop to a supercomputer — feel like one seamless engine. Developers can write once, run anywhere, and get state-of-the-art performance across any chip, any cloud, at any scale. It's a complete rethink of how software and hardware interact — designed for the era beyond Moore's Law.

We're not looking for the comfortable or the conventional; we're looking for the bold. The engineers who crave frontier problems, who want to bend the limits of what's possible, who see infrastructure not as a constraint but as a canvas. If you want to build the foundation for the next era of AI and change what humanity can achieve in the process, join us.

About the Role

We're looking for a Runtime Engineer to design and build the multi-target runtime that sits at the heart of our AI compiler stack. This is a systems-level role where you'll take the output of our optimizing compiler and make it execute — efficiently, correctly, and at scale — across a diverse landscape of hardware targets.

You'll work on low-level parallelization, kernel scheduling, and performance analysis, and collaborate closely with our compiler and product teams to push the boundaries of what's possible on modern AI hardware.

What You'll Do

Design, develop, maintain, and improve our multi-target runtime.
Apply the latest techniques in parallelization and partitioning to automate kernel generation and exploit highly optimized execution paths.
Rapidly prototype new runtime ideas and use data to drive their exploration.
Benchmark and analyze the outputs produced by our optimizing compiler on target hardware.
Build tools to collect and analyze performance bottlenecks.
Work closely with our product team to understand the evolving needs of ML engineers and drive improvements in runtime architecture.

Requirements

Essential Skills and Experience

BS degree in Computer Science, Computer Engineering, or equivalent practical experience.
4+ years of experience working with compilers or runtime systems.
Deep understanding of asynchronous and concurrent programming.
4+ years of experience with C/C++ (C++14 or newer).
Understanding of hardware architecture: vector vs. scalar registers and instructions, memory hierarchies.
Knowledge of operating system kernel development or hypervisor development.

Preferred Skills and Experience

Master's or PhD in Computer Science, Computer Engineering, or equivalent.
Experience developing or maintaining GPU compute libraries on platforms such as CUDA or ROCm.
Experience with GPU programming and optimization.
Background in high-performance computing (HPC).
Knowledge of deep learning frameworks and tooling such as PyTorch, JAX, or Triton.
Experience programming large compute clusters.

Why Join Lemurian Labs

Build the runtime that makes next-generation AI infrastructure actually go fast.
Work across the full stack — from hardware intrinsics to compiler output to distributed execution.
Join a team that approaches infrastructure as a canvas, not a constraint.
Competitive compensation including equity, medical/dental/vision, retirement savings, and wellness benefits.

Lemurian Labs is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of gender identity, race, ethnicity, sexual orientation, disability status, age, or background.

Compensation depends on experience and geographic location, and the range will be narrowed during the interview process. Additional benefits include equity, company bonus opportunities, medical, dental, and vision coverage, a retirement savings plan, and supplemental wellness benefits.

Skills

Asynchronous Programming, C++, Compiler Runtime, Concurrent Programming, CUDA, HPC, Hypervisor, JAX, Memory Hierarchies, PyTorch, ROCm, Triton, Vector Instructions
