Merlin Labs

Posted 3 months ago

Open

Director, Compute Platform

Boston, On-site, Full-time

AI Summary

Senior platform leader responsible for end-to-end compute architecture powering cloud ML training, high-fidelity simulation, and edge/onboard systems for autonomous flight.

About this role

About Merlin:
Merlin is a venture-backed aerospace startup building a non-human pilot to enable both reduced-crew and uncrewed flight. Backed by some of the world’s leading investors, Merlin is scaling alongside our customers to begin leveraging autonomy today to solve some of aviation’s biggest challenges.

About You:
You are a senior platform leader who understands that autonomous flight demands uncompromising reliability, real-time performance, and scalable infrastructure.
You have built and owned compute platforms that power complex, mission-critical systems — spanning cloud ML training, high-fidelity simulation, and edge or embedded environments. You think holistically about architecture, balancing determinism, latency, scalability, safety, and cost.
You are not looking to maintain infrastructure — you want to define it. You operate as a technical authority, partnering closely with autonomy, perception, controls, and flight software leaders to ensure the compute foundation enables rapid development and safe, scalable deployment.
You thrive in environments where the platform you design directly impacts real-world performance. You are comfortable setting long-term technical direction while remaining hands-on in the most complex systems challenges.

Responsibilities:

  • End-to-end compute architecture spanning cloud ML training, simulation clusters, data pipelines, and edge/onboard systems.
  • Infrastructure that supports large-scale distributed training, high-fidelity simulation (SIL/HIL), and autonomy validation workflows.
  • GPU/accelerator orchestration, workload scheduling, and performance optimization for compute-intensive autonomy systems.
  • Design decisions that balance determinism, latency, safety, scalability, and cost.
  • Platform reliability standards for systems that ultimately support flight-critical software.
  • Long-term roadmap for scaling compute infrastructure as fleet size, simulation fidelity, and ML workloads grow.
  • Technical mentorship and bar-raising across platform and infrastructure engineering.

Requirements:

  • 10+ years of experience building large-scale distributed or high-performance compute systems.
  • Proven ownership of production infrastructure supporting autonomous systems, robotics, aerospace, or ML-heavy platforms.
  • Deep expertise in distributed systems design, networking, and systems performance optimization.
  • Experience architecting infrastructure for GPU-based training, simulation, or real-time compute workloads.
  • Strong programming background in C++, Go, or Python, with comfort operating close to systems layers.
  • Experience bridging cloud infrastructure and edge/embedded compute environments.
  • Track record of leading complex cross-functional technical initiatives.
  • Ability to operate with autonomy in a fast-scaling, high-ambiguity startup environment.

Skills:

    Autonomy Validation Workflows, C++, Cloud Infrastructure, Cost Optimization, Cross-functional Leadership, Data Pipelines, Determinism, Distributed Systems, Edge/Embedded Compute, Fleet-scale Infrastructure, Go, GPU/Accelerator Orchestration, High-performance Computing, Latency Optimization, Networking, Orchestration (e.g., Kubernetes), Platform Reliability, Python, Real-time Compute, Scalability, SIL/HIL Simulation, System Performance Optimization, Workload Scheduling