Rhoda AI

Posted 3 months ago

Open

ML Infra Engineer (Data)

Palo Alto, On-site, Full-time

AI Summary

Senior ML & Data Infrastructure Engineer owns and scales data pipelines and storage systems for large-scale model training datasets, focusing on ingestion, indexing, retrieval, and reliability across billions of video clips.

About this role

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots, from high-performance, software-defined hardware to the foundational models and video world models that control it. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a research team drawn from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work. With over $400M raised, we're investing aggressively in the R&D, hardware development, and manufacturing scale-up to make that a reality.

We're looking for a Senior ML & Data Infrastructure Engineer to own and scale the systems that power our model training data pipeline — from raw ingestion and storage to indexing, retrieval, and throughput optimization at massive scale.

What You'll Do

  • Architect, build, and scale high-throughput data infrastructure that processes and manages billions of video clips with strong guarantees around reliability, latency, and cost efficiency

  • Design and optimize large-scale storage systems (cloud object storage, databases, metadata stores) for multimodal datasets

  • Build efficient indexing and retrieval systems to support fast dataset querying, filtering, and iteration for research and production use cases

  • Develop observability frameworks for data pipelines including monitoring, alerting, failure recovery, and performance optimization

  • Implement intelligent workload balancing and throughput optimization across distributed compute and storage systems

  • Manage data artifacts, versioning, and lineage to ensure reproducibility and traceability across training runs

  • Build internal interfaces and lightweight tools that enable researchers and engineers to explore, query, and analyze large datasets at scale

  • Support integration and scalable deployment of vision-language models (VLMs) within data pipelines for screening, enrichment, or metadata generation

What We're Looking For

  • 5+ years of experience in data infrastructure, distributed systems, ML infrastructure, or a closely related field

  • Strong experience building and operating large-scale data pipelines (1B+ samples or petabyte-scale systems preferred)

  • Deep understanding of distributed systems, databases, indexing strategies, and cloud storage architectures

  • Experience optimizing data throughput, workload balancing, and cost-performance tradeoffs in cloud environments

  • Strong skills in observability, monitoring, and production reliability for high-scale systems

  • Strong software engineering fundamentals with the ability to own systems end-to-end, from design to production

Nice to Have (But Not Required)

  • Experience managing large multimodal datasets

  • Familiarity with ML training workflows and data lifecycle management

  • Familiarity with vision-language models (VLMs) and experience running ML inference workloads at scale in distributed or cloud environments

  • Experience with robotics data formats or real-world sensor data (video, proprioception, teleoperation logs)

  • Familiarity with data versioning and lineage tooling (e.g., DVC, Delta Lake, or similar)

Why This Role

  • Own the data foundation that everything else runs on — model quality is only as good as the data infrastructure beneath it

  • Direct collaboration with research and ML systems teams; your work has immediate, measurable impact on training velocity

  • High ownership in a small team — you'll make real architectural decisions, not execute tickets

  • Help build the infrastructure that powers robots operating in the real world, at scale

Skills

Cloud Storage Architectures, Data Artifacts, Databases, Data Indexing, Data Infrastructure, Data Lineage, Distributed Systems, DVC or Delta Lake (familiarity), Large-scale Data Pipelines, Metadata Stores, Monitoring, Object Storage, Observability, Production Reliability, Retrieval Systems, Throughput Optimization, Training Data Management, Versioning, Workload Balancing
