Posted 1 month ago

Research Scientist / Engineer - Post-training & Robot Learning

Palo Alto | On-site | Full-time

AI Summary

Research Scientist/Engineer focused on post-training and robot learning. Designs RL pipelines and adapts video-prediction models to real robot tasks, measuring generalization and real-world performance.

About this role

At Rhoda AI, we're building the full-stack foundation for the next generation of humanoid robots, from high-performance, software-defined hardware to the foundational models and video world models that control it. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling scenarios unseen in training. We work at the intersection of large-scale learning, robotics, and systems, with a research team drawn from Stanford, Berkeley, Harvard, and beyond. We're not building a feature; we're building a new computing platform for physical work. With over $400M raised, we're investing aggressively in R&D, hardware development, and manufacturing scale-up to make that a reality.

We're looking for Research Scientists and Research Engineers with deep robotics or autonomous systems domain knowledge to adapt our web-pretrained video model to real robot tasks. Post-training at Rhoda means taking a causal video generation model pretrained on internet-scale data and fine-tuning it on robot-collected demonstrations to produce reliable, generalizable behavior — with as little task-specific data as possible. We hire across levels — from senior to staff.

What You'll Do

Design and implement RL training pipelines to improve robot policy performance beyond what imitation learning alone achieves — reward design, online data collection, and policy optimization
Develop and apply RL algorithms (PPO, GRPO, or similar) adapted to the video prediction setting, including reward modeling and feedback collection strategies for physical task performance
Design and implement broader post-training pipelines: supervised fine-tuning, preference optimization, and behavioral alignment on robot-collected demonstration data
Work on the inverse dynamics model that translates video predictions into executable robot actions
Build evaluation frameworks for post-trained policies: task success, generalization to novel objects and environments, and failure mode analysis on real hardware
Research methods to efficiently adapt models to new tasks with minimal demonstration data, including in-context generalization and few-shot adaptation
Identify failure modes and systematic weaknesses in deployed robot policies and drive targeted improvements
Iterate quickly between simulation and real robot evaluation to close the feedback loop
Collaborate with the pre-training team to surface what capabilities are missing from the base model and need to be addressed upstream

What We're Looking For

Hands-on experience with robot systems, robotic policy learning, or autonomous systems in an industry or research setting (robotics, self-driving, or similar physical AI domains)
Strong understanding of robot policy learning: imitation learning, behavior cloning, and how RL builds on top of them
Practical familiarity with real robot hardware, deployment constraints, and sensor modalities (vision, proprioception)
Solid ML skills with hands-on PyTorch experience
Ability to diagnose policy failures, reason about distribution shift, and iterate effectively on data and training strategies
Comfort with ambiguity and fast-changing research priorities
Staff-level candidates are expected to define technical direction and drive research strategy independently; senior candidates execute complex projects with strong fundamentals and growing scope

Nice to Have (But Not Required)

Hands-on experience with reinforcement learning — reward design, policy optimization, and online RL training loops — applied to real or near-real environments (robotics, games, simulated physics, or similar); this is a significant plus
Prior industry experience in robotics, autonomous driving, or physical AI (e.g., manipulation, mobile robotics, self-driving stacks)
Experience with teleoperation systems or robot demonstration collection at scale
Familiarity with robot middleware (ROS/ROS2) and real-time control systems
Experience with simulation environments for robotics (MuJoCo, Isaac Sim, Genesis)
Understanding of video generation models and how they connect to action prediction
PhD in Robotics, ML, or a related field
Publication record at ICRA, CoRL, RSS, NeurIPS, or related venues

Why This Role

Your work is what makes our robots actually perform tasks reliably in the real world — the direct connection between pre-trained capability and deployed behavior
Work at a rare intersection: state-of-the-art video generation models applied to real robot hardware, not simulation
Fast feedback loop between model changes and real robot performance
High ownership on a small team where robotics domain expertise is core to the mission

Skills

Behavior Cloning, Data-driven Debugging, Distribution Shift, Evaluation Frameworks, Few-shot Adaptation, GRPO, Imitation Learning, In-context Learning, Inverse Dynamics Modeling, Online Data Collection, Policy Optimization, PPO, Proprioception, PyTorch, Reinforcement Learning, Reward Design, RL Training Pipelines, Robot Hardware Deployment, Robot Perception, Robot Policies, Simulation-to-real Transfer, Task Generalization, Video Generation Models, Vision Sensors
