Posted 1 day ago

Open

Part-Time Student Worker – AI Validation and Benchmarking Engineer

Foster City, CA | Hybrid | Contract

About this role

About Zoox
Zoox is an autonomous ride-hailing company building the world's first purpose-built robotaxi — fully electric, bidirectional, with no steering wheel or driver's seat. Backed by Amazon and founded to make transportation safer, cleaner, and more accessible, Zoox designs its vehicles entirely around the rider. We're currently operating in Las Vegas and San Francisco, with Austin and Miami on the horizon, and testing underway across seven U.S. markets.

About Our Part-Time Student Worker Program
Zoox's part-time student worker program puts you at the center of one of the most ambitious challenges in transportation. You'll contribute to real projects, work alongside engineers and researchers pushing the boundaries of autonomous technology, and gain experience that goes well beyond the classroom. We're looking for students who bring strong academic foundations, curiosity that doesn't stop at coursework, and a drive to be part of something that matters.

Role Overview
In this role, you will support the end-to-end validation pipeline for AI tools: maintaining test datasets, running benchmarks, and measuring agent accuracy across routing decisions, classification labels, and structured output fields.

Responsibilities

  • Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions across agent variants
  • Build and expand ground truth datasets used to evaluate agent outputs against known-correct answers
  • Identify and close gaps in benchmark validation, and help build a more comprehensive evaluation infrastructure that strengthens validation before release
  • Develop new evaluation dimensions such as label accuracy and structured output correctness beyond the existing team classification benchmarks
  • Investigate failure modes in agent outputs and work with engineers to surface actionable improvements
  • Write scripts and tooling to automate data collection, result parsing, and metric reporting
  • Document findings, track benchmark trends over time, and present results to the team

Program Requirements

  • Currently enrolled in a B.S. or M.S. in Computer Science, Data Science, Engineering or a related field
  • Available to commit to a minimum three-month assignment
  • Able to commit to a minimum of 20 hours per week
  • Able to work on-site at one of our office locations
  • Must adhere to Zoox confidentiality requirements, including refraining from using or sharing proprietary company information outside of Zoox, such as in academic research, theses, publications, or presentations

Qualifications

  • Familiar with Cursor or Claude
  • Familiar with Python
  • Familiar with evaluation concepts: precision, recall, F1 score, and confusion matrices
  • Comfortable working with structured data (CSV, JSON)
  • Experience modifying or writing reproducible analysis scripts

Bonus Qualifications

  • Prior exposure to LLM-based systems, prompt engineering, or AI agent evaluation
  • Experience with ticketing systems and messaging apps (e.g., Jira, Slack)