About Zoox
Zoox is an autonomous ride-hailing company building the world's first purpose-built robotaxi — fully electric, bidirectional, with no steering wheel or driver's seat. Backed by Amazon and founded to make transportation safer, cleaner, and more accessible, Zoox designs its vehicles entirely around the rider. We're currently operating in Las Vegas and San Francisco, with Austin and Miami on the horizon, and testing underway across seven U.S. markets.

About Our Part-Time Student Worker Program
Zoox's part-time student worker program puts you at the center of one of the most ambitious challenges in transportation. You'll contribute to real projects, work alongside engineers and researchers pushing the boundaries of autonomous technology, and gain experience that goes well beyond the classroom. We're looking for students who bring strong academic foundations, curiosity that doesn't stop at coursework, and a drive to be part of something that matters.

Role Overview

In this role, you will support the end-to-end validation pipeline for AI tools: maintaining test datasets, running benchmarks, and measuring agent accuracy across routing decisions, classification labels, and structured output fields.

Responsibilities

Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions across agent variants

Build and expand ground truth datasets used to evaluate agent outputs against known-correct answers

Identify and close gaps in benchmark validation, and help build a more comprehensive evaluation infrastructure that strengthens pre-release validation

Develop new evaluation dimensions such as label accuracy and structured output correctness beyond the existing team classification benchmarks

Investigate failure modes in agent outputs and work with engineers to surface actionable improvements

Write scripts and tooling to automate data collection, result parsing, and metric reporting

Document findings, track benchmark trends over time, and present results to the team

Program Requirements

Currently enrolled in a B.S. or M.S. program in Computer Science, Data Science, Engineering, or a related field

Available to commit to a minimum three-month assignment

Able to commit to a minimum of 20 hours per week

Able to work on-site at one of our office locations

Must adhere to Zoox confidentiality requirements, including refraining from using or sharing proprietary company information outside of Zoox, such as in academic research, theses, publications, or presentations

Qualifications

Familiar with Cursor or Claude

Familiar with Python

Familiar with evaluation concepts: precision, recall, F1 score, and confusion matrices

Comfortable working with structured data (CSV, JSON)

Experience modifying or writing reproducible analysis scripts

Bonus Qualifications

Prior exposure to LLM-based systems, prompt engineering, or AI agent evaluation

Experience with ticketing systems or messaging apps (e.g. Jira, Slack)

Part-Time Student Worker – AI Validation and Benchmarking Engineer
