nuro Greenhouse · Posted today

Technical Lead, Evaluation Infrastructure

California, United States

Offboard Infrastructure Greenhouse

Who We Are

Nuro is a self-driving technology company on a mission to make autonomy accessible to all. Founded in 2016, Nuro is building the world’s most scalable driver, combining cutting-edge AI with automotive-grade hardware. Nuro licenses its core technology, the Nuro Driver™, to support a wide range of applications, from robotaxis and commercial fleets to personally owned vehicles. With technology proven over years of self-driving deployments, Nuro gives automakers and mobility platforms a clear path to AVs at commercial scale, empowering a safer, richer, and more connected future.

About the Role

Evaluation Infrastructure plays a critical role at Nuro, directly enabling L4 driverless deployment. The team supports two demanding workloads: day-to-day Autonomy Evaluation that powers rapid software iteration, and large-scale Driverless Safety Validation that produces the rigorous evidence required to deploy autonomy on public roads.

The Evaluation Infrastructure team builds the metrics framework, evaluation pipelines, introspection tooling, and analysis products that turn raw on-road and simulation logs into actionable insight. Our metrics stack spans both heuristic and ML-based approaches, covering everything from low-level component accuracy to end-to-end behavior quality. The platform empowers autonomy and Systems & Safety teams to run complex evaluations and validations across a wide range of configurations and scales, producing the high-fidelity metrics that drive both short-term iteration and long-term release confidence — in close partnership with Simulation and the broader AI Platform.

As the Technical Lead, you will lead the team with deep technical guidance and rigor, setting the technical bar and shortening both the time-to-signal for evaluation and the time-to-confidence for validation, so that autonomy and Systems & Safety teams can iterate fast while deploying software safely.

About the Work

  • Build and own a unified metrics, evaluation, and validation platform — pipelines, introspection tooling, and analysis products that turn on-road and simulation logs into high-fidelity signals for autonomy iteration and driverless safety validation
  • Drive the technical bar for metric quality across both heuristic and ML-based approaches; invest in the scale, reliability, and CI/CD of the evaluation stack to shorten time-to-signal for evaluation and time-to-confidence for validation, and to meet high SLAs for downstream stakeholders
  • Mentor and grow the Evaluation Infrastructure team, and champion AI-native engineering practices that compound team velocity and code quality
  • Partner with Product, Autonomy, Systems & Safety, and Simulation teams to define and execute the vision and strategy for evaluation at Nuro

About You

  • You have a B.Sc. or M.Sc. degree, plus 4 years of relevant work experience
  • Domain experience: Strong fluency in distributed systems, large-scale data and ML evaluation pipelines, metrics frameworks (heuristic and/or ML-based), and analytics platforms
  • Engineering leadership: Experience setting technical vision, roadmap, and prioritization for a team operating at the intersection of autonomy, safety, and data infrastructure; a clear, concise communicator who partners effectively with PMs, engineers, and cross-functional stakeholders across Autonomy, Systems & Safety, and Simulation
  • Technical excellence: Ability and willingness to deep-dive into implementation; sets the technical bar for metric quality, pipeline rigor, and safety-critical engineering practice across the broader software organization; strong proficiency in Python, C++, or similar languages
  • AI-native mindset: Daily user of modern AI coding assistants and agentic tools (Claude Code, Cursor, and similar), with strong intuition for where they accelerate engineering work and where they don't; eager to apply LLMs and ML systems to evaluation problems, from automated triage and metric generation to natural-language analysis of fleet behavior; raises the team's productivity, code quality, and signal density through thoughtful AI integration

Bonus Points

  • Knowledge of data engineering, and its tooling and best practices
  • Knowledge of batch and streaming data processing, warehousing, and analytics solutions
  • Experience with data workflow orchestration platforms
  • Prior experience building evaluation, validation, or analytics platforms, ideally in autonomy, robotics, or safety-critical systems

At Nuro, your base pay is one part of your total compensation package. For this position, the reasonably expected base pay range is between $193,930,200 and $291,150/year for the level at which this job has been scoped. Your base pay will depend on several factors, including your experience, qualifications, education, location, and skills. In the event that you are considered for a different level, a higher or lower pay range would apply. This position is also eligible for an annual performance bonus, equity, and a competitive benefits package.

At Nuro, we celebrate differences and are committed to a diverse workplace that fosters inclusion and psychological safety for all employees. Nuro is proud to be an equal opportunity employer and expressly prohibits any form of workplace discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other legally protected characteristics.
