Lead LLM Evals Engineer
- $250,000-$350,000
- San Francisco, CA
- Permanent
About the job
Lead LLM Evals Engineer | SF or Redwood City
I'm hiring a Lead LLM Evals Engineer to join an early-stage physical AI startup building systems with the general physical ability to experiment, engineer, and manufacture anything. They’re a small, deeply technical team pushing agentic LLMs into real autonomous workflows tied to physical systems, factories, and end-to-end execution.
This role owns the evaluation and verification layer for agentic LLM systems operating in complex, long-horizon environments. You’ll build eval harnesses, automated verifiers, and regression gates that determine whether agents can actually plan, execute, recover, and ship real outcomes across simulated and real-world workflows. The work directly shapes how fast these systems improve, how safely they operate, and whether progress is real or illusory.
→ Build eval harnesses for agentic LLM systems in complex workflows
→ Design verifiers for planning, execution, recovery, and constraint adherence
→ Work with research and systems teams to turn eval failures into training signals
Both Senior & Lead levels considered.
Interested? Apply now!