Judgment Labs

Research Engineer

San Francisco, CA
Python, Machine Learning, AI, Data Engineering, Reinforcement Learning
Description

Department: Research

Location: San Francisco

Employment Type: Full-time

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). Traditional observability focuses on logging exceptions and latency; ABM surfaces behavioral anomalies, such as instruction drift and context retrieval loss, in production environments at scale.

Hundreds of teams building autonomous agents rely on Judgment to understand how their systems are behaving post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down in their usage context.

We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.

The Role:

We are looking for Research Engineers to build AI systems that use agent interaction data to help us understand how agents behave, evaluate them at scale, and improve them through learning and feedback.

Your research will not live on a whiteboard. You'll work directly with real-world agent data, apply frontier methods in production, and see your work ship immediately into the product. By making agent behavior measurable and debuggable, your systems will support teams deploying agents across finance, legal, operations, and other high-stakes workflows. You will own projects end-to-end, with significant autonomy, and work closely with the team to build self-improving agent systems.

What You'll Do:

  • Build systems to aggregate, index, and analyze large-scale agent interaction data to extract meaningful evaluation signals

  • Develop agent-based systems for analyzing and evaluating complex, long-running behaviors

  • Design and implement post-training and optimization workflows to improve agent behavior

  • Build internal tools and infrastructure to support rapid experimentation, analysis, and training

What We're Looking For:

You identify with at least one of the following:

  • You care about data quality, evaluation, and benchmarking, and are comfortable working hands-on with messy data

  • You have experience building agent systems and working with them in real-world or production settings

  • You have a strong background in reinforcement learning, agents, or machine learning fundamentals

  • You are comfortable working across infrastructure and systems, spanning training, data pipelines, and model serving

  • You are comfortable working across teams to translate research into product, balancing real-world customer constraints and tradeoffs

  • You enjoy turning ambiguous problems into clear, well-designed plans

Why Judgment?

  • Agents can’t work without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.

  • We’re wired to win. We're a team of fewer than 20, but we ship like 50+ on the daily. You'll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.

  • Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.

  • We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you. We sprint hard but we play hard; ask us about our Smash/Mario Kart tournaments.

We work in person in San Francisco.
