Hark

Member of Technical Staff

San Jose, CA
Python PyTorch Machine Learning Reinforcement Learning LLM
Description

Member of Technical Staff, Post-Training

Location: San Jose

Department: Computer Use Agents

About Hark

Hark is an artificial intelligence company building advanced, personalized intelligence: proactive, multimodal, and capable of interacting with the world through speech, text, vision, and persistent memory.

We're pairing that intelligence with next-generation hardware to create a universal interface between humans and machines. While today's AI largely operates through chat boxes and decade-old devices, Hark is focused on what comes next: agentic systems that interact naturally with people and the real world.

To get there, we're developing multimodal models and next-generation AI hardware together, designed from the ground up as a single, unified interface for a new era of intelligent systems.

About the Role

We are looking for a Member of Technical Staff, Post-Training to lead the development of post-training strategies that define how our models acquire coding, computer use, and agentic capabilities at scale.

This role sits at the frontier of a rapidly emerging discipline — one where reinforcement learning, simulation, and large-scale model training converge to produce agents that can reason, plan, and act over long horizons. There is no established playbook here. We're looking for researchers and engineers who can bring rigor and creativity from adjacent fields — RL, robotics, game-playing systems, compiler tooling, formal verification, or program synthesis — and apply them to the next generation of coding and agentic AI.

Responsibilities

  • Design and implement post-training strategies, primarily RL-based, to develop strong coding agents capable of multi-step reasoning, tool use, and long-horizon task completion.
  • Build and scale simulation and scaffolding environments for agentic RL: code execution sandboxes, computer use environments, tool-calling harnesses, and verifiable reward signals.
  • Develop reward modeling pipelines — including outcome-based, execution-based, and process-based reward signals — and iterate on them based on training dynamics.
  • Scale synthetic data generation and trajectory distillation pipelines that feed RL training and improve sample efficiency.
  • Design and run rigorous ablations to understand how algorithm choice, data mixture, reward shaping, and scale interact in the agentic setting.
  • Build evaluation frameworks grounded in real agent tasks — code correctness, execution success, multi-step tool use — to measure progress and guide iteration.
  • Collaborate with mid-training, infrastructure, and product teams to translate research insights into durable model improvements.

Requirements

  • Strong background in machine learning, with hands-on experience training or fine-tuning large models — LLMs, multimodal, or equivalent systems.
  • Deep understanding of reinforcement learning: policy optimization, reward design, exploration, and the interplay between environment design and agent behavior.
  • Experience building or working within simulation or execution environments (e.g., code interpreters, sandboxed execution, game environments, robotics simulators).
  • Proven ability to design and execute rigorous experiments, with strong intuition for diagnosing training failures and scaling bottlenecks.
  • Proficiency in Python and PyTorch; comfort working across research and systems code.
  • Ability to work in a fast-moving, research-forward environment where the right approach is often unknown at the outset.

We expect strong candidates to come from a range of backgrounds (RL research, robotics, competitive programming systems, compilers, formal methods, or large-scale ML) rather than from post-training specifically. The field is new enough that directly relevant experience is rare; what matters is depth, rigor, and transferability.

Bonus Qualifications

  • Experience with RL algorithms applied to language or code: RLHF, DPO, GRPO, PPO, or similar paradigms in the LLM setting.
  • Familiarity with coding agent benchmarks and evaluation environments (e.g., SWE-bench, HumanEval, LiveCodeBench, competitive programming judges).
  • Background in reward modeling — outcome-based, process-based, or learned reward signals.
  • Experience with trajectory-based training, imitation learning, or data distillation from stronger models or human demonstrations.
  • Prior work on computer use, GUI agents, or tool-using LLMs (e.g., OSWorld, WebArena-style tasks).
  • Experience training or scaling models at 10B+ parameters, with attention to efficiency, stability, and GPU utilization.
  • Contributions to open-source ML projects or publications at top venues (NeurIPS, ICML, ICLR, EMNLP, COLM, etc.).

Compensation

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components and benefits depending on the specific role. This information will be shared if an employment offer is extended.
