Site Reliability Engineer, Platform
Department: Engineering
Location: Bay Area
Employment Type: Full-time
About Arena Intelligence
Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier of AI for real-world use.
Millions of people use Arena Intelligence each month to explore how frontier systems perform — and we use our community’s feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance — trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.
We’re a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We’re building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field — our office radiates excellence, energy, and focus.
About the Role
Arena Intelligence is seeking a Site Reliability Engineer to own the reliability, performance, and operational security of the platform that millions of people depend on to evaluate frontier AI. This is the first dedicated SRE hire on the team — you'll build observability, incident response, and infrastructure hardening practices from scratch while also owning the CI/CD and developer tooling that keeps our engineering team moving fast.
Our stack runs on Vercel (Next.js, Hono API on Nitro), Supabase (Postgres, GoTrue auth), Cloudflare (Workers, R2, bot management), and AWS (CloudFront, Lambda). You'll work across the full request path — from edge-layer DDoS mitigation to auth hardening to production monitoring — partnering closely with security and product engineering to keep the platform fast, reliable, and resilient under adversarial traffic conditions.
You’ll
Harden auth infrastructure against volumetric attacks — edge-layer rate limiting in front of Supabase GoTrue, connection pool tuning, token caching, and origin shielding so DDoS traffic is filtered before it reaches the database
Extend CloudFront WAF rules and Cloudflare Worker bot management to cover auth endpoints and close gaps in application-layer rate limiting
Define and implement SLOs/SLIs across the full request path — CDN edge through serverless functions to Supabase
Build monitoring, alerting, and dashboards on top of existing Datadog and PostHog instrumentation that surface degradations before users notice them
Collaborate with security engineering to ensure clean handoff between edge-layer defenses and application-layer anti-abuse systems
Own and improve CI/CD pipelines (GitHub Actions, Turborepo) and expand infrastructure-as-code (Terraform) across cloud environments
Proactively load-test and stress-test infrastructure, map out its capacity limits, and drive cost optimization across our multi-cloud footprint
Enhance developer workflows to make building, testing, and deploying faster and more reliable
Mentor engineers across the company on building reliable, performant, and observable systems
You’ll have
6+ years of experience in SRE, platform engineering, or infrastructure engineering, including operating production systems at scale (millions of users / billions of requests)
Direct experience mitigating DDoS attacks and configuring edge security — WAF rules, CDN architecture, rate limiting, and traffic analysis
Hands-on experience building observability systems (Datadog, Grafana, Prometheus, or similar) and running incident response processes
Strong understanding of auth infrastructure under adversarial load — connection pooling, token caching, and rate limiting on login/signup endpoints
Experience with serverless architectures and managed platforms — you know how to make them reliable and observable at scale
Experience with infrastructure-as-code (Terraform, Pulumi) and CI/CD pipeline design
Track record of collaborating with security and product engineering to deliver both foundational systems and user-facing reliability improvements
Bonus Experience
Experience with Vercel, Supabase (GoTrue, Supavisor), Cloudflare Workers, or CloudFront specifically
Experience with Node.js, TypeScript, Python, or Go in production backend environments
Background in platforms with voting, reputation, or community-driven systems
Experience being the first or early infrastructure hire at a startup
Experience hardening auth systems under load (OAuth, JWT, PKCE flows, connection pooling)
What we offer
We offer competitive compensation and equity aligned to the markets where our team members are based. The base salary range will depend on the candidate’s permanent work location.
Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.
The opportunity to work on cutting-edge AI with a small, mission-driven team
A culture that values transparency, trust, and community impact
Come help build the space where anyone can explore and help shape the future of AI.
Arena Intelligence provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.
