Akuity

Senior Site Reliability Engineer

Remote
Kubernetes AWS EC2 EKS VPC NLB Route53 S3 RDS IAM Prometheus Grafana OpenTelemetry Datadog Go Python Bash Argo CD Terraform

Location: Remote (US time zones)

Department: Engineering

About Akuity

With the move to the cloud, Kubernetes has become widely adopted by DevOps and Platform Engineering teams, but it has also added complexity. While scaling Kubernetes at Intuit, the Akuity founders started building Argo CD to streamline Kubernetes adoption. Argo CD helps developers own, understand, and deploy their Kubernetes applications via GitOps.

Today, Argo CD is the third most popular project in the Cloud Native Computing Foundation (CNCF) and is used by 70% of companies that run Kubernetes in production. Argo CD users include companies such as Intuit, BlackRock, Tesla, Major League Baseball, Peloton, and many more.

The team founded Akuity in 2021 to enable enterprises to ship software faster and more reliably with modern GitOps best practices. The Akuity Platform lets teams manage development and deployment across hundreds, if not thousands, of Kubernetes clusters from a single control plane. Trusted by top companies around the globe, the Akuity Platform is the only end-to-end GitOps platform for the enterprise.

Our mission is to simplify the software delivery process so that DevOps and Platform Engineering teams can move fast and deploy code effortlessly, without the fear of breaking things.

The Role

We are looking for a Senior SRE to help us keep the Akuity Platform running at the level our enterprise customers expect. This is a high-ownership role: you won't just respond to incidents; you'll shape how we define and defend reliability across the entire platform. You'll work closely with engineering, infrastructure, and product to build the systems and culture that let us scale with confidence.

What You'll Own

Platform Reliability & SLAs

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
  • Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
  • Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
  • Partner with engineering teams to build reliability into new features before they ship to production

On-Call & Incident Response

  • Participate in an on-call rotation and act as incident commander for high-severity production events
  • Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
  • Drive improvements to alerting fidelity: reduce noise, increase signal, and eliminate toil
  • Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

What We're Looking For

Required

  • 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
  • Deep, hands-on Kubernetes expertise: you understand the scheduler, networking, storage, and autoscaling well enough to debug anything
  • Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route53), storage (S3, RDS), and IAM
  • Experience defining and operating against SLOs in production: you've written error budgets, not just read about them
  • Proficiency with observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent)
  • Solid scripting and automation skills in Go, Python, Bash, or similar: you automate what you touch
  • Strong written communication: clear runbooks, sharp incident reports, thoughtful post-mortems
  • Live within US time zones (Pacific through Eastern); this includes Canada and other regions in those time zones

Strong Advantage

  • Experience with Argo CD, Kargo, or GitOps-based delivery workflows
  • Familiarity with multi-region, multi-cluster Kubernetes deployments
  • Experience with compliance-adjacent infrastructure (SOC 2, ISO 27001, HIPAA, or PCI DSS)
  • Background operating infrastructure for other platform or developer tooling companies

Our Stack

  • Kubernetes (EKS): multi-region, enterprise-grade clusters serving Argo CD and Kargo workloads
  • AWS: primary cloud provider across all production and DR environments
  • Argo CD & Kargo: GitOps delivery tools we build and run ourselves
  • Prometheus, Grafana, and OpenTelemetry for observability
  • Terraform and GitOps-driven infrastructure management

What We Offer

  • Competitive compensation, commensurate with experience
  • Equity participation in a well-funded, growing company
  • Fully remote: work from anywhere within US time zones (Pacific through Eastern), including Canada and other regions in those time zones
  • Home office stipend and equipment budget
  • Flexible time off and a culture that respects it
  • Work directly with the engineers who built Argo CD and Kargo; you'll learn a lot here

US-based employees receive full benefits, including comprehensive health, dental, and vision coverage. Candidates based outside the US will be engaged as contractors.
