Lead Software Engineer - SRE
Team: Engineering
Location: New York City
Commitment: Full-time
Workplace Type: Hybrid
Responsibilities
- Lead the design and implementation of scalable, fault-tolerant, and self-healing infrastructure and services across AWS and Kubernetes.
- Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs.
- Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform.
- Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling.
- Design and maintain infrastructure as code (Terraform) and drive GitOps best practices.
- Oversee major incident response and on-call practices, including incident reviews and long-term remediation planning.
- Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence.
- Contribute to the long-term reliability roadmap and architecture of high-throughput, real-time systems in healthcare operations.
- Drive process improvements in CI/CD, service ownership, chaos engineering, disaster recovery, and secure deployment.
What You Bring
- 5+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering.
- 5+ years of software engineering experience building production-grade systems (Java, Python, Go, or similar).
- Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare environments.
- Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture.
- Hands-on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.).
- Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure-as-code (Terraform, Helm, ArgoCD).
- A track record of leading major incident response and running postmortems with a blameless, learning-focused approach.
- Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2).
- A leadership mindset: able to drive cross-functional alignment, lead initiatives, and mentor a high-performance SRE team.
Why You'll Love It Here
- Own Mission-Critical Reliability – Ensure hospitals and care facilities always stay online with a 99.99% uptime healthcare platform.
- Scale AI-Powered Infrastructure – Work on real-time automation and self-healing cloud systems that orchestrate care delivery.
- Drive Big Impact in Healthcare – Help reduce waste, optimize resources, and improve patient care with technology that delivers 10X ROI.
- Automation-First Culture – Minimize manual ops with cutting-edge automation, observability, and incident response strategies.
- Join a High-Performing Team – Work with top engineers, AI experts, and healthcare innovators solving real-world challenges.
