Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
athenahealth is a progressive, innovation-driven software product company dedicated to transforming healthcare through cutting-edge cloud solutions. We partner with healthcare organizations to improve clinical and financial outcomes by building modern technology on an open, connected ecosystem that drives meaningful insights for our customers and their patients. We take pride in our values-driven culture, offering a flexible work-life balance and fostering an environment of innovation. As a testament to our industry leadership and rapid growth, we were acquired by Bain Capital for $17B in 2021, and we continue to launch new strategic product initiatives to push the boundaries of healthcare technology.
We are headquartered in Boston, US, and our India offices are in Chennai, Bangalore and Pune.
Position Summary: We are looking for a Site Reliability Engineering (SRE) Manager to lead our Cloud Infrastructure Engineering team in Chennai R&D. This team ensures the continuous availability of the technologies and systems that power athenahealth’s services. We manage thousands of servers and petabytes of storage, and we process thousands of web requests per second, all while supporting rapid growth. Our goal is to create a seamless operating system for the medical office, abstracting away administrative complexity so doctors can focus on patient care.
About the Team: We are a team of passionate Site Reliability Engineers focused on automation, reliability, and scalability. We operate within an agile framework, prioritizing impactful projects that support business needs.
We manage a hybrid cloud platform, making data-driven decisions on the best infrastructure solutions. Automation is at the heart of everything we do—eliminating repetitive tasks so we can focus on projects that drive real innovation.
Key Responsibilities:
Team Leadership & Development
- Lead, mentor, and develop a team of SREs, fostering a culture of collaboration, accountability, and continuous learning.
- Build a high-performing team focused on operational excellence, reliability, and scalability.
- Partner with Engineering, Product, and Project Management teams to align priorities and drive cross-functional collaboration.
Service Reliability & Performance
- Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical systems.
- Monitor and enhance the reliability, availability, and performance of all production services and infrastructure.
- Drive improvements in incident management, root cause analysis, and postmortem processes.
- Implement proactive monitoring, alerting, and incident response strategies.
System Automation & Scalability
- Lead automation efforts to eliminate manual tasks, improve system reliability, and streamline operations.
- Implement best practices for system design, capacity planning, and cost optimization.
- Work closely with engineering teams to build scalable, resilient, and efficient systems.
Collaboration & Cross-functional Engagement
- Advocate for reliability best practices across engineering and product teams.
- Ensure reliability is embedded in the development lifecycle by reviewing code, design, and deployment strategies.
- Align with other engineering managers on long-term goals, technical debt, and infrastructure investments.
Process & Efficiency Improvement
- Continuously improve incident management, deployment pipelines, and system observability.
- Champion automation, monitoring, alerting, and reporting tools.
- Use data-driven insights to measure and optimize operational performance.
Preferred Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in building, scaling, and supporting highly available systems and services.
- 2-3 years of experience managing and mentoring technical teams, along with expertise in containerization (Docker and Kubernetes, both on-prem and cloud).
- Strong background in Platform Engineering, TechOps, FinOps, and DevSecOps in a hybrid cloud environment.
- Expertise in Infrastructure-as-Code (Terraform, Crossplane, Puppet, Ansible) and API integration.
- Proficiency in at least one scripting or programming language (Python, Go, Ruby, etc.).
- Hands-on experience with Linux systems, VMware, cloud platforms (AWS), and observability tools (Prometheus, Grafana, ELK, CloudWatch, Splunk).
- Strong understanding of site reliability principles, telemetry, and monitoring best practices.
- Experience with large-scale distributed systems and cloud-native architectures.
- Familiarity with configuration management tools (Ansible, Chef, Puppet).
- Solid grasp of security best practices and compliance standards.
About athenahealth
Here’s our vision: To create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
What’s unique about our locations?
From a historic 19th-century arsenal to a converted landmark power plant, all of athenahealth’s offices were carefully chosen to represent our innovative spirit and promote the most positive and productive work environment for our teams. Our 10 offices across the United States and India, along with numerous remote employees, all work together to modernize the healthcare experience.
Our company culture might be our best feature.
We don't take ourselves too seriously. But our work? That’s another story. athenahealth develops and implements products and services that support US healthcare: It’s our chance to create healthier futures for ourselves, for our family and friends, for everyone.
Our vibrant and talented employees — or athenistas, as we call ourselves — spark the innovation and passion needed to accomplish our goal. We continue to expand our workforce with amazing people who bring diverse backgrounds, experiences, and perspectives at every level, and foster an environment where every athenista feels comfortable bringing their best selves to work.
Our size makes a difference, too: We are small enough that your individual contributions will stand out — but large enough to grow your career with our resources and established business stability.
Giving back is integral to our culture. Our athenaGives platform strives to support food security, expand access to high-quality healthcare for all, and support STEM education to develop the providers and technologists who will sustain that access in the future. As part of the evolution of athenahealth’s Corporate Social Responsibility (CSR) program, we’ve selected nonprofit partners that align with our purpose and let us foster long-term partnerships for charitable giving, employee volunteerism, insight sharing, collaboration, and cross-team engagement.
What can we do for you?
Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces — some offices even welcome dogs.
In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. And we provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.
We also support a better work-life balance for athenistas through flexible working arrangements. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done in an office full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.