Responsibilities:
- Ensure 99.99% uptime of our cloud platform by maintaining highly reliable and resilient infrastructure.
- Design and implement self-healing, fault-tolerant systems to proactively prevent failures.
- Define and maintain SLIs, SLOs, and SLAs to ensure proactive performance monitoring and rapid incident resolution.
- Architect and optimize scalable cloud infrastructure (AWS) for real-time, high-throughput data processing.
- Improve and manage containerized environments (Kubernetes, Docker) to support multi-region deployments.
- Implement and enhance infrastructure as code (Terraform) for fully automated infrastructure management.
- Develop and refine a robust monitoring, alerting, and logging system using Prometheus, Grafana, OpenTelemetry, and Datadog.
- Participate in incident response and on-call rotations, driving down mean time to detection (MTTD) and mean time to resolution (MTTR).
- Conduct blameless postmortems and implement lessons learned to improve system resilience.
- Automate deployment, scaling, and failover mechanisms to reduce manual intervention.
- Contribute to disaster recovery and business continuity planning to maintain availability of critical healthcare services.
- Work closely with Product, Engineering, and Infrastructure teams to align SRE initiatives with business goals.
Our requirements:
- 5+ years of experience in Site Reliability Engineering or Cloud Infrastructure.
- Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare.
- Deep expertise in cloud platforms (AWS), Kubernetes, and distributed systems.
- Strong background in monitoring, logging, and observability with Prometheus, OpenTelemetry, or similar tools.
- Deep knowledge of CI/CD automation, GitOps, and infrastructure as code (Terraform, etc.).
- Strong understanding of network security, access management, and compliance frameworks (HIPAA, SOC 2).
- Experience with healthcare IT, including EHR data, FHIR, and HL7 interoperability.
- Expertise in real-time distributed systems, event-driven architectures, or large-scale data pipelines.
Why You'll Love It Here:
- Own Mission-Critical Reliability – Keep hospitals and care facilities online with a healthcare platform built for 99.99% uptime.
- Scale AI-Powered Infrastructure – Work on real-time automation and self-healing cloud systems that orchestrate care delivery.
- Drive Big Impact in Healthcare – Help reduce waste, optimize resources, and improve patient care with technology that delivers 10x ROI.
- Automation-First Culture – Minimize manual ops with cutting-edge automation, observability, and incident response strategies.
- Join a High-Performing Team – Work with top engineers, AI experts, and healthcare innovators solving real-world challenges.

Other Jobs from Kontakt.io
Data Platform Engineer
Agentic AI/ML Applications Platform Engineer
Senior Site Reliability Engineer