About NewsBreak
NewsBreak is redefining the way users interact with local news and their communities. By bridging local users, local content creators, and local businesses, our mission is to foster safer, more vibrant, and authentically connected lives. Through robust collaborations with thousands of local publishers and businesses across the nation, NewsBreak is revolutionizing how a new wave of readers access and engage with essential, locally sourced content & information.
Since our inception in 2015, our trajectory has been nothing short of remarkable. We proudly stand as the nation’s premier local news app.
As a Series-C unicorn startup, we are headquartered in the tech hub of Mountain View, California, with additional offices in New York City and Seattle. For more information, visit www.newsbreak.com/about
About the role
As a Software Engineer in Reliability & Availability, you will be responsible for ensuring the stability, scalability, and resiliency of our cloud infrastructure and services. Working at the core of SRE, system performance, and availability management, you will design robust solutions to minimize downtime, optimize performance, and enhance system reliability. Your focus will be on AWS cloud infrastructure, Kubernetes (EKS), and big data processing (EMR), implementing high availability, fault tolerance, and self-healing mechanisms for distributed systems. Through automation, proactive monitoring, and incident response, you will help maintain seamless operations across our cloud-native platforms.
Responsibilities
- Ensure service reliability and availability by designing and implementing fault-tolerant architectures leveraging AWS, EKS (Elastic Kubernetes Service), and EMR (Elastic MapReduce).
- Build, automate, and optimize infrastructure for high-performance, scalable, and resilient cloud services.
- Develop monitoring, observability, and alerting solutions to proactively detect and mitigate service degradation and performance bottlenecks.
- Improve service lifecycle management, from capacity planning and launch reviews to post-incident analysis and continuous optimization.
- Enhance auto-scaling mechanisms to dynamically adjust resources and maintain system stability under varying workloads.
- Drive automation using Infrastructure-as-Code (IaC) and CI/CD pipelines to minimize manual intervention and improve service resilience.
- Engage in on-call rotations, manage incidents, conduct blameless postmortems, and drive long-term reliability improvements.
Requirements
- BS or MS in Computer Science, Engineering, or a related field, with at least 2 years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Strong programming experience in at least one of the following: C, C++, Java, Python, or Go.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure), with a strong emphasis on AWS services (EKS, EMR, EC2, RDS, S3).
- Deep understanding of Kubernetes (EKS) and containerized workloads, including scaling, monitoring, and failure recovery strategies.
- Strong experience with monitoring tools (Prometheus, Grafana), log management (ELK, CloudWatch, Splunk), and distributed tracing and profiling solutions.
- Extensive experience supporting production Internet services, troubleshooting performance issues, and implementing high-availability strategies.
- Strong problem-solving and debugging skills, with a systematic approach to incident response, root cause analysis, and continuous improvement.