Senior Site Reliability Engineer
Team: Platform Experience
Location: United Kingdom, Spain
Commitment: Full-Time
Workplace Type: Remote
What you'll do:
- Lead the design of scalable, fault-tolerant and self-healing systems in a multi-region AWS environment.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to drive architectural decisions and error budget policies.
- Conduct blameless post-incident reviews to uncover systemic root causes and implement long-term preventive measures.
- Identify patterns of manual work and lead the development of internal tools/automation to permanently eliminate them.
- Develop and maintain automated runbooks and playbooks for common operational tasks and complex incident response.
- Shift from simple monitoring to deep observability, ensuring high-cardinality data leads to proactive, actionable insights.
- Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
- Work with software engineers to design systems for reliability, scalability, and maintainability from the early stages of the SDLC.
- Continuously evaluate and optimize system performance, capacity, and cost efficiency.
- Go beyond participating in on-call: refine the on-call experience to reduce alert fatigue, improve MTTR, and ensure sustainable rotation health.
Must Haves:
- Bachelor’s degree in Computer Engineering or a similar discipline.
- 5+ years of experience as a Site Reliability Engineer or in a similar role.
- 3+ years of experience with AWS services, including strong knowledge of container orchestration.
- 2+ years of Kubernetes experience.
- Deep understanding of observability principles and tools such as Prometheus, Datadog, and OpenTelemetry.
- Experience leading incident management and complex postmortem analysis.
- Experience and interest in managing infrastructure as code (Terraform).
- Experience with chaos engineering and other techniques for testing system resilience.
- Experience with CI/CD tools such as GitHub Actions for automated delivery.
- Proficiency in at least one programming language (Python, Go, Java, etc.) for building automation and internal tooling.
- Experience with event-driven architecture (SNS, SQS, etc.).
- Ability to work independently and collaboratively in a fast-paced environment.
- Team player and open to new ideas.
- Good communication skills and fluency in English.
Good to have:
- Prior experience with Scrum and other agile methods.
- Certification in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or similar.
- Prior experience with Telco Core Networks (e.g., 5G/LTE Packet Core, IMS, Signaling) and low-latency networking.
- Experience with AI-driven SRE tools for anomaly detection and improvements.
- Contributions to open-source SRE projects or communities.
- Prior work experience in telecommunications.
- Deep understanding of eSIM and GSMA related technologies and services.
