Site Reliability Engineer II
Team: Research & Development
Location: Santiago de los Caballeros
Commitment: Full Time
Workplace Type: onsite
You Will
- Design and implement comprehensive monitoring strategies rather than owning observability platforms outright.
- Collaborate with DevOps and Engineering on shared observability platforms (Grafana, Prometheus/Loki, Azure Monitor/Application Insights).
- Define golden-signal dashboards, track SLOs, SLIs, and error budgets, and help implement actionable alerting.
- Drive standards for structured logging, distributed tracing patterns, and OpenTelemetry implementation, for teams to deploy and SRE to validate.
- Audit production systems to ensure instrumentation is complete.
- Own production incident response: lead incident handling and drive remediation.
- Conduct blameless post-incident reviews and ensure follow-through on action items.
- Continuously improve operational processes, reliability practices, and team readiness.
- Monitor system resource utilization and forecast future capacity needs.
- Tune autoscaling configurations in partnership with Engineering teams.
- Evaluate capacity efficiency and support cost optimization strategies.
- Validate DR environments and test failover processes rather than building them.
- Ensure DR capabilities function as designed and are clearly documented.
- Define and lead regular DR drills in partnership with Engineering/Platform teams.
- Work with the Non-Functional Testing team on resilience and DR scenario simulations.
- Support chaos experiment planning and validation as a nice-to-have capability.
You Have
- 5+ years in Site Reliability Engineering, Production Engineering, or related operations roles.
- Strong knowledge of cloud-native systems, preferably Microsoft Azure.
- Experience with observability tooling (Grafana ecosystem, Prometheus/Loki, Azure Monitor, Application Insights).
- Understanding of DR concepts, failover validation, and operational readiness.
- Familiarity with chaos engineering practices (nice-to-have).
- Ability to read Terraform/HCL is a plus but not required.
- Strong grasp of SRE principles (SLOs/SLIs, error budgets, toil reduction, postmortems).
- Strong collaboration and communication skills.
Mindset We Value
- Treat observability as a foundational product feature — not an afterthought.
- Proactively break systems to strengthen them.
- Automate away repetitive pain and convert incidents into lasting defenses.
- Clearly articulate complex risks, trade-offs, and recovery approaches to both technical and non-technical stakeholders.
- Remain composed during incidents while relentlessly focused on prevention.