Site Reliability Engineer (SRE)
Team: Architecture
Location: Belgrade
Commitment: Full-Time
Workplace Type: onsite
Key responsibilities
- Platform reliability & operations:
  - Ensure the availability, resilience, and performance of the platform and supporting services.
  - Own and improve incident management, including troubleshooting, escalation handling, and follow-ups aligned to SLAs.
  - Participate in an on-call rotation, supporting production systems and driving reliability improvements from real incidents.
- Infrastructure engineering (Linux / Cloud / Kubernetes):
  - Design, deploy, configure, and manage Linux-based system architecture across environments.
  - Build and support platform implementations using AWS and other cloud technologies (compute-centric services and related infrastructure).
  - Design and implement large and complex technology projects, from initial design through production rollout and operational handover.
  - Support and maintain Kubernetes-based workloads and platform components.
- Automation & Infrastructure as Code:
  - Build tooling and solutions to automate recurring operational tasks.
  - Use Infrastructure as Code (IaC) to standardize and scale: Terraform for provisioning, Ansible for configuration management and automation.
  - Improve reliability by reducing manual steps and enabling repeatable deployments.
- CI/CD & developer enablement:
  - Manage and maintain CI/CD pipelines across 20+ repositories spanning multiple technology stacks.
  - Partner with Engineering teams to improve build/release consistency, pipeline reliability, and deployment safety.
- Observability & operational readiness:
  - Implement and enhance monitoring, logging, and alerting, using tools such as Prometheus, Grafana, Zabbix, Splunk, and PagerDuty (or equivalent incident alerting/response tooling).
  - Use metrics and incident learnings to reduce noise, improve signal, and shorten time-to-detect/time-to-recover.
- Documentation & standards:
  - Produce clear, formal documentation, including configuration standards, troubleshooting runbooks, and infrastructure and architecture design documentation.
  - Contribute to internal standards that improve consistency, security, and operational maturity.
Required skills & experience
- 5+ years of hands-on experience in Linux systems administration / engineering in production environments.
- Strong working knowledge of the following (or equivalents): Linux, Kubernetes, GitLab, Terraform, Ansible.
- Experience working in Agile (Scrum) teams.
- Experience with AWS (compute-focused services) and/or Google Cloud Platform.
- Proven experience with distributed systems design, maintenance, and troubleshooting.
- Strong scripting/coding ability in at least one of: Python, Golang, Bash.
- Experience with observability and incident response tooling such as: Zabbix, Splunk, Prometheus, Grafana, PagerDuty.
- Strong communication skills in English, with the ability to work effectively with customers, vendors, partners, and internal teams across levels.
- Working knowledge of datastores and messaging systems such as PostgreSQL, MongoDB, and RabbitMQ, as well as web/application infrastructure components such as Apache and Nginx.
- Demonstrated ability to learn quickly, work independently, make good decisions, and collaborate as a team player in fast-changing environments.
- Strong AI-driven mindset and curiosity about emerging AI technologies.
- Hands-on experience using AI tools (e.g., LLMs, automation frameworks, AI-assisted development tools) to enhance productivity or system performance.
Nice to have
- Experience operating highly available, high-volume web services.
- Strong initiative and a self-starter attitude, with the ability to work under minimal supervision.
- Demonstrated success reducing operational toil through automation and better tooling.
- Experience improving SLOs/SLIs, error budgets, or formal reliability practices (if applicable to your background).
