Site Reliability Operations Manager
Location: Athens, Greece
Department: Site Reliability
We are Kaizen Gaming
Kaizen Gaming, the team powering Betano, is one of the biggest GameTech companies in the world, operating in 20 markets. We always aim to leverage cutting-edge technology, providing the best experience to our millions of customers who trust us for their entertainment.
We are a diverse team of more than 2,700 Kaizeners of 40+ nationalities, spread across 3 continents.
Our #oneteam is proud to be among the Best Workplaces in Europe and certified Great Place to Work across our offices. Here, there’ll be no average day for you. Ready to Press Play on Potential?
Let’s start with the role
As a Site Reliability Operations Manager, you will lead the operational reliability layer of our production environment, ensuring 24/7 service stability across networks, applications, and infrastructure.
You will own the performance and evolution of our Site Reliability Operations function: managing shift-based teams, strengthening incident response practices, and driving measurable improvements in uptime, response time, and operational maturity. You will also directly handle and oversee the end-to-end incident flow.
You will be responsible for ensuring that incidents are properly triaged, escalated, coordinated, and resolved, while continuously improving our incident management processes.
This role sits at the intersection of Infrastructure, Platform, Security, and Product, ensuring that reliability is not reactive, but engineered and continuously improved.
Reliability at scale in a high-traffic, real-time gaming environment demands precision, discipline, and strong leadership. This role is critical to that mission.
- Lead and develop the Site Reliability Operations team, ensuring high performance across 24/7 shift coverage.
- Own incident management processes, including severity classification, escalation paths, communication standards, and post-incident reviews.
- Ensure proactive monitoring of production systems with meaningful alerting that minimizes noise and maximizes actionability.
- Track and improve key operational metrics such as MTTA, MTTR, uptime, and SLA adherence.
- Establish and refine standard operating procedures for monitoring, escalation, and vendor coordination.
- Drive structured communication during incidents, ensuring clear updates to technical and business stakeholders.
- Collaborate closely with SRE, Infrastructure, Security, and Engineering teams to eliminate recurring incidents through root cause analysis and systemic improvements.
- Oversee relationships with external vendors and providers during both routine operations and major outages.
- Promote a culture of operational excellence, accountability, and continuous improvement.
- Participate in capacity planning and operational readiness reviews for new launches and major changes.
What you will bring
- Proven experience leading technical operations or NOC/SRE Operations teams in high-availability environments.
- Strong understanding of production monitoring, alerting systems, and incident management frameworks.
- Solid knowledge of networking fundamentals (TCP/IP), infrastructure components, and cloud or hybrid environments.
- Experience working in 24/7 operational models with shift-based teams.
- Hands-on familiarity with ticketing systems and operational reporting.
- Ability to analyze operational data and translate it into improvement initiatives.
- Strong stakeholder communication skills, especially under pressure.
- Structured thinker with close attention to detail and strong execution discipline.
- Experience in gaming, fintech, e-commerce, or other real-time, high-scale digital environments is considered a strong plus.
Recruitment Privacy Notice
For details on how we handle the data you share with us, please read our recruitment privacy notice.
We are an equal opportunity employer committed to fostering a diverse and inclusive workplace. We welcome applications from individuals of all backgrounds, regardless of race, gender, religion, sexual orientation, or age.
