About the company:
Gloat puts people and companies in motion. Our Agile Workforce Operating System is helping the world's most renowned enterprises become dynamic organizations, future-fit for any eventuality, and poised for continuous growth and innovation in today's ever-changing economic climate.
We deliver AI-powered intelligence, infrastructure, and applications that enable organizations to effectively tackle change with agility, unlock capacity and productivity, and reduce workforce risk. Today we support industry leaders around the world including HSBC, Spotify, Nestle, Standard Chartered Bank, Schneider Electric, and many more.
Life at Gloat:
Gloat is a revolutionary startup with a global workforce. We have offices in Tel Aviv, New York City and London and work with customers around the globe. We value collaboration, innovative thinking, and curiosity and we’re looking for bright, driven, and passionate people to grow with us. If you care about empowering businesses and people to reach their potential, you’re in for a fun ride.
Who we’re looking for:
We’re looking for a Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our production infrastructure. This role goes beyond keeping systems running—you’ll be a key player in shaping the culture of reliability, driving self-healing mechanisms, proactive alerting strategies, and automation to reduce toil and improve operational efficiency. You'll work closely with engineering teams to ensure high availability, observability, and smooth incident management processes.
What you’ll do:
- Ensure reliability & scalability of our production environment across multiple cloud providers.
- Define and implement SRE best practices—fostering a culture of ownership, continuous improvement, and automation.
- Automate everything—from infrastructure deployment to self-healing mechanisms that eliminate manual intervention.
- Design and improve observability solutions (monitoring, logging, tracing) to enable faster detection and resolution of issues.
- Optimize alerting strategies to ensure actionable, high-quality alerts while minimizing noise and fatigue.
- Improve system resilience, driving chaos engineering, failover strategies, and automatic recovery processes.
- Enhance incident response processes, including on-call strategies, root cause analysis, and post-mortems to drive long-term stability.
- Collaborate with development teams to build reliable, scalable, and efficient architectures, ensuring seamless deployment and rollback processes.
- Promote a culture of reliability, educating teams on best practices, service ownership, and production-readiness.
What you’ll bring:
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Strong expertise in Kubernetes and container orchestration in production.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Proven experience with monitoring & observability tools (Prometheus, ELK, Grafana, Coralogix, etc.).
- Strong scripting/programming skills (Python, Go, Bash, or similar).
- Experience with Infrastructure as Code (IaC)—Terraform, Helm, or similar tools.
- Track record of improving system reliability, scalability, and performance.
- Experience designing and implementing self-healing mechanisms to minimize human intervention.
- Ability to foster a strong reliability culture across engineering teams, leading by example.
- Excellent problem-solving skills, with a proactive and ownership-driven mindset.
At Gloat, we believe that building the most important company in the history of human capital begins with having a diverse and inclusive workforce ourselves. This means that we look for individuals who can bring unique strengths, perspectives, skills, and backgrounds to our existing teams. Gloat is proud to be an Equal Opportunity Employer and does not and will not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.