Description

Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events with features such as APM, RUM, SIEM, Kubernetes monitoring, and more, all enhancing operational efficiency and reducing observability spend by up to 70%.

We are seeking a Site Reliability Engineering (SRE) Group Leader to join our fast-paced and dynamic environment. As the SRE Group Leader, you will be at the forefront of ensuring the availability, stability, and performance of Coralogix's production platform. You will lead three specialized teams focusing on production availability and stability, observability, and production insights, while maintaining 99.9% uptime and ensuring immediate response to production issues. This role requires deep expertise in cloud technologies, Kubernetes, and the observability ecosystem. You'll work collaboratively across teams, setting objectives, defining metrics, and driving measurable improvements in platform reliability.

Key Responsibilities

Production Reliability: Ensure the platform achieves and maintains 99.9% uptime by implementing robust SRE practices.
Incident Response: Oversee immediate response to any production issues, ensuring timely resolution and minimizing downtime.
Strategic Leadership: Lead and mentor three teams specializing in production availability, observability, and production insights, fostering a culture of accountability and collaboration.
Cloud and Kubernetes Expertise: Drive optimization and reliability improvements using cloud technologies, Kubernetes, and Kubernetes operators.
Observability Leadership: Develop and enhance observability solutions, ensuring comprehensive monitoring, alerting, and actionable insights across production systems.
Data-Driven Decision-Making: Leverage production insights and metrics to drive system optimization and improvements.
Cross-Team Collaboration: Partner with engineering, product, and support teams to align on priorities, objectives, and deliverables for production excellence.

Requirements

Production Focus: Extensive experience managing large-scale production systems with a focus on maintaining high availability (≥99.9%).
Incident Management Expertise: Proven ability to manage incident response processes and ensure rapid resolution of production issues.
Observability Knowledge: Strong understanding of observability tools like Prometheus, Grafana, OpenTelemetry, and the broader observability ecosystem.
Leadership Skills: Proven ability to manage and scale engineering teams, with experience leading multiple teams or groups.
OKR Experience: Ability to define objectives, measure performance, and drive results through OKR frameworks.
Problem-Solving Skills: Demonstrated expertise in troubleshooting and optimizing distributed systems and cloud environments.
Collaboration Skills: Strong ability to work across teams and departments, aligning technical efforts with organizational goals.

Preferred Qualifications:

Experience in companies within the observability domain (e.g., Datadog, New Relic, Sumo Logic).
Familiarity with incident management tools (PagerDuty, OpsGenie, etc.) and chaos engineering practices.
Background in designing and implementing SLOs for production systems.
Experience optimizing systems for high-throughput and low-latency workloads.

Cultural Fit

We’re seeking candidates who are hungry, humble, and smart. Coralogix fosters a culture of innovation and continuous learning, where team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we’d love to hear from you.

Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply.
