What will you be doing?
- Design, develop and support our Observability platform (Prometheus, Loki and Grafana Cloud);
- Work across engineering teams to define Service Level Objectives;
- Instrument Go microservices using OpenTelemetry;
- Write and maintain Go modules providing fundamental capabilities to our applications (e.g. tracing and logging);
- Drive a culture around Incident Management and how we can learn from incidents and improve;
- Engage with teams across Engineering on reliability and performance issues;
- Educate and drive the SRE mandate across the organisation.
The successful applicant will:
- Work closely with our wider engineering team to understand what reliability metrics will enable them to prioritise production stability, and where additional instrumentation will assist in diagnosing complex issues. You’ll be a go-to expert on using our observability platform to uncover the root cause of problems.
- Manage and scale an observability platform ingesting millions of metrics series and terabytes of trace and log data, and provide an opinionated stance on reliability through curated dashboards and SLOs.
- Evolve our existing Incident Management process and tooling, enabling all of our engineering teams to mitigate, learn and drive improvements when things go wrong.
- Partner closely with the Engineering Leadership Team to define and share key reliability findings based on production telemetry and incident reviews.
- Develop capabilities and tooling that enable our engineering community to clearly understand how their production services are running, and how they can diagnose where performance and reliability issues come from.
- Collaborate with engineering teams across SafetyCulture to help them instrument their services, understand their observability telemetry, and diagnose complex problems within our microservice architectures. You’ll have opportunities to directly contribute to reliability improvements, and to grow your passion within the SRE space.
You will have experience in:
- Operating observability platforms at scale.
- Providing strong technical leadership on SRE concepts.
- Applying best practices across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
- Designing and building complex software and systems at scale.
Your professional background will include:
- Tertiary degree in Computer Science or related technical field, or equivalent practical experience.
- 8+ years of relevant experience in software development, including mentorship experience.
- A solid understanding of monitoring, logging, tracing, and observability instrumentation.
- Experience working with observability platforms like Grafana / Datadog / New Relic / Honeycomb.
- A solid background in SRE concepts like SLOs.
- Experience in defining and driving a culture of Incident Management.
- Proven experience of working on complex and large-scale projects that require high-level technical skills, creativity, and leadership.
- Proficiency with one or more general-purpose programming languages, including but not limited to: C#, Golang, C++, Python, Java, TypeScript, Scala.
What Do I Get Access To When Working at SafetyCulture?
- Equity with high growth potential, and a competitive salary.
- Hybrid working; we encourage you to create the best work blend while working from your home and the local SafetyCulture office.
- Access to professional and personal training and development opportunities.
- Participation in hackathons, workshops, and lunch & learn sessions.
- Community involvement, open source work, attending talks and events, and experimenting with new technologies.
What are the office benefits?
- In-house Culinary Crew serving up daily breakfast, lunch, and snacks.
- Barista coffee machine, craft beer on tap, boutique wines, and a range of non-alcoholic beverages.
- Quarterly celebrations and team events.
- Table tennis, board games, book library, and pet-friendly office.