Platform Engineer - Observability (Mid Level)
Team: Platform Engineering
Location: Melbourne, Remote, AU
Commitment: Full-time
Workplace Type: Hybrid
What you'll do:
- Support and implement monitoring and alerting strategy across Kraken’s customer business
- Define and uphold observability best practices across multiple products and platforms
- Partner with product teams to implement observability tooling and improve reliability across the organisation
- Help product teams build best-in-class dashboards for their requirements or bespoke use cases
- Work with product teams to define and implement meaningful Service Level Objectives (SLOs) and Service Level Indicators (SLIs), aligned to contractual Service Level Agreements (SLAs)
- Build, tune, and continuously improve alerts and monitors using golden signals (latency, traffic, errors, saturation) as a framework - reducing noise and increasing actionable signal
- Help product teams transition to on-call models by improving signals, alert quality, and operational readiness
- Improve tooling and self-service capabilities for alerting and monitoring across multiple product teams
- Analyse incident metrics to identify trends and improvement opportunities, communicating insights clearly back to product teams
- Manage the cost and usage of our observability tooling stack in collaboration with FinOps
- Contribute to broader platform reliability infrastructure improvements where needed
- Help solve interesting and difficult problems - there’s a significant opportunity for disruption in the global energy market
What you'll have:
- AWS (supporting and improving cloud infrastructure used by product teams)
- Terraform (infrastructure as code; comfortable operating with Terraform day-to-day)
- Kubernetes (container orchestration and deployment management; comfortable working with Kubernetes day-to-day)
- Experience using industry-standard observability tooling - we use Datadog, Grafana, Prometheus and Rootly (experience with other monitoring/alerting platforms is transferable)
- Strong collaboration and communication skills - able to work effectively with developers, product managers, and other stakeholders to design and deliver impactful observability “golden paths” and monitoring experiences
- Exposure to Python (or a comparable language such as TypeScript, Go, or C#) - able to understand how applications behave in production to support observability and reliability improvements
- Previous experience working in small, highly autonomous teams
- Comfortable with ambiguity and able to create structure in unclear situations
- Proactive learning mindset (experiment, iterate, and adapt as the team evolves approaches)
- Strong asynchronous written communication (Slack/Notion/docs) and a habit of keeping others in the loop
- Autonomy and accountability - making progress independently and owning outcomes
What will help:
- Previous experience working in a data-focused or observability team
- Experience working on SaaS platforms, including engaging product teams to ensure upskilling and knowledge sharing
- Experience building observability tooling to support large-scale internet-facing services
- Experience instrumenting and diagnosing issues with very large relational databases
- Familiarity with PostgreSQL (or similar RDBMS), particularly Amazon RDS at scale
- Experience using SLOs to drive meaningful performance and reliability improvements
