Platform Engineer - Product Reliability (Mid/Senior Level)
Team: Platform Engineering
Location: Melbourne
Commitment: Full-time
Workplace Type: Hybrid
What you'll do:
- Teach and support product teams on best practices for reliability, implementation patterns and effective usage of our existing platforms
- Support product teams in improving the performance and availability of their systems
- Be hands-on in code and infrastructure to help product teams with reliability improvements
- Provide comprehensive feedback to the wider Platform group on improvements to be made to core infrastructure based on observations and first-hand experience in the code base
- Support product teams in building proofs of concept as needed, evolving the application deployment architecture to keep pace with business growth and to improve scalability and system resilience
- Collaborate with product teams to support the release of new features and services, ensuring adherence to reliability and performance standards
- Guide product teams in designing systems for resilience and graceful failure under heavy load
- Assist application teams with post-incident tasks and follow-ups, and contribute to the creation and review of post-mortem documentation
- Analyse incident metrics to identify trends and potential improvements, communicating these insights to the product teams
- Help solve interesting and difficult problems. There’s a great opportunity for disruption in the global energy market
What you'll have:
- Great communication skills, working effectively with developers, product managers and other business stakeholders to understand, design and deliver impactful projects and reliability improvements
- AWS (supporting and improving cloud infrastructure used by product teams)
- Terraform (infrastructure as code; comfortable operating with Terraform day-to-day)
- Kubernetes (container orchestration and deployment management; comfortable working with Kubernetes day-to-day)
- Experience using industry-standard observability tooling - we use Datadog, Grafana, Prometheus and Rootly (experience with other monitoring/alerting platforms is transferable)
- Strong collaboration and communication skills - able to work effectively with developers, product managers, and other stakeholders to design and deliver impactful observability “golden paths” and monitoring experiences
- Exposure to Python (or a similar language such as TypeScript, Go or C#) - able to understand how applications behave in production to support observability and reliability improvements
- Previous experience working in small, highly autonomous teams
- Comfortable with ambiguity and able to create structure in unclear situations
- Proactive learning mindset (experiment, iterate, and adapt as the team evolves approaches)
- Strong asynchronous written communication (Slack/Notion/docs) and a habit of keeping others in the loop
- Autonomy and accountability - making progress independently and owning outcomes
What will help:
- Previous experience as a Site Reliability Engineer
- Experience working on SaaS platforms, including engaging product teams to ensure up-skilling and knowledge sharing across teams
- Experience managing and supporting a large-scale, internet-facing service
- Experience in responding to incidents and outages, writing technical incident reports and organising incident retrospectives
- Experience working with very large relational databases
- Experience in using service level objectives to improve application performance
- A proactive, innovative mindset
