Senior Site Reliability Engineer (L3)
Team: Site Reliability Engineering
Location: Malaysia
Commitment: Full-time
Workplace Type: Remote
Job Responsibilities
- System Architecture: Review architecture and software components with software engineers. Ensure best practices are applied consistently across all teams.
- Operational Excellence: Own SLOs and SLAs and ensure they are met. Monitor operational metrics and lead improvement plans. Develop and maintain tools, including infrastructure-as-code resources, to scale operations and enable other teams to work autonomously.
- Security and Compliance: Manage and audit security controls to meet enterprise requirements. Implement and maintain best practices and compliance standards. Collaborate with the legal and compliance teams to assess overall risk.
- Release Planning: Lead strategic release plans (e.g., canary or blue-green deployments) to reduce the blast radius and allow faster rollback when a release fails. Work closely with developers on pre-release requirements, including provisioning test environments. Conduct ad hoc performance tests as required.
- Incident Management: Lead incident response and post-mortems to resolve production issues, identify root causes, and prevent recurrence.
- Disaster Recovery: Develop and implement DR plans and procedures, including data recovery and fault-injection simulations on a production replica.
- Daily Operations: Perform and improve day-to-day tasks such as access onboarding and offboarding, and configuration and patch management. Plan capacity so that our systems can handle peak demand while optimizing cost.
- Documentation: Develop and extend runbooks, documentation and other technical assets. Support periodic technical audits as required.
- Sharpen the Saw: Stay up to date with emerging trends and technologies in software development and contribute to knowledge sharing. Learn advanced architecture standards and new tools that improve the team's code base and productivity. Demonstrate a thorough understanding of your subject matter and how to apply it effectively.
- Team Player: Collaborate with cross-functional teams to ensure smooth deployment and operation of software releases. Answer technical questions from other teams and from outside the organization.
- Coaching: Provide feedback on the performance of junior staff and participate in people development initiatives.
- Support any ad hoc tasks as required by the company.
Job Requirements
- Proven Track Record: 3 to 5 years of experience managing software deployments and instrumentation in production environments with defined SLAs and SLOs. Strong knowledge of software delivery and DevOps principles.
- Cloud Operations: Experience with cloud platforms (e.g., AWS, Cloudflare, GCP) and infrastructure-as-code tools (e.g., Terraform, CloudFormation). Strong programming and scripting skills, preferably in languages such as Python, Go, or Ruby.
- Accreditation: Bachelor's degree in Computer Science, Information Security, or a related field, or professional certifications such as Certified DevOps Professional or Certified Solutions Architect Professional (AWS or GCP).
- Scope of Work: Fully capable of taking substantial features from concept to shipping as a sole contributor. Works effectively on open-ended projects and is self-sufficient enough to dive deep and evaluate multiple solutions to a problem.
- Problem Solving: Solve hard problems with many constraints, using sound judgment to assess risks and present arguments in a well-structured, data-backed, written narrative. Have passion, creativity and empathy for users.
- Quick Thinking: Able to gather information, think critically, and make snap judgments based on measured data in high-pressure situations.
- People Skills: Strong communicator who is able to build positive working relationships between teams and with key customers. You must have experience supporting on-call rotations for 24x7 services: troubleshooting, executing runbooks, and escalating incidents.
- Nice to have:
  - Experience working in a growth-stage startup.
  - Experience building applications in different tech stacks.
  - Keen interest in decentralized technologies and their applications, including cryptocurrencies.