Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.
At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there.
A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.
As a Principal Site Reliability Operations Engineer on the Reliability Team, you will manage production incidents and improve Roblox's incident processes while reporting to the Senior Operations Manager. You will maintain reliability service-level objectives, drive incidents tenaciously to resolution, and work with service teams towards appropriate action items during the incident postmortem process. If you are passionate about maintaining uptime in a complex distributed environment full of continuous change, you'll be right at home with our Reliability team. You will report to the Senior Manager, Reliability.
This role requires 3 in-office days per week.
You Will:
- Lead and manage production incidents.
- Collaborate cross-functionally to troubleshoot and resolve sophisticated technical challenges.
- Guide the implementation of incident management processes and procedures, ensuring fast and effective responses to minimize impact.
- Continually monitor system health, performance and capacity, proactively addressing potential issues.
- Conduct comprehensive post-mortem analysis to ascertain the root cause of incidents and formulate corrective measures.
- Contribute substantially to the design and enhancement of system architecture to boost reliability and performance.
- Leverage coding skills to automate daily routine tasks and enhance system efficiency.
- Serve in the Incident Manager On-Call rotation.
- Mentor junior team members.
You Have:
- At least 8 years of experience in a comparable role within a Site Reliability Team.
- Advanced knowledge of systems and network infrastructure protocols.
- Demonstrated ability in managing, troubleshooting, and resolving incidents in distributed environments.
- Experience solving problems.
- An ability to distill complex technical issues into clear and concise language.
- Familiarity with at least one scripting or programming language to automate routine tasks (Python, Golang, or similar languages preferred).
- Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a similar technical field.
You Are:
- A great communicator; you are able to explain complex systems clearly to stakeholders and fellow engineers.
- Able to operate in potentially ambiguous circumstances during a production incident.
- Familiar with the interactions of services in a distributed system.
- Tenacious towards driving challenging production incidents to resolution.
Roles that are based in our San Mateo, CA Headquarters are in-office Tuesday, Wednesday, and Thursday, with optional in-office on Monday and Friday (unless otherwise noted).
You’ll Love:
- Industry-leading compensation package
- Excellent medical, dental, and vision coverage
- A rewarding 401k program
- Flexible vacation policy
- Roflex - Flexible and supportive work policy
- Roblox Admin badge for your avatar
- At Roblox HQ:
  - Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
  - Onsite fitness center and fitness program credit
  - Annual CalTrain Go Pass
Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.