The USDS TikTok Recommendations Infra SRE team works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team will deliver on production ownership and be responsible for observability and automation across complex, large-scale service mesh architectures.
To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities
• Engage in and improve the whole lifecycle of Recommendation systems — from system design consulting through to launch reviews, deployment, operation and refinement
• Deliver tools/software to improve the reliability and scalability of services, automate operations and improve R&D efficiency
• Build and maintain high availability for large-scale services deployed across global data centers
• Plan, manage and optimize cloud resource utilization, ensuring that large-scale clusters meet their SLAs
• Measure and monitor availability, latency and overall service health
• Practice sustainable incident response and postmortems
Minimum Qualifications
• Bachelor's degree or above majoring in Computer Science or related fields, with at least 2 years of related work experience
• SRE experience deploying large-scale systems with high reliability and scalability
• Familiarity with Linux system operations and networking
• Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
• Experience in designing, analyzing and troubleshooting large-scale distributed systems
• Familiarity with common CI/CD procedures and environments
• Effective communication skills and a sense of ownership and drive
Candidates for this position must be legally authorized to work in the United States. This position is not eligible for visa sponsorship or support.