TikTok

Senior Server Operations Engineer - USDS

Seattle, WA
Shell Python Ansible Docker Kubernetes
Description
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security (“USDS”) is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained. The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.

The USDS Edge Server Operations team builds and manages the full lifecycle of server and network infrastructure in multiple PoP locations to support TikTok's rapid growth. The goals of the team are to support rapid deployment in new edge locations, efficient and secure management of production infrastructure, and the decommissioning of infrastructure as we transition to larger facilities and new generations of hardware and architectures.

As a Server Operations Engineer, you will participate in managing the full lifecycle of our PoP server fleet, including initial deployment, OS installation, service provisioning, inventory management, troubleshooting and repairs, and eventual decommissioning and secure recycling.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Own all server infrastructure in our PoP data centers, including deployment, availability, and maintenance.
- Work cross-functionally on data center buildout and server deployment, reconfiguration, management, and decommissioning.
- Contribute to tooling and processes for managing the server fleet, including OS installation and updates, configuration management, BIOS and firmware updates, and fault detection.
- Work with hardware vendors to coordinate faulty hardware repairs and replacements.
- Contribute to and deploy security-assured OS images and packages. Contribute to our CI/CD frameworks for OS certification.
- Maintain software repositories and apply software updates.
- Resolve complex technical issues within the data center, including software errors and bugs, hardware failures, and network issues.
- Contribute to policies and procedures for day-to-day operations to ensure a secure and highly available production environment.
- Build tools and automations to improve operational efficiency.
- Occasional U.S. travel required.
- Provide periodic on-call support for specific functions.

Qualifications
Required Qualifications:
- Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
- 5+ years of experience deploying and managing Linux and hardware systems at scale in a production environment.
- Experience with Debian-based distributions in production, including configuration and package management.
- Knowledge of the interdependencies of data center functions and technologies, including cooling and power.
- Strong understanding of DNS fundamentals and troubleshooting.
- Experience in shell scripting or a programming language, preferably Python.
- Experience with large-scale remote OS installation using tools such as PXE boot.
- Experience with automation and deployment tools at scale. Ansible experience highly desirable.
- Experience with out-of-band/lights-out server communication methods, such as IPMI and NCSI.
- Knowledge of Agile and DevOps practices.
- Strong verbal and written communication skills.

Preferred Qualifications:
- Experience as an engineering team lead.
- Experience with Docker and Kubernetes.
- Experience with kernel configuration changes and optimization.
- Time and project management experience.
- Production network troubleshooting experience.
- Production network hardware troubleshooting experience.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://shorturl.at/ktJP6

This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.
