TikTok

Software Engineer, Model Training (LLM) - Trust and Safety - Canada

Vancouver, British Columbia
Kubernetes, PyTorch, Machine Learning, Deep Learning, C++, Go, Python, Shell
Description
TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and its offices include New York, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.

The Trust and Safety (TnS) engineering team is responsible for protecting our users from harmful content and abusive behaviors. With the continuous efforts of our trust and safety engineering team, TikTok can provide the best user experience and bring joy to everyone in the world. Our team achieves these goals by building content moderation process systems, rule engines, strategy systems, feature engines, human moderation platforms, risk insight systems, and a range of supporting platforms across the TnS organization.


Responsibilities - What You'll Do
1. Work closely with business teams to optimize the integration plan for algorithm applications, improve efficiency in evaluating and using algorithm applications across various business scenarios, and reduce the cost of managing and optimizing algorithm applications in different business scenarios.
2. Be responsible for the architectural design, development, and performance tuning of algorithm applications, solving technical challenges such as high concurrency, high reliability, and high scalability. Work includes multiple sub-areas: ML model training and evaluation, model optimization, model inference, model management, dataset management, workflow orchestration, etc.
3. Be responsible for the design and development of machine learning infrastructure for LLM, AIGC, and related workloads.
4. Build a very large-scale machine learning system integrating GPUs, RDMA networking, and high-performance storage.
5. Be responsible for researching and implementing cutting-edge engineering technologies related to LLM, NLP, and CV.

Qualifications
- Hands-on experience in one or more of the following areas: Machine Learning, Deep Learning, Recommender Systems, Natural Language Processing, or Computer Vision
- Be proficient in one or two programming languages such as C++, Go, Python, or Shell in a Linux environment
- Understand the principles of distributed systems and have experience in design, development and maintenance of large-scale machine learning systems
- Be familiar with Kubernetes architecture, and have extensive experience in system-level development and tuning
- Familiar with the ML Infrastructure of Large Model training and inference
- Strong understanding of, and engineering experience with, cutting-edge LLM research and engineering (e.g., long context, multimodality, active learning, alignment research, agent ecosystem, etc.), with practical expertise in implementing these advanced systems effectively.
- Proficiency in programming languages such as Python, CUDA, or C++ and a track record of working with deep learning frameworks (e.g., PyTorch, DeepSpeed, Megatron, vLLM, etc.).
- Have experience with large scale data processing and parallel computing


Preferred Qualifications
- Excellent programming, data structure, and algorithm skills, with proficiency in C/C++ or Python. Candidates with awards in ACM/ICPC, NOI/IOI, Topcoder, Kaggle, and other competitions are preferred.
- Research or industry experience in the field of machine learning, especially in large language models (LLMs) and generative artificial intelligence.
- Experience optimizing distributed training frameworks such as DeepSpeed, FSDP, Megatron, and GSPMD
- Experience with in-depth CUDA programming and performance tuning (CUTLASS, Triton)
- Experience with evaluation of ML models, LLM application & agent development is desirable.
- PhD/Master's degree required, with top artificial intelligence conference papers (NeurIPS, ICML, ICLR, CVPR, ACL, EMNLP, etc.) in machine learning (ML), computer vision (CV), natural language processing (NLP) and other fields.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://shorturl.at/cdpT2
