Research Scientist - Data
Team: Research
Location: Sunnyvale, CA
Commitment: Full-time
Workplace Type: On-site
Key Responsibilities
- Pioneer web-scale data collection and curation methodologies for LLMs and multi-modal foundation models.
- Design and implement novel data synthesis pipelines for code, mathematics, and agentic reasoning datasets.
- Trace the impact of data from pre-training to final model capabilities, and create automated quality assessment frameworks for massive datasets.
- Design data recipes that maximize model capabilities across diverse domains.
- Optimize data-model co-design for improved training dynamics.
- Contribute to research papers and represent MBZUAI at industry conferences and events, showcasing the institution’s AI research and innovation.
Academic Qualifications
- Minimum: Master’s in Computer Science, Data Science, or a related technical field, or equivalent practical experience.
- Preferred: PhD or equivalent research experience in Machine Learning, NLP, or Data Science with a focus on LLMs and data.
Professional Experience
- Experience working with large language models, including evaluation, fine-tuning, and prompt engineering.
- Strong Python development skills with a focus on research-grade code and scalable data pipelines.
- Familiarity with collecting and processing large-scale datasets from open-source and web resources.
- Demonstrated ability to work with ML infrastructure (e.g., model evaluation, optimization, debugging).
- Proactive mindset with the ability to identify impactful research questions and execute on them with minimal supervision.
- Effective communication and collaboration skills for working in cross-functional teams.
- Prior research experience in areas such as web data curation and mixing, synthesizing complex datasets for training, LLM evaluation, post-training data, efficient inference, LLM-as-a-judge, and tokenization.
- Strong publication record in leading AI conferences (e.g., NeurIPS, ICLR, ICML, EMNLP) and/or prior contributions to open-source AI research or data tools.
- Hands-on experience training language/multi-modal models from scratch.
