Data Engineer
Team: Engineering
Location: Sunnyvale, CA
Commitment: Full-time
Workplace Type: onsite
Key Responsibilities
- Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines.
- Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
- Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
- Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
- Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
- Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
- Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
- Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.
Academic Qualifications
- Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
- Master's or PhD degree in Computer Science, Data Engineering, or a related technical field, or equivalent experience, preferred.
Professional Experience - Required
- Extensive experience in data engineering, data processing, and automation using Python.
- Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
- Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
- Experience working with cloud infrastructure and distributed data processing tools (e.g., AWS, Kubernetes, Spark, Kafka).
- Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
- Strong communication and collaboration skills with cross-functional teams.
Professional Experience - Preferred
- Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
- Experience working with large language models, including evaluation, efficient inference, and prompt engineering.
- Experience refining outputs from large-scale AI models, such as LLM-generated data.
- Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
- Familiarity with the latest advancements in NLP data processing and large language model technologies.