Software Engineering - Data Engineer
Location: Menlo Park, CA
Department: Engineering
Location Type: In office
Employment Type: Full time
- Collect, parse, and structure diverse data types—including text, images, tables, circuit diagrams, simulations, and signal data—into standardized formats suitable for machine learning applications
- Design and maintain scalable data pipelines that efficiently handle data ingestion, transformation, and integration into ML workflows, ensuring high throughput and reliability
- Optimize data storage solutions to balance performance, scalability, and cost-effectiveness, facilitating rapid access and processing of large datasets
- Collaborate with cross-functional teams, including ML and infra engineers, to curate high-quality training and evaluation datasets aligned with Voltai's product offerings
- Implement robust data validation and quality assurance processes to ensure the integrity and usability of datasets across various applications
- Programming Languages: Proficiency in Python, with experience in compiled languages such as Go or Rust
- Data Parsing and Extraction: Expertise in parsing and extracting data from various formats and modalities, including PDFs, HTML, images, and binary files, utilizing tools like BeautifulSoup, pdfminer.six, and custom parsers
- Data Pipeline Frameworks: Experience with modern data pipeline frameworks such as Apache Airflow, Prefect, Dagster, or Apache Beam, enabling efficient orchestration of complex data workflows
- Data Processing Tools: Familiarity with tools like Apache Spark, Apache Flink, or similar platforms for large-scale data processing and transformation
- Database Systems: Strong knowledge of relational and non-relational databases, including PostgreSQL, Supabase, and other scalable storage solutions
- Cloud Platforms: In-depth experience with cloud services, particularly AWS, including S3, EC2, Lambda, and related services for deploying and managing data infrastructure
- Web Crawling and Agentic Crawling: Proficiency in building and managing web crawlers using frameworks like Scrapy, Firecrawl, or Crawl4AI, with an understanding of agentic crawling techniques to automate data extraction tasks
- Data Quality and Governance: Commitment to maintaining high data quality standards, with experience in implementing data validation, cleansing, and governance practices
- A strong background in hardware/electronics, gained through professional, academic, or personal projects
- Experience constructing datasets for large-scale ML models, specifically LLMs
- Contributions to open-source initiatives
- Experience thriving in a fast-paced, hyper-growth startup environment
- Unlimited PTO: Recharge when you need it, no questions asked.
- Comprehensive Health Coverage: Medical, dental, and vision insurance for you and your dependents.
- Free Meals and Snacks: Daily lunches, dinners, and snacks in the office.
- Professional Growth: We invest in your continuous learning and offer opportunities to expand your skills.
- Visa Sponsorship: We welcome global talent and provide visa sponsorship to support qualified candidates.
