Volt AI

Data Engineer

Menlo Park, CA
Description

Software Engineering - Data Engineer

Location: Menlo Park, CA

Department: Engineering

Location Type: In office

Employment Type: Full time

About Voltai
Voltai is the leading AI company building agentic systems and frontier foundation models for semiconductor and electronics design. Backed by Sequoia Capital, we’re putting AI in the hands of hardware engineers at over 70% of the world’s largest semiconductor and electronics companies, giving them effortless control over their next-generation chip and board designs and powering the future of automotive, industrial automation, consumer electronics, IoT, and semiconductor manufacturing.

About the Team
Our founding team includes IOI/IPhO olympiad medalists, Stanford professors, and a former CTO of Synopsys, and our business leadership has scaled revenue at their previous companies to over $1.5bn. At Voltai, we are bringing together the world’s best talent at the intersection of software and hardware.


Key Responsibilities
  • Collect, parse, and structure diverse data types—including text, images, tables, circuit diagrams, simulations, and signal data—into standardized formats suitable for machine learning applications
  • Design and maintain scalable data pipelines that efficiently handle data ingestion, transformation, and integration into ML workflows, ensuring high throughput and reliability
  • Optimize data storage solutions to balance performance, scalability, and cost-effectiveness, facilitating rapid access and processing of large datasets
  • Collaborate with cross-functional teams, including ML and infra engineers, to curate high-quality training and evaluation datasets aligned with Voltai's product offerings
  • Implement robust data validation and quality assurance processes to ensure the integrity and usability of datasets across various applications

Required Skillsets
  • Programming Languages: Proficiency in Python, with experience in compiled languages such as Go or Rust
  • Data Parsing and Extraction: Expertise in parsing and extracting data from various formats and modalities, including PDFs, HTML, images, and binary files, utilizing tools like BeautifulSoup, pdfminer.six, and custom parsers
  • Data Pipeline Frameworks: Experience with modern data pipeline frameworks such as Apache Airflow, Prefect, Dagster, or Apache Beam, enabling efficient orchestration of complex data workflows
  • Data Processing Tools: Familiarity with tools like Apache Spark, Apache Flink, or similar platforms for large-scale data processing and transformation
  • Database Systems: Strong knowledge of relational and non-relational databases, including PostgreSQL, Supabase, and other scalable storage solutions
  • Cloud Platforms: In-depth experience with cloud services, particularly AWS, including S3, EC2, Lambda, and related services for deploying and managing data infrastructure
  • Web Crawling and Agentic Crawling: Proficiency in building and managing web crawlers using frameworks like Scrapy, Firecrawl, or Crawl4AI, with an understanding of agentic crawling techniques to automate data extraction tasks
  • Data Quality and Governance: Commitment to maintaining high data quality standards, with experience in implementing data validation, cleansing, and governance practices

Bonus Points
  • A strong background in hardware/electronics, gained through professional, academic, or personal projects
  • Experience in constructing datasets for large-scale ML models, specifically LLMs
  • Contributions to open-source initiatives
  • Experience thriving in a fast-paced, hyper-growth startup environment

Our Benefits
  • Unlimited PTO: Recharge when you need it, no questions asked.
  • Comprehensive Health Coverage: Medical, dental, and vision insurance for you and your dependents. 
  • Free Meals and Snacks: Daily lunches, dinners, and snacks in the office.
  • Professional Growth: We invest in your continuous learning and offer opportunities to expand your skills.
  • Visa Sponsorship: We welcome global talent and provide visa sponsorship to support qualified candidates.
