Wynd Labs

Web Scraping Specialist

Remote
Python, JavaScript, BeautifulSoup, Scrapy, Selenium, HTML, CSS, DOM, MongoDB, Cassandra, Machine Learning, AWS, GCP, Azure
Description

Department: Analytics

Location: Remote

Employment Type: Full-time

Who We Are:

We build infrastructure that delivers massive amounts of web data to the companies training the world’s most powerful AI models.

We're the team that helps to power and support Grass, a bandwidth-sharing network that lets us operate a massive distributed crawler, giving us unique access to high-quality public web data at global scale. On top of that, we’ve built pipelines for ingesting, segmenting, and annotating billions of videos, transcripts, and audio files, powering dataset creation for frontier labs.

We’re lean, technical, and move fast. No red tape, no slow decision-making; just a team of builders pushing to expand what’s possible for open web data and AI.

The Role.

We are seeking a Web Scraping Specialist with significant experience in data extraction and web scraping techniques. You will join a small, specialized team and lead efforts to gather and analyze data, optimize scraping processes, and support our vision for a future where Grass plays a crucial role in transforming internet data accessibility.

Who You Are.

  • Demonstrated ability to extract data from complex websites with minimal supervision, supported by a portfolio or examples of past projects.

  • Proficiency in languages such as Python or JavaScript, with strong skills in libraries and frameworks like BeautifulSoup, Scrapy, or Selenium.

  • Knowledge of asynchronous programming, multithreading, and distributed scraping.

  • In-depth knowledge of HTML, CSS, JavaScript, and the Document Object Model (DOM).

  • Experience with NoSQL databases (MongoDB, Cassandra), including designing efficient storage solutions and maintaining data integrity.

  • Ability to apply machine learning algorithms for data cleaning, categorization, or predictive analysis (a significant plus).

  • Experience with cloud services (AWS, Google Cloud, Azure) for deploying and managing scraping jobs at scale.

  • Active participation in open-source projects related to web scraping, data processing, or similar fields.

What You'll Be Doing.

  • Write, test, and refine code that extracts data from various online sources, ensuring reliability and efficiency.

  • Perform data retrieval tasks, handling complexities such as pagination and dynamic content loaded with AJAX.

  • Clean and format extracted data, ensuring it meets quality standards for further analysis or processing.

  • Store and manage the scraped data in appropriate databases, optimizing for access speed and data integrity.

  • Regularly monitor scraping processes, identifying and resolving issues to maintain continuous data flow.

Why Work With Us:

  • Opportunity. We are at the forefront of developing a web-scale crawler and knowledge graph that improves access to public web data and extends the value of AI to the people.

  • Culture. We're a lean team with a high bar. We come to work not to be comfortable, but to find out what we're capable of and to do work that matters. We're not calling for people who keep things moving. We're calling for people who make everyone around them better.
    We prioritize low ego and high output. This is a fully remote team.

  • Compensation. You’ll receive a competitive salary, benefits, and an equity package.
