Mistral AI

Software Engineer, Data Acquisition (Paris/London)

Paris, France
Kubernetes C++ HTML JavaScript SQL API Java Hadoop Spark Machine Learning Python Redis PostgreSQL Pandas NumPy CSS
Description
About Mistral 
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.
- Our mission is to make AI ubiquitous and open. 
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed between France, the UK, and the USA.

Role Summary 
- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.
- The ideal candidate will have a strong background in web scraping, data extraction and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.
- The role is based in Paris or London.

Key Responsibilities 
- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
- Utilize headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Qualifications & profile 
- Bachelor’s or master’s degree in computer science, information systems, or information technology
- Strong understanding of web technologies, data structures, and algorithms.
- Knowledge of database management systems and data warehousing.
- Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential. 
- Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites.
- Knowledge of HTTP and HTTPS protocols.
- A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary.
- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.
- Understanding of distributed systems and technologies like Hadoop or Spark.
- Experience using web scraping libraries and frameworks like Scrapy, Beautiful Soup, Selenium, or MechanicalSoup.
- Understanding how search engines work and how to optimize web crawling.
- Experience in machine learning to improve the efficiency and accuracy of web crawling.
- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits 
- Daily lunch vouchers 
- Contribution to a Gympass subscription 
- Monthly contribution to a mobility pass 
- Full health insurance for you and your family 
- Generous parental leave policy 
