Mistral AI

Software Engineer, Data Acquisition - Paris/London

Paris, France
Redis SQL Spark Machine Learning CSS C++ Kubernetes PostgreSQL HTML API Java JavaScript Hadoop Pandas NumPy Python
Description
About Mistral 
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.
- Our mission is to make AI ubiquitous and open. 
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed across France, the UK, and the USA.

Role Summary 
- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.
- The ideal candidate will have a strong background in web scraping, data extraction and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.
- The role is based in Paris or London.

Key Responsibilities 
- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
- Utilize headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.

Qualifications & profile 
- Bachelor’s or master’s degree in computer science, information systems, or information technology.
- Strong understanding of web technologies, data structures, and algorithms.
- Knowledge of database management systems and data warehousing.
- Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential. 
- Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites.
- Knowledge of HTTP and HTTPS protocols.
- A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary.
- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.
- Understanding of distributed systems and technologies like Hadoop or Spark.
- Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.
- Understanding how search engines work and how to optimize web crawling.
- Experience in Machine Learning to improve the efficiency and accuracy of web crawling
- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.

Benefits 
- Daily lunch vouchers 
- Contribution to a Gympass subscription 
- Monthly contribution to a mobility pass 
- Full health insurance for you and your family 
- Generous parental leave policy 
