ELCA

Intern, Testing Data Migrations with Synthetic Data

Pully, Switzerland
Python, SQL, Data Modeling, LLM, AI, Machine Learning, Deep Learning, Git, CI/CD
Description

Internship: Testing Data Migrations with Synthetic Data, an AI-Powered Approach


Data platform migrations are common in enterprise environments: organizations move from legacy systems to modern infrastructure while preserving business logic. The technical challenge isn't just syntax translation; it's validation. When developers migrate SQL scripts or data pipelines between platforms, they face different execution environments, modified data access permissions, and no safe way to test against production data.

This internship tackles synthetic data generation for migration script testing. You'll design and implement a system that generates realistic test datasets mirroring production structure and behavior without exposing sensitive information. Several approaches are possible: the result could be a small dataset living in a Git repository or a fully fledged synthetic data warehouse. Either way, the data must be realistic enough to catch real bugs.
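To make the idea concrete, here is a minimal sketch of what "realistic enough to catch real bugs" could mean in practice. Everything in it is an assumption made for illustration: the `orders` table, its columns and categorical values, the three queries, and the choice of SQLite as a stand-in for both the legacy and the target platform. It is not a description of ELCA's system.

```python
import random
import sqlite3

# Hypothetical categorical values; a real project would derive these from
# schema documentation or (carefully) from production.
REGIONS = ["CH", "FR", "ES"]
STATUSES = ["open", "paid", "cancelled"]


def build_synthetic_db(n_rows=200, seed=42):
    """Build an in-memory database with a small, seeded synthetic table."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            region TEXT NOT NULL,
            amount_cents INTEGER,      -- nullable on purpose
            status TEXT NOT NULL
        )
        """
    )
    rows = []
    for _ in range(n_rows):
        # Roughly 10% of amounts are NULL, mimicking incomplete records.
        amount = None if rng.random() < 0.1 else rng.randint(100, 50_000)
        rows.append((rng.choice(REGIONS), amount, rng.choice(STATUSES)))
    # Deliberately seeded edge case so the NULL path is always exercised.
    rows.append(("CH", None, "open"))
    conn.executemany(
        "INSERT INTO orders (region, amount_cents, status) VALUES (?, ?, ?)",
        rows,
    )
    return conn


# Hypothetical legacy query and two hypothetical migrated versions.
LEGACY = (
    "SELECT region, COUNT(*) FROM orders "
    "WHERE status <> 'cancelled' GROUP BY region ORDER BY region"
)
MIGRATED_OK = (
    "SELECT region, COUNT(*) FROM orders "
    "WHERE status NOT IN ('cancelled') GROUP BY region ORDER BY region"
)
# COUNT(column) silently skips NULLs, so this version undercounts.
MIGRATED_BUGGY = (
    "SELECT region, COUNT(amount_cents) FROM orders "
    "WHERE status <> 'cancelled' GROUP BY region ORDER BY region"
)


def same_results(conn, query_a, query_b):
    """Return True when both queries produce identical result sets."""
    return conn.execute(query_a).fetchall() == conn.execute(query_b).fetchall()


if __name__ == "__main__":
    conn = build_synthetic_db()
    print("equivalent rewrite matches:", same_results(conn, LEGACY, MIGRATED_OK))
    print("buggy rewrite matches:", same_results(conn, LEGACY, MIGRATED_BUGGY))
```

In this sketch the buggy rewrite is only detected because the synthetic data contains NULL amounts. A dataset without them would pass both migrated queries, which is the kind of gap the internship is meant to address.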

The challenge goes beyond simple data mocking. You'll need to decide whether to generate data from real data (which carries anonymization risks), from query analysis alone (which requires good documentation), or through a hybrid approach. Should categorical values match production exactly, or can we substitute them and adapt the scripts? Can we extend unit testing to end-to-end testing, and what dataset properties would that require?

Part of the work involves establishing an evaluation methodology, potentially by collecting a reference set of migration scripts and their expected behaviors, to measure how well different synthetic data approaches catch real issues. There is also room to explore multi-agent architectures in which specialized agents handle different aspects: schema analysis, constraint extraction, data generation, anonymization verification, and test validation. This is applied research with immediate production impact.
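One possible shape for such an evaluation, sketched under stated assumptions: a reference set of (legacy query, migrated query, known-bug flag) triples is run against each candidate dataset, and the dataset is scored by how many known bugs it exposes and how many false alarms it raises. The table, the rows, the reference cases, and the metric names below are all hypothetical, and SQLite stands in for the real platforms.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER, status TEXT)"
)

# Two hypothetical candidate datasets: a naive one and a richer one that
# adds a NULL amount, a cancelled order, and a duplicate amount.
NAIVE_ROWS = [("CH", 1000, "paid"), ("FR", 2500, "paid")]
RICHER_ROWS = NAIVE_ROWS + [
    ("CH", None, "open"),
    ("FR", 400, "cancelled"),
    ("CH", 1000, "paid"),
]

# Hypothetical reference set: (legacy query, migrated query, has_known_bug).
REFERENCE_CASES = [
    # Bug: COUNT(column) skips NULLs.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(amount_cents) FROM orders", True),
    # Bug: the status filter was lost during migration.
    ("SELECT SUM(amount_cents) FROM orders WHERE status <> 'cancelled'",
     "SELECT SUM(amount_cents) FROM orders", True),
    # Correct rewrite: both forms return the distinct regions.
    ("SELECT region FROM orders GROUP BY region ORDER BY region",
     "SELECT DISTINCT region FROM orders ORDER BY region", False),
]


def load(rows):
    """Load one candidate dataset into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO orders (region, amount_cents, status) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def score(rows, cases):
    """Score a dataset by known bugs exposed and false alarms raised."""
    conn = load(rows)
    detected = 0
    false_alarms = 0
    for legacy, migrated, has_bug in cases:
        differs = conn.execute(legacy).fetchall() != conn.execute(migrated).fetchall()
        if has_bug and differs:
            detected += 1
        elif not has_bug and differs:
            false_alarms += 1
    n_buggy = sum(1 for _, _, has_bug in cases if has_bug)
    return {"detection_rate": detected / n_buggy, "false_alarms": false_alarms}


if __name__ == "__main__":
    print("naive :", score(NAIVE_ROWS, REFERENCE_CASES))
    print("richer:", score(RICHER_ROWS, REFERENCE_CASES))
```

With these toy inputs the naive dataset exposes neither bug, while the richer one exposes both without flagging the correct rewrite. A real methodology would need a much larger reference set and probably finer-grained metrics.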

Objectives

  • Design a strategy for migration script testing that balances realism, anonymization, and practical constraints
  • Implement a proof-of-concept system that generates test datasets from schema documentation, existing queries, or (carefully) sampled production data
  • Define testing strategies: unit tests vs. end-to-end tests, minimum viable data sizes, etc.
  • Develop an evaluation methodology to measure the effectiveness of different synthetic data generation approaches
  • Explore multi-agent architectures for decomposing the generation pipeline into specialized components (schema analysis, constraint satisfaction, validation)

Our offer

  • A dynamic, collaborative work environment with a highly motivated, multicultural team spread across international sites
  • The chance to make a difference in people's lives by building innovative solutions
  • Various internal coding events (hackathons, brown-bag sessions); see our technical blog
  • Monthly after-work events organized at each location

Skills required

  • Strong Python programming: data processing, testing patterns, CI/CD integration
  • Understanding of relational databases, SQL, and data modeling concepts
  • Experience with LLMs and agentic systems: prompting, tool use, multi-agent orchestration
  • Familiarity with data security and data anonymization concepts
  • Problem-solving mindset: comfort with ambiguous requirements and making justified technical trade-offs
  • Clear technical writing and documentation skills

About Us

We are ELCA, one of the largest Swiss IT tribes, with over 2,300 experts. We are multicultural, with offices in Switzerland, Spain, France, Vietnam and Mauritius. Since 1968, our team of engineers, business analysts, software architects, designers and consultants has provided tailor-made and standardized solutions to support the digital transformation of major public administrations and private companies in Switzerland. Our activity spans multiple fields of leading-edge technology, such as AI, Machine & Deep Learning, BI/BD, RPA, Blockchain, IoT and Cybersecurity.