Data Scientist (Knowledge Graph & Identity)
Team: Data Science
Location: Warsaw
Commitment: International Full Time Employee
Workplace Type: hybrid
What You'll Do:
- Own end-to-end delivery of significant data science projects — from problem scoping and approach design through to production deployment, with a focus on knowledge graph and identity solutions
- Make sound, independently reasoned decisions on methodology, model selection, and evaluation; document them clearly in technical solution documents covering problem statement, approach, metrics, and timeline
- Lead solution design for your own initiatives; break down complex epics into well-scoped user stories with clear acceptance criteria, adopting DataOps and MLOps best practices throughout — experiment tracking, pipeline orchestration, model monitoring, and reproducibility
- Build production-quality Python and PySpark code on Databricks — well-tested, documented, and reusable — and implement advanced ML and AI-powered workflows including entity resolution, probabilistic record linkage, embedding-based matching, semantic similarity, and LLM-augmented pipelines
- Develop and maintain reusable tools, libraries, and documentation that improve team efficiency and technical standards; conduct code reviews with constructive, specific feedback that raises the bar
- Mentor junior data scientists on technical execution, code quality, and career development; lead internal talks or workshops on knowledge graphs, identity, or ML topics
- Collaborate cross-functionally with product, engineering, and operations — translate business requirements into technical specifications, partner with data engineering on scalable pipeline design, and participate in cross-functional design reviews and working groups
Who You Are:
- Bachelor's degree in Statistics, Data Science, Computer Science, Mathematics, or a related quantitative field required; Master's strongly preferred
- 3–5 years of hands-on data science experience with demonstrated ability to own and deliver complex, multi-sprint projects independently
- Advanced Python with production-quality code, testing, and documentation; strong SQL and PySpark for billion-row datasets
- Experience with Databricks workflows, Delta Lake, and job orchestration; working knowledge of cloud platforms (AWS or GCP)
- Solid command of core ML — regression, classification, clustering, model evaluation, and experimental design — applied to complex, high-volume data
- Proficiency with MLOps practices: experiment tracking, pipeline orchestration (Airflow), and reproducible model deployment
- Exposure to modern AI methodologies: RAG systems, LLM-augmented models, vector databases, and semantic search
- Strong communicator — able to translate technical work into clear documentation, user stories, and cross-functional conversations
- Demonstrated ability to mentor junior data scientists and contribute to team standards
Preferred skills:
- Hands-on experience with knowledge graph construction, entity resolution, or semantic data modeling (RDF, OWL, SPARQL, or equivalent graph frameworks)
- Familiarity with probabilistic record linkage, identity graph approaches, or embedding-based entity matching at scale
- Experience with causal inference methods (A/B testing, synthetic control, uplift modeling)
- Experience with deduplication, enrichment, or web-to-TV linkage problems
- Background in media, ad tech, or measurement — TV viewership (ACR/STB data), digital audience modeling, cross-platform measurement (linear + CTV/OTT), or identity resolution in privacy-constrained environments
- Familiarity with the measurement and identity vendor landscape (Nielsen, Comscore, LiveRamp, The Trade Desk)
