AI Curation Data Scientist
Location: Remote (United States)
Department: Data Science
This Role is Perfect for You If You Love:
- A high-energy start-up working with a brilliant and passionate team
- Working on problems that make a real difference in people’s lives
- Understanding and delivering reliable, well-characterized products within a highly innovative and fast-changing environment: clinical data extraction and aggregation, the relationship between data processing and the QA framework, LLM tuning and training, pedantic data curation, compute architecture, and data exchange
- Rockstar teammates: you will be working with a strong team with decades of prior work experience in artificial intelligence, software systems, molecular biology, and clinical medicine
- Innovation and problem solving to provide order-of-magnitude improvements in capabilities for data handling and analysis while maintaining traceable data and methods development
About the role:
Reporting to the VP of Data Science, the AI Curation Data Scientist will use traditional computing and custom AI model training to work on mission-critical projects driven by xCures’ product development needs and will expand xCures’ complex, innovative health data processing, extraction, and analysis capabilities. We’re looking for an individual who values team-building, cooperation, and communication with colleagues to serve the needs of our customers.
Projects will include significant data processing challenges, such as C-CDA XML parsing and de-identification of structured and unstructured EHR content. Equally important projects will address data curation and custom AI model training. You will author software and AI models and contribute to data set curation. You will coordinate data set quality assurance within an innovative, fast-moving team. Team responsibilities are key requirements for this position, which will deliver large and complex data products and data analysis tools.
Key Responsibilities:
- Developing and testing data extraction and integration software for structured EHR content (XML, FHIR) and unstructured text content (attached documents)
- Organizing and contributing to data set curation for model training
- Tuning and training LLMs
- Maintaining a strong understanding of PHI/PII and de-identification policies and strategies at xCures and implementing software solutions compliant with policies and strategies
- Developing and implementing tests of data extraction and aggregation performance to improve efficiency, timeliness, and cost-effectiveness
- Implementing and maintaining code repositories
- Working closely with your manager to explore methods, test hypotheses, and collaboratively implement innovative solutions for data science
- Coordinating as required for a fully remote role
- Working with Engineering and other groups to improve overall company efficiency and effectiveness
Qualifications
Required
- Master's degree or equivalent experience in Computer Science, Software Engineering, Statistics, Biology, or a related field
- Minimum of 5 years of hands-on experience in data science, machine learning, AI, data analysis, software development, and/or predictive analytics
- Experience applying generative AI and transformer models, especially training of LLMs
- Significant experience with curating data sets to train LLMs
- Significant hands-on coding experience with LLMs, embedding models, and sentence_transformers, and with authoring Python code to build data extraction and/or classification tools
- Significant prior work experience with parsing XML, JSON, and/or other complex data formats, preferably C-CDA health data
- Experience with TensorFlow, PyTorch, and/or scikit-learn
- Software development skills including git
- Proven efficiency with, and a cautious approach to, LLM-assisted coding
- Experience writing unit and integration tests for scientific/clinical data software, as well as developing scientifically motivated data quality assessments
- Flexible, innovative, can-do approach to delivering software and data products balanced with team cooperation
- A passion for successful delivery of team work products
Preferred
- Extensive experience with data handling efficiency tools, such as jq, xq, Unix command-line tools (e.g., sed), and bash programming
- Deep understanding of regex
- Extensive AWS experience and understanding of tradeoffs for different types of data storage for AI training
- Significant experience with PHI and PII, HIPAA, and de-identification is a major plus
- Software development experience in multiple coding languages
- Confidence extending the capabilities of open source tools
- Experience with multiple approaches to LLM-assisted coding, such as within Visual Studio, Copilot, or Claude Code, and familiarity with frontier and open model capabilities
- Experience with remote teams and solving technical project communications challenges
Location
xCures operates in a distributed, remote-first environment. Candidates may be located anywhere in the United States. Occasional travel to company offsites or key meetings may be required.
The successful candidate must already have authorization to work in the United States. At this time, xCures does not offer sponsorship.
Benefits
- Salary range: 100,000 to 165,000 annually
- Medical, Dental, Vision insurance
- 401k
- Equity options
xCures is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Agency Notice: xCures does not accept unsolicited resumes from staffing agencies, search firms, or other third-party recruiters. Any resumes submitted without a signed, active agreement with xCures will not be considered, and no fee will be paid in the event of a hire.
About xCures
xCures is redefining how healthcare organizations in the US access, trust, and act on patient data. Our mission is to ensure that critical patient information is available when, where, and how it’s needed most — helping care providers and partners make faster, better-informed decisions that improve health outcomes.
Our AI-powered software platform aggregates, structures, normalizes, and distills patient health data from care encounters nationwide. Our driving purpose is to equip our partners with the critical pieces of validated, traceable information that they need to render care and services in a form that inspires confidence and provides real clinical utility.
At xCures, we hold ourselves to the highest standards of quality and trust. Like the tools we build, our work is driven by precision, performance, and purpose. xCures is excited to champion responsible interoperability and the transformative potential of AI in healthcare, when done with the right values front of mind.
