Description

The candidate will work on high-visibility projects as a Data Scientist, bringing Data Science and NLP expertise to them. The candidate will join the RD Data Science team and collaborate with product managers, domain experts, and knowledge representation experts to build high-value outcomes from Elsevier content. The candidate will have the opportunity to impact virtually all Elsevier applications related to Research and Operations.

In scientific publishing, consistent and clear organization of content is crucial for comprehension and navigation. However, the lack of standardized section labeling practices makes it hard for readers to navigate scholarly works efficiently. Inconsistencies in section titles, such as variations between "Methods" and "Approach," can hinder effective information retrieval across scientific articles. Regular expressions are currently the main tool used to detect section types, but they handle diverse terminology and context-specific variations poorly, so the method lacks generalizability and accuracy. This project aims to explore the use of modern natural language processing (NLP) models, such as large language models (LLMs) and BERT-based models, to accurately classify article sections into standardized section types, improving the consistency and usability of scientific articles on platforms like ScienceDirect.

The primary objectives of this project are:

Data Collection and Preprocessing: Compile a dataset of scientific articles with sections labeled using regular expressions. This step will involve collecting a diverse set of scientific articles from the ScienceDirect database and using existing regular expressions to label sections based on their titles. In addition to the text, metadata such as article domain, publication year, and author information will be used to enhance classification performance.
Model Training: Train NLP models on the noisy dataset to classify sections into standardized types. This will involve exploring modern NLP models, including LLMs and other transformer-based models, for section type classification.
Active Learning: Use an active learning method to improve the quality of the training data. In this setting, a set of samples will be selected (based on the classifier's confidence) to be evaluated and labeled by an LLM. This process will be repeated for several iterations, and at each iteration a set of high-quality samples will be extracted and added to the dataset to boost the performance of the classifier.
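
As a rough illustration of the first and third objectives, the sketch below shows regex-based weak labeling of section titles and a confidence-based selection step. It is only a sketch under stated assumptions: the patterns, the label set, and the function names `weak_label` and `select_for_review` are hypothetical, and choosing the least confident samples is one common reading of "based on the classifier's confidence", not something the description specifies.

```python
import re

# Hypothetical patterns and label set; the actual regular expressions
# used for the project are not given in this description.
SECTION_PATTERNS = {
    "introduction": re.compile(r"\b(introduction|background)\b", re.I),
    "methods": re.compile(r"\b(methods?|methodology|approach|materials)\b", re.I),
    "results": re.compile(r"\b(results?|findings)\b", re.I),
    "discussion": re.compile(r"\bdiscussion\b", re.I),
    "conclusion": re.compile(r"\b(conclusions?|concluding remarks)\b", re.I),
}


def weak_label(title):
    """Return the first section type whose pattern matches the title, else None."""
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.search(title):
            return label
    return None


def select_for_review(probabilities, k):
    """Return indices of the k samples with the lowest top-class probability.

    Assumption: "least confident" sampling. `probabilities` holds one list of
    class probabilities per sample, as produced by any classifier.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]
```

In such a setup, titles that match no pattern (where `weak_label` returns `None`) and the samples returned by `select_for_review` would be natural candidates to send to an LLM for relabeling before the next training round.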

-----------------------------------------------------------------------

Elsevier is an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form: https://forms.office.com/r/eVgFxjLmAK , or please contact 1-855-833-5120.

Please read our Candidate Privacy Policy.

Elsevier

Data Science Intern
