Creating a Sustainable Large-Scale Content-Based Biomedical Article Classifier Using BERT

Aakash Jayakumar, Kavya Saketharaman, J. Arthy, S. Jayabharathi
Copyright © 2024 | Pages: 14
DOI: 10.4018/979-8-3693-5951-8.ch018

Abstract

Given the scarcity of labeled corpora and the high cost of annotation by qualified experts, clinical decision-making algorithms in biomedical text classification require large numbers of expensive training texts. To reduce labeling expenses, it is common practice to use active learning (AL) to cut the volume of labeled documents needed to reach the required performance. There are two methods for categorizing articles: article-level classification and journal-level classification. In this chapter, the authors present a hybrid strategy for training classifiers on article metadata such as title, abstract, and keywords, annotated with journal-level FoR (Fields of Research) codes, using natural language processing (NLP) embedding techniques. These classifiers are then applied at the article level to analyze biomedical publications using PubMed metadata. The authors trained BERT classifiers with FoR codes and applied them to classify publications based on their available metadata.
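As a rough illustration of the strategy the abstract describes, the sketch below fine-tunes a generic BERT checkpoint on concatenated title, abstract, and keyword text labeled with FoR codes. The checkpoint name, two-code label subset, and toy records are illustrative assumptions, not the chapter's actual data or configuration.

```python
# Minimal sketch: fine-tuning BERT on article metadata (title + abstract +
# keywords) labeled with journal-level FoR codes. Assumes the HuggingFace
# transformers library; all data below is a toy stand-in.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical two-code subset of the FoR scheme; the real taxonomy is larger.
FOR_CODES = ["11 Medical and Health Sciences", "06 Biological Sciences"]

class ArticleDataset(Dataset):
    """Wraps (metadata text, FoR label index) pairs for the Trainer."""
    def __init__(self, records, tokenizer):
        texts = [f"{r['title']} {r['abstract']} {' '.join(r['keywords'])}"
                 for r in records]
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=256)
        self.labels = [r["label"] for r in records]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Toy records standing in for journal-level-annotated PubMed metadata.
records = [
    {"title": "Statin therapy outcomes", "abstract": "A cohort study of ...",
     "keywords": ["cardiology", "statins"], "label": 0},
    {"title": "Yeast gene regulation", "abstract": "We profile expression ...",
     "keywords": ["genomics", "yeast"], "label": 1},
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(FOR_CODES))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="for_classifier",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ArticleDataset(records, tokenizer),
)
trainer.train()
```

Once trained at the journal level, the same model and tokenizer can score individual articles from their PubMed metadata alone, which is what lets the journal-level labels transfer to article-level classification.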

Introduction

The vast corpus of biomedical literature accessible on PubMed poses a formidable challenge for researchers, healthcare providers, clinicians, and the general public when it comes to locating relevant information (Ghozali et al., 2022a). A standard PubMed search yields hundreds to thousands of documents, making it difficult for physicians to access pertinent data promptly during patient care. Hence the need for a literature repository that is not only intuitive but also well organized, ensuring ease of comprehension to aid clinical decision-making (Awais et al., 2023). Research has underscored the importance of presenting large document collections in an easily digestible manner, highlighting the necessity of human-friendly access (Bhuva & Kumar, 2023).

Machine learning is the predominant methodology for predictive analysis and data classification, assisting individuals in making critical decisions. Machine learning algorithms are trained on instances: they assess historical data and derive insights from past experience (Boopathy, 2023). With repeated training on such instances, these algorithms become proficient at recognizing patterns that enable future predictions. Data lies at the core of machine learning algorithms, and further data can be generated by training on historical datasets (Elaiyaraja et al., 2023). Generative adversarial networks, an advanced machine learning technique, have been used to create additional visual content by learning from existing images, and their utility extends to text and speech synthesis (Ghozali, 2022). Consequently, machine learning has significantly broadened the scope of data science applications, incorporating computer science, mathematics, and statistics for data-driven inference (Ghozali et al., 2022b).

The scientific literature landscape is expanding rapidly, with over a million paper citations added to PubMed in the past year alone (Tak et al., 2023). Effective techniques are therefore needed to automatically identify entities, link them to standardized concepts in knowledge bases, and index key subjects, simplifying information retrieval for readers (Ravi et al., 2023). The tasks of named-entity recognition (NER), entity linking, and topic indexing, focusing on chemical names and themes within full-text PubMed publications, have been incorporated into BioCreative VII (Kothuru, 2023). NER is a pivotal phase in extracting information from text. The latest NER methodologies employ BERT-based models, with demonstrated performance gains from pretraining BERT on domain-specific texts and employing domain-specific lexicons (Krishna Vaddy, 2023). In biomedical NLP, larger models tend to perform better, as NER is particularly sensitive to changes in the model vocabulary. After NER, entity linking becomes critical: natural language concepts must be mapped to their unique identifiers and canonical forms preserved in knowledge bases (Kumar et al., 2023). Older entity linking methods rely on heuristics such as string matching and edit-distance calculations (Senbagavalli & Arasu, 2016), whereas modern deep learning techniques use a multi-step pipeline integrating an NER model, candidate generation, candidate selection, and entity ranking (Kumar Nomula, 2023).
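A minimal sketch of such a pipeline is shown below, assuming the HuggingFace transformers library. The checkpoint "dslim/bert-base-NER" is a general-domain placeholder (a checkpoint pretrained on PubMed text with a chemical/disease label set would be substituted in practice), and the knowledge base and linker are toy stand-ins for the heuristic string-matching style of older linkers.

```python
# Sketch: BERT-based NER followed by heuristic entity linking.
from typing import Optional
from transformers import pipeline

# General-domain NER placeholder; a biomedical checkpoint would replace it,
# and its labels (PER/ORG/LOC/MISC) would become CHEMICAL/DISEASE spans.
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")  # merge word pieces into spans

# Toy knowledge base mapping surface forms to canonical identifiers.
KB = {"aspirin": "MESH:D001241"}

def link(mention: str) -> Optional[str]:
    """Heuristic entity linking by lowercase exact string match."""
    return KB.get(mention.lower())

for ent in ner("Aspirin reduced mortality in the treatment arm."):
    print(ent["word"], ent["entity_group"], link(ent["word"]))
```

A modern deep learning linker would replace the dictionary lookup with candidate generation, candidate selection, and entity ranking over the knowledge base, as described above.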

Text classification is a common machine learning approach for structuring the vast expanse of unstructured digital data (Veronin et al., 2020a). Algorithms such as support vector machines and naïve Bayes are frequently used for their simplicity and high accuracy. The advent of pre-trained language models such as BERT, built on deep neural networks, has revolutionized natural language processing (Vashist et al., 2023). However, whichever method is employed, the demand for appropriately labeled training data remains constant. Manual annotation of training examples can be prohibitively resource-intensive, particularly in domains like biomedicine, necessitating alternative strategies, including active learning, to reduce annotation effort (Thallaj & Vashishtha, 2023).
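One common AL variant is pool-based uncertainty sampling: train on a small seed set, then repeatedly query an annotator for the unlabeled document the model is least sure about. The sketch below simulates this loop with scikit-learn, using TF-IDF features and naïve Bayes as a stand-in classifier; the toy corpus and oracle labels are illustrative assumptions.

```python
# Sketch: pool-based active learning with uncertainty sampling.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for PubMed abstracts; labels simulate the oracle.
pool_texts = ["cardiac arrest treatment outcomes", "gene expression in tumors",
              "myocardial infarction risk factors", "protein folding pathways",
              "heart failure drug trial", "dna sequencing methods"]
pool_labels = np.array([0, 1, 0, 1, 0, 1])  # hidden from the learner

vec = TfidfVectorizer()
X = vec.fit_transform(pool_texts)

labeled = [0, 1]            # indices of the initial seed set
unlabeled = [2, 3, 4, 5]

for _ in range(2):          # two annotation rounds
    clf = MultinomialNB().fit(X[labeled], pool_labels[labeled])
    proba = clf.predict_proba(X[unlabeled])
    # Uncertainty sampling: query the document whose top-class probability
    # is lowest, i.e., where the model is least confident.
    query = unlabeled[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)   # "annotate" it via the simulated oracle
    unlabeled.remove(query)
    print("queried:", pool_texts[query])
```

Each round spends the annotation budget where the model's confidence is lowest, which is the mechanism by which AL reduces the volume of labeled documents needed.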
