Fine-Grained Drug Interaction Extraction Based on Entity Pair Calibration and Pre-Training Model for Chinese Drug Instructions


Xiaoliang Zhang, Feng Gao, Lunsheng Zhou, Shenqi Jing, Zhongmin Wang, Yongqing Wang, Shumei Miao, Xin Zhang, Jianjun Guo, Tao Shan, Yun Liu
Copyright: © 2022 | Pages: 23
DOI: 10.4018/IJSWIS.307908

Abstract

Existing pharmaceutical information extraction research often focuses on standalone entity or relationship identification tasks over drug instructions, and a holistic solution for drug knowledge extraction is lacking. Moreover, current methods perform poorly in extracting fine-grained interaction relations from drug instructions. To solve these problems, this paper proposes an information extraction framework for drug instructions. The framework uses deep learning models built on fine-tuned pre-training models for entity recognition and relation extraction. In addition, it incorporates a novel entity pair calibration process to improve the performance of fine-grained relation extraction. The framework is evaluated on more than 60k Chinese drug description sentences from 4000 drug instructions. Empirical results show that the framework can successfully identify drug-related entities (F1 ≥ 0.95) and their relations (F1 ≥ 0.83) in this real-world dataset, and that entity pair calibration plays an important role (~5% F1 score improvement) in extracting fine-grained relations.

Introduction

Drug instructions contain a wealth of drug knowledge, which can be extracted for decision support in clinical diagnosis, prescriptions, and healthcare management. However, mining drug instruction knowledge is a challenging task. Textual descriptions in drug instructions usually take the form of long, complicated sentences, which are difficult for either humans or machines to process. Existing information extraction methods have been used for drug and disease entity recognition (Lin, & Xie, 2020; Sun et al., 2021; Zhu et al., 2021), drug-disease relationship extraction (Bose, et al., 2021; Fatehifar, & Karshenas, 2021; Mingliang, Jijun, & Fei, 2021), etc. Recently, pre-training models have become prominent in Natural Language Processing (NLP) tasks due to their generalization capability (Vaswani et al., 2017). However, many existing methods still treat knowledge extraction as a pipeline process that consists of Named Entity Recognition (NER) and Relation Extraction (RE) tasks. These pipeline methods are prone to error propagation, since not all entities generated by NER are valid for RE (Pawar, Palshikar, & Bhattacharyya, 2017). In drug instructions, it is very common to have multiple entities and relations in a single sentence; without proper entity pair checking, the performance of fine-grained relation extraction is significantly affected by invalid entity pairs (Gao, Zhou, & Gu, 2021). Taewijit and Theeramunkong (2021) proposed an approach that learns meaningful patterns by hyperbolic embedding and then extracts adverse drug reactions from electronic medical records. As a result, manual effort is required to prepare entity pairs. In other cases, joint methods are proposed to extract entities and relations in a single process (Wei et al., 2020); however, these methods struggle with many-to-many relations that involve overlapping entities (Tiktinsky et al., 2022).

Table 1.
Sample drug instruction text with multiple entities and relations
Compound paracetamol should not be used together with chloramphenicol, barbiturates (such as phenobarbital), etc.

To overcome the limitations of current approaches, this paper proposes a novel pipeline framework for drug information extraction that uses an entity pair calibration module based on pre-training models. The framework starts by automatically gathering textual descriptions from public drug instruction data sources. Then, it uses deep learning models to identify four categories of entities: drugs, diseases, body parts, and symptoms. After named entity recognition, it gathers the sentences with multiple relevant entities and applies the entity pair calibration (EPC) process. The goal of EPC is to distinguish between the main entity (the primary drug for which the instruction is intended, e.g., the first drug mention in Table 1) and the secondary entities (the other entities mentioned in the sentence of the instruction, potentially associated with the primary drug entity, e.g., the second, third and fourth drug mentions in Table 1). This step reduces the noise passed on to drug relation extraction and facilitates accurate extraction of fine-grained relations. Finally, the framework extracts fine-grained relations from those entity pairs.
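The role of EPC between entity recognition and relation extraction can be illustrated with a minimal sketch. This is not the authors' model: the `Entity` class, the `calibrate_pairs` function, and the rule for picking the main entity (prefer the mention that matches the drug the instruction is written for, otherwise take the first drug mention) are all hypothetical stand-ins, assumed here only to show how one main entity is paired with each secondary entity in the Table 1 sentence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # one of "drug", "disease", "body_part", "symptom"
    start: int  # character offset of the mention in the sentence


def calibrate_pairs(
    entities: List[Entity], instruction_drug: str
) -> List[Tuple[Entity, Entity]]:
    """Pick one main drug entity and pair it with every other entity.

    Hypothetical heuristic (an assumption, not the paper's method): the main
    entity is the drug mention whose text matches the drug the instruction
    belongs to; if none matches, the first drug mention is used.
    """
    drugs = sorted((e for e in entities if e.label == "drug"), key=lambda e: e.start)
    if not drugs:
        return []
    main = next((e for e in drugs if e.text == instruction_drug), drugs[0])
    secondary = sorted((e for e in entities if e is not main), key=lambda e: e.start)
    # Only (main, secondary) pairs are kept; pairs between two secondary
    # entities are dropped before relation extraction.
    return [(main, e) for e in secondary]


# The Table 1 sentence, with mentions assumed to come from an upstream NER step.
sentence = (
    "Compound paracetamol should not be used together with chloramphenicol, "
    "barbiturates (such as phenobarbital), etc."
)
mentions = ["Compound paracetamol", "chloramphenicol", "barbiturates", "phenobarbital"]
entities = [Entity(m, "drug", sentence.index(m)) for m in mentions]

pairs = calibrate_pairs(entities, instruction_drug="Compound paracetamol")
pair_texts = [(main.text, other.text) for main, other in pairs]
for main_text, other_text in pair_texts:
    print(main_text, "->", other_text)
```

With four drug mentions there are six unordered candidate pairs; keeping only the three that involve the main entity is the kind of noise reduction the paragraph above describes, although the published framework may select and filter pairs differently.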

The main contribution of this paper is two-fold:
