Introduction
Emerging technology supports the successful transition of paper-based medical records to electronic form. These electronic medical records (EMRs) offer fast data retrieval, less time spent during patient visits, data sharing among medical departments, and high data security and privacy through restricted user access. Many recent studies have used EMRs for knowledge extraction (Menaouer et al., 2020). The main advantage of this data source is that an EMR repository contains both tacit and explicit knowledge: professionals' know-how and experience are usually narrated during medical treatment, including laboratory results and the diagnostic procedure. Within EMR analysis research, automated adverse drug reaction (ADR) extraction is a prominent topic. The term ADR refers to an unpleasant event (e.g., a symptom, disease, or finding) associated with a medication given at recommended dosages (Lortie, 1986).
Earlier research on extracting ADRs from unstructured text widely deployed statistical co-occurrence analysis (Wang et al., 2009; White et al., 2016; Nikfarjam et al., 2019) because it has low complexity, is straightforward, and yields highly significant results. However, the major drawback of the co-occurrence approach is that it disregards the relation context. Drug and event entities are expected to appear together more frequently than chance, regardless of the clinical meaning of the relation between them; for example, a drug may treat a medical event, or a drug may cause an adverse event. Pattern-based methods have been proposed to overcome this limitation. Xu & Wang (2014a, 2014b) propose a pattern-ranking method. Similarly, Taewijit et al. (2017) combine the distant supervision approach with a pattern-based method for ADR identification from EMRs, and Bollegala (2018) uses lexical patterns extracted from social media.
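To make the co-occurrence limitation concrete, the following is a minimal sketch of sentence-level co-occurrence scoring with pointwise mutual information (PMI). PMI is only one of several association measures, and the cited works are not assumed to use exactly this one. The function name `cooccurrence_pmi`, the toy sentences, and the restriction to single-token entities are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_pmi(sentences, drugs, events):
    """Score each (drug, event) pair by sentence-level pointwise mutual information.

    sentences: list of token lists; drugs, events: sets of single-token entity
    names (a simplifying assumption for this sketch).
    """
    n = len(sentences)
    drug_count, event_count, pair_count = Counter(), Counter(), Counter()
    for tokens in sentences:
        present = set(tokens)
        found_drugs = present & drugs
        found_events = present & events
        drug_count.update(found_drugs)
        event_count.update(found_events)
        # Every drug/event pair in the same sentence counts, whatever links them.
        pair_count.update(product(found_drugs, found_events))
    return {
        (d, e): math.log((c / n) / ((drug_count[d] / n) * (event_count[e] / n)))
        for (d, e), c in pair_count.items()
    }


# Hypothetical toy corpus: "rash" is an adverse event of amoxicillin,
# while "otitis" is the condition amoxicillin was given to treat.
sentences = [
    "patient developed rash after amoxicillin".split(),
    "amoxicillin was given for otitis".split(),
    "paracetamol was given for fever".split(),
    "fever resolved with paracetamol".split(),
]
drugs = {"amoxicillin", "paracetamol"}
events = {"rash", "otitis", "fever"}

scores = cooccurrence_pmi(sentences, drugs, events)
print(round(scores[("amoxicillin", "rash")], 3))    # 0.693 (adverse event)
print(round(scores[("amoxicillin", "otitis")], 3))  # 0.693 (indication)
```

Both pairs receive the same score (log 2), so the statistic alone cannot separate the adverse event from the indication; only the words linking the two entities ("developed ... after" versus "was given for") carry that distinction, which is what pattern-based methods exploit.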
Following the success of deep learning in many medical applications, Zhang et al. (2018) construct a word sequence dependency and a relation sequence dependency from the dependency graph of a given medical sentence, then learn medical relations using a hybrid model of recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Similarly, Gupta et al. (2018) and Cocos et al. (2019) propose RNN-based models to discover ADR relations. Embedding methods have been applied both to text, such as Word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and fastText (Bojanowski et al., 2017), and to graphs, such as Node2vec (Grover et al., 2016), DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015), and Metapath2vec (Dong et al., 2017). However, because modeling intricate patterns with these Euclidean embedding methods requires a high dimensionality, it is fundamentally difficult to compute embeddings of large graph-structured data such as social networks, knowledge graphs, or taxonomies without loss of information (Nickel and Kiela, 2017).
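Nickel and Kiela (2017) address this problem by embedding hierarchies in the Poincaré ball, a model of hyperbolic space in which the available volume grows exponentially with the distance from the origin, so tree-like structures fit into few dimensions. The sketch below implements only the standard Poincaré-ball distance from that work; it does not reproduce any training procedure, and the function name `poincare_distance` and the sample points are illustrative choices.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincare ball (Nickel & Kiela, 2017):
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs with the same Euclidean gap (0.1): one near the origin,
# one near the boundary of the ball.
near_origin = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.85, 0.0), (0.95, 0.0))
print(round(near_origin, 2), round(near_boundary, 2))  # 0.2 1.15
```

The same Euclidean gap is more than five times longer near the boundary. In hierarchical embeddings, general concepts tend to sit near the origin and specific ones near the boundary, where there is ample room to keep many siblings apart.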
The deep learning methods above learn distributional semantic vectors from labeled instances and therefore require massive manual data annotation to achieve good performance. In this work, we extend our previous work (Taewijit & Theeramunkong, 2016) and, unlike these methods, adopt a distant supervision setting. We integrate the pattern-based method with pattern expansion, which learns hierarchical representations of relation triples of the form <entity1>, <phrase>, <entity2> to examine the relation between entity pairs for ADR extraction from EMRs. The semantics of all triples in the corpus are learned in hyperbolic space. In addition, we report an evaluation of our pattern-based relations by two domain experts. The remainder of this paper is organized into three sections: the background of our study, materials and methods, and experimental results.
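The triple construction under distant supervision can be pictured with a small sketch. It assumes tokenized sentences, single-token drug and event mentions, the drug-before-event order only, and a hypothetical set `known_adr_pairs` standing in for an external ADR knowledge base. The names `Triple`, `extract_triples`, and `max_gap` are illustrative, and the hyperbolic embedding and pattern expansion steps of this work are not shown.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    entity1: str  # drug mention
    phrase: str   # tokens between the two mentions, used as a lexical pattern
    entity2: str  # medical event mention
    label: bool   # distant label: True if (entity1, entity2) is a known ADR pair


def extract_triples(sentences, drugs, events, known_adr_pairs, max_gap=5):
    """Build <entity1>, <phrase>, <entity2> triples from tokenized sentences and
    label them by distant supervision against a set of known ADR pairs.

    Simplifying assumptions: single-token entities, and only the order
    drug ... event with at most `max_gap` tokens in between is considered.
    """
    triples = []
    for tokens in sentences:
        for i, drug in enumerate(tokens):
            if drug not in drugs:
                continue
            # j ranges over positions leaving at most max_gap tokens between i and j.
            for j in range(i + 1, min(len(tokens), i + max_gap + 2)):
                if tokens[j] in events:
                    phrase = " ".join(tokens[i + 1:j])
                    label = (drug, tokens[j]) in known_adr_pairs
                    triples.append(Triple(drug, phrase, tokens[j], label))
    return triples


# Hypothetical inputs: a one-entry knowledge base and three sentences.
known_adr_pairs = {("amoxicillin", "rash")}
sentences = [
    "amoxicillin caused rash".split(),
    "amoxicillin was given for otitis".split(),
    "paracetamol may induce rash".split(),
]
triples = extract_triples(sentences, {"amoxicillin", "paracetamol"},
                          {"rash", "otitis"}, known_adr_pairs)
for t in triples:
    print(t)
```

Only the phrase "caused" receives a positive distant label here. "may induce" stays unlabeled because its entity pair is absent from the knowledge base, not because it is a negative example. This is the kind of phrase a pattern expansion step aims to recover, and distant labels of this sort are inherently noisy.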