An Efficient Source Selection Approach for Retrieving Electronic Health Records From Federated Clinical Repositories

Nidhi Gupta, Bharat Gupta
DOI: 10.4018/IJITSA.307025

Abstract

Retrieving and unifying patient electronic health records from distinct clinical repositories is essential for effective health decision-making. Because patient data is scattered across many complementary but overlapping sources, locating the desired records is a challenge. Maintaining a master patient index also faces legal and privacy issues. It is therefore essential to locate a patient's relevant records within federated clinical data sources. This research proposes an efficient approach, based on semantic technologies, for selecting the sources relevant to a patient's records. The approach combines triple pattern-wise and join-aware source selection to arrive at an optimal set of relevant data sources. State-of-the-art federated engines are evaluated on source selection time and overall query execution time for different federated queries. The experimental results show that the proposed approach selects the relevant data sources with fewer remote requests and significantly reduces query execution time.

Introduction

The health sector is experiencing advances in data retrieval solutions. Growth in technology and communication has brought innovation to clinical data retrieval and management. As health-related information has been digitized, healthcare organizations have come to maintain a vast number of Electronic Health Record (EHR) databases. EHRs capture the details of clinical encounters, along with patient demographic information, diagnoses, and vital signs data. Reusing clinical data is important for realizing the potential of better healthcare management, reduced healthcare costs, population health management, and effective clinical research (Meystre et al., 2017). EHR data integration can be performed by exploiting a common data model or by using semantic web principles (Almeida et al., 2019). Integration through Extract-Transform-Load (ETL) processes relies on combining the data into a central data store. However, the Health Insurance Portability and Accountability Act (HIPAA) and other privacy laws prohibit the integration of patient data from multiple data sources into a central repository for analysis (Gostin, 2001). Meanwhile, querying and retrieving data over web-based linked data has witnessed remarkable success (Schmachtenberg et al., 2014), and semantic web technologies are widely used to form integrated federated networks of distributed data sources. As a result, health systems are undergoing a paradigm shift towards federated clinical data stores and query processing across multiple data sources. This shift lets health providers keep clinical records at their native location, thereby maintaining data privacy.

An individual's medical history is needed for effective clinical decision-making and for managing the patient's health. There are two challenges in locating patient records across multiple health data stores. First, patient data is scattered across many complementary but overlapping sources (Weber, 2015). Second, an index of patient identification information would make records easy to locate, but the federated engine does not maintain a local master patient index because of legal and privacy issues (Xu et al., 2021). Federated query tools are being used to precisely estimate the number of patients matching a query (Dewri et al., 2016). Retrieving a patient's health history involves searching for relevant sources of patient data across multiple health organizations. An optimal set of relevant data sources contains only the sources that actually contribute to the query solution. Selecting such a set is necessary because, during execution of the query plan, the intermediate results from some data sources may be excluded after a JOIN operation with the results of other subqueries. Overestimating the query-relevant data sources increases the number of remote requests, produces irrelevant intermediate results, and can significantly increase the query execution time. Selecting an optimal set of data sources avoids this overestimation and is therefore a significant step in federated query processing.
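The effect of overestimation described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the source names, the `ex:` predicates, and the helper functions are all hypothetical, and the sources are small in-memory lists of triples rather than remote endpoints. The first step keeps every source that can answer a triple pattern on its own; the second step drops sources whose results for the shared variable could never survive the JOIN. A real federated engine would have to make the second decision from data summaries or remote probes rather than by inspecting the data directly, as this toy does.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(pattern: Pattern, triple: Triple) -> bool:
    """A triple matches when every constant term of the pattern is equal."""
    return all(is_var(p) or p == t for p, t in zip(pattern, triple))


def pattern_wise_selection(
    patterns: List[Pattern], sources: Dict[str, List[Triple]]
) -> Dict[Pattern, Set[str]]:
    """Keep, per pattern, every source holding at least one matching triple."""
    return {
        pat: {name for name, triples in sources.items()
              if any(matches(pat, t) for t in triples)}
        for pat in patterns
    }


def bindings(pattern: Pattern, triples: List[Triple], var: str) -> Set[str]:
    """Values that `var` takes in the triples matching `pattern`."""
    pos = pattern.index(var)
    return {t[pos] for t in triples if matches(pattern, t)}


def join_aware_pruning(
    selection: Dict[Pattern, Set[str]],
    sources: Dict[str, List[Triple]],
    join_var: str,
) -> Dict[Pattern, Set[str]]:
    """Drop sources whose bindings for `join_var` cannot join with the rest."""
    joined = [p for p in selection if join_var in p]
    if not joined:
        return {p: set(s) for p, s in selection.items()}
    # Union of join-variable values per pattern, across its selected sources.
    per_pattern = [
        set().union(*(bindings(p, sources[s], join_var) for s in selection[p]))
        for p in joined
    ]
    joinable = set.intersection(*per_pattern)
    pruned: Dict[Pattern, Set[str]] = {}
    for pat, names in selection.items():
        if join_var not in pat:
            pruned[pat] = set(names)
        else:
            pruned[pat] = {s for s in names
                           if bindings(pat, sources[s], join_var) & joinable}
    return pruned


# Hypothetical toy data: three repositories with a handful of triples each.
SOURCES: Dict[str, List[Triple]] = {
    "hospital_a": [("patient:1", "ex:hasDiagnosis", "dx:diabetes"),
                   ("patient:2", "ex:hasDiagnosis", "dx:asthma")],
    "lab_b": [("patient:1", "ex:hasLabResult", "lab:hba1c")],
    "clinic_c": [("patient:3", "ex:hasLabResult", "lab:cbc")],
}
PATTERNS: List[Pattern] = [
    ("?p", "ex:hasDiagnosis", "dx:diabetes"),
    ("?p", "ex:hasLabResult", "?r"),
]

selected = pattern_wise_selection(PATTERNS, SOURCES)
pruned = join_aware_pruning(selected, SOURCES, "?p")
```

In this toy data, pattern-wise selection alone marks both `lab_b` and `clinic_c` as relevant to the lab-result pattern. After pruning on `?p`, `clinic_c` is dropped because its only patient has no diabetes diagnosis in any source, so a remote request to it would produce intermediate results that the JOIN discards.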

This research addresses the problem of selecting patient data sources in federated query processing. It enables data to be fetched through a common query interface so as to provide an integrated view of patient medical records across multiple health data sources, with a significant reduction in the number of remote queries and in query execution time.
