Variable Selection by Domain Experts vs. Filter Algorithms for Clinical Predictive Modeling

DOI: 10.4018/978-1-6684-8103-5.ch019

Abstract

Studies on variable selection in the medical field have focused largely on algorithms, with little attention paid to the role of domain experts. This chapter compared the performance of domain experts with that of filter algorithms in variable selection for clinical predictive modeling. Five clinical datasets on bacterial survival, neonatal birthweight, breast cancer, diabetes, and myocardial infarction were employed. For each dataset, fifteen domain experts were asked to rank the importance of the variables on a five-point Likert scale. The same variables were ranked using four algorithms, namely, chi-squared, Fisher score, Pearson's correlation, and the varImp function. Classification models built on the variables selected by each approach performed competitively. This suggests that human expertise and experience are important in clinical predictive modeling and must not be ceded entirely to algorithms. Further studies should focus on developing automated platforms that codify domain knowledge and experience to facilitate real-time, speedy, and seamless variable selection.

Background

Research on variable selection for clinical, medical, and healthcare modeling abounds in the literature (Atsa’am, 2020; Awan et al., 2019; Bodur & Atsa’am, 2019; Sanchez-Pinto, Venable, Fahrenbach, & Churpek, 2018). For instance, Atsa’am (2020) developed an algorithm that computes the importance of the variables in medical datasets using a statistical measure known as the odds ratio. A similar study by Bodur & Atsa’am (2019) employed the risk ratio to design an algorithm that ranks predictors in healthcare data according to their importance. In a related study, Awan et al. (2019) used machine learning techniques to determine the most suitable variables for modeling the likelihood that a heart failure patient will be readmitted to the hospital within thirty days of discharge. Furthermore, Sanchez-Pinto et al. (2018) compared the effectiveness of eight variable selection methods in predicting clinical outcomes among patients. The eight techniques spanned regression- and tree-based methods. The authors found that regression-based methods performed better with smaller datasets, while tree-based methods were most appropriate for larger datasets.
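To make the idea of an odds-ratio-based importance measure concrete, the following is a minimal sketch in Python. It is not the algorithm published by Atsa’am (2020); it only illustrates, under simplifying assumptions, how an odds ratio can be turned into a ranking score. The assumptions are that every predictor and the outcome are binary (0/1), that a 0.5 continuity correction is applied when a cell of the 2x2 table is empty, and that importance is taken as the absolute log odds ratio. The variable names and toy data are hypothetical.

```python
from math import log

def odds_ratio(x, y, correction=0.5):
    """Odds ratio of outcome y=1 for x=1 versus x=0 (x and y are 0/1 lists).

    Assumption: when any cell of the 2x2 table is zero, 0.5 is added to
    every cell so that the ratio stays finite.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # exposed, outcome
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # exposed, no outcome
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # unexposed, outcome
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # unexposed, no outcome
    if 0 in (a, b, c, d):
        a, b, c, d = (v + correction for v in (a, b, c, d))
    return (a * d) / (b * c)

def rank_by_odds_ratio(features, y):
    """Rank variables by |log(OR)| so that strong risk factors and strong
    protective factors both receive high scores."""
    scores = {name: abs(log(odds_ratio(col, y))) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: eight patients, one binary outcome, two binary predictors.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "noise":  [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the outcome
    "smoker": [1, 1, 1, 0, 0, 0, 0, 1],  # associated with the outcome
}

ranking = rank_by_odds_ratio(features, outcome)
for name, score in ranking:
    print(f"{name}: |log OR| = {score:.3f}")
```

In this toy example the predictor associated with the outcome is ranked first, and the unrelated predictor has an odds ratio of 1 and therefore a score of zero.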

Key Terms in this Chapter

Filter Variable Selection Algorithm: An algorithm that uses a statistical technique or measure to determine the importance of a variable outside a learning algorithm.

Clinical Prediction: The process of predicting the presence or absence of a medical condition with the aid of statistical or machine learning models.

Expert System: A computer program that codifies human knowledge and mimics human expertise when solving a problem or making decisions.

Predictive Modeling: The process of developing and implementing models that can accurately predict the class of an observation whose class is unknown.

Domain Knowledge: Experience and expertise related to a specific field or discipline.

Variable Selection: The process of identifying the best variables to include in a model to maximize model performance.

Domain Expert: A human expert in a specific field or discipline such as medicine, engineering, or computer science.
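The definition of a filter variable selection algorithm given above can be illustrated with a short Python sketch. It scores each variable by the absolute value of Pearson's correlation with the outcome (one of the four measures named in the abstract) and keeps the top k, without training any learning algorithm. This is an illustrative sketch, not the procedure used in the chapter: the function names, the choice of k, and the toy data are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists.

    Assumption: a constant variable (zero variance) is given a
    correlation of 0.0 instead of raising an error.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def filter_select(features, y, k):
    """Return the names of the k variables with the largest |r|.

    The scores depend only on the data, not on any classifier, which is
    what makes this a filter method.
    """
    scores = {name: abs(pearson(col, y)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: six patients with a binary outcome.
outcome = [0, 0, 1, 1, 1, 0]
features = {
    "age":     [50, 30, 45, 35, 40, 55],
    "glucose": [85, 90, 150, 160, 155, 95],
    "site":    [1, 1, 1, 1, 1, 1],  # constant, carries no information
}

selected = filter_select(features, outcome, k=2)
print(selected)
```

The selected variables would then be passed to whatever classification model is being built; the filter step itself is independent of that model.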
