1. Introduction
Decision making must be accurate, especially in the medical and health sectors. A critical decision-making system needs complete information. When information goes missing, the system degrades because its decisions are misinterpreted. To handle missing values, Khan et al. (2013) proposed a medical decision system. Nowadays, sophisticated applications are widely used in all fields, and they collect huge quantities of data on a daily basis. Storing, analyzing and mining such big data require computational intelligence techniques and data science analysis tools. Fernandez-Delgado et al. (2014) conducted an exhaustive experiment on different datasets with various classifiers, using many data analysis tools such as R, Weka, C and Matlab. The performance of such data analysis tools is affected by various issues, and data analysts need to pay more attention to these challenges to achieve better analysis. The challenges that commonly arise during data analysis and machine learning tasks are explored in (Zhang et al., 2003) (Bai et al., 2015) (Zhu & Li, 2016) (Li & Ren, 2015). One of the most important of these challenges is missing values. Missing values pose a hidden and unpredictable challenge that must be addressed. Missing values (Allison, 2001) are inevitable in real-world data collected from different application domains. Applications in these domains use data mining and machine learning techniques to either impute or ignore such values. Missing values can arise from faulty devices, human mistakes, inaccurate or inconsistent entries, inadequate measurements, sensitive survey questions left unanswered, and so on. Such missing values result in biased decisions that reduce the accuracy of prediction. Incorrect data analysis or decisions may have severe consequences in the medical domain, the health sector (Gomila & Clark, 2020) (Stiglic et al., 2019), various financial applications, and elsewhere.
There are two possible ways to handle missing values. The first is to ignore the instances that have missing data. The second is to replace the missing data with approximate values, an approach called parameter estimation, so that correct decisions can be made. Ignoring instances with missing values is a simple and often-used method, but it reduces the amount of data and thus affects the learning process. This ignore method degrades the performance of the prediction model and leads to inaccurate decisions (Stiglic et al., 2019). Parameter estimation methods, also called model-based imputation methods, such as the expectation-maximization (EM) algorithm, are sensitive to outliers. The best alternative is to impute the missing values using Machine Learning (ML) based methods (Lakshminarayan et al., 1999) (Little & Rubin, 2019). Such a process is treated as a data cleaning task during the pre-processing phase of data analysis and machine learning. The process of inferring missing values from the existing data is called missing data imputation (Myrtveit et al., 2001). Most data mining and machine learning algorithms need a complete dataset for knowledge extraction, pattern recognition and decision making. Researchers have proposed various missing data imputation methods for data analysis tasks such as classification, regression and clustering.
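To make the contrast between ignoring and imputing concrete, here is a minimal Python sketch. It is not the method of any cited work. It compares listwise deletion (ignoring incomplete instances) with k-nearest-neighbour (kNN) imputation, which we assume here as one representative ML-based imputation technique. The function names `listwise_delete` and `knn_impute`, the toy data, and the choice k = 2 are hypothetical and chosen only for illustration. Missing values are represented as `None`.

```python
from math import sqrt
from typing import List, Optional

Row = List[Optional[float]]  # one instance; None marks a missing value


def listwise_delete(rows: List[Row]) -> List[Row]:
    """'Ignore' strategy: drop every instance with at least one missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical kNN imputation sketch: fill each missing cell with the mean
    of that feature over the k nearest complete instances. Distance is Euclidean,
    computed only on the features observed in the incomplete instance."""
    complete = listwise_delete(rows)
    if not complete:
        raise ValueError("kNN imputation needs at least one complete instance")
    result = []
    for r in rows:
        observed = [j for j, v in enumerate(r) if v is not None]
        if len(observed) == len(r):
            result.append(list(r))  # nothing to impute in this instance
            continue
        # Rank complete instances by distance over the observed features only.
        neighbours = sorted(
            complete,
            key=lambda c: sqrt(sum((r[j] - c[j]) ** 2 for j in observed)),
        )[:k]
        # Keep observed values; replace each None with the neighbours' mean.
        result.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(r)
        ])
    return result


# Hypothetical toy data: 6 instances, 3 numeric features, 2 with a missing value.
data: List[Row] = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [5.2, 6.1, None],
    [0.9, 2.1, 2.9],
    [4.8, 5.9, 6.8],
]

print(len(listwise_delete(data)), "of", len(data), "instances survive deletion")
for row in knn_impute(data, k=2):
    print([round(v, 2) for v in row])
```

On this toy data, deletion keeps only 4 of the 6 instances. Imputation keeps all 6 and fills each gap from the two most similar complete instances. A practical implementation would normally standardize features before computing distances, because unscaled features with large ranges dominate the Euclidean distance.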