1. Introduction
Gene expression data present several challenges, which can be addressed by selecting the best extraction method and by reducing the dimensionality of the data. An efficient dimension reduction technique must be chosen to reduce the number of non-relevant features present in the dataset. Gene selection is also an important factor: removing non-essential elements improves precision (Lamba et al., 2018).
Due to the very high dimensionality of gene expression data, biologists find it difficult to handle such data (Bennet et al., 2015), and analysing microarray results is therefore tedious. In addition, the gene expression dataset contains irrelevant characteristics and noisy data. Statistical approaches are the optimal solution to such problems, and automatic statistical computation is required to avoid the errors introduced by manual calculation. These problems can be addressed using machine learning methods.
Because irrelevant features and noisy data are present in the gene expression data set, careful pre-processing is needed: the dominant elements that facilitate prediction must be extracted from the enormous dataset. This reduction has the advantage of enhancing accuracy, avoiding overfitting, decreasing model complexity and reducing training time. Feature selection (FS) allows models to predict efficiently using the remaining features in a machine learning method (Aouf et al., 2019). Test results show that prediction accuracy increases only when FS is included in the classification stage; accuracy decreases when FS is omitted (Vanjimalar, Ramyachitra, & Manikandan, 2018).
Feature selection methods can be broadly divided into three categories according to how the feature search is combined with the construction of the classification model: filter methods, wrapper methods, and embedded methods. Filter methods measure the relevance of features solely by examining the intrinsic characteristics of the data (Haar, Anding, Trambitckii, & Notni, 2019). In wrapper methods, the search for an ideal feature subset is wrapped around a specific classifier, which evaluates each candidate subset; in embedded methods, the feature search is integrated into the training of the model itself. Wrapper approaches are computationally heavier than filter methods because they must interact with the classifier repeatedly. Feature selection can also increase learning accuracy, shorten learning time, and simplify the learned model (Zhao et al., 2010). Feature selection and feature extraction (Sun et al., 2005) are the two main ways of reducing dimensionality.
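As a minimal illustration of the filter category described above, the sketch below scores each feature by its absolute Pearson correlation with the class label and keeps the top-ranked ones. The scoring criterion, the toy data, and the function name `filter_select` are illustrative assumptions, not part of any method cited in this article; real filter methods may use other intrinsic criteria such as variance, chi-square, or mutual information.

```python
import numpy as np

def filter_select(X, y, k):
    """Filter-style feature selection: rank features by absolute
    Pearson correlation with the label and keep the top k.
    Note that no classifier is consulted, only the data itself."""
    Xc = X - X.mean(axis=0)            # centre each feature
    yc = y - y.mean()                  # centre the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / denom # |correlation| per feature
    return np.argsort(scores)[::-1][:k]

# Toy "expression matrix": 20 samples x 5 genes, binary class label;
# only gene 0 is given a real association with the class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 20).astype(float)
X = rng.normal(size=(20, 5))
X[:, 0] += 3.0 * y                     # inject class signal into gene 0
selected = filter_select(X, y, k=2)
print(selected)
```

Because the criterion never trains a model, this runs in a single pass over the data, which is exactly why filter methods are cheaper than wrapper methods that refit a classifier for every candidate subset.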