Classification of Gene Expression Data Using Feature Selection Based on Type Combination Approach Model With Advanced Feature Selection Technology

Siddesh G. M., Gururaj T.
DOI: 10.4018/IJCINI.20211001.oa46

Abstract

Gene selection, which removes redundant and irrelevant genes, is a key step in addressing the classification problem. The proposed Type Combination Approach – Feature Selection (TCA-FS) model uses efficient feature selection methods so that classification accuracy can be enhanced. Three classifiers, K-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Forest (RF), are selected to evaluate the chosen feature selection methods and the resulting prediction accuracy. Three new feature selection approaches, Improved Recursive Feature Elimination (IRFE), Revised Maximum Information Coefficient (RMIC) and Upgraded Masked Painter (UMP), are analysed. These proposed techniques are compared with existing techniques and validated using (i) a stability determination test, (ii) classification accuracy, and (iii) error rates. Owing to the selection of a proper classification threshold, the proposed TCA-FS method provides higher accuracy than the existing system.

1. Introduction

Gene expression data pose several challenges, for example choosing the best extraction method and reducing the dimensionality of the data. An efficient dimension-reduction technique must be chosen to reduce the number of non-relevant features present in the dataset. Gene selection is also an important factor, since removing non-essential elements improves precision (Lamba et al., 2018).

Owing to the very high dimensionality of gene expression data, biologists find it difficult to handle such data (Bennet et al., 2015), and interpreting microarray results is therefore tedious. In addition, gene expression datasets contain irrelevant characteristics and noisy data. Statistical approaches are the optimal solution to such problems, and automatic statistical computation is required to avoid the errors that arise in manual calculations. These problems can be addressed using machine learning methods.

Additionally, because irrelevant features may appear alongside noisy data in the gene expression dataset, essential pre-processing methods are needed: the dominant features that facilitate prediction must be extracted from the enormous dataset. This reduction has the advantages of enhancing accuracy, avoiding overfitting, decreasing model complexity and reducing training time. Feature selection (FS) allows machine learning models to predict efficiently using only the retained features (Aouf et al., 2019). Test results show that prediction precision increases only when FS is included in the classification stage, and that accuracy decreases when FS is omitted from the classification phase (Vanjimalar, Ramyachitra, & Manikandan, 2018).
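To make the role of feature selection in the classification stage concrete, the following minimal sketch pairs standard recursive feature elimination (the baseline on which the proposed IRFE builds) with a linear SVM in scikit-learn. The synthetic data shape, feature count and RFE parameters are illustrative assumptions, not values taken from the paper.

import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a gene expression matrix: 100 samples x 2000 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Wrapper-style selection: repeatedly drop the weakest genes until 50 remain.
selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=100)
selector.fit(X_train, y_train)

# Classify using only the retained genes and report test accuracy.
clf = SVC(kernel="linear").fit(selector.transform(X_train), y_train)
y_pred = clf.predict(selector.transform(X_test))
print("Accuracy with selected genes:", accuracy_score(y_test, y_pred))

In the TCA-FS setting described in the abstract, the same retained subset would also be passed to KNN and RF classifiers so that the feature selection methods can be compared across classifiers.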

Feature selection methods can be broadly divided into three categories according to how the feature search is combined with the design of the classification model: filter methods, wrapper methods, and embedded methods. Filter methods measure the relevance of features only by examining the data's intrinsic characteristics (Haar, Anding, Trambitckii, & Notni, 2019). Wrapper methods search for an ideal feature subset by repeatedly interacting with a classifier, which makes them more computationally demanding than filter models, whereas embedded methods integrate the subset search into the construction of the model itself. Feature selection can also increase learning accuracy, reduce learning time, and improve overall learning performance (Zhao et al., 2010). Feature selection and feature extraction (Sun et al., 2005) are the two ways of reducing dimensionality.
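As an illustration of these three families, the sketch below uses standard scikit-learn utilities on synthetic data: a mutual-information filter, RFE with a linear SVM as a wrapper, and random forest feature importances as an embedded criterion. The feature counts and data sizes are arbitrary assumptions chosen only for demonstration.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic expression matrix: 80 samples x 500 genes, binary labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 500))
y = rng.integers(0, 2, size=80)

# Filter: score each gene independently of any classifier.
filter_sel = SelectKBest(mutual_info_classif, k=20).fit(X, y)

# Wrapper: search feature subsets by repeatedly training a classifier.
wrapper_sel = RFE(SVC(kernel="linear"), n_features_to_select=20, step=50).fit(X, y)

# Embedded: selection falls out of model training itself (RF importances).
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
embedded_idx = np.sort(np.argsort(rf.feature_importances_)[-20:])

print("Filter picks:  ", np.flatnonzero(filter_sel.get_support())[:5])
print("Wrapper picks: ", np.flatnonzero(wrapper_sel.get_support())[:5])
print("Embedded picks:", embedded_idx[:5])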
