Review on Feature Selection and Classification using Neuro-Fuzzy Approaches

Saroj Biswas, Monali Bordoloi, Biswajit Purkayastha
Copyright: © 2016 | Pages: 17
DOI: 10.4018/IJAEC.2016100102

Abstract

This research article provides a recent survey of neuro-fuzzy approaches for feature selection and classification. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and for giving the user some insight into the symbolic knowledge embedded within the network. It combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to, and a recent survey of, neuro-fuzzy approaches for feature selection and classification across a wide range of machine learning problems. Some existing neuro-fuzzy models are also applied to standard datasets to demonstrate the applicability of neuro-fuzzy approaches.

1. Introduction

The focus of this era is not simply on accomplishing a task but on optimizing the process involved, in order to minimize time and space complexity. Machine learning algorithms in pattern recognition, image processing and data mining are mainly concerned with classification. These algorithms operate on huge amounts of high-dimensional data, from which knowledge is extracted. However, the entire dataset in hand does not always prove significant for every domain. An important concept that contributes extensively to classification and to better understanding of the domain is feature selection (Kohavi and John, 1997). Feature selection is the process of selecting a subset of features from a larger set in a balanced manner, without losing most of the characteristics and identity of the original object. Two factors affect feature selection: irrelevant features and redundant features (Dash and Liu, 1997). Irrelevant features are those that provide no useful information in the given context, and redundant features are those that provide the same information as the currently selected features.
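As a rough illustration of these two factors, the following minimal Python sketch (not taken from the article) flags a feature as irrelevant when its correlation with the class label is negligible, and as redundant when it nearly duplicates a feature already kept; the synthetic data and the two thresholds are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
informative = rng.normal(size=n)
data = pd.DataFrame({
    "f_informative": informative,
    "f_redundant": informative + rng.normal(scale=0.05, size=n),  # near-copy of f_informative
    "f_irrelevant": rng.normal(size=n),                           # pure noise
})
target = (informative > 0).astype(int)  # class label

selected = []
for col in data.columns:
    # Irrelevant: almost no (linear) association with the class label.
    relevance = abs(np.corrcoef(data[col], target)[0, 1])
    if relevance < 0.1:
        continue
    # Redundant: nearly duplicates a feature that was already kept.
    redundant = any(abs(np.corrcoef(data[col], data[kept])[0, 1]) > 0.95
                    for kept in selected)
    if not redundant:
        selected.append(col)

print(selected)  # expected: ['f_informative']
```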

Selection of an optimal number of distinct features contributes substantially to improving the performance of a classification system, with lower computational effort, better data visualization and improved understanding of computational models. Feature selection also reduces the running time of the learning algorithm, the risk of data overfitting, the dimensionality of the problem and the cost of future data acquisition (Guyon and Elisseeff, 2003). Thus, in order to cope with rapidly evolving data, many researchers have proposed different feature selection techniques for classification tasks.

The main goals of feature selection are to select the smallest feature subset that yields the minimum generalization error, to reduce time complexity, and to reduce the memory and cost of handling large datasets (Vergara and Estévez, 2014). In most common scenarios, feature selection methods are used to solve classification problems or form part of a classification problem. Many classical techniques exist for feature selection, such as Mutual Information (MI), decision trees, Bayesian networks, genetic algorithms, Support Vector Machines (SVM), K-nearest neighbor (K-nn), Pearson correlation criteria, Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN) and fuzzy sets. The choice of a specific algorithm is a critical step, as no single best algorithm exists that covers every scope and solves every problem of feature selection and classification.
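As a hedged illustration of how such classical techniques are typically combined within a classification problem, the sketch below (assuming scikit-learn is available; it is not part of the article) ranks features by mutual information with SelectKBest and classifies on the reduced subset with K-nearest neighbor; the dataset and parameter values are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep the 10 features with the highest estimated mutual information with y,
# then classify with K-nn on the reduced feature set.
model = make_pipeline(
    SelectKBest(score_func=mutual_info_classif, k=10),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```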

The use of Mutual Information (MI) for feature selection can be found in many contributions by different researchers (Vergara and Estévez, 2014; Peng et al., 2005; Grande et al., 2007; Chandrasekhar and Sahin, 2014; Battiti, 1994). Mutual information captures the dependencies between variables in terms of their probability density functions. However, if one of the two variables is continuous, the limited number of samples available after feature selection makes the computation of the integral in the continuous space challenging (Peng et al., 2005). It has also been found that MI does not work efficiently in high-dimensional spaces and that no standard theory exists for MI normalization (Vergara and Estévez, 2014).
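For reference, the mutual information between two variables X and Y can be written in discrete and continuous form; the continuous form makes explicit the integral over probability density functions that, as noted above, is hard to estimate from a limited number of samples (Peng et al., 2005).

```latex
% Discrete case (sum over the joint probability mass function):
I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}

% Continuous case (integral over the joint probability density function):
I(X;Y) = \int_{X} \int_{Y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy
```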
