Bio-Inspired Algorithms for Feature Selection: A Brief State of the Art

Rachid Kaleche, Zakaria Bendaoud, Karim Bouamrane, Khadidja Yachba
Copyright: © 2023 | Pages: 19
DOI: 10.4018/978-1-7998-9220-5.ch113

Abstract

Feature selection is an important machine learning process, especially when facing the high-dimensionality challenges caused by the unprecedented growth of data, namely big data. The main objective of feature selection is to find the smallest feature subset that optimizes the performance of a learning algorithm, thereby also improving readability. Feature selection is known to be an NP-hard problem, and the classical approaches tackling it have reached their limits. Consequently, tackling feature selection with bio-inspired algorithms has gained increasing interest because of the improved results obtained. This study first presents an overview of the feature selection problem and of bio-inspired algorithms as background. Based on this background, the modeling elements of bio-inspired algorithms for feature selection are described, followed by application domains and sample bio-inspired approaches to feature selection problems. In addition, challenges and issues are discussed, with the aim of opening future research opportunities.

Introduction

Data is an important source of knowledge for decision making, involving states, enterprises, and various organisations across domains such as the economy, health, the environment, policy, and security, among others. Nowadays, the volume of data is increasing exponentially, and the usual machine learning techniques have reached their limits. According to Statista, the worldwide volume of data was 26 zettabytes in 2017, and the forecasts for 2022 and 2025 are 97 and 181 zettabytes, respectively. Extracting knowledge from such voluminous data is therefore a challenge because of the huge number of features and/or the great number of instances. High dimensionality implies a high computational cost and poor readability due to the large number of features; in addition, irrelevant and redundant features can degrade the performance of the learning model (Dash & Liu, 1997; Xue et al., 2016).

In order to reduce the high number of features, two main methods exist: feature selection (FS) and feature extraction (FE) (Kohavi & John, 1997). The principle of FS is to select the relevant features, whereas in FE new features are constructed from the existing ones (Xue et al., 2016). This paper focuses on FS. The FS problem is a combinatorial problem known to be NP-hard (Amaldi & Kann, 1998; Narendra & Fukunaga, 1977). It has been addressed by classical algorithms such as sequential forward selection (SFS) and sequential backward selection (SBS) (Dash & Liu, 1997; Kohavi & John, 1997), which have reached their limits in terms of performance and runtime (Xue et al., 2016). By contrast, in recent decades global search methods inspired by nature, namely bio-inspired algorithms, have been applied and have provided improved results.
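The greedy idea behind SFS can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited works: the function name, the `score` callback, and the toy scoring function (which pretends that features 0 and 3 are the only relevant ones) are all hypothetical and chosen only for this example.

```python
from typing import Callable, FrozenSet, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Greedy SFS: repeatedly add the single feature that most improves the score."""
    selected: List[int] = []
    best_score = score(frozenset())
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Score the current subset extended by each remaining feature.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        cand_score, cand_feature = max(scored)
        if cand_score <= best_score:
            break  # no single addition helps: stop (possibly at a local optimum)
        selected.append(cand_feature)
        best_score = cand_score
    return selected


def toy_score(subset: FrozenSet[int]) -> float:
    # Hypothetical score: reward relevant features, penalise subset size.
    relevant = {0, 3}
    return len(subset & relevant) - 0.1 * len(subset)


if __name__ == "__main__":
    print(sorted(sequential_forward_selection(5, toy_score)))
```

Because each step commits to the locally best feature and never revisits earlier choices, a search of this kind can stall in a local optimum when features interact, which is one reason global search methods are considered for FS.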

Nature has been an important source of inspiration for algorithms that tackle hard problems, bio-inspired algorithms in particular. These algorithms are inspired by the behaviors of living beings, such as ants, bees, plants, bacteria, and others, which smartly solve their own daily problems. Bio-inspired algorithms have therefore been used to address hard problems in different fields such as transport, medicine, robotics, and industry. In summary, bio-inspired algorithms have proved their effectiveness and robustness in tackling hard problems in general. Since FS is a combinatorial problem, bio-inspired algorithms are consequently well suited to tackling it. In addition, the huge number of bio-inspired algorithms, more than 300 algorithms and their variants, offers a wide range of choices.

Given the importance of the FS process and the robustness and efficiency of bio-inspired algorithms, the motivations of this paper are to:

  • Provide an overview of FS as an important machine learning process for facing the high-dimensionality issue of big data

  • Provide a summary description of bio-inspired algorithms

  • Describe bio-inspired algorithms modeling elements to address the FS problem

  • Describe bio-inspired approaches tackling FS problems

  • Enumerate the application domains of bio-inspired algorithms for FS problems

  • Outline challenges relevant to bio-inspired algorithms for the FS problem, and those related to bio-inspired algorithms themselves, in order to improve existing approaches tackling the FS process.

This paper is organized as follows: the second section gives an overview of the challenges related to the high dimensionality of data and the limits of classical FS algorithms. The third section introduces bio-inspired algorithms. The fourth section presents some modeling elements of bio-inspired algorithms for the FS problem, along with a brief state of the art of applied bio-inspired algorithms and samples of applied approaches. The fifth section suggests some future research directions. Finally, the last section concludes the paper.

Key Terms in this Chapter

Feature Selection: A feature engineering step that reduces the dimensionality of a problem by obtaining a subset of pertinent features, in order to improve the accuracy of a predictive model and make it more readable.

Classification: The process of assigning objects to previously defined classes (supervised) or to classes not defined in advance (unsupervised), according to defined attributes and using specific algorithms.

Filter Methods: A set of FS methods based on statistical measures, independent of the predictive algorithm and with no explicit interaction among features.

Swarm Intelligence Bio-Inspired Algorithms: A subclass of bio-inspired algorithms inspired by the intelligent swarm behaviours of living beings such as ant, bee, and bird colonies.

Wrapper Methods: A set of FS methods that interact with the predictive algorithm and thereby account for interactions among features.

Intensification Process: A metaheuristic process that searches thoroughly within a given area of the search space of an optimization problem.

Bio-Inspired Algorithms: Iterative and stochastic algorithms, called metaheuristics, inspired by living beings and intended to address hard problems.

Diversification Process: A metaheuristic process used to explore new areas of the search space of an optimization problem and to escape local optima.
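To make the intensification and diversification terms concrete, the sketch below applies them to a binary feature mask (1 = feature kept, 0 = feature dropped). It is a deliberately simple restart-based local search, not any particular bio-inspired algorithm from the chapter; the function names, the restart count, the seed, and the toy scoring function are assumptions made for illustration only.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def intensify(mask: Mask, score: Callable[[Mask], float]) -> Mask:
    """Intensification: refine a solution by accepting improving single-bit flips."""
    current = mask[:]
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            neighbour = current[:]
            neighbour[i] = 1 - neighbour[i]
            if score(neighbour) > score(current):
                current, improved = neighbour, True
    return current


def diversify(n_features: int, rng: random.Random) -> Mask:
    """Diversification: jump to a fresh random point of the search space."""
    return [rng.randint(0, 1) for _ in range(n_features)]


def search(n_features: int, score: Callable[[Mask], float],
           restarts: int = 5, seed: int = 0) -> Mask:
    """Alternate diversification (random restart) and intensification (local search)."""
    rng = random.Random(seed)
    best = intensify(diversify(n_features, rng), score)
    for _ in range(restarts - 1):
        candidate = intensify(diversify(n_features, rng), score)
        if score(candidate) > score(best):
            best = candidate
    return best


def toy_mask_score(mask: Mask) -> float:
    # Hypothetical score: features 0 and 3 are relevant; each kept feature costs 0.1.
    relevant = {0, 3}
    kept = {i for i, bit in enumerate(mask) if bit}
    return len(kept & relevant) - 0.1 * len(kept)


if __name__ == "__main__":
    print(search(5, toy_mask_score))
```

In a wrapper setting, the toy score would be replaced by the performance of a predictive model trained on the features selected by the mask, typically combined with a penalty on subset size.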
