Feature Selection and Knapsack Problem Resolution Based on a Discrete Backtracking Optimization Algorithm

Khadoudja Ghanem, Abdesslem Layeb
Copyright © 2021 | Pages: 15
DOI: 10.4018/IJAEC.2021040101

Abstract

The backtracking search optimization algorithm is a recent stochastic global search algorithm for solving real-valued numerical optimization problems. In this paper, a binary version of the backtracking algorithm is proposed to deal with 0-1 optimization problems such as feature selection and knapsack problems. Feature selection is the process of selecting a subset of relevant features for use in model construction; irrelevant features can negatively impact model performance. The knapsack problem, in turn, is a well-known optimization problem used to assess discrete algorithms. The objective of this research is to evaluate the discrete version of the backtracking algorithm on the two mentioned problems and to compare the obtained results with other binary optimization algorithms using four usual classifiers: logistic regression, decision tree, random forest, and support vector machine. An empirical study on biological microarray data and experiments on 0-1 knapsack problems show the effectiveness of the binary algorithm and its ability to achieve good-quality solutions for both problems.

1. Introduction

In real-world situations, data is described by a set of features, but generally only some of them are needed to represent a situation well, and the features that are relevant and useful are often unknown in advance. Feature selection is one of the most fundamental problems in machine learning and has drawn increasing attention due to the high-dimensional data sets emerging from different fields and collected from millions of IoT devices and sensors. It is the process of automatically or manually selecting a subset of relevant features for use in model construction, and it hugely impacts the performance of a model: irrelevant or partially relevant features can degrade model performance.

Feature selection methods select a subset of M features from the complete set of N input features (M < N) such that the subset can predict the output with accuracy comparable to, or better than, that of the complete input set X, while greatly reducing the computational cost and data volume. Feature selection reduces model complexity and improves the accuracy of supervised learning. FS reduces dimensionality without much loss of the total information, and it also helps to make sense of the features and their importance.

When FS is used for classification, the problem consists in selecting a subset of features that maximizes accuracy using as few features as possible. When FS is used for regression, the problem consists in selecting a subset of features that minimizes prediction error, again with as few features as possible. A fitness function combining both objectives is sketched below.
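As a minimal sketch of such a wrapper-style objective, the function below scores a 0-1 feature mask by cross-validated accuracy, penalized by the fraction of features kept. The function name fs_fitness, the weighting parameter alpha, and the choice of scikit-learn's LogisticRegression as the wrapped classifier are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fs_fitness(mask, X, y, alpha=0.99):
    """Score a 0-1 feature mask: favor high accuracy and few features.

    alpha is a hypothetical weight between accuracy and subset size.
    """
    mask = np.asarray(mask)
    if not mask.any():                       # empty subsets are invalid
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    # Trade accuracy off against the fraction of features kept.
    return alpha * acc + (1.0 - alpha) * (1.0 - mask.sum() / mask.size)
```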

The knapsack problem (KP) is a well-known NP-hard problem, defined as follows: given a knapsack with maximum capacity C and a set of N objects, where each object i has a profit p_i and a weight w_i, select a subset of objects that maximizes the total profit without exceeding the maximum capacity of the knapsack.
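A straightforward way to evaluate a candidate 0-1 solution is shown below. Rejecting infeasible selections outright (a death-penalty scheme) is our assumption for illustration; penalty or repair operators are common alternatives.

```python
import numpy as np

def knapsack_value(x, profits, weights, capacity):
    """Total profit of the 0-1 selection x, or 0 if capacity is exceeded.

    x, profits, and weights are equal-length arrays; x[i] == 1 means
    object i is packed. Infeasible solutions get the worst value.
    """
    x = np.asarray(x)
    if weights @ x > capacity:      # total weight exceeds capacity C
        return 0.0
    return float(profits @ x)
```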

The backtracking search algorithm (BSA) (Civicioglu, 2013) is a recent evolutionary-computing-based global search algorithm designed as a global minimizer. BSA's bio-inspired philosophy is analogous to a social group of living creatures returning at random intervals to hunting areas that were previously found fruitful for obtaining nourishment. Using the Wilcoxon signed-rank test, BSA has been compared with several metaheuristics widely used in scientific applications: particle swarm optimization (PSO), covariance matrix adaptation evolution strategy (CMAES), artificial bee colony (ABC), adaptive differential evolution (JDE), comprehensive learning PSO (CLPSO), and self-adaptive differential evolution (SADE). The BSA results are statistically better than those of the other algorithms, and BSA is statistically identical to SADE in solving many problems (Civicioglu, 2013).

Contrary to many other optimization algorithms, the structure of BSA is quite simple: it has a single control parameter, and its problem-solving performance is not over-sensitive to the initial value of this parameter, so it is easily adapted to different numerical optimization problems. Moreover, BSA's strategies for generating trial populations and for controlling the amplitude of the search-direction matrix and the search-space boundaries give it powerful exploration and exploitation capabilities. For all these reasons, we investigate the use of this optimization method to find the best feature subset.
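A simplified sketch of BSA's trial-population generation is given below, following the steps published in Civicioglu (2013): the mutation direction is (old_pop - pop) scaled by F = 3*randn, and the mix_rate parameter (BSA's single control parameter) decides how many dimensions each trial inherits from the mutant. Only one of BSA's two crossover strategies is shown, and the function name bsa_trial and scalar bounds are our simplifications.

```python
import numpy as np

def bsa_trial(pop, old_pop, low, high, mix_rate=1.0, rng=np.random):
    """Generate a BSA trial population from current and historical ones."""
    n, d = pop.shape
    F = 3 * rng.standard_normal()             # search-direction amplitude
    mutant = pop + F * (old_pop - pop)        # BSA mutation
    # Crossover map: which dimensions take the mutant's value.
    cmap = np.zeros((n, d), dtype=bool)
    for i in range(n):
        k = max(1, int(np.ceil(mix_rate * rng.random() * d)))
        cmap[i, rng.permutation(d)[:k]] = True
    trial = np.where(cmap, mutant, pop)
    # Regenerate components that left the search space [low, high].
    out = (trial < low) | (trial > high)
    trial[out] = low + rng.random(out.sum()) * (high - low)
    return trial
```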

Unfortunately, BSA is a real-valued numerical optimization algorithm, whereas feature selection can be seen as a binary problem in which each feature is either included in the construction of the model (1) or not (0), much like the 0-1 knapsack problem. We therefore introduce in this paper a discrete backtracking search optimization algorithm to find good solutions for both the FS and KP problems.
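One common way to discretize a continuous metaheuristic, sketched below, is to pass each real-valued component through a sigmoid transfer function and threshold it against uniform noise, as in binary PSO. Whether the paper's discrete BSA uses this exact scheme is our assumption; the sketch only illustrates the general real-to-binary mapping.

```python
import numpy as np

def binarize(trial_real, rng=np.random):
    """Map a real-valued trial vector to a 0-1 vector.

    Sigmoid transfer function thresholded by uniform random numbers;
    an assumed, standard discretization scheme, not necessarily the
    one used in this paper.
    """
    prob = 1.0 / (1.0 + np.exp(-trial_real))   # S-shaped transfer to [0, 1]
    return (rng.random(trial_real.shape) < prob).astype(int)
```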

The aim of this paper is three-fold:
