1. Introduction
In real-world applications, the feature set is generally large in terms of dimensionality. The features may be noisy and may contain irrelevant or redundant information about the target concept, which can degrade classifier performance. In addition, a large feature set increases storage cost and requires more computation time to process. Feature selection is therefore crucial: it selects an “optimized” subset of features from the original feature set according to a certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. Although there has been a great deal of work in different areas to address this issue (Guyon & Elisseeff, 2003), these results have not been fully explored or exploited in emerging computer vision applications. Only recently has there been increased interest in applying feature selection to applications such as face detection (Sun, Bebis & Miller, 2003; Viola & Jones, 2001), face recognition (Shen & Ji, 2009; Kanan & Faez, 2008a, 2008b), auditory brainstem response tracking (Acir, Ozdamar & Guzelis, 2005), gene expression data (Chuang et al., 2008), etc.
Sequential forward selection (SFS) and sequential backward selection (SBS) are two well-known feature selection methods. SFS starts with an empty feature set and iteratively adds the feature that most benefits performance. In contrast, SBS starts with the full feature set and, at each step, drops the feature whose absence causes the least decrease in classification performance. Combining SFS and SBS gives birth to the “plus l-take away r” feature selection method (Stearns, 1976), which first enlarges the feature subset by adding l features using SFS and then deletes r features using SBS. Sequential forward floating search (SFFS) and sequential backward floating search (SBFS) (Pudil, Novovicova & Kittler, 1994) are generalizations of Stearns’s method in which l and r are determined dynamically rather than fixed in advance. All of these methods make greedy decisions based on a single feature at a time; hence, they cannot be expected to find globally optimal solutions. Another well-known feature selection method is the Relief algorithm (Kira & Rendell, 1992) and its extensions (Wiskott et al., 1994). Features are ranked according to their hypothesis margin, and features with a large hypothesis margin are selected.
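As a concrete illustration, the greedy SFS loop described above can be sketched as follows. This is a minimal sketch, not the method of any cited paper: the function names (`sfs`, `toy_score`) and the toy objective with hand-picked relevance weights are hypothetical stand-ins for a real objective such as a classifier's cross-validated accuracy.

```python
# Sketch of sequential forward selection (SFS). The objective
# `score` is a stand-in for a real evaluation criterion (e.g. a
# classifier's accuracy on the candidate feature subset).

def sfs(n_features, score, k):
    """Greedily grow a subset of size k, adding at each step the
    feature whose inclusion maximizes score(subset)."""
    selected = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        best_f, best_s = None, float("-inf")
        for f in sorted(remaining):  # sorted for deterministic tie-breaking
            s = score(selected + [f])
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Hypothetical toy objective: features 0 and 3 are most relevant;
# features 3 and 4 are redundant with each other, so selecting both
# is penalized -- mimicking the redundancy problem described above.
relevance = [0.9, 0.1, 0.2, 0.8, 0.8]

def toy_score(subset):
    s = sum(relevance[f] for f in subset)
    if 3 in subset and 4 in subset:  # redundancy penalty
        s -= 0.7
    return s

print(sfs(5, toy_score, 2))  # → [0, 3]
```

SBS is the mirror image: start from the full set and repeatedly remove the feature whose removal hurts `score` least. Because each step commits to one feature based only on the current subset, neither variant can undo an early choice that turns out to be suboptimal later, which is exactly why floating methods such as SFFS were proposed.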