1. Introduction
Nowadays, images have penetrated all aspects of our lives and are considered an ideal medium for transmitting information. In particular, with the development of the Internet, the number of images continues to grow, and through multimedia technology the image has become an important medium for transmitting modern information. Its growth rate makes the traditional management method of manual labeling increasingly impractical, so it is essential to implement algorithms that replace human effort. In this sense, several applications have been proposed in the fields of image processing and artificial vision (Cevikalp & Triggs, 2017; Druzhkov & Kustikova, 2016; Zerdoumi et al., 2018), for example the detection and tracking of objects, the classification of images, etc. For now, this area of research focuses primarily on the understanding, description, and detection of the objects that make up an image. Thus, detecting and identifying objects in an image requires an accurate description using an appropriate descriptor. For decades, researchers have focused their efforts on the problems of detection, feature extraction, and classification of images (Perronnin, Sánchez, & Mensink, 2010; Zhu, Wang, Mao, & Yang, 2017). Detection and feature extraction (Chen, Li, Peng, & Wong, 2015; Ahmed, Irtaza, & Iqbal, 2017) are often combined to solve common problems in the field of computer vision, such as object detection and recognition, content-based image retrieval, face detection and recognition, and image classification. Methods of classifying images can be summarized in two categories:
- The first uses key points, for example Canny (Canny, 1986), Harris (Harris & Stephens, 1988), and Features from Accelerated Segment Test (FAST) (Rosten & Drummond, 2006).
- The second uses descriptors to describe images, such as Local Binary Pattern (LBP) (Ojala, Pietikäinen, & Harwood, 1996), Scale-Invariant Feature Transform (SIFT) (Lowe, 1999), Speeded Up Robust Features (SURF) (Bay, Tuytelaars, & Van Gool, 2006), and Accelerated KAZE (A-KAZE) (Alcantarilla, Nuevo, & Bartoli, 2013).
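To give a concrete flavor of the key-point family, the Harris detector scores each pixel by the response R = det(M) − k·tr(M)², where M is the windowed second-moment matrix of the image gradients; corners yield large positive R, edges negative R, and flat regions R ≈ 0. The following NumPy sketch is a minimal illustration only (the 3×3 box window and k = 0.04 are common textbook choices, not parameters taken from this paper):

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel,
    with M the win x win windowed second-moment matrix of gradients."""
    Iy, Ix = np.gradient(img.astype(float))   # vertical / horizontal gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box_sum(a):
        # sum over a win x win neighbourhood (simple box window)
        pad = win // 2
        p = np.pad(a, pad, mode="edge")
        out = np.zeros_like(a)
        for dy in range(win):
            for dx in range(win):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    Sxx, Syy, Sxy = box_sum(Ixx), box_sum(Iyy), box_sum(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

# toy image: a white square on a black background; its four corners
# should dominate the response map
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

On this toy image, R is strongly positive at the square's corners (e.g. pixel (5, 5)), negative along its edges, and exactly zero in flat regions, which is the behavior the corner/edge/flat classification relies on.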
On the other hand, image classification is a challenging problem due to the complexity and variety of images (Hsu, Chen, & Huang, 2015). Our aim in this paper is to improve the image classification process by combining visual and spatial information. Image classification rests on two very important components: descriptors and classifiers. Descriptors are feature vectors that represent an image. One of the most efficient ways to represent an image is the Bag of Features (BoF) model (Csurka, Dance, Fan, Willamowski, & Bray, 2004), which has become popular in image classification. This model extracts local features and quantizes them into discrete 'visual words'; a histogram over these words is then associated with the image. The model was first introduced for text analysis and later extended to represent images by the frequency of visual words (features), typically obtained with the K-means clustering algorithm. However, describing images with BoF alone is limited because it does not preserve the spatial information of objects in the image. To overcome this problem, we suggest using Spatial Pyramid Matching (SPM) (Lazebnik, Schmid, & Ponce, 2006). This method is very effective at capturing the spatial information of objects: by pooling BoF histograms over increasingly fine image subdivisions, the pyramid of histograms retains the coarse spatial layout that a single global histogram discards. Features can be detected and descriptors extracted using feature extractor algorithms (for example, SIFT, SURF, etc.).
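The BoF pipeline described above can be sketched in a few lines: cluster a pool of training descriptors into a vocabulary of visual words with K-means, then describe each image by the normalized frequency of the words its descriptors fall into. The sketch below uses synthetic random vectors in place of real SIFT/SURF descriptors, and the helper names (`kmeans`, `bof_histogram`), the vocabulary size of 16, and the 8-D descriptors are illustrative assumptions, not choices made in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Bare-bones K-means: returns k cluster centres (the visual vocabulary)."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned descriptors
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def bof_histogram(descriptors, vocabulary):
    """Quantise local descriptors to their nearest visual word and count."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()   # normalise so images of any size are comparable

# toy data: 200 fake 8-D "descriptors" pooled from a training set
train_desc = rng.normal(size=(200, 8))
vocab = kmeans(train_desc, k=16)

# describe one "image" (50 local descriptors) by its visual-word frequencies
img_desc = rng.normal(size=(50, 8))
h = bof_histogram(img_desc, vocab)
```

Note that the histogram `h` records only *which* words occur, not *where*; SPM addresses exactly this by computing such a histogram separately in each cell of a spatial pyramid and concatenating the results.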