I. Introduction
The volume of multimedia information has grown continuously since the beginning of the information era. An immediate challenge arising from this information explosion is how to manage and use multimedia databases intelligently. In the course of the technological development of multimedia information retrieval, various approaches have been proposed with the ultimate goal of enabling semantic-based search and browsing. Among the intensively explored topics, content-based image retrieval (CBIR), born at the crossroads of computer vision, machine learning and database technologies, has been studied for more than a decade, yet it remains difficult (Smeulders, Worring, Santini, Gupta, & Jain, 2001; Datta, Joshi, Li, & Wang, 2008). In a nutshell, content-based approaches to image retrieval rely primarily on pictorial information, a.k.a. low-level visual features such as color, texture, shape and layout, which can be extracted automatically from images and used to measure similarity. The essential challenge is that low-level visual features that accurately characterize the semantic meaning of images are difficult to discover. As a result, semantically relevant images may lie far apart in the feature space, a problem referred to as the semantic gap. To reduce the semantic gap, human knowledge has been used to refine the representation of the semantic meaning of a user's query. To this end, relevance feedback (RF), a technique originally proposed for traditional document retrieval, was adapted to image retrieval (Crucianu, Ferecatu, & Boujemaa, 2004; Zhou & Huang, 2003). A common aspect of most RF techniques is that the knowledge learned during one query session is not carried forward to future retrievals; such techniques can therefore be regarded as short-term relevance feedback (STRF).
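To make the idea of relevance feedback over low-level features concrete, the following is a minimal sketch of a single feedback round. It is an illustration under stated assumptions, not the method of any of the works cited above: it uses a Rocchio-style query update (a classic formulation from document retrieval), cosine similarity, and toy three-bin "color histograms". The image names, feature values and the weights alpha, beta and gamma are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors, dim):
    """Component-wise mean; a zero vector if no samples were given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant samples and away from irrelevant ones.

    The weights are illustrative defaults, not values taken from the article.
    """
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(irrelevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank(query, database):
    """Return image names sorted by decreasing similarity to the query."""
    return sorted(database, key=lambda name: cosine(query, database[name]),
                  reverse=True)


# Hypothetical database: each image is a 3-bin (red, green, blue) histogram.
database = {
    "sunset_1": [0.8, 0.1, 0.1],
    "sunset_2": [0.6, 0.3, 0.1],
    "forest_1": [0.1, 0.8, 0.1],
    "ocean_1": [0.1, 0.2, 0.7],
}

# The user wants sunsets, but the query features happen to be greenish,
# so a forest image ranks first: a small instance of the semantic gap.
query = [0.3, 0.5, 0.2]
before = rank(query, database)

# One feedback round: the user marks one relevant and one irrelevant image.
refined = rocchio(query,
                  relevant=[database["sunset_1"]],
                  irrelevant=[database["forest_1"]])
after = rank(refined, database)

print("before feedback:", before)
print("after feedback: ", after)
```

In this sketch the refined query is simply discarded when the session ends, which mirrors the short-term character of STRF described above: nothing learned from the user's labels is stored for later queries.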
STRF techniques alleviate the semantic gap by incorporating users' knowledge into the labeling of training samples, yet they still suffer from sample sparseness, because average users are normally willing to label only a few relevant and irrelevant images. In addition, because irrelevant images may differ from the relevant ones in many different ways, the training samples of the two categories are likely to be imbalanced in the context of STRF. Together with the real-time requirements of a practical search engine, these problems can be regarded as the major factors behind the performance bottleneck.