1. Introduction
Content-Based Image Retrieval (CBIR) is considered the mainstay of image retrieval systems. Due to the complexity of multimedia content, image understanding and semantic searching have become difficult yet interesting problems in the field. Text-based image retrieval systems require manually annotated information provided by text descriptors, and the considerable human labor this demands becomes the bottleneck of these approaches (Ying et al., 2007). Meanwhile, the storage of large-scale image databases itself poses problems nowadays, as a traditional centralized database may not meet the needs of searching over datasets at this scale.
Cloud computing enables users to flexibly access reconfigurable computing resources without the burden of managing and maintaining them (Peng et al., 2012). Its essential characteristics include reliable and virtually unlimited storage capacity, data access independent of location and time, and dynamic resource provisioning in a multi-tenant way that avoids costly waste (Dikaiakos et al., 2009). Owing to these features, cloud computing has been widely adopted by distributed storage and retrieval systems.
By employing the peer-to-peer (P2P) paradigm, flexible and reliable frameworks for data storage and querying can be provided, which allows CBIR systems to be designed over large databases. Typical CBIR systems are constructed in similar ways: the system extracts features from images, images containing similar features are considered related, and methods are provided to decide whether two images are similar. In a system employing the P2P paradigm, the distributed storage layer provides a way to store images separately while maintaining search performance. Following these designs, the LRFIR system was built to provide CBIR functions on distributed systems (Liao et al., 2014). Unlike Approximate Nearest Neighbor (ANN) algorithms (Ozan et al., 2016), LRFIR provides algorithms that find exact results rather than a ranked list over the whole dataset, which suits P2P searches, but its performance leaves room for improvement.
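The distinction between exact-result retrieval and ANN-style ranked retrieval can be illustrated with a minimal sketch. The feature vectors, image names, and threshold below are hypothetical and chosen only for illustration; they do not come from LRFIR itself.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_retrieve(query, database, threshold):
    """Return only images whose feature distance to the query falls
    below the threshold (exact results, in the spirit of LRFIR),
    rather than ranking the whole dataset as ANN methods do."""
    return [name for name, feat in database.items()
            if euclidean(query, feat) <= threshold]

# Hypothetical toy database: image name -> extracted feature vector.
db = {
    "sign_a": [0.9, 0.1, 0.0],
    "sign_b": [0.8, 0.2, 0.1],
    "tree":   [0.1, 0.7, 0.9],
}
print(exact_retrieve([0.85, 0.15, 0.05], db, threshold=0.2))
# → ['sign_a', 'sign_b']
```

Note that the threshold-based form returns a (possibly empty) exact answer set, which is easier to evaluate independently at each peer than a global ranking.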
In real systems, the user of a retrieval system may pose different forms of search demands, and sometimes those demands are related hierarchically. For example, a user may wish to find either other traffic signs or exactly the same sign given in a query. Users may also judge similarity by different benchmarks rather than by a single feature distance; for example, when retrieving images from sketches (Peng et al., 2017), color-sensitive features may lead to worse performance than texture or shape features.
These two kinds of semantic mismatch are described as gradient mismatch and objective mismatch. In both situations, the cause of the mismatch can be traced to the features used to represent the image. Different image descriptors, such as the Multi-Texton Histogram (MTH) (Liu et al., 2010) and Speeded-Up Robust Features (SURF) (Bay et al., 2006), have different capabilities for measuring images along different dimensions. For instance, when finding similar images in a clothing dataset, texture and color information may each represent a different category of similarity, yet users may only pay attention to one of them. Hence the problem is to find a combination method that achieves a high recall rate by using the precise features while maintaining accuracy by reducing the interference of imprecise features.
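One simple way to frame such a combination method is a weighted sum of per-feature distances, where the weight on an imprecise feature is reduced so it interferes less with the final ranking. The sketch below is illustrative only; the feature names, values, and weights are assumptions, not the combination scheme actually used in the paper.

```python
def combined_distance(query_feats, image_feats, weights):
    """Weighted sum of per-feature distances.  Down-weighting an
    imprecise feature (e.g. color when the user cares about texture)
    reduces its interference with the overall similarity score."""
    total = 0.0
    for name, w in weights.items():
        q, f = query_feats[name], image_feats[name]
        d = sum(abs(x - y) for x, y in zip(q, f))  # L1 distance per feature
        total += w * d
    return total

# Hypothetical features: two images similar in texture, different in color.
query = {"texture": [0.2, 0.8], "color": [1.0, 0.0]}
img   = {"texture": [0.2, 0.7], "color": [0.0, 1.0]}

# A texture-oriented weighting suppresses the large color mismatch,
# so the combined distance stays small.
print(combined_distance(query, img, {"texture": 0.9, "color": 0.1}))
```

With equal weights the color mismatch would dominate and push the pair apart, which is exactly the interference the weighting is meant to suppress.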
To overcome this problem, the retrieval system would ideally understand high-level semantic information rather than relying on low-level image features alone, which is very difficult to realize. The semantic gap between high-level perception and low-level features is hard to cross, so a different way of dealing with it is proposed: let the retrieval system imitate, rather than understand, the choices a user would make by following high-level semantic information, which makes the system behave more like a human.