Introduction
With the development of the Internet and multimedia information technology, video has become one of the main carriers of modern information transmission. Efficiently finding the content that interests a user in a large collection of videos has become an important problem in the field of video retrieval (Yin et al. 2010; Yang & Meinel 2014). The traditional approach to video retrieval relies on database management systems and comes at the cost of a heavy manual annotation burden. Manual annotation is time-consuming, and manually assigned tags may be inaccurate. To solve these problems, content-based video retrieval (CBVR) was proposed (Hoi & Lyu 2008; Hu et al. 2011).
“Content” in this context may refer to colors, shapes, textures, or any other information that can be derived from the image itself (Patel et al. 2012; Dyana et al. 2010). CBVR analyzes the scenes, shots, and frames of the video data to extract video features. These include low-level features such as pixel (Ling et al. 2008), color (Guo et al. 2016), edge (Priya et al. 2012), and motion (Chen & Wu 2011), as well as high-level features such as semantics (Agharwal et al. 2016). To return the videos with the highest matching similarity and satisfy the user, the features of the user's input must be compared with those stored in a large-scale database (Wang et al. 2015).
A video stream contains a large amount of data and is unstructured, so it is difficult to use the whole video directly for retrieval. To achieve an effective structural analysis, we first perform shot boundary detection (Lu & Shi 2013) and then keyframe extraction (Yin et al. 2010). Shot boundary detection and keyframe extraction are therefore very important in CBVR technologies. In forensics, video retrieval technology can help criminal investigators locate the information they need efficiently and accurately.
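To make the idea of shot boundary detection concrete, the following is a minimal sketch under stated assumptions, not the method adopted in this paper. It assumes that each frame is given as a flattened sequence of grayscale values in the range 0 to 255, and it declares a boundary wherever the histogram difference between consecutive frames exceeds a threshold. The function names, the number of bins, and the threshold value are illustrative choices only.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image with values in 0..255.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    hist = [0.0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return every index i at which a new shot is assumed to start.

    The threshold is a hypothetical value; in practice it would have to be
    tuned to the video material.
    """
    boundaries: List[int] = []
    if not frames:
        return boundaries
    previous = gray_histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index], bins)
        if histogram_distance(previous, current) > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


# Synthetic example: three dark frames followed by three bright frames.
dark = [20] * 64
bright = [220] * 64
video = [dark, dark, dark, bright, bright, bright]
print(detect_shot_boundaries(video))
```

A fixed threshold of this kind responds to abrupt cuts; gradual transitions such as fades or dissolves would generally need a different criterion.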
Shot boundary detection is necessary for almost all video analysis, indexing, search, browsing, and content-based operations (Smeaton et al. 2010), and it has attracted most of the research effort in this area. Keyframe extraction is carried out after shot segmentation. A keyframe is a frame, or a small number of frames, that reflects the main content of a shot or scene, and it should be as representative of that content as possible (Chao et al. 2010). Using keyframes therefore greatly reduces the amount of data required for video retrieval and browsing, and provides a framework for organizing the processing of video content (Chakraborty et al. 2015). The required information can then be retrieved quickly through the keyframes, which improves efficiency. In criminal investigation in particular, retrieving by keyframe can save a great deal of valuable time.
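As an illustration of what "representative" can mean, the sketch below selects one keyframe per shot by choosing the frame whose intensity histogram is closest to the mean histogram of the shot. This is only one possible criterion and is not claimed to be any of the methods evaluated in this paper. The frame representation (flattened grayscale values in 0 to 255), the function names, and the assumption that the shot boundaries are already known are all illustrative.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image with values in 0..255.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    hist = [0.0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def select_keyframe(
    frames: Sequence[Frame], start: int, end: int, bins: int = 16
) -> int:
    """Return the index in [start, end) of the frame closest to the shot mean."""
    hists = [gray_histogram(frames[i], bins) for i in range(start, end)]
    mean = [sum(column) / len(hists) for column in zip(*hists)]

    def distance_to_mean(hist: Sequence[float]) -> float:
        return sum(abs(x - y) for x, y in zip(hist, mean))

    # min() keeps the earliest frame when several frames are equally close.
    best = min(range(len(hists)), key=lambda k: distance_to_mean(hists[k]))
    return start + best


def extract_keyframes(
    frames: Sequence[Frame], boundaries: Sequence[int]
) -> List[int]:
    """Return one keyframe index per shot.

    `boundaries` lists the indices at which a new shot starts (hypothetical
    output of a preceding shot boundary detection step).
    """
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(frames)]
    return [select_keyframe(frames, s, e) for s, e in zip(starts, ends) if e > s]


# Synthetic example: a three-frame shot followed by a two-frame shot.
shot_a = [[10] * 64, [10] * 32 + [20] * 32, [20] * 64]
shot_b = [[200] * 64, [200] * 64]
print(extract_keyframes(shot_a + shot_b, boundaries=[3]))
```

Replacing every frame of a shot with a single index in this way is what reduces the amount of data that retrieval and browsing have to handle.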
The rest of this paper is organized as follows. Section 2 introduces three common methods of shot boundary detection, tests and analyzes them experimentally, and, after comparison, selects the method used in this paper. Section 3 describes four keyframe extraction methods and four evaluation standards. Through experimental analysis, we summarize the performance of the keyframe extraction methods under the different evaluation criteria, and finally propose that different types of video should adopt different keyframe extraction methods. Section 4 concludes the work.