Semisupervised Surveillance Video Character Extraction and Recognition With Attentional Learning Multiframe Fusion

Guiyan Cai, Liang Qu, Yongdong Li, Guoan Cheng, Xin Lu, Yiqi Wang, Fengqin Yao, Shengke Wang
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJDCF.315745

Abstract

Character extraction in video is very helpful for understanding video content, especially for artificially superimposed characters such as the time and place stamps in surveillance video. However, the performance of existing algorithms does not meet application needs. Therefore, the authors propose semisupervised surveillance video character extraction and recognition with attentional learning multiframe feature fusion. First, a multiframe fusion strategy based on an attention mechanism is adopted to solve the missing-target problem, and the Dense ASPP network is introduced to address the multiscale character problem. Second, a character image denoising algorithm based on semisupervised fuzzy C-means clustering is proposed to isolate and extract clean binary character images. Finally, for video characters that may involve privacy, traditional and deep learning-based video restoration algorithms are used for character elimination.
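As context for the denoising step described above, the following is a minimal sketch of fuzzy C-means binarization of a grayscale character image. The function name, the two-cluster setup, and the brighter-cluster-is-text heuristic are illustrative assumptions, not the authors' implementation; the paper's semisupervised variant would additionally constrain the memberships of a few labeled pixels, which is not shown here.

```python
import numpy as np

def fcm_binarize(img, n_iter=50, m=2.0, eps=1e-8):
    """Cluster grayscale pixel intensities into two fuzzy clusters
    (character vs. background) and return a binary character mask.

    Plain FCM sketch; the semisupervised variant in the paper would
    additionally pin the memberships of labeled pixels (not shown).
    """
    x = img.reshape(-1, 1).astype(np.float64)     # pixels as 1-D features
    c = np.array([[x.min()], [x.max()]])          # init centers: dark/bright
    for _ in range(n_iter):
        d = np.abs(x - c.T) + eps                 # (N, 2) pixel-center distances
        w = d ** (-2.0 / (m - 1.0))
        u = w / w.sum(axis=1, keepdims=True)      # fuzzy membership degrees
        um = u ** m
        c = (um.T @ x) / um.sum(axis=0)[:, None]  # membership-weighted centers
    labels = u.argmax(axis=1)                     # defuzzify per pixel
    char_cluster = int(c.ravel().argmax())        # assume text is the brighter cluster
    return (labels == char_cluster).reshape(img.shape).astype(np.uint8)
```

Here img would be a cropped grayscale region around detected characters; in the semisupervised setting, seed pixels known to be text or background would steer the memberships toward a cleaner binary result.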

Introduction

With the progress of society and the development of technology, we have entered the era of intelligence and digitization. The emergence of abundant smart devices such as smartphones and computers has made the ways people obtain information more convenient and diverse. Moreover, compared with traditional media, people are more willing to get information through intuitive new media such as pictures and videos; thus, vast amounts of video and picture data are generated daily.

Although new media bring much convenience, the generation of massive video data also raises urgent problems: how to efficiently index such a large volume of videos on the internet, retrieve important content, and block unhealthy content. Because the technology for directly converting images to semantics is not yet mature, it is difficult to understand video content directly through image recognition, and solving the problem manually is labor-intensive, so we need a way to accurately describe and understand the content information in scene videos.

Generally, videos contain abundant textual character information. Based on the form in which the characters appear, we can classify them into natural scene characters and artificially superimposed characters. These characters often describe and supplement the content of the video scene and can help us understand the video, especially the artificially superimposed characters. For example, the time and location information in surveillance videos directly indicates the time and location of the video scene, and the pop-ups in live videos usually contain information highly relevant to the scene content. Therefore, extracting and recognizing characters is necessary for understanding video content.

Traditional optical character recognition (OCR) technology is mature, but it only recognizes characters in simple contexts such as text documents and struggles with characters in complex video scenes. Video scenes with complex, variable character shapes and colors and varying backgrounds bring new challenges to character extraction and recognition. In recent years, studies have focused primarily on character extraction and recognition, video restoration, and related aspects. For character extraction, traditional algorithms were based on character texture features (Chen et al., 2004) or on connected-component ideas (Epshtein et al., 2010); more recent methods can be broadly classified into regression-based, segmentation-based, and linking-based categories. Meanwhile, research on character recognition has flourished with the rapid development of neural networks; the two main directions are recognition based on connectionist temporal classification (CTC) and recognition based on the attention mechanism. On the other hand, progress in video restoration is closely related to the effectiveness of character extraction and recognition in videos. Early video restoration tasks mainly split videos into single-frame images for restoration, which could not make full use of temporal information. With the advent of deep learning, video restoration algorithms can be divided into two categories: those based on optical flow and those based on the attention mechanism.
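To make the CTC direction concrete, here is a minimal greedy (best-path) CTC decoding sketch; it is not the authors' recognizer, and the blank index, charset mapping, and function name are assumptions for illustration.

```python
import numpy as np

BLANK = 0  # conventional index of the CTC blank token (an assumption here)

def ctc_greedy_decode(logits, charset):
    """Decode per-timestep scores from a CTC-trained recognition network.

    logits:  (T, C) array, one row of class scores per time step.
    charset: sequence mapping non-blank class indices 1..C-1 to characters.
    """
    best = logits.argmax(axis=1)       # most likely class at each time step
    chars, prev = [], BLANK
    for k in best:
        if k != prev and k != BLANK:   # collapse repeated labels, drop blanks
            chars.append(charset[k - 1])
        prev = k
    return "".join(chars)
```

The blank token lets a CTC recognizer emit "no character" between strokes and separate repeated letters; attention-based decoders instead predict one character per decoding step by attending over the image features.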

However, since most current algorithms focus only on extracting natural scene characters from single-frame images, relatively little research has addressed extracting characters from videos. As a result, these algorithms perform poorly when used to extract and recognize artificially superimposed characters in videos, and their recognition results still contain errors. Although artificially superimposed characters are more standardized than natural scene characters and are rarely affected by perspective transformation and similar problems, and thus seem less difficult to extract and recognize, they are more seriously affected by the background. Moreover, video backgrounds are complex and changeable, and artificially superimposed characters come in many types, colors, and shapes, so extracting and recognizing them in video remains challenging.
