1. Introduction
Multimedia technology has been widely used in intelligent sports. Computer vision and video analysis systems offer higher accuracy and better real-time performance than the human eye: they can quickly capture moving objects and record their motion data (Li et al., 2021; Xiao et al., 2020). As an emerging intelligent analysis technology, video analysis can automatically analyze image sequences and judge video content without human intervention, achieving fully automatic target detection, tracking, recognition, judgment, recording, and emergency handling (Liu et al., 2020; Jiao et al., 2020; Gao et al., 2022) instead of relying on traditional feature selection (Zheng et al., 2018; Zheng et al., 2021) or feature extraction (Zhu et al., 2022) techniques.
To effectively realize volleyball trajectory estimation and analysis, Yamato et al. (1992) adopted the motion, color, texture, and other features of two-dimensional small-area blocks to identify different kinds of balls. Lipton (1998) utilized spatial subtraction to detect and track moving objects in a real video stream. To further improve detection efficiency, Rowley et al. (2006) used information from the optical flow field of moving objects: each pixel was represented by its optical flow, flow vectors with consistent motion were grouped into blocks, and the resulting features were modeled by a multivariate Gaussian mixture. Although these methods achieved good results, their recognition accuracy remains limited.
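As a rough illustration of subtraction-based motion detection, the sketch below thresholds the absolute difference between two consecutive grayscale frames and returns the bounding box of the changed pixels. It is a minimal example under stated assumptions, not the procedure of any cited work: the function names `moving_mask` and `bounding_box`, the threshold value, and the toy frames are all hypothetical.

```python
def moving_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold` (assumed value)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the marked pixels, or None if none changed."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x5 grayscale frames: a bright "ball" moves one pixel to the right.
prev_frame = [
    [0, 0, 0, 0, 0],
    [0, 200, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
curr_frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

mask = moving_mask(prev_frame, curr_frame)
box = bounding_box(mask)  # covers both the old and the new ball position
```

Note that simple differencing marks both the position the object left and the position it entered, which is one reason such methods struggle with precise localization.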
Recently, deep learning-based object detection methods have achieved great success in volleyball trajectory estimation and analysis (Zhao et al., 2019; Wu et al., 2020). Because the ball moves fast, its apparent size changes rapidly, which leads to low detection accuracy. One of the major challenges in volleyball detection is therefore small object detection. To alleviate this difficulty, some works exploit multi-scale features. Tan et al. (2020) designed a weighted bi-directional feature pyramid network to fuse multi-scale features and, by exploiting a compound scaling method, proposed a family of efficient object detectors. Zhao et al. (2019) designed a multi-level feature pyramid network to address scale variation; by exploiting feature fusion modules and thinned U-shape modules, this model can effectively detect objects at different scales.
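Weighted fusion of multi-scale features can be sketched as a normalized weighted sum, in which each input feature map receives a non-negative learnable weight. The example below is a simplified illustration, assuming the feature maps have already been resized to a common resolution and flattened to lists; the function name `fuse_features` and the value of `eps` are hypothetical choices for this sketch.

```python
def fuse_features(features, weights, eps=1e-4):
    """Fuse same-sized feature vectors as sum(w_i * f_i) / (eps + sum(w_j)).

    Weights are clamped to be non-negative so that each normalized weight
    lies between 0 and 1.
    """
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + eps
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(clamped, features)) / total
        for k in range(length)
    ]


# Two feature vectors from different pyramid levels (toy values).
coarse = [1.0, 2.0]
fine = [3.0, 4.0]

equal = fuse_features([coarse, fine], [1.0, 1.0])      # close to the plain average
suppressed = fuse_features([coarse, fine], [1.0, -5.0])  # negative weight is clamped to 0
```

With equal weights the result approximates the element-wise mean; a weight driven below zero simply removes that input from the fusion.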
Figure 1. Overview of object detection and tracking
Although multi-scale feature information can improve recognition accuracy, it increases the computational complexity of the whole model. To this end, Tian et al. (2019) used an anchor-free mechanism to construct a one-stage object detection network. Reducing the number of predefined anchor boxes effectively avoids the computation associated with them. Moreover, small targets occupy only a small area of the image, so sparsely selecting multi-scale features from just a few regions is becoming a popular way to reduce computational complexity in small object detection.
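The anchor-free idea can be illustrated with per-location regression targets: instead of matching predefined anchor boxes, every feature-map location that falls inside a ground-truth box regresses its distances to the four box sides. The following is a minimal sketch of that target assignment, not the full detector; the function names `ltrb_targets` and `decode`, the stride of 8, and the toy box are assumptions made for illustration.

```python
def ltrb_targets(points, box):
    """Assign (left, top, right, bottom) distances to points strictly inside `box`.

    `box` is (x0, y0, x1, y1). Points outside the box are treated as
    negatives and get no regression target.
    """
    x0, y0, x1, y1 = box
    targets = {}
    for x, y in points:
        ltrb = (x - x0, y - y0, x1 - x, y1 - y)
        if min(ltrb) > 0:
            targets[(x, y)] = ltrb
    return targets


def decode(point, ltrb):
    """Recover the box (x0, y0, x1, y1) from a location and its four distances."""
    x, y = point
    left, top, right, bottom = ltrb
    return (x - left, y - top, x + right, y + bottom)


# Locations of a 5x5 feature map mapped back to image coordinates (assumed stride 8).
points = [(x, y) for y in range(4, 40, 8) for x in range(4, 40, 8)]
ball_box = (10, 10, 30, 26)  # hypothetical ground-truth box

targets = ltrb_targets(points, ball_box)
recovered = [decode(p, t) for p, t in targets.items()]
```

Every positive location decodes back to the same ground-truth box, and only the handful of locations inside the small box contribute targets, which reflects why small objects leave most of the image unused.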