1. Introduction
Stereo vision technology was first proposed by Dr. Marr of the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. He extracted three-dimensional information from two planar images related by disparity, thus laying the theoretical foundation for the development of stereo vision (Marr, 1982). Many stereo vision sensors have since been developed on this principle. Some use two cameras, some use three, and some combine two cameras with projected gratings. In every case the purpose is to calculate disparity images from planar images related by disparity and to provide depth data for the 3D reconstruction of the measured objects (Liu et al., 2015; Ho et al., 2017).
After a stereo vision sensor captures a pair of stereo images, a stereo matching algorithm is needed to generate the disparity image; this is the core step of stereo vision technology. To reconstruct the captured scene accurately, the disparity image must cover all pixels. Since the 1990s, obtaining dense disparity images for stereo vision sensors has been a hot topic in the field of stereo vision, and scholars have proposed a large number of stereo matching algorithms. By 2002, Daniel Scharstein and Richard Szeliski (2002) had summarized the existing stereo matching methods, constructed a well-known stereo vision test standard (the Middlebury stereo benchmark), and provided a general tool for evaluating the performance of stereo matching algorithms for dense disparity images.
In the work of Scharstein and Szeliski, stereo matching algorithms for dense disparity images are divided into four steps: initial matching cost calculation, cost aggregation, disparity calculation, and disparity optimization. Building on this work, researchers have concentrated on refining one or more of the four steps in order to improve the performance of stereo matching and obtain higher-quality dense disparity images.
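The four steps can be illustrated with a minimal sketch in plain Python. It is not the method of any cited paper. It assumes rectified grayscale images stored as lists of rows, an absolute gray-level difference as the initial cost, a box filter for aggregation, winner-take-all for disparity calculation, and a median filter standing in for disparity optimization. All function names and parameters (`max_disp`, `radius`) are hypothetical.

```python
def matching_cost(left, right, max_disp):
    """Step 1: absolute gray-level difference for every candidate disparity."""
    h, w = len(left), len(left[0])
    big = 255  # assumed penalty when the candidate falls outside the right image
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
              for d in range(max_disp + 1)]
             for x in range(w)]
            for y in range(h)]


def aggregate(cost, radius):
    """Step 2: average each disparity slice over a square window (box filter)."""
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    out = [[[0.0] * nd for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            n = len(ys) * len(xs)
            for d in range(nd):
                out[y][x][d] = sum(cost[j][i][d] for j in ys for i in xs) / n
    return out


def winner_take_all(cost):
    """Step 3: pick the disparity with the lowest aggregated cost per pixel."""
    return [[min(range(len(c)), key=c.__getitem__) for c in row] for row in cost]


def median_refine(disp, radius=1):
    """Step 4: median filter, a simple stand-in for disparity optimization."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            window = sorted(
                disp[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1)))
            out[y][x] = window[len(window) // 2]
    return out


def dense_disparity(left, right, max_disp, radius=1):
    """Run the four steps in order and return a disparity value for every pixel."""
    cost = matching_cost(left, right, max_disp)
    cost = aggregate(cost, radius)
    disp = winner_take_all(cost)
    return median_refine(disp, radius)
```

Each step is a separate function, so any one of them can be replaced (for example, by a different initial cost) without touching the others, which is how the later work described here typically improves on the framework.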
In the initial matching cost calculation step, early research computed the matching cost from pixel gray levels. However, this approach is very sensitive to the illumination conditions of the shooting environment, and matching accuracy drops markedly under radiometric distortion. To address this problem, Hirshmuller (2007) tested a large number of stereo matching algorithms and found that the Census transform is robust to differences in light intensity. As a result, the Census transform has gradually replaced pixel gray levels as the basis for computing the initial matching cost (Chang et al., 2010; Zhu et al., 2016). However, the Census transform depends heavily on the center pixel of its window: if the center pixel is disturbed by noise, the accuracy of the initial matching cost is reduced. Therefore, in recent years, many methods have been proposed to improve the Census transform by reducing its sensitivity to interference at the center pixel.
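Both properties of the Census transform can be seen in a small sketch. It assumes one common convention: each neighbor in a square window contributes a 1 bit when it is darker than the center pixel, and the matching cost is the Hamming distance between two such bit strings. The function names, the border clamping, and the 3x3 window are illustrative choices, not taken from the cited papers.

```python
def census(img, y, x, radius=1):
    """Census bit string for the window centered at (y, x).

    Each neighbor contributes 1 if it is darker than the center, else 0.
    """
    h, w = len(img), len(img[0])
    center = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            # Assumption: clamp coordinates so border pixels reuse the nearest row/column.
            j = min(max(y + dy, 0), h - 1)
            i = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | (1 if img[j][i] < center else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two census codes differ."""
    return bin(a ^ b).count("1")


def census_cost(left, right, y, x, d, radius=1):
    """Initial matching cost of left pixel (y, x) at disparity d; caller ensures x - d >= 0."""
    return hamming(census(left, y, x, radius), census(right, y, x - d, radius))


patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

# A gain and offset change in brightness preserves the ordering of gray levels,
# so the census code is unchanged.
brighter = [[2 * v + 15 for v in row] for row in patch]

# Noise on the center pixel alone changes half of the comparison results.
noisy = [row[:] for row in patch]
noisy[1][1] = 95

print(census(patch, 1, 1) == census(brighter, 1, 1))        # True
print(hamming(census(patch, 1, 1), census(noisy, 1, 1)))    # 4
```

Because every bit is a comparison against the same center value, a single corrupted center pixel can flip many bits at once, which is the weakness that the improved Census variants aim to reduce.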