Vehicle Detection and Distance Estimation Using Improved YOLOv7 Model

DOI: 10.4018/979-8-3693-1738-9.ch009

Abstract

In this book chapter, the authors propose a low-cost distance estimation approach that produces more accurate 3D predictions for vehicle detection and ranging using inexpensive monocular cameras. The distance estimation model takes YOLOv7 as its fundamental architecture and integrates a convolutional block attention module (CBAM) and a transformer, as well as an extended prediction vector, to improve high-level semantic understanding and enhance feature extraction ability. This integration significantly improves detection and ranging performance, offering a more suitable and cost-effective solution for distance estimation.
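For readers unfamiliar with CBAM, the following is a minimal PyTorch sketch of a convolutional block attention module of the kind the chapter integrates into YOLOv7. The reduction ratio, kernel size, and placement in the network are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal CBAM sketch: channel attention followed by spatial attention.
# Hyperparameters (reduction=16, kernel_size=7) are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both the average-pooled and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-wise average and max maps, concatenated and convolved
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al., 2018)."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.channel_attention = ChannelAttention(channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial_attention(self.channel_attention(x))


if __name__ == "__main__":
    # Example: refine a 256-channel feature map from a detection backbone.
    feats = torch.randn(1, 256, 40, 40)
    refined = CBAM(256)(feats)
    print(refined.shape)  # torch.Size([1, 256, 40, 40])
```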

Introduction

Traffic scene understanding is an important component of autonomous driving. To strengthen the safety of autonomous vehicles, research has focused on improving their spatial perception capabilities, that is, their ability to understand and interpret the surrounding environment (Ignatious, 2023; Liu et al., 2019). Self-driving cars with strong spatial perception can accurately judge the distance between themselves and surrounding vehicles, thereby maintaining a safe following distance and avoiding scrapes and collisions with other vehicles and obstacles (Liu, 2019; Sarker, 2021; Guo, 2021; Zhang, 2020; Hu, 2020).

In recent years, as artificial intelligence has become increasingly popular, many researchers have adopted machine learning techniques for scene understanding tasks. Among these tasks, distance estimation is one of the main routes to giving vehicles spatial perception (Zalevsky et al., 2021). Some distance estimation approaches use lasers to measure the distance between two vehicles, but they suffer from high cost and limited effectiveness. Other studies rely on hardware devices such as ultrasonic, infrared, and microwave radar sensors to estimate distance (Aliew, 2022; Özcan et al., 2020). These devices are sensitive to interference and struggle to distinguish between two detection targets that are very close together, which leads to unreliable estimates.

To avoid these shortcomings, some researchers have abandoned hardware-based ranging altogether and instead use cameras to capture images of the vehicle scene, inferring distance information through a vehicle detection task (Mehtab et al., 2021). This approach fundamentally addresses the problems of high hardware cost and impractical deployment. However, estimating distance from 2D information brings its own challenges.

Deep learning has achieved satisfactory results in distance estimation. Deep learning models can reach higher accuracy than traditional machine learning models and learn complex data features automatically, reducing the need for manual feature engineering. They can also handle large amounts of data and be trained on distributed computing systems, making them scalable to large datasets. After training, testing, and validation, deep learning has shown strong learning ability: because its networks are deeper and wider, they can represent a richer family of functions and solve more complex problems. Moreover, as the amount of training data grows and the parameters are tuned appropriately, the performance of deep networks continues to improve.

Deep-learning-based models adjust their network parameters by learning from a large amount of ground-truth information during training, and thereby achieve accurate distance estimation. Current deep-learning distance estimation tasks fall into two types: one trains the network on information captured by a monocular camera, and the other obtains information from binocular (stereo) cameras. Monocular ranging identifies pedestrians, vehicles, and other targets in the scene through image matching and then estimates distance from the size of the target in the image. Binocular ranging directly measures the distance to the object ahead by computing the parallax (disparity) between the two images.
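Both principles reduce to simple pinhole-camera relations. The following Python sketch illustrates them; the focal length, object height, stereo baseline, and disparity values are illustrative assumptions and are not taken from the chapter's experiments.

```python
# Pinhole-camera ranging: monocular (apparent size) vs. binocular (disparity).
# All numeric values below are made up for illustration only.

def monocular_distance(focal_px: float, real_height_m: float, bbox_height_px: float) -> float:
    """Estimate distance from the apparent size of a detected object.

    By similar triangles: distance = focal_length * real_height / image_height.
    """
    return focal_px * real_height_m / bbox_height_px


def binocular_distance(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Estimate distance from the disparity between two rectified stereo views.

    depth = focal_length * baseline / disparity.
    """
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # A car (~1.5 m tall) whose bounding box is 60 px high, focal length 1400 px.
    print(monocular_distance(1400.0, 1.5, 60.0))   # -> 35.0 m
    # The same point seen with a 0.54 m stereo baseline and 21.6 px disparity.
    print(binocular_distance(1400.0, 0.54, 21.6))  # -> 35.0 m
```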
