A Mixture Model for Fruit Ripeness Identification in Deep Learning

DOI: 10.4018/978-1-6684-9999-3.ch016

Abstract

Visual object detection is a foundational task in computer vision. Because the size of visual objects in an image varies, the speed and accuracy of object detection are a focus of current research in the field. In this book chapter, the datasets consist of images of fruit at various stages of maturity. Each type of fruit is divided into the classes “ripe” and “overripe” according to the degree of skin folds, and an object detection model is then employed to classify fruit ripeness automatically. The YOLO family of models is representative of visual object detection algorithms. The authors make use of ConvNeXt and YOLOv7, both of which are convolutional neural networks (CNNs), to locate and detect fruits, respectively. YOLOv7 employs the bag-of-freebies training method, which enhances detection accuracy without increasing inference cost. YOLOv7 also introduces E-ELAN, an extension of the original ELAN module, which uses group convolution to improve visual feature extraction. In contrast, ConvNeXt builds on a standard convolutional architecture, with ResNet-50 serving as the baseline. The authors compare the two models; the best classification model achieves a precision of 98.9%.

Introduction

In the field of computer vision (Gowdra, 2021), digital cameras are utilized to emulate biological vision, enabling computers to process the contents of images or videos in a manner akin to human perception (Pan, & Yan, 2020). The object detection task (Qi, Nguyen, & Yan, 2022; Zhang, Wang, Liu, & Xiong, 2022) in computer vision primarily focuses on identifying visual objects within entire images, which involves both recognizing the object and determining its location (Zhao, & Yan, 2021). In the realm of visual object detection (Shi, Li, & Yamaguchi, 2020), models such as CNN (Liu, Yan, & Yang, 2018), R-CNN, Fast R-CNN, Faster R-CNN (Al-Sarayreh et al., 2019), and the YOLO series (Zhijun et al., 2021) have successfully located and classified (Liu, Nouaze, Touko Mbouembe, & Kim, 2020) fruit images (Gowdra et al., 2021). Building upon this foundation, the YOLOv7 model (Liu, & Yan, 2023) has improved the speed and accuracy of visual object detection (Yao et al., 2021).

In recent years, artificial intelligence has been widely employed in various fields (Wang, & Yan, 2021). Fruit picking suffers from a shortage of labor, and the subsequent fruit quality classification (Xia, Nguyen, & Yan, 2022) requires considerable manual work (Xia, Nguyen, & Yan, 2023). To address this, in this book chapter we propose an automatic fruit recognition algorithm based on the YOLOv7 and ConvNeXt (Tian, 2022) models. The aim is to build a deep learning model that can distinguish different fruit categories (apples and pears) (Bazame, 2021) and, within the same kind of fruit, distinguish the ripeness class according to the degree of skin folds (Kang, & Chen, 2020). The high-precision fruit (apple, pear) detection and recognition system (Wang, & Yan, 2021) based on deep learning can be used in daily life or in the wild to detect and locate fruit targets (Fu, Nguyen, & Yan, 2022). Using deep learning algorithms, it can perform fruit detection and recognition on pictures, videos, and camera streams. In addition, it supports visualizing results and exporting image or video inspection results (Bhargava, & Bansal, 2021).

Visual object detection combines localization and classification (Liu, Sun, Gu, & Deng, 2022). In a two-dimensional image, target detection can locate the position of an apple in the picture and identify its class as “ripe apple”. First, we preprocess the dataset and feed it into the backbone network to extract features, making use of the ELAN attention module. This module acts on the corresponding channels of the feature map to obtain effective features for fruit recognition; the model then performs feature fusion to obtain semantic information and location information from the feature maps. Finally, accurate detection results are obtained through classification and bounding-box regression (Gokhale, Chavan, & Sonawane, 2023).
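
To make the pairing of localization and classification concrete, the following is a minimal Python sketch of what a single detection result might look like and how a predicted box can be compared with a labelled one. It is not the chapter's implementation: the `Detection` type, the `keep_confident` helper, the 0.5 threshold, and the pixel coordinates are all hypothetical, and intersection over union (IoU) is used here only as a common way to measure how well a regressed box matches the ground truth.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One detection: where the fruit is (box) and what it is (label, score)."""
    box: Box
    label: str    # e.g. "ripe apple"; class names are illustrative
    score: float  # classification confidence in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_confident(dets: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections whose confidence is below an assumed threshold."""
    return [d for d in dets if d.score >= threshold]


if __name__ == "__main__":
    # Made-up predictions and ground truth, for illustration only.
    predictions = [
        Detection((48.0, 52.0, 148.0, 150.0), "ripe apple", 0.91),
        Detection((300.0, 40.0, 360.0, 95.0), "overripe apple", 0.22),
    ]
    ground_truth: Box = (50.0, 50.0, 150.0, 150.0)
    for det in keep_confident(predictions):
        print(det.label, round(det.score, 2), round(iou(det.box, ground_truth), 3))
```

In this sketch the box carries the localization result and the label and score carry the classification result; a real detector such as YOLOv7 would produce many candidate boxes per image and apply further post-processing before reporting them.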
