Novel Bilinear Fusion Network Based on Multimodal Data for Student Distracted Behavior Recognition: BFNMD

Jian Zhang
Copyright: © 2023 | Pages: 14
DOI: 10.4018/JCIT.326131

Abstract

As governments, education departments, and academic accreditation bodies have begun encouraging schools to develop evidence-based decision-making and innovation systems, learning analysis techniques have shown clear advantages in decision support and teaching evaluation. By integrating algorithms and techniques from artificial intelligence and machine learning, learning analysis has achieved higher accuracy. To recognize students' classroom behaviors such as standing up, sitting up, and raising hands, and to improve recognition accuracy and recall, the experiments use multimodal data, including human key point information and RGB images. To strengthen the model's feature extraction capability, features are extracted with improved ResNet-50 and EfficientNet-B0 models and combined by bilinear fusion, which further improves recognition accuracy.
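The bilinear fusion step named in the abstract can be sketched as follows. This is a minimal illustration, not the article's implementation: the function name `bilinear_fuse`, the toy feature vectors, and the signed square root and L2 normalization steps (common in bilinear pooling, but not confirmed by this preview) are all assumptions. The two inputs stand in for pooled feature vectors from the two backbones.

```python
import math

def bilinear_fuse(feat_a, feat_b, eps=1e-12):
    """Fuse two feature vectors through their flattened outer product.

    feat_a and feat_b are hypothetical stand-ins for pooled features
    from two backbones (e.g. ResNet-50 and EfficientNet-B0).
    """
    # Outer product: one entry per pairwise interaction a_i * b_j.
    outer = [a * b for a in feat_a for b in feat_b]
    # Signed square root (assumed post-processing) compresses large values.
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in outer]
    # L2 normalization (assumed) so the fused vector has unit length.
    norm = math.sqrt(sum(v * v for v in signed))
    return [v / (norm + eps) for v in signed]

# Toy example: a 2-dimensional and a 3-dimensional feature vector.
fused = bilinear_fuse([1.0, 4.0], [9.0, 0.0, -1.0])
print(len(fused))  # the fused vector has 2 * 3 entries
```

The fused vector grows as the product of the two input dimensions, so practical systems usually reduce the backbone features before fusing them; whether the article does so is not stated in this preview.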

1. Introduction

The classroom is an important place for teachers to teach and for students to acquire knowledge. As society develops and places greater emphasis on student education, intelligent analysis of classroom teaching quality is becoming increasingly important. Using information technology to detect, process, and analyze students' classroom behavior can not only remind students to regulate their behavior, but also reflect how active the class is and help teachers improve their teaching methods (Wu et al. 2020; Luo et al. 2015).

At the same time, video recording and broadcasting technology has been developed to enable rapid and wide sharing of high-quality educational resources. A video recording and broadcasting system is an educational system that uses multimedia technology to shoot and record classroom teaching activities in real time and to broadcast them live or on demand over the Internet (Meng et al. 2013). A traditional system requires the teaching content to be filmed manually in real time: when the teacher moves around the classroom or writes on the blackboard, or when students stand up and sit down, an operator must steer the camera to track the moving target. The extra manpower makes shooting quality unstable and raises labor costs. In addition, camera operators who control the camera and move around the classroom may interrupt the teacher's train of thought or distract the students, which affects teaching quality to some extent.

The development of artificial intelligence, deep learning, and computer vision has greatly promoted the application of intelligent video recording and broadcasting systems, which overcome the shortcomings of manual operation and offer significant advantages in recognition performance and efficiency. Such a system only needs a camera installed in the classroom in advance: it detects students' behavior states using object detection and behavior recognition, then controls a gimbal camera to track students or shoot close-ups according to their state. The whole recording process requires no human participation, a major breakthrough in video recording and broadcasting technology (Novakovsky et al. 2023; Wang et al. 2022).

However, few academic papers address classroom behavior recognition, and existing methods fall mainly into machine learning and deep learning. Cheng et al. (2022) obtained data on the number of faces, contour features, and the range of the subject's actions, and used a Bayesian causal network to infer the subject's behavior characteristics and identify students' behaviors. Ahmad et al. (2008) extracted Zernike moment, optical flow, and global motion direction features of actions and combined them with a naive Bayes classifier to recognize students' behaviors. These two methods rely on traditional machine learning, which requires tedious manual feature extraction and yields low accuracy. Jones et al. (2011) extracted the students' target region with the background difference method and fed it into a VGG network, successfully identifying three kinds of classroom behavior: sleeping, playing with a mobile phone, and normal behavior.
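As a rough illustration of the background difference method mentioned above, the sketch below compares a frame against a static background image and returns the bounding box of the pixels that changed. The function name `foreground_box`, the threshold value, and the toy grayscale grids are assumptions for illustration only; the cited work's actual procedure is not described in this preview.

```python
def foreground_box(background, frame, threshold=30):
    """Return (top, left, bottom, right) of pixels that differ from the
    background by more than `threshold`, or None if nothing changed.

    Both images are lists of rows of grayscale intensities (0-255).
    """
    rows, cols = [], []
    for r, (bg_row, fr_row) in enumerate(zip(background, frame)):
        for c, (bg, fr) in enumerate(zip(bg_row, fr_row)):
            # A pixel counts as foreground when its absolute difference
            # from the background exceeds the (assumed) threshold.
            if abs(fr - bg) > threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

# Toy example: a 4x4 empty background and a frame with a bright region.
background = [[0] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = frame[2][2] = 200
print(foreground_box(background, frame))
```

The region inside the returned box is what would then be cropped and passed to a classifier.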
