Sotiropoulos et al. (2008) explore the use of objective audio-signal features to model the individualized (subjective) perception of similarity between music files. They present MUSIPER, a content-based music retrieval system that constructs music-similarity perception models of its users by associating different music similarity measures with different users. Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms. Nam et al. (2015) present a two-stage learning model to effectively predict multiple labels from music audio. Yu et al. (2019) propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks, one for the audio modality and one for the text modality (lyrics); a pretrained Doc2Vec model followed by fully connected layers is used to represent the lyrics. Ghosal and Kolekar (2018) propose an approach to music-genre recognition using an ensemble of convolutional long short-term memory neural networks (CNN-LSTM) and a transfer-learning model. The neural network models are trained on a diverse set of spectral and rhythmic features, whereas the transfer-learning model was originally trained on the task of music tagging.
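The similarity measures underlying a system like MUSIPER compare tracks by their extracted audio features. As a minimal sketch (the feature vectors and track names here are hypothetical, not taken from any of the cited systems), content-based retrieval can rank candidate tracks by cosine similarity to a query track's feature vector:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two audio feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors (e.g. averaged spectral descriptors per track).
track_a = [0.12, 0.85, 0.33, 0.47]
track_b = [0.10, 0.80, 0.30, 0.50]
track_c = [0.90, 0.05, 0.70, 0.01]

# Rank candidate tracks by similarity to the query, track_a.
candidates = {"track_b": track_b, "track_c": track_c}
ranked = sorted(candidates,
                key=lambda k: cosine_similarity(track_a, candidates[k]),
                reverse=True)
print(ranked[0])  # -> track_b (its features lie closest to the query's)
```

Systems such as MUSIPER go beyond a single fixed measure by learning, per user, which combination of feature-based measures best matches that user's similarity judgments.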
Xie et al. (2018) propose a CNN-based hard-hat detection algorithm, in which deep-learning models are trained to detect construction workers and to identify whether hard hats are being worn properly. Based on the characteristics of knowledge expression of construction procedural constraints in Chinese regulations, Zhong et al. (2020) explore a hybrid deep neural network combining a bidirectional LSTM and a CRF for the automatic extraction of qualitative construction procedural constraints. The implementation results demonstrate the good performance of this end-to-end deep neural network in extracting construction procedural constraints. Singh et al. (2020) study CBIR-CNN, a deep learning–based model for content-based image retrieval (CBIR) on celebrity data using a deep convolutional neural network; for classification, a model with four convolutional layers is proposed.
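In a BiLSTM-CRF architecture like the one Zhong et al. (2020) use, the BiLSTM produces per-token emission scores and the CRF layer adds tag-transition scores; decoding then finds the highest-scoring tag sequence with the Viterbi algorithm. A minimal sketch of that decoding step follows, with toy hand-set scores and a hypothetical two-tag scheme (a real model learns these values):

```python
def viterbi(emissions, transitions, tags):
    """Most likely tag sequence given per-token emission scores
    (from a BiLSTM) and tag-transition scores (from a CRF layer)."""
    score = {t: emissions[0][t] for t in tags}  # best path score ending in tag t
    back = []                                   # backpointers per position
    for i in range(1, len(emissions)):
        prev, score, ptr = score, {}, {}
        for t in tags:
            best = max(tags, key=lambda p: prev[p] + transitions[(p, t)])
            score[t] = prev[best] + transitions[(best, t)] + emissions[i][t]
            ptr[t] = best
        back.append(ptr)
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: score[t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ["O", "CONSTRAINT"]
# Toy emission scores for a three-token sentence (hypothetical values).
emissions = [{"O": 2.0, "CONSTRAINT": 0.5},
             {"O": 0.3, "CONSTRAINT": 1.8},
             {"O": 0.4, "CONSTRAINT": 1.6}]
# Transition scores favouring label continuity.
transitions = {("O", "O"): 0.5, ("O", "CONSTRAINT"): 0.0,
               ("CONSTRAINT", "CONSTRAINT"): 0.5, ("CONSTRAINT", "O"): 0.0}
print(viterbi(emissions, transitions, tags))
# -> ['O', 'CONSTRAINT', 'CONSTRAINT']
```

The CRF's transition scores are what distinguish this from per-token softmax classification: they let the model prefer globally coherent tag sequences, which matters for multi-token constraint spans.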
Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Manco et al. (2021) propose to address music description via audio captioning, defined as the task of generating a natural language description of music audio content in a human-like manner. To study the application of deep learning to music-genre recognition, Xu (2022) proposes a feature-parameter extraction and recognition-classification method for ethnic music genres based on a deep belief network (DBN), with five kinds of ethnic musical instruments as the experimental objects. A DBN with a softmax output layer performs best at identifying and classifying the national musical instruments, reaching an accuracy of 99.2%. Deep CNN models have achieved good results in the image and voice domains. Miao and Cheng (2023) study the construction of a multimodal automatic music-annotation model based on a neural network algorithm.
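In the DBN-based classifier Xu (2022) describes, the final step is a softmax layer that turns the top-layer activations into class probabilities. As a minimal sketch (the instrument names and logit values below are illustrative assumptions, not from the paper):

```python
import math

def softmax(logits):
    """Softmax over top-layer activations of a classifier (e.g. a DBN)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical top-layer scores for five ethnic-instrument classes.
instruments = ["guzheng", "erhu", "pipa", "dizi", "suona"]
logits = [1.2, 3.5, 0.3, 0.8, 2.1]

probs = softmax(logits)
predicted = instruments[probs.index(max(probs))]
print(predicted)  # -> erhu (the class with the largest logit)
```

The softmax itself is generic; what the DBN contributes is the stack of pretrained layers that produces discriminative logits from the raw audio features.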