Small-Footprint Keyword Spotting for Controlling Smart Home Appliances Using TCN and CRNN Models

Hemalatha Alapati, Christopher Paolini, Suchismita Chinara, Mahasweta Sarkar
DOI: 10.4018/IJITN.299365

Abstract

Smart homes feature automatic fire and smoke detection, voice-operated appliances, and similar conveniences. Appliances such as lights and fans are often controlled through voice commands. Voice assistants such as Alexa, Siri, and Google Assistant already execute voice commands, but they depend on an internet connection, which costs time and bandwidth. Controlling home appliances needs only concise commands built around keywords such as on and off, so spending internet bandwidth on this task is wasteful. In this paper, models based on Temporal Convolutional Networks (TCN) and Convolutional Recurrent Neural Networks (CRNN) are studied for Keyword Spotting (KWS) by training them on keywords pronounced in different accents. The performance of the models is compared, and their ability to detect unknown words is examined. Finally, the paper discusses how suitable these models are for building smart home assistants that control home utilities with minimal bandwidth consumption.

Literature Survey

Fadhil et al. (2020) define a smart home simply as a set of electronic devices that communicate with each other and that the owner can control with minimal effort. They also discuss use cases and applications of smart home systems, which cover many aspects of daily life and reduce the cost of living by controlling and managing home appliances.

Davis et al. (2020) review known vulnerability studies of smart home devices, conduct a vulnerability analysis of IoT devices, and compare the security postures of well-known and lesser-known vendors. They found that prominent vendors have stronger security postures than lesser-known vendors.

Warden et al. (2018) designed an audio dataset of spoken words to help train and evaluate keyword spotting systems, and discussed why this task requires a specialized dataset different from conventional datasets. To measure the Top-One error, a model runs inference on each audio clip, and its top predicted class is compared against the ground-truth label encoded in the clip's sub-folder name. The proportion of incorrect predictions (one minus the proportion of correct ones) gives the Top-One error, which served as the evaluation metric.
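
The evaluation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset's or the authors' evaluation code: the function names, the clip paths, and the example predictions are hypothetical, and the label is assumed to be the name of the folder that directly contains the clip.

```python
from pathlib import PurePosixPath


def label_from_path(clip_path):
    """Return the ground-truth label, i.e. the name of the clip's parent folder."""
    return PurePosixPath(clip_path).parent.name


def top_one_error(predictions):
    """Fraction of clips whose top predicted class differs from the folder label.

    `predictions` maps a clip path such as "yes/clip_001.wav" to the
    single class the model ranked highest for that clip.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    wrong = sum(
        1 for path, predicted in predictions.items()
        if label_from_path(path) != predicted
    )
    return wrong / len(predictions)


# Hypothetical model outputs for four clips; one of them is misclassified.
example = {
    "yes/clip_001.wav": "yes",
    "no/clip_002.wav": "yes",
    "on/clip_003.wav": "on",
    "off/clip_004.wav": "off",
}
error = top_one_error(example)   # 1 wrong out of 4 clips
accuracy = 1.0 - error
```

Reporting the error and the accuracy side by side makes it explicit that the two are complements, which avoids confusing the proportion of correct predictions with the error itself.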

Bai et al. (2018) concluded that convolutional networks should be regarded as a natural starting point and a powerful toolkit for sequence modelling tasks. They evaluated a generic convolutional architecture, the TCN, against recurrent architectures on various metrics. The results suggested that the TCN convincingly outperforms recurrent architectures across a broad range of sequence modelling tasks.
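
The core building block of a TCN is the dilated causal convolution, in which each output sample depends only on the current and earlier input samples. The sketch below is a minimal pure-Python illustration of that operation, not the architecture evaluated in this paper: the function names are hypothetical, the filter weights are arbitrary, and the receptive-field helper assumes one convolution per level, whereas a full TCN adds residual blocks and learned weights.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Compute y[t] = sum_i weights[i] * x[t - dilation * i].

    Samples before the start of the sequence are treated as zero, so the
    output has the same length as the input and never looks at the future.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(weights):
            j = t - dilation * i
            if j >= 0:
                total += w * x[j]
        y.append(total)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output sample after stacking
    one causal convolution per entry in `dilations` (an assumption of
    this sketch)."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
out = causal_dilated_conv1d(signal, weights=[1.0, 1.0], dilation=2)
# out[t] = signal[t] + signal[t - 2], treating samples before t = 0 as zero
span = receptive_field(kernel_size=2, dilations=[1, 2, 4])
```

Doubling the dilation at each level lets the receptive field grow quickly with depth while the number of weights per layer stays fixed, which is one reason such models are attractive for small-footprint keyword spotting.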
