Presentation + Paper
1 August 2021
Expanding smart assistant accessibility through dysarthria speech-trained transformer networks
Abstract
Smart assistant usage has increased significantly with the AI boom and the growth of IoT. Speech as an input modality brings a level of personalization to smart voice assistant products and applications; however, many smart assistants underperform when interpreting atypical speech. Dysarthric speech, heavy accents, and deaf and hard-of-hearing speech characteristics prove difficult for smart assistants to interpret, despite the large amounts of diverse data used to train automatic speech recognition (ASR) models. In this study, we explore the Transformer architecture as an ASR model for speech with medium to low intelligibility scores. We use a Transformer model pre-trained on the LibriSpeech dataset and fine-tuned on the TORGO dataset of atypical speech, as well as on a subset of the University of Memphis Speech Perception Assessment Laboratory’s (UMemphis SPAL) Deaf speech dataset. We also develop a methodology for performing automatic speech recognition with a Node.js application running on a Raspberry Pi 4, which functions as a pipeline between the user and a Google Home smart assistant device. The highest-performing Transformer model achieves a 20.2% character error rate, with a corresponding 29.0% word error rate, on a subset of medium-intelligibility audio samples from the UMemphis SPAL dataset. This study highlights the need for a large, transcribed dataset, which motivates a large atypical-speech data-gathering effort through a newly developed web application, My-Voice.
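The abstract reports results as character error rate (CER) and word error rate (WER) but does not describe how they were computed. Under the standard definition, each is the Levenshtein edit distance between the reference transcript and the recognized text, divided by the reference length (in characters or in words). The following is a minimal sketch of that definition, assuming whitespace tokenization and no text normalization; the function names and the example utterance are illustrative and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words (an assumption)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are single characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


# Illustrative utterance, not taken from any dataset in the paper.
reference = "turn on the kitchen light"
hypothesis = "turn on the kitchen lights"
print(f"WER = {wer(reference, hypothesis):.2f}")  # 1 wrong word out of 5
print(f"CER = {cer(reference, hypothesis):.2f}")  # 1 extra character out of 25
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which matches the 29.0% WER versus 20.2% CER pattern reported above. Both rates can exceed 100% when the hypothesis contains many insertions.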
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daniel W. Adams and Cory Merkel "Expanding smart assistant accessibility through dysarthria speech-trained transformer networks", Proc. SPIE 11843, Applications of Machine Learning 2021, 118430R (1 August 2021); https://doi.org/10.1117/12.2594212
KEYWORDS
Data modeling
Transformers
Prototyping
Statistical modeling
Computer programming
Speech recognition
Instrument modeling