Paper
19 February 2024
Automatic speech recognition based on attention-enhanced blockformer
Wei Liu, Tianyu Zhan, Chunsheng Xu
Proceedings Volume 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023); 130630Q (2024) https://doi.org/10.1117/12.3021478
Event: Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 2023, Changchun, China
Abstract
The Blockformer speech recognition model was recently proposed as a state-of-the-art (SOTA) model on the Aishell-1 Chinese speech dataset. It achieves a significant improvement in character error rate (CER) over its baseline, Conformer. The key improvement of Blockformer is the addition of a Squeeze-and-Excitation (SE) block on top of Conformer, which enables better use of the information contained in each Conformer block. In studying Blockformer, we identified room to improve its block-information extraction method. To this end, we use an attention mechanism to enhance the SE block's efficacy in squeezing block information, and we adjust the model's structure in the attention inference mode to align it more closely with the training structure. Under the four inference modes, namely attention, attention rescoring, CTC greedy search, and CTC prefix beam search, the CER reaches 4.67%, 4.43%, 4.75%, and 4.75%, respectively. All of these results match or improve on those of Blockformer.
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Wei Liu, Tianyu Zhan, and Chunsheng Xu "Automatic speech recognition based on attention-enhanced blockformer", Proc. SPIE 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 130630Q (19 February 2024); https://doi.org/10.1117/12.3021478
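To make the fused-block idea concrete, below is a minimal PyTorch sketch of an attention-enhanced SE fusion as we read it from the abstract: the output of every Conformer block is squeezed by attention pooling over time (in place of the plain average pooling of a standard SE squeeze), and an SE-style excitation produces per-block weights for the final sum. All names (AttentionEnhancedSE, pool_query, reduction) and shapes here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionEnhancedSE(nn.Module):
    """Hypothetical sketch: fuse the outputs of N Conformer blocks with an
    SE block whose squeeze step is replaced by attention pooling."""

    def __init__(self, num_blocks: int, d_model: int, reduction: int = 4):
        super().__init__()
        # Learned query that attention-pools each block output over time,
        # standing in for the average pooling of the original SE squeeze.
        self.pool_query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        # Standard SE excitation: a bottleneck MLP yielding per-block weights.
        hidden = num_blocks * d_model // reduction
        self.excite = nn.Sequential(
            nn.Linear(num_blocks * d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_blocks),
            nn.Sigmoid(),
        )

    def forward(self, block_outputs):
        # block_outputs: list of N tensors, each (batch, time, d_model)
        pooled = []
        for h in block_outputs:
            q = self.pool_query.expand(h.size(0), -1, -1)
            # Attention-based squeeze: one query attends over the time axis.
            s, _ = self.attn(q, h, h)            # (batch, 1, d_model)
            pooled.append(s.squeeze(1))          # (batch, d_model)
        z = torch.cat(pooled, dim=-1)            # (batch, N * d_model)
        w = self.excite(z)                       # (batch, N) block weights
        stacked = torch.stack(block_outputs, 1)  # (batch, N, time, d_model)
        # Weighted sum of block outputs -> fused (batch, time, d_model)
        return (w[:, :, None, None] * stacked).sum(dim=1)
```

For instance, with 12 Conformer blocks of width 256, this module maps a list of twelve (batch, time, 256) tensors to one fused (batch, time, 256) representation; using a learned query for the squeeze lets the block weighting focus on informative frames rather than averaging all frames uniformly.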