19 February 2024
Application of transformer on the multitask of identification of accent and gender toward an audio
Wenpeng Xu, Yihong Yang, Yixiang Ying
Proceedings Volume 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023); 1306311 (2024) https://doi.org/10.1117/12.3021354
Event: Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 2023, Changchun, China
Abstract
The Transformer is a relatively new type of deep learning model that relies on the attention mechanism. This study investigated a way of identifying both the accent and the gender of the speaker in an audio recording with high accuracy and precision. The model was trained on more than 1000 hours of audio recordings to classify gender and accent. Multitask here refers to the capability of a program to perform multiple tasks or processes concurrently. The resulting model correctly classifies not only the accent of the audio but also the gender.
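The multitask idea in the abstract can be illustrated with a minimal sketch: one shared attention-based encoder feeds two separate classification heads, one for accent and one for gender. This is not the authors' architecture, which the abstract does not describe. The feature dimension, the class counts, the identity query/key/value projections, the mean pooling step, and the random untrained weights are all assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def self_attention(frames):
    """Scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input frames themselves
    (identity projections); a real Transformer learns these projections.
    """
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return out


def mean_pool(frames):
    """Average over time to get one fixed-size vector per recording."""
    n = len(frames)
    return [sum(f[j] for f in frames) / n for j in range(len(frames[0]))]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multitask_forward(frames, accent_head, gender_head):
    """Shared encoder output goes to two task-specific linear heads."""
    shared = mean_pool(self_attention(frames))
    accent_probs = softmax(matvec(accent_head, shared))
    gender_probs = softmax(matvec(gender_head, shared))
    return accent_probs, gender_probs


# Hypothetical sizes: 8-dim features, 5 accent classes, 2 gender classes.
FEATURE_DIM, NUM_ACCENTS, NUM_GENDERS = 8, 5, 2

rng = random.Random(0)
# Random vectors standing in for 20 frames of audio features.
frames = [[rng.gauss(0, 1) for _ in range(FEATURE_DIM)] for _ in range(20)]
accent_head = random_matrix(NUM_ACCENTS, FEATURE_DIM, rng)
gender_head = random_matrix(NUM_GENDERS, FEATURE_DIM, rng)

accent_probs, gender_probs = multitask_forward(frames, accent_head, gender_head)
```

Because the weights here are untrained, the two probability vectors carry no meaning; the sketch only shows how a single forward pass can yield both predictions. In a trained multitask model, the losses of the two heads would typically be combined so that the shared encoder learns features useful for both tasks.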
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Wenpeng Xu, Yihong Yang, and Yixiang Ying "Application of transformer on the multitask of identification of accent and gender toward an audio", Proc. SPIE 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 1306311 (19 February 2024); https://doi.org/10.1117/12.3021354