The Transformer is a new type of deep learning model that relies on the attention mechanism. This study investigates a way of identifying both the accent and the gender of the speaker in an audio recording with high accuracy and precision. The model was trained thoroughly on more than 1,000 hours of audio recordings to classify gender and accent. Multitask refers to the capability of a program to perform multiple tasks or processes concurrently; here, a single model performs both classifications. The resulting model correctly classifies not only the accent of the audio but also the gender.
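The multitask idea described above can be sketched as one shared attention-based encoder feeding two classification heads, trained on the sum of the two losses. This is a minimal illustration only, not the authors' architecture: the abstract does not specify the model, so the feature size, the three accent classes, the random weights, and the equal loss weighting are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames):
    """Scaled dot-product self-attention over frames, then mean pooling.

    Queries, keys, and values are the frames themselves (no learned
    projections), which keeps the sketch short.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    attended = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)]
        )
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]


def linear(weight_rows, x):
    """Bias-free linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]


def multitask_forward(frames, accent_w, gender_w):
    """Shared encoder output goes to two task-specific heads."""
    shared = attention_pool(frames)
    return softmax(linear(accent_w, shared)), softmax(linear(gender_w, shared))


def joint_loss(accent_probs, gender_probs, accent_label, gender_label):
    """Sum of the two cross-entropy losses (equal weighting is an assumption)."""
    return -math.log(accent_probs[accent_label]) - math.log(gender_probs[gender_label])


# Hypothetical sizes: 4-dim features, 5 frames, 3 accents, 2 genders.
rng = random.Random(0)
D, N_FRAMES, N_ACCENTS, N_GENDERS = 4, 5, 3, 2
frames = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N_FRAMES)]
accent_w = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N_ACCENTS)]
gender_w = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N_GENDERS)]

accent_probs, gender_probs = multitask_forward(frames, accent_w, gender_w)
loss = joint_loss(accent_probs, gender_probs, accent_label=1, gender_label=0)
```

Because both heads read the same pooled representation, minimizing the joint loss would push the shared encoder toward features useful for both tasks, which is the usual motivation for a multitask setup.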
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Wenpeng Xu, Yihong Yang, and Yixiang Ying
"Application of transformer on the multitask of identification of accent and gender toward an audio", Proc. SPIE 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 1306311 (19 February 2024); https://doi.org/10.1117/12.3021354