Time-domain speech separation based on local parallel recursive networks and global fast attention
19 February 2024
Yan Yan, Kailin Zheng, Chunsheng Xu
Proceedings Volume 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023); 1306320 (2024) https://doi.org/10.1117/12.3021503
Event: Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 2023, Changchun, China
Abstract
Separating a mixed audio signal into independent speech signals is known as the speech separation problem, or the cocktail party problem. We propose a novel method that uses TasNets to separate mixed speech signals. Building on Dual-Path RNN, our innovation is to use parallel recurrent neural networks within the block sequence, with an additional recurrent neural network added. We use fast attention to achieve global attention, which aggregates context-aware information and allows parallel computation. The separation module uses the Swish activation function. For models of a similar size to GALR, our approach consistently outperforms GALR on the WSJ0-2mix benchmark task, with an absolute improvement of 0.7 dB in SI-SNRi. Experimental results demonstrate the effectiveness of our network compared with previous methods.
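The abstract names two quantities without defining them: the Swish activation and the SI-SNRi metric. As a reader aid, the sketch below uses the standard textbook definitions, which are assumed here and not taken from the paper: Swish as x * sigmoid(beta * x), and SI-SNRi as the SI-SNR of the separated estimate minus the SI-SNR of the unprocessed mixture, both measured against the clean reference. The function names and the `beta` and `eps` parameters are illustrative choices, not the authors' code.

```python
import math


def swish(x, beta=1.0):
    """Swish activation, x * sigmoid(beta * x) (standard definition, assumed)."""
    z = beta * x
    if z >= 0:
        return x / (1.0 + math.exp(-z))
    # For negative z, rewrite sigmoid to avoid overflow in exp(-z).
    ez = math.exp(z)
    return x * ez / (1.0 + ez)


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for equal-length signals."""
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    # Remove the mean from both signals.
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    scale = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [scale * r for r in ref]
    # Whatever is left after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((_dot(target, target) + eps) / (_dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


if __name__ == "__main__":
    # Toy signals, made up for illustration only.
    reference = [1.0, -1.0, 2.0, -2.0]
    interferer = [1.0, 1.0, -1.0, -1.0]
    mixture = [r + i for r, i in zip(reference, interferer)]
    estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]
    print(si_snr_improvement(estimate, mixture, reference))
```

Under these definitions, rescaling the estimate leaves its SI-SNR unchanged, which is why the metric suits separation systems whose output gain is arbitrary. The 0.7 dB figure reported in the abstract is a difference of this kind between two models' SI-SNRi scores.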
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yan Yan, Kailin Zheng, and Chunsheng Xu "Time-domain speech separation based on local parallel recursive networks and global fast attention", Proc. SPIE 13063, Fourth International Conference on Computer Vision and Data Mining (ICCVDM 2023), 1306320 (19 February 2024); https://doi.org/10.1117/12.3021503