Separating a mixed audio signal into independent speech signals is known as the speech separation problem, or the cocktail party problem. We propose a novel TasNet-based method for separating mixed speech signals. In the spirit of Dual-Path RNN, our main innovation is the use of parallel recurrent neural networks within the block sequence, together with an additional recurrent neural network. We use fast attention to obtain global attention, which aggregates context-aware information and allows parallel computation. The separation module uses the Swish activation function. With a model of similar size to GALR, our approach consistently outperforms GALR on the WSJ0-2mix benchmark, with an absolute improvement of 0.7 dB in SI-SNRi. Experimental results demonstrate the effectiveness of our network compared with previous methods.
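For readers unfamiliar with the two quantities named above, the following is a minimal sketch of the Swish activation and the SI-SNRi metric using their standard textbook definitions. It is not the paper's implementation: the function names, the `beta` and `eps` parameters, and the toy sine-wave signals are assumptions made for illustration only.

```python
import math


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    Illustrative only; not hardened against overflow for very negative x.
    """
    return x / (1.0 + math.exp(-beta * x))


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (standard definition)."""
    n = len(reference)
    # Remove the mean from both signals.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything left over after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy example (hypothetical signals, not WSJ0-2mix data).
reference = [math.sin(0.1 * i) for i in range(200)]
interferer = [math.sin(0.37 * i + 1.0) for i in range(200)]
mixture = [r + v for r, v in zip(reference, interferer)]
estimate = [r + 0.1 * v for r, v in zip(reference, interferer)]
improvement_db = si_snr_improvement(estimate, mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.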