Channel attention concatenation multi-taper Fbank features for deep speaker verification
8 November 2024
Yifan He, Liyun Xu
Proceedings Volume 13416, Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024); 134162J (2024) https://doi.org/10.1117/12.3049499
Event: Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024), Qingdao, China
Abstract
Automatic speaker verification (ASV) has become a widely used application of deep learning. However, many early positive findings were rooted in traditional methods like Gaussian mixture models (GMM) rather than deep learning. While the multi-taper spectrum estimator has proven effective in enhancing GMM-based ASV accuracy, the integration of traditional multi-tapers with modern deep learning models may not be seamless. To address this, we introduce the Channel Attention Concatenation Multi-taper Fbank (CCM-Fbank), which seamlessly integrates multi-taper spectral estimation with the popular ECAPA-TDNN model, resulting in improved accuracy and robustness. Additionally, we propose a deeper model named Double Block ECAPA-TDNN, which has just over half the number of parameters of ECAPA-TDNN (C=1024), and performs better with limited training samples.
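The abstract gives no implementation details, so the snippet below is only an illustrative sketch of a conventional multi-taper log-Fbank front end using DPSS (Slepian) tapers, the standard multi-taper spectral estimator. The function name multitaper_fbank, the taper count, the uniform taper weighting, and all parameter defaults are assumptions made for illustration; the channel-attention concatenation that defines the authors' CCM-Fbank is not reproduced here.

import numpy as np
from scipy.signal.windows import dpss

def multitaper_fbank(signal, sr=16000, n_fft=512, hop=160, n_tapers=6, n_mels=80):
    """Illustrative multi-taper log-Fbank front end (hypothetical parameters).

    Replaces the usual single analysis window with an average of |STFT|^2
    over several DPSS tapers, which lowers the variance of the power
    spectrum estimate, then applies a mel filterbank and a log.
    """
    assert len(signal) >= n_fft, "need at least one full frame"
    # DPSS (Slepian) tapers; NW = (K + 1) / 2 is a common time-bandwidth choice.
    tapers = dpss(n_fft, NW=(n_tapers + 1) / 2, Kmax=n_tapers)    # (K, n_fft)

    # Frame the signal (no padding, for brevity).
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])

    # Multi-taper power spectrum: uniform average over tapers (an assumption;
    # weighted variants also appear in the multi-taper literature).
    spec = np.zeros((n_frames, n_fft // 2 + 1))
    for k in range(n_tapers):
        spec += np.abs(np.fft.rfft(frames * tapers[k], n=n_fft)) ** 2
    spec /= n_tapers

    # Standard HTK-style triangular mel filterbank.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    return np.log(spec @ fbank.T + 1e-10)                         # (T, n_mels)

# Example: 1 s of noise at 16 kHz as a stand-in for speech.
feats = multitaper_fbank(np.random.randn(16000))                  # shape (T, 80)

In the paper, features of this kind feed an ECAPA-TDNN (or the proposed Double Block ECAPA-TDNN) embedding extractor; the sketch stops at the feature matrix.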
© 2024 Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yifan He and Liyun Xu "Channel attention concatenation multi-taper Fbank features for deep speaker verification", Proc. SPIE 13416, Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024), 134162J (8 November 2024); https://doi.org/10.1117/12.3049499
KEYWORDS: Speaker recognition, Feature extraction, Signal to noise ratio, Neural networks