Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability
Abstract
Although various deep keyword spotting (KWS) systems have demonstrated promising performance in relatively noise-free environments, accurate keyword detection in the presence of the strong noise encountered in daily life remains challenging. Room acoustics and noise conditions can be highly diverse, which can lead to drastic performance degradation if not handled carefully. In this paper, we aim to make deep KWS systems with small model sizes robust to environmental noise. We propose a noise management module (SE-SPP Net) that estimates both the denoised Mel spectrogram and the position of the speech utterance in the noisy input signal. The latter is estimated as the probability that a particular time-frequency (TF) bin contains speech. Furthermore, this comes at almost no additional cost in model size compared with a model that estimates the denoising mask. Our proposed SE-SPP Net with a KWD module can improve keyword spotting performance by up to 7% compared with a state-of-the-art (SOTA) model of similar size at an SNR of -10 dB.
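The abstract does not describe the SE-SPP Net architecture, so the following is only a minimal sketch, under our own assumptions, of why an extra speech presence probability (SPP) output can be nearly free in model size. If a network already ends in a per-TF-bin output layer that predicts a denoising mask, predicting the SPP as well only widens that layer by one output channel. The function names (`two_head_output`, `head_params`), the shared per-bin feature tensor, and the sigmoid mask applied to a linear Mel spectrogram are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Elementwise logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def two_head_output(noisy_mel, features, W, b):
    """Hypothetical shared output layer with a mask head and an SPP head.

    noisy_mel: (T, F) non-negative noisy Mel spectrogram.
    features:  (T, F, D) per-bin features from an assumed shared backbone.
    W, b:      (D, 2) weights and (2,) bias; channel 0 -> mask, channel 1 -> SPP.
    Returns the denoised Mel estimate and the per-bin speech presence probability.
    """
    logits = features @ W + b        # (T, F, 2): one logit pair per TF bin
    mask = sigmoid(logits[..., 0])   # denoising mask in (0, 1)
    spp = sigmoid(logits[..., 1])    # P(speech present) for each TF bin
    denoised = mask * noisy_mel      # masked (denoised) Mel spectrogram
    return denoised, spp


def head_params(D, n_outputs):
    """Parameter count of a linear head with D inputs and n_outputs outputs."""
    return D * n_outputs + n_outputs


# Toy usage with random stand-in data (not real audio).
rng = np.random.default_rng(0)
T, F, D = 50, 40, 16                 # frames, Mel bins, feature width (arbitrary)
noisy = rng.random((T, F))
feats = rng.standard_normal((T, F, D))
W = 0.1 * rng.standard_normal((D, 2))
b = np.zeros(2)

denoised, spp = two_head_output(noisy, feats, W, b)
# Extra parameters for the SPP head versus a mask-only head: D weights + 1 bias.
extra = head_params(D, 2) - head_params(D, 1)
print(denoised.shape, spp.shape, extra)  # → (50, 40) (50, 40) 17
```

In this toy setting the SPP head adds only `D + 1` parameters on top of a mask-only output layer, which illustrates how a second per-bin estimate can be obtained without meaningfully growing a small model.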
Authors: Chouchang Yang, Yashas Malur Saidutta, Rakshith Srinivasa, Chinghua Lee, Yilin Shen, Hongxia Jin
Published: Annual Conference of the International Speech Communication Association (INTERSPEECH)
Date: Aug 21, 2023