Leveraging Self-Supervised Speech Representations for Domain Adaptation in Speech Enhancement
Abstract
Deep learning-based speech enhancement (SE) approaches can suffer from performance degradation due to a mismatch between training and testing environments. A realistic situation is that an SE model trained on parallel noisy-clean utterances from one environment, the source domain, may fail to perform adequately in another environment, the target (new) domain, with unseen acoustic or noise conditions. Although target-domain performance can be improved by leveraging paired data from that domain, in practice noisy data alone is far easier to collect. It is therefore worth studying unsupervised domain adaptation techniques for SE that utilize only noisy data from the target domain, while exploiting the knowledge available from the paired source-domain data, to improve SE in the new domain. In this paper, we present a novel adaptation framework for SE that leverages self-supervised learning (SSL) based speech models. SSL models are pre-trained on large amounts of raw speech data to extract representations rich in phonetic and acoustic information. We explore the potential of leveraging SSL representations for effective SE adaptation to new domains. To our knowledge, this is the first attempt to apply SSL models to domain adaptation in SE.
Authors: Ching-Hua Lee, Chouchang Yang, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Jaejin Cho, Yilin Shen, Hongxia Jin
Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date: Apr 14, 2024
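
The abstract describes the idea only at a high level. As a purely illustrative sketch, not the authors' method (the paper itself specifies the actual framework), the snippet below shows one generic way a frozen SSL model such as wav2vec 2.0 (loaded here via torchaudio) could guide unsupervised SE adaptation: pull SSL features of enhanced target-domain speech toward those of clean source-domain speech, with no paired target-domain data required. The SE network, loss, and training step are hypothetical placeholders.

    # Illustrative sketch only -- NOT the method from the paper.
    # Idea: use a frozen SSL model as a feature extractor and align SSL
    # statistics of enhanced target-domain speech with those of clean
    # source-domain speech. All names below are placeholders.
    import torch
    import torchaudio

    # Pre-trained SSL model (wav2vec 2.0 base), frozen during adaptation.
    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    ssl_model = bundle.get_model().eval()
    for p in ssl_model.parameters():
        p.requires_grad_(False)

    def ssl_features(wav: torch.Tensor) -> torch.Tensor:
        """Return a mid-layer SSL representation: (batch, frames, dim)."""
        feats, _ = ssl_model.extract_features(wav, num_layers=6)
        return feats[-1]

    # Stand-in for any waveform-in/waveform-out SE network that was
    # pre-trained on the source domain (hypothetical placeholder).
    se_model = torch.nn.Sequential(torch.nn.Conv1d(1, 1, 9, padding=4))
    optimizer = torch.optim.Adam(se_model.parameters(), lr=1e-4)

    def adaptation_step(noisy_target: torch.Tensor,
                        clean_source: torch.Tensor) -> float:
        """One unsupervised adaptation step.

        noisy_target: (batch, samples) noisy speech from the new domain.
        clean_source: (batch, samples) clean source-domain speech, used
                      only as a reference for "clean" SSL statistics;
                      no noisy-clean pairing is assumed.
        """
        enhanced = se_model(noisy_target.unsqueeze(1)).squeeze(1)
        # Crude first-moment alignment: match the batch-level mean SSL
        # embedding of enhanced speech to that of clean speech.
        loss = torch.nn.functional.mse_loss(
            ssl_features(enhanced).mean(dim=(0, 1)),
            ssl_features(clean_source).mean(dim=(0, 1)),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example usage with 1-second dummy signals at the bundle's rate (16 kHz):
    noisy = torch.randn(2, bundle.sample_rate)
    clean = torch.randn(2, bundle.sample_rate)
    print(adaptation_step(noisy, clean))

Freezing the SSL model means gradients flow through it into the SE network without altering the representation space; mean-embedding matching is just one simple choice of feature-space objective, and richer distribution-matching or layer-weighting schemes could be substituted.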