Generative Adversarial Networks for Speech Processing

Student thesis: Master thesis (including HD thesis)

  • Daniel Michelsanti
Deep learning approaches have gained popularity in a variety of fields, such as computer vision, speech processing, and natural language processing, owing to their impressive performance and flexibility. Among them, a new framework for estimating deep generative models has recently been proposed: the generative adversarial network. This framework has already shown good performance in various image processing and computer vision tasks, but its adoption for speech-related tasks is still limited. In this project we explore some of the possibilities that adversarial training can offer for speech processing. In particular, two applications are considered: speech enhancement and automatic speech generation. For speech enhancement, experimental results show that the adopted approach overall outperforms the classical short-time spectral amplitude minimum mean square error method and is comparable to a deep neural network-based technique. For automatic speech generation, the results indicate that our models are able to generate plausible spectrograms, even though some artefacts can be heard in the reconstructed signals. We provide generated samples for a subjective evaluation of the quality.
Publication date: 2017
ID: 259347156