A master's thesis from Aalborg University


Speech Coding using Deep Neural Networks and the Information Bottleneck Principle

Term

4th semester

Publication year

2019

Pages

104

Abstract

In this project, the possibility of using Deep Neural Networks (DNNs) and the Information Bottleneck (IB) principle to perform speech coding is explored. An end-to-end strategy using DNNs in the form of autoencoders is developed, and the DNNs are trained using both synthetic data and speech files from the TIMIT database. Signals are encoded using a b-bit scalar quantizer employed internally in the DNNs, and the bit rate is easily controlled by the parameters of the quantizer, among other factors. It was found that the developed speech autoencoders trained with the Mean Squared Error (MSE) as the objective function did not outperform the BroadVoice32 (BV32) codec in terms of bit rate and Perceptual Evaluation of Speech Quality (PESQ) score at the same time. The DNNs outperformed the BV32 codec in terms of PESQ scores for bit rates of 5 bits per sample or higher. By exploring the marginal entropies, it was possible to achieve an average PESQ score of 4.46, with a standard deviation of 0.03, for the DNN speech autoencoders while using a bit rate less than half that of standard 16-bit Pulse Code Modulation (PCM) encoding. A loss function combining the MSE and the marginal entropies was proposed, inspired by the IB principle. However, it was not possible to find weights for its terms that made the loss function suitable for training DNN speech autoencoders.
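The abstract mentions two ways of measuring rate (a fixed b-bit scalar quantizer and the marginal entropy of the quantized code) and a loss that combines MSE with that entropy. The sketch below illustrates these quantities only; it is not the thesis's implementation. It assumes a uniform quantizer on [-1, 1] with midpoint reconstruction, one code value per input sample at TIMIT's 16 kHz sampling rate, a single code stream (a multi-dimensional code would sum the per-dimension marginal entropies), and a hypothetical trade-off weight `beta`. All function names are illustrative. It evaluates the quantities but does not show how gradients are passed through the quantizer during training.

```python
import math
from collections import Counter
from typing import List, Sequence


def quantize(x: Sequence[float], b: int) -> List[int]:
    """Hypothetical b-bit uniform scalar quantizer: map [-1, 1] to indices 0 .. 2**b - 1."""
    levels = 2 ** b
    step = 2.0 / levels  # width of each quantization cell on [-1, 1]
    indices = []
    for v in x:
        v = min(max(v, -1.0), 1.0)  # clip to the assumed input range
        i = int((v + 1.0) / step)
        indices.append(min(i, levels - 1))  # v == 1.0 falls into the top cell
    return indices


def dequantize(indices: Sequence[int], b: int) -> List[float]:
    """Reconstruct each index as the midpoint of its quantization cell."""
    step = 2.0 / (2 ** b)
    return [-1.0 + (i + 0.5) * step for i in indices]


def marginal_entropy(indices: Sequence[int]) -> float:
    """Empirical entropy of the indices in bits/sample (ideal entropy-coder rate)."""
    n = len(indices)
    counts = Counter(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bit_rate_kbps(bits_per_sample: float, fs_hz: int = 16_000) -> float:
    """Bit rate in kbit/s, assuming one code value per input sample."""
    return bits_per_sample * fs_hz / 1000.0


def ib_style_loss(x, x_hat, indices, beta: float) -> float:
    """Distortion (MSE) plus beta times the rate term (marginal entropy)."""
    return mse(x, x_hat) + beta * marginal_entropy(indices)


x = [-0.9, -0.3, 0.2, 0.8]
idx = quantize(x, b=2)        # [0, 1, 2, 3]
x_hat = dequantize(idx, b=2)  # [-0.75, -0.25, 0.25, 0.75]
print(mse(x, x_hat), marginal_entropy(idx), bit_rate_kbps(16))
```

With a fixed b-bit quantizer every sample costs b bits. The empirical entropy can be lower when some indices are much more frequent than others, which is how an entropy-coded rate can fall below a fixed-length code. At the same sampling rate, "less than half of 16-bit PCM" corresponds to fewer than 8 bits per sample (16-bit PCM at 16 kHz is 256 kbit/s). The weight `beta` sets the balance between distortion and rate, and that balance is what the abstract reports could not be tuned satisfactorily.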