Speech Coding using Deep Neural Networks and the Information Bottleneck Principle

Student thesis: Master's thesis and HD graduation project

  • Barbara Martinovic
4th semester, Mathematical Engineering (cand.polyt.), Master (Master's programme)
This project explores the possibility of using Deep Neural Networks (DNNs) and the Information Bottleneck (IB) principle to perform speech coding. An end-to-end strategy using DNNs in the form of autoencoders is developed, and the DNNs are trained using both synthetic data and speech files from the TIMIT database. Signals are encoded using a b-bit scalar quantizer employed internally in the DNNs, and the bit rate is easily controlled through, among other things, the parameters of the quantizer. It was found that the developed speech autoencoders trained with the Mean Squared Error (MSE) as an objective function did not outperform the BroadVoice32 (BV32) codec in terms of both bit rate and Perceptual Evaluation of Speech Quality (PESQ) scores. However, the DNNs outperformed the BV32 codec in terms of PESQ scores for bit rates of 5 bits per sample or higher. By exploring the marginal entropies, it was possible to achieve an average PESQ score of 4.46 with a standard deviation of 0.03 for the DNN speech autoencoders while using a bit rate less than half of that used for standard 16-bit Pulse Code Modulation encoding.
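To make the relation between a b-bit scalar quantizer, the bit rate, and the marginal entropy concrete, the following is a minimal sketch and not the thesis implementation. It assumes a uniform mid-rise quantizer on the range [-1, 1] applied directly to samples (in the thesis the quantizer sits inside the DNN), and a synthetic Gaussian test signal. The names `quantize` and `empirical_entropy` are hypothetical helpers introduced only for illustration.

```python
import math
import random
from collections import Counter


def quantize(x, b, lo=-1.0, hi=1.0):
    """Uniform b-bit scalar quantizer (assumed design, not taken from the thesis).

    Returns the integer index (0 .. 2**b - 1) and the reconstructed value,
    which is the midpoint of the quantization cell.
    """
    levels = 2 ** b
    step = (hi - lo) / levels
    clipped = min(max(x, lo), hi)              # saturate out-of-range input
    idx = min(int((clipped - lo) / step), levels - 1)
    recon = lo + (idx + 0.5) * step
    return idx, recon


def empirical_entropy(indices):
    """Empirical marginal entropy of the quantizer indices, in bits per symbol."""
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Synthetic stand-in for a normalized signal (assumption: Gaussian, sigma = 0.2).
random.seed(0)
signal = [random.gauss(0.0, 0.2) for _ in range(10000)]

b = 5  # nominal bit rate in bits per sample
pairs = [quantize(x, b) for x in signal]
indices = [i for i, _ in pairs]
recon = [r for _, r in pairs]

# Distortion is measured against the clipped signal, since the quantizer saturates.
clipped = [min(max(x, -1.0), 1.0) for x in signal]
mse = sum((x - r) ** 2 for x, r in zip(clipped, recon)) / len(signal)
entropy = empirical_entropy(indices)

print(f"nominal rate: {b} bits/sample, empirical entropy: {entropy:.2f} bits/sample")
print(f"MSE: {mse:.6f}")
```

The empirical entropy can never exceed the nominal b bits per sample, and when the indices are not uniformly distributed it is strictly lower, which is why inspecting marginal entropies can indicate a lower achievable bit rate if the indices are entropy coded.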
Inspired by the IB principle, a loss function involving the MSE and marginal entropies was proposed. However, it was not possible to find adequate weights such that the loss function was suitable for training DNN speech autoencoders.
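The abstract does not state the exact form of the proposed loss. One generic way such a rate-distortion style objective could be written is sketched below, purely as an assumption: $x_n$ and $\hat{x}_n$ denote the input and reconstructed samples, $z_k$ the $k$-th quantized latent variable, $H(z_k)$ its marginal entropy, and $\lambda_k \ge 0$ the weights that would have to be tuned. None of these symbols are taken from the thesis.

```latex
% Hypothetical form of an MSE-plus-marginal-entropy loss (illustrative only)
\mathcal{L}
  = \underbrace{\frac{1}{N}\sum_{n=1}^{N}\bigl(x_n - \hat{x}_n\bigr)^2}_{\text{distortion (MSE)}}
  \;+\; \underbrace{\sum_{k=1}^{K} \lambda_k \, H(z_k)}_{\text{rate (marginal entropies)}},
\qquad
H(z_k) = -\sum_{q} p_k(q)\,\log_2 p_k(q)
```

In an objective of this shape, larger $\lambda_k$ favour a lower bit rate at the cost of higher distortion, so the difficulty reported in the abstract corresponds to finding values of the weights that balance the two terms.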
Language: English
Publication date: 7 Jun. 2019
Number of pages: 104
External collaborator: RTX A/S
Peter Mariager pm@rtx.dk
Other
RTX A/S
Ricco Jensen rje@rtx.dk
Other
ID: 305310370