Timbre modification using deep learning

Author

Paterna, Mattia

Term

4th term

Education

Sound and Music Computing

Publication year

2017

Submitted on

2017-06-02

Pages

Abstract

This thesis introduces timbre transformation by means of deep learning. A set of convolutional autoencoders is built for the task, each using convolutional layers as building blocks. First, a shallow architecture is used to reconstruct a series of piano notes and to infer a set of optimal hyperparameters for the building blocks. Then, several architectures are deployed and compared in an attempt to transform an input sound into a target sound, using two wind instruments: the flute and the clarinet. The input and the output of the deep structure are log-magnitude spectra of the audio signals. The Griffin-Lim algorithm is used to reconstruct phase information and generate an audio output from the output of the autoencoder. Results show that the convolutional autoencoder does a fair job of timbre transformation, especially when techniques such as residual learning and dilation are implemented. Moreover, regularisation and constraints such as sparsity help in retrieving an optimal latent representation of the spectra.
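
The phase-reconstruction step mentioned in the abstract can be illustrated with a minimal NumPy sketch of Griffin-Lim. It is not the thesis code: the frame size, hop size, Hann window, iteration count, and the log(1 + |X|) mapping are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Floor the normaliser so the weakly covered edges do not blow up.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(magnitude, n_iter=50, n_fft=256, hop=64, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between the time domain
    # and the STFT domain, always re-imposing the known magnitude.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        estimate = stft(signal, n_fft, hop)
        phase = np.exp(1j * np.angle(estimate))
    return istft(magnitude * phase, n_fft, hop)

def spectral_error(signal, magnitude, n_fft=256, hop=64):
    """Relative distance between the signal's STFT magnitude and the target."""
    diff = np.abs(stft(signal, n_fft, hop)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)

# Assumed log mapping (the abstract only says "log-magnitude").
def to_log_magnitude(magnitude):
    return np.log1p(magnitude)

def from_log_magnitude(log_magnitude):
    return np.expm1(log_magnitude)
```

In a pipeline like the one described, `from_log_magnitude` would be applied to the autoencoder output before calling `griffin_lim`; more iterations generally give a spectrum closer to the target at a higher computational cost.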

Keywords

deep learning; timbre transformation; representations; convolutional; autoencoder


A master's thesis from Aalborg University
