A master thesis from Aalborg University

Timbre modification using deep learning

Author(s)

Term

4th term

Publication year

2017

Submitted on

2017-06-02

Pages

57 pages

Abstract

This thesis introduces timbre transformations by means of deep learning. A set of convolutional autoencoders is created for the task, each using convolutional layers as building blocks. First, a shallow architecture is used to reconstruct a series of piano notes and to infer a set of optimal hyperparameters for the building blocks. Then, several architectures are deployed and compared in an attempt to transform an input sound into a target sound, using two wind instruments: the flute and the clarinet. The input and output of the deep structure are log-magnitude spectra of the audio signals. The Griffin-Lim algorithm is used to reconstruct the phase information and to generate an audio output from the outcome of the autoencoder. Results show that the convolutional autoencoder performs reasonably well at timbre transformation, especially when techniques such as residual learning and dilation are implemented. Moreover, constraints such as sparsity, together with regularisation, help in retrieving an optimal latent representation of the spectra.
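
The audio side of the pipeline described above (log-magnitude spectra in, Griffin-Lim phase reconstruction out) can be illustrated with a minimal NumPy sketch. It is not the thesis code: the log-magnitude definition log(|X| + eps), the 512-sample periodic Hann window, the 128-sample hop and all function names (`stft`, `istft`, `to_log_magnitude`, `from_log_magnitude`, `griffin_lim`) are hypothetical choices made for illustration, and the autoencoder itself is not shown. In the described setup, the autoencoder's output would take the place of the log-magnitude spectrum `L` computed from the input below.

```python
import numpy as np

N_FFT = 512  # hypothetical frame length (samples)
HOP = 128    # hypothetical hop size (samples)


def _window(n_fft):
    # Periodic Hann window of length n_fft.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Windowed FFT frames, shape (n_frames, n_fft // 2 + 1)."""
    w = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT via weighted overlap-add."""
    w = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    length = n_fft + hop * (X.shape[0] - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * w
        norm[i * hop:i * hop + n_fft] += w ** 2
    nz = norm > 1e-8  # skip samples with (almost) no window coverage
    y[nz] /= norm[nz]
    return y


def to_log_magnitude(X, eps=1e-6):
    # Assumed log-magnitude definition: log(|X| + eps).
    return np.log(np.abs(X) + eps)


def from_log_magnitude(L, eps=1e-6):
    # Inverse of to_log_magnitude, clipped so magnitudes stay non-negative.
    return np.maximum(np.exp(L) - eps, 0.0)


def griffin_lim(magnitude, n_iter=50, n_fft=N_FFT, hop=HOP, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phases.
    X = magnitude * np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Go to the time domain and back, keep the resulting phase,
        # and re-impose the target magnitude.
        rebuilt = stft(istft(X, n_fft, hop), n_fft, hop)
        X = magnitude * np.exp(1j * np.angle(rebuilt))
    return istft(X, n_fft, hop)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t)  # stand-in for an input note
    L = to_log_magnitude(stft(x))     # spectrum an autoencoder would map
    # A trained autoencoder would transform L here (not shown).
    y = griffin_lim(from_log_magnitude(L))
    print(len(y))
```

Each Griffin-Lim iteration keeps the target magnitude fixed and only updates the phase, so the quality of the resynthesised audio depends on how consistent the autoencoder's output spectrum is with some real signal; more iterations generally reduce, but do not eliminate, the resulting artefacts.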


