Timbre modification using deep learning

Student thesis: Master Thesis and HD Thesis

  • Mattia Paterna
4th term, Sound and Music Computing (Master Programme)
This thesis introduces timbre transformations by means of deep learning. A set of convolutional autoencoders is created for the task, each using convolutional layers as building blocks. First, a shallow architecture is used to reconstruct a series of piano notes and to infer a set of optimal hyperparameters for the building blocks. Then, several architectures are deployed and compared in an attempt to transform an input sound into a target sound. Two wind instruments, the flute and the clarinet, are used for this purpose. The input and the output of the deep structure are log-magnitude spectra of the audio signals. The Griffin-Lim algorithm is used to reconstruct phase information and generate an audio output from the output of the autoencoder. Results show that the convolutional autoencoder does a fair job of timbre transformation, especially when techniques such as residual learning and dilation are implemented. Moreover, regularisation and constraints such as sparsity help in retrieving an optimal latent representation of the spectra.
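The spectral pipeline described in the abstract (log-magnitude spectra in, Griffin-Lim phase reconstruction out) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the naive DFT, the `log1p`/`expm1` compression, the Hann window, and the frame and hop sizes are all assumptions chosen to keep the example dependency-free. The magnitude spectrum of a synthetic tone stands in for the output of the autoencoder.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame, sign=-1):
    """Naive O(n^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(frame)
    return [sum(x * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t, x in enumerate(frame)) for k in range(n)]


def stft(signal, n_fft, hop):
    """Short-time Fourier transform: one full complex spectrum per frame."""
    win = hann(n_fft)
    return [dft([signal[s + i] * win[i] for i in range(n_fft)])
            for s in range(0, len(signal) - n_fft + 1, hop)]


def istft(frames, n_fft, hop, length):
    """Least-squares inverse STFT: windowed overlap-add over summed win^2."""
    win = hann(n_fft)
    out, norm = [0.0] * length, [0.0] * length
    for m, spec in enumerate(frames):
        seg = [c.real / n_fft for c in dft(spec, sign=+1)]  # keep the real part
        for i in range(n_fft):
            out[m * hop + i] += seg[i] * win[i]
            norm[m * hop + i] += win[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def to_log_magnitude(frames):
    """Assumed compression: log(1 + |X|). The thesis may use another form."""
    return [[math.log1p(abs(c)) for c in frame] for frame in frames]


def from_log_magnitude(log_mag):
    """Inverse of to_log_magnitude."""
    return [[math.expm1(v) for v in frame] for frame in log_mag]


def spectral_error(signal, magnitude, n_fft, hop):
    """Squared distance between the signal's STFT magnitude and the target."""
    est = stft(signal, n_fft, hop)
    return sum((abs(c) - m) ** 2
               for ef, mf in zip(est, magnitude) for c, m in zip(ef, mf))


def griffin_lim(magnitude, n_fft, hop, length, n_iter=30, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = random.Random(seed)
    # Start from the target magnitude with random phase.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame]
            for frame in magnitude]
    for _ in range(n_iter):
        signal = istft(spec, n_fft, hop, length)
        est = stft(signal, n_fft, hop)
        # Keep the estimated phase, re-impose the target magnitude.
        spec = [[m * (c / abs(c) if abs(c) > 1e-12 else 1.0)
                 for m, c in zip(mf, ef)]
                for mf, ef in zip(magnitude, est)]
    return istft(spec, n_fft, hop, length)


# Hypothetical demo values; a synthetic tone replaces the autoencoder output.
N_FFT, HOP, LENGTH = 16, 4, 128
tone = [math.sin(2 * math.pi * 0.11 * t) for t in range(LENGTH)]
log_mag = to_log_magnitude(stft(tone, N_FFT, HOP))
target = from_log_magnitude(log_mag)
recovered = griffin_lim(target, N_FFT, HOP, LENGTH)
print(spectral_error(recovered, target, N_FFT, HOP))
```

In practice an FFT-based implementation from an audio library would replace the naive transforms here. The sketch is only meant to show the alternation between the time domain and the magnitude constraint that the algorithm relies on.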
Publication date: 2017
Number of pages: 57
ID: 258867791