A master's thesis from Aalborg University


Syntetisk Udvidelse af Båndbredde i Smalbåndet Tale

Translated title

Artificial Bandwidth Extension of Narrowband Speech

Authors

;

Term

10th term

Publication year

2007

Pages

103

Abstract

This thesis explores how to reconstruct wideband speech (0–8000 Hz) from narrowband speech (0–3400 Hz) by estimating the missing high-frequency components with statistical methods. Based on the source–filter model, the task is split into estimating a wideband envelope (the overall spectral shape) and a wideband excitation signal (the fine structure of the source). The two estimates are then combined to produce artificially extended wideband speech. Three envelope estimation methods are developed, based on vector quantization (VQ), Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The GMM and HMM methods outperform VQ both in objective measures and in perceived audio quality. The excitation is estimated by simple spectral replication. The thesis also proposes a new perceptually motivated training procedure that uses Mel-frequency cepstral coefficients (MFCCs) for envelope estimation. A formal listening test shows that the proposed extension method is preferred over band-limited narrowband speech with very high statistical significance (reported as >99%).

[This abstract was generated with the help of AI]
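The abstract names the envelope and excitation estimation steps but does not give their equations. The Python sketch below makes those steps concrete. It makes two assumptions:

* The envelope is estimated with the widely used joint-density GMM and a minimum mean-square error (MMSE) estimate.
* Spectral replication is a plain copy-up rule.

It is not the thesis's own implementation. Features are scalars for readability. All names are hypothetical and do not come from the thesis: `JointComponent`, `gmm_mmse_estimate`, `replicate_excitation`, `combine_source_filter`, and the parameters `n_wb_bins` and `src_start`. The sketch leaves out the VQ and HMM variants, the MFCC-based training, and the conversion of envelope features into a per-bin magnitude envelope.

```python
# Hypothetical sketch: joint-density GMM + MMSE envelope mapping and a simple
# copy-up spectral replication. Not the thesis's exact method.
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JointComponent:
    """One component of a joint GMM over (x, y).

    Assumed scalar version: x is a narrowband envelope feature and y a
    wideband (high-band) envelope feature. Real systems use feature vectors
    and covariance matrices.
    """
    weight: float  # mixture weight p(i)
    mu_x: float    # mean of the narrowband feature
    mu_y: float    # mean of the wideband feature
    var_xx: float  # variance of x
    cov_yx: float  # covariance between y and x


def _log_gaussian(x: float, mu: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gmm_mmse_estimate(x: float, components: Sequence[JointComponent]) -> float:
    """MMSE estimate E[y | x] under a joint GMM:

    E[y | x] = sum_i p(i | x) * (mu_y_i + cov_yx_i / var_xx_i * (x - mu_x_i))

    Posteriors p(i | x) are normalized in the log domain to avoid underflow.
    """
    log_joint = [math.log(c.weight) + _log_gaussian(x, c.mu_x, c.var_xx)
                 for c in components]
    peak = max(log_joint)
    unnormalized = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalized)
    estimate = 0.0
    for c, u in zip(components, unnormalized):
        posterior = u / total  # p(i | x)
        conditional_mean = c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x)
        estimate += posterior * conditional_mean
    return estimate


def replicate_excitation(nb_excitation: Sequence[float], n_wb_bins: int,
                         src_start: int) -> List[float]:
    """Spectral replication: extend a narrowband excitation magnitude spectrum
    to n_wb_bins by repeatedly copying the sub-band nb_excitation[src_start:]
    into the missing high-frequency bins."""
    source = list(nb_excitation[src_start:])
    if not source:
        raise ValueError("source band is empty")
    out = list(nb_excitation)
    while len(out) < n_wb_bins:
        out.append(source[(len(out) - len(nb_excitation)) % len(source)])
    return out


def combine_source_filter(envelope: Sequence[float],
                          excitation: Sequence[float]) -> List[float]:
    """Source-filter synthesis in the magnitude domain: bin-wise product of
    the spectral envelope (filter) and the excitation (source)."""
    if len(envelope) != len(excitation):
        raise ValueError("envelope and excitation must have the same length")
    return [e * s for e, s in zip(envelope, excitation)]


if __name__ == "__main__":
    # Illustrative, untrained two-component model.
    model = [
        JointComponent(weight=0.5, mu_x=-1.0, mu_y=-3.0, var_xx=1.0, cov_yx=0.0),
        JointComponent(weight=0.5, mu_x=1.0, mu_y=3.0, var_xx=1.0, cov_yx=0.0),
    ]
    print(gmm_mmse_estimate(0.0, model))  # 0.0: both components equally likely
    exc = replicate_excitation([1.0, 2.0, 3.0, 4.0], n_wb_bins=8, src_start=2)
    print(exc)  # [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 3.0, 4.0]
    print(combine_source_filter([0.5] * 8, exc))
```

Suppose the soft posteriors p(i | x) were replaced by a hard 0/1 choice of the best-matching component. The mapping would then behave much like a VQ codebook lookup. This is one plausible intuition for why a GMM can give smoother envelope estimates than VQ.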