Syntetisk Udvidelse af Båndbredde i Smalbåndet Tale
Translated title
Artificial Bandwidth Extension of Narrowband Speech
Authors
Rohde, Nels ; Vedstesen, Svend Aage
Term
10th term
Education
Publication year
2007
Pages
103
Abstract
This thesis investigates how wideband speech (0–8000 Hz) can be reconstructed from narrowband speech (0–3400 Hz) by statistically estimating the missing high-frequency components. Following the source-filter model, the task is split into two parts: estimating a wideband envelope (the overall spectral shape) and estimating a wideband excitation signal (the fine structure of the source). The two estimates are then combined to produce artificially extended wideband speech. Three envelope estimation methods are developed, based on vector quantization (VQ), Gaussian mixture models (GMM), and hidden Markov models (HMM), respectively. The GMM and HMM methods outperform VQ in both objective measures and perceived audio quality. The excitation is estimated by simple spectral replication. In addition, a new perceptually motivated training procedure is proposed that uses Mel-frequency cepstral coefficients (MFCCs) to estimate the envelope. A formal listening test concludes that speech extended with the proposed method is preferred over bandlimited narrowband speech with very high statistical significance (reported as >99).
[This abstract was generated with the help of AI]
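The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that "spectral replication" can be illustrated by spectral folding (zero insertion, which mirrors the 0–4 kHz band into 4–8 kHz), which may differ from the variant used in the thesis, and that the VQ envelope estimator is a plain codebook mapping between paired narrowband and wideband codebooks. The function names, the toy codebooks, and the 8 kHz narrowband sampling rate are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_fold(x_nb):
    """Double the sampling rate by inserting a zero after every sample.

    Zero insertion mirrors the narrowband spectrum around the old Nyquist
    frequency, so energy at f also appears at (fs_nb - f).
    This is one simple form of spectral replication (an assumption here).
    """
    y = np.zeros(2 * len(x_nb))
    y[::2] = x_nb
    return y


def vq_map_envelope(nb_feat, nb_codebook, wb_codebook):
    """Codebook mapping: find the nearest narrowband codeword and
    return the wideband codeword stored at the same index."""
    dists = np.sum((nb_codebook - nb_feat) ** 2, axis=1)
    return wb_codebook[np.argmin(dists)]


# Excitation demo: a 1 kHz tone sampled at 8 kHz (hypothetical test signal).
fs_nb = 8000
t = np.arange(800) / fs_nb
x = np.sin(2 * np.pi * 1000 * t)
y = spectral_fold(x)  # now nominally sampled at 16 kHz

spec = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1.0 / (2 * fs_nb))
peaks = freqs[spec > 0.5 * spec.max()]  # original tone plus its mirror image

# Envelope demo: toy paired codebooks (hypothetical values, not trained).
nb_codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
wb_codebook = np.array([[0.0, 0.0, 0.5], [1.0, 1.0, 2.0]])
wb_env = vq_map_envelope(np.array([0.9, 1.2]), nb_codebook, wb_codebook)
```

In this sketch the folded signal contains the original 1 kHz component and a mirrored component at 7 kHz. In a complete system the extended excitation would then be shaped by the estimated wideband envelope, and the GMM and HMM estimators would replace the hard nearest-codeword decision with a probabilistic one.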
Keywords
Bandwidth Extension ; wideband ; narrowband ; HMM