Author(s)
Term
7th semester
Education
Publication year
2022
Submitted on
2022-06-02
Pages
38 pages
Abstract
This report was prepared as part of a four-month final project for the Bachelor of Engineering (diplomingeniør) programme in Electronics and IT at Aalborg University. The long-term aim of the project has been to develop an algorithm capable of automatically transcribing speech phonetically in real time. In the short term, the project is limited to transcribing vowels in real time. As a basis for the development, the report examines how humans produce speech and how this speech can be analyzed using signal processing. In that connection, it examines the relationship between zero-crossing rate (ZCR) and voiced speech, including vowels, and how formants, particularly in vowels, can be estimated using linear predictive coding (LPC). A set of requirements is defined so that it can be assessed whether the developed algorithm is able to transcribe vowels and to automatically detect when speech consists of a vowel. The developed algorithm did not meet the requirements. However, the test results showed that the algorithm gets part of the way there. It is assessed that the vowel detection requirements cannot readily be met as long as the algorithm must run in real time. On the other hand, it appears that the vowel estimation can be improved by also taking the fundamental frequency into account.
With the ultimate goal of automatically transcribing speech phonetically in real time, this report focuses on automatically transcribing vowels in real time. The first part of the report examines human speech production and the resulting acoustic signal, as well as how that signal can be analyzed using signal processing methods such as zero-crossing rate and linear predictive coding. With this background, an algorithm was created for automatically detecting vowel segments of speech and estimating which vowel was spoken. The algorithm was programmed and tested in MATLAB. The results did not fulfill the requirements defined in the report. Vowel detection may be difficult to improve while keeping the algorithm able to run in real time; the vowel estimation, however, had an obvious flaw that may be solvable with further development.
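As a rough illustration of the zero-crossing rate idea mentioned in the abstract, the sketch below computes ZCR for one frame and uses it as a crude voiced/unvoiced cue. It is not the report's MATLAB implementation. The sample rate, frame length, threshold of 0.1, the helper names, and the synthetic test signals (a 150 Hz tone standing in for a voiced sound, uniform noise standing in for an unvoiced one) are all assumptions chosen for the example.

```python
import math
import random


def zero_crossing_rate(frame):
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def looks_voiced(frame, threshold=0.1):
    """Crude cue: voiced speech tends to have a low ZCR.

    The threshold is a hypothetical value for this sketch, not one
    taken from the report.
    """
    return zero_crossing_rate(frame) < threshold


SAMPLE_RATE = 16000  # Hz, assumed
FRAME_LEN = 400      # 25 ms at 16 kHz, assumed

# Synthetic stand-in for a voiced sound: a 150 Hz sine tone.
tone = [
    math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(FRAME_LEN)
]

# Synthetic stand-in for an unvoiced sound: uniform white noise.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

print("tone  ZCR:", zero_crossing_rate(tone), "voiced:", looks_voiced(tone))
print("noise ZCR:", zero_crossing_rate(noise), "voiced:", looks_voiced(noise))
```

A low ZCR only suggests voicing; it does not by itself separate vowels from other voiced sounds, which is consistent with the abstract's remark that vowel detection was hard to get right under a real-time constraint.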
Keywords
ipa ; phonetic ; transcription ; formant ; vowel ; zero-crossing ; linear prediction