A master's thesis from Aalborg University


Real-time Blind Source Separation: A Feasibility Study using CUDA

Authors

;

Term

10th term

Publication year

2009

Pages

201

Abstract

In many audio applications, sounds that were recorded together must be separated, for example different speakers captured by multiple microphones. When little is known about the sources or how they were mixed, the problem is called blind source separation (BSS). This thesis examines whether a BSS method for speech, based on higher-order statistics (HOS), can run in real time on an NVIDIA graphics processing unit (GPU). HOS capture signal structure beyond means and variances. The study uses a two-input, two-output (TITO) model, so the task becomes estimating a set of digital filters that undo the mixing. A filter-estimation method based on HOS (specifically, fourth-order statistics via the trispectrum) is presented. Simulations indicate that the method can achieve a signal-to-interference ratio of about 10 dB, showing effective separation under test conditions. A complexity analysis reveals that computing the trispectrum estimates is the main bottleneck. Under the initial assumption that the GPU is fully utilized, the method would need to be roughly 130 times faster to meet real-time constraints. By exploiting how the trispectrum is used in the algorithm, the computational cost can be reduced by a factor of 263. Part of the method is implemented in CUDA, and, after optimization, the measured performance corresponds to a filter update rate of about 1.19 updates per second. Because the target is 25 updates per second, the current implementation on this platform does not yet achieve real-time operation.
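The two headline figures in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only: it assumes the conventional definition of the signal-to-interference ratio as a power ratio expressed in decibels, and the function names `sir_db` and `remaining_speedup` are hypothetical, not taken from the thesis.

```python
import math


def sir_db(target_power: float, interference_power: float) -> float:
    """Signal-to-interference ratio in dB, assuming the usual power-ratio definition."""
    return 10.0 * math.log10(target_power / interference_power)


def remaining_speedup(target_rate: float, measured_rate: float) -> float:
    """Factor by which the measured update rate falls short of the target rate."""
    return target_rate / measured_rate


# A 10 dB SIR means the target source carries ten times the power of the interferer.
print(sir_db(10.0, 1.0))

# Figures from the abstract: 25 updates/s required, about 1.19 updates/s measured.
print(round(remaining_speedup(25.0, 1.19), 1))
```

With the numbers quoted in the abstract, the measured implementation would need to run roughly 21 times faster to reach the target of 25 filter updates per second.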

[This abstract has been rewritten with the help of AI based on the project's original abstract]