A master's thesis from Aalborg University


Predictability-Based Objective Evaluation of Sound

Author

Term

4th semester

Publication year

2021

Submitted on

Pages

85

Abstract

We investigate whether predictability can serve as an objective way to find the parts of a speech signal that matter most for understanding. We introduce a new measure that provides a perceptually relevant, non-intrusive (can be applied directly to the signal) estimate of the information-theoretic quantity known as mutual information (a measure of how much knowing one thing tells us about another). The measure is computed on short time frames of speech using deep convolutional neural networks to learn patterns in the audio. In a listening test, we compared this measure with two existing approaches: sound intensity and cochlea-scaled spectral entropy (a measure of spectral randomness aligned with the ear’s frequency scale). In some conditions, our measure more accurately highlighted the time frames that support speech intelligibility than these alternatives; in other conditions, it did not. These results indicate that predictability has potential as an objective indicator, but the proposed measure requires refinement before it can be reliably applied.
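As a small illustration of the quantity the abstract refers to, the sketch below computes mutual information for a toy discrete joint distribution. It is not the thesis's measure, which is estimated from speech frames with deep convolutional neural networks; the function name `mutual_information` and the two example tables are assumptions made only for this illustration.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a discrete joint distribution.

    `joint` is a list of rows where joint[i][j] = P(X=i, Y=j);
    the entries are assumed to be non-negative and to sum to 1.
    """
    # Marginals: sum over rows for P(X), over columns for P(Y).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical toy distributions (not data from the thesis).
independent = [[0.25, 0.25], [0.25, 0.25]]  # knowing X says nothing about Y
dependent = [[0.5, 0.0], [0.0, 0.5]]        # knowing X fully determines Y

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

In this toy setting, the independent table yields zero bits and the fully dependent table yields one bit, which matches the intuition that mutual information measures how much knowing one variable tells us about the other.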


[This abstract has been rewritten with the help of AI based on the project's original abstract]