A master's thesis from Aalborg University

A hybrid approach to structural modeling of individualized HRTFs: Generating and combining pinna responses, head-and-torso filtering, and interaural time difference data

Translated title

A hybrid approach to structural modeling of individualized HRTFs

Author

Term

4th term

Publication year

2020

Pages

72

Abstract


For virtual reality to feel immersive, spatial audio must be tailored so listeners hear sounds outside the head and from the correct direction. This requires a personalized head-related transfer function (HRTF), which describes how the head, torso, and outer ear (pinna) shape incoming sound. Directly measuring a person’s HRTF needs specialized equipment and is both demanding and expensive. We present a hybrid HRTF modeling approach that uses only three anthropometric measurements and a single image of the pinna contours. A prediction algorithm based on variational autoencoders—a machine-learning model that can generate data—synthesizes the pinna’s acoustic response from the contour image. This synthesized response is then used to filter a measured head-and-torso response. Next, the interaural time difference (ITD), the tiny timing difference between the ears, is adjusted to match a subject from the HUTUBS dataset, aiming to minimize predicted localization error. We evaluate performance using spectral distortion and a perceptual localization model. While the perceptual model is inconclusive about the structural model’s effectiveness, spectral distortion shows promising results for encoding HRTF datasets.
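The pipeline above cascades a synthesized pinna response with a measured head-and-torso response, imposes an interaural time difference, and scores the result with spectral distortion. The following is a minimal NumPy sketch of those two ideas; the function names, the integer-sample ITD, and the RMS log-magnitude form of the spectral-distortion measure are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def combine_structural_hrir(pinna_ir, torso_ir, itd_seconds, fs=44100):
    """Cascade a pinna impulse response with a head-and-torso impulse
    response (convolution in time), then impose an interaural time
    difference by delaying the contralateral-ear signal.
    Names and the whole-sample delay are illustrative assumptions."""
    combined = np.convolve(pinna_ir, torso_ir)
    delay = int(round(itd_seconds * fs))  # ITD quantized to samples
    ipsi = np.concatenate([combined, np.zeros(delay)])
    contra = np.concatenate([np.zeros(delay), combined])
    return ipsi, contra

def spectral_distortion_db(h_ref, h_test, n_fft=512):
    """RMS difference of log-magnitude spectra in dB, a common
    objective measure of HRTF similarity (assumed form)."""
    H_ref = np.abs(np.fft.rfft(h_ref, n_fft)) + 1e-12
    H_test = np.abs(np.fft.rfft(h_test, n_fft)) + 1e-12
    diff_db = 20.0 * np.log10(H_ref / H_test)
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

A spectral distortion of 0 dB means identical magnitude responses; the thesis uses this kind of measure alongside a perceptual localization model to judge how well the structural model approximates measured HRTFs.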
