A master's thesis from Aalborg University


Initialization of Speaker Segmentation System Using Friends and Enemies Algorithm

Author

Term

10th term

Publication year

2007

Pages

62

Abstract

This thesis examines the initialization of a speaker segmentation system using the non‑uniform Friends and Enemies (FE) algorithm to improve automated diarization of large audio collections. The system first detects speaker change points with a standard Bayesian Information Criterion (BIC) method and then groups “friend segments” with similar likelihoods to create initial models. Because the original FE paper fixed several parameters without clear theoretical justification (e.g., the number of friend segments and initial models) and did not address domain suitability, this work tests different parameter settings and evaluates performance on meeting and broadcast news domains. Results show that FE initialization achieves high purity (99.52%) and a low diarization error rate (DER, 0.48%) on meeting data, outperforming uniform segmentation in that domain, while being less suitable for broadcast news. The findings highlight that initialization strategy and parameter choices should be adapted to the target audio domain.
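As a rough illustration of the BIC-based change detection step mentioned above, the following Python sketch scores candidate split points in a window of feature vectors with the commonly used ΔBIC formulation (one full-covariance Gaussian for the whole window versus one per side). It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the `margin` and `penalty_weight` parameters, and the small covariance regularization term are all hypothetical choices made for this example.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the ML covariance of x (rows are frames)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    # Small diagonal term (an assumption of this sketch) keeps the matrix
    # well conditioned when a segment contains few frames.
    cov = cov + 1e-6 * np.eye(x.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (n x d) at frame index `split`.

    Positive values favor two separate Gaussians, i.e. a speaker change.
    """
    n, d = features.shape
    left, right = features[:split], features[split:]
    gain = 0.5 * (
        n * _logdet_cov(features)
        - len(left) * _logdet_cov(left)
        - len(right) * _logdet_cov(right)
    )
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def best_change_point(features, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    candidate has a positive Delta-BIC. `margin` is a hypothetical guard
    that keeps at least that many frames on each side of the split."""
    n = features.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(features, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return candidates[best], scores[best]
    return None, scores[best]
```

In a full system, the frames would typically be cepstral features such as MFCCs, and the window would slide over the recording so that each detected change point starts a new segment. Those segments are the units that an initialization scheme such as FE would then group into initial models.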

[This abstract has been generated with the help of AI directly from the project full text]