A master's thesis from Aalborg University


Baum-Welch Algorithm for Markov Models Using Algebraic Decision Diagrams

Authors


Term

4th term

Publication year

2025

Submitted on

Pages

17

Abstract


Hidden Markov Models (HMMs) and Markov Chains (MCs) are widely used probabilistic models for describing systems that evolve over time. They are often trained from observation sequences using the Baum-Welch algorithm. However, traditional recursive or matrix-based implementations scale poorly, because they repeat work and consume large amounts of memory. This thesis presents a new symbolic implementation of Baum-Welch using Algebraic Decision Diagrams (ADDs), compact data structures for representing probability functions. We build on CuPAAL, a C++ library that implements Baum-Welch entirely with ADDs, and integrate it into the Jajapy library, resulting in a new symbolic learning tool, Jajapy 2. Our approach enables efficient learning from multiple observation sequences and supports both HMMs and MCs. Experiments on models from the QComp benchmark set show that the symbolic implementation significantly improves performance for larger observation sets and for models with repeated structure, while maintaining learning accuracy. These results indicate that ADD-based symbolic computation is a scalable alternative for learning probabilistic models.
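For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal, self-contained sketch of the classical matrix-based Baum-Welch procedure for a discrete HMM, the baseline that the thesis's ADD-based approach aims to outperform. All function and variable names here are illustrative, not taken from CuPAAL or Jajapy; the thesis replaces these dense numpy matrices with symbolic ADD representations.

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Matrix-based Baum-Welch (expectation-maximization) for a discrete HMM.

    obs: sequence of observed symbol indices in range(n_symbols).
    Returns re-estimated (pi, A, B): initial, transition, and emission
    distributions. Illustrative sketch only, not the thesis's implementation.
    """
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)
    # Random row-stochastic initial parameters.
    A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)
    pi = rng.random(n_states); pi /= pi.sum()

    for _ in range(n_iter):
        # E-step: forward pass with per-step scaling to avoid underflow.
        alpha = np.zeros((T, n_states)); scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        # Backward pass, reusing the same scaling factors.
        beta = np.zeros((T, n_states)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        # gamma[t, i]: probability of being in state i at time t.
        gamma = alpha * beta
        gamma /= gamma.sum(1, keepdims=True)
        # Expected transition counts (xi summed over time steps).
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            x = np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A
            xi += x / x.sum()
        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi / gamma[:-1].sum(0, keepdims=True).T
        for k in range(n_symbols):
            B[:, k] = gamma[obs == k].sum(0)
        B /= gamma.sum(0, keepdims=True).T
    return pi, A, B
```

The two inner passes over the whole sequence at every iteration illustrate the scaling problem the abstract mentions: time and memory grow with both sequence length and state count, which is what motivates replacing the dense matrices with shared, compressed ADD structures.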

[This summary has been rewritten with the help of AI based on the project's original abstract]