A master's thesis from Aalborg University

OntoAgentKG: Ontology-Aware Knowledge Graph Construction with LLM Agents

Translated title

OntoAgentKG: Ontologibevidst Konstruktion af Vidensgraf med LLM-agenter

Term

4. term

Publication year

2026

Abstract

This thesis investigates how large language models (LLMs) can be used to automatically construct knowledge graphs (KGs) from unstructured text in complex domains where manual KG building is costly and time-consuming. The authors introduce OntoAgentKG, a novel, domain-agnostic approach that combines ontology guidance with agent-based workflows. Instead of asking an LLM to extract all entity types and relations in a single pass, OntoAgentKG uses an input ontology to annotate entity types per text segment and then performs multiple focused LLM calls, each dedicated to one semantic category. In parallel, the method preconstructs the set of allowed relations between entity types from the ontology, so the LLM mainly has to decide whether a candidate triple is actually supported by the source text. This lowers task complexity and makes relation extraction closer to a classification problem than open-ended text generation. Agentic components orchestrate an iterative process in which extracted entities and relations are verified, missing information is detected, and selected steps are re-invoked with updated prompts to improve both correctness and completeness. The resulting data is organised into a star-like graph structure that preserves traceability between extracted knowledge and its textual sources. The method is evaluated on two medical datasets, the GutBrainIE corpus of PubMed abstracts and the CORAL dataset of electronic health records, using the open-source gpt-oss-20b model. OntoAgentKG outperforms the TreeKG baseline on precision, recall, and F1 scores for both entities and relations, achieving up to 37% higher F1. An ablation study shows that ontology guidance alone yields up to an 11% F1 improvement, agentic capabilities up to 2%, and their combination up to 17% over a variant without ontology and agents, which itself also surpasses TreeKG. 
The thesis thus contributes a scalable, ontology-aware, agentic framework that can build more precise and comprehensive knowledge graphs from unstructured text without domain-specific fine-tuning.
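
The relation step described above (preconstructing the relations the ontology allows between entity types, then asking only whether each candidate is supported by the text) can be illustrated with a minimal sketch. This is not the thesis implementation: the entity types, relation names, the `ALLOWED_RELATIONS` table, and the `is_supported` callback are all hypothetical, and the callback stands in for the LLM call that would judge each candidate triple against the source segment.

```python
from itertools import product
from typing import Callable, NamedTuple


class Entity(NamedTuple):
    text: str
    type: str


class Triple(NamedTuple):
    head: Entity
    relation: str
    tail: Entity


# Hypothetical ontology fragment: (head type, tail type) -> allowed relations.
# A real ontology would be loaded from a file rather than hard-coded.
ALLOWED_RELATIONS: dict[tuple[str, str], list[str]] = {
    ("Bacteria", "Disease"): ["influences"],
    ("Chemical", "Disease"): ["treats", "influences"],
}


def candidate_triples(entities: list[Entity]) -> list[Triple]:
    """Enumerate every triple the ontology permits for the given entities."""
    candidates = []
    for head, tail in product(entities, repeat=2):
        if head == tail:
            continue  # skip pairing an entity with itself
        for relation in ALLOWED_RELATIONS.get((head.type, tail.type), []):
            candidates.append(Triple(head, relation, tail))
    return candidates


def extract_relations(
    segment: str,
    entities: list[Entity],
    is_supported: Callable[[str, Triple], bool],
) -> list[Triple]:
    """Keep only the candidates that the verifier accepts for this segment.

    `is_supported` is a placeholder for the LLM-based yes/no decision;
    it turns relation extraction into a per-candidate classification.
    """
    return [t for t in candidate_triples(entities) if is_supported(segment, t)]


if __name__ == "__main__":
    segment = "Butyrate treats depression in this invented example sentence."
    entities = [
        Entity("Lactobacillus", "Bacteria"),
        Entity("butyrate", "Chemical"),
        Entity("depression", "Disease"),
    ]

    # Toy verifier (assumption): accept a triple only if the relation word
    # and both entity mentions occur in the segment. An LLM would replace this.
    def keyword_verifier(text: str, triple: Triple) -> bool:
        lowered = text.lower()
        return all(
            part.lower() in lowered
            for part in (triple.head.text, triple.relation, triple.tail.text)
        )

    for triple in extract_relations(segment, entities, keyword_verifier):
        print(triple.head.text, triple.relation, triple.tail.text)
```

Because the candidate set is bounded by the ontology, the verifier answers a yes/no question per triple instead of generating relations freely, which is the complexity reduction the abstract describes.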

[This abstract has been generated with the help of AI directly from the project full text]