An executive master's programme thesis from Aalborg University


Enabling Controlled Reasoning Capabilities for LLMs in the Healthcare Diagnostic Process Using KG-RAG

Authors

;

Term

4. semester

Publication year

2026

Submitted on

Pages

74

Abstract

This study evaluates a clinical decision support system (CDSS) that combines a large language model (LLM) with knowledge-graph retrieval-augmented generation (KG-RAG), and compares it to an LLM without RAG. LLMs can assist diagnosis, but are rarely used proactively because of opaque reasoning (black-box behavior) and hallucinations (made-up or incorrect statements). We run an in silico, cross-sectional experiment to identify general tendencies when a low-effort support tool is introduced into diagnostic workflows. The dataset is 66 synthetically generated clinical notes tested with varied prompts. We test two hypotheses: (1) improved precision of the final diagnosis label, and (2) stronger adherence to instructions and source material (control). The hypotheses are assessed using LLM-based judges and examination of the model’s reasoning explanations. Results show that the KG-RAG setup has lower precision than a no-RAG alternative, but its outputs closely follow instructions and remain faithful to the provided sources. We therefore reject the precision hypothesis and accept the control hypothesis: integrating KG-RAG increases control over outputs, but reduces precision.
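The KG-RAG idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy knowledge graph, the keyword-based retrieval, the prompt wording, and the names `TOY_KG`, `retrieve_triples`, `build_prompt`, and `label_match_rate` are all assumptions made for illustration, and the facts are not clinical guidance. The sketch shows how retrieved graph facts could constrain a prompt (the "control" aspect) and how predicted labels might be compared against reference labels as a simple stand-in for the precision measure, which the abstract does not define in detail.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical toy knowledge graph; illustrative only, not clinical guidance.
TOY_KG = [
    Triple("influenza", "has_symptom", "fever"),
    Triple("influenza", "has_symptom", "cough"),
    Triple("migraine", "has_symptom", "headache"),
    Triple("migraine", "has_symptom", "nausea"),
]


def retrieve_triples(note: str, kg: list[Triple]) -> list[Triple]:
    """Return triples whose symptom term occurs in the note (naive substring match)."""
    text = note.lower()
    return [t for t in kg if t.obj in text]


def build_prompt(note: str, triples: list[Triple]) -> str:
    """Build a prompt that restricts the model to the retrieved facts (assumed wording)."""
    facts = "\n".join(f"- {t.subject} {t.relation} {t.obj}" for t in triples)
    return (
        "Use ONLY the facts below to suggest a diagnosis, and cite the facts used.\n"
        f"Facts:\n{facts}\n"
        f"Clinical note:\n{note}\n"
    )


def label_match_rate(predicted: list[str], reference: list[str]) -> float:
    """Share of notes whose predicted label equals the reference label (case-insensitive)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower() for p, r in zip(predicted, reference)
    )
    return hits / len(reference)


if __name__ == "__main__":
    note = "Patient reports fever and a dry cough."
    print(build_prompt(note, retrieve_triples(note, TOY_KG)))
```

In a comparison like the one the study describes, the same notes would be run once with the retrieved facts in the prompt and once without, and the resulting labels scored separately for each condition.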


[This abstract has been rewritten with the help of AI based on the project's original abstract]