Enabling Controlled Reasoning Capabilities for LLMs in the Healthcare Diagnostic Process Using KG-RAG

Authors

Juul, Mathias Salomonsen ; Graahede, Esben Münster

Term

4th semester

Education

Economics and Business Administration (Business Data Science), MSc.

Publication year

2026

Submitted on

2026-05-29

Abstract

This study evaluates a clinical decision support system (CDSS) that combines a large language model (LLM) with knowledge-graph retrieval-augmented generation (KG-RAG), and compares it to an LLM without RAG. LLMs can assist diagnosis but are rarely used proactively because of opaque reasoning (black-box behavior) and hallucinations (made-up or incorrect statements). We run an in silico, cross-sectional experiment to identify general tendencies when a low-effort support tool is introduced into diagnostic workflows. The dataset consists of 66 synthetically generated clinical notes, which are tested with varied prompts. We test two hypotheses about the KG-RAG system: (1) that it improves the precision of the final diagnosis label, and (2) that it adheres more closely to instructions and source material (control). The hypotheses are assessed using LLM-based judges and by examining the model’s reasoning explanations. Results show that the KG-RAG setup has lower precision than the no-RAG alternative, but its outputs closely follow instructions and remain faithful to the provided sources. We therefore reject the precision hypothesis and accept the control hypothesis: integrating KG-RAG increases control over outputs but reduces precision.

[This abstract has been rewritten with the help of AI based on the project's original abstract]

Keywords

Clinical Decision Support System (CDSS) ; Knowledge Graph-augmented RAG (KG-RAG) ; Retrieval-Augmented Generation ; Large Language Models (LLMs) ; LLM-as-a-judge / LLM judges

An executive master's programme thesis from Aalborg University
