Author(s)
Term
4. term
Education
Publication year
2025
Submitted on
2025-05-26
Pages
54 pages
Abstract
This thesis explores how to build observable and reliable AI systems for legal text analysis using Large Language Models (LLMs). Focusing on EU legislative documents from MultiEURLEX, it develops and compares three systems: a baseline LLM system, a Retrieval-Augmented Generation (RAG) pipeline, and an agentic multi-step variant. The systems are evaluated on a curated gold-standard dataset using quantitative metrics (F1, precision, recall) and qualitative assessments (LLM-as-a-Judge). Tools such as Langfuse and LiteLLM provide full observability, tracing, and metric logging across local free open-source and open-weights models and cloud-based proprietary LLM configurations. The key finding is that direct LLM access outperforms the RAG variants because of low retrieval recall, which indicates that retrieval is the current bottleneck in domain-specific RAG applications. The work demonstrates skills and competences in full-stack MLOps deployment on AAU's uCloud HPC GPU platform and highlights the importance of traceability and human-centered evaluation in trustworthy AI. This thesis and the related research contribute both a methodological blueprint and critical insights for operationalizing GenAI in high-stakes domains.