Term
4. term
Education
Publication year
2025
Submitted on
2025-06-12
Pages
11 pages
Abstract
Natural Language to SQL (NL2SQL) systems translate natural language questions into executable SQL queries, enabling non-technical users to interact with databases. While recent advances in large language models (LLMs) and schema-aware techniques have driven performance on benchmarks such as Spider and BIRD, existing systems continue to struggle with ambiguity, particularly when queries admit multiple valid interpretations due to overlapping schema elements. This issue, termed Schema-Induced Ambiguity (SIA), arises when natural language tokens ambiguously refer to multiple tables, columns, or relations. SIA is especially common in real-world databases, where evolving and denormalised schemas diverge from the clean structure typically found in academic benchmarks. Current approaches address ambiguity only implicitly or partially. LLMs can reduce lexical ambiguity, but fail to reliably detect structural ambiguities without explicit schema reasoning. Moreover, few systems are designed to proactively identify SIA before generating a query, leading to silent failures and misinterpretations. To address this gap, we propose a two-step detection framework: a fine-tuned BERT cross-encoder identifies schema elements likely to be involved in the intended query, followed by a Graph Attention Network (GAT) operating over the induced subgraph to predict the presence of ambiguity. Our method outperforms baseline approaches in-domain, yet generalisation to unseen schemas remains limited, as evidenced by performance drops on BIRD-bench and Trial-Bench. Nonetheless, in-context training demonstrates strong potential for scaling ambiguity detection. While this work focuses exclusively on schema-induced sources, future extensions must address other forms of ambiguity to ensure reliability in production deployments. Code available at: https://github.com/P10-NLIDB.
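To make the notion of SIA concrete, the following minimal sketch flags question tokens that could refer to more than one schema element. It is a naive lexical-matching illustration only, not the BERT cross-encoder and GAT pipeline described in the abstract; the toy schema and the function names (`candidate_elements`, `ambiguous_tokens`) are hypothetical and do not come from the thesis or its repository.

```python
from collections import defaultdict

# Hypothetical toy schema (table name -> column names); not taken from the thesis.
SCHEMA = {
    "orders": ["id", "customer_id", "created_date", "shipped_date"],
    "customers": ["id", "name", "signup_date"],
}


def candidate_elements(question, schema):
    """Map each question token to the columns whose underscore-separated
    name parts contain that token (a crude stand-in for schema linking)."""
    tokens = {t.strip("?.,").lower() for t in question.split()}
    matches = defaultdict(list)
    for table, columns in schema.items():
        for column in columns:
            parts = set(column.lower().split("_"))
            for token in tokens & parts:
                matches[token].append(f"{table}.{column}")
    return dict(matches)


def ambiguous_tokens(question, schema):
    """Return only the tokens that match more than one schema element."""
    linked = candidate_elements(question, schema)
    return {tok: cols for tok, cols in linked.items() if len(cols) > 1}


result = ambiguous_tokens("Which orders have a date after 2024?", SCHEMA)
print(result)
```

In this toy case the word "date" matches three columns across two tables, so a query generator would have to guess which one the user meant. Detecting that situation before generating SQL is the kind of check the abstract argues for, although the thesis does so with learned models rather than string matching.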