GutHub: Tackling the GutBrainIE Task Through Entity Recognition, Disambiguation, and Relation Extraction
Authors
Rydder, Andreas Cornelius ; Christensen, Simon Bech ; Austys, Dziugas
Term
4th term
Education
Publication year
2026
Submitted on
2026-06-12
Abstract
We present GutHub, an end-to-end extraction pipeline for the CLEF 2026 GutBrainIE task. It automatically identifies technical terms (Named Entity Recognition, NER), resolves which concept each term refers to (Named Entity Disambiguation, NERD), and extracts relations between them at both the mention level (M-RE) and the concept level (C-RE). The key challenge is extreme class imbalance across a layered training corpus that ranges from large, noisy auto-generated text to sparse, expert-curated examples. To reduce the resulting label bias, GutHub uses a stacked ensemble of transformer models. For disambiguation, we implement a hybrid search that combines two text representations (dense embeddings from SapBERT and sparse TF-IDF vectors) and pass the results to a biomedical cross-encoder reranker fine-tuned with hard negative mining, that is, with negative training examples that are difficult to tell apart from the correct concept. For relation extraction, we apply class-specific confidence thresholds and deterministic heuristic pruning, which reduced false-positive relations from 1,399 to 874. Mention-level relations are then mapped to unique URIs to produce concept-level relations. GutHub achieved micro-F1 scores of 0.7893 (NER), 0.3497 (NERD), 0.4029 (M-RE), and 0.1295 (C-RE). Micro-F1 is the harmonic mean of precision and recall, computed over all predictions pooled across classes.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
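The last steps described in the abstract (class-specific confidence thresholds, mapping mention-level relations to unique URIs, and scoring with micro-F1) can be illustrated with a minimal sketch. This is not the project's implementation: the record type, function names, predicate labels, threshold values, and the `ex:` URIs below are all hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record for one predicted mention-level relation (M-RE).
MentionRelation = namedtuple(
    "MentionRelation", ["subject", "predicate", "object", "score"]
)


def filter_by_class_threshold(relations, thresholds, default=0.5):
    """Keep relations whose score meets the threshold of their predicate class."""
    return [
        r for r in relations
        if r.score >= thresholds.get(r.predicate, default)
    ]


def to_concept_level(relations, mention_to_uri):
    """Map mention-level relations to unique (URI, predicate, URI) triples."""
    triples = set()
    for r in relations:
        subj = mention_to_uri.get(r.subject)
        obj = mention_to_uri.get(r.object)
        if subj is not None and obj is not None:  # skip unlinked mentions
            triples.add((subj, r.predicate, obj))
    return triples


def micro_f1(predicted, gold):
    """Micro-averaged F1 over two sets, pooling all classes together."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example data (assumption: not taken from the GutBrainIE corpus).
predictions = [
    MentionRelation("gut microbiota", "influence", "depression", 0.91),
    MentionRelation("intestinal microbiota", "influence", "depression", 0.74),
    MentionRelation("probiotics", "target", "anxiety", 0.42),
]
thresholds = {"influence": 0.7, "target": 0.6}
mention_to_uri = {
    "gut microbiota": "ex:C1",
    "intestinal microbiota": "ex:C1",  # two mentions, one concept
    "depression": "ex:C2",
    "probiotics": "ex:C3",
    "anxiety": "ex:C4",
}
gold = {("ex:C1", "influence", "ex:C2"), ("ex:C3", "target", "ex:C4")}

kept = filter_by_class_threshold(predictions, thresholds)
concept_relations = to_concept_level(kept, mention_to_uri)
print(len(kept), len(concept_relations))            # 2 1
print(round(micro_f1(concept_relations, gold), 4))  # 0.6667
```

In this toy example, two mention-level relations collapse into a single concept-level triple because both subject mentions link to the same URI. A relation dropped by its class threshold then counts as a false negative at the concept level, which lowers recall and therefore micro-F1.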
Keywords
Stacked Ensemble ; Transformer Models ; Biomedical Information Extraction ; NLP ; NER ; RE ; NERD ; C-RE ; M-RE
