AAU Student Projects - visit Aalborg University's student projects portal
A master's thesis from Aalborg University

Preserving Contextual Information from Unstructured Free Text Documents Using NLP, SNOMED CT, and HL7 FHIR to Achieve Semantic Interoperability

[Bevaring af kontekstuel information fra ustrukturerede fritekstdokumenter ved brug af NLP, SNOMED CT og HL7 FHIR for at opnå semantisk interoperabilitet]

Author(s)

Term

4th term

Education

Publication year

2021

Submitted on

2021-05-28

Pages

85 pages

Abstract

Introduction: The use of free text documentation supports clinical practice, but challenges arise when free text documents are reused. Since it is not possible to reuse all information within healthcare, handling these challenges should focus on preserving the situational context. The objective of this study was therefore to explore how the situational context can be preserved when relevant information is extracted from free text documents and structured in order to achieve semantic interoperability. Method: Discharge summaries from the N2C2 2010 challenge were used as the data foundation; together with an implementation context, they set the scope for the development. HL7 FHIR resources, SNOMED CT expressions, and the NLP system cTAKES were used to structure, encode, and extract information from the discharge summaries. cTAKES was adjusted using an agile development approach. The adjustments focused on including more contextual information by using post-coordinated expressions from SNOMED CT, and the adjusted system was tested against a gold standard. Result: The 21 validated FHIR profiles contained 95.5% of the information from the discharge summaries. The adjusted cTAKES had an F-score of 0.120. Conclusion: The situational contextual information from free text documents can be preserved using HL7 FHIR and SNOMED CT. However, automatic information extraction using cTAKES lacks the maturity for clinical use.
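
As a rough illustration of how an F-score against a gold standard can be computed, the following minimal Python sketch compares a set of extracted annotations with a set of gold annotations. It is not the evaluation code used in the thesis: the exact-match criterion, the choice of the balanced F1 variant (beta = 1), the function name `f_score`, and all annotation tuples and placeholder codes are assumptions made only for this example.

```python
from typing import Hashable, Iterable


def f_score(predicted: Iterable[Hashable], gold: Iterable[Hashable], beta: float = 1.0) -> dict:
    """Return precision, recall and F-beta for exact-match annotations.

    Assumption: an extracted annotation counts as correct only if it is
    identical to a gold annotation. The thesis may use other criteria.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision == 0.0 and recall == 0.0:
        f = 0.0
    else:
        b2 = beta ** 2
        f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return {"precision": precision, "recall": recall, "f": f}


# Hypothetical annotations: (document id, start offset, end offset, code).
# The codes are placeholders, not real SNOMED CT expressions.
gold = {
    ("doc1", 10, 22, "CODE_A"),
    ("doc1", 40, 52, "CODE_B"),
    ("doc2", 5, 17, "CODE_C"),
    ("doc2", 30, 41, "CODE_D"),
}
predicted = {
    ("doc1", 10, 22, "CODE_A"),  # matches the gold standard
    ("doc1", 60, 70, "CODE_E"),  # spurious extraction
    ("doc2", 5, 17, "CODE_X"),   # right span, wrong code
}

scores = f_score(predicted, gold)
print(scores)
```

Under exact matching, an extraction with the right text span but a different code counts as an error, which is one reason a system that encodes richer context can score low even when it locates the relevant text.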
