Preserving Contextual Information from Unstructured Free Text Documents Using NLP, SNOMED CT, and HL7 FHIR to Achieve Semantic Interoperability

Translated title

Bevaring af kontekstuel information fra ustrukturerede fritekstdokumenter ved brug af NLP, SNOMED CT og HL7 FHIR for at opnå semantisk interoperabilitet

Authors

Jensen, Johanne Krogsgaard ; Sørensen, Thea Mentz

Term

4. term

Education

Biomedical Engineering and Informatics, Master

Publication year

2021

Abstract

Background: Clinical notes are often written as free text. This supports daily practice but makes the information hard to reuse across IT systems. When free text is converted into structured data, the situational context (for example, to whom, when, and under what circumstances a statement applies) must be kept so that the meaning is preserved.

Objective: To explore how situational context can be preserved when relevant information is extracted from free-text documents and structured, in order to achieve semantic interoperability, that is, so that different systems can share and interpret the data consistently.

Method: Hospital discharge summaries from the 2010 N2C2 challenge served as the data. Information was structured and encoded with HL7 FHIR (an international standard for exchanging health data) and SNOMED CT (a clinical terminology). For automatic extraction, we applied the NLP system cTAKES and adjusted it iteratively using an agile development approach. We focused on capturing more context by using post-coordinated SNOMED CT expressions (combinations of concepts that express a meaning more precisely than a single concept) and evaluated the output against a gold standard.

Results: The 21 FHIR profiles we created could represent 95.5% of the information in the discharge summaries. The adjusted cTAKES achieved an F-score of 0.120 (a combined measure of precision and recall), which is low.

Conclusion: Situational context in free text can be preserved using HL7 FHIR and SNOMED CT. However, automatic information extraction with cTAKES is not yet mature enough for clinical use.
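To make the reported F-score easier to interpret, the following is a minimal sketch of how precision, recall, and the F-score can be computed when extracted annotations are compared with a gold standard. The abstract does not state the matching criteria or the averaging scheme used in the thesis, so the exact-match comparison on (document, concept) pairs, the function name `f_score`, and the example annotations are all assumptions made for illustration only.

```python
def f_score(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of annotations.

    Assumption: an extracted annotation counts as correct only if it
    matches a gold-standard annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical annotations as (document id, SNOMED CT concept id) pairs.
# These are made-up examples, not data from the thesis.
gold = {
    ("doc1", "38341003"),   # hypertensive disorder
    ("doc1", "73211009"),   # diabetes mellitus
    ("doc2", "195967001"),  # asthma
    ("doc2", "22298006"),   # myocardial infarction
}
predicted = {
    ("doc1", "38341003"),
    ("doc2", "22298006"),
    ("doc2", "44054006"),   # type 2 diabetes mellitus, not in the gold set
}

precision, recall, f1 = f_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With `beta=1.0` the F-score is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.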

[This abstract has been rewritten with the help of AI based on the project's original abstract]

A master's thesis from Aalborg University
