Temporal Illness Prediction using a Bayesian Model
Authors
Møller, Esben Pilgaard ; Panum, Thomas Kobber ; Søndergaard, Bjarke Hesthaven
Term
4th term
Publication year
2014
Submitted on
2014-06-04
Abstract
The healthcare sector collects and digitizes large amounts of numerical patient information, such as lab test results and diagnoses. Clinicians routinely use lab results to diagnose illness, but when these data are stored digitally, the explicit link between test results and diagnoses is not always preserved. This project examines whether a systematic relationship between lab analyses and illnesses can be detected in the digitized records. To that end, a non-parametric Bayesian model is built to predict the diagnosed illness from lab results. Non-parametric here means that the model does not assume a fixed shape for the distribution of the data, and the Bayesian approach makes it possible to update probabilities in light of new data. The model uses Kernel Density Estimation to estimate normality spaces (distributions of typical values for lab variables) for each illness. These distributions are based on temporal measurements of lab results, that is, measurements taken over time, and are used to compute how probable each illness is given new test results. The model is evaluated on several sets of illnesses and compared with simple, naive baseline methods. In every case, the model outperformed the naive approaches. The results indicate that a relationship between lab analyses and diagnoses exists in the digitized data.
[This abstract was generated with the help of AI]
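The approach described in the abstract, estimating a density of lab values per illness with Kernel Density Estimation and then applying Bayes' rule, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a single lab variable, a Gaussian kernel with a fixed bandwidth, and priors taken from class frequencies, and it ignores the temporal aspect of the measurements. The function names (`gaussian_kde`, `fit`, `posterior`) and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`
    using a Gaussian kernel with the given bandwidth."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def fit(records, bandwidth=1.0):
    """records: iterable of (illness, lab_value) pairs.
    Returns class priors and one density estimate per illness."""
    by_illness = defaultdict(list)
    for illness, value in records:
        by_illness[illness].append(value)
    total = sum(len(values) for values in by_illness.values())
    priors = {k: len(v) / total for k, v in by_illness.items()}
    densities = {k: gaussian_kde(v, bandwidth) for k, v in by_illness.items()}
    return priors, densities


def posterior(x, priors, densities):
    """Bayes' rule: P(illness | x) is proportional to p(x | illness) * P(illness)."""
    joint = {k: densities[k](x) * priors[k] for k in priors}
    evidence = sum(joint.values())
    if evidence == 0.0:
        # x is far from all observed data; fall back to the priors
        return dict(priors)
    return {k: v / evidence for k, v in joint.items()}


# Hypothetical toy data: (illness label, value of one lab variable)
records = [
    ("A", 4.8), ("A", 5.1), ("A", 5.3),
    ("B", 9.7), ("B", 10.2), ("B", 10.4),
]
priors, densities = fit(records, bandwidth=0.5)
post = posterior(5.0, priors, densities)
prediction = max(post, key=post.get)
```

Here `prediction` is the illness with the highest posterior probability for a new lab value of 5.0. A naive baseline of the kind the abstract mentions could, under the same assumptions, simply predict the illness with the largest prior.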