Dose-Informed Multiple Imputation: Incorporating Cumulative Drug Exposure into 3D-MICE for Longitudinal Clinical Trial Data

Authors

Volavkova, Tereza; Pedersen, Sandra Søborg

Term

4th semester

Education

Data Science and Machine Learning, MSc

Publication year

2026

Abstract

This thesis investigates how to best impute missing end-of-study (EoS) outcomes in longitudinal clinical trials, where participants frequently drop out and create a monotone missing data pattern that threatens valid estimation of treatment effects. The focus is on dose-escalation trials, in which both outcomes and cumulative drug exposure evolve over time, and where regulatory and pharmacokinetic frameworks suggest cumulative dose is an important predictor of EoS values. The current state-of-the-art, Three-Dimensional Multiple Imputation by Chained Equations (3D-MICE), combines cross-sectional information and longitudinal structure via Multiple Imputation by Chained Equations and a time-indexed Gaussian process, but does not explicitly exploit the trajectory of cumulative dose. To address this, the project develops four extensions of 3D-MICE that incorporate cumulative dose into the Gaussian process component in different ways: 3D-MICE-DOSE (replacing the time index with cumulative dose), 4D-MICE-2GP (fitting separate processes over time and dose and then pooling), 4D-MICE-PREC (combining the two via precision-weighting), and 4D-MICE-PROD (jointly modeling time and dose using a product kernel). All five methods are implemented and evaluated on two synthetic and two real clinical trial datasets under a range of simulated dropout mechanisms, including missing completely at random and missing at random driven by cumulative dose or baseline BMI, with varying dropout timings and proportions. Performance is assessed using RMSE, bias, and empirical coverage of 95% prediction intervals together with interval width. 
The results show that no extension achieves a clear overall advantage over baseline 3D-MICE; 4D-MICE-2GP provides the most consistent setting-specific gains with up to roughly 20% reduction in RMSE, but analysis indicates these improvements largely arise from structurally increasing the weight of the Gaussian process rather than from genuine dose-trajectory information, a conclusion supported by the near-baseline performance of 4D-MICE-PROD across settings. 3D-MICE-DOSE generally performs worst, particularly for placebo subjects where cumulative dose carries little information. Across methods, the 95% prediction intervals exhibit a strong temporal pattern of miscalibration: under early dropout, coverage often falls well below the nominal level and coincides with positive bias, while late dropout tends to show mild negative bias and, in some cases, again sub-nominal coverage. Overall, the findings suggest that cumulative dose does not necessarily add substantial signal beyond elapsed time in this context and that prediction intervals from 3D-MICE-based imputation methods should be empirically checked for calibration before being used to support regulatory or clinical decision-making.
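As a rough illustration of two of the combination strategies named in the abstract, the sketch below shows inverse-variance (precision) weighting of two Gaussian predictions and a product kernel over time and dose. It is not the thesis implementation: the function names, the choice of a squared-exponential kernel, and the length-scale parameters are assumptions made here for illustration only.

```python
import math


def precision_weighted_pool(mean_time, var_time, mean_dose, var_dose):
    """Combine two Gaussian predictions by inverse-variance weighting.

    Hypothetical helper: mean_time/var_time stand for a prediction from a
    process indexed by time, mean_dose/var_dose for one indexed by dose.
    """
    if var_time <= 0 or var_dose <= 0:
        raise ValueError("variances must be positive")
    prec_time = 1.0 / var_time
    prec_dose = 1.0 / var_dose
    pooled_var = 1.0 / (prec_time + prec_dose)
    pooled_mean = pooled_var * (prec_time * mean_time + prec_dose * mean_dose)
    return pooled_mean, pooled_var


def rbf(x, y, length_scale):
    """Squared-exponential kernel (assumed here; the thesis may use another)."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def product_kernel(t1, d1, t2, d2, ls_time, ls_dose):
    """Kernel over (time, dose) pairs as a product of two one-dimensional kernels."""
    return rbf(t1, t2, ls_time) * rbf(d1, d2, ls_dose)


if __name__ == "__main__":
    # Two equally uncertain predictions: the pooled mean is their average
    # and the pooled variance is halved.
    print(precision_weighted_pool(10.0, 4.0, 14.0, 4.0))
    # Covariance decays when either time or dose differs.
    print(product_kernel(0.0, 0.0, 1.0, 5.0, ls_time=2.0, ls_dose=10.0))
```

Note that the pooled variance is always smaller than either input variance, so a pooled prediction is treated as more certain than each of its components.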

This thesis investigates how best to impute missing end-of-study (EoS) measurements in longitudinal clinical trials, where participants often drop out along the way and leave a monotone missingness pattern that complicates correct estimation of treatment effects. Particular focus is placed on dose-escalation trials, in which both outcomes and cumulative drug dose evolve over time, and where regulatory and pharmacokinetic frameworks point to cumulative dose as an important explanatory variable. The existing method 3D-MICE combines cross-sectional information and longitudinal structure via Multiple Imputation by Chained Equations and a time-indexed Gaussian process, but does not explicitly exploit the dose trajectory. The project therefore formulates four extensions of 3D-MICE that incorporate cumulative dose into the Gaussian process in different ways: 3D-MICE-DOSE (time is replaced by cumulative dose), 4D-MICE-2GP (separate processes for time and dose that are subsequently pooled), 4D-MICE-PREC (combination via precision weighting), and 4D-MICE-PROD (a joint model for time and dose via a product kernel). All five methods are implemented and tested on two synthetic and two real datasets under different scenarios of simulated dropout, including both missing completely at random and missing at random based on cumulative dose or baseline BMI, with varying dropout timing and extent. Performance is assessed using RMSE, bias, and empirical coverage of 95% prediction intervals together with their width. The results show that no extension consistently outperforms the original 3D-MICE; 4D-MICE-2GP yields the most robust scenario-specific improvements, with up to about 20% lower RMSE, but the analysis suggests that the gains are primarily due to a structural up-weighting of the Gaussian process rather than genuine use of the dose trajectory, which is supported by 4D-MICE-PROD lying close to baseline across scenarios. 3D-MICE-DOSE generally performs worst, especially for placebo, where cumulative dose is uninformative.
Across methods, a marked time-dependent miscalibration of the 95% prediction intervals is observed: under early dropout, coverage often lies considerably below the nominal level and is accompanied by positive bias, whereas late dropout shows slight negative bias and, in some cases, again too-low coverage. Overall, the results indicate that cumulative dose does not necessarily contribute substantial additional signal beyond time in this context, and that prediction intervals from 3D-MICE-based methods should be subjected to empirical calibration before they are used in regulatory or clinical decisions.
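The evaluation criteria named in the abstract (RMSE, bias, empirical coverage of prediction intervals, and interval width) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, bias is taken as predicted minus true value (the thesis may use the opposite sign convention), and the numbers in the usage example are made up purely to exercise the code.

```python
import math


def evaluate_imputations(truth, pred_mean, lower, upper):
    """Summarise imputation quality for held-out end-of-study values.

    truth      -- true values that were masked by simulated dropout
    pred_mean  -- imputed point predictions
    lower/upper -- bounds of the (e.g. 95%) prediction intervals
    """
    n = len(truth)
    if n == 0 or not (len(pred_mean) == len(lower) == len(upper) == n):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed errors: positive means the imputation overshoots the truth.
    errors = [p - t for p, t in zip(pred_mean, truth)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    # Share of true values that fall inside their prediction interval.
    coverage = sum(1 for t, lo, hi in zip(truth, lower, upper) if lo <= t <= hi) / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n

    return {"rmse": rmse, "bias": bias, "coverage": coverage, "width": mean_width}


if __name__ == "__main__":
    # Illustrative, made-up values: the last true value lies outside its interval.
    result = evaluate_imputations(
        truth=[1.0, 2.0, 3.0, 10.0],
        pred_mean=[1.0, 2.0, 3.0, 4.0],
        lower=[0.0, 1.0, 2.0, 3.0],
        upper=[2.0, 3.0, 4.0, 5.0],
    )
    print(result)
```

Comparing the empirical coverage with the nominal level (0.95 for 95% intervals) gives a simple calibration check; coverage well below nominal indicates intervals that are too narrow or centred on biased predictions.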

[This abstract has been generated with the help of AI directly from the project full text]

A master's thesis from Aalborg University
