Autoencoder techniques for survival analysis on renal cell carcinoma

Author

Sanz Ilundain, Iñigo

Term

4th term

Education

Computer Science (IT), Master

Publication year

2024

Submitted on

2024-06-07

Abstract

Survival analysis plays a central role in the study of diseases: it provides statistical methods and metrics for analyzing time-to-event data, which is crucial for understanding disease progression and the effectiveness of treatments. However, medical data are often high-dimensional, which complicates fitting such regression models. For this reason, this work focuses on compressing the high-dimensional transcriptomic data of patients treated with an immunotherapy combination (avelumab + axitinib) or a tyrosine kinase inhibitor (TKI; sunitinib) into meaningful latent features using autoencoders. We then applied a statistical methodology based on the Cox Proportional Hazards model, a semi-parametric approach, combined with Breslow's estimator to determine the patients' survival functions and predict each patient's Progression-Free Survival (PFS). We extensively analyzed different autoencoder penalties as well as their combinations. Because of the nature of transcriptomic data, we extended the model to accept not only tabular data but also a graph variant in which edges connect genes whose protein products interact (protein-protein interactions); this proved to be the more meaningful approach. Finally, since neural networks, and autoencoders in particular, are often seen as black boxes, we addressed interpretability by computing the mutual information between the genes in the original data and the latent features. This approach aims to clarify which genes are most strongly represented in each latent variable. Our results show that different types of autoencoders suit different goals. Denoising autoencoders are useful for obtaining accurate reconstructions, while the sparse variant is the best option for finding meaningful representations of the data. Moreover, these penalties can be combined to achieve both accurate reconstructions and meaningful latent features.
The interpretability analysis also suggested that genes such as LRP2 and ACE2 are strongly associated with renal cell carcinoma. We present this work as an extensive study demonstrating the usefulness of autoencoders for high-dimensional problems.
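The abstract contrasts denoising and sparse autoencoders and notes that their penalties can be combined. As a rough illustration, the sketch below computes one possible combined objective for a single-hidden-layer autoencoder: the input is corrupted by random masking (the denoising part), the network must reconstruct the clean input, and a KL-divergence sparsity penalty pushes the average activation of each hidden unit toward a small target rho. The architecture, the masking corruption, and the names `drop_prob`, `rho` and `beta` are illustrative assumptions, not the configuration used in the thesis. Only the loss is computed here; training would normally be done in a framework with automatic differentiation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    # Hidden activations h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def decode(h, V, c):
    # Linear decoder mapping latent features back to gene-expression space
    return [sum(v * hj for v, hj in zip(row, h)) + ci for row, ci in zip(V, c)]


def mask_noise(x, drop_prob, rng):
    # Denoising corruption (assumed): zero out each input gene with probability drop_prob
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def kl_sparsity(rho, rho_hat):
    # Sum over hidden units of KL(Bernoulli(rho) || Bernoulli(rho_hat_j))
    eps = 1e-8
    return sum(
        rho * math.log(rho / (r + eps)) + (1 - rho) * math.log((1 - rho) / (1 - r + eps))
        for r in rho_hat
    )


def combined_loss(batch, W, b, V, c, drop_prob=0.2, rho=0.05, beta=1.0, seed=0):
    """Denoising reconstruction error plus beta times the sparsity penalty (hypothetical setup)."""
    rng = random.Random(seed)
    recon, activations = 0.0, []
    for x in batch:
        h = encode(mask_noise(x, drop_prob, rng), W, b)
        x_hat = decode(h, V, c)
        # Compare against the *clean* input, not the corrupted one
        recon += sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat)) / len(x)
        activations.append(h)
    recon /= len(batch)
    # Average activation of each hidden unit over the batch
    rho_hat = [sum(h[j] for h in activations) / len(batch) for j in range(len(W))]
    return recon + beta * kl_sparsity(rho, rho_hat)
```

Setting `beta=0` recovers a plain denoising autoencoder, and setting `drop_prob=0` recovers a plain sparse autoencoder, which is one way to read "these penalties can be combined".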
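The survival step pairs a Cox Proportional Hazards model, h(t | x) = h0(t) exp(beta . x), with Breslow's estimator of the baseline cumulative hazard, H0(t) = sum over event times t_i <= t of d_i / sum_{j in R(t_i)} exp(beta . x_j), where d_i is the number of events at t_i and R(t_i) is the set of patients still at risk. Each patient's survival curve is then S(t | x) = exp(-H0(t) exp(beta . x)). The minimal sketch below assumes the linear predictors beta . x (for example computed from the autoencoder's latent features) are already available; fitting beta is omitted. Summarizing the curve by its median as a point prediction of PFS is one common choice and an assumption here, not necessarily how the thesis derives each patient's PFS.

```python
import math
from bisect import bisect_right


def breslow_baseline(times, events, risk_scores):
    """Breslow estimate of the baseline cumulative hazard H0(t).

    times: observed follow-up times (e.g. PFS in months)
    events: 1 if progression was observed, 0 if the patient was censored
    risk_scores: linear predictors eta_i = beta . x_i from an already fitted Cox model
    Returns the sorted distinct event times and H0 evaluated at each of them.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    cum_hazard, total = [], 0.0
    for t in event_times:
        # d: number of events tied at time t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # Risk set R(t): patients whose follow-up time is at least t
        denom = sum(math.exp(eta) for ti, eta in zip(times, risk_scores) if ti >= t)
        total += d / denom
        cum_hazard.append(total)
    return event_times, cum_hazard


def survival_at(event_times, cum_hazard, eta, t):
    """S(t | x) = exp(-H0(t) * exp(eta)); H0 is a step function, zero before the first event."""
    idx = bisect_right(event_times, t)
    h0 = cum_hazard[idx - 1] if idx > 0 else 0.0
    return math.exp(-h0 * math.exp(eta))


def median_survival_time(event_times, cum_hazard, eta):
    """First event time at which predicted survival drops to 0.5 or below (None if never)."""
    for t in event_times:
        if survival_at(event_times, cum_hazard, eta, t) <= 0.5:
            return t
    return None
```

Because the baseline hazard is shared, patients with a larger linear predictor eta get uniformly lower survival curves, which is the proportional-hazards assumption at work.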
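For the interpretability step, the abstract describes measuring the mutual information between the original genes and the latent features. The sketch below uses a simple plug-in estimator over equal-width histograms to score how much information each gene's expression shares with one latent feature across patients, and then ranks the genes. The binning estimator, the bin count, and the helper names (including the placeholder gene names in the usage) are assumptions for illustration; k-nearest-neighbour estimators such as scikit-learn's `mutual_info_regression` are a common alternative for continuous data, and the thesis may use a different one.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    # Equal-width binning; a constant vector falls entirely into bin 0
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Plug-in estimate of I(X; Y) in nats from equal-width histograms."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), count in pxy.items():
        p = count / n
        mi += p * math.log(p / ((px[a] / n) * (py[b] / n)))
    return mi


def top_genes_for_latent(expr, latent, gene_names, k=3, n_bins=4):
    """Rank genes by MI with one latent feature.

    expr: dict mapping gene name -> expression values (one per patient)
    latent: the latent feature's value for the same patients, in the same order
    """
    scores = [(g, mutual_information(expr[g], latent, n_bins)) for g in gene_names]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Repeating the ranking for every latent feature gives, per latent variable, the genes it most strongly represents, e.g. `top_genes_for_latent({"GENE_A": [...], "GENE_B": [...]}, latent_values, ["GENE_A", "GENE_B"])` with hypothetical gene names.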

Keywords

autoencoders ; renal cell carcinoma ; cox proportional hazards ; breslow ; machine learning

A master's thesis from Aalborg University