Enriching Clinical Sample Analysis With Pathway Knowledge Graphs and GNNs

Author

Shad Bakhsh, Fatemeh

Term

4th term

Education

Computer Science (IT), Master

Publication year

2024

Submitted on

2024-06-10

Abstract

Biological research often struggles to analyze protein data because sample sizes are small, which limits the use of traditional statistical methods. Integrating knowledge from large graph databases such as Reactome and UniProt can offer valuable insights, but doing so requires techniques that scale to graphs of that size. This thesis proposes Cluster-GAE, a novel method that combines graph sampling with Graph Neural Networks (GNNs) to learn informative representations from large-scale biological networks. By adapting the Cluster-GCN algorithm to graph representation learning, Cluster-GAE addresses the computational challenges of analyzing large graphs while preserving crucial structural information. Our evaluation, which compares Cluster-GAE with Random Walk, Forest Fire, and No Sampling methods, demonstrates its superior performance in preserving graph structure and generating meaningful protein embeddings. Through t-SNE visualization and functional enrichment analysis, we show that Cluster-GAE identifies distinct protein clusters and reveals over-represented biological pathways, potentially leading to the discovery of novel biological mechanisms. This thesis establishes a robust framework for analyzing biological samples with limited data, enhancing the interpretability of protein data analysis and opening new possibilities for biological discovery.
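
To make the cluster-based sampling idea concrete, the following is a minimal sketch of how Cluster-GCN-style mini-batches can be formed from a partitioned graph. It is not the thesis implementation: the function name `cluster_batches`, the toy protein identifiers, and the hand-written cluster assignment are all assumptions made for illustration (Cluster-GCN itself obtains the partition from a graph clustering algorithm such as METIS), and the autoencoder that would be trained on each batch is omitted.

```python
import random
from collections import defaultdict


def cluster_batches(edges, cluster_of, clusters_per_batch, seed=0):
    """Yield (nodes, induced_edges) mini-batches from a partitioned graph.

    `cluster_of` maps each node to a cluster id. It is assumed to be given
    here; a real pipeline would compute it with a graph partitioner.
    """
    # Group nodes by their cluster id.
    members = defaultdict(set)
    for node, cluster_id in cluster_of.items():
        members[cluster_id].add(node)

    # Shuffle the clusters so each pass groups them differently.
    cluster_ids = sorted(members)
    random.Random(seed).shuffle(cluster_ids)

    for start in range(0, len(cluster_ids), clusters_per_batch):
        chosen = cluster_ids[start:start + clusters_per_batch]
        nodes = set().union(*(members[c] for c in chosen))
        # Keep only edges with both endpoints inside the batch
        # (the induced subgraph); edges leaving the batch are dropped.
        induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
        yield nodes, induced


# Hypothetical toy network: two triangles joined by one bridging edge.
edges = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
    ("P4", "P5"), ("P5", "P6"), ("P4", "P6"),
    ("P3", "P4"),
]
cluster_of = {"P1": 0, "P2": 0, "P3": 0, "P4": 1, "P5": 1, "P6": 1}

single = list(cluster_batches(edges, cluster_of, clusters_per_batch=1))
merged = list(cluster_batches(edges, cluster_of, clusters_per_batch=2))
```

With one cluster per batch, the bridging edge between `P3` and `P4` falls outside both induced subgraphs; combining both clusters in a single batch recovers it. This is the trade-off that cluster-based sampling makes between batch size and how much of the graph structure each batch retains.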

A master's thesis from Aalborg University