A master's thesis from Aalborg University


Molecule Generation using Diffusion Models on Graphs

Term

4th term

Publication year

2024

Pages

59

Abstract

This project investigates how diffusion models can generate new molecules for de novo drug design by representing molecules as graphs and applying both continuous and categorical diffusion. We adopt the categorical approach of Hoogeboom, Nielsen, et al. (2021) for graph-based molecules and introduce a novel spatial graph representation that captures a molecule's internal coordinates via triplet and dihedral angles. Generated molecules are evaluated using the standard metrics of validity, uniqueness, and novelty, together with 3D-specific metrics such as potential energy and root-mean-square deviation (RMSD). The new representation is converted to a Z-matrix with a proposed algorithm, and Cartesian coordinates are then computed with the Natural Extension Reference Frame (NeRF) method to enable 3D evaluation. Our experiments show that the choice of noise schedule (linear vs. cosine) has limited impact on the results. For generation on simple graphs, categorical diffusion achieves higher validity than continuous diffusion, while continuous diffusion yields similar or slightly better uniqueness and novelty; both approaches outperform state-of-the-art baselines on these metrics. For 3D conformations, the generated structures are close to energy-minimized conformations in terms of RMSD but have high potential energies, which points to issues in the spatial graph representation and the conversion algorithm. Overall, the categorical approach is effective for graph representations, and internal coordinates show promise for producing sensible 3D conformations. In addition, choosing distributions that match the diffused variables improved results and reduced training time.

[This abstract has been generated with the help of AI directly from the project full text]