AAU Student Projects - visit Aalborg University's student projects portal
An executive master's programme thesis from Aalborg University
Book cover


Graph Clustering with an Emphasis on Algorithms Employing the Commuting Times Distance.

Authors

;

Term

4. term

Publication year

2013

Submitted on

Pages

95

Abstract

Grafer – samlinger af noder (punkter) forbundet af kanter (linjer) – er en enkel måde at repræsentere netværk på, så man kan fokusere på relationer. Sådan en visualisering kan afsløre mønstre, der er svære at se ellers. Grafklyngedannelse forsøger at finde grupper af noder, der er særlig tæt forbundne. Metoden bruges bl.a. i sociale netværk (f.eks. videnskabeligt samarbejde) og i biologi (f.eks. protein–protein-interaktionsnetværk). Afhandlingen fokuserer på et afstandsmål for grafer kaldet Commuting Times-afstanden. Det beregnes ud fra Moore–Penrose-pseudoinversen af grafens Laplacian-matrice, en lineær-algebraisk beskrivelse af, hvordan noderne er forbundet. Vi implementerer denne afstand og etablerer en ramme, hvor den kan kombineres med afstandsbaserede klynge-metoder – specifikt K-Medoids og hierarkisk klyngedannelse – for at klynge grafer. Vi afprøver disse metoder sammen med Commuting Times-afstanden på flere computergenererede grafer, på to datasæt med kendt klyngestruktur og på ét datasæt med ukendt struktur. Resultaterne sammenlignes med den klassiske Girvan-Newman-algoritme på de samme datasæt.

Graphs—collections of nodes (points) connected by edges (lines)—provide a simple way to represent networks and focus on relationships. Such visualizations can reveal patterns that are hard to see otherwise. Graph clustering aims to find groups of nodes that are especially tightly linked. This approach is used in areas like social networks (e.g., scientific collaboration) and biology (e.g., protein–protein interaction networks). This thesis focuses on a graph distance called the Commuting Times distance. It is computed from the Moore–Penrose pseudoinverse of the graph Laplacian, a linear-algebra summary of how nodes are connected. We implement this distance and set up a framework where it can be combined with distance-based clustering methods—specifically K-Medoids and hierarchical clustering—to group nodes in graphs. We evaluate these methods together with the Commuting Times distance on several computer-generated graphs, on two datasets with known cluster structure, and on one dataset with unknown structure. We also compare the results with those of the classic Girvan-Newman algorithm on the same datasets.

[This abstract was generated with the help of AI]