Graph Clustering with an Emphasis on Algorithms Employing the Commuting Times Distance.
Student thesis: Master programme thesis
- Andrew Bernard Lannie
- Rodney Michael Miles
4. term, Master of IT (Continuing education) (Continuing Education Programme (Master))
Representing networks of various kinds as graphs has long proved useful. The intention is to simplify the data in order to focus on the relations between objects in the network, which are represented by edges joining the corresponding vertices. This geometric visualization of the reduced data may reveal patterns that would not be seen in other representations.
One increasingly used technique is Graph Clustering, in which an attempt is made to find groups of vertices with especially tight bonds. Graph Clustering methods have been used successfully in a number of areas, perhaps especially in the fields of social networks (e.g. networks of scientific collaboration) and biology (e.g. protein-protein interaction networks).
The Laplacian matrix of a graph has been found, in various ways, to have a strong relationship with the clusters in that graph. We implement a distance function, the Commuting Times distance, which is derived directly from the Moore-Penrose pseudoinverse of the Laplacian matrix, and we create an environment in which it can be applied together with distance-based clustering methods to cluster graphs. The clustering methods employed are the K-Medoids algorithm and Hierarchical Clustering techniques.
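As an illustration of the idea, and not the thesis implementation, the following minimal sketch computes commute times from the pseudoinverse of the Laplacian and feeds them to a basic K-Medoids loop. It assumes an undirected, connected graph given as a symmetric adjacency matrix and uses the common formula C(i, j) = vol(G) (L+_ii + L+_jj - 2 L+_ij), where vol(G) is the sum of all vertex degrees. The function names `commute_time_distance` and `k_medoids` and the caller-supplied initial medoids are assumptions made for the example. Some authors take the square root of C(i, j) as the distance, which is not done here.

```python
import numpy as np


def commute_time_distance(adjacency):
    """Pairwise commute times for an undirected, connected graph.

    `adjacency` is a symmetric (n x n) matrix of non-negative edge weights.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A           # L = D - A
    l_plus = np.linalg.pinv(laplacian)         # Moore-Penrose pseudoinverse
    volume = degrees.sum()                     # vol(G) = sum of degrees
    diag = np.diag(l_plus)
    # C(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_plus)


def k_medoids(dist, initial_medoids, max_iter=100):
    """Simple alternating K-Medoids on a precomputed distance matrix.

    Illustrative only: the initial medoids are supplied by the caller and
    no attempt is made to escape local optima.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        # Assign every vertex to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids


# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

D = commute_time_distance(A)
labels, medoids = k_medoids(D, initial_medoids=[0, 3])
print(labels)
```

On this small graph the two triangles should come out as the two clusters, because a random walk crosses the single bridging edge only rarely and so commute times across it are large. The same distance matrix could equally be passed to a hierarchical clustering routine.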
We test these algorithms together with the Commuting Times distance on a number of computer-generated graphs, on two datasets where the cluster structure is well known and on one dataset where it is unknown.
We compare our results with the results of the classic Girvan-Newman algorithm on the same datasets.
Specialisation | Software Construction |
---|---|
Language | English |
Publication date | 6 Jun 2013 |
Number of pages | 95 |