'Practical Data Mining on a Swiss Flora Database and Geographical Clustering Software Implementation'

Authors

Centeno, Óliver ; Mediano, Javier T.

Term

4. term

Education

Computer Science, Master

Publication year

2006

Abstract

This thesis explores data mining with a focus on clustering, that is, methods that group similar data points. We describe partition-based techniques, such as k-means, and probabilistic approaches, such as Naïve Bayes combined with Expectation-Maximization (EM). We implemented k-means, its trimmed variant (which reduces the influence of outliers), and Naïve Bayes with EM for clustering, and compared their results on a provided database. We also built a tool that applies these techniques and displays the clustering results on a map of the geographical area from which the data originate.


[This abstract has been rewritten with the help of AI based on the project's original abstract]
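
As an illustration of the trimmed k-means variant mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes squared Euclidean distance, a fixed fraction of points trimmed per iteration, and points given as tuples of floats. The names trimmed_kmeans, trim_fraction, and initial_centroids are hypothetical and chosen for this example only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, k, trim_fraction=0.1, iterations=50,
                   initial_centroids=None, seed=0):
    """Return (centroids, labels); a label of -1 marks a trimmed point.

    Assumption for this sketch: in every iteration the points farthest
    from their nearest centroid are excluded from the centroid update.
    """
    rng = random.Random(seed)
    if initial_centroids is None:
        centroids = rng.sample(points, k)
    else:
        centroids = list(initial_centroids)
    n_keep = len(points) - int(trim_fraction * len(points))
    labels = [-1] * len(points)

    for _ in range(iterations):
        # Assignment step: distance to and index of the nearest centroid.
        nearest = []
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            j = min(range(k), key=dists.__getitem__)
            nearest.append((dists[j], j))

        # Trimming step: keep only the n_keep points closest to a centroid.
        order = sorted(range(len(points)), key=lambda i: nearest[i][0])
        kept = set(order[:n_keep])
        new_labels = [nearest[i][1] if i in kept else -1
                      for i in range(len(points))]

        # Update step: each centroid becomes the mean of its kept members.
        new_centroids = []
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if new_labels[i] == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids

    return centroids, labels
```

Setting trim_fraction to 0 reduces the sketch to ordinary k-means. With a positive value, a distant outlier tends to be labelled -1 and no longer pulls a centroid toward itself, although the result still depends on the initial centroids.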


A master's thesis from Aalborg University
