Multi-dimensional Classification - Data Mining using Data Cubes

Student project: Master's thesis (incl. HD graduation project)

  • Peter Jensen
4th semester, Computer Science, Master (Master's programme)
This thesis deals with the use of data mining on data structured as in a data warehouse, also known as multi-dimensional data. The theory of data warehouses is investigated in order to understand how data is structured in them. Then a data set covering product sales and customer payments is analysed. This analysis has two goals: one is to create a multi-dimensional design; the other, and more important, is to gain experience in creating such designs and thereby understand their structure better. Next, we attempt to analyse the multi-dimensional data using a traditional data mining tool, Clementine. The aim of this analysis is to discover weaknesses in traditional data mining tools when dealing with multi-dimensional data. We then propose a way to analyse multi-dimensional data in general, and we propose changes to decision tree induction algorithms so that they make better use of the multi-dimensional structure. Finally, we evaluate the proposed way of analysing multi-dimensional data using a prototype of a graphical interface, and we analyse some of the proposed changes to decision tree induction using the data set we have been working with.
Language: English
Publication date: Aug. 2003
ID: 61058494