Unsupervised Feature Subset Selection

Student thesis: Master's thesis and HD final project

  • Nicolaj Søndberg-Madsen
  • Casper Thomsen
4th semester, Computer Science, Master's (Master's Programme)
This master's thesis has been developed in the domain of Decision Support Systems and covers the sparsely researched area of unsupervised feature subset selection for data clustering. In the report we discuss what characterizes features that are relevant for data clustering, and we propose new relevance score measures capable of ranking the features by their relevance. Combined with a threshold, the relevance scores can be used in a filter approach where the uninformative features are discarded. The report proposes two methods for setting the threshold, and the score measures are tested empirically on 3 synthetic and 4 real-world data sets. In a second step we propose using the relevance rankings in a hybrid approach to unsupervised feature subset selection. This method allows us to perform unsupervised feature subset selection with fewer model inductions than ordinary wrapper approaches. Empirical tests show that both the filter and the hybrid approaches perform satisfactorily.
Language: English
Publication date: June 2003
ID: 61058307