A master's thesis from Aalborg University

E-mail Categorization

Author(s)

Term

4th term

Education

Publication year

2013

Submitted on

2013-10-02

Pages

78 pages

Abstract

E-mail has become the most important communication medium not only between individuals but also between companies and other organizations, and it is now embedded in almost every aspect of everyday activity. E-mails are not just plain text; they can also carry various kinds of attachments. They can be archived to form a powerful, non-volatile source of knowledge, and in some cases they can even constitute clear evidence in trials. Keeping mailboxes in a structured form is a challenging task. When the volume of incoming and outgoing correspondence is low the task is relatively easy, but as the volume grows the problem becomes more complicated and its solutions more time-consuming. The process can be improved in a few ways. Most mailboxes offer helper options, such as Journaling, that address and automate it at a basic level. However, achieving a really good level of organization requires external tools such as machine learning methods. In this project, four machine learning algorithms have been implemented and their performance examined. Two additional methods, based on combining the results of the single classifiers, have also been implemented. Finally, all methods have been compared, and the one that gives the greatest improvement to the e-mail classification process has been chosen. This master's thesis uses a part of the Enron e-mail collection [1] for the training and testing phases. The best result, an F-measure of 0.7102, was achieved by a combination of single classifiers. The topics elaborated in the thesis, in both the text and the software part, give the reader substantial knowledge of Information Retrieval, Machine Learning and related topics.
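The abstract does not say which combination methods or which variant of the F-measure the thesis uses. As a rough illustration only, the sketch below combines the folder predictions of three hypothetical classifiers by majority vote and scores the result with a macro-averaged F-measure. The voting rule, the macro averaging, the folder labels and all function names are assumptions made for this example, not details taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote.

    Assumption: ties go to the label proposed by the earliest classifier.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # First label (in classifier order) that reaches the top vote count.
        winner = next(label for label in votes if counts[label] == best)
        combined.append(winner)
    return combined


def f_measure(true_labels, predicted_labels, label):
    """F-measure (harmonic mean of precision and recall) for one folder label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(true_labels, predicted_labels):
    """Unweighted mean of the per-label F-measures over the true labels."""
    labels = sorted(set(true_labels))
    return sum(f_measure(true_labels, predicted_labels, l) for l in labels) / len(labels)


# Hypothetical folder labels and predictions, for illustration only.
truth = ["work", "personal", "work", "spam", "work", "personal"]
clf_a = ["work", "personal", "spam", "spam", "work", "work"]
clf_b = ["work", "work", "work", "spam", "personal", "personal"]
clf_c = ["personal", "personal", "work", "work", "work", "personal"]

combined = majority_vote([clf_a, clf_b, clf_c])
print(combined)
print(macro_f_measure(truth, clf_a), macro_f_measure(truth, combined))
```

In this toy data each classifier makes two mistakes on different e-mails, so the vote corrects all of them. Real classifiers tend to make correlated errors, and a combination gains far less in that case.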

