Term
4th term
Publication year
2021
Submitted on
2021-06-10
Pages
32 pages
Abstract
Topic models uncover the underlying topics in a set of documents, and integrating metadata into them can improve their performance. We introduce models that extend latent Dirichlet allocation (LDA) with author and category metadata, and a model that integrates taxonomy metadata into the Pachinko Allocation Model (PAM). Our author-topic and category-topic models are modified versions of the existing author-topic model, while the taxonomy-topic model builds on PAM. To incorporate the metadata into PAM, we introduce a novel topic-locking mechanism. On a news article dataset, the taxonomy-topic model integrates the metadata well and reduces the elapsed time compared with the original PAM. It also achieves higher topic coherence and produces more understandable topics than LDA. Overall, our results show that integrating metadata can improve topic modeling in several ways.
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University.