A master's thesis from Aalborg University

Integrating News Article Metadata into Topic Models

Authors

; ;

Term

4th term

Publication year

2021

Submitted on

Pages

32

Abstract

Topic models are used to find the underlying topics in a set of documents, and integrating metadata into them can improve their performance. We introduce models that extend latent Dirichlet allocation (LDA) with author and category metadata, and a model that integrates taxonomy metadata into the Pachinko Allocation Model (PAM). The author-topic and category-topic models are based on the author-topic model, with modifications, and the taxonomy-topic model is based on PAM. To make PAM incorporate the metadata, we create a novel topic locking mechanism. The results show that, on a news article dataset, our taxonomy-topic model integrates the metadata well and improves on the elapsed time of the original PAM. The taxonomy-topic model also has higher topic coherence and more understandable topics than LDA. Overall, our results show that integrating metadata can improve topic modeling in several ways.