Integrating News Article Metadata into Topic Models

Student thesis: Master Thesis and HD Thesis

  • Rasmus Engesgaard Christensen
  • Dennis Højbjerg Rose
  • Peter Langballe Erichsen
4. term, Computer Science, Master (Master Programme)
Topic models are used to find the underlying topics in a set of documents, and integrating metadata into them can improve their performance. We introduce models that extend latent Dirichlet allocation (LDA) with author and category metadata, and a model that integrates taxonomy metadata into the Pachinko Allocation Model (PAM). The author-topic and category-topic models are modified versions of the existing author-topic model, and the taxonomy-topic model is based on PAM. To make PAM include the metadata, we create a novel topic locking mechanism. The results show that, for a news article dataset, our taxonomy-topic model integrates the metadata well and reduces the elapsed time compared to the original PAM. The taxonomy-topic model also has higher topic coherence and more understandable topics than LDA. Overall, our results show that integrating metadata can improve topic modeling in several ways.
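To illustrate how author metadata can enter a topic model, the following is a minimal sketch of the generative process usually described for the standard author-topic model: each word is produced by picking one of the document's authors uniformly at random, then a topic from that author's topic distribution, then a word from that topic's word distribution. It does not reproduce the modified models of this thesis. The vocabulary, the author names, the probabilities, and the helper functions `sample_categorical` and `generate_document` are all hypothetical and chosen only for illustration.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(authors, author_topic, topic_word, vocab, n_words, rng):
    """Generate n_words words following the author-topic generative process."""
    words = []
    for _ in range(n_words):
        # 1. Pick one of the document's authors uniformly at random.
        author = rng.choice(authors)
        # 2. Pick a topic from that author's topic distribution.
        topic = sample_categorical(author_topic[author], rng)
        # 3. Pick a word from that topic's word distribution.
        word_id = sample_categorical(topic_word[topic], rng)
        words.append(vocab[word_id])
    return words


# Hypothetical toy parameters (assumptions, not values from the thesis).
vocab = ["election", "minister", "goal", "match"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: politics words only
    [0.0, 0.0, 0.5, 0.5],  # topic 1: sport words only
]
author_topic = {
    "political_reporter": [1.0, 0.0],  # writes only about topic 0
    "sports_reporter": [0.0, 1.0],     # writes only about topic 1
}

rng = random.Random(0)
doc = generate_document(
    ["sports_reporter"], author_topic, topic_word, vocab, n_words=8, rng=rng
)
print(doc)
```

Because the hypothetical sports reporter puts all probability mass on the sport topic, every generated word comes from that topic's vocabulary. In practice the distributions are not fixed like this but are inferred from the articles and their metadata.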
Language: English
Publication date: 10 Jun 2021
Number of pages: 32
ID: 414400427