Integrating News Article Metadata into Topic Models

Student thesis: Master Thesis and HD Thesis

  • Rasmus Engesgaard Christensen
  • Dennis Højbjerg Rose
  • Peter Langballe Erichsen
4th term, Computer Science, Master (Master Programme)
Topic models are used to find underlying topics in a set of documents, and integrating metadata into them can improve their performance. We introduce models that extend latent Dirichlet allocation (LDA) with author and category metadata, and a model that integrates taxonomy metadata into the Pachinko Allocation Model (PAM). The author-topic and category-topic models are modified versions of the author-topic model, and the taxonomy-topic model is based on PAM. To incorporate the metadata into PAM, we introduce a novel topic locking mechanism. On a news article dataset, our taxonomy-topic model integrates the metadata well and reduces the elapsed time compared with the original PAM. It also achieves higher topic coherence and more understandable topics than LDA. Overall, our results show that integrating metadata can improve topic modeling in several ways.
Publication date: 10 Jun 2021
Number of pages: 32
ID: 414400427