Integrating News Article Metadata into Topic Models
Student thesis: Master Thesis and HD Thesis
- Rasmus Engesgaard Christensen
- Dennis Højbjerg Rose
- Peter Langballe Erichsen
4th term, Computer Science, Master's Programme
Topic models are used to discover the latent topics in a collection of documents, and integrating metadata into them can improve their performance. We introduce models that extend latent Dirichlet allocation (LDA) with author and category metadata, as well as a model that integrates taxonomy metadata into the Pachinko Allocation Model (PAM). The author-topic and category-topic models are modified versions of the author-topic model, and the taxonomy-topic model is based on PAM. To incorporate the metadata into PAM, we introduce a novel topic locking mechanism. On a news article dataset, the results show that our taxonomy-topic model integrates the metadata well and reduces the elapsed time compared with the original PAM. The taxonomy-topic model also achieves higher topic coherence and produces more understandable topics than LDA. Overall, our results show that integrating metadata can improve topic modeling in several ways.
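The abstract does not describe how the models are trained. As a rough illustration of how the author-topic model, on which the author-topic and category-topic models are based, ties metadata to topics, here is a minimal collapsed Gibbs sampler in Python. It assumes the standard author-topic formulation (each word token is assigned an author drawn from its document's author list and a topic drawn from that author's topic distribution), symmetric priors `alpha` and `beta`, and NumPy. The function name and parameters are hypothetical and this is not the thesis's implementation; passing category ids in place of author ids gives a simple category-topic variant, but the thesis's own modifications are not specified in the abstract.

```python
import numpy as np

def author_topic_gibbs(docs, doc_authors, n_authors, n_topics, vocab_size,
                       alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical collapsed Gibbs sampler for the author-topic model.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    doc_authors: per-document non-empty list of distinct author (or category) ids.
    Returns theta (author x topic) and phi (topic x word) estimates.
    """
    rng = np.random.default_rng(seed)
    n_at = np.zeros((n_authors, n_topics))   # tokens assigned to (author, topic)
    n_tw = np.zeros((n_topics, vocab_size))  # tokens of word w assigned to topic
    n_t = np.zeros(n_topics)                 # tokens assigned to each topic
    n_a = np.zeros(n_authors)                # tokens assigned to each author

    # Random initialisation: each token gets one of its document's authors and a topic.
    assignments = []
    for d, words in enumerate(docs):
        doc_assign = []
        for w in words:
            a = rng.choice(doc_authors[d])
            t = rng.integers(n_topics)
            n_at[a, t] += 1; n_tw[t, w] += 1; n_t[t] += 1; n_a[a] += 1
            doc_assign.append((a, t))
        assignments.append(doc_assign)

    for _ in range(n_iter):
        for d, words in enumerate(docs):
            authors = np.asarray(doc_authors[d])
            for i, w in enumerate(words):
                a, t = assignments[d][i]
                # Remove the current token from all counts.
                n_at[a, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1; n_a[a] -= 1
                # Joint conditional over (author in this document, topic).
                p_topic = (n_at[authors] + alpha) / (n_a[authors][:, None] + n_topics * alpha)
                p_word = (n_tw[:, w] + beta) / (n_t + vocab_size * beta)
                p = (p_topic * p_word[None, :]).ravel()
                k = rng.choice(p.size, p=p / p.sum())
                a, t = authors[k // n_topics], k % n_topics
                # Add the token back with its new (author, topic) assignment.
                n_at[a, t] += 1; n_tw[t, w] += 1; n_t[t] += 1; n_a[a] += 1
                assignments[d][i] = (a, t)

    theta = (n_at + alpha) / (n_a[:, None] + n_topics * alpha)
    phi = (n_tw + beta) / (n_t[:, None] + vocab_size * beta)
    return theta, phi

# Toy usage: three documents, two authors, five word types.
docs = [[0, 1, 2, 1], [3, 4, 3, 4], [0, 2, 4]]
doc_authors = [[0], [1], [0, 1]]
theta, phi = author_topic_gibbs(docs, doc_authors, n_authors=2, n_topics=2,
                                vocab_size=5, n_iter=50)
print(theta.round(2))
```

Sampling the author and topic jointly is what separates this from plain LDA: if every document has its own single author, the update reduces to LDA's collapsed Gibbs step, with the author standing in for the document.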
| Field | Value |
|---|---|
| Language | English |
| Publication date | 10 Jun 2021 |
| Number of pages | 32 |