A master's thesis from Aalborg University

Twitter Data Mining

Author

Term

4th term

Publication year

2016

Submitted on

Pages

95

Abstract

This thesis explores how to leverage Twitter data through scalable sentiment analysis. The project aims to develop a concept that classifies streamed tweets by sentiment polarity while designing a Big Data architecture capable of processing Twitter streams on clusters. The methodology combines desktop research on state-of-the-art technologies with experimental concept development: a pipeline is built around the Twitter Streaming API, NoSQL storage (MongoDB), natural language processing (tokenization, normalization), and supervised machine learning using a Naïve Bayes classifier (Bernoulli model). Apache Spark Streaming supports continuous processing, and sentiment results are visualized as graphs and presented via a graphical user interface. The work includes analysis of tweet structure, feature extraction, training on explicitly labeled tweets, and examination of classifier accuracy. The thesis presents an architecture and proof-of-concept for real-time Twitter sentiment analysis and discusses limitations and future directions; detailed quantitative findings are not provided in the available excerpt.
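
The classification step named in the abstract (tokenization and normalization followed by a Bernoulli Naïve Bayes classifier) can be illustrated with a minimal, self-contained sketch. This is not the thesis's implementation: the function and class names (`normalize`, `BernoulliNaiveBayes`), the four-tweet training set, and the add-one smoothing are all assumptions made for illustration, and the streaming, MongoDB, and Spark parts of the pipeline are left out entirely.

```python
import math
import re


def normalize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, return the set of word tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # A set is enough: the Bernoulli model only uses presence or absence of a word.
    return set(re.findall(r"[a-z]+", text))


class BernoulliNaiveBayes:
    """Hypothetical minimal Bernoulli Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        docs = [normalize(t) for t in tweets]
        self.vocab = set().union(*docs)
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.word_prob = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # P(word present | class), smoothed so it is never exactly 0 or 1.
            self.word_prob[c] = {
                w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for w in self.vocab
            }
        return self

    def predict(self, tweet):
        tokens = normalize(tweet)

        def log_score(c):
            score = self.log_prior[c]
            # Every vocabulary word contributes, whether present or absent.
            for w in self.vocab:
                p = self.word_prob[c][w]
                score += math.log(p) if w in tokens else math.log(1.0 - p)
            return score

        return max(self.classes, key=log_score)


# Toy, hand-labeled training data (illustrative only, not from the thesis).
train_tweets = [
    "I love this phone, it is great",
    "What a great and happy day",
    "I hate this terrible service",
    "Awful day, I am so sad",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = BernoulliNaiveBayes().fit(train_tweets, train_labels)
print(model.predict("great phone, love it @bob http://x.co"))
print(model.predict("so sad, terrible service"))
```

In a setup like the one the abstract describes, a function such as `predict` would presumably be applied to each tweet in a stream batch rather than to single strings; how the thesis wires this into Spark Streaming is not stated in the abstract.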

[This abstract has been generated with the help of AI directly from the project full text]