A master's thesis from Aalborg University
A comparative analysis of different predictive analytics models in predicting cyberbullying

Term

4th semester

Publication year

2024

Abstract

Social media, especially the platform X (formerly Twitter), has blurred the boundaries of free speech and can enable harmful behavior such as cyberbullying. This thesis examines how to detect cyberbullying more effectively by comparing several machine learning models and by testing whether sentiment analysis and Psychosocial Safety Climate (PSC) principles make the models more effective. Using 2,000 tweets, we apply TextBlob for sentiment analysis and convert text into numbers with TF-IDF vectorization (a method that weights words by how informative they are). We train and test Multinomial Naive Bayes, Random Forests, XGBoost, and Support Vector Machines (SVM). Performance is assessed with 5-fold cross-validation, and model settings are fine-tuned with GridSearchCV (a systematic search for the best parameters). We compare accuracy (how often the model is right), precision (how often flagged posts are truly bullying), recall (how many bullying posts it finds), and F1 score (the balance of precision and recall). Results show that Random Forest and XGBoost achieve the highest overall scores, 0.761 and 0.740 respectively. Multinomial Naive Bayes is exceptionally fast, making it suitable for real-time use. Adding sentiment analysis improves detection by capturing emotional context, and PSC principles enhance effectiveness by incorporating features such as "number_negative_words" and "number_positive_words". Overall, the study highlights that combining machine learning with psychosocial theory strengthens cyberbullying detection. Model choice should match the application: Random Forest for a balance of performance and interpretability, XGBoost for high accuracy, and Multinomial Naive Bayes for efficiency. Future work should expand datasets, address privacy concerns, and add features such as social network analysis, while involving administrators and moderators to improve online safety in practice.
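The TF-IDF weighting and the PSC-inspired word-count features mentioned in the abstract can be sketched in plain Python. The toy "tweets" and the tiny positive/negative lexicons below are invented for illustration only; the thesis used TextBlob and a real lexicon on 2,000 tweets.

```python
import math

# Toy corpus standing in for tweets; illustrative only, not the thesis data.
docs = [
    "you are so stupid and ugly",
    "have a great day friend",
    "nobody likes you go away",
    "great game last night friend",
]

def tf_idf(term, doc, corpus):
    """Weight a term by its frequency in one document, discounted by how
    many documents contain it, so common words get a low weight."""
    words = doc.split()
    tf = words.count(term) / len(words)
    df = sum(1 for d in corpus if term in d.split())
    idf = math.log(len(corpus) / df)  # df > 0 whenever term occurs in doc
    return tf * idf

# "friend" appears in half the corpus, "stupid" in only one document,
# so "stupid" carries more information about its document.
print(tf_idf("stupid", docs[0], docs) > tf_idf("friend", docs[1], docs))  # → True

# Hypothetical mini-lexicons for the PSC-style count features named
# in the abstract ("number_negative_words", "number_positive_words").
NEGATIVE = {"stupid", "ugly", "nobody", "away"}
POSITIVE = {"great", "friend"}

def psc_features(doc):
    words = doc.split()
    return {
        "number_negative_words": sum(w in NEGATIVE for w in words),
        "number_positive_words": sum(w in POSITIVE for w in words),
    }

print(psc_features(docs[0]))  # → {'number_negative_words': 2, 'number_positive_words': 0}
```

These hand-rolled features would be concatenated with the TF-IDF matrix before training, which is how the word-count features can complement the purely lexical representation.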

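The tuning and evaluation loop described in the abstract (TF-IDF features, 5-fold cross-validation, GridSearchCV) can be sketched with scikit-learn. This is a minimal sketch assuming a pipeline with Multinomial Naive Bayes on an invented toy dataset, not the thesis's 2,000-tweet corpus, and it tunes only the smoothing parameter alpha for brevity.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# Invented toy data: 1 = cyberbullying, 0 = benign.
bullying = [
    "you are so stupid", "nobody likes you loser", "go away ugly freak",
    "you are worthless trash", "everyone hates you idiot",
    "shut up you pathetic loser", "you are a dumb failure",
    "disgusting freak nobody cares", "stupid ugly waste of space",
    "you idiot just quit",
]
benign = [
    "great game last night", "have a wonderful day",
    "congrats on the new job", "love this sunny weather",
    "thanks for the kind words", "happy birthday my friend",
    "what a beautiful photo", "see you at lunch tomorrow",
    "proud of your hard work", "enjoy the weekend everyone",
]
texts = bullying + benign
labels = [1] * len(bullying) + [0] * len(benign)

# TF-IDF vectorization feeding a Multinomial Naive Bayes classifier.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# GridSearchCV runs 5-fold cross-validation for every parameter
# combination and keeps the one with the best F1 score.
search = GridSearchCV(
    pipe,
    param_grid={"clf__alpha": [0.1, 0.5, 1.0]},
    cv=5,
    scoring="f1",
)
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping the classifier step for `RandomForestClassifier`, `xgboost.XGBClassifier`, or `SVC` and widening `param_grid` accordingly reproduces the kind of model comparison the abstract reports.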

[This abstract has been rewritten with the help of AI based on the project's original abstract]