An executive master's programme thesis from Aalborg University


Optimizing the Performance of Machine Learning Algorithms in Detecting Malicious Files using Hybrid Models

Term

4. semester

Publication year

2023

Submitted on

Pages

71

Abstract

The infiltration of digital systems using maliciously crafted files has been an evolving problem for the last two decades. Malicious actors deploy diverse payloads through files that have the potential to evade detection mechanisms and cause serious harm. Their universal file formats, support for advanced features such as JavaScript, and ability to embed additional files make Portable Document Format (PDF) and Portable Executable (PE) files an obvious choice for attackers to weaponize. This project explores the performance of different branches of machine learning in malware detection. Two datasets each for PDF and PE files are selected after an extensive review of the existing research. First, Gaussian Naive Bayes (GNB) and Logistic Regression (LR) are applied as classical algorithms. From ensemble classification, Random Forest (RF) is selected as a bagging method and Adaptive Boosting (AdaBoost) as a boosting method. Next, three variants of Artificial Neural Network (ANN) are deployed to improve detection. Finally, a novel hybrid approach integrating ANN and ensemble techniques is proposed for both PDF and PE files, and the hybrid model is found to outperform all the previous models. The hybrid model combining ANN with AdaBoost achieves an accuracy of 99.51% and an F1-score of 99.53% for malware detection in PDF files. Similarly, the hybrid approach reaches an accuracy of 98.45% and an F1-score of 98.95% for PE files.
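
The abstract reports both accuracy and F1-score, and for PE files the F1-score is higher than the accuracy. The minimal sketch below shows how the two metrics are conventionally computed from confusion-matrix counts and why they can differ. The function names and the counts are hypothetical, chosen only for illustration; they are not taken from the thesis or its datasets.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all files (malicious and benign) classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the malicious class.

    True negatives do not appear here, which is why F1 and accuracy
    can diverge when the classes are imbalanced.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT results from the thesis): a test set of
# 1000 files in which malicious samples are the majority class.
tp, fp, fn, tn = 950, 10, 5, 35

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

With these assumed counts the F1-score comes out above the accuracy, because the few errors weigh more heavily against the small benign class than against the large malicious class. Whether the same effect explains the reported PE figures depends on the class balance of the datasets used, which the abstract does not state.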