Network Intrusion Simulation: Creating Labeled Datasets with Attack Chain Analysis in an Emulated Environment

Author

Kjergaard, Jacob Norlyk

Term

4th semester

Education

Cyber Security, Master

Publication year

2024

Abstract

This thesis addresses alert fatigue in cybersecurity by enhancing Intrusion Detection Systems (IDS) through new labeled datasets derived from cyberattack simulations in an emulated network environment. The core research question is how to generate realistic traffic and fully annotated, ground-truth datasets that capture complete attack chains, so IDS and machine learning models can be trained more precisely and false positives can be reduced. The approach includes emulating a small enterprise network, producing synthetic yet realistic benign traffic, and executing detailed attack scenarios mapped to the Cyber Kill Chain (CKC) and enriched with MITRE ATT&CK TTPs. Data are collected via multi-source logging and packet capture (e.g., Wireshark), correlated into Chains of Events (CoE), and annotated with CKC stages and ground truth to create training-ready datasets. The work presents methods and architecture for dataset creation, compares attack simulation frameworks, and discusses practical challenges in traffic realism, limitations in tools such as Caldera, and emulation issues, alongside recommendations for future research. The goal is to support more precise IDS detection and thereby reduce alert fatigue in real-world settings.

This thesis addresses alert fatigue in cybersecurity by improving the capabilities of Intrusion Detection Systems (IDS) through new labeled datasets created from cyberattack simulations in an emulated network environment. The central research question is how to generate realistic traffic and fully annotated datasets with ground truth that reflect entire attack chains, so that IDS and machine learning models can be trained more precisely and false positives can be reduced. The approach comprises emulation of a small enterprise network, synthetic but realistic benign traffic, and detailed attack scenarios mapped to the Cyber Kill Chain (CKC) and enriched with MITRE ATT&CK TTPs. Data are collected via multi-source logging and packet capture (e.g., Wireshark), correlated into Chains of Events (CoE), and annotated with CKC stages and ground truth to create training-ready datasets. The work describes both the methodology and the architecture for dataset creation, compares attack simulation frameworks, and discusses practical challenges in producing realistic traffic, limitations of tools such as Caldera, and emulation problems, as well as recommendations for future research. The goal is to support more precise detection in IDS and thereby reduce alert fatigue in practice.

[This abstract has been generated with the help of AI directly from the project full text]

Keywords

IDS; Datasets; Attack Simulation; Cyber; Emulated; Network; Benign; Malicious

A master's thesis from Aalborg University
