Network Intrusion Simulation: Creating Labeled Datasets with Attack Chain Analysis in an Emulated Environment
Author
Kjergaard, Jacob Norlyk
Term
4th semester
Education
Publication year
2024
Pages
99
Abstract
This thesis addresses alert fatigue in cybersecurity by enhancing Intrusion Detection Systems (IDS) through new labeled datasets derived from cyberattack simulations in an emulated network environment. The core research question is how to generate realistic traffic and fully annotated, ground-truth datasets that capture complete attack chains, so IDS and machine learning models can be trained more precisely and false positives can be reduced. The approach includes emulating a small enterprise network, producing synthetic yet realistic benign traffic, and executing detailed attack scenarios mapped to the Cyber Kill Chain (CKC) and enriched with MITRE ATT&CK TTPs. Data are collected via multi-source logging and packet capture (e.g., Wireshark), correlated into Chains of Events (CoE), and annotated with CKC stages and ground truth to create training-ready datasets. The work presents methods and architecture for dataset creation, compares attack simulation frameworks, and discusses practical challenges in traffic realism, limitations in tools such as Caldera, and emulation issues, alongside recommendations for future research. The goal is to support more precise IDS detection and thereby reduce alert fatigue in real-world settings.
[This summary has been generated with the help of AI directly from the project (PDF)]
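The annotation step described in the abstract (correlating collected events into a Chain of Events and labeling them with CKC stages and ground truth) might be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the `AttackStep` and `Event` records, their fields, the time-window matching rule, and the example addresses are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttackStep:
    """One logged attacker action (hypothetical record layout)."""
    start: float      # epoch seconds when the step began
    end: float        # epoch seconds when the step finished
    src_ip: str       # attacker host that executed the step
    ckc_stage: str    # Cyber Kill Chain stage assigned to the step
    technique: str    # MITRE ATT&CK technique ID


@dataclass
class Event:
    """One collected log line or packet summary (hypothetical record layout)."""
    timestamp: float
    src_ip: str
    label: str = "benign"             # ground-truth label, benign by default
    ckc_stage: Optional[str] = None
    technique: Optional[str] = None


def annotate(events: List[Event], steps: List[AttackStep]) -> List[Event]:
    """Label each event that falls inside an attack step's time window
    and originates from that step's source address."""
    for ev in events:
        for step in steps:
            if ev.src_ip == step.src_ip and step.start <= ev.timestamp <= step.end:
                ev.label = "attack"
                ev.ckc_stage = step.ckc_stage
                ev.technique = step.technique
                break  # the first matching step wins
    return events


def chain_of_events(events: List[Event]) -> List[Event]:
    """Return the attack-labeled events in chronological order."""
    return sorted((e for e in events if e.label == "attack"),
                  key=lambda e: e.timestamp)


# Illustrative data only: addresses come from documentation ranges.
steps = [
    AttackStep(100.0, 200.0, "192.0.2.10", "reconnaissance", "T1595"),
    AttackStep(300.0, 400.0, "192.0.2.10", "exploitation", "T1190"),
]
events = [
    Event(350.0, "192.0.2.10"),    # inside the exploitation window
    Event(150.0, "192.0.2.10"),    # inside the reconnaissance window
    Event(150.0, "198.51.100.5"),  # benign host active at the same time
    Event(250.0, "192.0.2.10"),    # attacker host, but between steps
]
chain = chain_of_events(annotate(events, steps))
```

Matching on source address and time window alone is a deliberate simplification. A real correlation would likely need to combine several log sources and handle traffic the attacker host generates between steps.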
