Cyber Threat Intelligence: Cybercrime in the Clear
Author
Frederiksen, Thomas Birk
Term
4th semester
Education
Publication year
2022
Submitted on
2022-05-22
Pages
120
Abstract
This thesis examines cybercrime activity on illicit forums on the clear web, a space that has received less attention than darknet markets. A custom web scraper collected data from multiple forums, and fine-tuned large language models were then used for natural language processing (NLP): classifying thread intent (e.g., sales vs. requests), extracting prices, and grouping the predominant product categories. The resulting dataset indicates that illicit digital products across categories sell for about $25 on average, with some items priced as low as $0.50, and that users typically request goods at roughly $5 below prevailing offers. Discussions most often concern the sale of compromised accounts and services such as account-banning, botnet operation, and web hosting. The work outlines a pipeline from scraping to NLP, addresses access and ethical considerations, and shows how clear-web sources can enrich cyber threat intelligence with structured visibility into commodities, categories, and pricing.
[This abstract was generated with the help of AI directly from the project's full text.]
