Data Poisoning Attacks in Machine Learning: Risks and Defenses
Author
Song, Choeun
Term
4th semester
Education
Publication year
2025
Submitted on
2025-06-03
Pages
58
Abstract
This thesis examines the risks of backdoor data poisoning in machine learning, where hidden triggers in training data cause targeted misclassification while models retain normal performance on clean inputs. Focusing on image classification with CIFAR-10, it compares two trigger types: a visible white square and a nearly imperceptible noise-based pattern. Using a baseline CNN and controlled experiments—including varying the proportion of poisoned training data—the study measures attack success and stealth. As a defense, it evaluates fine-tuning by retraining a compromised model on a small set of triggered samples with correct labels. Findings show that both trigger types can reliably fool models, with noise-based triggers being harder to detect. Fine-tuning can substantially reduce backdoor effects without harming performance on clean inputs, but it is effective only when the trigger type is known in advance. The work underscores the need for stronger, more general defenses as machine learning is deployed more widely.
[This abstract was generated with the help of AI]
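To make the attack setup described above concrete, the following is a minimal sketch of white-square trigger injection on CIFAR-10-shaped data. The function name `poison_dataset` and the specific parameter values (5% poison rate, a 3×3 patch, target class 0) are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def poison_dataset(images, labels, poison_rate=0.05, target_class=0,
                   square_size=3, rng=None):
    """Inject a white-square backdoor into a fraction of the training set.

    images: uint8 array of shape (N, 32, 32, 3) -- CIFAR-10 layout
    labels: int array of shape (N,)
    A random subset of poison_rate * N images gets a white square stamped
    in the bottom-right corner, and its label is flipped to target_class.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp the visible trigger: a square_size x square_size white patch.
    images[idx, -square_size:, -square_size:, :] = 255
    # Targeted mislabeling: the model learns trigger -> target_class.
    labels[idx] = target_class
    return images, labels, idx
```

In experiments like those the abstract describes, the poison rate would be swept across several values and a CNN trained on each poisoned set; attack success is then the fraction of triggered test images redirected to the target class, while clean-input accuracy checks stealth. The fine-tuning defense would reuse the same stamping step but pair triggered images with their *correct* labels during a short retraining pass.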
Keywords
Documents
