A master's thesis from Aalborg University


Performance evaluation of Explainable AI algorithms against adversarial noise

Authors


Term

4th term

Publication year

2020

Submitted on

Abstract

Machine learning systems can be highly accurate, but in high-stakes areas such as healthcare, finance, and law they are often treated as 'black boxes': we cannot see why they make a decision or why they fail. Explainable AI (XAI) methods try to address this by showing which parts of an input (for example, an image) influenced a model's prediction, typically as a saliency map, a heatmap of important regions. However, small, carefully crafted changes to inputs, known as adversarial attacks, can mislead both models and their explanations. This study examines how a common attack, the Fast Gradient Sign Method (FGSM), affects two XAI techniques: Similarity Difference and Uniqueness (SIDU) and Gradient-weighted Class Activation Mapping (Grad-CAM). We also use an eye tracker to record where people naturally look in images and treat these fixation maps as a reference for what should be important, allowing us to compare human attention with the saliency maps produced by the XAI methods. We find that, without attacks, Grad-CAM aligns better with human fixations than SIDU. Under adversarial noise from FGSM, the ranking reverses: SIDU's explanations change less and remain better aligned with human fixations, indicating greater robustness to adversarial attacks.
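To make the described setup concrete, the sketch below illustrates the two core steps the abstract refers to: generating an FGSM-perturbed input and scoring agreement between a saliency map and a human fixation map. It is a minimal illustration only, assuming a PyTorch classifier, inputs scaled to [0, 1], an arbitrary epsilon, and Pearson correlation as the agreement score; the thesis itself may use different models, attack strengths, and similarity metrics.

```python
# Illustrative sketch only, not the thesis's actual pipeline.
# Assumptions: a PyTorch image classifier, inputs in [0, 1],
# epsilon chosen arbitrarily, Pearson correlation as the agreement score.
import torch
import torch.nn.functional as F


def fgsm_attack(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method: nudge each pixel in the direction of the
    sign of the loss gradient with respect to the input."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    # Keep the perturbed image in the valid pixel range.
    return adversarial.clamp(0.0, 1.0).detach()


def saliency_fixation_agreement(saliency_map, fixation_map):
    """One simple agreement score between an XAI saliency map and a human
    eye-tracking fixation map: Pearson correlation of the normalised maps."""
    s = saliency_map.flatten().float()
    f = fixation_map.flatten().float()
    s = (s - s.mean()) / (s.std() + 1e-8)
    f = (f - f.mean()) / (f.std() + 1e-8)
    return (s * f).mean().item()
```

Under this kind of setup, one would compute the agreement score for Grad-CAM and SIDU saliency maps on clean images and again on their FGSM-perturbed counterparts, and compare how much each method's alignment with human fixations degrades.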

[This abstract was generated with the help of AI]