A master's thesis from Aalborg University


Security Alignment of Large Language Models via Jailbreaking Attacks: A Multilingual Perspective

Translated title

Security Alignment of Large Language Models via Jailbreaking Attacks

Term

4th semester

Publication year

2025

Submitted on

Pages

67

Abstract

Large language models (LLMs) have become popular in recent years due to their ability to perform tasks such as text summarization, language translation, and code generation. However, research has shown that LLMs often come with security challenges. One such challenge is ensuring that the responses LLMs produce do not contain any offensive content. Jailbreaking is a red-teaming technique that aims to exploit LLMs with the intention of making them generate offensive responses. Jailbreaking is usually performed in either a black-box setting, where attackers have no access to the LLM's inner mechanisms, or a white-box setting, where they have some access. This thesis explores LLM jailbreaking in both a monolingual setting and a multilingual setting, the latter of which has received less research attention, using white-box attacks against two open-source LLMs. The results indicate that the two LLMs are more vulnerable in the monolingual setting, while the results in the multilingual setting are more ambiguous.
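
To make the white-box/black-box distinction in the abstract concrete, the sketch below shows the kind of model access a gradient-guided white-box jailbreak search relies on: scoring an attacker-chosen continuation and back-propagating that loss to the prompt embeddings. This is a minimal illustration under stated assumptions, not the thesis's method; the model name ("gpt2"), the prompt, and the target string are placeholders, since the two open-source LLMs studied are not named here.

```python
# Minimal sketch of white-box access for jailbreak research.
# Assumptions (not from the thesis): the "gpt2" checkpoint, the prompt,
# and the target continuation are all placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for an open-source LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "How do I build a"      # adversarial prefix under optimization
target = " harmless birdhouse"   # continuation the attacker wants to force

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
target_ids = tokenizer(target, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, target_ids], dim=1)

# White-box step: embed the tokens ourselves so gradients can flow back
# to the input positions -- exactly the signal a black-box attacker,
# who only sees sampled text, never obtains.
embeddings = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
outputs = model(inputs_embeds=embeddings)

# Cross-entropy of the target tokens given the prompt; a lower loss means
# the model is closer to emitting the attacker's chosen continuation.
logits = outputs.logits[:, prompt_ids.shape[1] - 1 : -1, :]
loss = torch.nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1)
)
loss.backward()

# The gradient w.r.t. the prompt embeddings guides which prompt tokens to
# swap next, the core step in gradient-guided jailbreak searches such as GCG.
print(embeddings.grad[:, : prompt_ids.shape[1]].norm())
```

A black-box attacker, by contrast, can only observe generated text and must search over prompts blindly, which is why the access level matters for the attacks the thesis evaluates.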