A master thesis from Aalborg University

Security Alignment of Large Language Models via Jailbreaking Attacks: A Multilingual Perspective


Author(s)

Term

4th semester

Education

Publication year

2025

Submitted on

2025-06-04

Pages

67 pages

Abstract

Large language models (LLMs) have become popular in recent years due to their ability to perform tasks such as text summarization, language translation, and code generation. However, research has shown that LLMs often come with security challenges. One such challenge is ensuring that the responses LLMs produce do not contain offensive content. Jailbreaking is a red-teaming technique that aims to exploit LLMs with the intention of making them generate offensive responses. Jailbreaking is usually performed either in a black-box setting, where attackers have no access to the LLM's inner mechanisms, or in a white-box setting, where they have some access. This thesis explores LLM jailbreaking in a white-box setting against two open-source LLMs, under both a monolingual setting and a multilingual setting, the latter of which has received less research attention. The results indicate that the two LLMs are more vulnerable under the monolingual setting, while the results under the multilingual setting are more ambiguous.
