Security Alignment of Large Language Models via Jailbreaking Attacks: A Multilingual Perspective
Translated title
Security Alignment of Large Language Models via Jailbreaking Attacks
Author
Jacobsen, Nicolai Østergaard
Term
4th semester
Education
Publication year
2025
Submitted on
2025-06-04
Pages
67
Abstract
Large language models (LLMs) are AI systems trained on vast amounts of text that can summarize documents, translate languages, and generate code. Alongside these benefits come safety risks: models can sometimes produce offensive or harmful content. This thesis examines 'jailbreaking'—a red-teaming technique in which testers deliberately try to bypass a model’s safety rules to elicit offensive responses. Jailbreaking is commonly framed in two ways: black box (attackers have no visibility into how the model works) and white box (attackers have some access to internal information). This study uses a white-box approach against two open-source LLMs and evaluates two scenarios: monolingual (one language at a time) and multilingual (across multiple languages), the latter being less explored. The results suggest both models are more vulnerable in the monolingual setting, while the multilingual setting yields mixed and less conclusive outcomes.
Large language models (LLMs) are AI systems trained on large amounts of text that can summarize documents, translate languages, and generate code. At the same time, they carry safety risks: the models can occasionally produce offensive or harmful content. This thesis examines 'jailbreaking'—a red-teaming method in which testers deliberately attempt to circumvent a model's safety rules in order to provoke offensive responses. There are typically two test setups: black box (attackers have no insight into how the model works) and white box (they have some access to internal information). This work applies a white-box approach against two open-source LLMs. We examine two scenarios: monolingual (one language at a time) and multilingual (across multiple languages), where the multilingual area has so far been less explored. The results show that the two models are more vulnerable in the monolingual setup, while the multilingual one yields more mixed and less conclusive results.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
Keywords
