A master's thesis from Aalborg University

Revisiting Bilevel Optimization for Aligning Self-Supervised Pretraining with Downstream Fine-Tuning: Advancing BiSSL Through Systematic Variations, Novel Design Modifications, and Adaptation to New Data Domains

Author(s)

Term

4th semester

Education

Publication year

2025

Submitted on

2025-07-28

Pages

99 pages

Abstract

The BiSSL framework models the pipeline of self-supervised pretraining followed by downstream fine-tuning as the lower and upper levels of a bilevel optimization problem. The lower-level parameters are additionally regularized to resemble those of the upper level, which collectively yields a pretrained model initialization that is better aligned with the downstream task. This project extends the study of BiSSL by first evaluating its sensitivity to hyperparameter variations. Design modifications, including adaptive lower-level regularization scaling and generalized upper-level gradient expressions, are then proposed and tested. Lastly, BiSSL is adapted to natural language processing tasks using a generative pretrained transformer (GPT) pretext task and evaluated on a range of diverse downstream tasks. Results show that BiSSL is robust to variations in most of its hyperparameters, provided that the training duration is sufficiently long. The proposed design modifications yield no consistent improvements and may even degrade performance. For natural language processing tasks, BiSSL achieves occasional gains and otherwise matches the baseline. The findings overall suggest that the original BiSSL design is robust, effective, and able to improve downstream accuracy across input domains.
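To make the bilevel structure concrete, the following is a minimal sketch of the kind of formulation the abstract describes, assuming downstream parameters \theta at the upper level, pretext parameters \phi at the lower level, and a simple squared-distance regularizer with weight \lambda; the notation here is illustrative and the exact objectives and regularizer used in the thesis may differ:

\[
\min_{\theta} \; \mathcal{L}_{\mathrm{down}}\big(\theta, \phi^{*}(\theta)\big)
\quad \text{s.t.} \quad
\phi^{*}(\theta) = \arg\min_{\phi} \; \mathcal{L}_{\mathrm{pretext}}(\phi) + \frac{\lambda}{2}\,\lVert \phi - \theta \rVert_{2}^{2}
\]

In this sketch, the upper level fine-tunes on the downstream task while the lower level performs self-supervised pretraining, and the regularization term pulls the lower-level (pretrained) parameters toward the upper-level (downstream) ones, which is what produces the more aligned initialization described above.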


Keywords

Documents

