Revisiting Bilevel Optimization for Aligning Self-Supervised Pretraining with Downstream Fine-Tuning: Advancing BiSSL Through Systematic Variations, Novel Design Modifications, and Adaptation to New Data Domains
Author
Term
4th semester
Education
Publication year
2025
Submitted on
2025-07-28
Pages
99
Abstract
The BiSSL framework models the pipeline of self-supervised pretraining followed by downstream fine-tuning as the lower and upper levels of a bilevel optimization problem. The lower-level parameters are additionally regularized to resemble those of the upper level, which collectively yields a pretrained model initialization that is better aligned with the downstream task. This project extends the study of BiSSL by first evaluating its sensitivity to hyperparameter variations. Design modifications, including adaptive lower-level regularization scaling and generalized upper-level gradient expressions, are then proposed and tested. Lastly, BiSSL is adapted to natural language processing tasks using a generative pretrained transformer pretext task and evaluated on a range of diverse downstream tasks. Results show that BiSSL is robust to variations in most of its hyperparameters, provided that the training duration is sufficiently long. The proposed design modifications yield no consistent improvements and may even degrade performance. On natural language processing tasks, BiSSL achieves occasional gains and otherwise matches the baseline. Overall, the findings suggest that the original BiSSL design is robust, effective, and able to improve downstream accuracy across input domains.
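To make the structure concrete, the pipeline described in the abstract can be sketched as a bilevel problem in which the upper level fine-tunes on the downstream task and the lower level performs self-supervised pretraining while its parameters are pulled toward the upper-level ones. The notation below (the losses L_FT and L_SSL, the weight lambda, and the squared-norm proximity term) is an illustrative assumption, not the exact formulation used in BiSSL.

\[
\begin{aligned}
\min_{\theta_{\mathrm{U}}} \quad & \mathcal{L}_{\mathrm{FT}}\big(\theta_{\mathrm{U}};\, \theta_{\mathrm{L}}^{*}(\theta_{\mathrm{U}})\big) \\
\text{s.t.} \quad & \theta_{\mathrm{L}}^{*}(\theta_{\mathrm{U}}) \in \arg\min_{\theta_{\mathrm{L}}} \; \mathcal{L}_{\mathrm{SSL}}(\theta_{\mathrm{L}}) + \frac{\lambda}{2}\,\big\lVert \theta_{\mathrm{L}} - \theta_{\mathrm{U}} \big\rVert_{2}^{2}
\end{aligned}
\]

In this reading, the lower-level solution supplies the pretrained initialization handed to downstream fine-tuning, and the weight on the proximity term controls how strongly the pretext parameters are regularized toward the downstream ones; adaptively scaling this weight and generalizing the upper-level gradient expression are among the design modifications examined in the project.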