Compression of Wireless Multi-Modal Foundation Models
Authors
Pedesk, Jonas ; Andersen, Emil Kirk ; Pavlidis, Michail
Term
4th semester
Education
Publication year
2026
Submitted on
2026-06-04
Abstract
This thesis examines how a transformer-based neural network (a type of machine learning model) behaves when it is compressed to make it smaller and more efficient. The work focuses on three well-known compression techniques: pruning (removing less important connections in the model), quantization (storing numbers at lower precision to save space), and knowledge distillation (transferring knowledge from a large model to a smaller one). To test these techniques, the thesis sets up a simulated communication scenario: randomly generated QAM (quadrature amplitude modulation) signals are transmitted through a Rician channel, which imitates multipath effects in a factory building where radio signals are reflected by walls and machines. The neural network has to process these signals, and the study measures how each compression method affects its performance. The simulations are evaluated with several performance metrics to determine how far the model can be compressed, both when each technique is applied on its own and when the techniques are combined in a single compression pipeline. The results show that two combined pipelines are promising candidates for future Foundation Models: Quantize-Prune-KD (quantization, then pruning, then knowledge distillation) and Prune-Quantize-KD (pruning, then quantization, then knowledge distillation). Both pipelines significantly reduce memory usage and the number of parameters while largely preserving accuracy.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
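The simulated scenario described in the abstract (random QAM symbols sent through a Rician channel) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the modulation order (16-QAM), the Rician K-factor of 5, the 20 dB SNR, the flat-fading model, and all function names are assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def qam16_symbols(n, rng):
    """Draw n random 16-QAM symbols, scaled to unit average power (assumed order)."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    in_phase = rng.choice(levels, size=n)
    quadrature = rng.choice(levels, size=n)
    # Before scaling, E[|s|^2] = 2 * mean(levels**2) = 10.
    return (in_phase + 1j * quadrature) / np.sqrt(10.0)


def rician_fading(n, k_factor, rng):
    """Flat-fading Rician coefficients with E[|h|^2] = 1.

    k_factor is the ratio of line-of-sight power to scattered power.
    """
    los = np.sqrt(k_factor / (k_factor + 1.0))        # fixed line-of-sight part
    sigma = np.sqrt(1.0 / (2.0 * (k_factor + 1.0)))   # std per real dimension
    scatter = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return los + scatter


def transmit(symbols, k_factor, snr_db, rng):
    """Apply Rician fading and additive white Gaussian noise to the symbols."""
    n = symbols.size
    h = rician_fading(n, k_factor, rng)
    noise_var = 10.0 ** (-snr_db / 10.0)              # unit signal power assumed
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal(n) + 1j * rng.standard_normal(n)
    )
    return h * symbols + noise, h


rng = np.random.default_rng(0)
tx = qam16_symbols(10_000, rng)
rx, h = transmit(tx, k_factor=5.0, snr_db=20.0, rng=rng)
```

In a setup like this, `rx` (and possibly `h`) would be the input that the neural network has to process, and the same generated data could be reused to compare the uncompressed model with its pruned, quantized, or distilled variants.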
Keywords
