AAU Student Projects - visit Aalborg University's student projects portal
A master's thesis from Aalborg University
Book cover


Accelerating Static Taint Analysis with CUDA Toolkit

Authors

; ;

Term

4. term

Education

Publication year

2022

Submitted on

Abstract

This report examines whether static taint analysis can be accelerated by offloading computation from the CPU to the GPU using Nvidia’s CUDA toolkit. Building on a prior matrix-based attempt that ran much slower than a CPU baseline, the analysis is redesigned to better exploit GPUs’ SIMT execution: the control-flow graph (CFG) is stored as a contiguous structure with uniform node types to simplify device transfers and enable consistent threading. The work extends the SC toy language and implements two GPU variants: bit-cuda, which assigns one thread per CFG node to propagate taint in parallel, and cuda work-list, which reduces idle work via a work list coordinated with atomic operations. To ensure a fair comparison, the CPU implementation was substantially optimized. Benchmarks indicate the CPU is faster on smaller programs, while the GPU becomes slightly faster only around 700,000 CFG nodes. Overall, the CUDA approach improves on the prior matrix-based GPU solution by roughly a factor of 1000, yet does not consistently outperform the optimized CPU; nevertheless, the results suggest GPU and CPU can be used in tandem to increase overall compilation throughput.

Denne rapport undersøger, om statisk taint-analyse kan accelereres ved at flytte beregningerne fra CPU til GPU med Nvidias CUDA-værktøjskasse. Som opfølgning på et tidligere matrix-baseret forsøg, der var væsentligt langsommere end en CPU-benchmark, redesignes analysen, så den bedre udnytter GPU’ers SIMT-model: kontrollstrømsgrafen (CFG) lagres som en sammenhængende datastruktur med ensartede nodetyper for at lette kopiering til enheden og ensartet trådudførelse. Arbejdet udvider det anvendte minisprog SC og implementerer to GPU-varianter: bit-cuda, hvor én tråd håndterer hver CFG-node og kan propagere taint parallelt, samt cuda work-list, der reducerer spildarbejde via en arbejdsliste synkroniseret med atomare operationer. For en retfærdig sammenligning blev CPU-analysen samtidig optimeret betydeligt. Benchmarkresultater viser, at CPU’en er hurtigere på mindre programmer, mens GPU’en først bliver en smule hurtigere omkring 700.000 CFG-noder. Samlet set forbedrer CUDA-tilgangen den tidligere matrix-baserede GPU-løsning med cirka en faktor 1000, men slår ikke konsekvent den optimerede CPU; dog peger resultaterne på, at GPU og CPU kan bruges side om side for at øge den samlede throughput i en kompilationskæde.

[This apstract has been generated with the help of AI directly from the project full text]