Accelerated Parallel Library: A .NET library for GPGPU programming using the Task Parallel Library abstraction
Authors
Hørup, Søren Alsbjerg ; Juul, Søren Andreas ; Larsen, Henrik Holtegaard
Term
4th term
Education
Publication year
2011
Submitted on
2011-06-07
Pages
143
Abstract
This thesis introduces the Accelerated Parallel Library (APL), a .NET library for GPGPU programming in Common Language Infrastructure languages such as C# and VB.NET. The goal is to enable GPU acceleration with minimal code changes by exposing the same programming interface as the Task Parallel Library’s Parallel class. APL uses reflection to access a method's Common Intermediate Language (CIL), just-in-time compiles it to Parallel Thread Execution (PTX), and runs the result on the GPU via the CUDA Driver API. The library currently supports the CIL opcodes needed for a suite of four benchmarks. Experiments show that APL is generally slower than handwritten CUDA C, but on Vector Addition and Black Scholes it achieves steady-state speedups of 1.03x and 1.02x, respectively, over the CUDA C implementations. APL also outperforms the Task Parallel Library (TPL) in most cases, with a maximum observed speedup of 82x. These results demonstrate the viability of a TPL-like abstraction for leveraging GPUs in .NET, while showing that a performance gap to specialized CUDA C remains in several scenarios.
This thesis presents the Accelerated Parallel Library (APL), a .NET library for GPGPU programming in Common Language Infrastructure languages such as C# and VB.NET. The aim is to make GPU acceleration available to .NET developers with minimal code changes by offering the same programming interface as the parallel loops of the Task Parallel Library's Parallel class. APL uses reflection to read Common Intermediate Language (CIL), JIT-compiles it to Parallel Thread Execution (PTX), and executes it on the GPU via the CUDA Driver API. The library supports the CIL opcodes required for a benchmark suite consisting of four benchmarks. The evaluation shows that APL is generally slower than handwritten CUDA C, but in two benchmarks (Vector Addition and Black Scholes) APL achieves steady-state speedups of 1.03x and 1.02x, respectively, relative to CUDA C. APL also outperforms the Task Parallel Library (TPL) in most cases, reaching a speedup of 82x in one case. The results indicate that a TPL-like abstraction in .NET can exploit GPUs effectively and in a user-friendly way, although a performance gap to specialized CUDA C remains in several scenarios.
[This abstract has been generated with the help of AI directly from the project full text]
