Data Access Problem on the Cell BE Architecture

Student thesis: Master thesis (including HD thesis)

  • Ales Kozumplik
4. term, Computer Science, Master (Master Programme)
Cell BE is a novel processor architecture suitable for multimedia applications, video games and complex scientific computations. To overcome the technological problems that prevent current architectures from achieving higher performance, Cell BE introduced several novel design features that also make it more difficult to develop software for. Eight of the nine cores on the chip cannot read or write the main memory directly using their instruction set. Instead, each of these cores has 256 KB of local memory and can initiate DMA communication between this local memory and the main memory. Processing data on these cores therefore has to be wrapped in calls to the DMA subsystem. Moreover, to avoid limiting the performance potential of the processor, the DMA communication should be interleaved with computation. This implies that advanced data fetching using double buffering and other techniques is often employed. Managing buffers, DMA communication and synchronization litters the source code and accounts for a substantial number of its lines.

In this work we present a semi-automatic approach to the data access problem on Cell BE. This is done by extending a traditional C-like imperative programming language with new syntax and semantics to form an experimental language called Dali. The two main extensions are accessor declaration and accessor application. In an accessor declaration the programmer specifies a strategy of main memory access. Then, using accessor application, the programmer applies the declared accessor to a variable. Within the scope of the application, the variable is accessed according to the declaration. For instance, in the most common scenario of traversing an array of items one by one, the programmer can suggest that a double buffering accessor be applied to the array variable. This will cause the code managing the DMA communication and the two buffers to be generated automatically by the Dali compiler.
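As a rough illustration of the double buffering pattern mentioned above, the following C++ sketch sums an array in fixed-size chunks while alternating between two local buffers. It is not output of the Dali compiler and does not use the Cell SDK: `dma_get` and `dma_wait` are hypothetical stand-ins that simulate a transfer with `memcpy`, so the sketch runs on any machine. On real hardware the fetch of the next chunk would proceed asynchronously while the current chunk is processed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an asynchronous DMA "get" from main memory into
// a local buffer. Assumption: simulated with memcpy, so it completes at once.
static void dma_get(float* local, const float* main_mem, std::size_t count) {
    std::memcpy(local, main_mem, count * sizeof(float));
}

// Hypothetical stand-in for waiting until the transfer into a buffer is done.
// A no-op here because the simulated transfer above is synchronous.
static void dma_wait(int /*buffer_index*/) {}

// Sum n floats that live in "main memory" using two local buffers of
// `chunk` elements each.
float sum_double_buffered(const float* main_mem, std::size_t n, std::size_t chunk) {
    if (n == 0 || chunk == 0) return 0.0f;

    std::vector<float> buf[2] = {std::vector<float>(chunk), std::vector<float>(chunk)};
    float total = 0.0f;
    int cur = 0;                                   // buffer being processed
    std::size_t cur_len = (n < chunk) ? n : chunk;

    dma_get(buf[cur].data(), main_mem, cur_len);   // fetch the first chunk

    for (std::size_t off = 0; off < n;) {
        // Start fetching the next chunk into the other buffer, if any remains.
        std::size_t next_off = off + cur_len;
        std::size_t next_len = 0;
        if (next_off < n) {
            next_len = (n - next_off < chunk) ? (n - next_off) : chunk;
            dma_get(buf[1 - cur].data(), main_mem + next_off, next_len);
        }

        // Wait for the current chunk, then compute on it.
        dma_wait(cur);
        for (std::size_t i = 0; i < cur_len; ++i) total += buf[cur][i];

        // Swap the roles of the two buffers.
        cur = 1 - cur;
        off = next_off;
        cur_len = next_len;
    }
    return total;
}
```

Even in this small case, most of the function is buffer and transfer bookkeeping rather than the summation itself, which is the kind of code an accessor is meant to generate.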
The programmer can then focus more on the problem itself and less on writing the boilerplate code. Access methods other than double buffering are also allowed in Dali. Those include speculative methods that, based on the programmer’s suggestion, prefetch data with a high probability of being needed in the near future. Caching methods are another possibility. Those expect high data locality, and for each element read they also keep in cache other elements in close proximity to the original location.

The main limitation of the approach is that program correctness depends on the programmer’s judgement when applying accessors. The programmer has to make sure that the suggested access strategy is in fact consistent with the way the memory is dynamically accessed at runtime. For instance, it would not make sense to apply a double buffering accessor to a part of the memory which is accessed in a mostly random pattern. Depending on the implementation, such a solution would either slow down the system unnecessarily, because most of the memory and DMA bandwidth would be wasted, or, if no runtime checking was implemented, it would even yield incorrect program output.

To provide a complete overview of the problem area, other existing approaches to managing memory access are summarized and taken into account in the discussion of our solution. To evaluate the viability of the approach, an experimental Dali compiler was developed as a part of this work. It operates by parsing the Dali source code and emitting C++ code ready to be built by the Cell toolchain and executed on a Cell machine. Several simple experiments were performed with programs generated using our compiler. Both the implementation effort and the experimental results are also documented in this work.
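The caching access method described above can be sketched in C++ as a small direct-mapped software cache. This is a minimal illustration under stated assumptions, not the Dali implementation: the class name `LineCache`, its parameters and the `memcpy`-based line fill (standing in for a DMA transfer) are all hypothetical. A miss loads the whole line around the requested element, so later reads of nearby elements are served locally.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Marker for a cache slot that holds no line yet.
constexpr std::size_t kEmptyTag = static_cast<std::size_t>(-1);

// Hypothetical direct-mapped, read-only software cache over an int array in
// "main memory". Preconditions: line_len > 0, num_lines > 0, index < n.
class LineCache {
public:
    LineCache(const int* main_mem, std::size_t n,
              std::size_t line_len, std::size_t num_lines)
        : main_(main_mem), n_(n), line_len_(line_len),
          data_(line_len * num_lines), tags_(num_lines, kEmptyTag) {}

    int read(std::size_t index) {
        std::size_t line = index / line_len_;   // which line holds the element
        std::size_t slot = line % tags_.size(); // where that line may be cached
        if (tags_[slot] != line) {
            // Miss: fetch the whole line (simulated DMA transfer via memcpy).
            std::size_t start = line * line_len_;
            std::size_t len = (n_ - start < line_len_) ? (n_ - start) : line_len_;
            std::memcpy(&data_[slot * line_len_], main_ + start, len * sizeof(int));
            tags_[slot] = line;
            ++misses_;
        }
        return data_[slot * line_len_ + index % line_len_];
    }

    std::size_t misses() const { return misses_; }

private:
    const int* main_;
    std::size_t n_;
    std::size_t line_len_;
    std::vector<int> data_;          // cached lines, one per slot
    std::vector<std::size_t> tags_;  // line number held by each slot
    std::size_t misses_ = 0;
};
```

With lines of four elements, a sequential traversal triggers one transfer per four reads. Alternating between two addresses that map to the same slot triggers a transfer on every read, which illustrates why an accessor that does not match the real access pattern wastes DMA bandwidth.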
Publication date: 2009
Number of pages: 83
Publishing institution: Aalborg University
ID: 17599691