A master's thesis from Aalborg University


Efficient Skyline Computation for Large Volume Data in MapReduce Utilising Multiple Reducers

Translated title

Effektiv Skyline Udregning for Store Mængder Data MapReduce ved Brug af Flere Reducers

Term

4. term

Education

Publication year

2013

Submitted on

Pages

35

Abstract

A skyline query is useful for extracting the complete set of interesting tuples from a large data set according to multiple criteria. Data sets are constantly growing, and backend architectures are switching from single-node environments to cluster-oriented setups. Previous work has presented ways to run the skyline query in these setups using the MapReduce framework, but it does not take full advantage of the available parallelism, since a significant part of the query is always run serially. In this paper, we propose a novel algorithm, GPMRS, that runs the entire query in parallel. As a result, GPMRS scales well for large data sets and large clusters. We demonstrate this with experiments showing that GPMRS runs several times faster than the alternatives for large data sets with high skyline percentages.
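
To make the terms in the abstract concrete, the following is a minimal Python sketch of a skyline query and of the partition-then-merge pattern whose serial final step the abstract describes as a bottleneck. It is not GPMRS and is not taken from the thesis. The names dominates, skyline, and partitioned_skyline are hypothetical, as is the assumption that smaller values are better in every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b.

    Assumption for this sketch: smaller is better in every dimension.
    a dominates b when a is no worse everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Illustrative two-phase scheme (not GPMRS).

    Phase 1 computes a local skyline per partition; these calls are
    independent and could run in parallel. Phase 2 merges all local
    results in one place, which is the serial step.
    """
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    candidates = [p for part in local_skylines for p in part]
    return skyline(candidates)
```

The merge in phase 2 is correct because a point in the global skyline cannot be dominated within its own partition, so it always survives phase 1. With high skyline percentages, most points survive phase 1, and the single merge step has to process nearly the whole data set.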