Efficient Skyline Computation for Large Volume Data in MapReduce Utilising Multiple Reducers
Translated title
Effektiv Skyline Udregning for Store Mængder Data MapReduce ved Brug af Flere Reducers
Authors
Term
4. term
Education
Publication year
2013
Submitted on
2013-06-07
Pages
35
Abstract
A skyline query extracts the complete set of interesting tuples from a large data set according to multiple criteria. Data set sizes are constantly increasing, and backend architectures are switching from single-node environments to cluster-oriented setups. Previous work has presented ways to run the skyline query in these setups using the MapReduce framework, but it does not take full advantage of the available parallelism, since a significant part of the query is always run serially. In this paper, we propose the novel algorithm GPMRS, which runs the entire query in parallel. As a result, GPMRS scales well for large data sets and large clusters. We demonstrate this with experiments showing that GPMRS runs several times faster than the alternatives for large data sets with high skyline percentages.
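
To make the notion of a skyline concrete, the following is a minimal sketch, not the GPMRS algorithm from the thesis. It assumes that smaller values are better in every dimension and that a tuple is "interesting" when no other tuple dominates it. The names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical and chosen for illustration only. The second function mimics the general two-phase pattern the abstract criticises: local skylines can be computed independently per partition, but the final merge here runs in a single serial step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Illustrative two-phase skyline (not GPMRS).

    Phase 1 computes a local skyline per partition; these are independent,
    so they could run in parallel. Phase 2 merges all local skylines in one
    place, which is the serial step.
    """
    # Round-robin partitioning is an arbitrary choice made for this sketch.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


# Hypothetical data: (price, distance to beach) for six hotels.
hotels = [(50, 8), (60, 3), (80, 1), (90, 2), (70, 9), (55, 8)]
result = skyline(hotels)
merged = partitioned_skyline(hotels, 2)
```

In this example, `(90, 2)` is excluded because `(80, 1)` is both cheaper and closer, while `(50, 8)`, `(60, 3)`, and `(80, 1)` remain because each represents a trade-off that no other hotel beats on both criteria. When the skyline is a large fraction of the data, the candidate list passed to the final merge is also large, which is why a serial merge step becomes the bottleneck.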
