• Jens Laurits Pedersen
• Kasper Mullesgaard
4. term, Software, Master (Master Programme)
A skyline query is useful for extracting a complete set of interesting
tuples from a large data set according to multiple criteria. Data sets are
constantly growing, and backend architectures are shifting from single-node
environments to cluster-oriented setups. Previous work has presented ways to
run the skyline query in such setups using the MapReduce framework, but these
approaches do not fully exploit the available parallelism, because a
significant part of the query is always run serially. In this paper, we propose
GPMRS, a novel algorithm that runs the entire query in parallel. As a result,
GPMRS scales well for large data sets and large clusters. We demonstrate this
experimentally, showing that GPMRS runs several times faster than the
alternatives for large data sets with high skyline percentages.
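To make the terminology concrete, the following is a minimal sketch of what a skyline query computes and of the generic partition-then-merge pattern in which the final step runs serially. It assumes that smaller values are preferred in every dimension. The `skyline` function uses a standard block-nested-loop approach, and `two_phase_skyline` is a hypothetical illustration of the serial-merge bottleneck. Neither function is GPMRS, and all names here are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every dimension and strictly better
    in at least one (assumption: smaller values are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep a window of points not dominated so far."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already in the window
        # Drop window points that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def two_phase_skyline(points: List[Point], num_partitions: int) -> List[Point]:
    """Hypothetical partition-then-merge scheme: the local skylines could be
    computed in parallel (e.g. one per mapper), but the final merge of all
    local skylines runs on a single node."""
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local = [skyline(part) for part in partitions]  # parallelizable phase
    candidates = [p for part in local for p in part]
    return skyline(candidates)  # serial merge phase
```

For example, for the points `(1, 9), (3, 3), (9, 1), (4, 4), (2, 8), (5, 5)`, the skyline is `(1, 9), (3, 3), (9, 1), (2, 8)`, because `(4, 4)` and `(5, 5)` are both dominated by `(3, 3)`. When many tuples belong to the skyline, the candidate set passed to the serial merge stays large, which is one way to see why a single-node merge step can limit scalability.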
Publication date: 7 Jun 2013
Number of pages: 35
ID: 77335626