Efficient Skyline Computation for Large Volume Data in MapReduce Utilising Multiple Reducers
Translated title
Effektiv Skyline Udregning for Store Mængder Data MapReduce ved Brug af Flere Reducers
Authors
Pedersen, Jens Laurits ; Mullesgaard, Kasper
Term
4. term
Education
Publication year
2013
Submitted on
2013-06-07
Pages
35
Abstract
En skyline-forespørgsel bruges til at finde alle “interessante” poster i et datasæt ud fra flere kriterier på én gang—altså poster, der ikke er klart dårligere end andre på tværs af målene, uden at samle alt i én samlet score. Datasæt bliver stadig større, og bagvedliggende systemer flytter fra enkeltmaskiner til klyngebaserede opsætninger. Tidligere løsninger har brugt MapReduce til at køre skyline-forespørgsler i sådanne miljøer, men en væsentlig del af arbejdet udføres stadig serielt, hvilket begrænser den parallelle udnyttelse. I denne afhandling foreslår vi den nye algoritme GPMRS, som kører hele forespørgslen parallelt. Det betyder, at GPMRS skalerer godt til både store datasæt og store klynger. Gennem eksperimenter viser vi, at GPMRS er flere gange hurtigere end alternativerne for store datasæt med en høj skyline-procent (dvs. når mange poster ender i skylinen).
A skyline query helps find all “interesting” items in a dataset based on several criteria at once—that is, items that are not clearly worse than others across all measures, without combining everything into a single score. Datasets keep growing, and backend systems are moving from single machines to cluster-based setups. Prior work has used the MapReduce framework to run skyline queries in these environments, but a significant part of the computation still runs serially, limiting parallelism. In this thesis, we propose a new algorithm, GPMRS, that executes the entire query in parallel. As a result, GPMRS scales well to both large datasets and large clusters. Our experiments show that GPMRS is several times faster than alternatives for large datasets with a high skyline percentage (i.e., when many items end up in the skyline).
[This abstract was generated with the help of AI]
