A master's thesis from Aalborg University


Big Data Cloud Computing Infrastructure Framework: A framework for developing reproducible cloud computing infrastructures suitable for big data processing jobs

Authors


Term

4th term

Publication year

2019

Submitted on

Pages

98

Abstract

Running big data workloads (tasks that process very large amounts of data) requires multiple tools to work together. This thesis assembles a software framework: a set of software packages designed to create a consistent, reproducible environment for big data tasks. I explain why each tool was chosen, describe the accompanying code, and demonstrate the finished environment by running experimental big data jobs. The computing infrastructure is built on Google Cloud Platform (GCP), a cloud provider. Terraform, an infrastructure-as-code tool, uses the GCP API to create the hardware infrastructure automatically. Nix, a package manager, downloads, installs, and configures the software packages to ensure a predictable software setup. Others can reuse the resulting framework to build similar environments, or adapt and extend the code presented in this work.

[This abstract was generated with the help of AI]