Author(s)
Term
4th term
Education
Publication year
2023
Submitted on
2023-06-16
Abstract
Motivated by advances in mathematical optimization within contemporary analytics, this project develops a sample-efficient black-box optimization library that extends the Apache Spark platform for data-intensive analytics. The new tool, named DIBBOlib (Data-Intensive Black-Box Optimization library), enables a data-driven, simulation-based approach to problem solving which, unlike other black-box methodologies, copes with non-trivial data-intensive workloads. Technically, DIBBOlib is an extension of Spark MLlib and is designed to feel like one from a usability standpoint. It offers an extensible standard suite of optimization algorithms and generic constraint handling methods, fully integrated with Spark SQL. Main features include an algorithmic wizard, global support for vertical transfer learning, a novel constraint handling method, load-balanced trial parallelism, and dynamic search space partitioning based on a hybrid dynamic/greedy programming approach and, among other techniques, cooperative game theory. Compared to alternatives, the library occupies a special niche as a general-purpose solution for data-intensive analytics, while also offering unique features in its own right. Experiments demonstrate the usefulness of the novel library features on a set of example problems.
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University. Here, you can find and download publicly available bachelor's theses and master's projects from across the university dating from 2008 onwards. Student projects from before 2008 are available in printed form at Aalborg University Library.