DIBBOlib: A Data-Intensive Black-Box Optimization Library for Apache Spark

Student thesis: Master's thesis and HD graduation project

  • Martin Moesmann
4th semester, Software, Master's (Master's programme)
Motivated by the advance of mathematical optimization within contemporary analytics, this project develops a sample-efficient black-box optimization library that extends the Apache Spark platform for data-intensive analytics. Named DIBBOlib (Data-Intensive Black-Box Optimization library), the tool enables a data-driven, simulation-based approach to problem solving which, unlike other black-box methodologies, copes with non-trivial data-intensive workloads. Technically, DIBBOlib is an extension of Spark MLlib and is designed to feel like one from a usability standpoint. It offers an extensible standard suite of optimization algorithms and generic constraint handling methods, fully integrated with Spark SQL. Main features include an algorithmic wizard, global support for vertical transfer learning, a novel constraint handling method, load-balanced trial parallelism, and dynamic search space partitioning based on a hybrid dynamic/greedy programming approach and, among other techniques, cooperative game theory. Compared to alternatives, the library occupies a special niche as a general-purpose solution for data-intensive analytics, while also offering unique features in its own right. Experiments demonstrate the usefulness of the novel library features on a set of example problems.
Publication date: 16 Jun. 2023
ID: 535081481