A master's thesis from Aalborg University


Bridge-DB: Query Optimization in a Multi-Database System

Authors

;

Term

4. term

Education

Publication year

2015

Submitted on

Pages

38

Abstract

We present Bridge-DB, a distributed database system that lets users query multiple data sources as if they were a single system. It aims to simplify work across heterogeneous databases without requiring prior knowledge of how each one is built. Bridge-DB provides its own query language, BQL, which supports the standard CRUD operations (create, read, update, delete). The system connects to PostgreSQL (a relational database) and Neo4J (a graph database), and its modular architecture allows other storage systems to be added through driver modules. Our main contribution is a cost-based optimizer that combines a dynamic model with a black-box model (one that relies on measurements rather than on internal details of the databases) to decide where a query should run. The optimizer can also split or duplicate work across several databases and then post-process the results to answer the query. We evaluate the solution on two datasets, one favoring Neo4J and one favoring PostgreSQL, and on a combination of the two, to assess the effectiveness of the cost model. Measuring response times, data traffic, and system overhead, we show that the model improves response times but increases data traffic between Neo4J and Bridge-DB.

[This abstract was generated with the help of AI]
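
The abstract describes a black-box cost model that picks a target database from measurements rather than from database internals. The sketch below illustrates that general idea only; it is not the thesis's implementation. The class name `BlackBoxCostModel`, the query classes "join" and "path", the exponential smoothing, and all timings are assumptions made up for illustration.

```python
class BlackBoxCostModel:
    """Hypothetical sketch: estimate cost per (backend, query class)
    purely from observed response times, with no knowledge of internals."""

    def __init__(self, alpha=0.5, default_cost=1.0):
        self.alpha = alpha                # weight given to the newest measurement
        self.default_cost = default_cost  # estimate used before any measurement
        self.estimates = {}               # (backend, query_class) -> seconds

    def estimate(self, backend, query_class):
        """Return the current cost estimate in seconds."""
        return self.estimates.get((backend, query_class), self.default_cost)

    def record(self, backend, query_class, seconds):
        """Update the estimate with a newly measured response time."""
        key = (backend, query_class)
        if key not in self.estimates:
            self.estimates[key] = seconds
        else:
            old = self.estimates[key]
            # Exponential smoothing keeps the estimate adaptive over time.
            self.estimates[key] = self.alpha * seconds + (1 - self.alpha) * old

    def choose(self, backends, query_class):
        """Pick the backend with the lowest estimated cost."""
        return min(backends, key=lambda b: self.estimate(b, query_class))


# Illustrative measurements (invented numbers, not results from the thesis).
model = BlackBoxCostModel()
model.record("postgresql", "join", 0.020)
model.record("neo4j", "join", 0.080)
model.record("postgresql", "path", 0.300)
model.record("neo4j", "path", 0.040)
model.record("neo4j", "path", 0.060)

backends = ["postgresql", "neo4j"]
join_target = model.choose(backends, "join")
path_target = model.choose(backends, "path")
```

With these invented timings, the join-style query would be routed to PostgreSQL and the path-style query to Neo4J. A real optimizer would also need to account for splitting or duplicating work and for the post-processing cost mentioned in the abstract, which this sketch leaves out.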