Declarative Data Warehouse setup in PygramETL
Author
Term
4. term
Education
Publication year
2023
Submitted on
2023-06-16
Pages
13
Abstract
Before Extract-Transform-Load (ETL) programming can begin, a data warehouse must be created in a database management system, and the schema of the data warehouse must be programmed in an ETL framework so that data from the sources is loaded correctly. However, setting up a data warehouse and defining its schema in an appropriate framework can be labor intensive. Furthermore, the complexity of this task increases as schemas grow, because the developer must ensure that the data warehouse schema matches the schema defined in the ETL framework. In this paper I present DeclarativeETL, a framework that extends PygramETL [3] by generating the implementation of both the data warehouse schema and the corresponding PygramETL code. DeclarativeETL generates a DDL file and a Python file from a shared declarative specification. By exploiting TOML [6], a simple configuration language, together with a simple syntax for the declarative specification, developer productivity is increased: developers only need to name the dimension and fact tables and their respective attributes and measures, in conjunction with a set of default values, e.g. schema type and attribute and measure types. The defaults save the developer many keystrokes, as most attributes share the same type. DeclarativeETL is evaluated to be fast and lightweight while providing more than 100% increased productivity in terms of lines of code when compared to programming DDL/PygramETL manually.
