DIPAAL: DIstributed PostgreSQL-based AIS Analytics and Loading
Student thesis: Master Thesis and HD Thesis
- Mikael Vind Mikkelsen
- Lau Ernebjerg Josefsen
- Alex Skov Klitgaard
4th term, Software, Master (Master Programme)
AIS data show promise for analytical purposes, but
as the data are not intended for analysis, the data need to be
cleaned, processed, and stored before being usable. This paper
presents an extension of DIPAAL, a system consisting of an
efficient and modular ETL process for loading AIS data, as
well as a distributed data warehouse storing the trajectories of
ships. A spatially distributed data warehouse, with granularized
cell and heatmap representations, is designed, developed, and
evaluated. At the time of writing, DIPAAL stores 414 million
kilometres of ship trajectories and more than 10 billion rows in
the largest relation. It is found that the introduced granularized
cell representation resolved out-of-memory errors of previous
work, while improving the runtime by up to 324% compared
to a trajectory-based query. It is also found that the spatially
divided shards enable a consistently good scale-up for both cell
and heatmap analytics in large areas, ranging from 354% to
1164% with a 5x increase in workers. Lastly, it is found that
the spatial divisions become slightly skewed over time, as traffic
patterns evolve.
Language | English |
---|---|
Publication date | 2023 |
Number of pages | 25 |
External collaborator | Geodatastyrelsen, Ove Andersen, ovand@gst.dk, Information group |