A master's thesis from Aalborg University


Quality Assessment of Danish government geographical data using the gradient boosting algorithm CatBoost

Author

Term

4th term

Publication year

2012

Submitted on

Pages

75

Abstract

Geodata have become a key asset for industry and public authorities, and their value depends on their quality. This thesis evaluates the quality of Denmark’s open geographic water body (lake) data from GeoDanmark/SDFE. The aim is to assess how reliable the recorded lake features are and whether combining Earth observation and machine learning can support systematic quality control. The approach combines multispectral Sentinel-2 satellite imagery from ESA with the CatBoost gradient boosting algorithm to estimate the probability that each mapped lake corresponds to actual water on the ground. In the analyzed sample, about 28% of the lake features are assessed as misclassified, a figure that may be affected by seasonal variation in water bodies. The findings indicate that CatBoost is an effective tool for quality assessment of geospatial datasets and point to the need for further examination of Denmark’s open geographic data.

[This summary has been generated with the help of AI directly from the project (PDF)]
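The scoring idea in the abstract, using multispectral observations to estimate whether each mapped lake is really water and then counting the share of doubtful features, can be sketched as follows. This is a hypothetical illustration, not the thesis's code. It assumes per-polygon mean reflectances for Sentinel-2 bands B3 (green) and B8 (near-infrared) have already been extracted. It uses the McFeeters NDWI as one example input feature and applies an illustrative 0.5 probability threshold. The water probabilities themselves would come from a trained classifier such as CatBoost. That model is not run here, and the probabilities are supplied by hand.

```python
"""Hypothetical sketch: NDWI features and flagging of doubtful lake polygons.

Assumptions (not taken from the thesis): mean Sentinel-2 B3/B8 reflectances
per polygon are precomputed, NDWI is used as an example feature, and the
0.5 threshold is illustrative only.
"""
from dataclasses import dataclass


@dataclass
class LakeFeature:
    feature_id: str   # hypothetical identifier of a mapped lake polygon
    b3_green: float   # mean surface reflectance, Sentinel-2 band B3 (green)
    b8_nir: float     # mean surface reflectance, Sentinel-2 band B8 (NIR)


def ndwi(green: float, nir: float) -> float:
    """McFeeters NDWI = (green - nir) / (green + nir); > 0 suggests open water."""
    total = green + nir
    if total == 0:
        return 0.0  # avoid division by zero for empty/no-data pixels
    return (green - nir) / total


def ndwi_by_feature(lakes: list[LakeFeature]) -> dict[str, float]:
    """Compute one NDWI value per polygon, e.g. as an input column for a classifier."""
    return {lake.feature_id: ndwi(lake.b3_green, lake.b8_nir) for lake in lakes}


def share_flagged(probabilities: dict[str, float], threshold: float = 0.5) -> float:
    """Fraction of features whose estimated P(water) is below the threshold."""
    if not probabilities:
        return 0.0
    flagged = sum(1 for p in probabilities.values() if p < threshold)
    return flagged / len(probabilities)


if __name__ == "__main__":
    lakes = [
        LakeFeature("lake_a", b3_green=0.08, b8_nir=0.02),  # water-like spectrum
        LakeFeature("lake_b", b3_green=0.05, b8_nir=0.30),  # vegetation-like spectrum
    ]
    print(ndwi_by_feature(lakes))
    # Probabilities would normally come from a trained model; here they are made up.
    print(share_flagged({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}))  # → 0.5
```

Keeping feature extraction separate from the flagging step means any probabilistic classifier can be plugged in between them. Because a fixed threshold turns probabilities into a binary verdict, the reported share of flagged features depends on that threshold as well as on the imagery date.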