Price Prediction - How Much Data Do You Need?: An Ablation Study of Short-Term Electricity Price Prediction Utilising Trees, Neural Networks, and AutoGluon
Author
Gislinge, Nicklas Peter Kvist
Term
4. term
Education
Publication year
2026
Submitted on
2026-06-11
Pages
14
Abstract
Short-term electricity price forecasts are essential for industrial planning and grid stability. Yet many studies rely on highly specialized, data-hungry models that are costly to build and maintain. This thesis conducts a large-scale ablation study of the Danish day-ahead market, systematically evaluating 1,052 default-configured machine learning models across eight model families, including tree-based models, neural networks, and the AutoML ensemble AutoGluon. Models are assessed with walk-forward validation, which mimics real operations by training on past data and testing on future periods in rolling windows. We examine 13 progressively richer feature sets, eight forecasting horizons from 0 to 168 hours ahead, and two prediction targets: absolute price and price difference. Under default settings, general-purpose tree-based models—especially CatBoost—outperform neural networks on short-term horizons. Feature contribution analysis shows that historical prices are the most informative predictors, while adding lagged grid consumption metrics yields diminishing or even negative returns. Supplementary experiments indicate that targeted hyperparameter tuning can significantly reduce errors, measured by mean absolute error (MAE). Replacing raw station-level weather data with professionally processed commercial weather data offers no meaningful predictive advantage. These findings offer practical guidance: strong accuracy is achievable with out-of-the-box algorithms and a compact feature set of weather, time, and historical prices, avoiding the operational overhead of specialized deep learning and exhaustive data acquisition and maintenance.
Short-term electricity price forecasts are important for industrial planning and for the stable operation of the power grid. Yet many studies build on highly complex models and fixed datasets that are expensive to develop and maintain. This thesis carries out a large-scale ablation study of the Danish day-ahead electricity market. We systematically evaluate 1,052 machine learning models in their default configuration across eight model types, including tree-based models, neural networks, and the AutoML ensemble AutoGluon. The models are assessed with walk-forward validation, which mimics operation by training on the past and testing on the future in rolling windows. We test 13 progressively expanded feature sets, eight forecasting horizons from 0 to 168 hours, and two targets: absolute price and price difference. The results show that, in their default configuration, general tree-based models, CatBoost in particular, perform better than neural networks at short-term horizons. Analysis of which variables contribute most indicates that historical prices are the most important source of predictive power, while adding lagged measures of grid consumption yields diminishing or negative gains. Supplementary experiments show that targeted hyperparameter tuning can reduce errors markedly, measured as mean absolute error (MAE). Furthermore, replacing raw station weather data with professionally processed commercial weather data brings no substantial improvement. The conclusion is practically applicable: solid forecasting accuracy can be achieved with algorithms in their default configuration and a compact feature set of weather, time, and historical prices, thereby avoiding the operational burden of specialized deep learning architectures and extensive data collection and maintenance.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
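The walk-forward validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed-size rolling training window that advances by one test period per fold, and it uses a placeholder "last seen price" forecast in place of a real model. The function names, window sizes, and toy prices are hypothetical and are not taken from the thesis.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test window follows its training window."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll both windows forward by one test period


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy hourly prices (made-up numbers, not data from the thesis).
prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0]

fold_errors = []
for train, test in walk_forward_splits(len(prices), train_size=4, test_size=2):
    # Placeholder model (assumption): predict the last price in the training window.
    last_seen = prices[train[-1]]
    actual = [prices[i] for i in test]
    fold_errors.append(mean_absolute_error(actual, [last_seen] * len(actual)))

print(fold_errors)
```

Because every test window lies strictly after its training window, no future prices leak into training. Averaging the per-fold MAE values gives a single score per model configuration.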
Keywords
Short-Term ; Electricity ; Price ; Prediction ; Machine Learning ; Ablation ; CatBoost ; Feature Importance ; Weather ; DMI ; Energinet ; Nord Pool
