A master's thesis from Aalborg University


ORB-E: Ensemble Method with Query-Specific Weight Assignment Depending on Query Characteristics

Authors

; ;

Term

4th term

Education

Publication year

2023

Pages

13

Abstract

Knowledge graphs are databases that store facts as connections between entities (such as people, places, or events). Temporal knowledge graphs add timestamps, so facts also include when they are valid. Link prediction aims to fill in missing facts—for example, who did what, with whom, and when. Many recent methods tackle this by turning graphs into numeric vectors, known as Knowledge Graph Embeddings (KGEs), which allow algorithms to score possible links over time. We evaluate several temporal KGE methods and examine where they work well or poorly. We focus on two aspects of the data: how much of the graph is time-stamped (temporal data density) and how relations behave over time, shaped by the surrounding graph structure. Using this analysis, we design ORB-E, an ensemble voting method that combines multiple models. ORB-E assigns a weight to each model for each query based on the query’s characteristics and how each model has performed for similar queries. The characteristics include the prediction target (entity, relation, or time), temporal data density, relation properties, and each model’s overall score. The weighting rules capture which factors matter most across methods and datasets. Our results show that the prediction target is the most influential factor, and that query-specific weighting performs better than fixed weights. Performance improves notably when predicting time in temporally dense data, but not for other targets. We also find that a method’s theoretical expressivity does not always match its actual performance. Furthermore, when we analyze time prediction specifically, most methods cannot estimate time within an acceptable error margin. We propose a strategy that leverages the continuous nature of time and combines predictions from multiple models; this can improve time estimates when the models have similar precision.

Knowledge graphs are databases that store facts as connections between entities (e.g., people, places, or events). Temporal knowledge graphs add timestamps, so facts also include when they are valid. Link prediction is about filling in missing facts, for example who did what, with whom, and when. Many recent methods address this by converting graphs into numeric vectors, called Knowledge Graph Embeddings (KGE), which make it possible to assign a score to possible links over time. We evaluate several temporal KGE methods and examine where they work well or poorly. We focus on two aspects of the data: how large a part of the graph is time-annotated (temporal data density), and how relations behave over time, shaped by the surrounding graph structure. Based on this analysis, we develop ORB-E, an ensemble voting method that combines multiple models. ORB-E assigns each model a weight for each query based on the query's characteristics and the models' performance on similar queries. The characteristics include the prediction target (entity, relation, or time), temporal data density, relation properties, and the models' overall score. The weighting rules reflect which factors matter most across methods and datasets. Our results show that the prediction target is the most influential factor, and that query-specific weighting performs better than fixed weights. Performance improves especially for time predictions in temporally dense data, but not for other targets. We also find that the theoretical expressivity of methods does not always reflect their actual performance. In addition, the analysis of time predictions shows that most methods cannot predict time within an acceptable error margin. We present a strategy that exploits the continuous nature of time information and combines the predictions of multiple models; it can improve the time estimate when the models have roughly the same precision.

[This abstract has been rewritten with the help of AI based on the project's original abstract]
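
The abstract describes query-specific weighting only at a high level, so the following Python sketch is an illustration under stated assumptions, not the thesis's implementation. It assumes that each model's weight for a query is its past score on queries with the same characteristics (falling back to its overall score), that weights are normalized to sum to one, and that candidates are ranked by the weighted sum of model scores. It also assumes that time predictions can be combined as a weighted mean of numeric timestamps. All names (`Query`, `model_weights`, `weighted_vote`, `combine_time_predictions`) and the characteristic labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Characteristics of one link-prediction query (field names are assumptions)."""
    target: str              # "entity", "relation", or "time"
    temporally_dense: bool   # whether the data around the query is densely time-stamped
    relation_property: str   # hypothetical label, e.g. "symmetric"


def model_weights(query, past_scores, overall_scores):
    """Return one normalized weight per model for this query.

    past_scores maps (model, target, temporally_dense, relation_property) to a
    score in [0, 1] observed on similar queries. overall_scores maps each model
    to its overall score, used when no matching entry exists.
    """
    raw = {}
    for model, overall in overall_scores.items():
        key = (model, query.target, query.temporally_dense, query.relation_property)
        raw[model] = past_scores.get(key, overall)
    total = sum(raw.values())
    if total == 0:
        # No model has any evidence in its favor: fall back to equal weights.
        return {model: 1 / len(raw) for model in raw}
    return {model: score / total for model, score in raw.items()}


def weighted_vote(candidate_scores, weights):
    """Rank candidates by the weighted sum of per-model scores, best first.

    candidate_scores maps model -> {candidate: score}.
    """
    combined = defaultdict(float)
    for model, scores in candidate_scores.items():
        for candidate, score in scores.items():
            combined[candidate] += weights[model] * score
    return sorted(combined, key=lambda c: combined[c], reverse=True)


def combine_time_predictions(predictions, weights):
    """Weighted mean of numeric timestamps; assumes the weights sum to one.

    predictions maps model -> predicted timestamp (e.g. a year as a float).
    """
    return sum(weights[model] * t for model, t in predictions.items())
```

As a usage illustration with made-up numbers: if hypothetical model `"A"` scored 0.9 and model `"B"` scored 0.3 on similar time queries, `model_weights` gives `"A"` three times the weight of `"B"`, and `weighted_vote` can then rank a candidate differently than it would under equal fixed weights. Averaging timestamps only makes sense because time is continuous; it is not meaningful for entity or relation targets.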