Exploring the Efficacy of Specially-Trained Transformers on Geospatial Entity Matching of Historic Toponyms

Author

Jørgensen, Morten

Term

4th term

Education

Software, Master

Publication year

2024

Submitted on

2024-06-06

Abstract

Substantial effort has gone into digitizing and extracting information from historical and ancient manuscripts. These efforts often focus on a single civilization, its language, and its culture, which isolates them from one another and makes it harder to collaborate and share knowledge. Some works have tried to connect these efforts and their data through toponym matches, using traditional methods such as transliteration. However, results have been uneven. The advent of transformer-based language models such as BERT has improved performance in many language-related tasks, including toponym matching. However, these language models are often trained on large corpora of modern English text, and even multilingual models are often trained on modern texts collected from the web. Here, we examine whether specially-trained multilingual models, trained on ancient texts in the languages of the toponyms, can be beneficial for this task. We examine several methods that use ancient manuscripts to adapt BERT-based models to identify matching toponyms in Arabic and Hebrew, two related Semitic languages with historical dialects and sizeable corpora of ancient texts. We evaluated our methods on a historical toponym matching task comprising several datasets of toponyms extracted from Middle East scholars. The evaluation results were surprising: the models presented in this work were outperformed by a multilingual model (mBERT) that was pre-trained on modern data.

A master's thesis from Aalborg University
