KALCHAS - a Dynamic Full-Text XML Search Engine

Studenteropgave: Speciale (inkl. HD afgangsprojekt)

  • Dennis Alexander Noergaard
  • Rasmus Christian Kaae
  • Duy Thanh Nguyen
4. semester, Datalogi, Kandidat (Kandidatuddannelse)
This report documents our work on engineering a dynamic XML full-text index, usable for querying structured XML documents with simple syntax. We focus primarily on designing an index structure feasible for incremental updates while also discussing associated maintenance strategies for efficiently migrating data from an in-memory index into disk based B-trees. The proposed index structure is designed as a series of cascading inverted indexes: One index kept in main memory containing the working set of documents, one incrementally built disk based B-tree containing documents recently ruled out of the former index, and finally one static disk based B-tree containing documents that have remained static in a period long time. The efficiency of minimizing the amount of data stored in the indexes is researched. We evaluate on various compression schemes in the context of compressing entire inverted lists vs. single postings. In extension, we propose a customized variable byte length encoding scheme for storing Dewey paths efficiently. We facilitate the concept of Meet operator in order to filter search results and return the most relevant XML elements. We refine our previously proposed algorithm for the Meet operator in order to increase the relevance of result sets. In conclusion we conduct empirical tests, showing that the implemented system performs reasonably within the intended environment.
Udgivelsesdatojun. 2005
ID: 61065262