Using XML Data in OLAP Queries

Student thesis: Master Thesis and HD Thesis

  • Karsten Riis
  • Dennis Pedersen
4. term, Computer Science, Master (Master Programme)
The changing data requirements of today's dynamic business environments are not handled well by current On-Line Analytical Processing (OLAP) systems. Physically integrating unexpected data into such systems is a time-consuming process, making logical integration the better choice in many situations. The increasing use of Extensible Markup Language (XML), e.g. in business-to-business (B2B) applications, suggests that the required data will often be available as XML data.
In this paper we present a flexible and theoretically well-founded approach to the logical federation of OLAP and XML data sources. This makes it possible to reference external XML data in OLAP queries, which allows XML data to be presented along with dimensional data in the result of an OLAP query and enables the use of XML data for selection and grouping. Special care is taken to ensure that semantic problems do not occur in the integration process. To demonstrate the capabilities of this approach, we present a multi-schema query language based on SQL and XPath. A complete federated system is also presented, covering all important areas of a federated approach to the integration of OLAP and XML. This work includes a complete formal background, a collection of algebraic rewrite rules, architectural and procedural design, and several effective cost-based optimization techniques. A prototype is being developed, and initial experimental studies indicate that our federated approach is a feasible alternative to physical integration. Thus, our federated approach provides a powerful and flexible way to handle unexpected or short-term data requirements as well as rapidly changing data. As almost all data sources can be efficiently wrapped in XML format, the approach also allows the logical integration of external data from sources such as relational, object-relational, and object databases, opening up new application areas for OLAP.
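The idea of using external XML data for grouping in an OLAP-style query can be illustrated with a minimal Python sketch. This is not the query language or the system described in the thesis; the XML document, the fact rows, and the helper names `decorate` and `sales_by` are all hypothetical, and keeping unmatched facts under an "N/A" placeholder is an assumed convention chosen so that totals are preserved.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical external XML source describing suppliers.
SUPPLIERS_XML = """
<suppliers>
  <supplier><name>Acme</name><country>DK</country></supplier>
  <supplier><name>Bolt</name><country>SE</country></supplier>
  <supplier><name>Core</name><country>DK</country></supplier>
</suppliers>
"""

# Hypothetical fact rows: (supplier dimension value, sales measure).
FACTS = [("Acme", 100), ("Bolt", 40), ("Core", 25), ("Dyna", 10)]


def decorate(xml_text, node_path, key_path, value_path):
    """Map each dimension value to an external XML value.

    The paths use the limited XPath subset supported by ElementTree.
    """
    root = ET.fromstring(xml_text)
    mapping = {}
    for node in root.findall(node_path):
        mapping[node.findtext(key_path)] = node.findtext(value_path)
    return mapping


def sales_by(facts, mapping, missing="N/A"):
    """Sum the measure grouped by the XML-derived value.

    Facts with no XML match are kept under `missing` (an assumption of
    this sketch) so that the grand total is unchanged.
    """
    totals = defaultdict(int)
    for supplier, sales in facts:
        totals[mapping.get(supplier, missing)] += sales
    return dict(totals)


country_of = decorate(SUPPLIERS_XML, "supplier", "name", "country")
result = sales_by(FACTS, country_of)
print(result)
```

Here the OLAP side never stores the country: it is looked up in the XML source at query time, which is the sense in which the integration is logical rather than physical.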
Publication date: Jun 2001
ID: 61080512