Selectivity estimation of extended XML query tree patterns based on prime number labeling and synopsis modeling

Salahadin Mohammed*, Ahmad F. Barradah, El Sayed M. El-Alfy

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

5 Scopus citations

Abstract

With the new era of big data and the proliferation of XML documents for representing and exchanging data over the web, selectivity estimation of XML query patterns has become a crucial component of database optimizers. It helps the optimizer choose the best possible plan for query evaluation. Existing selectivity estimators for XML queries can only support basic Query Tree Patterns (QTPs) with logical AND operator. In this paper, we propose a novel approach, called XQuest, for selectivity estimation that supports extended QTPs that may contain logical operators or wildcards. This approach is based on a modified implementation of prime number labeling to construct a structural summary model of the XML data. Subsequently, a simulator of an XML query evaluator runs on the resulting model from the previous stage and aggregates the estimate for each target QTP. We conducted several experiments to study the performance of the proposed approach on three XML benchmark datasets; in terms of synopsis generation time, storage requirements, and estimation accuracy. The results show that the proposed approach can have more accurate estimates with low memory and time requirements. For example, when compared to a Sampling algorithm with the same allocated memory budget, the error rate of the proposed approach never reached 5% whereas it reached 98.5% for the Sampling algorithm.

Original languageEnglish
Pages (from-to)30-42
Number of pages13
JournalSimulation Modelling Practice and Theory
Volume64
DOIs
StatePublished - May 2016

Bibliographical note

Funding Information:
The authors would like to thank King Fahd University of Petroleum & Minerals (KFUPM), the National Science, Technology and Innovation Plan (NSTIP) under Grant Numbers 11-INF1657-04 and 11-INF1658-04 , and the Saudi ARAMCO, for their support during this work. They also thank Cheng Luo for providing them the source code of the Sampling algorithm. Special thanks for the journal editors and the anonymous reviewers who provided invaluable and generous comments to enhance the presentation and content of this paper.

Publisher Copyright:
© 2016

Keywords

  • Extended query tree patterns
  • Prime number labeling
  • Query optimization
  • Selectivity estimation
  • XML query
  • XML synopsis

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Selectivity estimation of extended XML query tree patterns based on prime number labeling and synopsis modeling'. Together they form a unique fingerprint.

Cite this