XHQE: A hybrid system for scalable selectivity estimation of XML queries

E. S.M. El-Alfy*, S. Mohammed, A. F. Barradah

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review


With the increasing popularity of XML applications in enterprise and big data systems, the use of efficient query optimizers is becoming very essential. The performance of an XML query optimizer depends heavily on the query selectivity estimators it uses to find the best possible query execution plan. In this work, we propose a novel selectivity estimator which is a hybrid of structural synopsis and statistics, called XHQE. The structural synopsis enhances the accuracy of estimation and the structural statistics makes it scalable to the allocated memory space. The structural synopsis is generated by labeling the nodes of the source XML dataset using a fingerprint function and merging subtrees with similar fingerprints (i.e. having similar structures). The generated structural synopsis and structural statistics are then used to estimate the selectivity of given queries. We studied the performance of the proposed approach using different types of queries and four benchmark datasets with different structural characteristics. We compared XHQE with existing algorithms such as Sampling, TreeSketch and one histogram-based algorithm. The experimental results showed that the XHQE is significantly better than other algorithms in terms of estimation accuracy and scalability for semi-uniform datasets. For non-uniform datasets, the proposed algorithm has comparable estimation accuracy to TreeSketch as the allocated memory size is highly reduced, yet the estimation data generation time of the proposed approach is much lower (e.g., TreeSketch took more than 50 times longer than that of the proposed approach for XMark dataset). Comparing to the histogram-based algorithm, our approach supports regular twig quires in addition to having higher accuracy when both run under similar memory constraints.

Original languageEnglish
Pages (from-to)1233-1249
Number of pages17
JournalInformation Systems Frontiers
Issue number6
StatePublished - 1 Dec 2016

Bibliographical note

Funding Information:
The first author would like to acknowledge the support provided by King Abdulaziz City for Science and Technology (KACST) through the Science & Technology Unit at King Fahd University of Petroleum & Minerals (KFUPM) for funding this work under Project no. 11-INF1658-04 as part of the National Science, Technology, and Innovation Plan.

Publisher Copyright:
© 2015, Springer Science+Business Media New York.


  • Query optimization
  • Selectivity estimation
  • Structural synopsis
  • Twig pattern matching
  • XML query processing

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Information Systems
  • Computer Networks and Communications


Dive into the research topics of 'XHQE: A hybrid system for scalable selectivity estimation of XML queries'. Together they form a unique fingerprint.

Cite this