Automatic extraction of data from bar charts

  • Rabah A. Al-Zaidy
  • C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

47 Scopus citations

Abstract

Scientific charts are an effective tool to visualize numerical data trends. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts in the web has made it inevitable for search engines to include them as indexed content. However, the queries based on only the textual data used to tag the images can limit query results. Many studies exist to address the extraction of data from scientific diagrams in order to improve search results. In our approach to achieving this goal, we attempt to enhance the semantic labeling of the charts by using the original data values that these charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
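
The abstract does not spell out how pixel measurements become data values, so the following is only an illustrative sketch of one common step in bar chart data extraction, not the authors' method. It assumes that text recognition has already paired some y-axis tick labels with their pixel rows and that image processing has already located the top edge of each bar; it then fits a linear pixel-to-value mapping and applies it to the bar tops. The function names `fit_axis` and `bar_values` and the input layout are hypothetical.

```python
def fit_axis(ticks):
    """Fit value = a * pixel_y + b by least squares.

    ticks: list of (pixel_y, value) pairs, assumed to come from
    recognized y-axis tick labels (hypothetical input format).
    """
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two axis ticks to calibrate")
    mean_y = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    sxx = sum((p - mean_y) ** 2 for p, _ in ticks)
    if sxx == 0:
        raise ValueError("ticks must lie on different pixel rows")
    sxy = sum((p - mean_y) * (v - mean_v) for p, v in ticks)
    a = sxy / sxx
    b = mean_v - a * mean_y
    return a, b


def bar_values(bar_tops, ticks):
    """Map the pixel row of each bar's top edge to a data value.

    Assumes a linear (not logarithmic) vertical axis.
    """
    a, b = fit_axis(ticks)
    return [a * y + b for y in bar_tops]


if __name__ == "__main__":
    # Image rows grow downward, so larger pixel_y means a smaller value.
    ticks = [(400, 0.0), (300, 10.0), (200, 20.0)]
    print(bar_values([350, 250, 400], ticks))
```

Fitting over all recognized ticks, rather than using only two, is a design choice that dampens the effect of a single misread label; a logarithmic axis would need the values transformed before fitting.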

Original language: English
Title of host publication: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450338493
DOIs
State: Published - 7 Oct 2015
Externally published: Yes
Event: 8th International Conference on Knowledge Capture, K-CAP 2015 - Palisades, United States
Duration: 7 Oct 2015 → 10 Oct 2015

Publication series

Name: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015

Conference

Conference: 8th International Conference on Knowledge Capture, K-CAP 2015
Country/Territory: United States
City: Palisades
Period: 7/10/15 → 10/10/15

Bibliographical note

Publisher Copyright:
© 2015 ACM.

Keywords

  • Information extraction
  • Scientific chart understanding
  • Web search

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Software
