Abstract
Scientific charts are an effective tool for visualizing trends in numerical data. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts on the web has made it inevitable for search engines to include them as indexed content. However, queries that rely only on the textual data used to tag the images can limit query results. Many studies address the extraction of data from scientific diagrams in order to improve search results. In our approach to this goal, we attempt to enhance the semantic labeling of charts by using the original data values that the charts were designed to represent. In this paper, we describe a method to extract data values from a specific class of charts, bar charts. The extraction process is fully automated using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as charts embedded in digital documents.
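The abstract does not spell out the extraction steps, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the chart has already been reduced to a binary mask of bar pixels and that text recognition has already supplied two y-axis tick labels with their pixel rows. The names `find_bars` and `pixel_to_value`, the mask, and the tick positions are all hypothetical.

```python
from typing import List, Tuple

def find_bars(mask: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return (left_col, right_col, top_row) for each run of adjacent
    columns that contain bar pixels (1 = bar, 0 = background)."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    bars = []
    start = None
    top = height
    # Iterate one column past the edge so a bar touching the border is closed.
    for x in range(width + 1):
        rows = [y for y in range(height) if x < width and mask[y][x]]
        if rows:
            if start is None:
                start, top = x, height
            top = min(top, rows[0])
        elif start is not None:
            bars.append((start, x - 1, top))
            start = None
    return bars

def pixel_to_value(row: int,
                   tick_a: Tuple[int, float],
                   tick_b: Tuple[int, float]) -> float:
    """Linearly map a pixel row to a data value using two axis ticks,
    each given as (pixel_row, labeled_value). Assumes a linear axis."""
    (row_a, val_a), (row_b, val_b) = tick_a, tick_b
    scale = (val_b - val_a) / (row_b - row_a)
    return val_a + (row - row_a) * scale

# Hypothetical 6x7 mask with two bars; row 0 is the top of the image.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
]

# Assumed calibration: the tick "0" sits at row 5 and the tick "50" at row 0.
ticks = ((5, 0.0), (0, 50.0))
values = [pixel_to_value(top, *ticks) for _, _, top in find_bars(mask)]
print(values)  # [30.0, 10.0]
```

A real pipeline would also need to segment bars from gridlines and legends, handle logarithmic or broken axes, and tolerate recognition errors in the tick labels, which is presumably where the heuristics mentioned in the abstract come in.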
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015 |
| Publisher | Association for Computing Machinery |
| ISBN (Electronic) | 9781450338493 |
| State | Published - 7 Oct 2015 |
| Externally published | Yes |
| Event | 8th International Conference on Knowledge Capture, K-CAP 2015 - Palisades, United States; Duration: 7 Oct 2015 → 10 Oct 2015 |
Publication series
| Name | Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015 |
|---|---|
Conference
| Conference | 8th International Conference on Knowledge Capture, K-CAP 2015 |
|---|---|
| Country/Territory | United States |
| City | Palisades |
| Period | 7 Oct 2015 → 10 Oct 2015 |
Bibliographical note
Publisher Copyright: © 2015 ACM.
Keywords
- Information extraction
- Scientific chart understanding
- Web search
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Information Systems
- Software