Abstract
The continuing growth of published scholarly content on the web ensures the availability of the most recent scientific findings to researchers. Scholarly documents, such as research articles, are easily accessed through academic search engines built on large repositories of scholarly documents. Scientific information extraction from documents into a structured knowledge graph representation facilitates automated machine understanding of a document's content. Traditional information extraction approaches, which require either training samples or a preexisting knowledge base to assist in the extraction, can be challenging to apply to large repositories of digital documents. Labeled training examples are difficult to obtain at such a large scale. Also, most available knowledge bases are built from web data and do not have sufficient coverage to include concepts found in scientific articles. In this paper, we aim to construct a knowledge graph from scholarly documents while addressing both of these issues. We propose a fully automatic, unsupervised system for scientific information extraction that does not build on an existing knowledge base and avoids manually tagged training data. We describe and evaluate a constructed taxonomy that contains over 15k entities resulting from applying our approach to 10k documents.
Original language | English |
---|---|
Title of host publication | DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering |
Publisher | Association for Computing Machinery, Inc |
Pages | 149-152 |
Number of pages | 4 |
ISBN (Electronic) | 9781450346894 |
State | Published - 31 Aug 2017 |
Externally published | Yes |
Event | 17th ACM Symposium on Document Engineering, DocEng 2017 - Valletta, Malta; Duration: 4 Sep 2017 → 7 Sep 2017 |
Publication series
Name | DocEng 2017 - Proceedings of the 2017 ACM Symposium on Document Engineering |
---|---|
Conference
Conference | 17th ACM Symposium on Document Engineering, DocEng 2017 |
---|---|
Country/Territory | Malta |
City | Valletta |
Period | 4/09/17 → 7/09/17 |
Bibliographical note
Publisher Copyright: © 2017 ACM.
Keywords
- Knowledge base
- Scholarly documents
- Taxonomy construction
ASJC Scopus subject areas
- Software
- Information Systems
- Computer Science Applications