Abstract
We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (https://doi.org/10.5281/zenodo.6672682).
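The two-step pipeline described in the abstract (extract keywords from an RS, then query Wikipedia for them) can be illustrated with a minimal sketch. This is not WikiDoMiner's actual implementation: the function names `extract_keywords` and `wikipedia_search_url`, the small stopword list, and the frequency-based ranking are all assumptions made for illustration. The sketch only builds search URLs for the public MediaWiki search API and performs no network requests.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical, deliberately small stopword list (an assumption of this sketch).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "shall", "be", "is", "in",
    "for", "with", "on", "by", "that", "this", "each", "its",
}


def extract_keywords(rs_text, top_k=5):
    """Rank candidate keywords by raw frequency (a stand-in for a real
    domain-specific keyword extractor)."""
    tokens = re.findall(r"[a-z]+", rs_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def wikipedia_search_url(keyword, limit=5):
    """Build a MediaWiki search API URL for one keyword (no request is sent)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    rs = (
        "The satellite shall transmit telemetry to the ground station. "
        "The satellite shall store telemetry when the ground station is unreachable."
    )
    for kw in extract_keywords(rs, top_k=4):
        print(kw, wikipedia_search_url(kw))
```

Fetching each URL and collecting the returned article titles would give a first approximation of a domain-specific corpus; a real tool would likely need a stronger keyword extractor and some relevance filtering of the retrieved articles.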
| Original language | English |
|---|---|
| Title of host publication | ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering |
| Editors | Abhik Roychoudhury, Cristian Cadar, Miryung Kim |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 1706-1710 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781450394130 |
| DOIs | |
| State | Published - 7 Nov 2022 |
| Externally published | Yes |
| Event | 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022 - Singapore, Singapore. Duration: 14 Nov 2022 → 18 Nov 2022 |
Publication series
| Name | ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering |
|---|---|
Conference
| Conference | 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 14/11/22 → 18/11/22 |
Bibliographical note
Publisher Copyright: © 2022 Owner/Author.
Keywords
- Domain-specific Corpus Generation
- Natural Language Processing
- Natural-language Requirements
- Requirements Engineering
- Wikipedia
ASJC Scopus subject areas
- Artificial Intelligence
- Software