WikiDoMiner: wikipedia domain-specific miner

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

We introduce WikiDoMiner - a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (https: //doi.org/10.5281/zenodo.6672682)

Original languageEnglish
Title of host publicationESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsAbhik Roychoudhury, Cristian Cadar, Miryung Kim
PublisherAssociation for Computing Machinery, Inc
Pages1706-1710
Number of pages5
ISBN (Electronic)9781450394130
DOIs
StatePublished - 7 Nov 2022
Externally publishedYes
Event30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022 - Singapore, Singapore
Duration: 14 Nov 202218 Nov 2022

Publication series

NameESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022
Country/TerritorySingapore
CitySingapore
Period14/11/2218/11/22

Bibliographical note

Publisher Copyright:
© 2022 Owner/Author.

Keywords

  • Domain-specific Corpus Generation
  • Natural Language Processing
  • Natural-language Requirements
  • Requirements Engineering
  • Wikipedia

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Fingerprint

Dive into the research topics of 'WikiDoMiner: wikipedia domain-specific miner'. Together they form a unique fingerprint.

Cite this