Exploiting Parts-of-Speech for effective automated requirements traceability

Nasir Ali, Haipeng Cai, Abdelwahab Hamou-Lhadj*, Jameleddine Hassine

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Context: Requirements traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirements with respect to a set of traceability links called trace links. Previous work leverages Parts of Speech (POS) tagging of software artifacts to recover trace links among them. These studies work on the premise that discarding one or more POS tags improves the accuracy of Information Retrieval (IR) techniques. Objective: First, we show empirically that excluding one or more POS tags could negatively impact the accuracy of existing IR-based traceability approaches, namely the Vector Space Model (VSM) and the Jensen-Shannon Model (JSM). Second, we propose a method that improves the accuracy of IR-based traceability approaches. Method: We developed an approach, called ConPOS, to recover trace links using constraint-based pruning. ConPOS uses major POS categories and applies constraints to the recovered trace links, pruning them as a filtering step to significantly improve the effectiveness of IR-based techniques. We conducted an experiment to provide evidence that removing POS tags does not improve the accuracy of IR techniques. Furthermore, we conducted two empirical studies to evaluate the effectiveness of ConPOS in recovering trace links compared to existing peer RT approaches. Results: The results of the first empirical study show that removing one or more POS tags negatively impacts the accuracy of VSM and JSM. Furthermore, the results from the other empirical studies show that ConPOS provides 11%-107%, 8%-64%, and 15%-170% higher precision, recall, and mean average precision (MAP), respectively, than VSM and JSM. Conclusion: We showed that ConPOS outperforms existing IR-based RT approaches that discard some POS tags from the input documents.
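
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of how VSM and JSM are commonly used to rank candidate trace links between a requirement and source-code documents. It is not the authors' implementation and does not reproduce ConPOS: the abstract does not specify the POS categories or pruning constraints, so neither is modeled here. The function names, the whitespace tokenizer, the TF-IDF weighting, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines
    # also split identifiers, remove stop words, and stem.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one TF-IDF vector (term -> weight) per token list."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def js_similarity(p_tokens, q_tokens):
    """1 minus the Jensen-Shannon divergence (log base 2) of two term distributions."""
    p, q = Counter(p_tokens), Counter(q_tokens)
    total_p, total_q = sum(p.values()), sum(q.values())
    divergence = 0.0
    for term in set(p) | set(q):
        pt, qt = p[term] / total_p, q[term] / total_q
        mt = (pt + qt) / 2
        if pt:
            divergence += 0.5 * pt * math.log2(pt / mt)
        if qt:
            divergence += 0.5 * qt * math.log2(qt / mt)
    return 1.0 - divergence


def rank_links_vsm(requirement, code_docs):
    """Rank code documents against a requirement by TF-IDF cosine similarity."""
    names = list(code_docs)
    corpus = [tokenize(requirement)] + [tokenize(code_docs[n]) for n in names]
    vectors = tfidf_vectors(corpus)
    scored = [(n, cosine(vectors[0], v)) for n, v in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_links_jsm(requirement, code_docs):
    """Rank code documents against a requirement by Jensen-Shannon similarity."""
    req = tokenize(requirement)
    scored = [(n, js_similarity(req, tokenize(t))) for n, t in code_docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, not taken from the study.
requirement = "user login password"
code_docs = {
    "LoginController": "validate user password login",
    "ReportPrinter": "print report page",
}
vsm_ranking = rank_links_vsm(requirement, code_docs)
jsm_ranking = rank_links_jsm(requirement, code_docs)
```

In both rankings, each candidate link is a (document, score) pair; an approach that filters by POS tag would change which tokens reach `tokenize`, and a pruning step would discard some of the ranked pairs afterwards.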

Original language: English
Pages (from-to): 126-141
Number of pages: 16
Journal: Information and Software Technology
Volume: 106
State: Published - Feb 2019

Bibliographical note

Publisher Copyright:
© 2018 Elsevier B.V.

Keywords

  • Information retrieval (IR)
  • Parts of Speech (POS)
  • Requirements traceability (RT)
  • Trace links

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
