
Machine learning approaches for automated software traceability: A systematic literature review

  • Nouf Alturayeif*
  • Jameleddine Hassine
  • Irfan Ahmad

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

2 Scopus citations

Abstract

Software traceability is the process of tracking and managing relationships between software artifacts throughout the Software Development Life-Cycle (SDLC). It ensures that all software artifacts are correctly linked, facilitating change management, impact analysis, and regulatory compliance. Automated traceability can be achieved using Information Retrieval (IR) and Machine Learning (ML) approaches. This systematic literature review summarizes and synthesizes ML-based automated traceability studies. Given the rapid advances in ML, analyzing current research is crucial for progress in the field. We identified 59 studies published between 2014 and June 2024. We found an increase in publications, particularly in 2023 and continuing into 2024, with sustained citation impact. Around 170 datasets from different domains are used, covering both natural-language and programming-language artifacts. Common artifacts include use cases and source code, with a focus on the Requirements Analysis and Implementation phases. Existing solutions mostly use classification and supervised learning, with emerging use of deep learning and Large Language Models (LLMs), which show superior performance. We identified challenges and gaps, along with future trends, to guide researchers. Challenges include imbalanced datasets, data scarcity, and limited real-world data, while gaps include handling missing true links, a lack of benchmark datasets, and limited exploration of LLMs. Lastly, we provide recommendations for researchers based on the findings.

Original language: English
Article number: 112536
Journal: Journal of Systems and Software
Volume: 230
State: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025

Keywords

  • Deep learning
  • Machine learning
  • Software traceability
  • Systematic literature review
  • Transfer learning

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
