Abstract
Software traceability is the process of tracking and managing relationships between software artifacts throughout the Software Development Life-Cycle (SDLC). It ensures that all software artifacts are correctly linked, facilitating change management, impact analysis, and regulatory compliance. Automated traceability can be achieved using Information Retrieval (IR) and Machine Learning (ML) approaches. This systematic literature review summarizes and synthesizes studies on ML-based automated traceability. Given the rapid advances in ML, analyzing current research is crucial for progress in the field. We identified 59 studies published between 2014 and June 2024. The number of publications has increased, particularly in 2023 and continuing into 2024, with sustained citation impact. The studies use around 170 datasets from different domains, covering both natural-language and programming-language artifacts. Common artifacts include use cases and source code, with a focus on the Requirements Analysis and Implementation phases. Existing solutions mostly rely on classification and supervised learning, while deep learning and Large Language Models (LLMs) are emerging and show superior performance. We identify challenges, gaps, and future trends to guide researchers. Challenges include imbalanced datasets, data scarcity, and limited real-world data, while gaps include the handling of missing true links, the lack of benchmark datasets, and the limited exploration of LLMs. Lastly, we provide recommendations for researchers based on the findings.
| Original language | English |
|---|---|
| Article number | 112536 |
| Journal | Journal of Systems and Software |
| Volume | 230 |
| DOIs | |
| State | Published - Dec 2025 |
Bibliographical note
Publisher Copyright: © 2025
Keywords
- Deep learning
- Machine learning
- Software traceability
- Systematic literature review
- Transfer learning
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture
Fingerprint
Dive into the research topics of 'Machine learning approaches for automated software traceability: A systematic literature review'. Together they form a unique fingerprint.