
A systematic literature review on cross-language source code clone detection

Research output: Contribution to journal › Review article › peer-review

1 Scopus citation

Abstract

Context: Cross-language Code Clone Detection (CLCCD) is crucial for maintaining consistency and minimizing redundancy in modern software development, where similar code may appear across projects written in different programming languages. While previous reviews have explored code clone detection in general, none has focused exclusively on CLCCD.

Objective: This study aims to bridge this gap by reviewing existing CLCCD approaches, focusing on the detection techniques, preprocessing methods, feature extraction approaches, datasets, and evaluation metrics they use.

Method: A systematic literature review (SLR) was conducted, analyzing 26 studies published in journals, conferences, and workshops up to May 2025. Both quantitative and qualitative data were systematically analyzed to derive the findings.

Results: CLCCD has evolved from traditional techniques to deep learning models, but fully automated tools remain unavailable. Parsing (73%), normalization (35%), and tokenization (27%) are widely used preprocessing techniques in CLCCD methods. The largest share of studies (38.5%) employs hybrid feature extraction, which combines tree-based and graph-based methods to capture code structure and semantics. However, the datasets, sourced primarily from programming competition platforms, lack diversity and standardization. Performance evaluation relies largely on precision, recall, and F1-score; incorporating additional evaluation metrics could provide further insight into detection performance.

Conclusion: This SLR summarizes current CLCCD research, highlighting advancements and challenges. Significant gaps include the absence of diverse, standardized datasets and the limited exploration of advanced feature extraction techniques. Future research should focus on creating better datasets, adopting novel detection techniques, and exploring feature extraction methods to improve CLCCD performance for modern multi-language systems.
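As an illustration of the evaluation metrics named in the abstract (precision, recall, and F1-score), the minimal sketch below scores a set of predicted cross-language clone pairs against a labeled set. It assumes clone pairs are unordered pairs of fragment identifiers; the function `clone_detection_scores`, the pair representation, and the file names are hypothetical and are not taken from the review or from any of the surveyed tools.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a clone pair is two fragment identifiers.
Pair = Tuple[str, str]


def _canonical(pairs: Iterable[Pair]) -> Set[Pair]:
    # Treat (a, b) and (b, a) as the same clone pair by sorting each pair.
    return {tuple(sorted(p)) for p in pairs}


def clone_detection_scores(predicted: Iterable[Pair],
                           ground_truth: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. labeled clone pairs."""
    pred = _canonical(predicted)
    truth = _canonical(ground_truth)
    true_positives = len(pred & truth)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical Java/Python/C#/Go/C/Rust fragment pairs.
    predicted = [("a.java", "a.py"), ("b.java", "c.py"), ("d.cs", "d.go")]
    truth = [("a.py", "a.java"), ("d.cs", "d.go"),
             ("e.java", "e.py"), ("f.c", "f.rs")]
    print(clone_detection_scores(predicted, truth))
```

In this toy example two of the three predictions are correct, giving a precision of 2/3, a recall of 1/2, and an F1-score of 4/7. Canonicalizing pair order is a design assumption here; benchmarks that treat clone pairs as directed would skip that step.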

Original language: English
Article number: 100786
Journal: Computer Science Review
Volume: 58
State: Published - Nov 2025

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Inc.

Keywords

  • CCD
  • CLCCD
  • Code clone
  • Cross-language code clone detection
  • SLR
  • Systematic literature review

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

