Python code smells detection using conventional machine learning models

Rana Sandouka, Hamoud Aljamaan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

Code smells are poor design or implementation choices that complicate code maintenance and reduce software quality. Detecting them is therefore an important part of software development. Recent studies have applied machine learning algorithms to code smell detection, but most of them rely on code smell datasets built from Java code. This article proposes a Python code smell dataset for the Large Class and Long Method code smells. The dataset contains 1,000 samples for each code smell, with 18 features extracted from the source code. We also investigated the detection performance of six machine learning models as baselines for Python code smell detection. The baselines were evaluated using Accuracy and the Matthews correlation coefficient (MCC). The results indicate that the Random Forest ensemble performed best on Python Large Class detection, achieving the highest MCC of 0.77, while the decision tree performed best on Python Long Method detection, achieving the highest MCC of 0.89.
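For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how Accuracy and MCC can be computed for a binary smelly / not-smelly classifier. It is not the authors' evaluation code: the function names and the toy labels are illustrative assumptions, and the MCC formula used is the standard one based on confusion-matrix counts.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = smelly, 0 = not smelly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, invented purely for illustration (not data from the article).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("mcc:", mcc(*counts))
```

Unlike Accuracy, MCC uses all four confusion-matrix cells, so a classifier that ignores one class scores near 0 even when its Accuracy looks high.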

Original language: English
Article number: e1370
Journal: PeerJ Computer Science
Volume: 9
State: Published - 2023

Bibliographical note

Publisher Copyright:
© 2023 Sandouka and Aljamaan

Keywords

  • Code smell
  • Detection
  • Large class
  • Long method
  • Machine learning
  • Python

ASJC Scopus subject areas

  • General Computer Science
