Abstract
Code smells indicate potential issues in software design that can impact maintainability, testing and overall quality. Detecting them early is crucial for improving system reliability. While machine learning has been used for code smell detection, most studies have focused on Java, with limited research on other languages. In this study, we empirically investigated the effectiveness of both deep learning and heterogeneous ensemble models in detecting multiple Python code smells, including Large Class, Long Method, Long Scope Chaining, Long Parameter List and Long Base Class List. We evaluated three heterogeneous ensemble models (Stacking, Hard Voting and Soft Voting) alongside three deep learning models (Convolutional Neural Networks, Long Short-Term Memory and Gated Recurrent Units). Each ensemble was built using eight base models, and the Wilcoxon test was used to assess performance differences. Results indicated that Stacking consistently outperformed the other models, with superior stability and detection performance. Convolutional Neural Networks performed well on some smells but struggled with complex nested structures, where ensemble models offered more stability. Hard and Soft Voting ensembles were competitive but less stable than Stacking. These findings highlight the potential of ensemble and deep learning models in enhancing Python code smell detection.
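To illustrate how the two voting schemes named in the abstract can disagree, the following is a minimal sketch, not the authors' implementation. The probabilities, the 0.5 threshold, the tie-breaking rule and the function names `hard_vote` and `soft_vote` are all assumptions chosen for illustration; only the count of eight base models comes from the abstract.

```python
from collections import Counter


def hard_vote(probas, threshold=0.5):
    """Majority vote over each base model's predicted label (1 = smelly)."""
    labels = [1 if p >= threshold else 0 for p in probas]
    counts = Counter(labels)
    # Assumption: ties are resolved in favour of the non-smelly class.
    return 1 if counts[1] > counts[0] else 0


def soft_vote(probas, threshold=0.5):
    """Average the predicted probabilities, then threshold the mean."""
    mean = sum(probas) / len(probas)
    return 1 if mean >= threshold else 0


# Hypothetical smelly-class probabilities from eight base models
# for a single code sample (made-up values, not data from the study).
probas = [0.45, 0.40, 0.48, 0.45, 0.42, 0.95, 0.99, 0.98]

print("hard voting:", hard_vote(probas))
print("soft voting:", soft_vote(probas))
```

In this made-up case five of the eight models lean slightly towards "not smelly" while three are very confident in "smelly", so the majority vote and the averaged probability land on different labels. Stacking differs from both: it trains a meta-learner on the base models' outputs rather than applying a fixed combination rule, and is not shown here.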
| Original language | English |
|---|---|
| Pages (from-to) | 963-986 |
| Number of pages | 24 |
| Journal | International Journal of Software Engineering and Knowledge Engineering |
| Volume | 35 |
| Issue number | 7 |
| State | Published - 1 Jul 2025 |
Bibliographical note
Publisher Copyright: © 2025 World Scientific Publishing Company.
Keywords
- Python
- code smell
- deep learning
- ensemble learning
- stacking
- voting
ASJC Scopus subject areas
- Software
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence