Abstract
Categorizing reported software bugs by type is a vital aspect of software development and maintenance. This procedure is initially handled manually by a bug triager. However, the classification should be automated to facilitate and improve the process. This research aims to enhance the predictive performance of machine learning models in classifying bug reports. The study proposes a novel framework that integrates chi-square feature selection with stacked generalization ensemble models into the bug report classification process. The study involves an empirical investigation utilizing a set of seven base classifiers and three meta-classifiers (Logistic Regression (LoR), Naive Bayes (NB), and Multilayer Perceptron (MLP)) to construct the stacking ensemble. The models were trained on two open-source Java datasets using the textual data fields of the reported bugs. Features were extracted using different variants of N-grams, including uni-grams, bi-grams, and tri-grams. The chi-square feature selection technique was applied to reduce the high dimensionality and select only the informative features. The experimental results were evaluated using the Matthews correlation coefficient and the F1 metric and compared with state-of-the-art bug classification methods. The results show that the stacking models’ performance is higher than that of the standalone classifiers in almost all cases and for both datasets. For all three stacked models, increasing the dataset size improves the chances of achieving higher performance. The analytical comparison among the three stacking models and the statistical results using the Wilcoxon signed-rank test showed that the MLP-Stacked and LoR-Stacked ensemble models were the best-performing classifiers.
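To make two of the building blocks named in the abstract concrete, the following is a minimal sketch of chi-square scoring over word N-gram features (uni-grams to tri-grams) and of the Matthews correlation coefficient for binary labels. It is an illustration under stated assumptions, not the study's implementation: the function names, the four toy bug reports, and the binary presence/absence encoding are all hypothetical, and the score is the textbook 2x2 chi-square statistic, which may differ from the variant the authors used. The stacking ensemble itself is not shown.

```python
from collections import Counter
from math import sqrt


def ngrams(text, n_max=3):
    """Return the set of word n-grams (n = 1..n_max) present in a text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_square_scores(docs, labels, n_max=3):
    """Score each n-gram with the chi-square statistic of its 2x2 table
    (n-gram present/absent versus label 1/0)."""
    total = len(docs)
    positives = sum(labels)
    doc_freq = Counter()  # number of documents containing the n-gram
    pos_freq = Counter()  # number of label-1 documents containing it
    for text, label in zip(docs, labels):
        for gram in ngrams(text, n_max):
            doc_freq[gram] += 1
            pos_freq[gram] += label
    scores = {}
    for gram, present in doc_freq.items():
        a = pos_freq[gram]        # present, label 1
        b = present - a           # present, label 0
        c = positives - a         # absent, label 1
        d = total - present - c   # absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring n-grams (ties broken alphabetically)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up toy reports: label 1 = crash-type bug, label 0 = other request.
docs = [
    "null pointer exception on save",
    "crash with null pointer in parser",
    "add dark mode option",
    "please add export option",
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 3)
```

In this sketch, N-grams that appear in only one class receive the highest scores, so `select_top_k` keeps the terms that separate the classes most sharply and discards the rest. In a full pipeline, the selected features would be fed to the base classifiers of the stacking ensemble, and `mcc` would be applied to the ensemble's predictions.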
Original language | English |
---|---|
Journal | Innovations in Systems and Software Engineering |
DOIs | |
State | Accepted/In press - 2024 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
Keywords
- Bug classification
- Feature selection
- Machine learning
- Software repositories
- Stacked ensemble learning
ASJC Scopus subject areas
- Software