Abstract
Studies have shown that the lack of information about a bug often delays the bug report (BR) resolution process. Existing approaches rely mainly on BR descriptions as the main features for predicting BR fields. BR descriptions, however, tend to be informal and are not always reliable. In this study, we show that using stack traces, a more formal source, together with categorical features of BRs provides better accuracy than using BR descriptions. We focus on the prediction of faulty components and products, two important BR fields that developers often use to investigate a bug. Our method mines historical BRs in order to predict the faulty components and products of new incoming bugs. We map stack traces of historical BRs to feature vectors, weighted using TF-IDF. The vectors, together with a selected set of BR categorical information, are then fed to a classification algorithm. The method also tackles the problem of unbalanced data. Our approach achieves an average accuracy of 58% (when predicting faulty components) and 60% (when predicting faulty products) on the Eclipse dataset, and 70% (when predicting faulty components) and 70% (when predicting faulty products) on the Gnome dataset. For both datasets, our approach improves over the method that uses BR descriptions by a large margin, up to an average of 46%.
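The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-frame-per-line tokenization, the smoothed IDF formula, the `severity` field, and the toy reports are all assumptions made for illustration. It shows only how stack-trace frames might be turned into TF-IDF weights and concatenated with a one-hot encoded categorical field before being passed to a classifier.

```python
import math
from collections import Counter


def tokenize(stack_trace):
    """Split a stack trace into frame tokens, assuming one frame per line."""
    return [line.strip() for line in stack_trace.splitlines() if line.strip()]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for each token in the history.

    The smoothing (+1 terms) is an assumption; the abstract does not give
    the exact TF-IDF variant.
    """
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(tokens, idf, vocab):
    """Relative term frequency times IDF, in a fixed vocabulary order."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[tok] / total * idf[tok] for tok in vocab]


def one_hot(value, categories):
    """One-hot encode a categorical BR field."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_features(report, idf, vocab, severities):
    """Concatenate stack-trace TF-IDF weights with the encoded category."""
    trace_part = tfidf_vector(tokenize(report["trace"]), idf, vocab)
    return trace_part + one_hot(report["severity"], severities)


# Hypothetical historical bug reports (toy data, not from the paper).
history = [
    {"trace": "ui.Widget.paint\nui.Canvas.draw\ncore.Loop.run",
     "severity": "major", "component": "UI"},
    {"trace": "net.Socket.read\nnet.Http.get\ncore.Loop.run",
     "severity": "minor", "component": "Net"},
]

token_lists = [tokenize(r["trace"]) for r in history]
idf = fit_idf(token_lists)
vocab = sorted(idf)
severities = sorted({r["severity"] for r in history})

X = [build_features(r, idf, vocab, severities) for r in history]
y = [r["component"] for r in history]  # labels for a classifier
```

In this sketch, frames that appear in every trace (such as the hypothetical `core.Loop.run`) receive a lower weight than frames specific to one report. The resulting `X` and `y` could be handed to any classification algorithm; handling of unbalanced classes is not covered here.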
Original language | English |
---|---|
Pages | 224-233 |
Number of pages | 10 |
State | Published - 2020 |
Bibliographical note
Publisher Copyright: © 2019 Copyright held by the owner/author(s).
Keywords
- Machine Learning
- Mining Software Repositories
- Software Bugs Reports
- Software Maintenance and Evolution
ASJC Scopus subject areas
- Computer Networks and Communications
- Software
- Computer Science Applications