Predicting bug report fields using stack traces and categorical attributes

Korosh K. Sabor, Abdelwahab Hamou-Lhadj, Abdelaziz Trabelsi, Jameleddine Hassine

Research output: Contribution to conference › Paper › peer-review

5 Scopus citations

Abstract

Studies have shown that a lack of information about a bug often delays the bug report (BR) resolution process. Existing approaches rely mainly on BR descriptions as the main features for predicting BR fields. BR descriptions, however, tend to be informal and are not always reliable. In this study, we show that using stack traces, a more formal source, together with categorical features of BRs provides better accuracy than using BR descriptions. We focus on predicting faulty components and products, two important BR fields that developers often use to investigate a bug. Our method mines historical BRs in order to predict the faulty components and products of new incoming bugs. We map the stack traces of historical BRs to feature vectors weighted using TF-IDF. The vectors, together with a selected set of BR categorical information, are then fed to a classification algorithm. The method also tackles the problem of unbalanced data. Our approach achieves an average accuracy of 58% (when predicting faulty components) and 60% (when predicting faulty products) on the Eclipse dataset, and 70% for both faulty components and faulty products on the Gnome dataset. For both datasets, our approach improves over the method that uses BR descriptions by a large margin, up to an average of 46%.
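The feature-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the TF-IDF variant, how stack frames are tokenized, which categorical fields are selected, which classifier is used, or how class imbalance is handled. The sketch assumes each stack trace is already a list of frame names, uses a plain `tf * log(N / df)` weighting, one-hot encodes a single hypothetical categorical field (`severity`), and uses inverse-frequency class weights as one common way to address unbalanced labels. All data values and function names are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(traces):
    """Map stack traces (lists of frame names) to dense TF-IDF vectors."""
    vocab = sorted({frame for trace in traces for frame in trace})
    n = len(traces)
    # Document frequency: number of traces in which each frame appears.
    df = Counter(frame for trace in traces for frame in set(trace))
    idf = {frame: math.log(n / df[frame]) for frame in vocab}
    vectors = []
    for trace in traces:
        tf = Counter(trace)
        total = len(trace)
        vectors.append(
            [(tf[frame] / total) * idf[frame] if total else 0.0 for frame in vocab]
        )
    return vocab, vectors


def one_hot(values):
    """One-hot encode a categorical BR attribute."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]


def class_weights(labels):
    """Inverse-frequency weights so that rare classes count more in training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * counts[label]) for label in counts}


# Hypothetical historical bug reports (illustrative values only).
traces = [
    ["ui.Widget.paint", "ui.Canvas.draw", "ui.Widget.paint"],
    ["core.Parser.parse", "core.Lexer.next"],
    ["ui.Canvas.draw", "core.Parser.parse"],
]
severity = ["major", "minor", "major"]   # assumed categorical field
components = ["UI", "Core", "UI"]        # label to predict

vocab, trace_vectors = tfidf_vectors(traces)
categories, severity_vectors = one_hot(severity)

# Concatenate stack-trace features with categorical features per bug report.
features = [tv + sv for tv, sv in zip(trace_vectors, severity_vectors)]
weights = class_weights(components)
```

The resulting `features` rows and `weights` could then be passed to any classifier that accepts per-class weights; which classifier the study uses is not stated in the abstract.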

Original language: English
Pages: 224-233
Number of pages: 10
State: Published - 2020

Bibliographical note

Publisher Copyright:
© 2019 Copyright held by the owner/author(s).

Keywords

  • Machine Learning
  • Mining Software Repositories
  • Software Bugs Reports
  • Software Maintenance and Evolution

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Computer Science Applications
