Locating Faulty Source Code Files to Fix Bug Reports


Abeer Hamdy, Abdelrahman E. Arabi
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJOSSP.308791

Abstract

Open source software is usually released while it still contains bugs. To fix a reported bug during the maintenance phase, the developer has to search the source code files to identify the faulty ones; this process is called bug localization (BL). Automating BL is necessary to boost developer productivity and enhance software quality. The paper proposes an information-retrieval-based approach for retrieving and ranking a list of suspicious, potentially faulty source files relevant to a submitted bug report (BR). The proposed approach leverages textual features of the BRs and source files, namely part-of-speech tagging and the lexical and semantic similarity between the source files and BRs, in addition to the source file change history. The effectiveness of the proposed approach was evaluated on three open-source software repositories. Experimental results showed that the proposed approach outperforms eight previous approaches in terms of the top@N and MAP metrics.

Introduction

Open source software projects are usually released while they still contain bugs. Projects therefore use bug tracking systems, such as Bugzilla, to manage bug fixes during the maintenance phase (Hamdy & El-Laithy, 2020). When a user or a developer finds a bug, it is reported through the bug tracking system by means of a bug report. The bug report is a description of the bug in natural language; sometimes stack traces are copied into the bug report too. If the bug is confirmed, it is assigned to a developer (the bug fixer) to fix it (Hamdy & El-Laithy, 2019). The bug fixer searches the project's source code repository to locate the faulty source files in order to fix the bug; this process is called bug localization. Bug localization is a time-consuming task, especially for large software projects. The first reason is that it is hard to locate the faulty source file(s). The second reason is that there is usually a large number of bugs; e.g., in the early releases of Mozilla and Eclipse, about 170 and 120 bugs respectively were reported daily (Hamdy & El-Laithy, 2020).

Information retrieval (IR) techniques have been widely used for automating the bug localization task: the submitted bug report is treated as a query, and the top N most similar source files are retrieved and ranked (Akbar & Kak, Jun. 2020). IR-based bug localization approaches can be used with software bug repositories of different sizes; however, their performance depends on the features extracted from the source files and bug reports, such as: 1) textual and semantic similarities between a submitted bug report and the source files, 2) similarity between a submitted bug report and previously fixed ones, and 3) the change history of the source code files. Previous authors leveraged one or more of these features (Zhou, Zhang, & Lo, 2012), (Zhou, Tong, Chen, & Han, Aug. 2017). The approach proposed by (Gharibi, Rasekh, Sadreddini, & Fakhrahmad, Nov. 2018) is one of the more comprehensive approaches; it used most of the important features except the source code change history. The change history of the source code is an important feature, because source files that have been modified are likely to contain bugs and tend to grow more complex. Furthermore, modifications are usually implemented under strict deadlines to reduce cost, so developers often do not follow clean-code guidelines, which leads to code smells and consequently to bugs (Hamdy & Tazy, 2020).
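
The query-and-rank idea described above can be illustrated with a minimal sketch. It is not the approach proposed in the paper: it assumes a plain TF-IDF weighting with cosine similarity as the only feature, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_files`) and the two toy source files are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (term -> weight) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency (an assumed weighting).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_files(bug_report, source_files, top_n=10):
    """Treat the bug report as a query and return the top_n (file, score) pairs."""
    names = list(source_files)
    docs = [tokenize(source_files[name]) for name in names]
    # The query is vectorized together with the files so they share IDF weights.
    vectors = tfidf_vectors(docs + [tokenize(bug_report)])
    query = vectors[-1]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data, invented for illustration only.
files = {
    "LoginDialog.java": "class LoginDialog { void showPasswordField() { } "
                        "void validatePassword(String password) { } }",
    "ImageCache.java": "class ImageCache { void evictOldImages() { } }",
}
report = "Password field is not validated in the login dialog"

for name, score in rank_files(report, files, top_n=2):
    print(f"{name}: {score:.3f}")
```

A fuller system along the lines discussed in this section would combine such a lexical score with further features, for example similarity to previously fixed bug reports or the change history of each file; how those are weighted is a design decision this sketch does not address.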

With the breakthrough of deep learning (DL) techniques and their performance advances in several fields, including software engineering tasks such as bug severity prediction (Hamdy & Ezzat, 2020) and code smell detection (Hamdy & Tazy, 2020), several DL-based bug localization approaches have been proposed in the literature (Y. Xiao, 2019), (Liang, Sun, Wang, & Yang, 2019), (Sanglea, Muvvaa, Chimalakondaa, Ponnalagub, & Venkoparao, 2020). In these approaches, a deep learning model is trained to classify source files as faulty or not with regard to a submitted bug report. However, these models require a large amount of historical data (previously fixed bug reports) for training, so that the trained model does not overfit. Consequently, DL-based approaches can be used only with very large software bug repositories that include a vast number of previously fixed bug reports. Some authors (Sanglea, Muvvaa, Chimalakondaa, Ponnalagub, & Venkoparao, 2020) used oversampling techniques to generate synthetic data for training the DL model.
