Evaluating the Veracity of Software Bug Reports using Entropy-based Measures

Madhu Kumari, V. B. Singh, Meera Sharma
Copyright: © 2022 | Pages: 21
DOI: 10.4018/IJOSSP.315280

The wide use of open source software (OSS) has increased the volume of bug data, which now forms an integral part of the extensive data ecosystem. This bug report data must be analyzed to support bug fixing and to predict important attributes such as bug severity, priority, fix time, and assignee. The growing volume of bug data, submitted by reporters in different geographical locations, makes veracity an important concern. It is generally assumed during bug triaging that the bug reports (i.e., the different bug attributes) recorded in software bug repositories are trustworthy. In reality, bug report data are not trustworthy with respect to integrity, authenticity, and trusted origin, because bugs are reported by users who may or may not have proper knowledge of the software. In this paper, we propose entropy-based models for estimating the veracity of different bug attributes.

1. Introduction

According to IBM, 2.5 quintillion bytes of data are produced every day (Dev, 2015). The wide use of social media has led to an exponential increase in data (Wang et al., 2013). On Twitter, around 340 million tweets are posted daily (Wang et al., 2013), which shows that social media data forms an integral part of big data. Data from Twitter has been used in various studies, including crime rate prediction (Gerber, 2014), stock price movement prediction (Chung & Liu, 2011), National Football League prediction (Sinha et al., 2013), information dissemination (Zaman et al., 2010), box office collection (Asur & Huberman, 2010), and US primary elections (Tumasjan et al., 2010). Data veracity is the degree to which data is accurate, precise, and trusted. In reality, data is often uncertain, imprecise, and difficult to trust. Several studies (Tapia et al., 2013; Morstatter et al., 2013; Rubin & Lukoianova, 2013; Hutton & Henderson, 2015; Sanger et al., 2014; Swamynathan et al., 2010) address the veracity of tweets on Twitter.
Like social media data, open source software bug repositories have become an invaluable source of information for software developers and managers in bug triaging and fixing. Many authors have used the information in bug reports to predict and analyze different bug attributes and bug fix time (Yu et al., 2010; Tian et al., 2012, 2013; Kanwal & Maqbool, 2012; Menzies & Marcus, 2008; Lamkanfi et al., 2010, 2011; Chaturvedi & Singh, 2012a, 2012b; Lliev et al., 2012; Yang et al., 2012, 2014; Roy & Rossi, 2014; Anbalagan & Vouk, 2009; Bhattacharya & Neamtiu, 2011; Giger et al., 2010). Bug reports are submitted by hundreds of users in different geographical locations. Bug reporting is irregular and results in inaccurate and unverified bug data in bug repositories (Tamura & Yamada, 2009). Thus, the bug data is not trustworthy, and veracity is an important issue that needs to be addressed.
Open source software is developed by different groups of users based in various places. Along with the source code, open source software development projects yield data from history repositories, bug repositories, source code repositories, deployment logs, etc. This information is produced at several sites and stored in repositories. Because the data are produced by various users in various locations, they may contain uncertainty and irregularity.
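To make the notion of uncertainty in a bug attribute concrete, the following is a minimal sketch of how Shannon entropy could be computed over the values of a single categorical attribute such as severity. It is an illustration only: this preview does not show the authors' entropy-based models, so the function names, the normalization step, and the sample severity values below are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    # H = -sum(p_i * log2(p_i)) over the observed categories
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(values):
    """Scale entropy to [0, 1] by dividing by log2(number of observed categories)."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(values) / math.log2(distinct)


# Hypothetical severity labels taken from a handful of bug reports.
severities = ["major", "major", "minor", "critical",
              "major", "minor", "major", "major"]

print(f"entropy: {shannon_entropy(severities):.4f} bits")
print(f"normalized entropy: {normalized_entropy(severities):.4f}")
```

A value near 0 means the reported values of the attribute are concentrated in one category, while a value near 1 (after normalization) means they are spread almost evenly, that is, the attribute is highly uncertain. How such a quantity maps onto a veracity estimate is specific to the paper's models and is not reproduced here.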
