Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects

Sangeeta Lal (Jaypee Institute of Information Technology Noida, Department of CSE & IT, Noida, Uttar-Pradesh, India), Neetu Sardana (Jaypee Institute of Information Technology Noida, Department of CSE & IT, Noida, Uttar-Pradesh, India), and Ashish Sureka (ABB Corporate Research Center, Bangalore, India)
ISBN13: 9781799824602 | ISBN10: 1799824608 | EISBN13: 9781799824619
DOI: 10.4018/978-1-7998-2460-2.ch039
Cite Chapter

MLA

Lal, Sangeeta, et al. "Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects." Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, IGI Global Scientific Publishing, 2020, pp. 740-772. https://doi.org/10.4018/978-1-7998-2460-2.ch039

APA

Lal, S., Sardana, N., & Sureka, A. (2020). Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects. In I. Management Association (Ed.), Cognitive Analytics: Concepts, Methodologies, Tools, and Applications (pp. 740-772). IGI Global Scientific Publishing. https://doi.org/10.4018/978-1-7998-2460-2.ch039

Chicago

Lal, Sangeeta, Neetu Sardana, and Ashish Sureka. "Improving Logging Prediction on Imbalanced Datasets: A Case Study on Open Source Java Projects." In Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, 740-772. Hershey, PA: IGI Global Scientific Publishing, 2020. https://doi.org/10.4018/978-1-7998-2460-2.ch039

Abstract

Deciding what to log is an important yet difficult decision for open-source software (OSS) developers. Machine-learning models can improve several steps of OSS development, including logging, and several recent studies propose such models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem: the number of logged code constructs is small compared with the number of non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. First, the authors analyze the performance of the J48, RF, and SVM classifiers in predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, they propose LogIm, an ensemble and threshold-based machine-learning model. Third, they evaluate LogIm on three open-source projects. On average, LogIm improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6%, respectively, for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13%, respectively, for if-block logging prediction.
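
The abstract does not say how LogIm combines its base classifiers or how it chooses a threshold, so the following is only a generic sketch of the ensemble-plus-threshold idea on an imbalanced dataset, not the authors' method. It assumes each base classifier outputs a probability that a code construct is logged, averages those probabilities, and then picks the decision threshold that maximizes F1 on held-out data. The three score lists, the candidate thresholds, and all function names are hypothetical.

```python
from typing import List, Sequence


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 of the positive (logged) class; 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Ensemble step (assumed): average the per-model probabilities per instance."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Label an instance as logged (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> float:
    """Threshold step (assumed): keep the first candidate with the highest F1."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_score(labels, predict(scores, t))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Hypothetical validation data: 2 logged constructs out of 10 (imbalanced).
labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical probabilities from three base classifiers.
model_a = [0.45, 0.10, 0.20, 0.05, 0.30, 0.15, 0.40, 0.10, 0.25, 0.05]
model_b = [0.35, 0.20, 0.10, 0.15, 0.20, 0.05, 0.50, 0.20, 0.15, 0.15]
model_c = [0.40, 0.30, 0.30, 0.10, 0.10, 0.10, 0.30, 0.00, 0.20, 0.10]

ensemble = average_scores([model_a, model_b, model_c])
default_f1 = f1_score(labels, predict(ensemble, 0.5))
tuned_t = best_threshold(ensemble, labels, [0.15, 0.25, 0.35, 0.5])
tuned_f1 = f1_score(labels, predict(ensemble, tuned_t))
print(f"default F1={default_f1:.2f}, tuned threshold={tuned_t}, tuned F1={tuned_f1:.2f}")
```

In this toy data the minority class never reaches the conventional 0.5 cutoff, so the default threshold predicts no logged constructs at all; lowering the threshold recovers them. Real scores would come from trained classifiers, and the threshold should be tuned on data separate from the test set.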
