Using Kolmogorov Complexity to Study the Coevolution of Header Files and Source Files of C-alike Programs

Using Kolmogorov Complexity to Study the Coevolution of Header Files and Source Files of C-alike Programs

Liguo Yu (Indiana University South Bend, Computer Science Department, South Bend, IN, USA)
Copyright: © 2017 |Pages: 10
DOI: 10.4018/IJKSS.2017040102
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

In C-alike programs, the source code is separated into header files and source files. During the software evolution process, both these two kinds of files need to adapt to changing requirement and changing environment. This paper studies the coevolution of header files and source files of C-alike programs. Using normalized compression distance that is derived from Kolmogorov complexity, we measure the header file difference and source file difference between versions of an evolving software product. Header files distance and source files distance are compared to understand their difference in pace of evolution. Mantel tests are performed to investigate the correlation of header file evolution and source file evolution. The study is performed on the source code of Apache HTTP web server.
Article Preview

Introduction

In C, C++, or Objective-C programs, there are two types of code files: header files and source files. Header files contain the declarations of classes, functions, global variables, and identifiers. Source files contain the implementations of classes and functions and the usages of variables and identifiers. The purpose of this organization is to separate program structure and interface from specific implementations (Stroustrup, 2013). On the other hand, software evolution is inevitable. Software products need to adapt to new requirements and new environments (Bendifallah, 1987; Gall et al., 1997; Kemerer & Slaughter, 1999; Lehman, 1980; Yu & Ramaswamy, 2009b; Yu & Ramaswamy, 2006). In this evolution process, both the program structure (interface) and the program implementation will need to be changed.

Because header files contain the program structure and interface, they are expected to be more robust than source files to requirement changes and environment changes. However, there has been no empirical research to validate this speculation. The objective of this study is to understand and compare the evolution of header files and source files. Using normalized compression distance that is derived from Kolmogorov complexity, we measure the difference of versions of header files and difference of versions of source files. Our study is performed on the source code of Apache HTTP, a web server written in C. Through this study, our objective is to answer the following two questions:

  • 1.

    In the source code of Apache HTTP, do header files and source files have similar paces of evolution?

  • 2.

    In the source code of Apache HTTP, is the evolution of header files correlated with the evolution of source files?

To answer these two questions, the remainder of the paper is organized as follows. Section 2 reviews Kolmogorov complexity and normalized compression distance. Section 3 describes our research method and research data. Section 4 presents the results of this study. Conclusions appear in Section 5.

Kolmogorov Complexity And Normalized Compression Distance

In algorithmic information theory, the Kolmogorov complexity of an object, such as a string (a piece of text), is the length of the string’s shortest description in some fixed universal description language (Li & Vitányi, 2009). The Kolmogorov complexity of a binary string x can be represented by denoting the shortest length of string x described using a universal language.

Based on Kolmogorov complexity, Bennett et al. (1998) defined information distance between two binary strings x and y as the following.

(1)

In Equation 1, denotes the conditional Kolmogorov complexity of string x given string y. Information distance measures the absolute distance between two strings x and y and thus does not reflect the relative difference between the two strings. Li et al. [10] then defined normalized information distance between two binary strings x and y.

(2)

Equation 2 is based on Kolmogorov complexity, which is non-computable in practice. To make the information distance more practical to use, Li et al. (2004) further substituted Kolmogorov complexity of a string x with the optimal compressed length of string x. The normalized compression distance between two binary strings x and y is defined below (Li et al., 2004).

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 8: 4 Issues (2017): 3 Released, 1 Forthcoming
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing