Analyzing the Anatomy of GNU/Linux Distributions: Methodology and Case Studies (Red Hat and Debian)
Jesús M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain), Gregorio Robles (Universidad Rey Juan Carlos, Spain), Miguel Ortuno-Perez (Universidad Rey Juan Carlos, Spain), Luis Rodero-Merino (Universidad Rey Juan Carlos, Spain), José Centeno-Gonzalez (Universidad Rey Juan Carlos, Spain), Vicente Matellan-Olivera (Universidad Rey Juan Carlos, Spain), Eva Castro-Barbero (Universidad Rey Juan Carlos, Spain) and Pedro de-las-Heras-Quirós (Universidad Rey Juan Carlos, Spain)
Copyright: © 2005
GNU/Linux distributions are probably the largest coordinated pieces of software ever put together. Each one is, in some sense, a snapshot of a large fraction of the libre software development landscape at the time of its release, so studying it is important for understanding the shape of that landscape. They are also working proof that reliable software systems in the range of 50 to 100 million lines of code can be released, even when the components of such systems are built by hundreds of independent groups of developers with no formal connection to the group releasing the whole system. In this chapter, we provide some quantitative information about the software included in two such distributions: Red Hat and Debian. Differences in policy and organization between the two distributions show up in the results, but some common patterns also arise. For instance, both are doubling in size every two years, and both present similar patterns in programming language usage and package size distributions. All in all, this study aims to show what GNU/Linux distributions look like with respect to their source code, and how they evolve over time. A methodology for conducting comparable, automated studies of this kind of distribution is also presented.