On the Geographic Allocation of Open Source Software Activities

Sebastian von Engelhardt, Andreas Freytag, Christoph Schulz
DOI: 10.4018/978-1-4666-7230-7.ch065

Abstract

This article contributes to research on the geographic origin of open source software (OSS) developers by analyzing the geographic allocation of active OSS developers and OSS activities. Using data from the SourceForge Research Data Archive, the authors exploit information on developers' IP addresses, email addresses, and indicated time zones. This enables them to assign 94% of all users registered in 2006 to a country. As a proxy for activity, the authors use the number of posted messages. They thus provide a detailed picture of the worldwide allocation of OSS activities. Such country-level data on the supply side of OSS is a valuable resource for both cross-country studies of OSS and country-specific research and policy advice.

Introduction

Open source software (OSS) is developed by communities that include both hobbyists and companies, and its source code (the human-readable recipe) is 'open'. This means that everyone has access to the source code and the right to read, modify, improve, redistribute and use it. Thus, OSS appears to be a case of a "private provision of a public good" (Johnson, 2002). As the community is often described as global, OSS seems to be a digital public good with a truly globalized private provision.

However, apart from anecdotal evidence for the internationality of certain OSS project teams, the question remains how global the OSS community actually is and how the supply side of OSS differs among countries. This has motivated researchers to study the geographic allocation of OSS developers. It turns out that most OSS developers come from North America and Europe, a result that is quite consistent regardless of the method used. Methods for gathering information about the geographic origin of OSS developers fall broadly into two approaches: some studies are based on survey data, while others draw on data from specific OSS projects, such as credit files and mailing lists, or from platforms like SourceForge.

Robles et al. (2001) combine both types of data collection. Ghosh (2006), David et al. (2003), and Ghosh et al. (2002) provide survey-based information about the origin of OSS developers. Lancashire (2001) provides information about the worldwide distribution of Linux and Gnome developers, based on data collected from the Linux credits file and, in the case of Gnome, on developer contact information from the project's website. The most recent research dealing with the geographic origin of OSS developers is Gonzalez-Barahona et al. (2008), who provide a worldwide picture of OSS developers, weighted by population, internet users, and GDP. Their work builds on Robles and Gonzalez-Barahona (2006), who use the email addresses and indicated time zones of registered users to assign SourceForge developers in 2005 to their countries. However, they were unable to assign 25% of developers to a country, because these developers combined a generic (non-country-specific) top-level domain such as .com with the country-unspecific time zone GMT. Robles and Gonzalez-Barahona (2006) develop methods to estimate the geographic allocation of this 25%.
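
The email-and-time-zone rule described above can be illustrated with a minimal sketch. It is not Robles and Gonzalez-Barahona's actual code: the lookup tables are tiny hypothetical stand-ins, the function name `assign_country` is invented for this example, and checking the email domain before the time zone is an assumed order. The sketch shows why a generic TLD such as .com combined with GMT leaves a developer unassigned.

```python
from typing import Optional

# Hypothetical, deliberately incomplete lookup tables for illustration only.
CCTLD_TO_COUNTRY = {"de": "Germany", "fr": "France", "br": "Brazil", "jp": "Japan"}
TIMEZONE_TO_COUNTRY = {
    "Europe/Berlin": "Germany",
    "America/Sao_Paulo": "Brazil",
    "Asia/Tokyo": "Japan",
    # "GMT" is intentionally absent: it is shared by several countries.
}


def assign_country(email: str, timezone: str) -> Optional[str]:
    """Assign a country from the email TLD, falling back to the time zone.

    Returns None when neither source is country-specific, e.g. a .com
    address combined with the time zone "GMT".
    """
    tld = email.rsplit(".", 1)[-1].lower()
    if tld in CCTLD_TO_COUNTRY:
        return CCTLD_TO_COUNTRY[tld]
    # Generic or unknown TLD (.com, .org, ...): try the time zone instead.
    return TIMEZONE_TO_COUNTRY.get(timezone)


print(assign_country("dev@example.de", "GMT"))          # Germany
print(assign_country("dev@example.com", "Asia/Tokyo"))  # Japan
print(assign_country("dev@example.com", "GMT"))         # None
```

The third case is the one that cannot be resolved from these two sources alone, which is the group Robles and Gonzalez-Barahona (2006) had to estimate.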

Our work is inspired by Gonzalez-Barahona et al. (2008) and Robles and Gonzalez-Barahona (2006), but extends it in two ways. First, we do not have to estimate any geographic origins, since we can directly assign 94% of all developers registered at SourceForge in 2006. We combine the information contained in email addresses, time zones, and Internet Protocol (IP) addresses, which allows us to assign 1.3 million developers to their countries without estimation. We cross-checked the results delivered by the different methods to obtain an indicator of the validity of each. Second, we provide information about how active each developer is. Individual data on the number of posted messages give us a good proxy for activity. We can thus distinguish active from inactive (but nevertheless registered) developers and show the worldwide allocation of OSS activities. Information about activity is important, since members of the OSS community differ in their effort levels, numbers of contributions, etc. (see e.g. David & Rullani, 2008).1 By taking active developers and their activity into account, our study presents a more accurate geography of the supply side of OSS development.
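
The combination, cross-check, and activity steps can be illustrated with a small sketch. This passage does not say in what order the three sources are preferred, how disagreements are resolved, or how the cross-check is computed, so everything here is an assumption: the `Developer` record, the IP-first order of preference, the pairwise agreement rate used as a validity indicator, and the rule that a developer with at least one posted message counts as active are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Developer:
    # Country guess from each source; None where a source is uninformative.
    ip_country: Optional[str]
    email_country: Optional[str]
    tz_country: Optional[str]
    messages: int  # number of posted messages, the activity proxy


def combined_country(dev: Developer) -> Optional[str]:
    """Return the first available guess, in an assumed order of preference."""
    for guess in (dev.ip_country, dev.email_country, dev.tz_country):
        if guess is not None:
            return guess
    return None


def agreement_rate(devs: list[Developer], a: str, b: str) -> float:
    """Share of developers assigned by both sources a and b whose guesses agree."""
    pairs = [(getattr(d, a), getattr(d, b)) for d in devs]
    both = [(x, y) for x, y in pairs if x is not None and y is not None]
    if not both:
        return float("nan")
    return sum(x == y for x, y in both) / len(both)


def activity_by_country(devs: list[Developer]) -> Counter:
    """Sum posted messages per country; inactive developers add nothing."""
    totals: Counter = Counter()
    for d in devs:
        country = combined_country(d)
        if country is not None and d.messages > 0:
            totals[country] += d.messages
    return totals


# Hypothetical example data.
devs = [
    Developer("Germany", "Germany", None, 12),
    Developer(None, "Brazil", "Brazil", 0),
    Developer("Japan", None, "Japan", 5),
    Developer("India", "United States", None, 3),
]
print(agreement_rate(devs, "ip_country", "email_country"))  # 0.5
print(activity_by_country(devs))  # Germany: 12, Japan: 5, India: 3
```

A zero-message cut-off is the simplest way to separate active from merely registered developers; a study could equally justify a higher threshold.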
