FLOSSmole: A Collaborative Repository for FLOSS Research Data and Analyses

James Howison (Syracuse University, USA), Megan Conklin (Elon University, USA) and Kevin Crowston (Syracuse University, USA)
DOI: 10.4018/978-1-60566-418-7.ch002
This paper introduces and expands on previous work on a collaborative project, called FLOSSmole (formerly OSSmole), designed to gather, share, and store comparable data and analyses of free, libre, and open source software (FLOSS) development for academic research. The project draws on the ongoing collection and analysis efforts of many research groups, reducing duplication and promoting compatibility both across sources of FLOSS data and across research groups and analyses. The paper outlines difficulties with the typical quantitative FLOSS research process, uses these to develop requirements, and presents the design of the system.
Background Of Problem

Obtaining data on FLOSS projects is both easy and difficult. It is easy because FLOSS development utilizes computer-mediated communications heavily for both development team interactions and for storing artifacts such as code and documentation. This way of developing software leaves a freely available and, in theory at least, highly accessible trail of data upon which many academics have built interesting analyses about optimal organization of development teams, economics of building software in the commons, and the like. Yet, despite this presumed plethora of data, researchers often face significant practical challenges in using this data to construct a collaborative and deliberative research discourse. In Figure 1, we outline the research process we believe is followed in much of the quantitative literature on FLOSS.

Figure 1. The typical quantitative FLOSS research process (notice its noncyclical and noncollaborative nature)

The first step in collecting online FLOSS data is selecting which projects and which attributes to study. Two techniques often used in estimation and selection are census and sampling. (Case studies are also used, but these will not be discussed in this article.)

Conducting a census means examining all cases of a phenomenon and taking the measures of interest to build up a complete and accurate picture. Taking a census is difficult in FLOSS for a number of reasons. First, it is hard to know how many FLOSS projects there are “out there,” and it is hard to know which projects should actually be included. For example, are corporate-sponsored projects part of the phenomenon or not? Do single-person projects count? What about school projects?

Second, the projects themselves, and the records they leave, are scattered across a surprisingly large number of locations. It is true that many are located in the major general repositories, such as SourceForge and GNU Savannah. It is also true, however, that there are a number of other repositories of varying sizes and focuses (e.g., CodeHaus, CPAN), and that many projects, including the well-known and much-studied Apache and Linux projects, prefer to use their own repositories and their own tools. This diversity of location effectively hides significant portions of the FLOSS world from attempts at census. Even if a full listing of projects and their locations could be collated, there is also the practical difficulty of dealing with the huge amount of data (sometimes years and years of e-mails, CVS, and bug tracker conversations) required to conduct certain comprehensive analyses.
