Integrating Projects from Multiple Open Source Code Forges

Megan Squire (Elon University, USA)
DOI: 10.4018/978-1-60960-513-1.ch003

Abstract

Much of the data about free/libre and open source software (FLOSS) development comes from studies of code forges, or code repositories, used for managing projects. This paper presents a method for integrating data about open source projects by matching projects (entities) across multiple code forges. After a review of the relevant literature, a few of the reviewed methods are chosen and applied to the FLOSS domain, including a comparison of some simple scoring systems for pairwise project matches. Finally, the paper describes the limitations of this approach and offers recommendations for future work.

About Entity Matching

The act of integrating multiple data sets and finding the resulting duplicate records (“matches”) is nearly as old as database processing itself. In practice and in the literature, this set of processes is known by many names (Bitton and DeWitt, 1983; Hernandez and Stolfo, 1985; Winkler, 1999; Garcia-Molina, 2006): merge/purge, object identification, object matching, object consolidation, record linkage, entity matching, entity resolution, reference reconciliation, deduplication, duplicate identification, and name disambiguation. The term entity matching will be used in this article.
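
To make the idea of a pairwise match concrete, the following is a minimal sketch in Python, not the scoring system evaluated in this chapter. It assumes that projects are compared on their names only, that names are normalized by lowercasing and dropping punctuation, and that a pair counts as a candidate match when a string-similarity score reaches a threshold. The function names, the threshold of 0.85, and the project names are all hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name):
    """Lowercase a project name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_score(a, b):
    """Similarity in [0, 1] between two normalized project names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_projects(forge_a, forge_b, threshold=0.85):
    """Return (name_a, name_b, score) for every cross-forge pair whose
    score is at or above the (assumed) threshold, best matches first."""
    pairs = []
    for a, b in product(forge_a, forge_b):
        score = name_score(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical project names, invented for illustration only.
forge_a = ["Py-Widgets", "fastparse", "netmon"]
forge_b = ["pywidgets", "FastParser", "imagetool"]

matches = match_projects(forge_a, forge_b)
for name_a, name_b, score in matches:
    print(name_a, "<->", name_b, score)
```

A name-only score of this kind will also pair distinct projects that happen to have similar names, which is one reason a matching scheme would usually weigh additional project attributes before declaring two records the same entity.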
