This chapter discusses the design and implementation of a recommender system for open source projects on GitHub based on the collaborative-filtering approach. Such a system can help many developers, especially those searching for projects that match their interests; it can also reduce search time and make search results more relevant. The system presented in this chapter was evaluated on a real-world dataset using several evaluation metrics, and the results of these experiments are very promising: the authors found that their recommender system can achieve better precision and recall.
Introduction
GitHub is a very popular crowdsourced software development platform: a social coding platform and a web-based Git repository hosting service that allows anyone to participate in the documentation, design, coding and testing of open source projects in a social way. To participate in these activities, a developer must create an account, which allows him to share his own projects, fork other developers' projects and follow other developers. Figure 1 shows a sample GitHub profile.
One of the most useful features on GitHub is forking, which creates a full copy of the original project's repository. Forking lets a developer experiment freely with the project without affecting the original copy, and it is usually the first step toward contributing to an existing project. Another feature is starring: when a developer stars a repository, it signals that he is interested in that project. For example, a developer interested in mobile game development may star 2D mobile game libraries such as AndEngine, LibGDX and cocos2d-x.
Figure 1. Example of a GitHub profile page
Developers are always searching for good open source projects, either to build project prototypes or to enhance their own software projects with new features. GitHub provides a search function for doing this manually, but it offers no automatic recommendations; Figure 2 shows a sample search page. Searching for suitable repositories can be difficult and time-consuming, and it can also interrupt the development process of a project. For these reasons, an automatic recommender system for GitHub repositories could help developers by reducing search time and making search results more relevant and better organized; these are the main benefits such a system offers to all developers. However, developers may benefit from it differently depending on their profile type and professional skills. For instance, a professional developer is probably looking for new programming challenges or even business opportunities, while a beginner is probably looking for good material to learn something new or improve his skills, or is simply searching for repositories to work on. The question that arises in these cases is how to find relevant content on GitHub and recommend it to a user.
In this paper, the authors present a new system for recommending relevant GitHub repositories to developers. They use a collaborative-filtering approach and model user behavior as a User-Item matrix, which allows them to apply different recommendation methods, such as computing similarities between users (developers) and between items (repositories). The authors then evaluate their recommender system on a real dataset using well-known evaluation metrics. The design and implementation of the system are discussed in detail in later sections.
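The User-Item modelling described above can be illustrated with a minimal user-based collaborative-filtering sketch. It is not the authors' implementation: it assumes binary implicit feedback (1 if a developer starred a repository, 0 otherwise), cosine similarity between developers' rows of the matrix, and a simple neighbourhood scoring rule. The toy data and the names `STARS`, `build_matrix`, `cosine` and `recommend` are hypothetical. Item-to-item similarities could be computed the same way by comparing the matrix's columns instead of its rows.

```python
import math
from collections import defaultdict

# Hypothetical toy data: which developer starred which repository (binary implicit feedback).
STARS = {
    "alice": {"AndEngine", "LibGDX"},
    "bob": {"LibGDX", "cocos2d-x"},
    "carol": {"AndEngine", "LibGDX", "cocos2d-x"},
    "dave": {"react"},
}


def build_matrix(stars):
    """Return (users, items, matrix) where matrix[u][i] is 1 if user u starred item i, else 0."""
    users = sorted(stars)
    items = sorted({repo for repos in stars.values() for repo in repos})
    matrix = [[1 if item in stars[user] else 0 for item in items] for user in users]
    return users, items, matrix


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(stars, target, k=2, top_n=3):
    """User-based CF: score each repository the target has not starred by summing the
    similarities of the k most similar developers who starred it; return the top_n."""
    users, items, matrix = build_matrix(stars)
    t = users.index(target)
    # Similarity between the target developer and every other developer (rows of the matrix).
    sims = [(cosine(matrix[t], matrix[u]), u) for u in range(len(users)) if u != t]
    neighbours = sorted(sims, reverse=True)[:k]
    scores = defaultdict(float)
    for sim, u in neighbours:
        if sim <= 0:
            continue  # ignore neighbours with no overlap at all
        for j, starred in enumerate(matrix[u]):
            if starred and not matrix[t][j]:
                scores[items[j]] += sim
    # Highest score first; ties broken alphabetically for a deterministic ranking.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend(STARS, "alice"))  # prints ['cocos2d-x']
```

For this toy data, the developers most similar to `alice` are `carol` and `bob`, and the only repository they starred that she has not is `cocos2d-x`, so it is the single recommendation. A real system would use a sparse matrix, since most developers star only a tiny fraction of repositories.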
The main contributions of the authors in this paper are as follows:
- They address a new problem, the recommendation of code to developers, by studying how to find and recommend relevant repositories on the GitHub website.
- They propose a new recommender system based on collaborative-filtering techniques to recommend relevant repositories to developers on the GitHub website.
- They evaluate the performance of their system on a real dataset, performing experiments with well-known metrics to show the effectiveness of the proposed approach.
- They develop a small prototype to demonstrate the system's functionality and how developers can benefit from it.
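The evaluation mentioned among the contributions relies on metrics such as precision and recall. The sketch below shows one common way to compute them for top-k recommendations. It assumes that each user's starred repositories are split into a training part (used to produce recommendations) and a held-out test part (treated as the relevant set), which may differ from the authors' exact protocol; the function names and data are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user.

    recommended: ranked list of repository names produced by the recommender.
    relevant: set of repositories the user starred in the held-out test split.
    Precision divides by k (not by the number actually returned), a common convention.
    """
    top_k = recommended[:k]
    hits = sum(1 for repo in top_k if repo in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(results, k):
    """Average precision@k and recall@k over (recommended, relevant) pairs, one pair per user."""
    if not results:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(rec, rel, k) for rec, rel in results]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


if __name__ == "__main__":
    # Hypothetical example: 2 of the top-4 recommendations appear in the held-out stars.
    recs = ["LibGDX", "cocos2d-x", "react", "AndEngine"]
    held_out = {"LibGDX", "AndEngine", "Phaser"}
    print(precision_recall_at_k(recs, held_out, k=4))  # prints (0.5, 0.6666666666666666)
```

Averaging per user, as `mean_precision_recall` does, gives every developer equal weight regardless of how many repositories he starred.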
The rest of this paper is organized as follows: Section 2 presents related work on collaborative-filtering and content-based techniques, as well as related work on GitHub. Section 3 defines the problem and illustrates it with a use case example. Section 4 presents the proposed approach and its architecture. Section 5 describes the dataset used. Section 6 presents the results of the system evaluation. Finally, Section 7 concludes the paper and outlines future work.
Figure 2. Sample search results on GitHub