A Semi-Supervised Algorithm to Manage Communities of Interests

Pascal Francq (Paul Otlet Institute, Belgium & Université Catholique de Louvain, Belgium)
DOI: 10.4018/978-1-61520-841-8.ch006


This chapter presents a genetic algorithm, the Similarity-based Clustering Genetic Algorithm (SCGA), used to group users' profiles. The algorithm is integrated in an approach that allows users browsing a collection of documents to share those documents. Users are described in terms of profiles, each profile corresponding to one area of interest. The profiles are computed while the users browse the collection, and are then grouped into communities of interests by the SCGA, which is based on the Grouping Genetic Algorithm (GGA). The SCGA can also solve other similar problems under certain circumstances. The approach is part of a more generic model to manage information called the GALILEI Framework. This framework, which provides promising results, has been implemented in a software platform available under the GNU GPL.

The GALILEI Framework

The main purpose of the GALILEI Framework is to propose an integrated model to manage digital information (mostly electronic documents). A complete description of the framework is outside the scope of this chapter, but one important feature is that it identifies users' interests as precisely as possible and groups them accordingly. The approach integrated in the GALILEI Framework, called social browsing, proposes (a) to model the multiple interests of a given user as separate profiles (each profile corresponds to a particular interest), (b) to automatically compute the descriptions of these profiles based on the document assessments made by the corresponding users, and (c) to group these profiles into communities of interests (one user belonging to as many communities as he or she has profiles).
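The relationship between users, profiles, and communities can be sketched in a few lines of Python. This is only an illustration, not the GALILEI implementation: the names `Profile` and `communities_of`, the sample users, and the hand-written grouping are all hypothetical, and the sketch assumes that each profile belongs to exactly one community.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One area of interest of one user (hypothetical structure)."""
    user: str
    interest: str


def communities_of(user, communities):
    """Return the names of the communities that contain a profile of `user`."""
    return [
        name
        for name, members in communities.items()
        if any(profile.user == user for profile in members)
    ]


# Assumed sample data: Alice has two interests, Bob has one.
alice_beatles = Profile("alice", "Beatles")
alice_ga = Profile("alice", "genetic algorithms")
bob_beatles = Profile("bob", "Beatles")

# A grouping written by hand for illustration; in the chapter, the
# grouping of profiles is what the SCGA computes.
communities = {
    "music": {alice_beatles, bob_beatles},
    "optimisation": {alice_ga},
}

alice_communities = communities_of("alice", communities)
bob_communities = communities_of("bob", communities)
```

Here Alice, who has two profiles, ends up in two communities, while Bob, who has one profile, ends up in one.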

In order to compute the profile descriptions, the GALILEI Framework uses the relevance assessments that users make, for a particular profile, of the documents they read, together with an analysis of the content of these documents (Technical Overview describes the document analysis process). Currently, three different assessments have been adopted:

  1. The document is relevant. A Web page about the Beatles is, for example, a relevant document for a “Beatles profile” representing a fan of this group.

  2. The document is partially relevant (fuzzy relevant): it is connected to the domain but does not fall exactly within its scope. A Web page about the Wings may be considered fuzzy relevant by a “Beatles profile”, since there are connections between the two groups (Paul McCartney played in both).

  3. The document is outside the scope of the interest (irrelevant). The home page of Steve Jobs is (probably) completely irrelevant for a “Beatles profile”: Apple, the company he founded, shares its name with the record label of the Beatles but is otherwise unrelated to it.
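The three assessment levels above can be represented as a small enumeration attached to each profile. The following is a minimal sketch under stated assumptions: the names `Assessment`, `Profile`, `assess`, and `documents`, as well as the document identifiers, are hypothetical and are not taken from the GALILEI platform.

```python
from enum import Enum


class Assessment(Enum):
    """The three assessment levels described in the text."""
    RELEVANT = "relevant"
    FUZZY_RELEVANT = "fuzzy relevant"
    IRRELEVANT = "irrelevant"


class Profile:
    """A profile that records how its user assessed each document."""

    def __init__(self, name):
        self.name = name
        self.assessments = {}  # document identifier -> Assessment

    def assess(self, document, assessment):
        """Record (or overwrite) the assessment of one document."""
        self.assessments[document] = assessment

    def documents(self, assessment):
        """Return the sorted identifiers of documents with this assessment."""
        return sorted(
            doc for doc, level in self.assessments.items() if level is assessment
        )


# The examples from the text, with made-up document identifiers.
beatles = Profile("Beatles")
beatles.assess("beatles-page", Assessment.RELEVANT)
beatles.assess("wings-page", Assessment.FUZZY_RELEVANT)
beatles.assess("steve-jobs-home-page", Assessment.IRRELEVANT)

relevant_docs = beatles.documents(Assessment.RELEVANT)
fuzzy_docs = beatles.documents(Assessment.FUZZY_RELEVANT)
```

Keeping the assessments per profile rather than per user matches the model described earlier, in which the same user may judge a document differently for each of his or her interests.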
