Introduction
Until the recent past, social network research was based on questionnaire data, typically reaching a few dozen individuals (Granovetter, 1992; Wasserman, 1994; White, 1976). The main advantage of this approach is that it can provide very detailed information about the ties between people: what sort of acquaintance a tie is based on, how intense the relation is, whether it is mutual or not, what emotional background lies behind the connection, etc. However, a major drawback is that the sample that can be generated this way is very limited in size, and as long as it is based solely on the opinions of the surveyed people, the strength of the ties remains subjective.
A major paradigm shift began to take place in this field around the millennium, when large datasets describing various social relations between people became available for research (Barabási, 2003; Mendes & Dorogovtsev, 2003; Watts & Strogatz, 1998). Due to the rapid development of informatics, social networks with more than a million nodes, constructed from e-mail or phone-call records, can be handled easily with present-day computers. Compared to questionnaire data, the information about the individual links is limited in these systems. However, the strength of the ties can be measured in a more objective way, e.g., by aggregating the number of e-mails or phone calls between people. One of the most important new results obtained from the study of large-scale social networks based on automated data collection was given by J.-P. Onnela et al. (Onnela et al., 2007), who provided empirical evidence for the famous Granovetter hypothesis (Granovetter, 1973).
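As an illustration of measuring tie strength by aggregating interactions, the following is a minimal sketch, not the procedure used in the cited studies. It assumes the records are available as (sender, receiver) pairs and that a tie is undirected; the function name `aggregate_tie_strengths` and the sample records are hypothetical.

```python
from collections import Counter


def aggregate_tie_strengths(records):
    """Count the interactions for each unordered pair of people.

    `records` is an iterable of (sender, receiver) pairs, e.g. one pair
    per e-mail or phone call. Treating ties as undirected is an
    assumption of this sketch.
    """
    weights = Counter()
    for sender, receiver in records:
        if sender == receiver:
            continue  # ignore self-contacts
        # Sort the pair so that (a, b) and (b, a) map to the same tie.
        pair = tuple(sorted((sender, receiver)))
        weights[pair] += 1
    return weights


# Hypothetical call records, for illustration only.
call_records = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("bob", "ann")]
tie_strengths = aggregate_tie_strengths(call_records)
```

Here the tie between "ann" and "bob" receives weight 3 and the tie between "ann" and "cat" weight 1. Other aggregates, such as total call duration, could be accumulated in the same way.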
In this chapter our focus is on the communities (modules, clusters, or cohesive groups) of large social networks, which are associated with the more densely linked parts of the network. These structural sub-units can correspond to families, friendship circles, or tightly connected groups of colleagues (Scott, 2000; Watts, Dodds, & Newman, 2002), and have no widely accepted unique definition (Everitt, 1993; Fortunato & Castellano, 2009; Girvan & Newman, 2002; Newman, 2004; Palla, Derényi, Farkas, & Vicsek, 2005; Radicchi, Castellano, Cecconi, Loreto, & Parisi, 2004; Shiffrin & Börner, 2004). Community finding has turned out to be an important issue in other types of network systems as well (Knudsen, 2004). For example, locating multi-protein functional units in molecular biology (Ravasz, Somera, Mongru, Oltvai, & Barabási, 2002; Spirin & Mirny, 2003) or finding sets of tightly coupled stocks in economics (Heimo, Saramäki, Onnela, & Kaski, 2007; Onnela, Chakraborti, Kaski, Kertész, & Kanto, 2003) can be crucial to understanding the structural and functional properties of the systems under investigation. Due to the importance of communities in complex network theory, the set of available community finding methods is vast (Fortunato & Castellano, 2009). Here we shall use the Clique Percolation Method (CPM, Palla et al., 2005), which, due to its local nature, is especially suitable for studying evolving communities.
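To make the idea behind the CPM concrete, the sketch below assumes the usual definition: a community is the union of all k-cliques that can be reached from one another through a chain of adjacent k-cliques, where two k-cliques are adjacent if they share k-1 nodes. It is a brute-force illustration for small graphs only, not the authors' implementation; the function name `clique_percolation` and the example graph are hypothetical.

```python
from itertools import combinations


def clique_percolation(edges, k):
    """Return the k-clique communities of an undirected graph.

    Brute-force sketch: enumerates every k-node subset, so it is only
    practical for small graphs. Node labels must be sortable.
    """
    # Build adjacency sets from the edge list.
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Keep the k-node subsets in which every pair is linked (k-cliques).
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(neighbours), k)
        if all(b in neighbours[a] for a, b in combinations(nodes, 2))
    ]

    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one group.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical example: two groups of triangles that meet at node 4.
example_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
                 (4, 5), (4, 6), (5, 6)]
communities = clique_percolation(example_edges, k=3)
```

For this graph the result is `[[1, 2, 3, 4], [4, 5, 6]]`. Node 4 belongs to both communities, since the triangles {2, 3, 4} and {4, 5, 6} share only one node and are therefore not adjacent. Because membership is decided by adjacent cliques alone, a change to a link affects only the communities around it, which is one way to read the local nature mentioned above.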