Detecting Local Communities within a Large Scale Social Network Using Mapreduce

Detecting Local Communities within a Large Scale Social Network Using Mapreduce

Hongjun Yin (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China), Jing Li (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China) and Yue Niu (School of Computer Science and Technology, University of Science and Technology of China, Hefei, China)
Copyright: © 2014 |Pages: 20
DOI: 10.4018/ijiit.2014010104


Social network partitioning has become a very important function. One objective for partitioning is to identify interested communities to target for marketing and advertising activities. The bottleneck to detection of these communities is the large scalability of the social network. Previous methods did not effectively address the problem because they considered the overall network. Social networks have strong locality, so designing a local algorithm to find an interested community to address this objective is necessary. In this paper, we develop a local partition algorithm, named, Personalized PageRank Partitioning, to identify the community. We compute the conductance of the social network with a Personalized PageRank and Markov chain stationary distribution of the social network, and then sweep the conductance to find the smallest cut. The efficiency of the cut can reach. In order to detect a larger scale social network, we design and implement the algorithm on a MapReduce-programming framework. Finally, we execute our experiment on several actual social network data sets and compare our method to others. The experimental results show that our algorithm is feasible and very effective.
Article Preview


Community detection is a very meaningful research field. There has been significant previous research, from both the perspective of community detection and the division of social networks. A graph represents a series of points and the set of relationships among these points. These relationships are often referred to as the edges in the graph. Many real-life problems can be abstracted as a graph. In social networks, the points represent the people and the edges represent the relationship between the people. Graph partition refers to the will of a graph G = (V, E) divided into k molecular graphs. In the computer science community, the earliest research on graph partitioning was to process an integrated circuit fragmentation problem. There were, however, many problems due to the graph size being too large. Segmentation made graph partitioning more convenient for distributed computing as well as for web page classification (Gibson, Kleinberg, & Raghavan, 1998). We know there are many successful application areas such as text classification, image segmentation, information retrieval, and bioinformatics. In this paper, we study graph partitioning as a process for the division of a social network to determine different community interests and functions and thereby effectively identify direct advertising areas and targeted marketing (Zhang, Li, & Wang, 2013). Others have previously undertaken similar research. Christian used a spectral partition method to explore the community structure of a newsgroup for effective community information retrieval and organization (Borgs, Chayes, Mahdian, & Saberi, 2004). Graph partitioning generally has two goals: one is a balanced partition size; the second is a low connectivity between the partitions with high clustering within each partition. Our research is aimed at searching a function or interest community in the social network and pursuing a higher partitioning clustering. We do not address the balance of the sub-graphs. Different characteristics of the social network will affect the selection of the large-scale graph partition, storage, and indexing. In order to achieve an overall improvement of the application performance, according to the application purpose, we consider different social network features to select different partitions, storage, and index methods.

Social networks have attracted a great deal of research in recent years. Many characteristics have been identified, such as slope degree distribution, small world characteristics, geographic limitations, and community features (Newman, & Park, 2003; Jin, Girvan, & Newman, 2001). Social networking has a strong geographic correlation. A new connection generally has a geographical distance relationship. In Twitter, Americans usually have more than 80% of their friends and fans in the United States and typically, the distribution between different states is satisfied (Karypis, & Kumar, 1998). Although the related research on the characteristics of social networking is plentiful, there is no scheme to design social network community detection for some of the features. With the development of Web 2.0 technologies, many intelligent applications have been applied (Baloglu, Wyne, & Bahcetepe, 2010). A multi-agent system model has been extensively used in e-commerce applications (Mazamdar, & Mishra, 2010). There are definitely many security problems when we mine user data from the Internet (VidyaBanu, & Nagaveni, 2013). It is even possible to detect customer and competitor sentiments in the social network (Sommer, Schieber, Heinrich, & Hilbert, 2012).

Complete Article List

Search this Journal:
Open Access Articles: Forthcoming
Volume 16: 4 Issues (2020): Forthcoming, Available for Pre-Order
Volume 15: 4 Issues (2019): 2 Released, 2 Forthcoming
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing