Introduction
In the current era, the usage of online social networks has grown exponentially. Social networking websites now offer various kinds of services, such as information exchange, marketing of products or services, collection of product reviews, collection of opinions about an event, and social or political awareness. These services can be used efficiently only if the inter-connectivity among users in the online social network is high. Because an online social network evolves continuously, maximizing the information flow in the existing network requires identifying the number of disconnected users in the present network. Link prediction is one of the most fundamental problems in the broad domain of social network analysis (Zhou et al., 2018). Link prediction techniques are employed to estimate the possibility of a future link forming between a disconnected pair of users in the current social network. They generally use attributes shared by users (or nodes) to compute whether two users will become connected in the near future.
The link prediction problem in an online social network is usually treated as a binary classification problem (Hassan et al., 2011; Lee & Seung, 2001). Generally, a trained classifier predicts whether an association between two disconnected nodes is possible in the future. Classifiers use scoring functions such as Common Neighbours, Jaccard Index, and Katz Index to measure the probability of link formation (Zhang & Chen, 2018). In binary classification, one of the key challenges is deciding the threshold value used to classify objects. The threshold should be neither too high nor too low: a high threshold may reduce the number of correct predictions (true positives), whereas a low threshold may cause a large number of false predictions (false positives). For a given network, a link prediction technique calculates a similarity value between users based on various attributes and then uses a fixed threshold to determine whether two users will become linked in the future. Two users are likely to be linked in the near future if the similarity value between them is greater than the given threshold (calculated on the basis of some heuristic). Otherwise, if the similarity score between a pair of nodes is lower than the threshold, it can be inferred that the possibility of a new association between the two nodes (or users) is minimal.
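The fixed-threshold scheme described above can be illustrated with a minimal sketch. It assumes an undirected graph stored as a dictionary of neighbour sets; the function names (`common_neighbours`, `jaccard_index`, `predict_links`), the toy graph, and the threshold values are hypothetical and chosen only for illustration, not taken from the cited works.

```python
from itertools import combinations


def common_neighbours(graph, u, v):
    """Number of neighbours shared by nodes u and v."""
    return len(graph[u] & graph[v])


def jaccard_index(graph, u, v):
    """Shared neighbours divided by all distinct neighbours of u and v."""
    union = graph[u] | graph[v]
    if not union:
        return 0.0  # two isolated nodes: define the score as 0
    return len(graph[u] & graph[v]) / len(union)


def predict_links(graph, score, threshold):
    """Return the disconnected pairs whose score exceeds the threshold."""
    predicted = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, so there is nothing to predict
        if score(graph, u, v) > threshold:
            predicted.append((u, v))
    return predicted


# Toy undirected network (illustrative data only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# A higher threshold keeps only the strongest candidate pair.
print(predict_links(graph, jaccard_index, 0.5))  # [('a', 'd')]
# A lower threshold admits more pairs, some of which may be false positives.
print(predict_links(graph, jaccard_index, 0.2))  # [('a', 'd'), ('b', 'e'), ('c', 'e')]
```

In this toy network, raising the threshold from 0.2 to 0.5 reduces the predicted pairs from three to one, which mirrors the trade-off between missed true positives and extra false positives discussed above.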