Article Preview
TopIntroduction
Online social networks have become important platforms for people to communicate, share knowledge, and disseminate information. Moreover, online social networks are also widely used to spread news, create trends. Due to the widespread use of online social networks, individual thoughts, preferences, and behaviors are often influenced by peers or friends via social networks. From listening to a song, watching a movie, reading a new book, and choosing a restaurant, to buying a property, choosing a career, choosing a city to live in, and determining political opinions, traces of influence can be found in user’s sharing behaviors on social media. Analyzing and understanding how users in online social networks interact with each other can help researchers better understand, control, and effectively utilize the diffusion process and is conducive to efficiently making public opinion analyses, information predictions, and commercial recommendations.
A core issue of influence diffusion research is influence maximization, which aims to optimize the spread of influence. One of the important applications is viral marketing (Domingos & Richardson, 2001; Qi et al., 2019; Xu et al., 2022; Yuan et al., 2021). Specifically, a company looks for the most influential initial customers in a virtual market and provides them with the opportunity to try a product for free. The company expects these initial customers to actively promote the product in the circle of their friends after using it. Finally, these influential customers can influence their friends, friends of their friends, to accept and purchase the product. This is the goal of influence maximization. Besides viral marketing, there are many other applications of influence maximization, including network monitoring (Leskovec et al., 2007; Wang, Wang, et al., 2021; Qi et al., 2020), rumor control (Budak et al., 2011; He et al., 2012; Xu, Tian, et al., 2021; Peng et al., 2021; Hou et al., 2021), and social recommendation (Ye et al., 2012; Wu et al., 2021; Liao et al., 2021; Sun et al., 2021; Nitu et al., 2021).
The ubiquitous social networks generate massive amounts of social big data with large scale, dynamics, and heterogeneity. Different from many big data analyses, the big data analysis of influence diffusion requires analyzing the influence strength between any two associated users. This is much more difficult than analyzing only the characteristics of a user or the characteristics of a group. In addition, influence diffusion involves the analysis of people’s complex behaviors. These behavioral data are not easy to be mined in social media data. Therefore, it is very important to design efficient and effective influence maximization techniques. In the big data environment, the performance of an influence maximization algorithm can be evaluated from three aspects: quality, computational efficiency, and scalability. Firstly, the influential seed users identified by a good influence maximization algorithm should have a superior expected influence spread. Secondly, a good influence maximization algorithm is superior in the running time. Thirdly, a good influence maximization algorithm should have little memory consumption as the number of required seed users increases.
In summary, this paper mainly summarizes research achievements of influence maximization. Specifically, this paper focuses on two aspects of influence maximization, i.e., models and algorithms. This paper first introduces basic models of influence diffusion and then introduces optimization problems and their algorithms under basic models. Because algorithms need to be suitable for large-scale networks in a big data environment, this paper introduces the design and analysis of efficient and scalable optimization algorithms in detail. Figure 1 shows the common workflow of influence maximization (Arora et al., 2017).
Figure 1.
The framework for influence maximization
The next section of this paper introduces related definitions of influence maximization, followed by an introduction to the basic models of influence diffusion, including an independent cascade model and linear threshold model, and then introduces monotonicity and submodularity of diffusion models closely related to algorithm designs. This is followed by a discussion of the computational hardness of influence maximization and various efficient and scalable approximation algorithms of it, which in turn is followed by a summary and brief discussion of the direction of developments in this field.