Exploring the effect of dynamic seed activation in social networks


International Journal of Information Management xxx (xxxx) xxxx

Contents lists available at ScienceDirect

International Journal of Information Management journal homepage: www.elsevier.com/locate/ijinfomgt

Sinjana Yerasani^a, Suprabhat Tripathi^a, Monalisa Sarma^b, Manoj Kumar Tiwari^a,*

a Department of Industrial Systems and Engineering, Indian Institute of Technology-Kharagpur, Kharagpur 721302, India
b Reliability Engineering Centre, Indian Institute of Technology-Kharagpur, Kharagpur 721302, India

ARTICLE INFO

Keywords: Social networks; Influence maximization; Memetic algorithm; Local search

ABSTRACT

In this paper, we address the problem of maximizing influence in a social network by hiring a few users of the network to propagate information. Because budget and time are limited, the hired users (seeds) are activated dynamically at different time intervals over a time horizon. Activating the same seed in consecutive time intervals degrades the seed's efficiency, which motivates schedules that avoid consecutive activation. The aim of this paper is to maximize the total gain obtained while maximizing influence in a social network, where the total gain is the earnings from influencing customers minus the cost incurred to influence them. To this end, an improvised memetic algorithm is developed to find the seeds to be activated at different time intervals so that the gain is maximized. Experimental results validate the effectiveness of the proposed algorithm, and it is found to perform better in identifying the potential seeds with minimum expenditure.

1. Introduction

With increasing competition and emerging opportunities in marketing, social media marketing has evolved as a way to propagate information rapidly with relatively little investment. Among the large number of online social networks, 90 % of companies target their customers through Facebook, followed by Instagram, which is expected to grow to 47 % in 2020 (Marketing & digital budget, 2019). It is also observed that 45 % of the marketing budget is spent on online marketing, of which 25 % is used for marketing through social media. Information infused into social media is shared by users and thereby influences their neighbors, which has made it an important aspect for marketers (Kim & Kim, 2018; Li & Shiu, 2012). Since resources such as time and budget are limited, selecting the set of influencers (seeds) is an important decision. Seeds are the essential nodes among the users that are considered effective in propagating information to their neighbors. In a given network, active and highly connected nodes are eligible to be considered as seeds over the span of an advertising campaign. Social media users who are more willing to share positive information and who have more friends have greater potential influence (Li & Du, 2014). Users' anticipation of the quality of social media content depends largely on the perception of their family members and friends (Aladwani & Dwivedi, 2018). In this paper, seeds are paid to propagate the information, and accordingly the aim is to maximize the gain by choosing apt seeds at appropriate time periods.

To propagate the information among the users, each user, after receiving the information either from a seed or from fellow users, is expected to share it with a small number of socially related users (Ni, Xie, & Liu, 2010). Generally, people tend to forget received information gradually over time. When a user in a social network spreads information about a product or a service, it is considered that an adopter retains only a fraction of the total information. Also, if the same seed repeatedly shares the same information with the same set of users, adopters may start ignoring the information after a few time periods. When a user receives less information from the influencers, the user propagates less information further (Monteserin & Amandi, 2015). Avoiding activation of the same seed in consecutive time periods prevents this deterioration of influence propagation (Samadi, Nagi, Semenov, & Nikolaev, 2018). To capture this, an efficiency reduction factor is applied whenever a seed is activated in consecutive time periods.

The rest of this paper is organized as follows. Section 2 gives an overview of the relevant literature. Section 3 describes the network model and the considered problem. The dynamic seed activation problem is formulated in Section 4. Section 5 presents the details of the solution methodology used in the paper. Results, analysis, and discussion are presented in Section 6. Section 7 concludes the paper and includes some recommendations for future research.

* Corresponding author: M.K. Tiwari.

https://doi.org/10.1016/j.ijinfomgt.2019.11.007 Received 23 July 2019; Received in revised form 8 November 2019; Accepted 9 November 2019 0268-4012/ © 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: Sinjana Yerasani, et al., International Journal of Information Management, https://doi.org/10.1016/j.ijinfomgt.2019.11.007


2. Literature review

The problem of selecting the most influential nodes in order to maximize influence in a social network is NP-hard (Kempe, Kleinberg, & Tardos, 2003). Kempe et al. introduced two diffusion models, the independent cascade model and the linear threshold model, to describe how influence spreads in a social network. However, these basic models do not incorporate time in influence propagation. Mandala, Kumara, Rao, and Albert (2013) dealt with an e-marketing problem using an ant colony optimization based approach, which leaves scope for network formulations in which node states evolve temporally. These diffusion models assume that once a node becomes activated, it remains in that state forever. However, studies of competitive marketing show that accumulated evidence evanesces with time. In this paper, by contrast, we consider the partial activation of nodes dynamically across the time periods.

Deterministic algorithms can be used to find the optimal solution, i.e. the set of most influential nodes, but these algorithms cannot solve NP-hard problems quickly. Approximate algorithms are therefore used to find near-optimal solutions in a short time. Many authors have solved the influence maximization problem using genetic and other intuitive algorithms such as the greedy algorithm (Bucur & Iacca, 2016; Chen, Wang, & Yang, 2009; Chang, Yeh, & Chuang, 2018). The cost-effective lazy forward method employs local search for solving the influence maximization problem and was shown to be 700 times faster than the greedy algorithm (Leskovec et al., 2007). A community-based greedy algorithm was proposed by Wang, Cong, Song, and Xie (2010). These enhanced greedy algorithms improve efficiency, but Jiang et al. (2011) showed that algorithms based on simulated annealing (SA) outperform the greedy algorithms in efficiency and accuracy. Later, Zhang, Du, and Feldman (2017) showed experimentally that a genetic algorithm performs much better than SA. Moreover, SA does not address the diversity of solutions, whereas a genetic algorithm does, which also improves the efficiency of the algorithm. The memetic algorithm is one of the most popular metaheuristics, in which a local search heuristic is applied during the evolution (Gong, Song, Duan, Ma, & Shen, 2016). Tsai, Hong, and Lin (2015) focused on a two-dimensional representation of solution schedules as chromosomes for a genetic algorithm to solve the aircraft scheduling problem. This representation is adopted in the current paper for its effectiveness in representing schedules as matrices. The steepest mutation-based hill-climbing method proposed by Wang, Wang, and Yang (2009) is modified for two-dimensional chromosomes to improve the elite solution.

However, most of the literature on influence maximization in social networks has not extensively addressed the retention factor of previous influence or the efficiency reduction factor that applies when a seed is activated in consecutive time periods. Samadi et al. (2018) considered these factors with the objective of maximizing the gain. We extend their work by considering the cost incurred for implementing the strategy when calculating the gain. The cost of executing the strategy is defined as the budget spent on influencing the users over a period of time. The gain is defined as the difference between the earnings obtained by influencing the users and the budget spent to hire the seeds for influencing users in appropriate time intervals.

3. Dynamic seed activation model

Influence maximization is the problem of selecting the most influential nodes in order to propagate information in a considered network. In this section, the network model designed for influence maximization based on hiring seeds is presented first. The problem description and the problem formulation follow.

3.1. Network model

Based on the social network structure, decision-makers select J nodes as paid seeds. In this marketing strategy, the paid seeds are supposed to diffuse the advertising information to their peers. Generally, 6.5 %–8.5 % of a company's revenue is spent on marketing (Marketing & digital budget, 2019), and a significant share of this budget is spent on marketing through social media (Lin, Li, & Wang, 2017). Every user is considered to have its own opinion about a product or its company. Social recommendations have a significant influence on users' opinions (Chen, Lu, & Wang, 2017; Davis & Agrawal, 2018). The marketing company's aim is to influence individuals' opinions by offering free or discounted products to promoters (Yerasani, Appam, Sarma, & Tiwari, 2019) or by hiring the promoters (seeds). In this model, seeds are paid according to their potential to spread the information, which is measured by factors such as the number of nodes the seed is connected to and the number of times the seed has previously been activated to promote the same information. Performance deteriorates when a seed is activated in consecutive time periods.

Activating node j as a seed at time period t incurs a cost $F_{jt}$, adjusted by a discount factor that depends on t. Seed activation is constrained by the budget that can be spent on activating seeds in each time period and by the overall campaign budget allotted for information diffusion. Seed activation and information diffusion take place within a considered time horizon, and within this horizon the information diffused to other nodes is considered to deteriorate over time at a fixed rate (Samadi et al., 2018). This concept is termed retention; hence, the cumulative accumulated evidence is multiplied by the retention factor. In addition, the cost of activating seeds reduces with time: seeds that are activated later are paid less, reflecting the net present value of money. The time-dependent discount factor represents this scenario.

Initially, a set of nodes is identified as potential seeds to spread the information in the considered network. In each time period, a few seeds are activated based on their potential level of influence propagation. The simulation activates different seeds in each time interval dynamically in order to reduce the consecutive activation of seeds and to enhance information diffusion. In any time period, the nodes connected to the active seeds receive evidence and update their influence value: the current influence is added to the previously received evidence multiplied by the retention factor. Nodes also spread evidence to other connected nodes at a rate proportional to their cumulative evidence level $C_{it}$. The threshold value of node i is the minimum evidence level that the node must reach to be considered fully activated, which also implies that node i is ready to purchase the product.

3.2. Problem description

In this study, the seed identification and dynamic activation problem is considered with the objective of maximizing the gain in the network. As explained earlier, seeds are the nodes that propagate influence further in a considered network. To maximize the gain while spreading information among the nodes in the network, the foremost task is to identify the effective nodes, i.e. the seeds. We also determine the time periods at which each seed must be activated in order to maximize influence propagation. Using a fixed set of nodes as seeds over the whole time horizon is not effective (Samadi et al., 2018). Furthermore, we focus on the scenario where nodes are assumed to forget the influence over time. Activating and deactivating the seed nodes over the time periods retains their effectiveness in influencing other nodes and also reduces the budget spent in each time period. The overall objective of this problem is to select seeds dynamically in each period so as to maximize the spread of influence in the social network. This includes avoiding activation of the same seed in consecutive time periods and accounting for the users' forgetfulness. Budget and time constraints are incorporated to attain the maximum profit. The next section presents the detailed formulation of this problem.
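The dynamics described in Section 3.1 (retention of earlier evidence, reduced seed efficiency under consecutive activation, and gain as earnings minus discounted seed cost) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_gain`, all parameter names, the toy network, and the numeric values are hypothetical, seeds are assumed to be ordinary nodes of the network, and evidence emitted in one period is assumed to reach followers in the next period.

```python
def simulate_gain(edges, schedule, n_nodes, threshold, seed_ev, peer_ev,
                  retention, reduction, unit_gain, seed_cost, discount):
    """Total gain of a seed schedule on a toy evidence-diffusion model.

    edges:    list of (k, i) pairs; node k delivers evidence to node i.
    schedule: list over periods; schedule[t] is the set of seeds active in t.
    All names and the one-period delivery delay are illustrative assumptions.
    """
    cumulative = [0.0] * n_nodes   # cumulative evidence held by each node
    emitted = [0.0] * n_nodes      # evidence each node sent in the last period
    prev_active = set()
    gain = 0.0
    for t, active in enumerate(schedule):
        # Collect evidence sent by in-neighbours during the previous period.
        received = [0.0] * n_nodes
        for k, i in edges:
            received[i] += emitted[k]
        # Forgetting: only a fraction of earlier evidence is retained.
        cumulative = [retention * c + r for c, r in zip(cumulative, received)]
        # Partial activation grows linearly with evidence and is capped at 1.
        activation = [min(c / threshold, 1.0) for c in cumulative]
        # Peers emit in proportion to activation; an active seed emits seed_ev,
        # scaled down by `reduction` if it was also active in the last period.
        emitted = [peer_ev * a for a in activation]
        for j in active:
            factor = reduction if j in prev_active else 1.0
            emitted[j] = max(emitted[j], seed_ev * factor)
        fully_active = sum(1 for a in activation if a >= 1.0)
        gain += unit_gain * fully_active - discount[t] * seed_cost * len(active)
        prev_active = set(active)
    return gain


# Seed 0 feeds nodes 1 and 2 and is activated in three consecutive periods.
common = dict(edges=[(0, 1), (0, 2)], schedule=[{0}, {0}, {0}], n_nodes=3,
              threshold=1.2, seed_ev=1.0, peer_ev=0.5, retention=0.5,
              unit_gain=10.0, seed_cost=1.0, discount=[1.0, 1.0, 1.0])
print(simulate_gain(reduction=0.5, **common))  # prints -3.0
print(simulate_gain(reduction=1.0, **common))  # prints 17.0
```

In this toy instance, halving the seed's efficiency under consecutive activation keeps both followers below the activation threshold, so the campaign only accumulates cost. Without the reduction, both followers become fully activated in the third period.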

4. Problem formulation

In this section, the assumptions considered in the model, the parameters, the decision variables, and the notation used while developing the model are presented. A similar mathematical formulation, without the cost of executing a strategy, was used by Samadi et al. (2018). The objective function and the constraints of the problem are then explained.

4.1. Assumptions

The following assumptions are considered while developing the model:

i. No node can reject a seed nomination.
ii. Voluntary activation of seeds is not considered.
iii. In the current model, the cost spent on a seed is proportional to the number of its followers.
iv. A fully activated node is considered one of the buyers of the product.

4.2. Notations

Indices
$i$: Index for node
$j$: Index for seed
$t$: Index for time period
$E$: Set of network edges
$N$: Set of nodes in the network
$S$: Set of seeds in the network
$T$: Total number of time periods in the time horizon

Inputs
$\theta_i$: Evidence threshold for activation of node $i$
$\lambda_p$: Evidence delivered by peer nodes in one transfer
$\lambda_s$: Evidence delivered by an activated seed in one transfer
$\rho$: Rate at which nodes accumulate evidence from the previous time period (retention factor)
$\beta$: Efficiency reduction factor for consecutive activation of a seed
$B$: Maximum budget available for the entire time horizon
$B_t$: Maximum budget available with the decision-maker for period $t$
$G$: Unit gain from one positive activation per time period
$\delta_t$: Discount factor for conversion of cost to present value at time period $t$
$F_{jt}$: Cost of hiring seed $j$ at period $t$

Decision variables
$A_{it}$: Partial activation level of node $i \in N$ at time $t$
$C_{it}$: Cumulative evidence for node $i \in N$ at time $t$
$Z_{it}$: 1 if node $i \in N$ is fully positively activated at time $t$; 0 otherwise
$E_{it}$: Value of evidence that node $i$ provides to each of its followers at time $t$
$S_{jt}$: 1 if seed $j$ is activated at time $t$; 0 otherwise

4.3. Formulation

This study aims to determine the set of seeds to be activated in each time period, i.e. a seed schedule that maximizes the influence in the network such that the total gain is maximized. The overall objective function of the model maximizes the gain, which includes the unrealized gain due to seed activation and the cost incurred for executing a strategy.

$$\max Z = \sum_{t=0}^{T} \Big( \sum_{i \in N} G \, Z_{it} - \sum_{j \in S} S_{jt} \, F_{jt} \, \delta_t \Big) \tag{1}$$

Subject to

$$A_{it} \le \frac{C_{it}}{\theta_i} \quad \forall i \in N,\ t = 0, 1, \ldots, T \tag{2}$$

$$Z_{it} \le A_{it} \le 1 \quad \forall i \in N,\ t = 0, 1, \ldots, T \tag{3}$$

$$C_{i0} = 0 \quad \forall i \in N \tag{4}$$

$$C_{it} = \rho \, C_{i(t-1)} + \sum_{k:\,(k,i) \in E} E_{kt} \quad \forall i \in N,\ t = 1, 2, \ldots, T \tag{5}$$

$$E_{it} = \lambda_p \, A_{it} \quad \forall i \in N,\ t = 0, 1, \ldots, T \tag{6}$$

$$E_{it} \le \lambda_s \, S_{jt} \quad \forall i \in N,\ j \in S,\ t = 0, 1, \ldots, T \tag{7}$$

$$E_{it} \le \lambda_s \big( \beta \, E_{i(t-1)} + (1 - S_{j(t-1)}) \big) \quad \forall i \in N,\ j \in S,\ t = 1, 2, \ldots, T \tag{8}$$

$$\sum_{t=1}^{T} \sum_{j \in S} \delta_t \, F_{jt} \, S_{jt} \le B \tag{9}$$

$$\sum_{j \in S} \delta_t \, F_{jt} \, S_{jt} \le B_t \quad \forall t \in (1, T) \tag{10}$$

$$A_{it},\ E_{it},\ C_{it} \ge 0 \quad \forall i \in N,\ t \in (1, T) \tag{11}$$

$$F_{jt} \ge 0 \quad \forall j \in S,\ t \in (1, T) \tag{12}$$

$$S_{jt} \in \{0, 1\} \quad \forall j \in S,\ t \in (1, T) \tag{13}$$

$$Z_{it} \in \{0, 1\} \quad \forall i \in N,\ t \in (1, T) \tag{14}$$

Constraint (2) states that the partial activation level of a node increases linearly until it reaches its threshold value, and Constraint (3) ensures that the activation level is at most 1 and at least 0.

Constraint (4) guarantees that each node starts with zero influence value before the diffusion process starts. For each node, the cumulative evidence is then calculated as the sum of the currently received evidence and the evidence accrued up to period $t-1$, as shown in Constraint (5). Here, the previous evidence is multiplied by the retention factor $\rho$, representing that nodes retain only part of their acquired influence from one time period to the next.

Constraint (6) ensures that each node updates the evidence it passes to all its neighbors at time period t. Here, $\lambda_s$ is the amount of evidence that a seed can transfer to its neighbors. Activating the same seed in the subsequent time period deteriorates its $\lambda_s$ value by the factor $\beta$, as given in Constraints (7) and (8).

Constraint (9) limits the total budget spent on activating the seeds, which includes the activation cost and the fixed amount meant to activate a seed along with the discount factor. Constraint (10) is the budget-balancing constraint: the budget spent in each time period should not exceed the budget allocated for that time period.

$$\max Z = \sum_{i \in N} G \, Z_{iT} - \sum_{j \in S} \sum_{t=1}^{T} S_{jt} \, F_{jt} \, \delta_t \tag{15}$$

The objective function defined in Eq. (1) can be replaced by Eq. (15), subject to the same constraints, when the aim of the marketing campaign is to maximize the influence at a given deadline. Eq. (15) counts only the gain obtained at time period T, which applies to scenarios with a deadline, such as elections.

5. Solution approach

The model in the previous section was formulated to maximize the gain under information diffusion in a social network. Kempe et al. (2003) proved the influence maximization problem to be NP-hard. For an NP-hard problem, a solution obtained within practical computation time is not guaranteed to be optimal. Heuristics are used for reasons of tractability and robustness. For solving real-world problems related to social behavior learning, community detection, etc., memetic algorithms have been shown to perform efficiently. In this study, an improvised memetic algorithm is used to identify the potential seeds across time periods in order to maximize the total profit. The ability to perform local search among the generated candidates increases the search capability and the chance of improving the result. The algorithm consists of generating the initial solution, applying the genetic operators (crossover and mutation), and finally applying the local search operator, as shown in Fig. 1. The whole framework of the proposed improvised memetic algorithm for influence maximization is given in Algorithm 1.

Fig. 1. Flowchart for proposed improvised memetic algorithm.
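The overall loop described above (initial population, crossover, mutation, then local search on the elite solution) can be sketched as a generic memetic skeleton. This is not the paper's Algorithm 1. The names `toy_gain`, `local_search`, and `memetic`, the toy fitness, the truncation selection, the row-wise one-point crossover, the single-bit-flip hill climber, and all parameter values are assumptions chosen for illustration. Chromosomes are binary matrices with one row per time period and one column per seed, and at least two periods are assumed.

```python
import random


def toy_gain(chrom, value, cost, penalty):
    """Hypothetical fitness: per-activation value minus cost, with a penalty
    whenever the same seed is active in two consecutive periods."""
    total = 0.0
    for t, row in enumerate(chrom):
        for j, bit in enumerate(row):
            if bit:
                total += value[j] - cost
                if t > 0 and chrom[t - 1][j]:
                    total -= penalty
    return total


def local_search(chrom, score):
    """Steepest-ascent hill climbing: repeatedly apply the best single bit flip."""
    best = [row[:] for row in chrom]
    best_score = score(best)
    improved = True
    while improved:
        improved, move = False, None
        for t in range(len(best)):
            for j in range(len(best[0])):
                best[t][j] ^= 1            # try the flip
                s = score(best)
                best[t][j] ^= 1            # undo it
                if s > best_score:
                    best_score, move, improved = s, (t, j), True
        if move is not None:
            best[move[0]][move[1]] ^= 1    # commit the best flip found
    return best, best_score


def memetic(score, periods, seeds, pop_size=20, generations=30, pm=0.01, rng=None):
    """Evolve period-by-seed binary matrices; refine the elite by local search."""
    rng = rng or random.Random(0)
    pop = [[[rng.randint(0, 1) for _ in range(seeds)] for _ in range(periods)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        elite, _ = local_search(ranked[0], score)
        children = [elite]
        while len(children) < pop_size:
            a, b = rng.sample(ranked[:pop_size // 2], 2)   # truncation selection
            cut = rng.randrange(1, periods)                # split by time period
            child = [row[:] for row in a[:cut] + b[cut:]]
            for row in child:                              # bit-flip mutation
                for j in range(seeds):
                    if rng.random() < pm:
                        row[j] ^= 1
            children.append(child)
        pop = children
    return max(pop, key=score)


score = lambda c: toy_gain(c, value=[3.0, 1.0, 0.5], cost=1.0, penalty=5.0)
best = memetic(score, periods=4, seeds=3)
print(best, score(best))
```

Refining only the elite keeps the number of fitness evaluations per generation modest. A variant that applies local search to every offspring would explore more thoroughly at a higher computational cost.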

5.1. Generating the initial solution

Initially, random binary matrices of dimension (N × T) are created and validated against the constraints given in the previous sections. If a matrix is invalid as a chromosome, it is discarded and a new random matrix is generated. This process is repeated until the number of valid chromosomes equals the population size of the algorithm. The chromosomes are encoded as 2-dimensional matrices because the optimization is done for all the time periods together. The columns represent the seeds, and the value 1 indicates that the seed is activated in the corresponding time period. Each row represents a time period, and the total number of rows equals the number of time periods considered in the campaign horizon. For a chromosome matrix $S$, $S_{i,t} = 1$ indicates that node i has been activated in time period t, and $S_{i,t} = 0$ otherwise. As shown in Fig. 2, the fitness value is the total gain, calculated as the difference between the gain obtained by partial activation of seeds and the total expenditure for the campaign.

Fig. 2. Fitness function.

5.2. Crossover and mutation

A crossover operator recombines two parents to generate two new offspring. Here, each chromosome matrix as a whole represents one solution, so the entire matrix $S$ is considered when performing the crossover. A 2-dimensional crossover increases the diversity of the offspring. As shown in Fig. 3, the 2-D crossover adopts either horizontal or vertical combinations to generate offspring. A pseudocode overview of the 2-D crossover is given in Algorithm 2.
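The rejection-based initialization of Section 5.1 and the horizontal or vertical recombination of Section 5.2 can be sketched as follows. This is an illustrative sketch rather than the paper's Algorithm 2. The helper names, the per-period budget check used as the only validity test, and the equal chance of a horizontal or vertical cut are assumptions. Rows are time periods, columns are seeds, and at least two seeds are assumed.

```python
import random


def is_valid(chrom, cost, period_budget):
    """Hypothetical feasibility check: spending in each period stays in budget."""
    return all(sum(c for c, bit in zip(cost, row) if bit) <= period_budget
               for row in chrom)


def random_population(size, periods, seeds, cost, period_budget, rng):
    """Draw random binary matrices and keep only those that pass is_valid."""
    pop = []
    while len(pop) < size:
        chrom = [[rng.randint(0, 1) for _ in range(seeds)] for _ in range(periods)]
        if is_valid(chrom, cost, period_budget):
            pop.append(chrom)
    return pop


def crossover_2d(a, b, rng):
    """Return two offspring by swapping a block of rows or a block of columns."""
    periods, seeds = len(a), len(a[0])
    if periods > 1 and rng.random() < 0.5:
        cut = rng.randrange(1, periods)      # horizontal cut: split by period
        c1 = [r[:] for r in a[:cut]] + [r[:] for r in b[cut:]]
        c2 = [r[:] for r in b[:cut]] + [r[:] for r in a[cut:]]
    else:
        cut = rng.randrange(1, seeds)        # vertical cut: split by seed
        c1 = [ra[:cut] + rb[cut:] for ra, rb in zip(a, b)]
        c2 = [rb[:cut] + ra[cut:] for ra, rb in zip(a, b)]
    return c1, c2


rng = random.Random(1)
cost = [1, 2, 3, 4]                          # hypothetical hiring cost per seed
pop = random_population(6, periods=3, seeds=4, cost=cost, period_budget=6, rng=rng)
child1, child2 = crossover_2d(pop[0], pop[1], rng)
# Offspring may violate the budget and would need to be re-validated.
print(is_valid(child1, cost, 6), is_valid(child2, cost, 6))
```

Because recombination can produce infeasible schedules, offspring are checked with the same validity test before they enter the population. The source makes a related observation: invalid offspring created under high mutation rates increase processing time.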

Fig. 3. 2-D crossover representation.

5.3. Local search operator

To improve the elite solution obtained after the mutation operation, a local search operator is used to optimize the solution more effectively. Pseudocode of the local search function is given in Algorithm 3.

6. Computational results and discussion

The performance of the proposed algorithm for solving the dynamic seed activation problem is evaluated through computational experiments. In this section, the problem instances, the parameter tuning of the algorithm, and the computational experiments along with their results are described. The datasets used for analysis were obtained from the Stanford Network Analysis Project repository (2018) and were compiled by crawling Twitter data from public sources. We used the data in directed-edge form to infer the network structure. All experiments were executed on a workstation with 8 GB RAM and an Intel Core i5 processor at 1.70 GHz. The termination criterion of the memetic and genetic algorithms is a maximum of 80 iterations in each run.

6.1. Parameter tuning

Parameter tuning of the algorithm strongly affects the convergence rate and the solution quality. Appropriate parameter values were found through numerous runs of the proposed algorithm. A high mutation probability (pm > 0.1) led to the creation of mostly invalid offspring, which dramatically increased processing times. Very low rates (< 0.05) led to very little exploration of the search space. We empirically reached an acceptable level of pm = 0.01, which gives quick results with sufficient exploration. Similarly, we incorporated a two-tier approach for crossover. First, a mating pool is created using fitness-proportionate selection. Then, two parents are randomly selected from this mating pool and crossover is performed between them. If the total number of chromosomes that participate in crossover is c, the crossover probability is calculated by dividing c by the population size. This probability was varied from 0.1 to 0.6 to find the best value of the parameter. Fig. 4 portrays the parameter tuning for crossover probability, and it can be observed that the fitness value is maximized at probability 0.2.

Fig. 4. Parameter tuning for crossover probability.

6.2. Experimental results

Datasets of three different sizes are considered to gauge the effectiveness of the algorithms. For each network corresponding to a dataset, we run the two optimization algorithms for 80 iterations to find the best seed activation strategy, and the results are compiled in Table 1. The results reported are the total number of seeds required to spread the influence under a given budget b. Here, the maximum number of nodes in the network is given as N. The total expenditure (Exp) is the budget used for activating the seeds when the problem is solved using the improvised memetic algorithm (MA) and the genetic algorithm (GA). We can see that the proposed algorithm gives better solutions, particularly for the marginally larger datasets. The fitness values are also compared, and convergence graphs are shown for instance 3. Fig. 5 shows the convergence graphs for the proposed algorithm and the genetic algorithm. It can be seen that the improvised memetic algorithm converges faster than the genetic algorithm. This shows that the improvised memetic algorithm performs better than the genetic algorithm in influencing the users.

Table 1. Considered problem cases and their results.

Instance   b     N     GA: Number of seeds   GA: Exp   GA: Fitness   MA: Number of seeds   MA: Exp   MA: Fitness
1          100   65    6                     6.35      445.64        10                    32.83     559.16
2          350   184   18                    20.04     1983.95       46                    54.34     2135.65
3          100   184   13                    14.1      1724.89       35                    42.31     1871.68
4          300   129   13                    16.74     1370.25       41                    52.2      1492.79
5          75    129   48                    55.94     1205.05       46                    61.85     1266.14

Fig. 5. Convergence graph of instance 3.

6.3. Sensitivity analysis

In this work, two main aspects are analyzed for their effect on the performance of the algorithm: first the consecutive time period seed activation factor, and then the retention factor. Sensitivity analysis has been carried out by changing the problem environment of instance 3. Fig. 6 compares the fitness function value at various levels of the effectiveness factor, varying from 0 to 1. If a seed is activated in consecutive time periods, the effectiveness of its performance in the current time period is affected, which can be clearly observed in Fig. 6. The consecutive activation effectiveness factor equals 1 if the seed was not activated in the previous time period, and accordingly the total gain obtained will be higher. Similarly, the variation of the fitness value when the recall retention value is varied between 0 and 1 is represented in Fig. 7. This graph shows that a high recall retention value indicates that the user remembered most of the information shared in previous time periods. If the user remembers the information induced in the previous time periods, there is a greater chance of being influenced and attaining the fully activated level. A high recall retention rate across all users leads to a climb in the fitness value, as shown in Fig. 7.

Fig. 6. Total gain vs Consecutive Activation Effectiveness Factor.

Fig. 7. Total gain against recall retention rate.

6.4. Discussion and implications

The statistical reports from e-marketers (Marketing & digital budget, 2019) show that the highest share of the marketing budget is spent on search engine marketing, followed by social media advertising. However, researchers have shown that search engine marketing services are not as successful as they seem to be (Aswani, Kar, Ilavarasan, & Dwivedi, 2018). It can be more beneficial if marketers spend most of their advertising budget on social media for information propagation. To use this budget effectively, this study has been conducted to maximize the influence given a limited time period, a campaign budget, and a network. First, this study is significant in that it considers the concept of forgetfulness and the impact of consecutive activation of seeds in a given time period. Second, the gain in the objective function contains the cost of executing a strategy along with the unrealized gain obtained from information diffusion in a network. We solved this influence maximization problem using both a genetic algorithm and the proposed improvised memetic algorithm and found that the proposed algorithm performs better, as follows:

1. The local search enables us to explore regions of the search space that are difficult for the GA to reach.
2. Since the number of seeds used and the expenditure are higher in most cases, it can be concluded that the proposed algorithm can create more complicated but effective seed activation strategies.

The complexity lies in choosing the appropriate seeds and activating them in a particular time period so as to keep the budget to a minimum. The complexity of seed activation increases as the number of seeds increases and when constraints such as preventing consecutive seed activation exist. Considering instance 3, it can be seen that the improvised memetic algorithm converges faster than the genetic algorithm.

7. Conclusion and future work

This paper examines the dynamic seed activation problem to maximize the influence in a considered social network. A mathematical model in the form of a mixed-integer program is formulated to maximize the gain, considering the gain attained by influencing people and the cost incurred to execute the strategy. Each node is considered to deliver evidence proportional to its activation level to all of its followers. It is considered that people forget the influence accumulated over a period of time and that, if the same seed is activated in consecutive time periods, the seed's influence efficiency is reduced. Improvised memetic and genetic algorithms are employed to solve the considered influence maximization problem, which is NP-hard. Social network datasets are taken from the SNAP repository, and the results of the computational experiments clearly illustrate that the improvised memetic algorithm performs better than the genetic algorithm. The insights from this research can be useful for deciding which users to select as seeds, and at which time periods, for marketing a product in a social network. In future studies, this model can be extended by considering multiple networks for influence maximization. Weighted edges between two nodes can also be considered based on their friendship strength. Similarly, user expenditure can be considered as one of the attributes used to calculate the gain.

Funding
The work has been financially supported by the research project EBusiness Center of Excellence (ECO) funded by Ministry of Human Resource and Development (MHRD), Government of India under the scheme of Center for Training and Research in Frontier Areas of Science and Technology (FAST), Grant No. F.No.5-5/2014-TS.VII.

Declaration of Competing Interest
None.

References
Aladwani, A. M., & Dwivedi, Y. K. (2018). Towards a theory of SocioCitizenry: Quality anticipation, trust configuration, and approved adaptation of governmental social media. International Journal of Information Management, 43, 261–272.
Aswani, R., Kar, A. K., Ilavarasan, P. V., & Dwivedi, Y. K. (2018). Search engine marketing is not all gold: Insights from Twitter and SEOClerks. International Journal of Information Management, 38(1), 107–116.
Bucur, D., & Iacca, G. (2016, March). Influence maximization in social networks with genetic algorithms. European Conference on the Applications of Evolutionary Computation, 379–392. Cham: Springer.
Chang, C. W., Yeh, M. Y., & Chuang, K. T. (2018). Node reactivation model to intensify influence on network targets. Knowledge and Information Systems, 54(3), 567–590.
Chen, A., Lu, Y., & Wang, B. (2017). Customers’ purchase decision-making process in social commerce: A social learning perspective. International Journal of Information Management, 37(6), 627–638.
Chen, W., Wang, Y., & Yang, S. (2009, June). Efficient influence maximization in social networks. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 199–208.
Davis, J. M., & Agrawal, D. (2018). Understanding the role of interpersonal identification in online review evaluation: An information processing perspective. International Journal of Information Management, 38(1), 140–149.
Gong, M., Song, C., Duan, C., Ma, L., & Shen, B. (2016). An efficient memetic algorithm for influence maximization in social networks. IEEE Computational Intelligence Magazine, 11(3), 22–33.
Jiang, Q., Song, G., Gao, C., Wang, Y., Si, W., & Xie, K. (2011, August). Simulated annealing based influence maximization in social networks. Twenty-Fifth AAAI Conference on Artificial Intelligence.
Kempe, D., Kleinberg, J., & Tardos, É. (2003, August). Maximizing the spread of influence through a social network. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 137–146.
Kim, N., & Kim, W. (2018). Do your social media lead you to make social deal purchases? Consumer-generated social referrals for sales via social commerce. International Journal of Information Management, 39, 38–48.
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., et al. (2007, August). Cost-effective outbreak detection in networks. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 420–429.
Li, F., & Du, T. C. (2014). Listen to me—Evaluating the influence of micro-blogs. Decision Support Systems, 62, 119–130.
Li, Y. M., & Shiu, Y. L. (2012). A diffusion mechanism for social advertising over microblogs. Decision Support Systems, 54(1), 9–22.
Lin, X., Li, Y., & Wang, X. (2017). Social commerce research: Definition, research themes and the trends. International Journal of Information Management, 37(3), 190–201.
Mandala, S. R., Kumara, S. R., Rao, C. R., & Albert, R. (2013). Clustering social networks using ant colony optimization. Operational Research, 13(1), 47–65.
Marketing and digital budget (2019). How much should you budget for marketing in 2019? Online; Accessed March 2019. https://www.webstrategiesinc.com/blog/how-muchbudget-for-online-marketing-in-2014.
Monteserin, A., & Amandi, A. (2015). Whom should I persuade during a negotiation? An approach based on social influence maximization. Decision Support Systems, 77, 1–20.
Ni, Y., Xie, L., & Liu, Z. Q. (2010). Minimizing the expected complete influence time of a social network. Information Sciences, 180(13), 2514–2527.
Samadi, M., Nagi, R., Semenov, A., & Nikolaev, A. (2018). Seed activation scheduling for influence maximization in social networks. Omega, 77, 96–114.
Stanford Network Analysis Project repository (2018). Twitter dataset information. Online; Accessed November 2018. https://snap.stanford.edu/data/ego-Twitter.html.
Tsai, M. W., Hong, T. P., & Lin, W. T. (2015). A two-dimensional genetic algorithm and its application to aircraft scheduling problem. Mathematical Problems in Engineering, 2015.
Wang, H., Wang, D., & Yang, S. (2009). A memetic algorithm with adaptive hill climbing strategy for dynamic optimization problems. Soft Computing, 13(8–9), 763–780.
Wang, Y., Cong, G., Song, G., & Xie, K. (2010, July). Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1039–1048.
Yerasani, S., Appam, D., Sarma, M., & Tiwari, M. K. (2019). Estimation and maximization of user influence in social networks. International Journal of Information Management, 47, 44–51.
Zhang, K., Du, H., & Feldman, M. W. (2017). Maximizing influence in a social network: Improved results using a genetic algorithm. Physica A: Statistical Mechanics and Its Applications, 478, 20–30.
