Abstract
Cascades are ubiquitous in various network environments. Predicting these cascades is decidedly nontrivial in various important applications, such as viral marketing, epidemic prevention, and traffic management. Most previous works have focused on predicting the final cascade sizes. As cascades are dynamic processes, it is always interesting and important to predict the cascade size at any given time, or to predict the time when a cascade will reach a certain size (e.g., the threshold for an outbreak). In this paper, we unify all these tasks into a fundamental problem: cascading process prediction. That is, given the early stage of a cascade, can we predict its cumulative cascade size at any later time? For such a challenging problem, an understanding of the micromechanism that drives and generates the macrophenomena (i.e., the cascading process) is essential. Here, we introduce behavioral dynamics as the micromechanism to describe the dynamic process of an infected node’s neighbors getting infected by a cascade (i.e., one-hop sub-cascades). Through data-driven analysis, we find out the common principles and patterns lying in the behavioral dynamics and propose the novel NEtworked WEibull Regression model for modeling it. We also propose a novel method for predicting cascading processes by effectively aggregating behavioral dynamics and present a scalable solution to approximate the cascading process with a theoretical guarantee. We evaluate the proposed method extensively on a large-scale social network dataset. The results demonstrate that the proposed method can significantly outperform other state-of-the-art methods in multiple tasks including cascade size prediction, outbreak time prediction, and cascading process prediction.
Similar content being viewed by others
Notes
Here, the cascades are information cascades. When a user re-tweets/generates a post, several of his/her followers will further re-tweet the post and so on to form an information cascade.
The dataset is complete and publicly available at http://www.thumedia.org.
We think that a follower with a different re-tweet number will have different effects to the user, and so we modify the weights of each term \(follower\_avg\_inflow\_rate\) and \(follower\_avg\_retweet\_rate\).
It will be counted multiple times if a specific user is involved in multiple cascades.
References
Chen W, Collins A, Cummings R, Ke T, Liu Z, Rincon D, Sun X, Wang Y, Wei W, Yuan Y (2011) Influence maximization in social networks when negative opinions may emerge and propagate. In: SDM, vol. 11, pp 379–390. SIAM
Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 199–208
Cheng J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd international conference on World wide web. International World Wide Web Conferences Steering Committee, pp 925–936
Cohen R, Havlin S, Ben-Avraham D (2003) Efficient immunization strategies for computer networks and populations. Phys Rev Lett 91(24):247901
Cox D (1972) Regression models and life-tables. J R Stat Soc Ser B 34(2):187–220
Cui P, Jin S, Yu L, Wang F, Zhu W, Yang S (2013) Cascading outbreak prediction in networks: a data-driven approach. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 901–909
Cui P, Wang F, Liu S, Ou M, Yang S, Sun L (2011) Who should share what?: item-level social influence prediction for users and posts ranking. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 185–194
Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 57–66
Du N, Song L, Woo H, Zha H (2013) Uncover topic-sensitive information diffusion networks. In: Proceedings of the sixteenth international conference on artificial intelligence and statistics, pp 229–237
Gionis A, Terzi E, Tsaparas P (2013) Opinion maximization in social networks. arXiv preprint arXiv:1301.7455
Gomez-Rodriguez M, Leskovec J, Schölkopf B (2013) Modeling information propagation with survival theory. In: ICML
Gomez Rodriguez M, Leskovec J, Schölkopf B (2013) Structure and dynamics of information pathways in online media. In: Proceedings of the sixth ACM international conference on web search and data mining. ACM, pp 23–32
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 137–146
Miller RG Jr (2011) Survival analysis, vol 66. Wiley, London
Myers S, Leskovec J (2010) On the convexity of latent social network inference. In: Advances in neural information processing systems, pp 1741–1749
Pinder III JE, Wiener JG, Smith MH (1978) The weibull distribution: a new method of summarizing survivorship data. Ecology 59(1):175–179. doi:10.2307/1936645
Rodriguez G (2005) Parametric survival models. Lectures Notes, Princeton University
Rodriguez MG, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697
Rodriguez MG, Schölkopf B (2012) Influence maximization in continuous time diffusion networks. arXiv preprint arXiv:1205.1682
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288. http://www.jstor.org/stable/2346178
Yu L, Cui P, Wang F, Song C, Yang S (2015) From micro to macro: uncovering and predicting information cascading process with behavioral dynamics. In: Data mining (ICDM), 2015 IEEE international conference on, pp 559–568. IEEE
Acknowledgments
This work was supported by National Program on Key Basic Research Project, No. 2015CB352300; National Natural Science Foundation of China, Nos. 61370022, 61531006, 61472444 and 61210008; and International Science and Technology Cooperation Program of China, No. 2013DFG12870. Thanks for the research fund of Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and Beijing Key Laboratory of Networked Multimedia.
Author information
Authors and Affiliations
Corresponding author
Appendix: Proof of Theorem 1
Appendix: Proof of Theorem 1
Proof
It is evident that both \(G_2(\beta , \lambda )\) and \(G_3(\gamma , k)\) have global minimum value. Next we prove that \(G_1(\lambda , k)\) also has global minimum value, or to prove \(\log {L(\lambda , k)}\) has global maximum value.
Let \(\lambda _i'=\lambda _i^{-k_i}\), \(\log {L'(\lambda ', k)}=\log {L(\lambda , k)}=\sum _{i=1}^N l_i'(\lambda _i',k_i)\) where \(l_i'(\lambda _i',k_i)=m_i\log {k_i}+(k_i-1)\sum _{j=1}^{m_i}{\log {T_{i,j}}}+m_i\log {\lambda _i'}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\), the partial derivatives of the \(l_i'\) are given by:
Since \(\frac{\partial ^2{l_i'}}{\partial {\lambda _i'^2}}<0\) and \(\frac{\partial ^2{l_i'}}{\partial {k_i^2}}<0\), the conditional marginal posterior densities of parameters \(\lambda _i'\) and \(k_i\) are log-concave. Moreover, when \(0<k_i<1, 0<\lambda _i'<\min \left( \frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}},\frac{1}{\sum _{j=1}^{m_i}{T_{i,j}\log {T_i}}}\right) \),
when \(k_i \ge \max \left( 1, \frac{m_i}{\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}} \right) \) and \(\lambda _i'\ge \max \left( 1, \frac{m_i}{\sum _{j=1}^{m_i}T_{i,j}^{k_i}}\right) \),
which means there should be a global maximum of \(l_i'\), so does \(\log L\). \(\square \)
Rights and permissions
About this article
Cite this article
Yu, L., Cui, P., Wang, F. et al. Uncovering and predicting the dynamic process of information cascades with survival model. Knowl Inf Syst 50, 633–659 (2017). https://doi.org/10.1007/s10115-016-0955-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-016-0955-7