Abstract
Scholarly documents are sources of information on research topics written by academic experts. Topic drift in such documents is usually linked to contextual variation in the title, the abstract, or the entire document over time. However, the topic distribution over words is non-uniform across the components of a document because authors and citations exert varying influence, and their contributions to drift must be handled accordingly. This paper builds a model that distinguishes the context of a research document by author and by citation, incorporating the relations among topic, author, citation, word, and time in the form of an author context vector and a citation context vector. To infer posterior probabilities, a parallel author cited_author topic model is presented. A continuous-time bivariate Brownian motion model is employed to deduce the evolving bivariate topic parameters specific to the author and the citation. The (word, topic) pairs from the author and citation context vectors are jointly learned to yield topical word embeddings over time, conditioned on the author and citation contexts. When evaluated on the NIPS and business journal datasets, the proposed model identifies topical variations over time more precisely than other methods. The results indicate that topic broadening arises from the author context, whereas topic deviation is caused mainly by the citation context.
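To make the continuous-time bivariate Brownian motion idea concrete, the following is a minimal, generic simulation sketch and not the inference procedure of the paper. It assumes two scalar topic parameters, one tied to the author context and one to the citation context, that evolve with correlated Gaussian increments whose variance grows with the elapsed time between observations. The function name `simulate_bivariate_bm` and the parameters `sigma_author`, `sigma_citation`, and `rho` are hypothetical names chosen for illustration.

```python
import math
import random


def simulate_bivariate_bm(times, sigma_author, sigma_citation, rho,
                          start=(0.0, 0.0), seed=0):
    """Sample one path of a correlated 2-D Brownian motion.

    times          -- strictly increasing timestamps (gaps may be irregular)
    sigma_author   -- assumed volatility of the author-side parameter
    sigma_citation -- assumed volatility of the citation-side parameter
    rho            -- assumed correlation between the two components
    Returns a list of (time, author_value, citation_value) tuples.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    author, citation = start
    path = [(times[0], author, citation)]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Increment standard deviation scales with sqrt(dt).
        scale = math.sqrt(dt)
        # Cholesky-style mixing gives the increments correlation rho.
        author += sigma_author * scale * z1
        citation += sigma_citation * scale * (
            rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        )
        path.append((t_next, author, citation))
    return path


if __name__ == "__main__":
    # Irregularly spaced publication times, in years (illustrative values).
    for point in simulate_bivariate_bm([0.0, 0.5, 2.0, 2.25], 1.0, 0.5, 0.3):
        print(point)
```

Because each increment depends only on the gap between consecutive timestamps, the sketch handles irregularly spaced publication dates without binning them into fixed time slices, which is the property that motivates a continuous-time formulation.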
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Vijayarani, J., Geetha, T.V. Joint learning of author and citation contexts for computing drift in scholarly documents. Int. J. Mach. Learn. & Cyber. 12, 1667–1686 (2021). https://doi.org/10.1007/s13042-020-01265-6
DOI: https://doi.org/10.1007/s13042-020-01265-6