
Joint learning of author and citation contexts for computing drift in scholarly documents

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Scholarly documents are sources of information on research topics written by academic experts. Topic drift in such documents is usually associated with contextual variation over time in the title, the abstract, or the entire document. However, the topic distribution over words is non-uniform across the different components of a document because authors and citations have varying impact, so their contributions to drift must be processed accordingly. This paper builds a model that distinguishes the author context and the citation context of a research document by incorporating the relations among topic, author, citation, word, and time in the form of an author context vector and a citation context vector. To infer posterior probabilities, a parallel author cited_author topic model is presented. A continuous-time bivariate Brownian motion model is employed to infer the evolving bivariate topic parameters specific to the author and the citation. The (word, topic) pairs from the author and citation context vectors are jointly learned to yield topical word embeddings over time, conditioned on the author and citation contexts. When evaluated on the NIPS and business-journal datasets, the proposed model identifies topical variations over time more precisely than the other methods compared. The results show that topic broadening is caused by the author context, whereas topic deviation is mainly caused by the citation context.
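The abstract does not state the model's equations, so the following is only a minimal sketch under stated assumptions. It assumes that each topic carries one author-specific and one citation-specific real-valued parameter, x_t = (a_t, c_t), and that this pair follows a bivariate Brownian motion: for s < t, x_t given x_s is Gaussian with mean x_s and covariance (t − s)Σ, where Σ has variances σ_a² and σ_c² and correlation ρ. Working in continuous time means documents with irregular timestamps need no binning into fixed epochs. The function name simulate_bivariate_bm and the parameters sigma_author, sigma_citation and rho are hypothetical; the code only samples such a path and does not perform the paper's posterior inference.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_bivariate_bm(
    timestamps: Sequence[float],
    start: Tuple[float, float],
    sigma_author: float,
    sigma_citation: float,
    rho: float,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Sample a hypothetical (author, citation) topic-parameter path.

    The pair is assumed to follow a bivariate Brownian motion, so every
    increment over an interval dt is drawn from N(0, dt * Sigma) with
    Sigma = [[sa^2, rho*sa*sc], [rho*sa*sc, sc^2]].
    Returns one (author, citation) pair per timestamp.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    a, c = start
    path = [(a, c)]
    for prev_t, t in zip(timestamps, timestamps[1:]):
        dt = t - prev_t
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        scale = math.sqrt(dt)
        # Multiply independent normals by the Cholesky factor of Sigma
        # to obtain correlated author and citation increments.
        da = sigma_author * scale * z1
        dc = sigma_citation * scale * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        a, c = a + da, c + dc
        path.append((a, c))
    return path
```

For example, simulate_bivariate_bm([1987.0, 1988.5, 1995.0], (0.0, 0.0), 0.3, 0.5, 0.4) returns three (author, citation) pairs, the first equal to the starting point. A positive rho makes author-side and citation-side shifts tend to move together, while rho = 0 treats them as independent.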


Figs. 1–14





Author information

Corresponding author

Correspondence to J. Vijayarani.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.



About this article


Cite this article

Vijayarani, J., Geetha, T.V. Joint learning of author and citation contexts for computing drift in scholarly documents. Int. J. Mach. Learn. & Cyber. 12, 1667–1686 (2021). https://doi.org/10.1007/s13042-020-01265-6


  • DOI: https://doi.org/10.1007/s13042-020-01265-6
