Joint learning of author and citation contexts for computing drift in scholarly documents

Vijayarani, J.; Geetha, T. V.

doi:10.1007/s13042-020-01265-6

Original Article
Published: 14 January 2021

Volume 12, pages 1667–1686 (2021)

International Journal of Machine Learning and Cybernetics

Abstract

Scholarly documents are sources of information on research topics written by academic experts. Topic drift in such documents is usually linked to contextual variation in the title, the abstract, or the entire document over time. However, the topic distribution over words is non-uniform across the components of a document because authors and citations have varying impact, and their contributions to drift must be treated accordingly. This paper builds a model that distinguishes the context of a research document by author and by citation. It incorporates the relations among topic, author, citation, word, and time in the form of an author context vector and a citation context vector. To infer posterior probabilities, a parallel author cited_author topic model is presented. A continuous-time bivariate Brownian motion model is employed to deduce the evolving bivariate topic parameters specific to the author and the citation. The (word, topic) pairs from the author and citation context vectors are jointly learned to yield topical word embeddings over time, conditioned on the author and citation contexts. When evaluated on the NIPS and business journals datasets, the proposed model identifies topical variations over time more precisely than other methods. The results indicate that topic broadening arises from the author context, whereas topic deviation is caused mainly by the citation context.
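
The continuous-time bivariate Brownian motion mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' model: the function name `simulate_bivariate_brownian` and the parameters `sigma_author`, `sigma_citation`, and `rho` are hypothetical, and the two coordinates merely stand in for an author-specific and a citation-specific topic parameter. It shows only the generic property that makes a continuous-time formulation convenient: increments over irregular time gaps are Gaussian with covariance proportional to the gap length.

```python
import numpy as np


def simulate_bivariate_brownian(times, sigma_author, sigma_citation, rho, seed=0):
    """Sample a correlated 2-D Brownian path at the given (possibly irregular) times.

    Column 0 stands in for an author-specific topic parameter and column 1 for a
    citation-specific one (illustrative names only). The path starts at the origin.
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)  # gaps between consecutive observation times

    # Covariance of the increment accumulated over one unit of time.
    cov = np.array([
        [sigma_author ** 2, rho * sigma_author * sigma_citation],
        [rho * sigma_author * sigma_citation, sigma_citation ** 2],
    ])
    chol = np.linalg.cholesky(cov)  # cov == chol @ chol.T

    # Independent standard normals, one row per time gap.
    z = rng.standard_normal((len(dt), 2))

    # An increment over a gap of length dt has covariance dt * cov.
    increments = np.sqrt(dt)[:, None] * (z @ chol.T)

    # Prepend the starting point and accumulate the increments.
    return np.vstack([np.zeros(2), np.cumsum(increments, axis=0)])


# Irregularly spaced timestamps, e.g. publication dates measured in years.
times = [0.0, 1.0, 1.5, 4.0, 5.0]
path = simulate_bivariate_brownian(
    times, sigma_author=0.5, sigma_citation=1.0, rho=0.3
)
print(path.shape)
```

Because the variance of each increment scales with the elapsed time, documents published at uneven intervals need no binning into fixed time slices. The Cholesky factorization requires `rho` strictly between -1 and 1 and positive scale parameters.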

Author information

Authors and Affiliations

Department of CSE, CEG, Anna University, Chennai, India
J. Vijayarani & T. V. Geetha

Corresponding author

Correspondence to J. Vijayarani.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vijayarani, J., Geetha, T.V. Joint learning of author and citation contexts for computing drift in scholarly documents. Int. J. Mach. Learn. & Cyber. 12, 1667–1686 (2021). https://doi.org/10.1007/s13042-020-01265-6

Received: 10 November 2019
Accepted: 17 December 2020
Published: 14 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s13042-020-01265-6
