Topic-Aware Visual Citation Tracing via Enhanced Term Weighting for Efficient Literature Retrieval

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 737))

Abstract

Efficient retrieval of the scientific literature on a given topic plays a key role in research. Traditional citation tracing has paid little attention to topic-enabled citation filtering; this paper presents visual citation tracing of scientific papers that takes document topics into account. Improved term selection and weighting are employed to mine the most relevant citations. A variation of the TF-IDF scheme, which uses external domain resources as references, is proposed to calculate term weights within a particular domain. Moreover, document weight is incorporated when calculating term weights from a group of citations. A simple hierarchical word weighting method is also presented to handle keyword phrases. A visual interface is designed and implemented to present the citation tracks interactively as chord and Sankey diagrams.
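The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the inverse document frequency is computed against an external reference corpus rather than the citations themselves, and that each citation contributes to a term's score in proportion to a per-document weight. The function name `domain_weighted_tfidf`, its parameters, and the smoothed IDF form are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def domain_weighted_tfidf(citations, doc_weights, reference_docs):
    """Score terms over a group of citations (illustrative sketch only).

    citations      -- list of token lists, one per cited document
    doc_weights    -- list of non-negative floats, one per citation (assumed)
    reference_docs -- list of token lists from an external domain corpus (assumed)
    """
    n_ref = len(reference_docs)

    # Document frequency of each term in the external reference corpus.
    ref_df = Counter()
    for doc in reference_docs:
        ref_df.update(set(doc))

    scores = Counter()
    for tokens, weight in zip(citations, doc_weights):
        if not tokens:
            continue  # an empty document contributes nothing
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed IDF taken from the reference corpus, not the citations.
            idf = math.log((1 + n_ref) / (1 + ref_df[term])) + 1.0
            # Each citation's contribution is scaled by its document weight.
            scores[term] += weight * tf * idf
    return dict(scores)


citations = [["graph", "database", "graph"], ["graph", "visual"]]
doc_weights = [1.0, 0.5]
reference_docs = [["graph", "theory"], ["database", "index"], ["graph", "query"]]
scores = domain_weighted_tfidf(citations, doc_weights, reference_docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch a term absent from the reference corpus receives the largest IDF, which is one possible reading of "external domain resources as references"; a scheme that instead rewards terms frequent in the domain would invert that factor.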

Acknowledgments

This research is supported by the FP7 Programme of the European Commission through the projects Dr Inventor [FP7-ICT-611383] and CARRE [FP7-ICT-611140]. We thank the European Commission for the funding, and the project officers and reviewers for their indispensable support of both projects.

Author information

Corresponding author

Correspondence to Hui Wei.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, Y. et al. (2017). Topic-Aware Visual Citation Tracing via Enhanced Term Weighting for Efficient Literature Retrieval. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_5

  • DOI: https://doi.org/10.1007/978-3-319-62911-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62910-0

  • Online ISBN: 978-3-319-62911-7

  • eBook Packages: Computer Science, Computer Science (R0)
