Exploiting edge semantics in citation graphs using efficient, vertical ARM

Rahal, Imad; Ren, Dongmei; Wu, Weihua; Denton, Anne; Besemann, Christopher; Perrizo, William

doi:10.1007/s10115-005-0229-2

Exploiting edge semantics in citation graphs using efficient, vertical ARM

Regular Paper
Published: 09 September 2005

Volume 10, pages 57–91, (2006)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Imad Rahal¹,
Dongmei Ren²,
Weihua Wu²,
Anne Denton²,
Christopher Besemann² &
…
William Perrizo²

106 Accesses
3 Citations
Explore all metrics

Abstract

Graphs are increasingly becoming a vital source of information within which a great deal of semantics is embedded. As the size of available graphs increases, our ability to arrive at the embedded semantics grows into a much more complicated task. One form of important hidden semantics is that which is embedded in the edges of directed graphs. Citation graphs serve as a good example in this context. This paper attempts to understand temporal aspects in publication trends through citation graphs, by identifying patterns in the subject matters of scientific publications using an efficient, vertical association rule mining model. Such patterns can (a) indicate subject-matter evolutionary history, (b) highlight subject-matter future extensions, and (c) give insights on the potential effects of current research on future research. We highlight our major differences with previous work in the areas of graph mining, citation mining, and Web-structure mining, propose an efficient vertical data representation model, introduce a new subjective interestingness measure for evaluating patterns with a special focus on those patterns that signify strong associations between properties of cited papers and citing papers, and present an efficient algorithm for the purpose of discovering rules of interest followed by a detailed experimental analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

A Visualization of Relationships Among Papers Using Citation and Co-citation Information

ReViz: A Tool for Automatically Generating Citation Graphs and Variants

Analysis of Text-Enriched Heterogeneous Information Networks

References

Agrawal R, Imielinski T, Swami A (1993) Mining associations between sets of items in massive databases. In: Buneman P, Jajodia S (eds) Proceedings of the ACM SIGMOD international conference on management of data, May 1993. Washington, DC, USA. ACM Press, pp 207–216
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C (eds) Proceedings of the 20th VLDB international conference on very large databases, September 1994. Morgan Kaufmann, Santiago, Chile, pp 487–499
Akutsu T, Miyano S, Kuhara S (1999) Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. In: Proceedings of the PSB Pacific symposium on biocomputing. Hawaii, USA, January 1999, pp 17–28
AIP (2004) American Institute of Physics. http://www.aip.org, 24 June 2004
Borgelt C (2003) Efficient implementations of Apriori and ECLAT. In: Geothals B, Zaki M (eds) CEUR workshop proceedings of the IEEE ICDM 1st FIMI international workshop frequent itemset mining implementations, 6 June 2004. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-90/
Brown M, Grundy WN, Lin D et al (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Genetics 97(1):262–267
Google Scholar
Chakrabarti S, Dom BE, Gibson D et al (1999) Mining the link structure of the World Wide Web. IEEE Comput 32(8):60–67
Google Scholar
Ding Q, Ding Qi, Perrizo W (2002a) Association rule mining on remotely sensed images using P-trees. In: Cheng MS, Yu PS, Liu B (eds) Proceedings of the 6th PAKDD Pacific-Asian conference on knowledge discovery and data mining, Taipei, Taiwan, May 2002. Lecture notes in computer science 2336. Springer-Verlag, Berlin Heidelberg New York, pp 66–79
Ding Q, Khan M, Roy A et al (2002b) The P-tree algebra. In: Panda B (ed) Proceedings of the 17th ACM SAC symposium on applied computing, Madrid, Spain, March 2002. ACM Press, pp 413–417
Egghe L, Rousseau R (1990) Introduction to informetrics: quantitative methods in library, documentation and information science. Elsevier, Amsterdam
Google Scholar
FIMI (2003) Frequent Itemset Mining Implementations Repository. http://fimi.cs.helsinki.fi, 1 June 2004
Geurts K, Wets G, Brijs T et al (2003) Profiling high frequency accident locations using association rules. In: Proceedings of the 82nd TRB annual meeting of the transportation tesearch board. Washington, DC, USA, January 2003, p 18
Glover F (1994) Tabu search for nonlinear and parametric optimization (with links to genetic algorithms). Discrete Appl Math 49(1–3):231–255
Article MathSciNet MATH Google Scholar
Goethals B (2002) Personal Website. http://www.adrem.ua.ac.be/~goethals/, 2 March 2004
Goethals B (2003) Survey on frequent pattern mining. http://www.adrem.ua.ac.be/~goethals/, 30 December 2004
Goethals B (2004) Memory issues in frequent itemset mining. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM (eds) Proceeding of the 19th ACM SAC symposium on applied computing. Nicosia, Cyprus, March 2004. ACM Press, pp 530–534
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Chen W, Naughton JF, Bernstein PA (eds) Proceedings of the ACM SIGMOD international conference on management of data, Dallas, Texas, USA, June 2002. ACM Press, pp 1–12
Heilbroner RL, Galbraith JK (1986) The nature and logic of capitalism. W.W. Norton and Company, New York
Google Scholar
IBM (2004) IBM Quest Synthetic Data Generator. http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html, 7 April 2004
Inokuchi A, Washio T, Motoda H (2000) An Apriori-based algorithm for mining frequent substructures from graph data. In: Zighed DA, Komorowski HJ, Zytkow JM (eds) Proceedings of the PKDD European conference on principles and practice of knowledge discovery in databases, Lyon, France, September 2000. Lecture notes in computer science 1910. Springer, Berlin Heidelberg New York, pp 13–23
KDD Cup Competition (2003) The 9th ACM SIGKDD CUP international conference on knowledge discovery and data mining. http://www.cs.cornell.edu/projects/kddcup, 30 December 2004
Khan M, Ding Q, Perrizo W (2002) textitk-nearest neighbor classification on spatial data streams using P-trees. In: Cheng MS, Yu PS, Liu B (eds) Proceedings of the 6th PAKDD Pacific-Asian conference on knowledge discovery and data mining. Taipei, Taiwan, May 2002. Lecture notes in computer science 2336. Springer, pp 517–518
Kleinberg JM (1998) Authoritative sources in a hyperlinked environment. In: Proceedings of the ACM-SIAM SODA symposium on discrete algorithms. San Francisco, California, USA, January 1998, pp 668–677
Kosters WA, Pijls W (2003) APRIORI, a depth first implementation. In: Geothals B, Zaki M (eds) CEUR workshop proceedings of the IEEE ICDM 1st FIMI international workshop frequent itemset mining implementations. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-90/, 6 June 2004
Kostoff RN, Del Rio JA, Humenik JA, et al (2001) Citation mining: integrating text mining and bibliometrics for research user profiling. J Am Soc Inform Sci Technol 52(13)
Google Scholar
Kuramochi M, Karypis G (2002) An efficient algorithm for discovering frequent subgraphs. Technical Report 02-026, University of Minnesota, Department of Computer Science, http://www.users.cs.umn.edu/~karypis/publications/Papers/PDF/fsg2.pdf, 20 February 2004
Law A, Kelton W (2000) Experimental design, sensitivity analysis, and optimization. In: Simulation modeling and analysis. McGraw-Hill, Singapore, pp 622–668
Google Scholar
Liu B, Hsu W (1996) Post-analysis of learned rules. In: Proceedings of the AAAI national conference on artificial intelligence. Portland, Oregon, USA, August 1996. AAAI Press/MIT Press, pp 828–834
Matsuda T, Motoda H, Yoshida T et al (2002) Mining patterns from structured data by beam-wise graph-based induction. In: Lange S, Satoh K, Smith CH (eds) Proceedings of the 5th DS international conference on discovery science. London, UK, November 2002. Lecture notes in computer science 2534, pp 422–429
Osareh F (1996) Bibliometrics, citation analysis and co-citation analysis: a review of literature I. Libri 46(4):149–158
Article Google Scholar
Oyama T, Kitano K, Satou K, et al (2000) Mining association rules related to protein–protein interactions. Genome Inform 11:358–359
Google Scholar
Padmanabhan B, Tuzhilin A (1999) Unexpectedness as a measure of interestingness in knowledge discovery. Decision Support Syst 27(3)
Parthasarathy S, Coatney M (2002) Efficient discovery of common substructures in macromolecules. In: Proceedings of the 2nd IEEE ICDM international conference on data mining. IEEE Computer Society 2002. Maebashi City, Japan, December 2002, pp 362–369
Perrizo W (2001) Peano count tree technology lab notes. Technical Report NDSU-CS-TR-01, North Dakota State University, Computer Science Department, http://www.cs.ndsu.nodak.edu/~perrizo/classes/785/pct.html, 18 January 2003
Perrizo W, Ding Q, Denton A et al (2003) PINE–-podium incremental neighbor evaluator for spatial data using ptrees. In: Proceedings of the ACM SAC symposium on applied computing. Melbourne, Florida, USA, March 2003. ACM Press, pp 503–508
Perrizo W, Ding Q, Roy A (2001) Deriving high confidence rules from spatial data using peano count trees. In: Wang WS, Yu G, Lu H (eds) Proceedings of the WAIM international conference on web-page information management, Xi'an, China. Lecture notes in computer science 2118. Springer, Berlin Heidelberg New York, pp 91–102.
Pietracaprina A, Zandolin D (2003) Mining frequent itemsets using patricia tries. In: Geothals B, Zaki M (eds) CEUR workshop proceedings of the IEEE ICDM 1st FIMI international workshop on frequent itemset mining implementations. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS//Vol-90/, 6 June 2004
Pipenbacker P, Schliep A, Schneckener S et al (2002) ProClust: improved clustering of protein sequences with an extended graph-based approach. Bioinformatics 18:182–191
Article PubMed Google Scholar
Pray L (2003) Unraveling protein–protein interactions. Scientist 17(2). http://www.the-scientist.com/yr2003/jan/lcprofile1_030127.html, 30 December 2004
Rahal I, Perrizo W (2004) An optimized approach for knn text categorization using p-trees. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM (eds) Proceedings of the 19th ACM SAC symposium on applied computing. Nicosia, Cyprus, March 2004. ACM Press, pp 613–617
Serazi M, Perera A, Ding Q, et al (2004) DataMIME™. In: Weikum G, König AC, Deßloch S (eds) Proceedings of the ACM SIGMOD international conference on management of data. Paris, France, June 2004. ACM Press, pp 923–924
Shenoy P, Haristsa JR, Sudatsham S. et al (2000) Turbo-charging vertical mining of large databases. In: Chen W, Naughton JF, Bernstein PA (eds) Proceedings of the ACM SIGMOD international conference on management of data. Dallas, Texas, USA, May 2000. ACM Press, pp 22–29
Silberschatz A, Tuzhilin A (1995) On subjective measures of interestingness in knowledge discovery. In: Fayyad U, Uthurusamy R (eds) Proceedings of the 1st ACM SIGKDD international conference on knowledge discovery and data mining. Montreal, Canada, August 1995. AAAI Press, pp 275–281
Silberschatz A, Tuzhilin A (1996) What makes patterns interesting in knowledge discovery systems. Special issue on data mining. IEEE TKDE Trans Knowledge Data Eng 8(6):970–974
Article Google Scholar
Suzuki E (1997) Autonomous discovery of reliable exception rules. In: Heckerman D, Mannila H, Pregibon D (eds) Proceedings of the 3rd ACM SIGKDD international conference on knowledge discovery and data mining. Newport Beach, California, USA, August 1997. AAAI Press, pp 259–262
TouchGraph LLC (2001) http://www.touchgraph.com [24 November 2004]
UCI (1998) University of California Irvine data repository. http://www.ics.uci.edu/~mlearn/MLRepository.html [30 December 2004]
Wang B, Pan F, Ren D et al (2003) Efficient olap operations for spatial using Peano trees. In: Zaki M, Aggrawal C (eds) Proceedings of the ACM SIGMOD DMKD workshop on data mining and knowledge discovery. San Diego, California, USA, June 2003. ACM Press, pp 28–34
Yahoo! Inc. (2005). http://www.yahoo.com [10 February 2005]
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the IEEE ICDM international conference on data mining. Maebashi City, Japan, December 2002. IEEE Computer Society, pp 721–724
Zaki M, Gouda K (2003) Fast vertical mining using diffsets. In: Getoor L, Senator TE, Domingos P, Faloutsos C (eds) Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. Washington, DC, USA, August 2003. ACM Press, pp 326–335
Zaki M, Parthasarathy S, Ogihara M, et al (1997) New algorithms for fast discovery of association rules. In: Heckerman D, Mannila H, Pregibon D (eds). Proceedings of the 3rd ACM SIGKDD international conference on knowledge discovery and data mining. Newport Beach, California, USA, August 1997. AAAI Press, pp 283–286

Download references

Author information

Authors and Affiliations

Department of Computer Science, Peter Engel Science Center (room 211), Saint John's University, Collegeville, MN, 56321-3000, USA
Imad Rahal
Computer Science and Operations Research Department, North Dakota State University, Fargo, ND, USA
Dongmei Ren, Weihua Wu, Anne Denton, Christopher Besemann & William Perrizo

Authors

Imad Rahal
View author publications
You can also search for this author in PubMed Google Scholar
Dongmei Ren
View author publications
You can also search for this author in PubMed Google Scholar
Weihua Wu
View author publications
You can also search for this author in PubMed Google Scholar
Anne Denton
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Besemann
View author publications
You can also search for this author in PubMed Google Scholar
William Perrizo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Imad Rahal.

Additional information

Imad Rahal is a newly appointed assistant professor in the Department of Computer Science at the College of Saint Benedict ∣ Saint John's University, Collegeville, MN, and a Ph.D. candidate at North Dakota State University, Fargo, ND. In August 2003, he earned his master's degree in computer science from North Dakota State University. Prior to that, he graduated summa cum laude from the Lebanese American University, Beirut, Lebanon, in February 2001 with a bachelor's degree in computer science. Currently, he is completing the final requirements for his Ph.D. degree in computer science on an NSF ND-EPSCoR doctoral dissertation assistantship with August of 2005 as a projected completion date. He is very active in research, proposal writing, and publications; his research interests are largely in the broad areas of data mining, machine learning, databases, artificial intelligence, and bioinformatics.

Dongmei Ren is working for the Database Technology Institute for z/OS, IBM Silicon Valley Lab, San Jose, CA, as a staff software engineer. She holds a Ph.D. degree from North Dakota State University, Fargo, ND, and master's and bachelor's degrees from TianJin University, TianJin, China. She has been a software engineer at DaTang Telecommunications, Beijing, China. Her areas of expertise are outlier analysis, data mining and knowledge discovery, database systems, machine learning, intelligent systems, wireless networks and bioinformatics. She has been awarded the Siemens Scholarship research enhancement for excellent performance in study and research. She is a member of ACM, IEEE.

Weihua Wu is a network monitoring & managed services analyst at Hewlett-Packard Co. in Canada. He holds a master's degree from North Dakota State University and a bachelor's degree from Nanjing University, both in computer science. His research areas of interest include data mining, knowledge discovery, data warehousing, information technology, network security, and bioinformatics. He has participated in various projects supported by NSF, DARPA, NASA, USDA, and GSA grants.

Anne Denton is an assistant professor in computer science at North Dakota State University. Her research interests are in data mining, knowledge discovery in scientific data, and bioinformatics. Specific interests include data mining of diverse data, in which objects are characterized by a variety of properties such as numerical and categorical attributes, graphs, sequences, time-dependent attributes, and others. She received her Ph.D. in physics from the University of Mainz, Germany, and her M.S. in computer science from North Dakota State University, Fargo, ND.

Christopher Besemann received his M.Sc. in computer science from North Dakota State University in Fargo, ND, 2005. Currently, he works in data mining research topics including association mining and relational data mining with recent work in model integration as a research assistant. He is accepted under a fellowship program for Ph.D. study at North Dakota State University.

William Perrizo is a professor of computer science at North Dakota State University. He holds a Ph.D. degree from the University of Minnesota, a master's degree from the University of Wisconsin and a bachelor's degree from St. John's University. He has been a research scientist at the IBM Advanced Business Systems Division and the U.S. Air Force Electronic Systems Division. His areas of expertise are data mining, knowledge discovery, database systems, distributed database systems, high speed computer and communications networks, precision agriculture and bioinformatics. He is a member of ISCA, ACM, IEEE, IAAA, and AAAS.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rahal, I., Ren, D., Wu, W. et al. Exploiting edge semantics in citation graphs using efficient, vertical ARM. Knowl Inf Syst 10, 57–91 (2006). https://doi.org/10.1007/s10115-005-0229-2

Download citation

Received: 21 February 2004
Revised: 23 February 2004
Accepted: 02 May 2005
Published: 09 September 2005
Issue Date: July 2006
DOI: https://doi.org/10.1007/s10115-005-0229-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Exploiting edge semantics in citation graphs using efficient, vertical ARM

Abstract

Access this article

Similar content being viewed by others

A Visualization of Relationships Among Papers Using Citation and Co-citation Information

ReViz: A Tool for Automatically Generating Citation Graphs and Variants

Analysis of Text-Enriched Heterogeneous Information Networks

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Exploiting edge semantics in citation graphs using efficient, vertical ARM

Abstract

Access this article

Similar content being viewed by others

A Visualization of Relationships Among Papers Using Citation and Co-citation Information

ReViz: A Tool for Automatically Generating Citation Graphs and Variants

Analysis of Text-Enriched Heterogeneous Information Networks

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation