Clustering High Dimensional Data Using SVM

Lin, Tsau Young; Ngo, Tam

doi:10.1007/978-3-540-72530-5_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4482)

Included in the following conference series:

International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing

Abstract

The Web contains so many documents that classifying them manually has become impossible. This project's goal is to find a new method for clustering documents that comes as close as possible to human classification while also reducing the size of the document data. The project combines Latent Semantic Indexing (LSI), computed with Singular Value Decomposition (SVD), and Support Vector Machine (SVM) classification. Using SVD, the data is decomposed and truncated to reduce its size. The reduced data is then clustered into categories. The clusters obtained from the SVD step are used to train an SVM, so that new data can be classified by the SVM's prediction. The project's results show that combining SVD and SVM reduces the data size and classifies documents reasonably well compared with human classification.
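
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here; the linear SVM is a hand-rolled subgradient-descent trainer for two clusters only; NumPy is used for the SVD; and the toy term-document matrix and all function names (`truncated_svd`, `fold_in`, `kmeans`, `train_linear_svm`, `lsi_svm_pipeline`) are hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Keep the k largest singular triplets of the term-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Map a term-count vector q into the k-dim LSI space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def kmeans(X, n_clusters, n_iter=20):
    """Plain k-means (an assumption) with deterministic farthest-point init."""
    centers = [X[0]]
    while len(centers) < n_clusters:
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the L2-regularised hinge loss, y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # points on the wrong side of the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def lsi_svm_pipeline(A, q, k=2):
    """Reduce with SVD, cluster the documents, train an SVM, classify q."""
    U_k, s_k, Vt_k = truncated_svd(A, k)
    docs = Vt_k.T  # one k-dimensional row per document
    labels = kmeans(docs, n_clusters=2)
    w, b = train_linear_svm(docs, 2.0 * labels - 1.0)  # clusters 0/1 -> -1/+1

    def predict(X):
        return (X @ w + b > 0).astype(int)

    new_label = int(predict(fold_in(q, U_k, s_k)[None, :])[0])
    return labels, predict(docs), new_label

# Hypothetical toy data: rows are terms, columns are documents.
# Documents 0-2 use only terms 0-2; documents 3-5 use only terms 3-5.
A = np.array([
    [2, 1, 2, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 3],
], dtype=float)
q = np.array([3, 2, 3, 0, 0, 0], dtype=float)  # new document, first topic's terms

labels, train_pred, new_label = lsi_svm_pipeline(A, q, k=2)
print("cluster labels:", labels.tolist())
print("SVM on training documents:", train_pred.tolist())
print("new document assigned to cluster:", new_label)
```

Here the 6-term vectors are reduced to 2 LSI dimensions before clustering and training, which is the size reduction the abstract refers to. A new document is folded into the same reduced space before the SVM predicts its cluster. Handling more than two clusters would need a multi-class scheme such as one-vs-rest, which this sketch omits.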

Author information

Authors and Affiliations

Department of Computer Science, San José State University, San Jose, CA 95192, USA
Tsau Young Lin & Tam Ngo

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, York University, M3J 1P3, Toronto, Ontario, Canada
Aijun An
Institute of Computing Sciences, Poznań University of Technology, ul. Piotrowo 2, 60–965, Poznań, Poland
Jerzy Stefanowski
Department of Applied Computer Science, University of Winnipeg, R3B 2E9, Winnipeg, Manitoba, Canada
Sheela Ramanna
Department of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
Cory J. Butz
Department of Electrical and Computer Engineering, University of Alberta, T6G 2V4, Edmonton, Alberta, Canada
Witold Pedrycz
Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, 40065, Chongqing, P.R. China
Guoyin Wang

Cite this paper

Lin, T.Y., Ngo, T. (2007). Clustering High Dimensional Data Using SVM. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_30

DOI: https://doi.org/10.1007/978-3-540-72530-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72529-9
Online ISBN: 978-3-540-72530-5
eBook Packages: Computer Science, Computer Science (R0)
