Abstract
The Web contains a massive number of documents, to the point where classifying them manually has become impossible. This project’s goal is to find a new method for clustering documents that comes as close as possible to human classification while also reducing the size of the document data. The project combines Latent Semantic Indexing (LSI), computed with Singular Value Decomposition (SVD), and Support Vector Machine (SVM) classification. Using SVD, the data is decomposed and truncated to reduce its size. The reduced data is then clustered into categories. The clustered data from the SVD step is used to train an SVM, so that new data can be classified based on the SVM’s prediction. The project’s results show that combining SVD and SVM reduces data size and classifies documents reasonably well compared to human classification.
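The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy term-document matrix, the choice of k = 2 latent dimensions, the helper names `two_means` and `train_linear_svm`, the simple two-centroid clustering, and the subgradient-trained linear SVM are all assumptions made for illustration. Only NumPy is used.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Documents 0-2 share one vocabulary, documents 3-5 another.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 1],
    [0, 0, 0, 1, 3, 0],
    [0, 0, 0, 1, 0, 3],
], dtype=float)

# Step 1: decompose with SVD and truncate to k latent dimensions (LSI).
k = 2  # assumed value, chosen for the toy data
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
docs_k = (np.diag(s_k) @ Vt_k).T  # one k-dimensional row per document

def two_means(points, iters=10):
    """Deterministic 2-cluster Lloyd iteration (assumed clustering step).
    Assumes neither cluster becomes empty, which holds for the toy data."""
    first = points[0]
    far = points[np.argmax(np.linalg.norm(points - first, axis=1))]
    centroids = np.stack([first, far])
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centroids = np.stack([points[labels == c].mean(axis=0) for c in range(2)])
    return labels

# Step 2: cluster the reduced documents.
labels = two_means(docs_k)

def train_linear_svm(X, y, lr=0.05, lam=0.01, epochs=500):
    """Linear SVM fitted by full-batch subgradient descent on the
    L2-regularized hinge loss; y must contain -1.0 / +1.0."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # points inside or beyond the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Step 3: train the SVM on the cluster labels produced in step 2.
y = np.where(labels == 1, 1.0, -1.0)
w, b = train_linear_svm(docs_k, y)
train_pred = (docs_k @ w + b > 0).astype(int)

# Step 4: fold a new document into the latent space and classify it.
new_doc = np.array([0, 0, 0, 2, 2, 1], dtype=float)  # hypothetical document
new_coords = U_k.T @ new_doc  # same mapping as docs_k, since U_k.T @ A = diag(s_k) @ Vt_k
new_label = int(new_coords @ w + b > 0)

print("values stored per document:", A.shape[0], "->", docs_k.shape[1])
print("cluster labels:", labels)
print("new document assigned to cluster:", new_label)
```

In this sketch the new document uses only the vocabulary of documents 3-5, so it should land in the same cluster as those documents. On a real corpus the number of latent dimensions, the clustering method, and the SVM kernel and parameters would all need to be chosen and validated separately.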
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, T.Y., Ngo, T. (2007). Clustering High Dimensional Data Using SVM. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_30
DOI: https://doi.org/10.1007/978-3-540-72530-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72529-9
Online ISBN: 978-3-540-72530-5
eBook Packages: Computer Science, Computer Science (R0)