Clustering High Dimensional Data Using SVM

  • Conference paper
Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4482)

Abstract

The Web contains such a massive number of documents that classifying them manually has become impossible. This project’s goal is to find a new method for clustering documents that comes as close as possible to human classification while also reducing the size of the document data. The project combines Latent Semantic Indexing (LSI), computed through Singular Value Decomposition (SVD), with Support Vector Machine (SVM) classification. SVD decomposes the data and truncates it to reduce its size, and the reduced data is then clustered into categories. The clustered output of the SVD step serves as training data for an SVM, so that new data can be classified from the SVM’s predictions. The project’s results show that combining SVD and SVM reduces the data size and classifies documents reasonably well compared with human classification.
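The pipeline described in the abstract (truncate with SVD, cluster the reduced data, then train an SVM on the cluster labels) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy term-by-document counts, the use of power iteration for the SVD, 2-means as the clustering step, the subgradient-descent linear SVM, and every function name below are assumptions made only for illustration; the abstract does not specify any of them.

```python
import math

# Hypothetical toy corpus: rows are terms, columns are documents (raw counts).
# Terms 0-2 and terms 3-5 stand for two unrelated vocabularies.
A = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def column(m, j):
    return [row[j] for row in m]

def top_eigenpairs(g, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration plus deflation."""
    g = [row[:] for row in g]
    n = len(g)
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _step in range(iters):
            w = [dot(row, v) for row in g]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in g])
        pairs.append((lam, v))
        for i in range(n):                      # remove the component just found
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs

def truncated_svd(a, k):
    """Rank-k singular values and term-side vectors U_k, via eigenpairs of A^T A."""
    docs = [column(a, j) for j in range(len(a[0]))]
    gram = [[dot(di, dj) for dj in docs] for di in docs]
    sigmas, us = [], []
    for lam, v in top_eigenpairs(gram, k):
        sigma = math.sqrt(lam)
        sigmas.append(sigma)
        us.append([dot(row, v) / sigma for row in a])   # u = A v / sigma
    return sigmas, us

def fold_in(us, doc):
    """Project a raw term-count vector into the k-dimensional space (U_k^T d)."""
    return [dot(u, doc) for u in us]

def sqdist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def two_means(points, iters=10):
    """Plain 2-means, seeded with the first point and the point farthest from it."""
    centers = [points[0], max(points, key=lambda p: sqdist(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

def train_linear_svm(points, signs, epochs=200, lr=0.01, lam=0.01):
    """Linear soft-margin SVM fitted by subgradient descent on the hinge loss."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, signs):
            violated = y * (dot(w, x) + b) < 1
            w = [wi - lr * lam * wi + (lr * y * xi if violated else 0.0)
                 for wi, xi in zip(w, x)]
            if violated:
                b += lr * y
    return w, b

# 1. Reduce: six term dimensions become two latent dimensions per document.
sigmas, us = truncated_svd(A, k=2)
reduced = [fold_in(us, column(A, j)) for j in range(len(A[0]))]
# 2. Cluster the reduced documents.
clusters = two_means(reduced)
# 3. Train the SVM on the cluster labels (cluster 1 -> +1, cluster 0 -> -1).
w, b = train_linear_svm(reduced, [1 if c == 1 else -1 for c in clusters])

def classify(doc):
    """Predict the cluster of a new raw term-count vector."""
    return 1 if dot(w, fold_in(us, doc)) + b > 0 else 0

print(clusters)
print(classify([1, 1, 1, 0, 0, 0]), classify([0, 0, 0, 1, 1, 1]))
```

On this toy corpus the two vocabularies do not overlap, so the first three documents should land in one cluster and the last three in the other, and a new document is assigned by projecting it into the same two-dimensional space before the SVM scores it. A real corpus would normally call for weighted term counts and an established SVD and SVM library rather than these hand-written routines.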

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, T.Y., Ngo, T. (2007). Clustering High Dimensional Data Using SVM. In: An, A., Stefanowski, J., Ramanna, S., Butz, C.J., Pedrycz, W., Wang, G. (eds) Rough Sets, Fuzzy Sets, Data Mining and Granular Computing. RSFDGrC 2007. Lecture Notes in Computer Science, vol 4482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72530-5_30

  • DOI: https://doi.org/10.1007/978-3-540-72530-5_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72529-9

  • Online ISBN: 978-3-540-72530-5

  • eBook Packages: Computer Science, Computer Science (R0)
