DOI: 10.1145/3097983.3098153
Public Access

Sparse Compositional Local Metric Learning

Published: 13 August 2017

ABSTRACT

Mahalanobis distance metric learning becomes especially challenging as the dimension p of the feature space grows. The number of parameters to optimize scales as O(p^2), making storage infeasible, hurting interpretability, and leaving the model highly prone to overfitting. Moreover, maintaining feasibility of the solution during optimization becomes prohibitively expensive, requiring a projection onto the positive semi-definite cone after every iteration. Beyond these space and computational challenges, vanilla distance metric learning is unable to model complex and multi-modal trends in the data. Inspired by the recent resurgence of Frank-Wolfe style optimization, we propose a new method for sparse compositional local Mahalanobis distance metric learning. Our technique learns a set of distance metrics composed of local and global components: we capture local interactions in the feature space while ensuring that all metrics share a global component, which may act as a regularizer. We optimize our model using an alternating pairwise Frank-Wolfe style algorithm, which serves a dual purpose: it lets us control the sparsity of the solution and altogether avoids expensive projection operations. Finally, we conduct an empirical evaluation of our method against the current state of the art and present results on five datasets from varying domains.
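The abstract highlights that Frank-Wolfe style updates avoid projections onto the positive semi-definite cone and give direct control over sparsity. The paper's actual alternating pairwise algorithm over compositional local and global metrics is not described in this abstract, so the NumPy sketch below only illustrates the underlying mechanism for a single global Mahalanobis metric. The logistic pairwise loss, the trace bound tau, the 2/(t+2) step size, and the toy data are all illustrative assumptions, not details taken from the paper. The key point is that the linear minimization oracle over the trace-bounded PSD cone returns a rank-one matrix, so every iterate is a convex combination of PSD atoms and stays feasible and low-rank without any projection step.

import numpy as np

def sigmoid(z):
    # Numerically stable logistic function for scalar z.
    if z >= 0:
        return 1.0 / (1.0 + np.exp(-z))
    e = np.exp(z)
    return e / (1.0 + e)

def pairwise_grad(M, X, pairs, labels):
    # Gradient of a smooth logistic pairwise loss: similar pairs (label +1)
    # should end up closer than 1 under d_M, dissimilar pairs (-1) farther.
    G = np.zeros_like(M)
    for (i, j), y in zip(pairs, labels):
        d = X[i] - X[j]
        dist2 = d @ M @ d                    # squared Mahalanobis distance
        margin = y * (1.0 - dist2)
        G += y * sigmoid(-margin) * np.outer(d, d)
    return G / len(pairs)

def frank_wolfe_metric(X, pairs, labels, tau=10.0, iters=50):
    # Learn one Mahalanobis metric over {M PSD, trace(M) <= tau} with
    # projection-free Frank-Wolfe updates (illustrative only).
    p = X.shape[1]
    M = np.zeros((p, p))                     # feasible starting point
    for t in range(iters):
        G = pairwise_grad(M, X, pairs, labels)
        # Linear minimization oracle over the trace-bounded PSD cone:
        # tau * v v^T, where v is the top eigenvector of -G
        # (the zero matrix wins if -G has no positive eigenvalue).
        w, V = np.linalg.eigh(-G)
        if w[-1] > 0:
            v = V[:, -1]
            S = tau * np.outer(v, v)
        else:
            S = np.zeros_like(M)
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        M = (1.0 - gamma) * M + gamma * S    # convex combination: stays PSD
    return M

# Toy usage: two Gaussian blobs; similar pairs within a blob, dissimilar across.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
pairs = [(0, 1), (2, 3), (20, 21), (0, 20), (5, 25)]
labels = [1, 1, 1, -1, -1]
M = frank_wolfe_metric(X, pairs, labels)
print("rank of learned metric:", np.linalg.matrix_rank(M, tol=1e-6))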



            • Published in

              KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
              August 2017
              2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

              Copyright © 2017 ACM

              Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 13 August 2017


              Qualifiers

              • research-article

              Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions, 9%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.

