ABSTRACT
Mahalanobis distance metric learning becomes especially challenging as the dimension p of the feature space grows. The number of parameters to optimize scales as O(p^2), making storage infeasible, hurting interpretability, and leaving the model prone to overfitting. Moreover, maintaining feasibility during optimization is prohibitively expensive, since it requires a projection onto the positive semidefinite cone after every iteration. Beyond these space and computational challenges, vanilla distance metric learning cannot model complex, multi-modal trends in the data. Inspired by the recent resurgence of Frank-Wolfe style optimization, we propose a new method for sparse compositional local Mahalanobis distance metric learning. Our technique learns a set of distance metrics, each composed of local and global components: the local components capture interactions within regions of the feature space, while a global component shared by all metrics acts as a regularizer. We optimize the model with an alternating pairwise Frank-Wolfe style algorithm, which serves a dual purpose: it lets us control the sparsity of the solution and avoids expensive projection operations altogether. Finally, we empirically compare our method against the current state of the art on five datasets from varying domains.
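To make the projection-free claim concrete, the following is a minimal sketch (not the paper's algorithm) of why Frank-Wolfe style updates suit metric learning: over the feasible set of PSD matrices with bounded trace, the linear minimization oracle returns a rank-one matrix built from the leading eigenvector of the negative gradient, so each iterate is a convex combination of PSD atoms, stays in the PSD cone without projection, and has rank at most the iteration count (a sparse decomposition). The names `frank_wolfe_metric`, `grad_fn`, and `tau` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def mahalanobis_dist2(M, x, y):
    """Squared Mahalanobis distance d_M(x, y)^2 = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def frank_wolfe_metric(grad_fn, p, tau=1.0, n_iters=100):
    """Projection-free metric learning sketch via Frank-Wolfe.

    Feasible set: {M : M PSD, trace(M) <= tau}. The linear
    minimization oracle over this set is tau * v v^T, where v is the
    leading eigenvector of the negative gradient. Each update is a
    convex combination of PSD matrices, so no PSD projection is ever
    needed, and after t steps M has rank at most t.
    """
    M = np.zeros((p, p))
    for t in range(n_iters):
        G = grad_fn(M)
        # LMO: leading eigenvector of -G (symmetrized for safety).
        _, V = np.linalg.eigh(-(G + G.T) / 2.0)
        v = V[:, -1]
        S = tau * np.outer(v, v)          # rank-one atom
        gamma = 2.0 / (t + 2.0)           # standard FW step size
        M = (1.0 - gamma) * M + gamma * S  # stays PSD, trace <= tau
    return M
```

For example, with the toy objective 0.5 * ||M - T||_F^2 for a PSD target T of unit trace, passing `grad_fn=lambda M: M - T` drives M toward T through a sequence of low-rank iterates, with feasibility preserved at every step.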
Index Terms
- Sparse Compositional Local Metric Learning