ABSTRACT
Mahalanobis distance metric learning becomes especially challenging as the dimension p of the feature space grows. The number of parameters to optimize scales as O(p^2), making storage infeasible, hurting interpretability, and leaving the model prone to overfitting. Moreover, maintaining feasibility during optimization is prohibitively expensive, since it requires a projection onto the positive semidefinite cone after every iteration. Beyond these space and computational challenges, vanilla distance metric learning cannot model complex, multi-modal trends in the data. Inspired by the recent resurgence of Frank-Wolfe style optimization, we propose a new method for sparse compositional local Mahalanobis distance metric learning. Our technique learns a set of distance metrics, each composed of local and global components: the local components capture interactions within regions of the feature space, while a global component shared by all metrics acts as a regularizer. We optimize the model with an alternating pairwise Frank-Wolfe style algorithm, which serves a dual purpose: it lets us control the sparsity of the solution and avoids expensive projection operations altogether. Finally, we empirically compare our method against the current state of the art on five datasets from varying domains.
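To make the projection-free claim concrete, the following is a minimal sketch (not the paper's algorithm) of why Frank-Wolfe style updates suit metric learning: over the feasible set of PSD matrices with bounded trace, the linear minimization oracle returns a rank-one matrix built from the leading eigenvector of the negative gradient, so each iterate is a convex combination of PSD atoms, stays in the PSD cone without projection, and has rank at most the iteration count (a sparse decomposition). The names `frank_wolfe_metric`, `grad_fn`, and `tau` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def mahalanobis_dist2(M, x, y):
    """Squared Mahalanobis distance d_M(x, y)^2 = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

def frank_wolfe_metric(grad_fn, p, tau=1.0, n_iters=100):
    """Projection-free metric learning sketch via Frank-Wolfe.

    Feasible set: {M : M PSD, trace(M) <= tau}. The linear
    minimization oracle over this set is tau * v v^T, where v is the
    leading eigenvector of the negative gradient. Each update is a
    convex combination of PSD matrices, so no PSD projection is ever
    needed, and after t steps M has rank at most t.
    """
    M = np.zeros((p, p))
    for t in range(n_iters):
        G = grad_fn(M)
        # LMO: leading eigenvector of -G (symmetrized for safety).
        _, V = np.linalg.eigh(-(G + G.T) / 2.0)
        v = V[:, -1]
        S = tau * np.outer(v, v)          # rank-one atom
        gamma = 2.0 / (t + 2.0)           # standard FW step size
        M = (1.0 - gamma) * M + gamma * S  # stays PSD, trace <= tau
    return M
```

For example, with the toy objective 0.5 * ||M - T||_F^2 for a PSD target T of unit trace, passing `grad_fn=lambda M: M - T` drives M toward T through a sequence of low-rank iterates, with feasibility preserved at every step.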
Index Terms
- Sparse Compositional Local Metric Learning