Abstract
We consider the problem of minimizing the sum of two convex functions, one smooth and the other possibly nonsmooth. Many high-dimensional learning problems (classification/regression) can be formulated in this framework and solved efficiently with first-order proximal methods. Because traditional proximal methods converge slowly, a recent trend is to accelerate them. Such proximal gradient methods belong to the wider class of forward–backward algorithms, which can be interpreted mathematically as fixed-point iterative schemes. In this paper, we design several new proximal gradient methods corresponding to state-of-the-art fixed-point iterative schemes and compare their performance on the regression problem. In addition, we propose a new accelerated proximal gradient algorithm that outperforms earlier methods in terms of convergence speed and regression error. To demonstrate the applicability of our method, we conducted regression experiments on several publicly available high-dimensional real datasets from different application domains. Empirical results show that the proposed method outperforms previous methods in terms of convergence, accuracy, and objective function value.
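The composite setting described above, min over x of f(x) + g(x) with f smooth and g nonsmooth, is what forward–backward methods solve: a gradient (forward) step on f followed by a proximal (backward) step on g. A minimal sketch of the classical accelerated variant, FISTA (Beck and Teboulle 2009), applied to the lasso is shown below; this is the standard baseline, not the paper's proposed algorithm, and the function names and step-size choice are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(A, b, lam, n_iter=500):
    """FISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    y = x.copy()                               # extrapolated point
    t = 1.0                                    # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)               # forward (gradient) step on f
        x_new = soft_threshold(y - grad / L, lam / L)   # backward (proximal) step on g
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # acceleration/extrapolation
        x, t = x_new, t_new
    return x
```

Without the extrapolation step (i.e., y = x_new, the plain ISTA iteration), the objective decreases at rate O(1/k); the momentum update above improves this to O(1/k^2), which is the acceleration the abstract refers to.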
Acknowledgements
The first author would like to acknowledge the financial support of the Indian Institute of Technology (Banaras Hindu University) in the form of a teaching assistantship.
Cite this article
Verma, M., Shukla, K.K. A new accelerated proximal technique for regression with high-dimensional datasets. Knowl Inf Syst 53, 423–438 (2017). https://doi.org/10.1007/s10115-017-1047-z