
A new accelerated proximal technique for regression with high-dimensional datasets

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

We consider the problem of minimizing the sum of two convex functions, one of which is smooth while the other may be nonsmooth. Many high-dimensional learning problems (classification and regression) can be formulated in this framework and solved efficiently with first-order proximal methods. Because traditional proximal methods converge slowly, a recent trend is to introduce acceleration, which increases the speed of convergence. Such proximal gradient methods belong to the wider class of forward–backward algorithms, which can be interpreted mathematically as fixed-point iterative schemes. In this paper, we design several new proximal gradient methods corresponding to state-of-the-art fixed-point iterative schemes and compare their performance on the regression problem. In addition, we propose a new accelerated proximal gradient algorithm that outperforms earlier methods in terms of convergence speed and regression error. To demonstrate the applicability of our method, we conducted regression experiments on several publicly available high-dimensional real datasets from different application domains. Empirical results show that the proposed method outperforms previous methods in terms of convergence, accuracy, and objective function values.
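The composite model described above — a smooth loss plus a possibly nonsmooth regularizer — is the setting of the classical accelerated proximal gradient (FISTA-style) method. As a minimal sketch only (this is the standard accelerated forward–backward iteration, not the paper's proposed algorithm; the function names, step sizes, and the lasso instance are illustrative assumptions), the forward step takes a gradient descent step on the smooth part, and the backward step applies the proximal operator of the nonsmooth part, with Nesterov momentum for acceleration:

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (soft thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fista_lasso(A, b, lam, n_iter=500):
    """Accelerated proximal gradient (FISTA) sketch for the lasso:
    min_w 0.5 * ||A w - b||^2 + lam * ||w||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    w = np.zeros(A.shape[1])
    y = w.copy()                          # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)          # forward step: gradient of the smooth part
        w_next = soft_threshold(y - grad / L, lam / L)  # backward (proximal) step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)  # Nesterov extrapolation
        w, t = w_next, t_next
    return w
```

Each iteration is a fixed-point step of the forward–backward operator; the momentum sequence `t` is what distinguishes the accelerated variant from plain proximal gradient descent.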


Notes

  1. http://featureselection.asu.edu/datasets.php.

  2. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.


Acknowledgements

The first author would like to acknowledge the financial support of the Indian Institute of Technology (Banaras Hindu University) in the form of a teaching assistantship.

Author information

Corresponding author

Correspondence to Mridula Verma.

About this article


Cite this article

Verma, M., Shukla, K.K. A new accelerated proximal technique for regression with high-dimensional datasets. Knowl Inf Syst 53, 423–438 (2017). https://doi.org/10.1007/s10115-017-1047-z
