
Applications of Spectral Gradient Algorithm for Solving Matrix ℓ2,1-Norm Minimization Problems in Machine Learning

  • Yunhai Xiao ,

    yhxiao@henu.edu.cn

    Affiliation Institute of Applied Mathematics, School of Mathematics and Statistics, Henan University, Kaifeng, Henan Province, China

  • Qiuyu Wang,

    Affiliation School of Mathematics and Statistics, Henan University, Kaifeng, Henan Province, China

  • Lihong Liu

    Affiliation School of Mathematics and Statistics, Henan University, Kaifeng, Henan Province, China

Abstract

The main purpose of this study is to propose, analyze, and test a spectral gradient algorithm for solving a convex minimization problem. The problem under consideration covers the matrix ℓ2,1-norm regularized least squares, which is widely used in multi-task learning to capture the features shared across tasks. To solve the problem, we first minimize a quadratic approximation of the objective function to derive a search direction at the current iterate. We show that this direction is automatically a descent direction and that it reduces to the original spectral gradient direction when the regularization term is removed. Second, we incorporate a nonmonotone line search along this direction to improve the algorithm's numerical performance. Furthermore, we show that the proposed algorithm converges to a critical point under mild conditions. An attractive feature of the proposed algorithm is that it is easy to implement and requires only the gradient of the smooth term and the value of the objective function at each step. Finally, we run experiments on synthetic data, which verify that the proposed algorithm works well and performs better than the compared solvers.

1 Introduction

The tasks in medical diagnosis [1], text classification [2–5], biomedical informatics [6, 7], and other applications [8–12] are often related to each other. Hence, capturing the information shared among tasks becomes the key issue in learning [13–15]. Given the training set of t tasks {A_j}_{j=1}^t and {b_j}_{j=1}^t, where A_j ∈ ℝ^{m_j×n} is the data matrix for the j-th task and b_j ∈ ℝ^{m_j} is the corresponding response, we let x_j ∈ ℝ^n be the sparse feature vector for the j-th task and let X = [x_1, …, x_t] ∈ ℝ^{n×t} be the joint feature matrix to be learned. To select features globally, one encourages several rows of X to be zero and solves the following ℓ2,1-norm regularized least-squares problem [16, 17]
(1)  min_X (1/2) ∑_{j=1}^{t} ‖A_j x_j − b_j‖_2² + μ‖X‖_{2,1},
where μ > 0 is a weighting parameter and ‖X‖_{2,1} is defined as the sum of the 2-norms of the rows of X. It is well known that the ℓ2,1-norm encourages the predictions of different tasks to share similar parameter sparsity patterns.
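To make the objective in Eq (1) concrete, the following short NumPy sketch (our own illustration, not the authors' code; the names A_list, b_list, X, and mu are ours) evaluates the ℓ2,1-norm and the regularized least-squares objective:

```python
import numpy as np

def l21_norm(X):
    """||X||_{2,1}: the sum of the Euclidean norms of the rows of X."""
    return np.sum(np.linalg.norm(X, axis=1))

def mtl_objective(A_list, b_list, X, mu):
    """Objective of Eq (1): least-squares loss summed over tasks plus mu*||X||_{2,1}.

    A_list[j] is the m_j-by-n data matrix of task j, b_list[j] its response vector,
    and column X[:, j] holds the feature weights x_j of task j.
    """
    loss = sum(0.5 * np.linalg.norm(A_list[j] @ X[:, j] - b_list[j]) ** 2
               for j in range(len(A_list)))
    return loss + mu * l21_norm(X)
```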

In the past few years, several algorithms have been proposed, analyzed, and tested for solving the nonsmooth convex minimization Problem (1). The algorithm in [18] transformed Eq (1) into an equivalent smooth convex optimization problem and then minimized it by Nesterov's gradient method. The method in [16] reformulated Eq (1) as a constrained optimization problem and minimized it alternately. The algorithm in [19] and its variant [20] reformulated the problem as an equivalent constrained minimization problem by introducing an auxiliary variable and then minimized the corresponding augmented Lagrangian function alternately. Finally, for an accelerated proximal gradient version of the algorithm in [19], one can refer to [21].

Unlike these works, which are mainly concerned with Problem (1), in this paper we focus on the following generalized nonsmooth optimization problem
(2)  min_{X ∈ ℝ^{n×t}} F(X) + μ‖X‖_{2,1},
where F: ℝ^{n×t} → ℝ is continuously differentiable (possibly non-convex) and bounded below. Clearly, Model (2) includes Eq (1) as a special case when F is a least-squares loss. The spectral gradient method originated with Barzilai and Borwein [22] for solving smooth unconstrained minimization problems, was later developed in [23–26], and was then extended to ℓ1-regularized nonsmooth minimization [27]. However, its numerical performance on nonsmooth minimization problems involving the matrix ℓ2,1-norm remains unexplored. Therefore, extending the spectral gradient algorithm to solve Problem (2) is significant both in theory and in practice. The first contribution of this study lies in the design of the search direction at each iteration, which is derived by minimizing a quadratic approximation of the objective function while making full use of the special structure of the ℓ2,1-norm. We also show that the generated direction is a descent direction provided that the spectral coefficient is positive. The second contribution is a nonmonotone line search, which is used to improve the algorithm's performance. At each iteration, the algorithm requires only the gradient of the smooth term and the value of the objective function, so it is able to handle high-dimensional problems. Finally, we compare the performance of the proposed method with the solvers IADM_MFL and SLEP; the results illustrate that the proposed method is fast, efficient, and competitive.

The paper is organized as follows. In Section 2, we provide some notations and preliminaries, and construct the new algorithm together with its properties. In Section 3, we establish the global convergence of the algorithm. In Section 4, we report some numerical results and do some performance comparisons. Finally, we conclude our paper in Section 5.

2 Algorithm

2.1 Notations and preliminaries

First, we summarize the notation used in this paper. Matrices are written as uppercase letters and vectors as lowercase letters. For a matrix X, its i-th row and j-th column are denoted by X_{i,:} and X_{:,j}, respectively. The Frobenius norm and the ℓ2,1-norm of a matrix X ∈ ℝ^{n×t} are defined, respectively, as
‖X‖_F = (∑_{i,j} X_{ij}²)^{1/2}  and  ‖X‖_{2,1} = ∑_{i=1}^{n} ‖X_{i,:}‖_2.
For any two matrices X, Y ∈ ℝ^{n×t}, we define 〈X, Y〉 = tr(X⊤Y) (the standard trace inner product in ℝ^{n×t}), so that ‖X‖_F² = 〈X, X〉. If x ∈ ℝ^n, we denote by Diag(x) the diagonal matrix with the components of x on its diagonal. The symbol "⊤" denotes the transpose of a vector or a matrix. For the sake of simplicity, we let Φ(X) = F(X) + μ‖X‖_{2,1}. Additional notation will be introduced as needed.

We now briefly review the spectral gradient method for the unconstrained smooth minimization problem min_{x ∈ ℝ^n} f(x), where f: ℝ^n → ℝ is a continuously differentiable function. The spectral gradient method is defined by x_{k+1} = x_k − λ_k^{-1} ∇f(x_k), where one choice of λ_k (called the spectral coefficient) is given by λ_k = 〈s_{k−1}, y_{k−1}〉 / 〈s_{k−1}, s_{k−1}〉, with s_{k−1} = x_k − x_{k−1} and y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}). Obviously, if 〈s_{k−1}, y_{k−1}〉 > 0, i.e., λ_k > 0, the search direction −λ_k^{-1} ∇f(x_k) is a descent direction at the current point.
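For reference, here is a minimal sketch of the Barzilai-Borwein iteration on a smooth toy problem (our own illustration; the quadratic test function, the safeguard on the coefficient, and all parameter values are our assumptions, not part of the paper):

```python
import numpy as np

def bb_gradient(grad, x0, lam0=1.0, max_iter=200, tol=1e-8):
    """Spectral (Barzilai-Borwein) gradient iteration x_{k+1} = x_k - grad(x_k)/lambda_k,
    with lambda_k = <s_{k-1}, y_{k-1}> / <s_{k-1}, s_{k-1}> from the last two iterates."""
    x, lam = x0.astype(float), lam0
    g = grad(x)
    for _ in range(max_iter):
        x_new = x - g / lam
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        s, y = x_new - x, g_new - g
        lam = max(np.dot(s, y) / np.dot(s, s), 1e-12)  # safeguard: keep lambda_k > 0
        x, g = x_new, g_new
    return x

# toy smooth problem: f(x) = 0.5 x'Qx - b'x, so grad f(x) = Qx - b
Q, b = np.diag([1.0, 10.0, 100.0]), np.ones(3)
x_star = bb_gradient(lambda x: Q @ x - b, np.zeros(3))
```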

2.2 Algorithm

Now we turn our attention to the original Model (2). Since the ℓ2,1-norm is not differentiable, we approximate the objective function at the current iterate X_k by the following quadratic model Q_k:
(3)  Q_k(D) = F(X_k) + 〈∇F(X_k), D〉 + (Λ_k/2)‖D‖_F² + μ‖X_k + D‖_{2,1},
where ∇F(X_k) is the gradient of F at X_k and Λ_k is the so-called spectral coefficient, defined by
(4)  Λ_k = 〈S_{k−1}, Y_{k−1}〉 / 〈S_{k−1}, S_{k−1}〉,
where S_{k−1} = X_k − X_{k−1} and Y_{k−1} = ∇F(X_k) − ∇F(X_{k−1}). Minimizing Eq (3) yields the search direction. Denote M_k = X_k + D and G_k = X_k − ∇F(X_k)/Λ_k. One can get
(5)  M_k = argmin_M { (Λ_k/2)‖M − G_k‖_F² + μ‖M‖_{2,1} }.
The favorable structure of Eq (5) allows the i-th row of the matrix M_k to be written explicitly as
(M_k)_{i,:} = max{‖(G_k)_{i,:}‖_2 − μ/Λ_k, 0} ⋅ (G_k)_{i,:} / ‖(G_k)_{i,:}‖_2,
where the convention 0 ⋅ 0/0 = 0 is followed. Hence, the search direction at the current point can be expressed as
(6)  D_k = M_k − X_k.
Obviously, Eq (6) reduces to D_k = −∇F(X_k)/Λ_k in the case μ = 0, which means that Eq (6) covers the traditional spectral gradient direction as a special case.
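The direction D_k is cheap to compute: it is a row-wise soft-thresholding of G_k followed by subtracting X_k. A minimal NumPy sketch (our own, with illustrative names; lam plays the role of Λ_k) is:

```python
import numpy as np

def l21_direction(X, grad_F, lam, mu):
    """Direction D_k of Eq (6): row-wise soft-thresholding of G = X - grad_F/lam
    with threshold mu/lam, followed by subtracting X."""
    G = X - grad_F / lam
    row_norms = np.linalg.norm(G, axis=1, keepdims=True)
    # rows with ||G_i|| <= mu/lam are set to zero (the convention 0*0/0 = 0)
    scale = np.maximum(row_norms - mu / lam, 0.0) / np.maximum(row_norms, 1e-300)
    return scale * G - X
```

With mu = 0 the scaling factor equals one for every nonzero row, so the function returns the plain spectral gradient direction, matching the remark above.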

The following lemma verifies that Dk is a descent direction when the optimal solution is not achieved.

Lemma 1. Suppose that Λ_k > 0 and that D_k is determined by Eq (6). Then, for any θ ∈ (0, 1],
(7)  Φ(X_k + θD_k) ≤ Φ(X_k) + θ(〈∇F(X_k), D_k〉 + μ‖X_k + D_k‖_{2,1} − μ‖X_k‖_{2,1}) + o(θ),
and
(8)  〈∇F(X_k), D_k〉 + μ‖X_k + D_k‖_{2,1} − μ‖X_k‖_{2,1} ≤ −Λ_k‖D_k‖_F².

Proof. By the differentiability of F and the convexity of ‖X‖_{2,1}, we have that for any θ ∈ (0, 1], which is exactly Eq (7). Noting that D_k is the minimizer of Eq (3) and that θ ∈ (0, 1], by Eq (3) and the convexity of ‖X‖_{2,1} one can get Hence, i.e., Recalling that θ ∈ (0, 1], the above inequality shows that Eq (8) holds.

To improve the algorithm's performance, we use the classical nonmonotone line search [28] to find a suitable stepsize along this direction. It is well known that this technique allows the function values to increase occasionally in some iterations while decreasing over the whole iterative process. Letting δ ∈ (0, 1), ρ ∈ (0, 1), and a positive integer m̂ be given, we choose the smallest nonnegative integer j_k such that the stepsize α_k = ρ^{j_k} satisfies
(9)  Φ(X_k + α_k D_k) ≤ max_{0 ≤ j ≤ m(k)} Φ(X_{k−j}) + δ α_k Δ_k,
where m(k) = min{m(k−1) + 1, m̂} (m(0) = 0) and
(10)  Δ_k = 〈∇F(X_k), D_k〉 + μ‖X_k + D_k‖_{2,1} − μ‖X_k‖_{2,1}.
From Eq (8), it is clear that Δ_k < 0 whenever D_k ≠ 0, which shows that the line search in Eq (9) is well defined.
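The acceptance rule in Eq (9) amounts to a simple backtracking loop over the most recent objective values. The fragment below is an illustrative sketch (the helper names Phi, recent_Phi, and the default parameter values are ours, not the authors' implementation):

```python
def nonmonotone_step(Phi, X, D, Delta, recent_Phi, delta=1e-4, rho=0.5, max_backtracks=50):
    """Find the largest alpha = rho^j with Phi(X + alpha*D) <= max(recent_Phi) + delta*alpha*Delta,
    where recent_Phi holds the last m(k)+1 objective values and Delta is given by Eq (10)."""
    Phi_ref = max(recent_Phi)
    alpha = 1.0
    for _ in range(max_backtracks):
        if Phi(X + alpha * D) <= Phi_ref + delta * alpha * Delta:
            break
        alpha *= rho
    return alpha
```

Because Δ_k < 0 whenever D_k ≠ 0 (by Eq (8)), the backtracking terminates after finitely many reductions of alpha.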

In summary, the full steps of the Nonmonotone Spectral Gradient algorithm for L2,1-norm minimization (abbr. NSGL21) can be described as follows:

Algorithm 1 (NSGL21)

Step 0. Choose an initial point X0 and constants μ > 0, 0 < Λ(min) ≤ Λ(max), ρ ∈ (0, 1), δ ∈ (0, 1), and a positive integer m̂. Set k := 0.

Step 1. Compute Dk via Eq (6).

Step 2. Stop if ‖Dk‖F = 0. Otherwise, continue.

Step 3. Compute αk via Eq (9).

Step 4. Set Xk+1 := Xk + αk Dk.

Step 5. Set k := k + 1. Go to Step 1.
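For illustration only, the pieces above can be assembled into a compact NumPy sketch of Algorithm 1 for the special case F(X) = (1/2)‖AX − B‖_F² (a single data matrix shared by all tasks, which keeps the code short); the parameter values, the safeguard constants, and all helper names are our own choices rather than the reference implementation:

```python
import numpy as np

def nsgl21(A, B, mu, X0, delta=1e-4, rho=0.5, m_hat=5,
           lam_min=1e-20, lam_max=1e20, max_iter=500, tol=1e-8):
    """Illustrative sketch of NSGL21 for F(X) = 0.5*||A X - B||_F^2 + mu*||X||_{2,1}."""
    def F(X): return 0.5 * np.linalg.norm(A @ X - B) ** 2
    def gradF(X): return A.T @ (A @ X - B)
    def l21(X): return np.sum(np.linalg.norm(X, axis=1))
    def Phi(X): return F(X) + mu * l21(X)

    X, lam = X0.copy().astype(float), 1.0
    hist = [Phi(X)]                                   # recent Phi values for the nonmonotone test
    for _ in range(max_iter):
        g = gradF(X)
        G = X - g / lam                               # gradient step on the smooth part
        rn = np.linalg.norm(G, axis=1, keepdims=True)
        M = np.maximum(rn - mu / lam, 0.0) / np.maximum(rn, 1e-300) * G
        D = M - X                                     # search direction, cf. Eq (6)
        if np.linalg.norm(D) < tol:
            break
        Delta = np.sum(g * D) + mu * (l21(X + D) - l21(X))   # cf. Eq (10)
        alpha = 1.0
        while Phi(X + alpha * D) > max(hist) + delta * alpha * Delta and alpha > 1e-12:
            alpha *= rho                              # nonmonotone backtracking, cf. Eq (9)
        X_new = X + alpha * D
        S, Y = X_new - X, gradF(X_new) - g
        lam = min(lam_max, max(lam_min,               # spectral coefficient with safeguard
                               np.sum(S * Y) / max(np.sum(S * S), 1e-300)))
        X = X_new
        hist = (hist + [Phi(X)])[-(m_hat + 1):]       # keep at most m_hat + 1 recent values
    return X

# tiny usage example on random data
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 15))
B = rng.standard_normal((40, 8))
X_hat = nsgl21(A, B, mu=0.1, X0=np.zeros((15, 8)))
```

With per-task matrices A_j as in Model (1), one would replace F and gradF by the corresponding sums over tasks; the structure of the iteration is unchanged.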

As stated in the preceding subsection, the generated direction is a descent direction whenever Λk > 0. To ensure Λk > 0, we choose a sufficiently small Λ(min) > 0 and a sufficiently large Λ(max) > 0 and force Λk into the interval [Λ(min), Λ(max)], i.e., Λk := min{Λ(max), max{Λ(min), Λk}}. This safeguard ensures that the hereditary descent property holds at every step.

Remark 1. The proposed algorithm is novel and differs from existing approaches. The well-known approach of [18] reformulated Problem (2) as an equivalent constrained smooth convex optimization problem, which was then solved by Nesterov's method. The method in [19] focused on the least-squares Model (1) and introduced an auxiliary variable to transform the model into an equivalent linearly constrained problem; an alternating direction method of multipliers was then applied, and closed-form solutions were derived for each subproblem. Clearly, our proposed algorithm differs from the above-mentioned approaches in the sense that we solve the original Model (2) directly, without any transformation.

3 Convergence analysis

This section is devoted to establishing the global convergence of algorithm NSGL21. For this purpose, we make the following assumption.

Assumption 1. The level set Ω = {X: F(X) ≤ F(X0)} is bounded.

Lemma 2. Suppose that Assumption 1 holds and that the sequence {Xk} is generated by Algorithm 1. Then Xk is a stationary point of Problem (2) if and only if Dk = 0.

Proof. In the case Dk ≠ 0, Lemma 1 shows that Dk is a descent direction, which implies that Xk is not a stationary point of Problem (2). On the other hand, since Dk = 0 is the solution of Eq (5), for any D ∈ ℝ^{n×t} and any ξ > 0 we have (11) Combining the fact that F(Xk + ξD) − F(Xk) = 〈∇F(Xk), ξD〉 + o(ξ) with Eq (11) yields which indicates that Xk is a stationary point of Problem (2).

Lemma 3. Let l(k) be an integer such that k − m(k) ≤ l(k) ≤ k and Φ(X_{l(k)}) = max_{0 ≤ j ≤ m(k)} Φ(X_{k−j}). Then the sequence {Φ(X_{l(k)})} is nonincreasing and the search direction D_{l(k)} satisfies (12)

Proof. It is not difficult to see that Φ(X_{l(k+1)}) ≤ Φ(X_{l(k)}), which indicates that the maximum objective value over the memory is nonincreasing. Moreover, by Eq (9), we have that for all k, By Assumption 1, the sequence {Φ(X_{l(k)})} admits a limit as k → ∞. Hence, it follows that (13) On the other hand, by the definition of Δk in Eq (10) and the inequality in Eq (8), it is easy to deduce that Combining this with Eq (13), one gets which gives the desired result Eq (12).

Theorem 1. Let the sequences {Xk} and {Dk} be generated by Algorithm 1. Then there exists a subsequence {Xk}_{k∈K} such that
(14)  lim_{k∈K, k→∞} ‖Dk‖_F = 0.

Proof. Let X̄ be a limit point of {Xk}, and let {Xk}_{k∈K} be a subsequence of {Xk} converging to X̄. Then, by Eq (12), either lim_{k∈K, k→∞} ‖Dk‖_F = 0, or there exists a subsequence {Xk}_{k∈K̄} (K̄ ⊆ K) such that (15) lim_{k∈K̄, k→∞} αk = 0. In the latter case, we assume that there exists a constant ϵ > 0 such that (16) ‖Dk‖_F ≥ ϵ for all k ∈ K̄. Since αk is the first value to satisfy Eq (9), it follows from Step 3 of Algorithm 1 that there exists an index k̄ such that, for all k ≥ k̄ and k ∈ K̄, (17) Since F is continuously differentiable, by the mean-value theorem there exists a constant θk ∈ (0, 1) such that Combining this with Eq (17), we have (18) Since αk → 0 by Eq (15), we have αk < ρ for all sufficiently large k ∈ K̄. It is not difficult to show that (19) Subtracting Δk from the left-hand side of Eq (18) and recalling the definition of Δk, it is clear that Noting Eq (19), Eq (18) thus shows that (20) Taking the limit as k ∈ K̄, k → ∞ on both sides of Eq (20) and using the smoothness of F, we obtain which implies ‖Dk‖_F → 0 as k ∈ K̄, k → ∞. This yields a contradiction, because Eq (16) states that ‖Dk‖_F is bounded away from zero.

4 Numerical experiments

In this section, we present numerical results to illustrate the feasibility and efficiency of the algorithm NSGL21. In particular, we also test against the recent solvers IADM_MFL and SLEP for performance comparison. To run SLEP (Sparse Learning with Efficient Projections), we use the Matlab package available at http://www.public.asu.edu/~jye02/Software/SLEP/index.htm and choose mFlag = 1 and lFlag = 1 to use an adaptive line search. All experiments are carried out under Windows 7 with Matlab v7.8 (2009a) on a Lenovo laptop with an Intel Pentium CPU at 2.5 GHz and 4 GB of memory.

As in [16], in the first test the true feature vector of each task is generated from a 5-dimensional Gaussian distribution with zero mean and covariance diag{1, 0.64, 0.49, 0.36, 0.25}. To each such vector we append up to 20 irrelevant dimensions which are exactly zero. The training and test data A_j are Gaussian matrices, and the responses b_j are generated by b_j = A_j x̄_j + ω, where ω is zero-mean Gaussian noise with standard deviation 1e−2. We start NSGL21 from the zero point and terminate the iterative process when (21) where tol > 0 is a tolerance. The quality of the solution X* is measured by its relative error to the true matrix X̄, i.e., RelErr = ‖X* − X̄‖_F / ‖X̄‖_F. In this test, we take μ = 1e−2, t = 200, n = 15, tol = 1e−3, Λ(min) = 10^{−20}, Λ(max) = 10^{20}, and m_j = 100 for all j = 1, 2, …, t. Moreover, to compare the performance of the algorithms fairly, we run each code from the zero point, use all the default parameter values, and observe the convergence behavior in obtaining solutions of similar accuracy. To illustrate the performance of each algorithm, we plot the convergence behavior with respect to the relative error against the number of iterations and against the computing time in Figs 1 and 2, respectively.
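For concreteness, the synthetic setup described above can be sketched as follows (our own illustrative code, not the authors' script; we use 10 irrelevant zero rows so that n = 15 as in the parameter list, and a fixed random seed of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
t, m, n_rel, n_irr = 200, 100, 5, 10      # tasks, samples per task, relevant / irrelevant dims
n = n_rel + n_irr

# true joint feature matrix: the first 5 rows are drawn from a zero-mean Gaussian
# with covariance diag{1, 0.64, 0.49, 0.36, 0.25}; the remaining rows are exactly zero
std = np.sqrt(np.array([1.0, 0.64, 0.49, 0.36, 0.25]))
X_true = np.zeros((n, t))
X_true[:n_rel, :] = std[:, None] * rng.standard_normal((n_rel, t))

# per-task Gaussian data and noisy responses b_j = A_j x_j + omega
A_list, b_list = [], []
for j in range(t):
    A_j = rng.standard_normal((m, n))
    b_j = A_j @ X_true[:, j] + 1e-2 * rng.standard_normal(m)
    A_list.append(A_j)
    b_list.append(b_j)
```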

Fig 1. Comparison results of NSGL21, IADM_MFL, and SLEP.

The x-axis represents the number of iterations and the y-axis represents the relative error.

https://doi.org/10.1371/journal.pone.0166169.g001

Fig 2. Comparison results of NSGL21, IADM_MFL, and SLEP.

The x-axis represents the CPU time in seconds and the y-axis represents the relative error.

https://doi.org/10.1371/journal.pone.0166169.g002

From Figs 1 and 2, we see that IADM_MFL and NSGL21 produce faithful results, whereas SLEP does not. While preparing the experiments we tried running SLEP with more iterations, but it could not make further progress. Meanwhile, NSGL21 requires fewer iterations than IADM_MFL to achieve solutions of similar quality. In both plots, the green line lies at the bottom in most cases, which indicates that NSGL21 is superior to the other two solvers.

This simple test alone is not enough to verify that NSGL21 is the winner. To further illustrate the benefits of NSGL21, we examine its behavior for different problem dimensions and different numbers of tasks. The results are listed in Table 1, which reports the number of iterations (Iter), the CPU time in seconds (Time), the relative error (RelErr), and the final objective value (Fun).

Table 1. Comparison results of NSGL21 with IADM_MFL and SLEP.

https://doi.org/10.1371/journal.pone.0166169.t001

From Table 1, we observe that each algorithm requires more computing time as the problem dimension and the number of tasks increase. Meanwhile, the numbers of iterations required by NSGL21 and IADM_MFL increase only slightly in the higher-dimensional cases. We also observe that, for all the tested problems, both NSGL21 and IADM_MFL terminate normally and produce solutions of similar quality, in the sense of comparable relative errors and final function values. However, SLEP cannot generate acceptable solutions even when more iterations are permitted. Hence, we conclude that NSGL21 and IADM_MFL perform better than SLEP. Turning to the comparison between IADM_MFL and NSGL21, we note that, for solutions of similar quality, NSGL21 is faster than IADM_MFL and requires at least 50% fewer iterations. It is therefore reasonable to conclude that NSGL21 is the winner among the compared solvers.

5 Conclusions

In this paper, we have proposed, analyzed, and tested a nonmonotone spectral gradient algorithm for solving the ℓ2,1-norm regularized minimization problem. Problems of this type arise mainly in computer vision, text classification, and biomedical informatics. Due to the nonsmoothness of the regularization term, minimizing the problem is challenging. To the best of our knowledge, SLEP and IADM_MFL are the only available solvers for this problem. However, both solvers transform the problem into an equivalent constrained minimization problem, which is then minimized alternately. The spectral gradient algorithm is known to be very effective for smooth minimization problems, so its performance on ℓ2,1-norm regularized problems is worth investigating; this is the main motivation of our paper. At each iteration, the proposed method minimizes an approximate quadratic model of the objective function to produce a search direction. We showed that the generated direction is a descent direction and that the algorithm converges globally under mild conditions. Additionally, the numerical experiments illustrate that the proposed algorithm is competitive with, and even outperforms, SLEP and IADM_MFL; this is the numerical contribution of our paper. The ℓ2,1-norm regularized minimization problem arises partly in multi-task learning for capturing the features shared across tasks. However, we did not test the algorithm's performance on real data; this is left for future investigation. Finally, we expect that the proposed method and its extensions can be applied to related problems in machine learning.

Acknowledgments

The research of Y. Xiao was supported by the Major State Basic Research Development Program of China (973 Program) (Grant No. 2015CB856003), the National Natural Science Foundation of China (Grant No. 11471101), and the Program for Science and Technology Innovation Talents in Universities of Henan Province (Grant No. 13HASTIT050).

Author Contributions

  1. Conceptualization: YX.
  2. Data curation: QW.
  3. Formal analysis: YX LL.
  4. Methodology: YX LL.
  5. Project administration: YX.
  6. Software: QW.
  7. Supervision: YX.
  8. Validation: YX.
  9. Writing – original draft: LL.
  10. Writing – review & editing: YX.

References

1. Bi J, Xiong X, Yu S, Dundar M, Rao B. An improved multi-task learning approach with applications in medical diagnosis. In: European Conference on Machine Learning, 2008.
2. Zhang J, Ghahramani Z, Yang Y. Flexible latent variable models for multi-task learning. Machine Learning, 73 (2008), 221–242.
3. Zheng Y, Jeon B, Xu D, Wu QMJ, Zhang H. Image segmentation by generalized hierarchical fuzzy C-means algorithm. Journal of Intelligent and Fuzzy Systems, 28 (2015), 961–973.
4. Obozinski G, Taskar B, Jordan MI. Joint covariate selection for grouped classification. Technical report, Statistics Department, UC Berkeley, 2007.
5. Obozinski G, Taskar B, Jordan MI. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20 (2010), 231–252.
6. Fu Z, Wu X, Guan C, Sun X, Ren K. Towards efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Transactions on Information Forensics and Security.
7. Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics, 23 (2007), 2507–2517. pmid:17720704
8. Fu Z, Ren K, Shu J, Sun X, Huang F. Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Transactions on Parallel and Distributed Systems, 2015.
9. Gu B, Sheng VS. A robust regularization path algorithm for ν-support vector classification. IEEE Transactions on Neural Networks and Learning Systems, 2016. pmid:26929067
10. Xia Z, Wang X, Sun X, Wang B. Steganalysis of least significant bit matching using multi-order differences. Security and Communication Networks, 7 (2014), 1283–1291.
11. Chen B, Shu H, Coatrieux G, Chen G, Sun X, Coatrieux JL. Color image analysis by quaternion-type moments. Journal of Mathematical Imaging and Vision, 51 (2015), 124–144.
12. Xia Z, Wang X, Zhang L, Qin Z, Sun X, Ren K. A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing. IEEE Transactions on Information Forensics and Security, 2016.
13. Ando RK, Zhang T. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6 (2005), 1817–1853.
14. Bakker B, Heskes T. Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research, 4 (2003), 83–99.
15. Evgeniou T, Micchelli CA, Pontil M. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6 (2005), 615–637.
16. Argyriou A, Evgeniou T, Pontil M. Convex multi-task feature learning. Machine Learning, 73 (2008), 243–272.
17. Obozinski G, Taskar B, Jordan MI. Multi-task feature selection. Technical Report, UC Berkeley, 2006.
18. Liu J, Ji S, Ye J. Multi-task feature learning via efficient ℓ2,1-norm minimization. In: Conference on Uncertainty in Artificial Intelligence, 2009.
19. Xiao Y, Wu SY, He BS. A proximal alternating direction method for ℓ2,1-norm least squares problem in multi-task feature learning. Journal of Industrial and Management Optimization, 8 (2012), 1057–1069.
20. Deng W, Yin W, Zhang Y. Group sparse optimization by alternating direction method. Technical Report TR11-06, Rice University, 2011. Available at http://www.caam.rice.edu/~zhang/reports/tr1106.pdf.
21. Hu Y, Wei Z, Yuan G. Inexact accelerated proximal gradient algorithms for matrix ℓ2,1-norm minimization problem in multi-task feature learning. Statistics, Optimization & Information Computing, 2 (2014), 352–367.
22. Barzilai J, Borwein JM. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8 (1988), 141–148.
23. Birgin EG, Martínez JM, Raydan M. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10 (2000), 1196–1211.
24. Raydan M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis, 13 (1993), 321–326.
25. Raydan M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7 (1997), 26–33.
26. Cheng W, Li DH. A derivative-free nonmonotone line search and its application to the spectral residual method. IMA Journal of Numerical Analysis, 29 (2009), 814–825.
27. Xiao Y, Wu SY, Qi L. Nonmonotone Barzilai-Borwein gradient algorithm for ℓ1-regularized nonsmooth minimization in compressive sensing. Journal of Scientific Computing, 61 (2014), 17–41.
28. Grippo L, Lampariello F, Lucidi S. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23 (1986), 707–716.