The degrees of freedom of the Lasso in underdetermined linear regression models


1 The degrees of freedom of the Lasso in underdetermined linear regression models
C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4)
(1) IMB, Université Bordeaux 1; (2) GREYC, ENSICAEN; (3) CNRS and Ceremade, Université Paris-Dauphine; (4) LMNO, Université de Caen
SPARS 11, June 2011, Edinburgh


2 The contents
1. Motivations
2. Some characteristics of the Lasso
3. The overdetermined case
4. Main results
5. References


4 Motivations
Statistical framework: the underdetermined (or high-dimensional) linear regression model
$y = A x_0 + \varepsilon = \theta_0 + \varepsilon$, (1)
where $y \in \mathbb{R}^n$, $A = (a_j)_{j \in \{1,\dots,p\}}$ with $a_j \in \mathbb{R}^n$ and $n < p$, $x_0 \in \mathbb{R}^p$, and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$.
Applications: genomics, econometrics, signal processing, tomography, ...


5 Motivations
Strict sparsity assumption: $x_0$ has few non-zero components.
Weak sparsity assumption: $x_0$ can be approximated to good quality by a vector with few non-zero components.
Let $\hat{x} = f(y)$ be an estimator of $x_0$ and $\hat{\mu} = A\hat{x}$. Since $y \sim \mathcal{N}(A x_0, \sigma^2 I_n)$, according to Efron [2], the degrees of freedom $df$ of $\hat{\mu}$ is defined by
$df(\hat{\mu}) = \sum_{i=1}^{n} \frac{\mathrm{cov}(\hat{\mu}_i, y_i)}{\sigma^2}$. (2)
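Definition (2) can be checked by simulation. The sketch below is only an illustration under assumptions that are not in the slides: the estimator is a least-squares projection onto the first `k` columns of a random `A` (a linear estimator whose degrees of freedom equals `k`), and the names `mc_degrees_of_freedom` and `fit` are hypothetical.

```python
import numpy as np

def mc_degrees_of_freedom(fit, theta0, sigma, n_rep=20000, seed=0):
    """Monte Carlo estimate of df = sum_i cov(mu_hat_i, y_i) / sigma^2."""
    rng = np.random.default_rng(seed)
    n = theta0.shape[0]
    y = theta0 + sigma * rng.standard_normal((n_rep, n))   # one draw of y per row
    mu_hat = np.apply_along_axis(fit, 1, y)                # fitted values for each draw
    # Empirical covariance between mu_hat_i and y_i, coordinate by coordinate
    cov = ((mu_hat - mu_hat.mean(axis=0)) * (y - y.mean(axis=0))).sum(axis=0) / (n_rep - 1)
    return cov.sum() / sigma**2

rng = np.random.default_rng(1)
n, p, k = 8, 12, 3                      # underdetermined design: n < p (assumed sizes)
A = rng.standard_normal((n, p))
x0 = np.zeros(p)
x0[:k] = 1.0
A_k = A[:, :k]
P = A_k @ np.linalg.pinv(A_k)           # orthogonal projector onto span(a_1, ..., a_k)
df_hat = mc_degrees_of_freedom(lambda y: P @ y, A @ x0, sigma=1.0)
print(df_hat)                           # should be close to k for this linear estimator
```

For a nonlinear estimator such as the Lasso, the same simulation applies once `fit` is replaced by a Lasso solver, at a much higher computational cost than the closed-form results discussed later.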


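7 Motivations
Stein's lemma [3]: Let $\hat{x}$ be an estimator of $x_0$. Suppose that $\hat{\mu} = A\hat{x}$ is almost differentiable. Then we have
$df(\hat{\mu}) = E\bigl[\hat{df}(\hat{\mu})\bigr]$, where $\hat{df}(\hat{\mu}) = \mathrm{div}(\hat{\mu}) = \sum_{i=1}^{n} \frac{\partial \hat{\mu}_i}{\partial y_i}$. (3)
We define
$\mathrm{MSE}(\hat{\mu}) = E\|\hat{\mu} - \theta_0\|_2^2 = E\|A(\hat{x} - x_0)\|_2^2$, (4)
$\mathrm{SURE}(\hat{\mu}) = -n\sigma^2 + \|\hat{\mu} - y\|_2^2 + 2\sigma^2\, \hat{df}(\hat{\mu})$. (5)
So, according to Stein's lemma, we have
$\mathrm{MSE}(\hat{\mu}) = E[\mathrm{SURE}(\hat{\mu})]$. (6)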
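The unbiasedness of SURE can be illustrated numerically in the simplest setting. The sketch below assumes $A = I_n$ (not the underdetermined case of the talk), for which the Lasso fit is soft thresholding of $y$ and the divergence is the number of non-zero fitted values. The function names and the values of `n`, `sigma`, `lam` are illustrative choices only.

```python
import numpy as np

def soft_threshold(y, lam):
    """Coordinate-wise soft thresholding: the Lasso fit when A is the identity."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, mu_hat, df_hat, sigma):
    """SURE = -n*sigma^2 + ||mu_hat - y||^2 + 2*sigma^2*df_hat, evaluated per row."""
    n = y.shape[-1]
    return -n * sigma**2 + np.sum((mu_hat - y) ** 2, axis=-1) + 2 * sigma**2 * df_hat

rng = np.random.default_rng(0)
n, sigma, lam, n_rep = 50, 1.0, 1.5, 20000
theta0 = np.zeros(n)
theta0[:5] = 4.0                                      # a sparse mean vector (assumed)
y = theta0 + sigma * rng.standard_normal((n_rep, n))  # one experiment per row
mu_hat = soft_threshold(y, lam)
df_hat = np.count_nonzero(mu_hat, axis=1)             # divergence = size of the support
mse = np.mean(np.sum((mu_hat - theta0) ** 2, axis=1)) # Monte Carlo estimate of (4)
mean_sure = np.mean(sure(y, mu_hat, df_hat, sigma))   # Monte Carlo estimate of E[SURE]
print(mse, mean_sure)                                 # the two averages should nearly agree
```

Note that SURE is computed from $y$ alone, whereas the MSE needs the unknown $\theta_0$; this is what makes an explicit formula for $\hat{df}$ useful in practice.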


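9 Some characteristics of the Lasso
The Lasso estimate [4] amounts to solving
$P_1(y, \lambda): \quad \min_{x \in \mathbb{R}^p} \frac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1$. (7)
Depending on $\lambda$, some coefficients are set exactly equal to zero. The estimator can be computed for very large $p$.
Let $\tilde{x} = (\tilde{x}_j) \in \mathbb{R}^p$. The support of $\tilde{x}$ and its cardinality are defined by
$I(\tilde{x}) = \mathrm{supp}(\tilde{x}) = \{j : \tilde{x}_j \neq 0\}$, $|I(\tilde{x})| = \mathrm{card}(I(\tilde{x}))$.
We write $\tilde{x}_{I(\tilde{x})} = (\tilde{x}_i)_{i \in I(\tilde{x})}$, $A_{I(\tilde{x})} = (a_i)_{i \in I(\tilde{x})}$ and $\mathrm{sign}(\tilde{x}_{I(\tilde{x})}) = (\mathrm{sign}(\tilde{x}_i))_{i \in I(\tilde{x})} \in \{-1, 1\}^{|I(\tilde{x})|}$.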
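A minimal way to compute a solution of problem (7) is proximal gradient descent (ISTA). This is a generic textbook scheme chosen here for illustration, not a method named in the slides. The helper names, the fixed iteration count and the test problem are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=5000):
    """ISTA iterations for min_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, p = 20, 50                                # underdetermined: n < p
A = rng.standard_normal((n, p))
x0 = np.zeros(p)
x0[:3] = [2.0, -1.5, 1.0]                    # a sparse ground truth (assumed)
y = A @ x0 + 0.1 * rng.standard_normal(n)
lam = 1.0
x_hat = lasso_ista(A, y, lam)
support = np.flatnonzero(x_hat)              # indices of the non-zero coefficients
print(support)
```

Soft thresholding produces exact zeros, which is how the sparsity mentioned above shows up in the iterates.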

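10 Some characteristics of the Lasso
$\tilde{x}$ is a solution of $P_1(y, \lambda)$ if and only if
1. $|\langle a_k, y - A\tilde{x}\rangle| = \lambda$ for all $k \in I(\tilde{x})$,
2. $|\langle a_j, y - A\tilde{x}\rangle| \le \lambda$ for all $j \notin I(\tilde{x})$.
(In condition 1 the inner product carries the sign of $\tilde{x}_k$, which is where the sign vector in (8) comes from.)
$\tilde{x}$ is the unique minimizer of $P_1(y, \lambda)$ if
1. $|\langle a_k, y - A\tilde{x}\rangle| = \lambda$ for all $k \in I(\tilde{x})$,
2. $|\langle a_j, y - A\tilde{x}\rangle| < \lambda$ for all $j \notin I(\tilde{x})$,
3. $A_{I(\tilde{x})}$ is full rank.
In that case, we have
$\tilde{x}_{I(\tilde{x})} = \bigl(A_{I(\tilde{x})}^t A_{I(\tilde{x})}\bigr)^{-1} A_{I(\tilde{x})}^t y - \lambda \bigl(A_{I(\tilde{x})}^t A_{I(\tilde{x})}\bigr)^{-1} \mathrm{sign}(\tilde{x}_{I(\tilde{x})})$. (8)
Furthermore, there exists a finite sequence of values of $\lambda$, called the transition points, such that between two consecutive transition points the support and the sign of the optimum (8) are constant with respect to $\lambda$.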
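The optimality conditions and formula (8) can be checked numerically on a candidate vector. The sketch below uses hypothetical helper names and, for simplicity, an identity design (an assumption made only so that the solution is known in closed form as soft thresholding of `y`).

```python
import numpy as np

def is_lasso_solution(A, y, x, lam, tol=1e-8):
    """First-order conditions: correlation equals lam*sign(x_k) on the support,
    and is at most lam in absolute value off the support."""
    corr = A.T @ (y - A @ x)
    on = x != 0
    ok_on = np.allclose(corr[on], lam * np.sign(x[on]), atol=tol)
    ok_off = np.all(np.abs(corr[~on]) <= lam + tol)
    return bool(ok_on and ok_off)

def restricted_solution(A, y, support, signs, lam):
    """Right-hand side of (8) for a given support and sign vector."""
    A_I = A[:, support]
    G_inv = np.linalg.inv(A_I.T @ A_I)       # requires A_I to be full rank
    return G_inv @ (A_I.T @ y) - lam * G_inv @ signs

# Identity design: the Lasso solution is soft thresholding of y.
A = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
lam = 1.0
x = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
support = np.flatnonzero(x)
x_I = restricted_solution(A, y, support, np.sign(x[support]), lam)
print(is_lasso_solution(A, y, x, lam), x_I)
```

The tolerance `tol` is a floating-point allowance, not part of the mathematical statement.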


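12 The overdetermined case
Consider the overdetermined linear regression model $y = A x_0 + \varepsilon$, where $y \in \mathbb{R}^n$, $x_0 \in \mathbb{R}^p$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$, and the columns $(a_j)_{j \in \{1,\dots,p\}}$ are linearly independent, that is, $\mathrm{rank}(A) = p$.
Here, the Lasso problem $P_1(y, \lambda)$ has a unique solution $\tilde{x}_\lambda$.
Let $K_\lambda \subset \mathbb{R}^n$ be such that, for all $z \in K_\lambda$, $\lambda$ is not a transition point.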


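13 The overdetermined case
Theorem (Zou et al. [5]): Let $\lambda > 0$ and $y \in K_\lambda$. Let $\tilde{\mu}_\lambda = A\tilde{x}_\lambda(y)$ be the Lasso response vector. Then we have
$df(\tilde{\mu}_\lambda) = E\bigl[|\tilde{I}|\bigr]$, (9)
where $\tilde{I}$ is the support of $\tilde{x}_\lambda$.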
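A case where (9) can be verified by hand is an orthonormal design, $A^t A = I_p$. This assumption is made here for illustration only and is not required by the theorem. The Lasso solution is then coordinate-wise soft thresholding, and for $y$ with no $|\langle a_j, y\rangle|$ exactly equal to $\lambda$ the divergence counts the active coordinates:

```latex
\tilde{x}_{\lambda,j}(y) = \operatorname{sign}\bigl(\langle a_j, y\rangle\bigr)\,
  \bigl(|\langle a_j, y\rangle| - \lambda\bigr)_+ ,
\qquad
\tilde{\mu}_\lambda(y) = \sum_{j=1}^{p} \tilde{x}_{\lambda,j}(y)\, a_j ,
\\[4pt]
\operatorname{div}\bigl(\tilde{\mu}_\lambda\bigr)(y)
  = \sum_{j=1}^{p} \mathbf{1}\{|\langle a_j, y\rangle| > \lambda\}\,\|a_j\|_2^2
  = \sum_{j=1}^{p} \mathbf{1}\{\tilde{x}_{\lambda,j}(y) \neq 0\}
  = |\tilde{I}| .
```

Taking expectations and applying Stein's lemma (3) gives $df(\tilde{\mu}_\lambda) = E[|\tilde{I}|]$ in this special case.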


15 Main results
Goal: extend the previous result to the underdetermined case, where the Lasso solution need not be unique.
Remark: if $x_1$ and $x_2$ are solutions of $P_1(y, \lambda)$, then $A x_1 = A x_2$.
Let $\hat{\mu}_\lambda(y)$ denote this common image $Ax$ of the solutions $x$ of $P_1(y, \lambda)$.
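The remark can be seen on a toy design with two identical columns. The matrix, data and candidate vectors below are made up for illustration. Each candidate satisfies the first-order optimality conditions of the Lasso, so all three are minimizers, yet they share the same image under `A`.

```python
import numpy as np

def lasso_objective(A, y, x, lam):
    """Value of 0.5*||y - A x||_2^2 + lam*||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# n = 2 observations, p = 3 columns; columns 0 and 1 are identical.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
y = np.array([3.0, 0.5])
lam = 1.0

# Any split of total mass 2 between the two identical columns is optimal.
solutions = [np.array([2.0, 0.0, 0.0]),
             np.array([0.0, 2.0, 0.0]),
             np.array([0.5, 1.5, 0.0])]
objectives = [lasso_objective(A, y, x, lam) for x in solutions]
images = [A @ x for x in solutions]
print(objectives)   # identical objective values
print(images)       # identical fitted vectors A x
```

The supports differ in size (one or two non-zero entries), which is why the main theorem singles out solutions whose active columns are linearly independent.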


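19 Main results
Notations: Let $I \subset \{1, 2, \dots, p\}$ be such that $A_I = (a_i)_{i \in I}$ is full rank, $A_I^+ = (A_I^t A_I)^{-1} A_I^t$ the pseudo-inverse of $A_I$, $V_I = \mathrm{span}(a_i)_{i \in I}$, $P_{V_I} = A_I A_I^+$, $P_{V_I^\perp} = I_{n \times n} - P_{V_I}$, and $S \in \{-1, 1\}^{|I|}$.
Let $j \in \{1, 2, \dots, p\}$ and fix $\lambda > 0$. We define
$H_{I,j,S} = \{u \in \mathbb{R}^n : \langle P_{V_I^\perp}(a_j), u\rangle = \pm\lambda\,(1 - \langle a_j, (A_I^+)^t S\rangle)\}$, (10)
$\Omega = \{(I, j, S) : a_j \notin V_I\}$, (11)
and
$G_\lambda = \mathbb{R}^n \setminus \bigcup_{(I,j,S) \in \Omega} H_{I,j,S}$. (12)
Remark: $K_\lambda \subset G_\lambda$.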
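The objects in these notations are straightforward to compute. The sketch below uses hypothetical function names and a small made-up matrix. It builds the two projectors and measures how far a vector `u` is from satisfying equation (10), a gap of zero meaning that `u` lies in $H_{I,j,S}$.

```python
import numpy as np

def projectors(A, I):
    """Return (P_V, P_perp) for V = span of the columns of A indexed by I."""
    A_I = A[:, I]
    P_V = A_I @ np.linalg.pinv(A_I)          # A_I A_I^+
    return P_V, np.eye(A.shape[0]) - P_V

def hyperplane_gap(A, I, j, S, lam, u):
    """Smallest |<P_perp a_j, u> - c| over c = +/- lam*(1 - <a_j, (A_I^+)^t S>)."""
    _, P_perp = projectors(A, I)
    A_I_pinv = np.linalg.pinv(A[:, I])
    offset = lam * (1.0 - A[:, j] @ (A_I_pinv.T @ S))
    lhs = (P_perp @ A[:, j]) @ u
    return min(abs(lhs - offset), abs(lhs + offset))

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])              # n = 2, p = 3 (toy example)
I, j, S, lam = [0], 1, np.array([1.0]), 1.0  # here a_j is not in V_I
P_V, P_perp = projectors(A, I)
gap_on = hyperplane_gap(A, I, j, S, lam, np.array([5.0, 1.0]))   # u on the hyperplane
gap_off = hyperplane_gap(A, I, j, S, lam, np.array([5.0, 0.3]))  # u off the hyperplane
print(gap_on, gap_off)
```

Since $\Omega$ is finite and each $H_{I,j,S}$ with $a_j \notin V_I$ is a union of two hyperplanes, the complement of $G_\lambda$ has Lebesgue measure zero.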

20 Main results
Theorem: Let $\lambda > 0$, $y \in G_\lambda$, and let $M_{y,\lambda}$ be the set of all solutions of $P_1(y, \lambda)$. Let $x_\lambda$ be a solution of $P_1(y, \lambda)$ with support $I$ such that the associated matrix $A_I$ is full rank. Then
$|I| = \min_{x \in M_{y,\lambda}} |\mathrm{supp}(x)|$. (13)
Furthermore, there exists $\varepsilon > 0$ such that for all $z \in B(y, \varepsilon)$, we have
$\hat{\mu}_\lambda(z) = \hat{\mu}_\lambda(y) + P_{V_I}(z - y)$. (14)


21 Main results
Corollary 1: Let $\lambda > 0$ and $y \in G_\lambda$. Then we have
$\mathrm{div}(\hat{\mu}_\lambda(y)) = |I|$. (15)
Moreover,
$df = E(|I|)$. (16)
Corollary 2: Let $\lambda > 0$ and $y \in G_\lambda$. Suppose that $P_1(y, \lambda)$ has a unique solution $\hat{x}_\lambda$, and let $\hat{I}$ be its support. Then we have
$df = E(|\hat{I}|)$. (17)
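The step from the local affine expansion (14) to the divergence formula (15) is short and can be spelled out as follows, writing $\mathrm{Id}_{|I|}$ for the identity matrix of size $|I|$ (a notation introduced here to avoid a clash with the index set $I$):

```latex
\frac{\partial \hat{\mu}_\lambda}{\partial y}(y) = P_{V_I}
\quad\Longrightarrow\quad
\operatorname{div}\bigl(\hat{\mu}_\lambda\bigr)(y)
  = \operatorname{tr}\bigl(P_{V_I}\bigr)
  = \operatorname{tr}\bigl(A_I A_I^{+}\bigr)
  = \operatorname{tr}\bigl(A_I^{+} A_I\bigr)
  = \operatorname{tr}\bigl(\mathrm{Id}_{|I|}\bigr)
  = |I| .
```

The identity $A_I^+ A_I = \mathrm{Id}_{|I|}$ uses the full-rank assumption on $A_I$. Formula (16) then follows from Stein's lemma (3), since the complement of $G_\lambda$ is a finite union of hyperplanes and so has measure zero.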

22 Main results
Condition (UC), Dossal [1]: A matrix $A$ satisfies condition (UC) if, for all subsets $I \subset \{1, \dots, p\}$ with $|I| \le n$ such that $(a_i)_{i \in I}$ are linearly independent, for all indices $j \notin I$ and all vectors $S \in \{-1, 1\}^{|I|}$,
$|\langle a_j, (A_I^+)^t S\rangle| \neq 1$. (18)
Consequences: Suppose that $A$ satisfies condition (UC). Then, for every $y \in G_\lambda$, $P_1(y, \lambda)$ has a unique solution $\hat{x}_\lambda$. Therefore,
$df = E(|\hat{I}|)$, where $\hat{I}$ is the support of $\hat{x}_\lambda$.
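For very small matrices, condition (UC) can be tested by brute-force enumeration of all triples $(I, j, S)$. The sketch below is only an illustration: the function name `satisfies_uc` and the numerical tolerance are assumptions, and the cost grows exponentially with $p$, so it is not a practical test for realistic sizes.

```python
import itertools
import numpy as np

def satisfies_uc(A, tol=1e-10):
    """Brute-force check of |<a_j, (A_I^+)^t S>| != 1 over all admissible (I, j, S)."""
    n, p = A.shape
    for size in range(1, min(n, p) + 1):
        for I in itertools.combinations(range(p), size):
            A_I = A[:, list(I)]
            if np.linalg.matrix_rank(A_I) < size:
                continue                      # columns not linearly independent: skip
            d = np.linalg.pinv(A_I).T         # (A_I^+)^t, shape (n, size)
            for j in set(range(p)) - set(I):
                for S in itertools.product([-1.0, 1.0], repeat=size):
                    value = A[:, j] @ (d @ np.array(S))
                    if abs(abs(value) - 1.0) < tol:
                        return False          # equality case: (UC) is violated
    return True

uc_identity = satisfies_uc(np.eye(2))
# Two identical columns give <a_j, (A_I^+)^t S> = S = +/-1, so (UC) fails.
uc_duplicate = satisfies_uc(np.array([[1.0, 1.0, 0.0],
                                      [0.0, 0.0, 1.0]]))
print(uc_identity, uc_duplicate)
```

The duplicated-column matrix is the same kind of design for which the Lasso solution fails to be unique, which matches the role of (UC) as a uniqueness condition.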


24 References
[1] Dossal, C. (2007). A necessary and sufficient condition for exact recovery by l1 minimization. Technical report, HAL :1.
[2] Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc., vol. 81.
[3] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist.
[4] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58(1).
[5] Zou, H., Hastie, T. and Tibshirani, R. (2007). On the "degrees of freedom" of the Lasso. Ann. Statist., vol. 35.
