On the degrees of freedom in shrinkage estimation

Kengo Kato
Graduate School of Economics, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan
October 2007

Abstract

We study the degrees of freedom in shrinkage estimation of the regression coefficients. Generalizing the idea of the Lasso, we consider the problem of estimating the coefficients by the projection of the ordinary least squares estimator onto a closed convex set. An unbiased estimator of the degrees of freedom is then derived in terms of geometric quantities under a smoothness condition on the boundary of the closed convex set. The result presented in this paper is applicable to estimation with a wide class of constraints. As an application, we obtain a $C_p$-type criterion and AIC for selecting the tuning parameter.

Keywords: AIC, degrees of freedom, fused Lasso, group Lasso, Lasso, Mallows' $C_p$, second fundamental form, shrinkage estimation, Stein's lemma, tubal coordinates.

Running title: Degrees of freedom in shrinkage estimation

1 Introduction

In recent years, much attention has been paid to shrinkage methods for estimating the coefficients of a linear model. Compared with ordinary least squares (OLS), shrinkage methods often improve the prediction accuracy. In addition, if the constraint region towards which the estimator is shrunk has edges or corners, some coefficients can be set to exactly zero. To be precise, suppose $y = (y_1, \dots, y_n)'$ is the response vector and $x_j = (x_{1j}, \dots, x_{nj})'$, $j = 1, \dots, p$, are $p$ linearly independent predictors. Let $X = [x_1 \cdots x_p]$ be the design matrix. We consider the linear model
$$ y = X\beta + \epsilon, \qquad (1.1) $$
where $\beta = (\beta_1, \dots, \beta_p)'$ is the coefficient vector and $\epsilon \sim N_n(0, \sigma^2 I_n)$. Without loss of generality, we assume that the predictors are centered so that the intercept is not included in the above linear model.

A canonical example of shrinkage methods is the Lasso (Tibshirani [10]). Let $\|\cdot\|_2$ be the ordinary Euclidean norm: $\|z\|_2 = (z_1^2 + \cdots + z_n^2)^{1/2}$ for $z \in \mathbb{R}^n$. The Lasso estimate is defined as the solution of the following problem:
$$ \min_\beta \|y - X\beta\|_2^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t, \qquad (1.2) $$
or equivalently
$$ \min_\beta \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^p |\beta_j|, \qquad (1.3) $$
where $t$ and $\lambda$ are non-negative tuning parameters. The Lasso shrinks the coefficients towards zero as $t$ decreases (or $\lambda$ increases). An important feature of the Lasso is that, depending on the tuning parameter, some coefficients are set exactly equal to zero. It should be noted that although (1.2) and (1.3) are equivalent as minimization problems, the solutions of these two problems are different as estimators, since the correspondence between $t$ and $\lambda$ generally depends on the data.

As explained in Efron [2], the degrees of freedom plays an important role in selecting the optimal tuning parameter. The degrees of freedom reflects the model complexity controlled by the shrinkage and corresponds to the penalty term of model selection criteria such as Mallows' $C_p$ (Mallows [4]) and Akaike's information criterion (AIC, Akaike [1]). Recently, Zou et al. [15] showed that, with parametrization (1.3), the number of non-zero coefficients is an unbiased estimator of the degrees of freedom of the Lasso. Their derivation, however, requires the local explicit form of the Lasso estimator and cannot be applied to estimation with a more general restriction.

The Lasso can be viewed as the projection of the OLS estimator onto the diamond-shaped region. For $u, v \in \mathbb{R}^p$, we denote
$$ \langle u, v \rangle = u' V v, \qquad (1.4) $$
where $V = X'X$, and let $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$. Then the Lasso problem (1.2) is rewritten as
$$ \min_\beta \|\hat\beta - \beta\| \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t, \qquad (1.5) $$
where $\hat\beta$ is the OLS estimator of $\beta$. The natural generalization of the minimization problem (1.5) is
$$ \min_\beta \|\hat\beta - \beta\| \quad \text{subject to} \quad \beta \in K, \qquad (1.6) $$
with a closed convex set $K \subset \mathbb{R}^p$. The solution $\hat\beta_K$ to the problem (1.6) is given by the projection of $\hat\beta$ onto $K$. Since $K$ is closed and convex, $\hat\beta_K$ is uniquely defined. The problem of selecting the optimal tuning parameter is viewed as the problem of selecting the optimal constraint region $K$ among a given collection of closed convex sets. The class of estimation methods considered here includes the Lasso, the fused Lasso (Tibshirani et al. [11]), and the group Lasso (Yuan and Lin [12]).
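As a minimal numerical sketch (not part of the paper), the projection in (1.6) can be computed for the Lasso constraint region with a generic constrained optimizer; the routine below uses NumPy and SciPy, and the function name and data are purely illustrative.

```python
# Projection of the OLS estimator onto K = {beta : sum_j |beta_j| <= t} in the
# <.,.> inner product of (1.4), i.e. problem (1.5)/(1.6).  Illustrative sketch only.
import numpy as np
from scipy.optimize import minimize

def project_onto_K(beta_ols, V, t):
    """Solve min_beta (beta_ols - beta)' V (beta_ols - beta) s.t. sum_j |beta_j| <= t."""
    p = beta_ols.size
    objective = lambda b: (beta_ols - b) @ V @ (beta_ols - b)
    constraint = {"type": "ineq", "fun": lambda b: t - np.abs(b).sum()}
    return minimize(objective, np.zeros(p), method="SLSQP", constraints=[constraint]).x

rng = np.random.default_rng(0)
n, p, t = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.standard_normal(n)

V = X.T @ X
beta_ols = np.linalg.solve(V, X.T @ y)      # the OLS estimator beta-hat
beta_K = project_onto_K(beta_ols, V, t)     # the Lasso estimate beta-hat_K for this t
print(np.round(beta_ols, 3), np.round(beta_K, 3))
```

Since $\|\hat\beta - \beta\|^2$ and $\|y - X\beta\|_2^2$ differ only by a term not involving $\beta$, minimizing either objective under the same constraint gives the same $\hat\beta_K$.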

Here we present illustrative examples of the constraint regions of the Lasso and the group Lasso. The left panel of the figure below corresponds to the Lasso constraint $|\beta_1| + |\beta_2| + |\beta_3| \le 1$. The right panel corresponds to the group Lasso constraint $(\beta_1^2 + \beta_2^2)^{1/2} + |\beta_3| \le 1$.

Fig. The constraint regions of the Lasso (left) and the group Lasso (right).

In this paper, we study the degrees of freedom of the fit $\hat\mu_K = X\hat\beta_K$. From Stein's lemma (Stein [8]), an unbiased estimator of the degrees of freedom is given by the divergence of $\hat\mu_K$ with respect to $y$, which coincides with the divergence of $\hat\beta_K$ with respect to $\hat\beta$. However, in general, the estimator $\hat\beta_K$ cannot be expressed in an explicit form. Thus it is often impossible to calculate the divergence directly. To overcome this difficulty, we use the idea of the tubal coordinates (Weyl [14]). By an approach similar to that of Kuriki and Takemura [3], we derive the divergence of the projection onto $K$ in terms of geometric quantities under a regularity condition on the boundary $\partial K$ of $K$. Hence we obtain an unbiased estimator of the degrees of freedom of $\hat\mu_K$. As an application, a $C_p$-type statistic and AIC for $\hat\mu_K$ are also derived.

The organization of this paper is as follows. In Section 2, we briefly review Stein's unbiased risk theory. In Section 3, we first prepare the notation for the geometry of a piecewise smooth boundary of a closed convex set and derive a divergence formula for the projection onto $K$ by the differential geometric approach. An unbiased estimator of the degrees of freedom of $\hat\mu_K$ is provided in Section 3.2. The result presented in this paper seems to be fairly general. In Section 4, we apply our method to obtain unbiased estimators of the degrees of freedom for the Lasso and its variants. Section 5 is devoted to some concluding remarks.

2 Unbiased estimation of the prediction risk

In this section, following Efron [2], we first introduce Stein's unbiased risk estimation theory. The precise definition of the degrees of freedom is given. Then we explain the strategy for deriving an unbiased estimator of the degrees of freedom for the estimator defined by the solution to the minimization problem (1.6).

Given a fit $\hat\mu = \hat\mu(y) = X\hat\beta$, where $\hat\beta$ is an estimator of $\beta$, we focus on the accuracy of $\hat\mu$ in predicting future data. Suppose $y^{new}$ is a new response vector generated from the same distribution as $y$. We shall consider estimating the prediction risk
$$ E\|y^{new} - \hat\mu\|_2^2 / n. \qquad (2.1) $$

Define $\mu = X\beta$. Partitioning $(y_i^{new} - \hat\mu_i)^2$ as
$$ (y_i^{new} - \hat\mu_i)^2 = (y_i^{new} - \mu_i)^2 + 2(y_i^{new} - \mu_i)(\mu_i - \hat\mu_i) + (\mu_i - \hat\mu_i)^2, $$
using
$$ (\mu_i - \hat\mu_i)^2 = (y_i - \hat\mu_i)^2 - (y_i - \mu_i)^2 + 2(y_i - \mu_i)(\hat\mu_i - \mu_i), $$
and substituting into (2.1), we obtain
$$ (y_i^{new} - \hat\mu_i)^2 = (y_i - \hat\mu_i)^2 + 2(y_i - \mu_i)(\hat\mu_i - \mu_i) + (y_i^{new} - \mu_i)^2 - (y_i - \mu_i)^2 + 2(y_i^{new} - \mu_i)(\mu_i - \hat\mu_i). \qquad (2.2) $$
Taking expectations of both sides of equation (2.2), we obtain the decomposition
$$ E\|y^{new} - \hat\mu\|_2^2 = E\|y - \hat\mu\|_2^2 + 2\,\mathrm{df}(\hat\mu)\,\sigma^2, $$
where
$$ \mathrm{df}(\hat\mu) = \sum_{i=1}^n \mathrm{cov}(\hat\mu_i, y_i)/\sigma^2 $$
is called the degrees of freedom of the fit $\hat\mu$. When $\hat\mu$ is given by a linear function of $y$, i.e., $\hat\mu = Sy$ with some matrix $S$ independent of $y$, the degrees of freedom is $\mathrm{df}(\hat\mu) = \mathrm{tr}\, S$, which is a known constant. However, in general it is necessary to estimate $\mathrm{df}(\hat\mu)$. We employ Stein's lemma to accomplish this task.

Lemma 2.1 (Stein's lemma). Suppose $\hat\mu_i : \mathbb{R}^n \to \mathbb{R}$ is absolutely continuous in the $i$-th coordinate for $i = 1, \dots, n$. If $E|\partial\hat\mu_i/\partial y_i| < \infty$ for each $i$, then
$$ \sum_{i=1}^n \mathrm{cov}(\hat\mu_i, y_i)/\sigma^2 = E(\mathrm{div}\,\hat\mu), $$
where $\mathrm{div}\,\hat\mu = \sum_{i=1}^n \partial\hat\mu_i/\partial y_i$.

Therefore an unbiased estimator of the degrees of freedom is given by
$$ \widehat{\mathrm{df}}(\hat\mu) = \mathrm{div}\,\hat\mu, \qquad (2.4) $$
and we can define a $C_p$-type criterion by
$$ C_p(\hat\mu) = \frac{\|y - \hat\mu\|_2^2}{n} + \frac{2\,\widehat{\mathrm{df}}(\hat\mu)\,\sigma^2}{n}, $$
which is an unbiased estimator of the prediction risk.

Let $\hat\beta_K$ be the estimator defined as the solution to the problem (1.6) with a closed convex set $K$. We verify the absolute continuity of $\hat\mu_K$, where $\hat\mu_K = X\hat\beta_K$.

Lemma 2.2. For every $i$, $\hat\mu_{K,i}$ is absolutely continuous in each coordinate and $\partial\hat\mu_{K,i}/\partial y = (\partial\hat\mu_{K,i}/\partial y_1, \dots, \partial\hat\mu_{K,i}/\partial y_n)$ is essentially bounded.

Proof. Since $\hat\beta_K$ is the projection of $\hat\beta$ onto $K$, $\hat\beta_K$ is Lipschitz continuous in $\hat\beta$ (see Webster [13]). Therefore $\hat\mu_K$ is shown to be Lipschitz continuous in $y$, and so is each $\hat\mu_{K,i}$. The absolute continuity and the essential boundedness follow directly from the Lipschitz continuity.

Note that if $\hat\beta_K$ is differentiable in $\hat\beta$, the divergence $\mathrm{div}\,\hat\mu_K$ is the same as the divergence of $\hat\beta_K$ with respect to $\hat\beta$. This can be verified by the chain rule:
$$ \mathrm{div}\,\hat\mu_K = \mathrm{tr}\left( X\, \frac{\partial\hat\beta_K}{\partial\hat\beta'}\, \frac{\partial\hat\beta}{\partial y'} \right) = \mathrm{tr}\left( X\, \frac{\partial\hat\beta_K}{\partial\hat\beta'}\, (X'X)^{-1} X' \right) = \mathrm{tr}\left( \frac{\partial\hat\beta_K}{\partial\hat\beta'} \right), $$
where $\partial\hat\beta_K/\partial\hat\beta'$ is the matrix whose $(i,k)$-th component is $\partial\hat\beta_{K,i}/\partial\hat\beta_k$ and $\partial\hat\beta/\partial y'$ is the matrix whose $(k,j)$-th component is $\partial\hat\beta_k/\partial y_j$. Therefore we only need to calculate the divergence of $\hat\beta_K$ with respect to $\hat\beta$ in order to derive an unbiased estimator of the degrees of freedom $\mathrm{df}(\hat\mu_K)$.

For the normal linear model (1.1), $\hat\beta$ is a complete sufficient statistic for $\beta$ when $\sigma^2$ is known, and $(\hat\beta, y'y)$ is a complete sufficient statistic for $(\beta, \sigma^2)$ when $\sigma^2$ is unknown. In either case, $\widehat{\mathrm{df}}(\hat\mu_K) = \mathrm{tr}(\partial\hat\beta_K/\partial\hat\beta')$ is shown to be the unique uniformly minimum variance unbiased estimator of the degrees of freedom $\mathrm{df}(\hat\mu_K)$, since $\widehat{\mathrm{df}}(\hat\mu_K)$ is a function of $\hat\beta$. Thus, in terms of estimating the degrees of freedom, the analytical estimator $\widehat{\mathrm{df}}(\hat\mu_K)$ is more efficient than cross-validation and related nonparametric methods.

3 Main results

In this section, we first derive a divergence formula for the projection onto $K$ under a smoothness condition on the boundary $\partial K$. As noted in the previous section, this enables us to obtain an unbiased estimator of the degrees of freedom for the shrinkage estimator projected onto $K$. The result presented here is an extension of that of Meyer and Woodroofe [5], which treats the case where $K$ is a convex polyhedral cone.

3.1 Divergence formula

Let $K \subset \mathbb{R}^p$ be a closed convex set. For $x \in \mathbb{R}^p$, $x_K$ denotes the orthogonal projection of $x$ onto $K$ in terms of $\langle\cdot,\cdot\rangle$:
$$ \|x - x_K\| = \min_{z \in K} \|x - z\|. $$
Recall that the inner product $\langle\cdot,\cdot\rangle$ is defined by (1.4). Since $K$ is closed and convex, $x_K$ is uniquely defined. Our main aim is to evaluate the divergence of the projection onto $K$ defined as
$$ f(x) = (f_1(x), \dots, f_p(x))' = x_K. $$
Note that $f$ is Lipschitz continuous (see the proof of Lemma 2.2).
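Although $\hat\beta_K$ rarely has a closed form, the divergence $\mathrm{tr}(\partial\hat\beta_K/\partial\hat\beta')$ can always be approximated numerically. The following sketch (not from the paper; the setting and names are illustrative) does this by finite differences for one case where the projection is explicit, the Euclidean ball $K = \{\|\beta\|_2 \le t\}$ with $V = I_p$, and compares the result with the known divergence of that projection.

```python
# Finite-difference approximation of df-hat = tr(d beta-hat_K / d beta-hat) for the
# projection onto K = {||beta||_2 <= t}, assuming an orthonormal design (V = I_p).
import numpy as np

def project_onto_ball(b, t):
    r = np.linalg.norm(b)
    return b if r <= t else t * b / r

p, t, eps = 5, 2.0, 1e-6
rng = np.random.default_rng(1)
beta_hat = 3.0 * rng.standard_normal(p)     # plays the role of the OLS estimator

df_hat = 0.0
for k in range(p):                          # trace of the Jacobian by finite differences
    e_k = np.eye(p)[k]
    df_hat += (project_onto_ball(beta_hat + eps * e_k, t)[k]
               - project_onto_ball(beta_hat, t)[k]) / eps

# For this K the divergence is known in closed form: p if ||beta-hat||_2 <= t and
# (p - 1) * t / ||beta-hat||_2 otherwise (the radial direction contributes 0 and each
# of the p - 1 tangential directions contributes t / ||beta-hat||_2).
r = np.linalg.norm(beta_hat)
print(df_hat, p if r <= t else (p - 1) * t / r)
```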

Let $\partial K$ be the boundary of $K$. For $s \in \partial K$, the normal cone of $K$ at $s$ is defined by
$$ N(K, s) = \{z - s : z_K = s\}. $$
Depending on the dimension of the normal cone $N(K, s)$, we have a disjoint partition of the boundary $\partial K$ as $\partial K = D_1 \cup \cdots \cup D_p$, where
$$ D_m = \{s \in \partial K : \dim N(K, s) = m\}. $$
Define
$$ E_m = \{x \in \mathbb{R}^p \setminus K : x_K \in D_m\}. $$
Then we have a disjoint partition of $\mathbb{R}^p \setminus K$ as $\mathbb{R}^p \setminus K = E_1 \cup \cdots \cup E_p$. We put a condition on the smoothness of $\partial K$ as in Kuriki and Takemura [3]. $E_m^\circ$ denotes the interior of $E_m$.

Assumption 3.1. $D_m$ is a $(p - m)$-dimensional $C^2$-manifold consisting of a finite number of relatively open connected components. Furthermore, the Lebesgue measure of $E_m \setminus E_m^\circ$ is zero.

Remark 3.1. In Kuriki and Takemura [3], $K$ is called piecewise smooth if $\partial K$ meets Assumption 3.1.

Let $T_s D_m$ be the tangent space of $D_m$ at $s$ and $T_s^\perp D_m$ be the orthogonal complement of $T_s D_m$ in terms of $\langle\cdot,\cdot\rangle$:
$$ T_s^\perp D_m = \{v \in \mathbb{R}^p : \langle v, z \rangle = 0, \ \forall z \in T_s D_m\}. $$
Clearly, $T_s^\perp D_m$ is the affine hull of $N(K, s)$. Following Milnor [6], the normal bundle of $D_m$ is defined as
$$ N_m = \{(s, v) : s \in D_m, \ v \in T_s^\perp D_m\}. $$
It is not difficult to show that $N_m$ is a $p$-dimensional $C^1$-manifold imbedded in $\mathbb{R}^{2p}$. Let us define $\varphi : N_m \to \mathbb{R}^p$ as $\varphi(s, v) = s + v$. Notice that $\varphi$ is a $C^1$-mapping. Then we show the following basic fact.

Lemma 3.1. For each fixed $x \in E_m^\circ$, there exist an $\epsilon$-ball $B_\epsilon = \{x' \in \mathbb{R}^p : \|x' - x\|_2 < \epsilon\} \subset E_m^\circ$ around $x$ with sufficiently small $\epsilon > 0$ and an open neighborhood $W$ of $(x_K, x - x_K)$ in $N_m$ such that $\varphi|_W : W \to B_\epsilon$ is a diffeomorphism and $(\varphi|_W)^{-1}(x') = (x'_K, x' - x'_K)$ for $x' \in B_\epsilon$. In particular, $f$ is continuously differentiable on $E_m^\circ$.

Proof. See Appendix A.1.

To calculate the divergence of $f$ in an explicit form, we introduce the tubal coordinates on $E_m^\circ$. Let $\theta = (\theta_1, \dots, \theta_{p-m})$ be a $C^2$-local coordinate system on $D_m$ and write $s \in D_m$ as $s(\theta) = s(\theta_1, \dots, \theta_{p-m})$. The tangent space $T_{s(\theta)} D_m$ at $s(\theta)$ is spanned by
$$ \left\{ b_a(\theta) = \frac{\partial s(\theta)}{\partial\theta_a}, \ a = 1, \dots, p - m \right\}. $$

Let $\{n_\alpha(\theta), \ \alpha = 1, \dots, m\}$ be an orthonormal basis of $T_{s(\theta)}^\perp D_m$ in terms of $\langle\cdot,\cdot\rangle$. Since the $\{b_a(\theta)\}$ are $C^1$-mappings in $\theta$, we can choose the $\{n_\alpha(\theta)\}$ so as to be of class $C^1$ as well. Hence we know that
$$ (\theta, \tau) \mapsto \Big( s(\theta), \ \sum_{\alpha=1}^m \tau_\alpha n_\alpha(\theta) \Big), \quad \text{with } \tau = (\tau_1, \dots, \tau_m)' \in \mathbb{R}^m, $$
gives a $C^1$-local parametrization of $N_m$. From Lemma 3.1, taking
$$ (\theta, \tau) \mapsto \varphi(\theta, \tau) = s(\theta) + \sum_{\alpha=1}^m \tau_\alpha n_\alpha(\theta) \qquad (3.1) $$
as a $C^1$-local parametrization of $E_m^\circ$, we can express $f$ in the local coordinates $(\theta, \tau)$ as $f(\theta, \tau) = s(\theta)$. Thus the Jacobian matrix of $f$ with respect to $x$ at $x = \varphi(\theta, \tau)$ is given by
$$ [\, b_1(\theta) \ \cdots \ b_{p-m}(\theta) \ \underbrace{0 \ \cdots \ 0}_{m} \,]\, (J\varphi_{(\theta,\tau)})^{-1}, \qquad (3.2) $$
where $J\varphi_{(\theta,\tau)}$ is the Jacobian matrix of $\varphi$ with respect to $(\theta, \tau)$. In particular, the divergence of $f$ with respect to $x$ at $x = \varphi(\theta, \tau)$ is given by the trace of the Jacobian matrix (3.2).

To state our main result, we prepare some notation used in differential geometry: the first fundamental form and the second fundamental form. The first fundamental form of $D_m$ associated with the coordinate system $\theta = (\theta_1, \dots, \theta_{p-m})$ is the symmetric matrix
$$ G(\theta) = (g_{ab}(\theta))_{1 \le a,b \le p-m} \quad \text{with} \quad g_{ab}(\theta) = \langle b_a(\theta), b_b(\theta) \rangle. $$
The second fundamental form of $D_m$ in the normal direction $n_\alpha(\theta)$ is defined as
$$ H_\alpha(\theta) = (h_{ab\alpha}(\theta))_{1 \le a,b \le p-m} \quad \text{with} \quad h_{ab\alpha}(\theta) = -\Big\langle n_\alpha(\theta), \frac{\partial^2 s}{\partial\theta_a \partial\theta_b} \Big\rangle. $$
For $x = \varphi(\theta, \tau)$, we define
$$ H(\theta, \tau) = \Big( \Big\langle x_K - x, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b} \Big\rangle \Big)_{1 \le a,b \le p-m} = \sum_{\alpha=1}^m \tau_\alpha H_\alpha(\theta), \qquad (3.3) $$
which is a positive semi-definite matrix. See Appendix A.2.

Lemma 3.2. The divergence $\mathrm{div}\, f(x) = \sum_{j=1}^p \partial f_j(x)/\partial x_j$ of $f$ at $x \in E_m^\circ$ is given by
$$ \mathrm{div}\, f(x) = \sum_{a=1}^{p-m} \frac{1}{1 + \kappa_a(x)}, $$
where $\kappa_a(x) = \kappa_a(\theta, \tau)$, $a = 1, \dots, p - m$, are the eigenvalues satisfying the equation
$$ |H(\theta, \tau) - \kappa\, G(\theta)| = 0. \qquad (3.4) $$

Proof. We need to evaluate the Jacobian matrix (3.2). In the following calculation, we abbreviate arguments like $b_a = b_a(\theta)$. Since the columns of the Jacobian matrix $J\varphi = [\,\partial\varphi/\partial\theta_1 \ \cdots \ \partial\varphi/\partial\theta_{p-m} \ \ \partial\varphi/\partial\tau_1 \ \cdots \ \partial\varphi/\partial\tau_m\,]$ are given by
$$ \frac{\partial\varphi}{\partial\theta_a} = b_a + \sum_{\alpha=1}^m \tau_\alpha \frac{\partial n_\alpha}{\partial\theta_a}, \qquad \frac{\partial\varphi}{\partial\tau_\beta} = n_\beta, $$
we have
$$ (J\varphi)'\, V\, [\, b_1 \ \cdots \ b_{p-m} \ n_1 \ \cdots \ n_m \,] = \begin{bmatrix} \big( g_{ab} + \sum_{\alpha=1}^m \tau_\alpha \langle \partial n_\alpha/\partial\theta_a, b_b \rangle \big)_{1 \le a,b \le p-m} & \big( \sum_{\alpha=1}^m \tau_\alpha \langle \partial n_\alpha/\partial\theta_a, n_\beta \rangle \big)_{1 \le a \le p-m,\, 1 \le \beta \le m} \\ 0 & I_m \end{bmatrix}. \qquad (3.5) $$
Differentiating both sides of $\langle n_\alpha, b_b \rangle = 0$ with respect to $\theta_a$, we obtain
$$ 0 = \frac{\partial}{\partial\theta_a} \langle n_\alpha, b_b \rangle = \Big\langle \frac{\partial n_\alpha}{\partial\theta_a}, b_b \Big\rangle + \Big\langle n_\alpha, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b} \Big\rangle, $$
and hence
$$ \Big\langle \frac{\partial n_\alpha}{\partial\theta_a}, b_b \Big\rangle = -\Big\langle n_\alpha, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b} \Big\rangle. $$
Thus the right-hand side of (3.5) is written as
$$ \begin{bmatrix} A_{11} & A_{12} \\ 0 & I_m \end{bmatrix}, $$
where the $(p-m) \times (p-m)$ matrix $A_{11}$ and the $(p-m) \times m$ matrix $A_{12}$ are given by
$$ A_{11} = \Big( g_{ab} - \sum_{\alpha=1}^m \tau_\alpha \Big\langle n_\alpha, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b} \Big\rangle \Big)_{1 \le a,b \le p-m} = G(\theta) + H(\theta, \tau), $$
$$ A_{12} = \Big( \sum_{\alpha=1}^m \tau_\alpha \Big\langle \frac{\partial n_\alpha}{\partial\theta_a}, n_\beta \Big\rangle \Big)_{1 \le a \le p-m,\ 1 \le \beta \le m}. $$

Therefore we obtain
$$ (J\varphi)^{-1} = \begin{bmatrix} A_{11} & 0 \\ A_{12}' & I_m \end{bmatrix}^{-1} [\, b_1 \ \cdots \ b_{p-m} \ n_1 \ \cdots \ n_m \,]'\, V. $$
The Jacobian matrix (3.2) is then given by
$$ [\, B \ \ 0 \,]\,(J\varphi)^{-1} = B A_{11}^{-1} B' V = B (G + H)^{-1} B' V = B (B'VB + H)^{-1} B' V, \qquad (3.6) $$
where $G = G(\theta)$, $H = H(\theta, \tau)$, $B = [\, b_1 \ \cdots \ b_{p-m} \,]$ and $N = [\, n_1 \ \cdots \ n_m \,]$. Let $\kappa_1(\theta,\tau), \dots, \kappa_{p-m}(\theta,\tau)$ be the eigenvalues of $H(\theta,\tau)$ with respect to $G(\theta)$, i.e., the solutions of equation (3.4). Then the divergence is written as
$$ \mathrm{tr}\big( B (G + H)^{-1} B' V \big) = \mathrm{tr}\big( (G + H)^{-1} G \big) = \sum_{a=1}^{p-m} \frac{1}{1 + \kappa_a}. $$
Therefore the proof is completed.

Remark 3.2. The local coordinates $(\theta, \tau)$ given in (3.1) are called the tubal coordinates, which are used in Weyl [14] to derive formulas for the volume of tubes.

Remark 3.3. When $K$ is a convex polyhedron, it holds that $B(\theta) \equiv B$ (a constant matrix) and $H(\theta, \tau) \equiv 0$. In this case, the Jacobian matrix (3.6) reduces to the constant projection matrix.

Remark 3.4. In Kuriki and Takemura [3], the average codimension $d(x)$ is defined as
$$ d(x) = m + \mathrm{tr}\big( (I_{p-m} + HG^{-1})^{-1} HG^{-1} \big) = m + \sum_{a=1}^{p-m} \frac{\kappa_a}{1 + \kappa_a} = p - \sum_{a=1}^{p-m} \frac{1}{1 + \kappa_a} $$
for $x \in E_m^\circ$. Hence we have the relation $\mathrm{div}\, f(x) = p - d(x)$, a.e.

3.2 Degrees of freedom

Using Lemma 3.2, we can derive an unbiased estimator of the degrees of freedom $\mathrm{df}(\hat\mu_K)$. We assume that $K$ is a closed convex set satisfying Assumption 3.1. For $\hat\beta \in E_m^\circ$, identifying $x = \hat\beta$ and $x_K = \hat\beta_K$, let $\kappa_{m,1}(\hat\beta), \dots, \kappa_{m,p-m}(\hat\beta)$ be the eigenvalues satisfying (3.4). Formally we define $E_0 = K$ and $\kappa_{0,a}(\hat\beta) \equiv 0$, $a = 1, \dots, p$. Then we obtain the following theorem. Note that $\hat\beta \in E_m$ is equivalent to $\hat\beta \notin K$ and $\hat\beta_K \in D_m$ for $m \ge 1$.

Theorem 3.1. Suppose $K$ is a closed convex set satisfying Assumption 3.1. Then
$$ \widehat{\mathrm{df}}(\hat\mu_K) = \sum_{m=0}^{p} \sum_{a=1}^{p-m} \frac{1}{1 + \kappa_{m,a}(\hat\beta)}\, I(\hat\beta \in E_m) \qquad (3.7) $$
gives an unbiased estimator of the degrees of freedom $\mathrm{df}(\hat\mu_K)$. Here, $I(\cdot)$ is an indicator function.

Hence, a $C_p$-type criterion for $\hat\mu_K$ is given by
$$ C_p(\hat\mu_K) = \frac{\|y - \hat\mu_K\|_2^2}{n} + \frac{2\,\widehat{\mathrm{df}}(\hat\mu_K)\,\sigma^2}{n}, $$
which is an unbiased estimator of the prediction risk $E[\|y^{new} - \hat\mu_K\|_2^2]/n$. Equivalently, we can define AIC for $\hat\mu_K$ as
$$ \mathrm{AIC}(\hat\mu_K) = \frac{\|y - \hat\mu_K\|_2^2}{n\sigma^2} + \frac{2\,\widehat{\mathrm{df}}(\hat\mu_K)}{n}. $$
When $\sigma^2$ is unknown, it is replaced by an unbiased estimate.

In our setting (1.6), $K$ plays the role of a tuning parameter. Practically, we choose the optimal $K$ which minimizes $C_p(\hat\mu_K)$ or $\mathrm{AIC}(\hat\mu_K)$ among a given collection $\mathcal{K}$ of closed convex sets satisfying Assumption 3.1. For instance, $\mathcal{K} = \{\{\beta \in \mathbb{R}^p : \sum_{j=1}^p |\beta_j| \le t\} : t > 0\}$ in the Lasso case. The usefulness of Theorem 3.1 is that it is not required to know the functional form of $\hat\beta_K$ in calculating (3.7). Once we know the numerical values of $\hat\beta$ and $\hat\beta_K$, we can calculate the value of (3.7) through geometric quantities such as the first fundamental form and the second fundamental form. In particular, if $K$ is a convex polyhedron, all the $\kappa_{m,a}$'s turn out to be zero. Therefore, (3.7) is simply expressed as
$$ \widehat{\mathrm{df}}(\hat\mu_K) = \sum_{m=1}^{p} (p - m)\, I(\hat\beta \in E_m), \qquad (3.8) $$
which coincides with the dimension of the face which contains $\hat\beta_K$ as a relatively interior point when $\hat\beta \notin K$.

4 Examples

In this section, we provide unbiased estimators of the degrees of freedom for the Lasso, the fused Lasso, and the group Lasso. Our result is also applicable to order-restricted inference. The degrees of freedom in order-restricted inference is studied in Meyer and Woodroofe [5] in the case where $K$ is a convex polyhedral cone.
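As a small numerical check of Lemma 3.2 and Theorem 3.1 (not from the paper), take $p = 2$, $V = I_2$ and $K$ the disk of radius $t$, whose boundary $s(\theta) = t(\cos\theta, \sin\theta)'$ is a single smooth piece with $m = 1$. The sketch below builds $G$ and $H(\theta, \tau)$ from the definitions, solves the generalized eigenvalue problem (3.4) with scipy.linalg.eigh, and compares $\sum_a 1/(1 + \kappa_a)$ with the divergence $t/\|\hat\beta\|_2$ of the projection onto the disk; the numbers and names are illustrative.

```python
# (3.4) and (3.7) for K = {||beta||_2 <= t} in R^2 with V = I_2:
# s(theta) = t (cos theta, sin theta) parametrizes the boundary piece D_1 (m = 1).
import numpy as np
from scipy.linalg import eigh

t = 1.5
beta_hat = np.array([2.0, 1.0])                       # a point outside K
r = np.linalg.norm(beta_hat)
theta = np.arctan2(beta_hat[1], beta_hat[0])

s = t * np.array([np.cos(theta), np.sin(theta)])      # beta-hat_K, the projection onto K
b = t * np.array([-np.sin(theta), np.cos(theta)])     # tangent vector ds/dtheta
s_tt = -t * np.array([np.cos(theta), np.sin(theta)])  # second derivative d^2 s / dtheta^2

G = np.array([[b @ b]])                               # first fundamental form (1 x 1)
H = np.array([[(s - beta_hat) @ s_tt]])               # H(theta, tau) as in (3.3)

kappa = eigh(H, G, eigvals_only=True)                 # eigenvalues of H with respect to G
df_hat = np.sum(1.0 / (1.0 + kappa))                  # the m = 1 term of (3.7)
print(df_hat, t / r)                                  # both equal t / ||beta-hat||_2
```

For a convex polyhedron $H$ vanishes and the same computation reduces to (3.8).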

4.1 Lasso

For the Lasso, the constraint region is given by
$$ K = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=1}^p |\beta_j| \le t \Big\}. $$
We denote the Lasso estimator by $\hat\beta(t)$ rather than $\hat\beta_K$. Since $K$ is a convex polyhedron, an unbiased estimator $\widehat{\mathrm{df}}(t)$ of the degrees of freedom of $\hat\mu(t) = X\hat\beta(t)$ is given by (3.8). In this case, if $\hat\beta(t) \in D_m$, then the number of zeros in $\hat\beta(t)$ is equal to $m - 1$. Therefore we obtain the expression
$$ \widehat{\mathrm{df}}(t) = \begin{cases} \#\{j : \hat\beta(t)_j \ne 0\} - 1 & \text{if } \sum_{j=1}^p |\hat\beta_j| > t, \\ p & \text{if } \sum_{j=1}^p |\hat\beta_j| \le t. \end{cases} $$
A similar result is presented in Zou et al. [15], although their parametrization is not the same as ours (see the numerical sketch below).

4.2 Fused Lasso

The fused Lasso (Tibshirani et al. [11]) is the shrinkage method with the constraint region
$$ K = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=1}^p |\beta_j| \le t_1, \ \sum_{j=2}^p |\beta_j - \beta_{j-1}| \le t_2 \Big\}. $$
We assume $t_1, t_2 > 0$. Let $\hat\beta(t)$ be the fused Lasso estimator with $t = (t_1, t_2)$. Since $K$ is a convex polyhedron, an unbiased estimator $\widehat{\mathrm{df}}(t)$ of the degrees of freedom of $\hat\mu(t) = X\hat\beta(t)$ is given by (3.8). Define
$$ K_1 = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=1}^p |\beta_j| \le t_1 \Big\} \quad \text{and} \quad K_2 = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=2}^p |\beta_j - \beta_{j-1}| \le t_2 \Big\}. $$
Corresponding to the $2^p$ different possible signs for the $p$ components of $\beta$, $K_1$ is expressed as the solution set of $2^p$ linear inequalities:
$$ K_1 = \{\beta \in \mathbb{R}^p : a_i'\beta \le t_1, \ i = 1, \dots, 2^p\}. $$
Similarly, $K_2$ is expressed as the solution set of $2^{p-1}$ linear inequalities:
$$ K_2 = \{\beta \in \mathbb{R}^p : b_i'\beta \le t_2, \ i = 1, \dots, 2^{p-1}\}. $$
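The Section 4.1 formula makes tuning-parameter selection by the $C_p$ criterion of Theorem 3.1 immediate once $\hat\beta(t)$ is available on a grid of $t$ values. The sketch below (not from the paper) uses a generic constrained solver, so exact zeros are only identified up to a numerical tolerance; a path algorithm such as LARS would give exact zeros. The data, tolerance and grid are illustrative.

```python
# Choosing t for the constrained Lasso by minimizing Cp, with
# df-hat(t) = #{j : beta-hat(t)_j != 0} - 1 when sum_j |beta-hat_j| > t (Section 4.1).
import numpy as np
from scipy.optimize import minimize

def constrained_lasso(X, y, t):
    p = X.shape[1]
    obj = lambda b: np.sum((y - X @ b) ** 2)
    con = {"type": "ineq", "fun": lambda b: t - np.abs(b).sum()}
    return minimize(obj, np.zeros(p), method="SLSQP", constraints=[con]).x

rng = np.random.default_rng(2)
n, p, sigma2, tol = 80, 6, 1.0, 1e-4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.5]) + rng.standard_normal(n)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

best = None
for t in np.linspace(0.1, 1.2 * np.abs(beta_ols).sum(), 30):
    b = constrained_lasso(X, y, t)
    if np.abs(beta_ols).sum() <= t:
        df = p                                    # beta-hat already lies in K
    else:
        df = int(np.sum(np.abs(b) > tol)) - 1     # numerically non-zero coefficients, minus 1
    Cp = np.sum((y - X @ b) ** 2) / n + 2 * df * sigma2 / n
    if best is None or Cp < best[0]:
        best = (Cp, t, df)
print("Cp-optimal t = %.3f, df-hat = %d" % (best[1], best[2]))
```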

For instance, if $p = 3$, the $a_i$, $i = 1, \dots, 8$, are the eight sign vectors $(\pm 1, \pm 1, \pm 1)'$, and the $b_i$ are
$$ (-1, 0, 1)', \quad (-1, 2, -1)', \quad (1, -2, 1)', \quad (1, 0, -1)'. $$
Each open face of the polytope $K = K_1 \cap K_2$ is of the form
$$ \{\beta \in \mathbb{R}^p : a_i'\beta = t_1, \ i \in I_1; \ b_i'\beta = t_2, \ i \in I_2; \ a_j'\beta < t_1, \ j \in \{1, \dots, 2^p\} \setminus I_1; \ b_j'\beta < t_2, \ j \in \{1, \dots, 2^{p-1}\} \setminus I_2\}, \qquad (4.1) $$
where $I_1 \subset \{1, \dots, 2^p\}$ and $I_2 \subset \{1, \dots, 2^{p-1}\}$. Suppose a nonempty open face $F$ of $K$ is given by (4.1), where the matrix whose column vectors are $a_i$, $i \in I_1$, and $b_i$, $i \in I_2$, is of rank $m$. Then the dimension of $F$ is $p - m$. From these observations, we know that the unbiased estimator $\widehat{\mathrm{df}}(t)$ of $\mathrm{df}(\hat\mu(t))$ is given by
$$ \widehat{\mathrm{df}}(t) = \begin{cases} p - m_1(t) & \text{if } \hat\beta(t) \in \partial K_1, \ \hat\beta(t) \notin \partial K_2 \text{ and } \hat\beta \notin K, \\ p - m_2(t) & \text{if } \hat\beta(t) \in \partial K_2, \ \hat\beta(t) \notin \partial K_1 \text{ and } \hat\beta \notin K, \\ p - m_3(t) & \text{if } \hat\beta(t) \in \partial K_1 \cap \partial K_2 \text{ and } \hat\beta \notin K, \\ p & \text{if } \hat\beta \in K, \end{cases} $$
where
$$ m_1(t) = \#\{j : \hat\beta(t)_j = 0\} + 1, $$
$$ m_2(t) = \#\{j \ge 2 : \hat\beta(t)_j - \hat\beta(t)_{j-1} = 0\} + 1, $$
$$ m_3(t) = \#\{j : \hat\beta(t)_j = 0\} + \#\{j \ge 2 : \hat\beta(t)_j - \hat\beta(t)_{j-1} = 0, \ \hat\beta(t)_{j-1} \ne 0, \ \hat\beta(t)_j \ne 0\} + 2. $$

Remark 4.1. In Tibshirani et al. [11], with the penalization formulation, they propose
$$ p - \#\{j : \hat\beta_j = 0\} - \#\{j \ge 2 : \hat\beta_j - \hat\beta_{j-1} = 0, \ \hat\beta_j \ne 0, \ \hat\beta_{j-1} \ne 0\} $$
as an estimator of the degrees of freedom for the fused Lasso, where $\hat\beta$ is the fused Lasso estimator. They, however, do not present a mathematical proof of the unbiasedness of this estimator.

4.3 Group Lasso

The group Lasso is proposed in Yuan and Lin [12]. The constraint region of the group Lasso is
$$ K = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=1}^J (\beta_{[j]}' V_j \beta_{[j]})^{1/2} \le t \Big\}, $$
where $\beta$ is partitioned as $\beta = (\beta_{[1]}', \dots, \beta_{[J]}')'$ with $\beta_{[j]}$ being a $p_j \times 1$ vector, and $V_j$ is a $p_j \times p_j$ symmetric positive definite matrix. In the subsequent calculation, we assume that $X$ is orthonormal, i.e., $X'X = I_p$, and hence $V = I_p$. For $x \in \mathbb{R}^p$, let $x_{[1]} = (x_1, \dots, x_q)'$. We first treat the case
$$ K = \{x \in \mathbb{R}^p : \|x_{[1]}\|_2 + |x_{q+1}| + \cdots + |x_p| \le t\}, \qquad (4.2) $$

where $\|x_{[1]}\|_2 = (\sum_{j=1}^q x_j^2)^{1/2}$. We focus on the following surface area:
$$ M = \{x \in \mathbb{R}^p : \|x_{[1]}\|_2 + x_{q+1} + \cdots + x_{q+r} = t, \ \|x_{[1]}\|_2 > 0, \ x_{q+1} > 0, \dots, x_{q+r} > 0, \ x_{q+r+1} = \cdots = x_p = 0\}. $$
The set $M$ is a $(q + r - 1)$-dimensional smooth manifold. To introduce a local coordinate system on $M$, we transform $x_{[1]}$ into polar coordinates (Takemura [9]) as
$$ x_{[1]} = \theta_q\, u(\theta_1, \dots, \theta_{q-1}), \quad \text{with} \quad u(\theta_1, \dots, \theta_{q-1}) = \begin{pmatrix} \cos\theta_1 \\ \sin\theta_1 \cos\theta_2 \\ \vdots \\ \sin\theta_1 \sin\theta_2 \cdots \cos\theta_{q-1} \\ \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{q-1} \end{pmatrix}, $$
where $0 \le \theta_i \le \pi$, $i = 1, \dots, q-2$, $0 \le \theta_{q-1} < 2\pi$, and $0 < \theta_q < t$. Then the rest of the variables $x_{q+1}, \dots, x_{q+r}$ must satisfy
$$ x_{q+1} + \cdots + x_{q+r} = t - \theta_q. $$
Let $e_i \in \mathbb{R}^p$ be the vector of which only the $i$-th component is 1 and all other components are zero. Take
$$ b_{q+j} = e_{q+1+j} - e_{q+1}, \quad j = 1, \dots, r - 1. $$
Then $x \in M$ is expressed as
$$ x = x(\theta_1, \dots, \theta_{q+r-1}) = \begin{pmatrix} x_{[1]} \\ x_{q+1} \\ \vdots \\ x_{q+r} \\ 0_{p-q-r} \end{pmatrix} = \theta_q \begin{pmatrix} u(\theta_1, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix} + (t - \theta_q)\big( e_{q+1} + \theta_{q+1} b_{q+1} + \cdots + \theta_{q+r-1} b_{q+r-1} \big), $$
where $0_i$ is the $i \times 1$ zero vector, and $\theta_{q+1}, \dots, \theta_{q+r-1}$ satisfy $\theta_{q+j} > 0$, $j = 1, \dots, r-1$, and $\sum_{j=1}^{r-1} \theta_{q+j} < 1$. The partial derivative of $u(\theta_1, \dots, \theta_{q-1})$ with respect to $\theta_1$ is given by
$$ \frac{\partial u}{\partial\theta_1}(\theta_1, \dots, \theta_{q-1}) = \begin{pmatrix} -\sin\theta_1 \\ \cos\theta_1 \cos\theta_2 \\ \vdots \\ \cos\theta_1 \sin\theta_2 \cdots \sin\theta_{q-2} \cos\theta_{q-1} \\ \cos\theta_1 \sin\theta_2 \cdots \sin\theta_{q-2} \sin\theta_{q-1} \end{pmatrix} \equiv v(\theta_1, \dots, \theta_{q-1}). $$

Define $v(\theta_i, \dots, \theta_{q-1})$ for $i \ge 2$ in the similar manner. Then we have
$$ \frac{\partial u}{\partial\theta_i}(\theta_1, \dots, \theta_{q-1}) = \sin\theta_1 \cdots \sin\theta_{i-1} \begin{pmatrix} 0_{i-1} \\ v(\theta_i, \dots, \theta_{q-1}) \end{pmatrix}. $$
Thus the tangent space $T_x M$ at $x$ is spanned by the following $q + r - 1$ linearly independent vectors:
$$ \frac{\partial x}{\partial\theta_i} = \theta_q \sin\theta_1 \cdots \sin\theta_{i-1} \begin{pmatrix} 0_{i-1} \\ v(\theta_i, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix}, \quad i = 1, \dots, q-1, $$
$$ \frac{\partial x}{\partial\theta_q} = \begin{pmatrix} u(\theta_1, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix} - \big( e_{q+1} + \theta_{q+1} b_{q+1} + \cdots + \theta_{q+r-1} b_{q+r-1} \big), $$
$$ \frac{\partial x}{\partial\theta_{q+j}} = (t - \theta_q)\, b_{q+j}, \quad j = 1, \dots, r - 1. $$
It is easy to see that the orthonormal system $\{n_1, \dots, n_{p-q-r+1}\}$, with
$$ n_1 = \frac{1}{\sqrt{r+1}} \begin{pmatrix} u(\theta_1, \dots, \theta_{q-1}) \\ 1_r \\ 0_{p-q-r} \end{pmatrix} \quad \text{and} \quad n_2 = e_{q+r+1}, \ \dots, \ n_{p-q-r+1} = e_p, $$
gives a basis of $T_x^\perp M$. Here, $1_r = (1, \dots, 1)'$ ($r$ ones). To calculate the second fundamental forms, we evaluate the second partial derivatives of $x$, which are summarized as follows:
$$ \frac{\partial^2 x}{\partial\theta_i^2} = -\theta_q \sin\theta_1 \cdots \sin\theta_{i-1} \begin{pmatrix} 0_{i-1} \\ u(\theta_i, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix}, \quad i = 1, \dots, q-1, $$
$$ \frac{\partial^2 x}{\partial\theta_i \partial\theta_j} = \theta_q \sin\theta_1 \cdots \cos\theta_i \cdots \sin\theta_{j-1} \begin{pmatrix} 0_{j-1} \\ v(\theta_j, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix}, \quad 1 \le i < j \le q-1, $$
$$ \frac{\partial^2 x}{\partial\theta_i \partial\theta_q} = \sin\theta_1 \cdots \sin\theta_{i-1} \begin{pmatrix} 0_{i-1} \\ v(\theta_i, \dots, \theta_{q-1}) \\ 0_{p-q} \end{pmatrix}, \quad i = 1, \dots, q-1, $$
$$ \frac{\partial^2 x}{\partial\theta_q \partial\theta_{q+j}} = -b_{q+j}, \quad j = 1, \dots, r-1, $$
$$ \frac{\partial^2 x}{\partial\theta_i \partial\theta_{q+j}} = \frac{\partial^2 x}{\partial\theta_q^2} = \frac{\partial^2 x}{\partial\theta_{q+j}^2} = 0, \quad i = 1, \dots, q-1, \ j = 1, \dots, r-1. $$
Here, $u(\theta_i, \dots, \theta_{q-1})$, $i \ge 2$, are defined in the similar way as $u(\theta_1, \dots, \theta_{q-1})$.

Therefore the second fundamental forms are calculated as follows:
$$ H_1 = \Big( -\Big\langle n_1, \frac{\partial^2 x}{\partial\theta_i \partial\theta_j} \Big\rangle \Big)_{1 \le i,j \le q+r-1} = \frac{1}{\sqrt{r+1}}\, \mathrm{diag}(h_1, \dots, h_{q-1}, 0, \dots, 0), $$
with $h_i = \theta_q \sin^2\theta_1 \cdots \sin^2\theta_{i-1}$, $i = 1, \dots, q-1$, and
$$ H_k = \Big( -\Big\langle n_k, \frac{\partial^2 x}{\partial\theta_i \partial\theta_j} \Big\rangle \Big)_{1 \le i,j \le q+r-1} = 0 $$
for $k = 2, \dots, p-q-r+1$. Since the first fundamental form is given in the form
$$ G = \begin{bmatrix} \mathrm{diag}(\theta_q h_1, \dots, \theta_q h_{q-1}) & 0 \\ 0 & G_{22} \end{bmatrix}, $$
the eigenvalues of $H = \sum_{k=1}^{p-q-r+1} \tau_k H_k = \tau_1 H_1$ with respect to $G$ are given by
$$ \kappa_1 = \cdots = \kappa_{q-1} = \frac{\tau_1'}{\theta_q}, \qquad \kappa_q = \cdots = \kappa_{q+r-1} = 0, $$
where $\tau_1' = \tau_1 / \sqrt{r+1}$.

Returning to the original problem, let $\hat\beta(t)$ be the resulting estimator with the constraint region (4.2). When $\hat\beta(t) \in M$, $\theta_q$ and $\tau_1'$ correspond to $\theta_q = \|\hat\beta(t)_{[1]}\|_2$ and $\tau_1' = \|\hat\beta_{[1]} - \hat\beta(t)_{[1]}\|_2$. Since
$$ \hat\beta(t)_{[1]}'\,(\hat\beta_{[1]} - \hat\beta(t)_{[1]}) = \theta_q\, u(\theta_1, \dots, \theta_{q-1})'\, \tau_1'\, u(\theta_1, \dots, \theta_{q-1}) = \theta_q \tau_1' = \|\hat\beta(t)_{[1]}\|_2\, \|\hat\beta_{[1]} - \hat\beta(t)_{[1]}\|_2, $$
we have $\|\hat\beta(t)_{[1]}\|_2 + \|\hat\beta_{[1]} - \hat\beta(t)_{[1]}\|_2 = \|\hat\beta_{[1]}\|_2$. Thus an unbiased estimator $\widehat{\mathrm{df}}(t)$ of $\mathrm{df}(\hat\mu(t))$, where $\hat\mu(t) = X\hat\beta(t)$, is given by
$$ \widehat{\mathrm{df}}(t) = r + \frac{q-1}{1 + \|\hat\beta_{[1]} - \hat\beta(t)_{[1]}\|_2 / \|\hat\beta(t)_{[1]}\|_2} = r + (q-1)\, \frac{\|\hat\beta(t)_{[1]}\|_2}{\|\hat\beta_{[1]}\|_2}, $$
when $\hat\beta(t) \in M$ and $\hat\beta \notin K$. A similar calculation shows that the entire $\widehat{\mathrm{df}}(t)$ is given by
$$ \widehat{\mathrm{df}}(t) = \begin{cases} I(\|\hat\beta(t)_{[1]}\|_2 > 0)\Big\{ 1 + (q-1)\, \dfrac{\|\hat\beta(t)_{[1]}\|_2}{\|\hat\beta_{[1]}\|_2} \Big\} + \displaystyle\sum_{j=1}^{p-q} I(|\hat\beta(t)_{q+j}| > 0) - 1 & \text{if } \hat\beta \notin K, \\ p & \text{if } \hat\beta \in K. \end{cases} $$

Since $\|\hat\beta_{[1]}\|_2 > 0$ with probability 1, we also have $\widehat{\mathrm{df}}(t)$ as an unbiased estimator of $\mathrm{df}(\hat\mu(t))$, where
$$ \widehat{\mathrm{df}}(t) = \begin{cases} I(\|\hat\beta(t)_{[1]}\|_2 > 0) + \displaystyle\sum_{j=1}^{p-q} I(|\hat\beta(t)_{q+j}| > 0) + (q-1)\, \dfrac{\|\hat\beta(t)_{[1]}\|_2}{\|\hat\beta_{[1]}\|_2} - 1 & \text{if } \hat\beta \notin K, \\ p & \text{if } \hat\beta \in K. \end{cases} $$

Next, for $x \in \mathbb{R}^p$, we write $x = (x_{[1]}', \dots, x_{[J]}')'$ as a partition of $x$, where $x_{[j]}$ is a $p_j \times 1$ vector. Define $\|x_{[j]}\| = (x_{[j]}'x_{[j]})^{1/2}$ for $x \in \mathbb{R}^p$. We consider the group Lasso estimation with the constraint region
$$ K = \Big\{ \beta \in \mathbb{R}^p : \sum_{j=1}^J \|\beta_{[j]}\| \le t \Big\}. $$
Let $\hat\beta(t)$ be the resulting estimator. Assuming that $X$ is orthonormal, an unbiased estimator of the degrees of freedom $\mathrm{df}(\hat\mu(t))$, with $\hat\mu(t) = X\hat\beta(t)$, is given by
$$ \widehat{\mathrm{df}}(t) = \begin{cases} \displaystyle\sum_{j=1}^J I(\|\hat\beta(t)_{[j]}\| > 0) + \sum_{j=1}^J (p_j - 1)\, \dfrac{\|\hat\beta(t)_{[j]}\|}{\|\hat\beta_{[j]}\|} - 1 & \text{if } \hat\beta \notin K, \\ p & \text{if } \hat\beta \in K. \end{cases} $$
The proof is similar to the above and thus omitted.

Remark 4.2. Even when $X$ is not orthonormal, we can calculate the estimator (3.7) of the degrees of freedom numerically.

5 Concluding remarks

In this paper, we have derived an unbiased estimator of the degrees of freedom for the shrinkage estimator towards a closed convex set with piecewise smooth boundary. Formulating the estimation problem as (1.6), we can treat selection criteria for the tuning parameter in recently proposed estimation methods such as the Lasso, the fused Lasso, and the group Lasso in a unified manner. It seems to be necessary to study optimality properties of $C_p$ or AIC in selecting the tuning parameter. For the traditional variable selection problem in the linear model, there is a large literature on properties of model selection criteria (for example, Shao [7]). This topic remains for future research.

Acknowledgment

The author would like to thank Professor Tatsuya Kubokawa for his encouragement and helpful suggestions.

A Appendix

A.1 Proof of Lemma 3.1

Let $x \in E_m^\circ$ be an arbitrary fixed vector. From Remark A.1 below, $(x_K, x - x_K)$ is a regular point of $\varphi$. Thus the inverse function theorem implies that there exists an open neighborhood $U \cap N_m$ of $(x_K, x - x_K)$ in $N_m$ such that $\varphi|_{U \cap N_m} : U \cap N_m \to \varphi(U \cap N_m)$ is a diffeomorphism. Here $U$ is an open set in $\mathbb{R}^{2p}$ containing $(x_K, x - x_K)$. Let $L > 0$ be the Lipschitz constant of $f$. Let us define
$$ B_\epsilon = \{x' \in \mathbb{R}^p : \|x' - x\|_2 < \epsilon\} $$
and
$$ Q_\epsilon = \{z \in \mathbb{R}^{2p} : |z_i - x_{K,i}| < L\epsilon, \ 1 \le i \le p, \ |z_{p+j} - (x_j - x_{K,j})| < (1 + L)\epsilon, \ 1 \le j \le p\}, $$
with $\epsilon > 0$ small enough to have
$$ B_\epsilon \subset \varphi(U \cap N_m), \qquad Q_\epsilon \subset U. $$
Since $f$ is Lipschitz continuous with Lipschitz constant $L$, it holds that, for $x' \in B_\epsilon$,
$$ \|x'_K - x_K\|_2 < L\epsilon, $$
and
$$ \|(x' - x'_K) - (x - x_K)\|_2 \le \|x' - x\|_2 + \|x'_K - x_K\|_2 < (1 + L)\epsilon. $$
Therefore we have $(x'_K, x' - x'_K) \in Q_\epsilon \cap N_m \subset U \cap N_m$ for $x' \in B_\epsilon$. Define $W = (\varphi|_{U \cap N_m})^{-1}(B_\epsilon) \cap Q_\epsilon$. Note that $W$ is an open set in $N_m$. Then it is seen that the diffeomorphism $\varphi|_W : W \to B_\epsilon$ corresponds to the mapping $(x'_K, x' - x'_K) \mapsto x'$.

A.2 Positive semi-definiteness of the matrix (3.3)

We follow the notation used in Section 3.1. Let $s_0 \in D_m$ and $v_0 \in N(K, s_0)$ be arbitrary fixed vectors. We take a $C^2$-local coordinate system $(\theta_1, \dots, \theta_{p-m})$ of $D_m$ around $s_0$ such that $s_0 = s(0, \dots, 0)$. Then we shall show the following fact:

Lemma A.1. The matrix
$$ \Big( -\Big\langle v_0, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b}\Big|_{\theta=0} \Big\rangle \Big)_{1 \le a,b \le p-m} \qquad (A.1) $$
is positive semi-definite.

Proof. Define
$$ L(\theta) = -\langle v_0, s(\theta) - s_0 \rangle $$
in an appropriate neighborhood of $0$. From the characterization of the projection onto a closed convex set (Webster [13]), it follows that $L(\theta) \ge 0$ for all $\theta$ in the neighborhood and $L(0) = 0$. Hence $\theta = 0$ is a minimizer of $L(\theta)$. Noting that
$$ \frac{\partial^2 L}{\partial\theta_a \partial\theta_b}\Big|_{\theta=0} = -\Big\langle v_0, \frac{\partial^2 s}{\partial\theta_a \partial\theta_b}\Big|_{\theta=0} \Big\rangle, $$
the second-order necessary condition for the minimizer ensures that the matrix (A.1) is indeed positive semi-definite.

Remark A.1. From this lemma and Assertion 6.4 of Milnor [6], or from our calculation in the proof of Lemma 3.2, it can be proved that $(x_K, x - x_K)$ with $x \in E_m^\circ$ is a regular point of $\varphi$.

References

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory.

[2] Efron, B. (2004). The estimation of prediction error: covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99.

[3] Kuriki, S. and Takemura, A. (2000). Shrinkage estimation towards a closed convex set with a smooth boundary. J. Multivariate Anal. 75.

[4] Mallows, C. (1973). Some comments on $C_p$. Technometrics 15.

[5] Meyer, M. and Woodroofe, M. (2000). On the degrees of freedom in shape-restricted regression. Ann. Statist. 28.

[6] Milnor, J. (1963). Morse Theory. Ann. Math. Stud. 51, Princeton Univ. Press, Princeton.

[7] Shao, J. (1997). An asymptotic theory for linear model selection (with discussion). Statist. Sinica 7.

[8] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9.

[9] Takemura, A. Foundation of Multivariate Statistical Inference (in Japanese). Kyoritsu Shuppan, Tokyo.

[10] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58.

[11] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67.

[12] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68.

[13] Webster, R. (1994). Convexity. Oxford Univ. Press, Oxford.

[14] Weyl, H. (1939). On the volume of tubes. Amer. J. Math. 61.

[15] Zou, H., Hastie, T. and Tibshirani, R. On the degrees of freedom of the Lasso. Ann. Statist., to appear.
