SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS

YU QI
(B.Sc.(Hons.), BUAA)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF MATHEMATICS
NATIONAL UNIVERSITY OF SINGAPORE
2005

Acknowledgements

I am very grateful to my supervisor, Dr Sun Defeng, for his abundant knowledge and kindness. Over the past two years, Dr Sun has always helped me when I was in trouble and encouraged me when I lost confidence. This thesis would not have been finished without his invaluable suggestions and patient guidance. I would also like to thank Miss Shi Shengyuan, who has helped me so much in my study. Thanks also to all the staff and friends at the National University of Singapore for their help and support.

Yu Qi
August 2005

Summary

To solve a convex non-differentiable optimization problem, one may introduce a smooth convex function to approximate the non-differentiable objective and solve the resulting smooth convex optimization problem to obtain an approximate solution. Nesterov showed that if the gradient of the smoothing function is Lipschitz continuous, one can obtain an ε-approximate solution with the number of iterations bounded by O(1/ε) [9]. In [11], Nesterov discussed the problem of minimizing the maximal eigenvalue and the problem of minimizing the spectral radius. Recently, Shi [12] presented a smoothing function for the sum of the κ largest components of a vector. This smoothing function is particularly useful because its composition with the eigenvalue function yields computable smoothing approximations of the eigenvalue functions of a real symmetric matrix. In this thesis, we further study the properties of these smoothing approximations of the eigenvalue functions. In particular, we obtain an estimate of the Lipschitz constant of the gradient of these smoothing functions.

We then consider the problem of minimizing the sum of the κ largest eigenvalues by applying Nesterov's smoothing method [9]. Finally, we extend this algorithm to the problem of minimizing the sum of the κ largest absolute values of eigenvalues. These two problems generalize, respectively, the problem of minimizing the maximal eigenvalue and the problem of minimizing the spectral radius considered in [11]. We also report some numerical results for both problems.

The organization of this thesis is as follows. In Chapter 1 we first describe the problems discussed by Nesterov in [11] and then extend them to the general cases. In Chapters 2 and 3 we discuss some important properties of the smoothing functions for approximating the sum of the κ largest eigenvalues and the sum of the κ largest absolute values of eigenvalues of a parametric affine operator, respectively. The smoothing algorithm and computational results are given in Chapter 4.

List of Notation

A, B, ... denote matrices; M_{n,m} denotes the set of n-by-m real matrices. S^m is the set of all m × m real symmetric matrices; O^m is the set of all m × m orthogonal matrices. A superscript T represents the transpose of matrices and vectors. For a matrix M, M_i and M^j represent the ith row and the jth column of M, respectively; M_{ij} denotes the (i, j)th entry of M. A diagonal matrix is written as Diag(β_1, ..., β_n). We use ∘ to denote the Hadamard product between matrices, i.e. X ∘ Y = [X_{ij} Y_{ij}]_{i,j=1}^m. Let A_1, ..., A_n ∈ S^m be given; we define the linear operator A : R^n → S^m by

    A(x) := Σ_{i=1}^n x_i A_i,   x ∈ R^n.   (1)

Let A* : S^m → R^n be the adjoint of the linear operator A : R^n → S^m defined by (1), i.e.

    ⟨d, A*D⟩ = ⟨D, Ad⟩,   (d, D) ∈ R^n × S^m.

Hence, for all D ∈ S^m, A*D = (⟨A_1, D⟩, ..., ⟨A_n, D⟩)^T. Denote by G ∈ S^n the Gram matrix with (G)_{ij} = ⟨A_i, A_j⟩. The eigenvalues of X ∈ S^m are designated by λ_i(X), i = 1, ..., m, ordered so that λ_1(X) ≥ λ_2(X) ≥ ... ≥ λ_m(X). We write X = O(α) (respectively, o(α)) if ‖X‖/α is uniformly bounded (respectively, tends to zero) as α → 0. For B ∈ M_{n,m}, the operator Ξ : M_{n,m} → S^{m+n} is defined as

    Ξ(B) = [ 0     B
             B^T   0 ].

We define the linear operator Γ : R^n → S^{2m} as

    Γ(x) = Ξ(A(x)).   (2)

Let Γ* : S^{2m} → R^n be the adjoint operator of the linear operator Γ; for any Y ∈ S^{2m},

    Γ*(Y) = (2⟨A_1, Y_2⟩, ..., 2⟨A_n, Y_2⟩)^T,

where Y is partitioned as

    Y = [ Y_1     Y_2
          Y_2^T   Y_3 ]

with Y_1, Y_3 ∈ S^m and Y_2 ∈ M_{m,m}.
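The operators A, A* and Γ, Γ* defined above are used throughout the thesis. The following minimal NumPy sketch (an illustration added here, not part of the original text) shows A(x) and its adjoint under the trace inner product ⟨X, Y⟩ = trace(XY); the helper names A_op and A_adj are assumptions of the sketch.

```python
# Minimal sketch of A(x) = sum_i x_i A_i and its adjoint A*(D) = (<A_1, D>, ..., <A_n, D>)^T
# under the trace inner product <X, Y> = trace(XY).
import numpy as np

def A_op(x, A_list):
    """A(x) = sum_i x_i * A_i, mapping R^n into S^m."""
    return sum(xi * Ai for xi, Ai in zip(x, A_list))

def A_adj(D, A_list):
    """A*(D) = (<A_1, D>, ..., <A_n, D>)^T, the adjoint of A."""
    return np.array([np.tensordot(Ai, D) for Ai in A_list])

# Sanity check of the defining identity <d, A*(D)> = <D, A(d)> on random data.
rng = np.random.default_rng(0)
m, n = 5, 3
sym = lambda P: (P + P.T) / 2
A_list = [sym(rng.standard_normal((m, m))) for _ in range(n)]
d, D = rng.standard_normal(n), sym(rng.standard_normal((m, m)))
assert np.isclose(d @ A_adj(D, A_list), np.tensordot(D, A_op(d, A_list)))
```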

Contents

Acknowledgements ii
Summary iii
List of Notation v
1 Introduction 3
2 Properties of the Smoothing Function for the Sum of the κ Largest Eigenvalues 7
  2.1 The Smoothing Function for the Sum of the κ Largest Components 7
  2.2 Spectral Functions 13
  2.3 Smoothing Functions for φ(x) 14
3 Smoothing Functions for the Sum of the κ Largest Absolute Values of Eigenvalues 20
  3.1 Preliminaries 20
  3.2 Smoothing Functions for ψ(x) 22
4 Numerical Experiments 26
  4.1 Computational Results 28
  4.2 Conclusions 30
Bibliography 32

Chapter 1

Introduction

Let S^m be the set of real m-by-m symmetric matrices. For X ∈ S^m, let {λ_i(X)}_{i=1}^m be the eigenvalues of X, sorted in nonincreasing order, i.e.

    λ_1(X) ≥ λ_2(X) ≥ ... ≥ λ_κ(X) ≥ ... ≥ λ_m(X).

Let A_1, ..., A_n ∈ S^m be given. Define the operator A : R^n → S^m by

    A(x) = Σ_{i=1}^n x_i A_i,   x ∈ R^n.   (1.1)

Let Q be a bounded closed convex set in R^n and C ∈ S^m. In [11], Nesterov considered the following nonsmooth problem:

    min_{x ∈ Q} λ_1(C + A(x)).   (1.2)

Nesterov's approach is to replace the nonsmooth function λ_1(C + A(x)) by the smooth function

    S_µ(C + A(x)),   (1.3)

where the tolerance parameter µ > 0 and S_µ(X) is µ times the log-sum-exp (entropy-type) function of the scaled eigenvalues, i.e.

    S_µ(X) = µ ln[ Σ_{i=1}^m e^{λ_i(X)/µ} ].   (1.4)
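Since S_µ(X) = µ ln Σ_i e^{λ_i(X)/µ}, a standard log-sum-exp bound gives λ_1(X) ≤ S_µ(X) ≤ λ_1(X) + µ ln m. The following small Python illustration of (1.4) (assuming NumPy/SciPy; it is a sketch added here, not part of the thesis) shows the gap shrinking as µ decreases.

```python
# Illustration of S_mu(X) = mu * ln( sum_i exp(lambda_i(X)/mu) ) from (1.4); the gap
# S_mu(X) - lambda_1(X) lies in [0, mu*ln m] and vanishes as mu -> 0.
import numpy as np
from scipy.special import logsumexp

def S_mu(X, mu):
    lam = np.linalg.eigvalsh(X)          # eigenvalues of the symmetric matrix X
    return mu * logsumexp(lam / mu)      # numerically stable log-sum-exp

rng = np.random.default_rng(1)
P = rng.standard_normal((6, 6))
X = (P + P.T) / 2
lam1 = np.linalg.eigvalsh(X)[-1]         # largest eigenvalue lambda_1(X)
for mu in (1.0, 0.1, 0.01):
    print(mu, S_mu(X, mu) - lam1)        # gap shrinks toward 0 as mu decreases
```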

As illustrated above, the function S_µ(X) approximates λ_1(X) as µ → 0. Now, let us consider the following smoothed optimization problem:

    min_{x ∈ Q} S_µ(C + A(x)).   (1.5)

Note that for any µ > 0, the gradient mapping ∇S_µ(·) is globally Lipschitz continuous. Nesterov suggested using a gradient-based numerical method [9] to solve problem (1.5).

For a given matrix X ∈ S^m, we define its spectral radius by

    ρ(X) := max_{1 ≤ i ≤ m} |λ_i(X)| = max{ λ_1(X), −λ_m(X) }.   (1.6)

Let ϕ(x) := ρ(C + A(x)). Another problem discussed in Nesterov's paper [11] is

    min_{x ∈ Q} ϕ(x)   (1.7)

with C ⪰ 0. Nesterov constructed the smoothing function ϕ_p(x) = F_p(C + A(x)), x ∈ Q, where

    F_p(X) = [ (1/2) ⟨X^{2p}, I⟩ ]^{1/p}.   (1.8)

Nesterov considered the smoothed problem

    min_{x ∈ Q} ϕ_p(x)   (1.9)

and used the method of [8] to solve (1.9).

In this thesis, we shall extend Nesterov's approach to the following two problems. Let φ(x) be the sum of the κ largest eigenvalues of C + A(x), i.e.

    φ(x) = Σ_{i=1}^κ λ_i(C + A(x)).   (1.10)

We shall first consider in this thesis the following nonsmooth problem:

    min_{x ∈ Q} φ(x).   (1.11)

Clearly, if κ = 1, then (1.11) reduces to problem (1.2).

Let λ_[κ](X) be the κth largest absolute value of the eigenvalues of X; that is, the absolute values of the eigenvalues sorted in nonincreasing order satisfy

    λ_[1](X) ≥ λ_[2](X) ≥ ... ≥ λ_[κ](X) ≥ ... ≥ λ_[m](X).

We define the following function:

    ψ(x) = Σ_{i=1}^κ λ_[i](C + A(x)).   (1.12)

The second problem that we shall consider in this thesis is

    min_{x ∈ Q} ψ(x),   (1.13)

which is a generalization of (1.7).

We shall construct smoothing functions for problems (1.11) and (1.13), respectively. The gradients of these smoothing functions satisfy a global Lipschitz condition, which makes it possible to apply Nesterov's algorithm [9] to the smoothed problems.

Nesterov's method solves optimization problems of the form

    min_{x ∈ Q} θ(x),   (1.14)

where θ(·) is a nonsmooth convex function. Our goal is to find an ε-solution x̄ ∈ Q, i.e.

    θ(x̄) − θ* ≤ ε,   (1.15)

where θ* = min_{x ∈ Q} θ(x). We denote by θ_µ(·) a smoothing function for θ(·). The smoothing function θ_µ(·) satisfies the inequality

    θ_µ(x) ≤ θ(x) ≤ θ_µ(x) + µR,   (1.16)

where R is a constant associated with the specified smoothing function θ_µ(x) (its definition is given in Chapter 2). Nesterov proved the following inequality [9, Theorem 2]:

    θ(x̄) − θ* ≤ θ_µ(x̄) − θ_µ* + µR,

where θ_µ* = min_{x ∈ Q} θ_µ(x). Let µ = µ(ε) = ε/(2R) and suppose

    θ_µ(x̄) − θ_µ* ≤ ε/2.   (1.17)

Then θ(x̄) − θ* ≤ ε. Nesterov's algorithm improves the bound on the number of iterations to O(1/ε), while traditional algorithms need O(1/ε²) iterations.

The remaining part of this thesis is organized as follows. We discuss the properties of the smoothing functions for approximating the sum of the κ largest eigenvalues of a parametric affine operator in Chapter 2 and the sum of the κ largest absolute values of eigenvalues in Chapter 3. Computational results are given in Chapter 4.

Chapter 2

Properties of the Smoothing Function for the Sum of the κ Largest Eigenvalues

2.1 The Smoothing Function for the Sum of the κ Largest Components

In her thesis [12], Shi discussed the smoothing function for the sum of the κ largest components of a vector. For x ∈ R^n we denote by x_[κ] the κth largest component of x; that is, the components sorted in nonincreasing order satisfy x_[1] ≥ x_[2] ≥ ... ≥ x_[κ] ≥ ... ≥ x_[n]. Define

    f_κ(x) = Σ_{i=1}^κ x_[i].   (2.1)

Denote by Q_κ the convex set in R^n

    Q_κ = { v ∈ R^n : Σ_{i=1}^n v_i = κ, 0 ≤ v_i ≤ 1, i = 1, 2, ..., n }.   (2.2)

Let

    p(z) = { z ln z,   z ∈ (0, 1],
             0,        z = 0, }   (2.3)

and

    r(v) = Σ_{i=1}^n p(v_i) + Σ_{i=1}^n p(1 − v_i) + R,   v ∈ Q_κ,   (2.4)

where

    R := n ln n − κ ln κ − (n − κ) ln(n − κ),   (2.5)

which is the maximal value of r(v). The smoothing function for f_κ(·) is f_κ^µ(·) : R^n → R, defined by

    f_κ^µ(x) := max { x^T v − µ r(v) :  Σ_{i=1}^n v_i = κ,  0 ≤ v_i ≤ 1,  i = 1, ..., n }.   (2.6)

Shi also provided the optimal solution to problem (2.6):

    v_i(µ, x) = 1 / ( 1 + e^{(α(µ,x) − x_i)/µ} ),   i = 1, ..., n,   (2.7)

where α = α(µ, x) satisfies

    Σ_{i=1}^n 1 / ( 1 + e^{(α(µ,x) − x_i)/µ} ) = κ.   (2.8)

In order to state Lemma 2.1 of [12], we need the γ function defined by

    γ_i(µ, x) := e^{(α(µ,x) − x_i)/µ} / [ µ ( 1 + e^{(α(µ,x) − x_i)/µ} )² ],   i = 1, ..., n.   (2.9)

Without causing any confusion, we write γ_i := γ_i(µ, x) for i = 1, ..., n, and γ = (γ_1, ..., γ_n)^T.
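For concreteness, a hedged Python sketch of evaluating v(µ, x) and f_κ^µ(x): since the left-hand side of (2.8) is strictly decreasing in α, α(µ, x) can be found by bisection. The helper names v_of and f_mu, the bracketing interval and the iteration count are choices of this sketch, not Shi's implementation.

```python
# Sketch of the maximizer (2.7)-(2.8) and of f_kappa^mu in (2.6), assuming 0 < kappa < n.
import numpy as np

def v_of(mu, x, kappa):
    v = lambda a: 1.0 / (1.0 + np.exp(np.clip((a - x) / mu, -700.0, 700.0)))  # (2.7)
    lo, hi = x.min() - 50.0 * mu, x.max() + 50.0 * mu   # sum(v) ~ n at lo and ~ 0 at hi
    for _ in range(200):                                # bisection on (2.8)
        alpha = 0.5 * (lo + hi)
        if v(alpha).sum() > kappa:
            lo = alpha
        else:
            hi = alpha
    return v(0.5 * (lo + hi))

def f_mu(mu, x, kappa):
    n = x.size
    vv = v_of(mu, x, kappa)
    p = lambda z: np.where(z > 0, z * np.log(np.clip(z, 1e-300, None)), 0.0)  # p(z) = z ln z
    R = n * np.log(n) - kappa * np.log(kappa) - (n - kappa) * np.log(n - kappa)
    return x @ vv - mu * (np.sum(p(vv) + p(1.0 - vv)) + R)                    # x^T v - mu*r(v)

x = np.array([3.0, 1.0, 0.5, -2.0])
print(f_mu(0.01, x, 2))   # close to f_2(x) = 3 + 1 = 4 (within mu*R, by Theorem 2.3 below)
```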

Lemma 2.1. The optimal solution v(µ, x) to problem (2.6) is continuously differentiable on R_{++} × R^n, with

    ∇_x v(µ, x) = Diag(γ_1, ..., γ_n) − (1 / Σ_{k=1}^n γ_k) γ γ^T.   (2.10)

Proof. From (2.7), for each i = 1, ..., n,

    v_i(µ, x) ( 1 + e^{(α(µ,x) − x_i)/µ} ) = 1.   (2.11)

Taking derivatives with respect to x on both sides of (2.11), we have

    (∇_x v(µ, x))_i = [ e^{(α(µ,x) − x_i)/µ} / ( µ ( 1 + e^{(α(µ,x) − x_i)/µ} )² ) ] ( e_i − ∇_x α(µ, x) ) = γ_i ( e_i − ∇_x α(µ, x) ),   (2.12)

where e_i ∈ R^n is the vector whose ith entry is 1 and whose other entries are all zeros. From (2.8), we have

    ∇_x α(µ, x) = (1 / Σ_{k=1}^n γ_k) Σ_{i=1}^n γ_i e_i = (1 / Σ_{k=1}^n γ_k) (γ_1, ..., γ_n)^T.   (2.13)

From (2.12) and (2.13), we obtain

    (∇_x v(µ, x))_i = γ_i e_i − ( γ_i / Σ_{k=1}^n γ_k ) (γ_1, ..., γ_n)^T.   (2.14)

Lemma 2.2. The γ_i, i = 1, ..., n, given by (2.9) satisfy the bounds

    0 ≤ γ_i ≤ 1/(4µ).   (2.15)

Proof. Denote p_i(µ, x) = e^{(α(µ,x) − x_i)/µ}, i = 1, ..., n. Then γ_i ≥ 0 and

    γ_i = (1/µ) · p_i / (1 + p_i)² = (1/µ) [ 1/4 − ( (p_i − 1) / (2(1 + p_i)) )² ] ≤ 1/(4µ).   (2.16)

The following theorem from [12] describes some properties of f_κ^µ(x).

Theorem 2.3. For µ > 0 and x ∈ R^n, the function f_κ^µ(x) has the following properties:
1. f_κ^µ(x) is convex;
2. f_κ^µ(x) is continuously differentiable;
3. f_κ^µ(x) ≤ f_κ(x) ≤ f_κ^µ(x) + µR.

From the above theorem, for each µ > 0, f_κ^µ(x) is continuously differentiable. According to [12], the gradient of f_κ^µ(x) is the optimal solution to problem (2.6), i.e. v(µ, x). Therefore the gradient and Hessian of f_κ^µ(x) for any µ > 0 are given by

    ∇f_κ^µ(x) = v(µ, x);   (2.17)
    ∇²f_κ^µ(x) = ∇_x v(µ, x).   (2.18)

Up to now we have reviewed some of Shi's results; a quick numerical check of the gradient formula (2.17) is sketched below.
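For illustration only, a finite-difference check of (2.17), reusing the hypothetical helpers v_of and f_mu from the sketch earlier in this section:

```python
# Finite-difference check that grad f_kappa^mu(x) = v(mu, x), as stated in (2.17).
import numpy as np

mu, kappa = 0.05, 2
x = np.array([1.5, 0.2, -0.3, 0.9])
g = v_of(mu, x, kappa)                       # claimed gradient
h = 1e-6
g_fd = np.array([(f_mu(mu, x + h * e, kappa) - f_mu(mu, x - h * e, kappa)) / (2 * h)
                 for e in np.eye(x.size)])
print(np.max(np.abs(g - g_fd)))              # small, limited by the finite-difference step
```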

Based on these results, we provide an estimate of ∇²f_κ^µ(x) in the following theorem.

Theorem 2.4. For µ > 0, we have the following conclusions on ∇²f_κ^µ(x):

1. For every h ∈ R^n,

    0 ≤ ⟨h, ∇²f_κ^µ(x) h⟩ ≤ (1/(4µ)) ‖h‖²_2.   (2.19)

2. Let (∇²f_κ^µ(x))_{ij} denote the (i, j)th entry of ∇²f_κ^µ(x). Then

    0 ≤ (∇²f_κ^µ(x))_{ii} ≤ 1/(4µ);   (2.20)
    −1/(4µ) ≤ (∇²f_κ^µ(x))_{ij} ≤ 0   for i ≠ j.   (2.21)

Proof. First we prove part 1. Since f_κ^µ(x) is convex, its Hessian ∇²f_κ^µ(x) is positive semidefinite, i.e.

    ⟨h, ∇²f_κ^µ(x) h⟩ ≥ 0.

By Lemma 2.1,

    ⟨h, ∇²f_κ^µ(x) h⟩ = ⟨h, ∇_x v(µ, x) h⟩ = Σ_{i=1}^n γ_i h_i² − (γ^T h)² / Σ_{k=1}^n γ_k
                      ≤ Σ_{i=1}^n γ_i h_i² ≤ (1/(4µ)) Σ_{i=1}^n h_i² = (1/(4µ)) ‖h‖²_2.   (2.22)

Now let us prove part 2. For any 1 ≤ i, j ≤ n with i ≠ j,

    (∇²f_κ^µ(x))_{ii} = (∇_x v(µ, x))_{ii} = γ_i − γ_i² / Σ_{k=1}^n γ_k = γ_i ( 1 − γ_i / Σ_{k=1}^n γ_k ).   (2.23)

According to Lemma 2.2, we have 0 ≤ γ_i ≤ 1/(4µ), and

    0 ≤ 1 − γ_i / Σ_{k=1}^n γ_k ≤ 1.

Therefore,

    0 ≤ (∇²f_κ^µ(x))_{ii} ≤ 1/(4µ).   (2.24)

On the other hand,

    (∇²f_κ^µ(x))_{ij} = (∇_x v(µ, x))_{ij} = −γ_i γ_j / Σ_{k=1}^n γ_k = −γ_i · ( γ_j / Σ_{j=1}^n γ_j ).   (2.25)

Since

    0 ≤ γ_j / Σ_{j=1}^n γ_j ≤ 1,   (2.26)

we have

    −1/(4µ) ≤ (∇²f_κ^µ(x))_{ij} ≤ 0,   i ≠ j.   (2.27)

Remark. Nesterov proved in [9, Theorem 1] that for problem (1.14), if the smoothing function θ_µ(·) (µ > 0) is continuously differentiable, then its gradient ∇θ_µ(x) = A*v_µ(x) is Lipschitz continuous with Lipschitz constant

    L_µ = (1/(µ σ_2)) ‖A‖²_{1,2}.

As for the smoothing function f_κ^µ(·), σ_2 = 4 and A is the identity operator, so Theorem 2.4 may also be derived from [9, Theorem 1]. Here we have provided a direct proof.

2.2 Spectral Functions

A function F on the space of m-by-m real symmetric matrices is called spectral if it depends only on the eigenvalues of its argument; spectral functions are thus symmetric functions of the eigenvalues. A symmetric function is a function that is unchanged by any permutation of its variables. In this thesis, we are interested in functions F of a symmetric matrix argument that are invariant under orthogonal similarity transformations [6]:

    F(U^T A U) = F(A),   U ∈ O, A ∈ S^m,   (2.28)

where O is the set of orthogonal matrices. Every such function can be decomposed as F(A) = (f ∘ λ)(A), where λ is the map that gives the eigenvalues of the matrix A and f is a symmetric function. We call such functions F spectral functions because they depend only on the spectrum of the operator A. Therefore, we can regard a spectral function as the composition of a symmetric function and the eigenvalue function.

In order to state some preliminary results, we give the following definition. For each X ∈ S^m, define the set of orthonormal eigenvector matrices of X by

    O_X := { P ∈ O : P^T X P = Diag[λ(X)] }.

We now recall the formula for the gradient of a differentiable spectral function [6].

Proposition 2.5. Let f be a symmetric function from R^n to R and X ∈ S^n. Then the following hold:

(a) (f ∘ λ) is differentiable at the point X if and only if f is differentiable at the point λ(X). In this case the gradient of (f ∘ λ) at X is given by

    ∇(f ∘ λ)(X) = U Diag[∇f(λ(X))] U^T,   U ∈ O_X.   (2.29)

(b) (f ∘ λ) is continuously differentiable at the point X if and only if f is continuously differentiable at the point λ(X).

Lewis and Sendov [7, Theorems 3.3 and 4.2] proved the following proposition, which gives a formula for the Hessian of a spectral function.

Proposition 2.6. Let f : R^n → R be symmetric. Then for any X ∈ S^n, (f ∘ λ) is twice (continuously) differentiable at X if and only if f is twice (continuously) differentiable at λ(X). Moreover, in this case the Hessian of the spectral function at X is

    ∇²(f ∘ λ)(X)[H] = U ( Diag[ ∇²f(λ(X)) diag[H̃] ] + C(λ(X)) ∘ H̃ ) U^T,   H ∈ S^n,   (2.30)

where U is any orthogonal matrix in O_X and H̃ = U^T H U. The matrix C(ω) ∈ R^{n×n} in Proposition 2.6 is defined as follows:

    (C(ω))_{ij} := { 0,                                            if i = j,
                     (∇²f(ω))_{ii} − (∇²f(ω))_{ij},                if i ≠ j and ω_i = ω_j,
                     [ (∇f(ω))_i − (∇f(ω))_j ] / (ω_i − ω_j),      otherwise. }   (2.31)

2.3 Smoothing Functions for φ(x)

Consider φ(x) in (1.10); clearly, it is a composite function:

    φ(x) = (f_κ ∘ λ)(C + A(x)),   (2.32)

where f_κ(x) is given in (2.1). In [12], Shi investigated the smoothing function f_κ^µ(x) as an approximation of f_κ(x). So it is natural to consider the composite function

    φ_µ(x) = (f_κ^µ ∘ λ)(C + A(x)).   (2.33)

According to the properties of f_κ^µ(·), for µ > 0,

    φ_µ(x) ≤ φ(x) ≤ φ_µ(x) + µR,   x ∈ Q.   (2.34)

Since f_κ^µ(·) is a symmetric function, (f_κ^µ ∘ λ) is the composition of the symmetric function f_κ^µ(·) : R^m → R and the eigenvalue function λ(·) : S^m → R^m. Hence (f_κ^µ ∘ λ) is a spectral function and, consequently, twice continuously differentiable. We now prove that the composition φ_µ(x) is twice continuously differentiable and provide an estimate of the Lipschitz constant of the gradient of φ_µ(x).

First, we derive the gradient of φ_µ(x). For µ > 0 and any h ∈ R^n,

    φ_µ(x + h) − φ_µ(x) = f_κ^µ(λ(C + A(x + h))) − f_κ^µ(λ(C + A(x)))
                        = f_κ^µ(λ(C + A(x) + A(h))) − f_κ^µ(λ(C + A(x)))
                        = ⟨∇(f_κ^µ ∘ λ)(C + A(x)), A(h)⟩ + O(‖h‖²)   (2.35)
                        = ⟨A*( ∇(f_κ^µ ∘ λ)(C + A(x)) ), h⟩ + O(‖h‖²),

where A*D = (⟨A_1, D⟩, ..., ⟨A_n, D⟩)^T. Thus we have the following proposition.

Proposition 2.7. For µ > 0, φ_µ(·) is continuously differentiable with its gradient given by

    ∇φ_µ(x) = A*( ∇(f_κ^µ ∘ λ)(C + A(x)) ).   (2.36)
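Combining (2.29) with (2.17) and (2.36) gives a simple recipe for evaluating φ_µ and its gradient numerically. The sketch below reuses the hypothetical NumPy helpers A_op, A_adj, v_of and f_mu from the earlier sketches; it is illustrative only, not the implementation used in Chapter 4.

```python
# Sketch of phi_mu(x) = (f_kappa^mu ∘ lambda)(C + A(x)) and of (2.36):
# grad phi_mu(x) = A*( U Diag[v(mu, lambda(X))] U^T ) with X = C + A(x) and U in O_X.
import numpy as np

def grad_spectral(X, mu, kappa):
    """Gradient (2.29) of f_kappa^mu ∘ lambda at the symmetric matrix X."""
    lam, U = np.linalg.eigh(X)      # any consistent eigenvalue ordering works here,
    g = v_of(mu, lam, kappa)        # because f_kappa^mu is a symmetric function
    return U @ np.diag(g) @ U.T

def phi_mu(x, C, A_list, mu, kappa):
    return f_mu(mu, np.linalg.eigvalsh(C + A_op(x, A_list)), kappa)

def grad_phi(x, C, A_list, mu, kappa):
    return A_adj(grad_spectral(C + A_op(x, A_list), mu, kappa), A_list)
```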

Next, we shall consider the Hessian of φ_µ(x). First we define C_µ(ω) ∈ R^{n×n}, for ω ∈ R^n, as

    (C_µ(ω))_{ij} := { 0,                                                    if i = j,
                       (∇²f_κ^µ(ω))_{ii} − (∇²f_κ^µ(ω))_{ij},                if i ≠ j and ω_i = ω_j,
                       [ (∇f_κ^µ(ω))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j),      otherwise. }   (2.37)

Proposition 2.8. For µ > 0, φ_µ(x) is twice continuously differentiable with its Hessian given by

    ∇²φ_µ(x)[h] = A*( ∇²(f_κ^µ ∘ λ)(C + A(x))[H] ),   h ∈ R^n,   (2.38)

where H := A(h).

Proof. For µ > 0 and any h ∈ R^n,

    ⟨∇φ_µ(x + h) − ∇φ_µ(x), h⟩
      = ⟨A*( ∇(f_κ^µ ∘ λ)(C + A(x + h)) ) − A*( ∇(f_κ^µ ∘ λ)(C + A(x)) ), h⟩
      = ⟨A*( ∇(f_κ^µ ∘ λ)(C + A(x + h)) − ∇(f_κ^µ ∘ λ)(C + A(x)) ), h⟩
      = ⟨∇(f_κ^µ ∘ λ)(C + A(x + h)) − ∇(f_κ^µ ∘ λ)(C + A(x)), A(h)⟩   (2.39)
      = ⟨∇(f_κ^µ ∘ λ)(C + A(x) + A(h)) − ∇(f_κ^µ ∘ λ)(C + A(x)), A(h)⟩
      = ⟨∇²(f_κ^µ ∘ λ)(C + A(x))[A(h)], A(h)⟩ + o(‖h‖²)
      = ⟨A*( ∇²(f_κ^µ ∘ λ)(C + A(x))[H] ), h⟩ + o(‖h‖²),

which shows that (2.38) holds.

In order to estimate ∇²φ_µ(x), we need an estimate of C_µ(ω) defined in (2.37). The following lemma is motivated by Lewis and Sendov [7].

Lemma 2.9. For ω ∈ R^n with ω_i ≠ ω_j, i ≠ j, i, j = 1, ..., n, there exist ξ, η ∈ R^n such that

    [ (∇f_κ^µ(ω))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j) = (∇²f_κ^µ(ξ))_{ii} − (∇²f_κ^µ(η))_{ij}.   (2.40)

Proof. For each ω ∈ R^n, we define two vectors ω̄, ω̃ ∈ R^n coordinatewise as follows:

    ω̄_p = { ω_p,  p ≠ i,          ω̃_p = { ω_p,  p ≠ i, j,
             ω_j,  p = i, }                 ω_j,  p = i,
                                            ω_i,  p = j. }   (2.41)

By the mean value theorem,

    [ (∇f_κ^µ(ω))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j)
      = [ (∇f_κ^µ(ω))_i − (∇f_κ^µ(ω̄))_i + (∇f_κ^µ(ω̄))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j)
      = [ (ω_i − ω_j)(∇²f_κ^µ(ξ))_{ii} + (∇f_κ^µ(ω̄))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j)
      = (∇²f_κ^µ(ξ))_{ii} + [ (∇f_κ^µ(ω̄))_i − (∇f_κ^µ(ω̃))_i + (∇f_κ^µ(ω̃))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j)   (2.42)
      = (∇²f_κ^µ(ξ))_{ii} + [ (ω_j − ω_i)(∇²f_κ^µ(η))_{ij} + (∇f_κ^µ(ω̃))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j)
      = (∇²f_κ^µ(ξ))_{ii} − (∇²f_κ^µ(η))_{ij} + [ (∇f_κ^µ(ω̃))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j),

where ξ is a vector between ω and ω̄, and η is a vector between ω̄ and ω̃.

We next consider the term [ (∇f_κ^µ(ω̃))_i − (∇f_κ^µ(ω))_j ] / (ω_i − ω_j). By the definitions,

    (∇f_κ^µ(ω̃))_i = ∂f_κ^µ(ω̃)/∂ω_i   and   (∇f_κ^µ(ω))_j = ∂f_κ^µ(ω)/∂ω_j.   (2.43)

Since f_κ^µ(·) is a symmetric function and ω̃ is obtained from ω by swapping the ith and jth coordinates,

    ∂f_κ^µ(ω̃)/∂ω_i = ∂f_κ^µ(ω)/∂ω_j.   (2.44)

Consequently,

    (∇f_κ^µ(ω̃))_i − (∇f_κ^µ(ω))_j = 0.

Therefore (2.40) holds.

Lemma 2.10. Every entry of C_µ(ω) satisfies

    0 ≤ (C_µ(ω))_{ij} ≤ 1/(2µ).   (2.45)

Proof. By (2.37) and Lemma 2.9, for i ≠ j,

    (C_µ(ω))_{ij} = { (∇²f_κ^µ(ω))_{ii} − (∇²f_κ^µ(ω))_{ij},   if ω_i = ω_j,
                      (∇²f_κ^µ(ξ))_{ii} − (∇²f_κ^µ(η))_{ij},   if ω_i ≠ ω_j. }   (2.46)

For any x, y ∈ R^n, by Theorem 2.4,

    (∇²f_κ^µ(x))_{ii} − (∇²f_κ^µ(y))_{ij} ≤ max (∇²f_κ^µ(x))_{ii} − min (∇²f_κ^µ(y))_{ij} = 1/(4µ) + 1/(4µ) = 1/(2µ);   (2.47)

    (∇²f_κ^µ(x))_{ii} − (∇²f_κ^µ(y))_{ij} ≥ min (∇²f_κ^µ(x))_{ii} − max (∇²f_κ^µ(y))_{ij} = 0.   (2.48)

Taking (x, y) = (ω, ω) in the first case of (2.46) and (x, y) = (ξ, η) in the second case, we obtain 0 ≤ (C_µ(ω))_{ij} ≤ 1/(2µ) for i ≠ j; the diagonal entries of C_µ(ω) are zero, so (2.45) holds.

Next, we estimate the Lipschitz constant of ∇φ_µ(x). For any h ∈ R^n, we have

    ⟨h, ∇²φ_µ(x)[h]⟩ = ⟨h, A*( ∇²(f_κ^µ ∘ λ)(C + A(x))[H] )⟩ = ⟨H, ∇²(f_κ^µ ∘ λ)(C + A(x))[H]⟩,   (2.49)

with H = A(h), and

    ∇²(f_κ^µ ∘ λ)(C + A(x))[H] = U ( Diag[ ∇²f_κ^µ(λ(C + A(x))) diag[H̃] ] + C_µ(λ(C + A(x))) ∘ H̃ ) U^T,   (2.50)

where U ∈ O_{C+A(x)} and H̃ = U^T H U. Therefore

    ⟨h, ∇²φ_µ(x)[h]⟩ = diag(H̃)^T ∇²f_κ^µ(λ(C + A(x))) diag(H̃) + Σ_{i≠j} ( C_µ(λ(C + A(x))) )_{ij} H̃_{ij}²
                     ≤ (1/(4µ)) Σ_{i=1}^m H̃_{ii}² + (1/(2µ)) Σ_{i≠j} H̃_{ij}²
                     ≤ (1/(2µ)) ⟨H̃, H̃⟩
                     = (1/(2µ)) ⟨H, H⟩ = (1/(2µ)) ⟨A(h), A(h)⟩
                     = (1/(2µ)) Σ_{i,j=1}^n h_i h_j ⟨A_i, A_j⟩
                     = (1/(2µ)) h^T G h ≤ (1/(2µ)) ‖G‖ ‖h‖²,   (2.51)

where G ∈ S^n with (G)_{ij} = ⟨A_i, A_j⟩ and

    ‖G‖ := max_{1 ≤ i ≤ n} |λ_i(G)| = max{ λ_1(G), −λ_n(G) }.

Thus the Lipschitz constant for the gradient of the smoothing function φ_µ(x) is

    L = ‖G‖ / (2µ).   (2.52)

In particular, if we take µ := µ(ε) = ε/(2R), where R = m ln m − κ ln κ − (m − κ) ln(m − κ) is the constant in (2.34), then we have

    L = R ‖G‖ / ε.   (2.53)
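In code, the estimate (2.52) only requires the Gram matrix G; a small sketch in the same assumed NumPy setting as before (the helper name lipschitz_phi is an assumption):

```python
# Sketch of the Lipschitz-constant estimate (2.52): G_ij = <A_i, A_j> and L = ||G|| / (2*mu),
# where ||G|| is the spectral norm of the symmetric Gram matrix G.
import numpy as np

def lipschitz_phi(A_list, mu):
    n = len(A_list)
    G = np.array([[np.tensordot(A_list[i], A_list[j]) for j in range(n)] for i in range(n)])
    return np.linalg.norm(G, 2) / (2.0 * mu)
```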

Chapter 3

Smoothing Functions for the Sum of the κ Largest Absolute Values of Eigenvalues

In order to solve problem (1.13), we need the concept of singular values. Using some properties of singular values, we can obtain a computable smoothing function for ψ(x).

3.1 Preliminaries

Analogous to the eigenvalue decomposition of a symmetric matrix, a non-symmetric matrix has a singular value decomposition. Let A ∈ M_{n,m}, and without loss of generality assume n ≤ m. Then there exist orthogonal matrices U ∈ M_{n,n} and V ∈ M_{m,m} such that A has the following singular value decomposition (SVD):

    U^T A V = [ Σ(A)  0 ],   (3.1)

where Σ(A) = diag(σ_1(A), ..., σ_n(A)) and σ_1(A) ≥ σ_2(A) ≥ ... ≥ σ_n(A) ≥ 0 are the singular values of A [5].

The singular values have the following property:

    A A^T = U Σ²(A) U^T = U diag[ σ_1(A)², ..., σ_n(A)² ] U^T.   (3.2)

In particular, if A is symmetric, we have

    A A^T = A² = P diag[ λ_1(A)², ..., λ_n(A)² ] P^T,   (3.3)

where λ_1(A), ..., λ_n(A) are the eigenvalues of A. Comparing (3.2) and (3.3), the singular values are the square roots of the corresponding eigenvalues of A A^T, which means that the singular values of a symmetric matrix A are exactly the absolute values of its eigenvalues, σ_i(A) = |λ_i(A)|, i = 1, ..., n (after reordering).

For any W ∈ S^{n+m}, we define

    Λ(W) = diag( λ_{(1)}(W), ..., λ_{(n)}(W), λ_{(n+m)}(W), ..., λ_{(n+1)}(W) ),   (3.4)

where {λ_{(i)}(W) : i = 1, ..., n + m} are the eigenvalues of W arranged in decreasing order. Note that the first n diagonal entries of Λ(W) are the n largest eigenvalues of W, arranged in decreasing order, while the last m diagonal entries of Λ(W) are the m smallest eigenvalues of W, arranged in increasing order. This arrangement will prove convenient shortly.

Define the linear operator Ξ : M_{n,m} → S^{m+n} by

    Ξ(B) = [ 0     B
             B^T   0 ],   B ∈ M_{n,m}.   (3.5)

For A ∈ S^m, let it have the eigenvalue decomposition

    V^T A V = Σ(A),   V ∈ O_A.   (3.6)

The following result is derived from Golub and Van Loan [5, Section 8.6].

Proposition 3.1. Suppose that A ∈ S^m has the eigenvalue decomposition (3.6). Then the matrix Ξ(A) has the spectral decomposition

    Ξ(A) = Q Λ(Ξ(A)) Q^T = Q [ Σ(A)     0
                               0     −Σ(A) ] Q^T,   (3.7)

where Q ∈ O^{2m} is defined by

    Q = (1/√2) [ V    V
                 V   −V ];   (3.8)

in particular, the eigenvalues of Ξ(A) are ±σ_i(A), i = 1, ..., m.

3.2 Smoothing Functions for ψ(x)

From Proposition 3.1, we know that

    ψ(x) = Σ_{i=1}^κ λ_i( Ξ(C + A(x)) ).   (3.9)

Similar to Chapter 2, we define the smoothing function for ψ(x) by

    ψ_µ(x) = (f_κ^µ ∘ λ)( Ξ(C + A(x)) ),   (3.10)

where f_κ^µ(·) is the smoothing function defined in Chapter 2 and κ ≤ m. For x ∈ R^n, we define the linear operator Γ : R^n → S^{2m} as follows:

    Γ(x) = Ξ(A(x)).   (3.11)

Let us consider the adjoint of Γ. For Y ∈ S^{2m} partitioned as in the List of Notation,

    ⟨Γ(x), Y⟩ = ⟨Ξ(A(x)), Y⟩ = ⟨ [ 0, A(x); (A(x))^T, 0 ], [ Y_1, Y_2; Y_2^T, Y_3 ] ⟩
              = ⟨A(x), Y_2⟩ + ⟨(A(x))^T, Y_2^T⟩ = 2 ⟨A(x), Y_2⟩ = 2 Σ_{i=1}^n x_i ⟨A_i, Y_2⟩.   (3.12)

Thus,

    Γ*(Y) = ( 2⟨A_1, Y_2⟩, ..., 2⟨A_n, Y_2⟩ )^T.
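A short numerical illustration of Proposition 3.1 and of the pair (Γ, Γ*), in the same assumed NumPy setting as the earlier sketches (Xi, Gamma and Gamma_adj are helper names of the sketch):

```python
# Sketch: Xi(B) = [[0, B], [B^T, 0]]; for symmetric A its eigenvalues are +/- sigma_i(A).
# Gamma(x) = Xi(A(x)) and Gamma*(Y) = 2*(<A_1, Y_2>, ..., <A_n, Y_2>)^T, as in (3.12).
import numpy as np

def Xi(B):
    n, m = B.shape
    return np.block([[np.zeros((n, n)), B], [B.T, np.zeros((m, m))]])

def Gamma(x, A_list):
    return Xi(A_op(x, A_list))            # reuses A_op from the earlier sketch

def Gamma_adj(Y, A_list):
    m = A_list[0].shape[0]
    Y2 = Y[:m, m:]                        # upper-right m x m block of Y
    return 2.0 * np.array([np.tensordot(Ai, Y2) for Ai in A_list])

# Check Proposition 3.1: eigenvalues of Xi(A) are the singular values of A with both signs.
rng = np.random.default_rng(3)
A = (lambda P: (P + P.T) / 2)(rng.standard_normal((4, 4)))
sig = np.linalg.svd(A, compute_uv=False)
print(np.allclose(np.sort(np.concatenate([sig, -sig])), np.linalg.eigvalsh(Xi(A))))  # True
```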

The smoothing function ψ_µ(x) can thus be written as

    ψ_µ(x) = (f_κ^µ ∘ λ)( Ξ(C) + Γ(x) ).   (3.13)

Since we already know that, for any µ > 0, (f_κ^µ ∘ λ) is twice continuously differentiable, ψ_µ(·) is also twice continuously differentiable. We now derive its gradient and Hessian.

Proposition 3.2. For µ > 0, ψ_µ(·) is continuously differentiable with its gradient given by

    ∇ψ_µ(x) = Γ*( ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)) ).   (3.14)

Proof. For µ > 0 and any h ∈ R^n,

    ψ_µ(x + h) − ψ_µ(x) = f_κ^µ(λ(Ξ(C) + Γ(x + h))) − f_κ^µ(λ(Ξ(C) + Γ(x)))
                        = f_κ^µ(λ(Ξ(C) + Γ(x) + Γ(h))) − f_κ^µ(λ(Ξ(C) + Γ(x)))
                        = ⟨∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)), Γ(h)⟩ + O(‖h‖²)   (3.15)
                        = ⟨Γ*( ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)) ), h⟩ + O(‖h‖²),

which agrees with (3.14).

Proposition 3.3. For µ > 0, ψ_µ(·) is twice continuously differentiable with its Hessian given by

    ∇²ψ_µ(x)[h] = Γ*( ∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[H] ),   h ∈ R^n,   (3.16)

where H := Γ(h).

Proof. For µ > 0 and any h ∈ R^n,

    ⟨∇ψ_µ(x + h) − ∇ψ_µ(x), h⟩
      = ⟨Γ*( ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x + h)) ) − Γ*( ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)) ), h⟩
      = ⟨Γ*( ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x + h)) − ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)) ), h⟩
      = ⟨∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x + h)) − ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)), Γ(h)⟩   (3.17)
      = ⟨∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x) + Γ(h)) − ∇(f_κ^µ ∘ λ)(Ξ(C) + Γ(x)), Γ(h)⟩
      = ⟨∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[Γ(h)], Γ(h)⟩ + o(‖h‖²)
      = ⟨Γ*( ∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[H] ), h⟩ + o(‖h‖²),

which shows that (3.16) holds.

Next, we estimate the Lipschitz constant of ∇ψ_µ(x). For any h ∈ R^n, we have

    ⟨h, ∇²ψ_µ(x)[h]⟩ = ⟨h, Γ*( ∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[H] )⟩ = ⟨H, ∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[H]⟩,   (3.18)

with H = Γ(h) and

    ∇²(f_κ^µ ∘ λ)(Ξ(C) + Γ(x))[H] = U ( Diag[ ∇²f_κ^µ(λ(Ξ(C) + Γ(x))) diag[H̃] ] + C_µ(λ(Ξ(C) + Γ(x))) ∘ H̃ ) U^T,   (3.19)

where U ∈ O_{Ξ(C)+Γ(x)} and H̃ = U^T H U. Therefore

    ⟨h, ∇²ψ_µ(x)[h]⟩ = diag(H̃)^T ∇²f_κ^µ(λ(Ξ(C) + Γ(x))) diag(H̃) + Σ_{i≠j} ( C_µ(λ(Ξ(C) + Γ(x))) )_{ij} H̃_{ij}²
                     ≤ (1/(4µ)) Σ_i H̃_{ii}² + (1/(2µ)) Σ_{i≠j} H̃_{ij}²
                     ≤ (1/(2µ)) ⟨H̃, H̃⟩ = (1/(2µ)) ⟨Γ(h), Γ(h)⟩
                     = (1/µ) ⟨A(h), A(h)⟩ ≤ (1/µ) ‖G‖ ‖h‖².   (3.20)

Thus, the Lipschitz constant for the gradient of the smoothing function ψ_µ(·) is

    L = ‖G‖ / µ.   (3.21)

In particular, if we take µ := µ(ε) = ε/(2R̄), where R̄ := 2m ln(2m) − κ ln κ − (2m − κ) ln(2m − κ) is the constant corresponding to (2.5) for the 2m eigenvalues of Ξ(C + A(x)), then we have

    L = 2 R̄ ‖G‖ / ε.   (3.22)
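Putting the pieces of this chapter together, ψ_µ and its gradient (3.14) can be evaluated exactly as in Chapter 2, with Γ in place of A. A hedged sketch reusing the hypothetical helpers f_mu, v_of, Xi, Gamma and Gamma_adj from the earlier sketches:

```python
# Sketch of psi_mu(x) = (f_kappa^mu ∘ lambda)(Xi(C) + Gamma(x)) and of its gradient (3.14).
import numpy as np

def psi_mu(x, C, A_list, mu, kappa):
    return f_mu(mu, np.linalg.eigvalsh(Xi(C) + Gamma(x, A_list)), kappa)

def grad_psi(x, C, A_list, mu, kappa):
    lam, U = np.linalg.eigh(Xi(C) + Gamma(x, A_list))
    g = v_of(mu, lam, kappa)              # gradient of f_kappa^mu at lambda(Xi(C)+Gamma(x))
    return Gamma_adj(U @ np.diag(g) @ U.T, A_list)
```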

Chapter 4

Numerical Experiments

Nesterov's method solves the following optimization problem:

    min_{x ∈ Q} f(x),   (4.1)

where f is a convex function whose gradient satisfies the Lipschitz condition

    ‖∇f(x) − ∇f(y)‖ ≤ L ‖x − y‖,   x, y ∈ R^n,   (4.2)

where L > 0 is a constant. Consider a prox-function d(x) on Q. We assume that d(x) is continuous and strongly convex on Q with convexity parameter σ > 0. Denote by x_0 the prox-center

    x_0 = arg min_{x ∈ Q} d(x).   (4.3)

Without loss of generality we assume d(x_0) = 0. Thus, for any x ∈ Q, we have

    d(x) ≥ (1/2) σ ‖x − x_0‖².   (4.4)

Define

    T_Q(x) = arg min_y { ⟨∇f(x), y − x⟩ + (L/2) ‖y − x‖² : y ∈ Q }.   (4.5)

Now we are ready to state Nesterov's smoothing algorithm [9]. For k ≥ 0 do:

1. Compute f(x_k) and ∇f(x_k).
2. Find y_k = T_Q(x_k).
3. Find z_k = arg min_x { (L/σ) d(x) + Σ_{i=0}^k ((i+1)/2) [ f(x_i) + ⟨∇f(x_i), x − x_i⟩ ] : x ∈ Q }.
4. Set x_{k+1} = (2/(k+3)) z_k + ((k+1)/(k+3)) y_k.

Nesterov proved the following theorem [9, Theorem 3].

Theorem 4.1. Let the sequences {x_k}_{k=0}^∞ and {y_k}_{k=0}^∞ be generated by the above algorithm. Then for any k ≥ 0, we have

    ((k+1)(k+2)/4) f(y_k) ≤ min_x { (L/σ) d(x) + Σ_{i=0}^k ((i+1)/2) [ f(x_i) + ⟨∇f(x_i), x − x_i⟩ ] : x ∈ Q }.   (4.6)

Therefore,

    f(y_k) − f(x*) ≤ 4 L d(x*) / ( σ (k+1)(k+2) ),   (4.7)

where x* is an optimal solution to problem (4.1).

By applying Bregman's distance, Nesterov provided a modified algorithm in which the step T_Q is replaced by a Bregman projection V_Q. The Bregman distance was introduced in [3] as an extension of the usual metric discrepancy measure (x, y) ↦ ‖x − y‖². With d(·) the prox-function above, the Bregman distance between two points z and x is defined as

    ξ(z, x) = d(x) − d(z) − ⟨∇d(z), x − z⟩,   x, z ∈ Q.   (4.8)

The Bregman distance satisfies

    ξ(z, x) ≥ (1/2) σ ‖x − z‖².   (4.9)

Define the Bregman projection associated with h as

    V_Q(z, h) = arg min_x { h^T (x − z) + ξ(z, x) : x ∈ Q }.   (4.10)

The following is Nesterov's algorithm via the Bregman distance [9]:

1. Choose y_0 = z_0 = arg min_x { (L/σ) d(x) + (1/2) [ f(x_0) + ⟨∇f(x_0), x − x_0⟩ ] : x ∈ Q }.
2. For k ≥ 0 iterate:
   a. Find z_k = arg min_x { (L/σ) d(x) + Σ_{i=0}^k ((i+1)/2) [ f(x_i) + ⟨∇f(x_i), x − x_i⟩ ] : x ∈ Q }.
   b. Set τ_k = 2/(k+3) and x_{k+1} = τ_k z_k + (1 − τ_k) y_k.
   c. Find x̂_{k+1} = V_Q( z_k, (σ/L) τ_k ∇f(x_{k+1}) ).
   d. Set y_{k+1} = τ_k x̂_{k+1} + (1 − τ_k) y_k.

Nesterov pointed out that Theorem 4.1 also holds for this method. The computational results shown in the next section are obtained by applying the above algorithm.

4.1 Computational Results

First, we solve the smoothed problem min_{x ∈ Q} φ_µ(x), where the closed convex set Q is the simplex

    Q = { x ∈ R^n : Σ_{i=1}^n x_i = 1, 0 ≤ x_i ≤ 1, i = 1, 2, ..., n }.

Let P_i, i = 0, 1, ..., n, be m-by-m random matrices with entries in [−1, 1], and define C, A_1, A_2, ..., A_n by C = (1/2)(P_0 + P_0^T) and A_i = (1/2)(P_i + P_i^T), i = 1, ..., n. Let ε be the desired accuracy, i.e. φ(x̄) − φ* ≤ ε. The constant R is defined in (2.5), and we use the entropy prox-function

    d(x) = ln n + Σ_{i=1}^n x_i ln x_i.

According to Theorem 4.1, we have the iteration bound

    N := ⌈ (2/ε) √( 2 R ln n · ‖G‖ ) ⌉.

We implemented the method in MATLAB. Tables 4.1, 4.2 and 4.3 report the numerical results for this problem.
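For illustration, the following is a hedged Python sketch of the Bregman-distance scheme above, specialized to this simplex Q and the entropy prox-function d(x) (for which σ = 1, ξ(z, x) is the Kullback-Leibler divergence, and both z_k and V_Q reduce to softmax-type formulas). Our experiments were run in MATLAB; the names below (nesterov_bregman_simplex, grad_f, softmax) are assumptions of the sketch.

```python
# Sketch of the Bregman-distance algorithm on Q = {x >= 0, sum_i x_i = 1} with
# d(x) = ln n + sum_i x_i ln x_i, so that
#   z_k = argmin{ L d(x) + <s_k, x> } = softmax(-s_k / L),  s_k = sum_{i<=k} (i+1)/2 grad f(x_i),
#   V_Q(z, h) = argmin{ <h, x - z> + KL(x, z) } has entries proportional to z_i * exp(-h_i).
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def nesterov_bregman_simplex(grad_f, n, L, num_iter):
    x = np.full(n, 1.0 / n)                 # x_0: the prox-center of d on the simplex
    s = np.zeros(n)                         # running sum of (i+1)/2 * grad f(x_i)
    y = None
    for k in range(num_iter):
        s += 0.5 * (k + 1) * grad_f(x)      # add the term for x_k
        z = softmax(-s / L)                 # step (a); at k = 0 this is also y_0 = z_0
        if y is None:
            y = z
        tau = 2.0 / (k + 3)
        x = tau * z + (1.0 - tau) * y       # step (b): x_{k+1}
        x_hat = z * np.exp(-(tau / L) * grad_f(x))
        x_hat /= x_hat.sum()                # step (c): Bregman projection V_Q
        y = tau * x_hat + (1.0 - tau) * y   # step (d): y_{k+1}
    return y

# Usage sketch: y = nesterov_bregman_simplex(lambda x: grad_phi(x, C, A_list, mu, kappa),
#                                            n, lipschitz_phi(A_list, mu), N)
```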

Table 4.1: Results for m = 10, n = 4 and different κ (columns: κ, ε, N, time, starting value, optimal value).

Table 4.2: Results for m = 30, n = 4 and different κ (columns: κ, ε, N, time, starting value, optimal value).

Table 4.3: Results for n = 4, κ = 4, ε = 0.01 and different m (columns: m, N, time, starting value, optimal value).

Next, we solve the smoothed problem min_{x ∈ Q} ψ_µ(x). The parameters Q, C and A_i, i = 1, ..., n, are defined as for φ_µ(x). The maximal number of iterations is

    N = ⌈ (4/ε) √( R̄ ln n · ‖G‖ ) ⌉.

We again implemented the method in MATLAB. Table 4.4 contains the computational results.

Table 4.4: Results for n = 4, κ = 3 and different m (columns: m, ε, N, time, starting value, optimal value).

4.2 Conclusions

Note that the two problems we have discussed can both be converted into semidefinite programming problems, so one may consider second-order approaches such as Newton's method to solve them. However, for high-dimensional problems the efficiency of such approaches is not satisfactory. In this chapter we have reported some experiments, but our final goal is to solve high-dimensional problems, and in our experiments the number of iterations is already very high. In order to achieve the required accuracy efficiently, we make the following observations on possible improvements of the smoothing algorithm.

Firstly, we can reduce the time spent on the eigenvalue decomposition in each iteration. In our algorithm we only need the first κ eigenvalues. In many cases κ ≪ m, so we can try a partial eigenvalue decomposition: in each decomposition step we compute only the l largest eigenvalues, where l is a heuristic parameter with κ < l ≤ m. For instance, let l be κ + 3, κ + 5, or 2κ, 3κ. We then compare the lth largest eigenvalue with the κth largest eigenvalue. If the lth eigenvalue is far smaller than the κth eigenvalue, the eigenvalues beyond the lth contribute little to the optimal value v^T λ in each iteration, which means that the corresponding v_i's are very small for i > l. When we apply (2.29) to compute the gradient of φ_µ(x) or ψ_µ(x), these v_i's likewise contribute little to the gradient matrix for i > l. From the above discussion, a partial eigenvalue decomposition will not cause a great loss of accuracy. How to choose a proper l in each decomposition step, and how to estimate the resulting loss, need to be considered in further research.

Secondly, Nesterov has provided an excessive gap algorithm [10], which is based on upper and lower bounds of the optimal value. In further research, we may try to apply this algorithm to our problems and reduce the total number of iterations.

Finally, in our analysis we proved that the gradients of the smoothing functions for the two classes of problems are Lipschitz continuous and derived the Lipschitz constants. The maximal iteration number depends on the Lipschitz constant, which in turn depends on the norm of the matrix G: if ‖G‖ becomes larger, the Lipschitz constant becomes larger, and thereby the iteration number becomes larger. In Nesterov's analysis the constant is stated in terms of the operator norm ‖A‖_{1,2} (cf. the remark after Theorem 2.4), while in our result ‖G‖ is the spectral norm of the Gram matrix G, whose properties still need investigation.

Bibliography

[1] H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Bregman monotone optimization algorithms, SIAM Journal on Control and Optimization, vol. 42.
[2] A. Ben-Tal and M. Teboulle, A smoothing technique for nondifferentiable optimization problems, in Optimization: Fifth French-German Conference, Lecture Notes in Mathematics 1405, Springer-Verlag, New York, 1989.
[3] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, U.S.S.R. Computational Mathematics and Mathematical Physics, vol. 7.
[4] X. Chen, H. D. Qi, L. Qi, and K. Teo, Smooth convex approximation to the maximum eigenvalue function, to appear in Journal of Global Optimization.
[5] G. H. Golub and C. F. Van Loan, Matrix Computations, third edition, The Johns Hopkins University Press, Baltimore, 1996.
[6] A. S. Lewis, Derivatives of spectral functions, Mathematics of Operations Research, 21 (1996).
[7] A. S. Lewis and H. S. Sendov, Twice differentiable spectral functions, SIAM Journal on Matrix Analysis and Applications, 22 (2001).
[8] Yu. Nesterov, Unconstrained convex minimization in relative scale, CORE Discussion Paper 2003/96.
[9] Yu. Nesterov, Smooth minimization of non-smooth functions, CORE Discussion Paper 2003/12, February 2003. Accepted by Mathematical Programming.
[10] Yu. Nesterov, Excessive gap technique in non-smooth convex minimization, CORE Discussion Paper 2003/35, May 2003. Accepted by SIAM Journal on Optimization.
[11] Yu. Nesterov, Smoothing technique and its applications in semidefinite optimization, CORE Discussion Paper 2004/73, October 2004. Accepted by SIAM Journal on Optimization.
[12] S. Shi, Smooth convex approximation and its applications, Master's thesis, National University of Singapore.
[13] D. Sun and J. Sun, Strong semismoothness of eigenvalues of symmetric matrices and its application to inverse eigenvalue problems, SIAM Journal on Numerical Analysis, 40 (2003).

Name: Yu Qi
Degree: Master of Science
Department: Mathematics
Thesis Title: Smoothing Approximations for Two Classes of Convex Eigenvalue Optimization Problems

Abstract

In this thesis, we consider two problems: minimizing the sum of the κ largest eigenvalues and minimizing the sum of the κ largest absolute values of eigenvalues of a parametric linear operator. In order to apply Nesterov's smoothing algorithm to these two problems, we construct two computable smoothing functions whose gradients are Lipschitz continuous. This construction is based on Shi's thesis [12] and new techniques introduced in this thesis. Numerical results on the performance of Nesterov's smoothing algorithm are also reported.

Keywords: Smoothing functions, Lipschitz constants, Smoothing algorithm, Eigenvalue problems.

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS

YU QI

NATIONAL UNIVERSITY OF SINGAPORE
2005


More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of

More information

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization 7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

More information

3 Orthogonal Vectors and Matrices

3 Orthogonal Vectors and Matrices 3 Orthogonal Vectors and Matrices The linear algebra portion of this course focuses on three matrix factorizations: QR factorization, singular valued decomposition (SVD), and LU factorization The first

More information

ISOMETRIES OF R n KEITH CONRAD

ISOMETRIES OF R n KEITH CONRAD ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x

More information

18.06 Problem Set 4 Solution Due Wednesday, 11 March 2009 at 4 pm in 2-106. Total: 175 points.

18.06 Problem Set 4 Solution Due Wednesday, 11 March 2009 at 4 pm in 2-106. Total: 175 points. 806 Problem Set 4 Solution Due Wednesday, March 2009 at 4 pm in 2-06 Total: 75 points Problem : A is an m n matrix of rank r Suppose there are right-hand-sides b for which A x = b has no solution (a) What

More information

Lecture 5 Principal Minors and the Hessian

Lecture 5 Principal Minors and the Hessian Lecture 5 Principal Minors and the Hessian Eivind Eriksen BI Norwegian School of Management Department of Economics October 01, 2010 Eivind Eriksen (BI Dept of Economics) Lecture 5 Principal Minors and

More information

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively. Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

More information

α = u v. In other words, Orthogonal Projection

α = u v. In other words, Orthogonal Projection Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v

More information

Constrained Least Squares

Constrained Least Squares Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,

More information

Pacific Journal of Mathematics

Pacific Journal of Mathematics Pacific Journal of Mathematics GLOBAL EXISTENCE AND DECREASING PROPERTY OF BOUNDARY VALUES OF SOLUTIONS TO PARABOLIC EQUATIONS WITH NONLOCAL BOUNDARY CONDITIONS Sangwon Seo Volume 193 No. 1 March 2000

More information

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix

October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,

More information

5. Orthogonal matrices

5. Orthogonal matrices L Vandenberghe EE133A (Spring 2016) 5 Orthogonal matrices matrices with orthonormal columns orthogonal matrices tall matrices with orthonormal columns complex matrices with orthonormal columns 5-1 Orthonormal

More information

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu

More information

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

More information

MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.

MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. Vector space A vector space is a set V equipped with two operations, addition V V (x,y) x + y V and scalar

More information

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear

More information

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances

Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances Torgerson s Classical MDS derivation: 1: Determining Coordinates from Euclidean Distances It is possible to construct a matrix X of Cartesian coordinates of points in Euclidean space when we know the Euclidean

More information

Chapter 7. Lyapunov Exponents. 7.1 Maps

Chapter 7. Lyapunov Exponents. 7.1 Maps Chapter 7 Lyapunov Exponents Lyapunov exponents tell us the rate of divergence of nearby trajectories a key component of chaotic dynamics. For one dimensional maps the exponent is simply the average

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

An Accelerated First-Order Method for Solving SOS Relaxations of Unconstrained Polynomial Optimization Problems

An Accelerated First-Order Method for Solving SOS Relaxations of Unconstrained Polynomial Optimization Problems An Accelerated First-Order Method for Solving SOS Relaxations of Unconstrained Polynomial Optimization Problems Dimitris Bertsimas, Robert M. Freund, and Xu Andy Sun December 2011 Abstract Our interest

More information