A Perfect Example for the BFGS Method


Yu-Hong Dai

Abstract. Consider the BFGS quasi-Newton method applied to a general nonconvex function that has continuous second derivatives. This paper aims to construct a four-dimensional example such that the BFGS method need not converge. The example is perfect in the following sense: (a) all the stepsizes are exactly equal to one, and the unit stepsize can also be accepted by various line searches including the Wolfe line search and the Armijo line search; (b) the objective function is strongly convex along each search direction, although it is not convex itself, and the unit stepsize is the unique minimizer of each line search function, so the example also applies to the global line search and to the line search that always picks the first local minimizer; (c) the objective function is polynomial and hence infinitely continuously differentiable. If we relax the convexity requirement (b) on the line search function, we are able to construct a relatively simple polynomial example.

Keywords. unconstrained optimization, quasi-Newton method, nonconvex function, global convergence.

1 Introduction

Consider the unconstrained optimization problem

$$ \min f(x), \quad x \in \mathbb{R}^n, \qquad (1.1) $$

where $f$ is a general nonconvex function that has continuous second derivatives. The quasi-Newton method is a class of well-known and efficient methods for solving (1.1). It takes the iterative scheme

$$ x_{k+1} = x_k - \alpha_k H_k g_k, \qquad (1.2) $$

(This work was partially supported by Chinese NSF grants and the CAS grant kjcx-yw-s7-03. Author address: State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 2719, Beijing, P.R. China.)

where $x_1$ is a starting point, $H_k$ is some approximation to the inverse Hessian $[\nabla^2 f(x_k)]^{-1}$, $g_k = \nabla f(x_k)$, and $\alpha_k$ is a stepsize obtained in some way. Defining the vectors

$$ \delta_k = x_{k+1} - x_k, \qquad \gamma_k = g_{k+1} - g_k, $$

the quasi-Newton method asks the next approximation matrix $H_{k+1}$ to satisfy the secant equation

$$ H_{k+1}\gamma_k = \delta_k. \qquad (1.3) $$

The similarity of this to the identity $[\nabla^2 f(x_{k+1})]^{-1}\gamma_k = \delta_k$, which holds exactly in the quadratic case, enables the quasi-Newton method to be superlinearly convergent (see Dennis and Moré [5], for example) and makes it very attractive in practical optimization. The first quasi-Newton method dates back to Davidon [4] and Fletcher and Powell [9]. The DFP method updates the approximation matrix $H_k$ to $H_{k+1}$ by the formula

$$ H_{k+1} = H_k - \frac{H_k\gamma_k\gamma_k^T H_k}{\gamma_k^T H_k \gamma_k} + \frac{\delta_k\delta_k^T}{\delta_k^T \gamma_k}. \qquad (1.4) $$

Nowadays, the most efficient quasi-Newton method is perhaps the BFGS method, which was proposed independently by Broyden [2], Fletcher [7], Goldfarb [10], and Shanno [24]. The matrix $H_{k+1}$ in the BFGS method is updated by

$$ H_{k+1} = H_k - \frac{\delta_k\gamma_k^T H_k + H_k\gamma_k\delta_k^T}{\delta_k^T\gamma_k} + \left(1 + \frac{\gamma_k^T H_k\gamma_k}{\delta_k^T\gamma_k}\right)\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \qquad (1.5) $$

To pass the positive definiteness of the matrix $H_k$ on to $H_{k+1}$, practical quasi-Newton algorithms make use of the Wolfe line search to calculate the stepsize $\alpha_k$, which ensures that a positive curvature is found at each iteration; namely, $\delta_k^T\gamma_k > 0$. More exactly, defining the one-dimensional line search function

$$ \Psi_k(\alpha) = f(x_k - \alpha H_k g_k), \quad \alpha \ge 0, \qquad (1.6) $$

the Wolfe line search consists in finding a stepsize $\alpha_k$ such that

$$ \Psi_k(\alpha_k) \le \Psi_k(0) + \mu\,\Psi_k'(0)\,\alpha_k \qquad (1.7) $$

and

$$ \Psi_k'(\alpha_k) \ge \eta\,\Psi_k'(0), \qquad (1.8) $$

where $\mu$ and $\eta$ are constants with $0 < \mu < \eta < 1$.

There has been a large quantity of research devoted to the global convergence of the quasi-Newton method (see Yuan [28] and Sun and Yuan [26], for example). Specifically, Powell [18] showed that the DFP method with exact line searches is globally convergent for uniformly convex functions. In another paper [20], Powell established the global convergence of the BFGS method with the Wolfe line search for uniformly convex functions.
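To make the update and the line search tests concrete, here is a minimal sketch (not from the paper; it assumes NumPy and the commonly used parameter values $\mu = 10^{-4}$, $\eta = 0.9$, which the paper does not prescribe):

```python
# A minimal sketch of the inverse-Hessian BFGS update (1.5) and the Wolfe
# conditions (1.7)-(1.8); mu and eta below are typical values, an assumption.
import numpy as np

def bfgs_update(H, delta, gamma):
    """Return H_{k+1} from H_k via the BFGS formula (1.5)."""
    dg = delta @ gamma                 # curvature delta_k^T gamma_k (> 0 under Wolfe)
    Hg = H @ gamma
    return (H
            - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
            + (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg)

def wolfe_ok(psi0, dpsi0, psi_a, dpsi_a, alpha, mu=1e-4, eta=0.9):
    """Check (1.7) and (1.8) for Psi_k at the trial stepsize alpha."""
    sufficient_decrease = psi_a <= psi0 + mu * dpsi0 * alpha   # (1.7)
    curvature = dpsi_a >= eta * dpsi0                          # (1.8)
    return sufficient_decrease and curvature
```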

This result was extended by Byrd, Nocedal and Yuan [1] to the whole Broyden convex family of methods except the DFP method. Therefore it is natural to ask the following question: does the DFP method with the Wolfe line search converge globally for uniformly convex functions? On the other hand, since the BFGS method with the Wolfe line search works well for both convex and nonconvex functions, we might ask another question: does the BFGS method with the Wolfe line search converge globally for general functions? The difficulty and importance of these two convergence problems have been addressed in many places, including Nocedal [15], Fletcher [8], and Yuan [28].

Recent studies provide a negative answer to the convergence problem of the BFGS method for nonconvex functions. As a matter of fact, in an early paper [21], which analyzes the convergence properties of the conjugate gradient method, Powell mentioned that the BFGS method need not converge if the line search can pick any local minimizer of the line search function $\Psi_k(\alpha)$. After further studies on the two-dimensional example in [21], Dai [3] presented an example with six cycling points and showed by the example that the BFGS method with the Wolfe line search may fail for nonconvex functions. Later, Mascarenhas [14] constructed a three-dimensional counter-example such that the BFGS method does not converge if the line search picks the global minimizer of the function $\Psi_k(\alpha)$. It should be noted that the stepsize in the counter-example of [14] also satisfies the Wolfe line search conditions. However, neither example is such that the stepsize is the first local minimizer of the line search function $\Psi_k(\alpha)$.

Surprisingly enough, if there are only two variables, and if the stepsize is chosen to be the first local minimizer of $\Psi_k(\alpha)$, namely,

$$ \alpha_k = \arg\min\,\{\alpha > 0 : \alpha \text{ is a local minimizer of } \Psi_k(\alpha)\}, \qquad (1.9) $$

Powell [22] established the global convergence of the BFGS method for general twice continuously differentiable functions. Powell's proof makes use of the principle of contradiction and is quite sophisticated. Assuming the nonconvergence of the method and the relation $\liminf_{k\to\infty}\|g_k\| > 0$, Powell showed that the limit points of the BFGS path

$$ P = \{x : x \text{ lies on the line segment connecting } x_k \text{ and } x_{k+1} \text{ for some } k \ge 1\} $$

(which is exactly the same as the Polak-Ribière conjugate gradient path if $n = 2$ and $g_{k+1}^T\delta_k = 0$ for all $k$) form some line segment $L$. The use of the specific line search (1.9) is such that the objective function is monotonically decreasing on the line segment $L_k$ that connects $x_k$ and $x_{k+1}$. It follows that $f(x) \ge \lim_{k\to\infty} f(x_k)$ for all $x \in L$. Consequently, the line segment $L_k$ cannot cross $L$, and all the points $x_k$ with large indices lie on the same side of $L$. Furthermore, Powell showed that, starting from around one end of the line segment $L$, the BFGS path cannot get any closer to the other end of $L$, and finally obtained a contradiction. Intuitively, it is difficult to extend Powell's result to the case when $n \ge 3$. This is because, even if it has been shown that the BFGS path

tends to some limit line segment $L$, each line segment $L_k$ could turn around $L$ when $n \ge 3$, making a similar analysis of the BFGS path complicated and nearly impossible.

The purpose of this paper is to construct a four-dimensional counter-example for the BFGS method with the following features:

(a) All the stepsizes in the example are exactly equal to one; namely,

$$ \alpha_k \equiv 1. \qquad (1.10) $$

For the search direction defined by the BFGS update in the example, a unit stepsize satisfies various line search conditions, including the Wolfe conditions and the Armijo conditions;

(b) The objective function is strongly convex along each search direction; namely, the line search function $\Psi_k(\alpha)$ is strongly convex for all $k$, although the objective function is not convex itself. The unit stepsize is the unique minimizer of $\Psi_k(\alpha)$. Hence the example also applies to the global line search and to the specific line search (1.9);

(c) The objective function is polynomial and hence infinitely continuously differentiable.

On the other hand, the objective function in our example is linear in the third and fourth variables and thus has no local minimizer. However, the iterations generated by the BFGS method tend to a non-stationary point. As seen from Section 3.4, the construction of the example is quite complicated. To make the line search function $\Psi_k(\alpha)$ strongly convex, a polynomial of degree 38 is introduced. Nevertheless, if we relax the convexity requirement on $\Psi_k(\alpha)$, namely (b) above, it is possible to construct a relatively simple polynomial example of low degree (see Section 3.3).

The construction of our examples can be divided into four procedures:

(1) Prefix the special forms of the steps $\{\delta_k\}$ and gradients $\{g_k\}$, leaving several parameters to be determined later. In this case, once $x_1$ is given, the whole sequence $\{x_k\}$ is fixed. As seen in Section 2.1, our steps $\{\delta_k\}$ and gradients $\{g_k\}$ are asked to possess certain symmetric and cyclic properties and to push the iterations $\{x_k\}$ toward the eight vertices of a regular octagon. This is very helpful in simplifying the construction of the examples, and we can focus our attention on the choice of the parameter $t$, which accounts for the decay of the last two components of $\delta_k$ and the first two components of $g_k$.

(2) To enable the BFGS method to generate those prefixed steps, investigate the consistency conditions on the steps $\{\delta_k\}$ and gradients $\{g_k\}$. With the prefixed forms of $\{\delta_k\}$ and $\{g_k\}$, we show in Section 2.2 that the unit stepsize is indispensable, whereas this stepsize is usually used as the first trial stepsize in implementations of quasi-Newton methods. Four more consistency conditions on $\{\delta_k\}$ and $\{g_k\}$ are obtained in Section 2.3 by expressing the vectors $H_{k+1}\gamma_k$, $H_{k+1}g_{k+1}$, $H_{k+1}g_{k+2}$ and $H_{k+1}g_{k+3}$ in terms of some $\delta_k$'s and $g_k$'s instead of considering the quasi-Newton

matrix $H_{k+1}$ itself.

(3) Choose the parameters so as to satisfy the consistency conditions and other necessary conditions on the objective function. As seen in Section 2.4, the first three consistency conditions correspond to an under-determined linear system. Substituting its general solution, obtained by Cramer's rule, into the fourth consistency condition, we are led to a nonlinear equation from which the exact value of the decay parameter $t$ can be obtained. By suitably choosing the other parameters, we can express all the quasi-Newton updating matrices, including the initial choice $H_1$, in Section 2.5.

(4) Construct a suitable objective function $f$ whose gradients take the preassigned values, namely $\nabla f(x_k) = g_k$ for all $k \ge 1$, and such that the line search has the desired properties. This is done in the whole of Section 3. To make full use of the symmetric and cyclic properties of the steps $\{\delta_k\}$ and gradients $\{g_k\}$, we carefully choose a special form of the objective function. In addition, the introduction of the element function $\phi$ in Section 3.4 helps us greatly to convexify the line search function and finally complete the perfect example that meets all the requirements (a), (b) and (c). Some concluding remarks are given in the last section.

2 Looking for Consistent Steps and Gradients

2.1 The Forms of Steps and Gradients

Consider the four-dimensional case. Inspired by [21], [3] and [14], we assume that the steps $\{\delta_k\}$ have the following form:

$$ \delta_1 = (\eta_1, \xi_1, \gamma_1, \tau_1)^T; \qquad \delta_{k+1} = M\,\delta_k \quad (k \ge 1), \qquad (2.1) $$

where $M$ is the $4 \times 4$ matrix defined by

$$ M = \begin{pmatrix} \cos\theta_1 & -\sin\theta_1 & & \\ \sin\theta_1 & \cos\theta_1 & & \\ & & t\cos\theta_2 & -t\sin\theta_2 \\ & & t\sin\theta_2 & t\cos\theta_2 \end{pmatrix}. \qquad (2.2) $$

In the above, $t$ is a parameter satisfying $0 < t < 1$; it accounts for the decay of the last two components of the steps $\{\delta_k\}$. The angles $\theta_1$ and $\theta_2$ are chosen to be rational multiples of $2\pi$, so that the corresponding rotations are periodic. Specifically, we choose in this paper

$$ \theta_1 = \frac{1}{4}\pi, \qquad \theta_2 = \frac{3}{4}\pi. \qquad (2.3) $$

Consequently, the first two components of the steps $\{\delta_k\}$ return to the same values after every eight iterations, and the last two components shrink by a factor of $t^8$ after every eight iterations. Further, we will see that the iterations $\{x_k\}$ tend to turn around the eight vertices of some regular octagon (see Section 3.1 for more details). Accordingly, the gradients $\{g_k\}$ are assumed to be of the form

$$ g_1 = (l_1, h_1, c_1, d_1)^T; \qquad g_{k+1} = P\,g_k \quad (k \ge 1), \qquad (2.4) $$

where

$$ P = \begin{pmatrix} t\cos\theta_1 & -t\sin\theta_1 & & \\ t\sin\theta_1 & t\cos\theta_1 & & \\ & & \cos\theta_2 & -\sin\theta_2 \\ & & \sin\theta_2 & \cos\theta_2 \end{pmatrix}. \qquad (2.5) $$
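As a concrete illustration of the recursions (2.1)-(2.5), the following sketch (not from the paper) builds $M$ and $P$ from the rotation blocks; the value $t = 0.5$ is an arbitrary stand-in for the exact decay parameter derived in Section 2.4:

```python
# A sketch of the step/gradient recursions (2.1)-(2.5); t = 0.5 is a placeholder
# for the exact decay parameter of Section 2.4.
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta1, theta2, t = np.pi / 4, 3 * np.pi / 4, 0.5
Z = np.zeros((2, 2))
M = np.block([[rot(theta1), Z], [Z, t * rot(theta2)]])  # delta_{k+1} = M delta_k, (2.2)
P = np.block([[t * rot(theta1), Z], [Z, rot(theta2)]])  # g_{k+1} = P g_k, (2.5)

delta = np.array([1.0, 0.0, 1.0, 0.0])                  # a generic delta_1
d = delta.copy()
for _ in range(8):                                      # eight iterations later ...
    d = M @ d
print(np.allclose(d[:2], delta[:2]))          # first two components recur
print(np.allclose(d[2:], t**8 * delta[2:]))   # last two shrink by the factor t^8
```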

Since $0 < t < 1$, we see that the first two components of $\{g_k\}$ vanish asymptotically, whereas the last two components of the gradients $\{g_k\}$ return to the same values after every eight iterations. It follows that none of the cluster points of the sequence generated by the BFGS method is a stationary point.

2.2 Unit Stepsizes

It is well known that the unit stepsize is usually used as the first trial stepsize in practical implementations of the BFGS method. Under suitable assumptions on $f$, the unit stepsize will be accepted by the line search as the iterates tend to the solution, and it enables superlinearly convergent steps (see [5], for example). In the following, we are going to show that if the line search satisfies

$$ g_{k+1}^T \delta_k = 0 \quad \text{for all } k \ge 1, \qquad (2.6) $$

and if the steps generated by the BFGS method have the form (2.1)-(2.2) and the gradients have the form (2.4)-(2.5), then the use of unit stepsizes is also indispensable.

To begin with, we see that the line search condition (2.6), the secant equation (1.3) and the definition of the search direction indicate that

$$ H_{k+1} g_{k+1} = -\alpha_{k+1}^{-1}\,\delta_{k+1} \qquad (2.7) $$

and

$$ \delta_{k+1}^T \gamma_k = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} \gamma_k = -\alpha_{k+1}\, g_{k+1}^T \delta_k = 0. \qquad (2.8) $$

The relation (2.8) is sometimes called the conjugacy condition in the context of nonlinear conjugate gradient methods. Multiplying the BFGS updating formula (1.5) by $g_{k+1}$ and using (2.6) gives

$$ H_{k+1} g_{k+1} = H_k g_{k+1} - \frac{\gamma_k^T H_k g_{k+1}}{\delta_k^T \gamma_k}\,\delta_k, $$

which with (2.7) gives

$$ H_k g_{k+1} = -\alpha_{k+1}^{-1}\,\delta_{k+1} + \frac{\gamma_k^T H_k g_{k+1}}{\delta_k^T \gamma_k}\,\delta_k. \qquad (2.9) $$

Multiplying (2.9) by $g_k^T$ and noticing that $g_k^T H_k g_{k+1} = 0$, we get that

$$ 0 = -\alpha_{k+1}^{-1}\, g_k^T \delta_{k+1} + \frac{\gamma_k^T H_k g_{k+1}}{\delta_k^T \gamma_k}\, g_k^T \delta_k = -\alpha_{k+1}^{-1}\, g_k^T \delta_{k+1} - \gamma_k^T H_k g_{k+1}, $$

where the last equality uses $\delta_k^T\gamma_k = -g_k^T\delta_k$.

Thus

$$ \gamma_k^T H_k g_{k+1} = -\alpha_{k+1}^{-1}\, g_k^T \delta_{k+1} = -\alpha_{k+1}^{-1}\, g_{k+1}^T \delta_{k+1}. $$

Hence, by (2.9),

$$ H_k g_{k+1} = -\alpha_{k+1}^{-1}\,\delta_{k+1} - \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T \delta_{k+1}}{\delta_k^T \gamma_k}\,\delta_k. \qquad (2.10) $$

It follows from (2.10) and (2.7) with $k$ replaced by $k-1$ that

$$ H_k \gamma_k = -\alpha_{k+1}^{-1}\,\delta_{k+1} + \left[\alpha_k^{-1} - \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T \delta_{k+1}}{\delta_k^T \gamma_k}\right]\delta_k. \qquad (2.11) $$

Further, substituting this into the BFGS updating formula (1.5) yields

$$ H_{k+1} = H_k + \alpha_{k+1}^{-1}\,\frac{\delta_k\delta_{k+1}^T + \delta_{k+1}\delta_k^T}{\delta_k^T\gamma_k} + \left[1 - \alpha_k^{-1} + \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T\delta_{k+1}}{\delta_k^T\gamma_k}\right]\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \qquad (2.12) $$

This new updating formula requires the quantity $\delta_{k+1}$, which depends on $H_{k+1}$ itself, and hence has theoretical meaning only.

Lemma 2.1. Assume that (2.6) holds. Then for all $k \ge 1$ and $i \ge 0$, the vector $H_k g_{k+i} + \alpha_{k+i}^{-1}\delta_{k+i}$ belongs to the subspace spanned by $\delta_k, \delta_{k+1}, \ldots, \delta_{k+i-1}$; namely,

$$ H_k g_{k+i} + \alpha_{k+i}^{-1}\,\delta_{k+i} \in \mathrm{Span}\{\delta_k, \delta_{k+1}, \ldots, \delta_{k+i-1}\}. \qquad (2.13) $$

Proof. For convenience, we write (2.12) as

$$ H_{k+1} = H_k + V(s_k, s_{k+1}), \qquad (2.14) $$

where $V(s_k, s_{k+1})$ denotes the rank-two matrix on the right-hand side of (2.12). Therefore we have, for all $i \ge 1$,

$$ H_{k+i} = H_k + \sum_{j=1}^{i} V(s_{k+j-1}, s_{k+j}). \qquad (2.15) $$

The statement follows by multiplying this relation by $g_{k+i}$ and using (2.7) with $k$ replaced by $k+i-1$, together with $g_{k+i}^T \delta_{k+i-1} = 0$. Q.E.D.

Lemma 2.2. To construct a desired example with $\mathrm{Det}(S_1) \neq 0$, we must have that $\alpha_k = 1$ for all $k \ge 4$.

Proof. Define the following matrices:

$$ G_k = [\,\gamma_{k-1}\ \ g_k\ \ g_{k+1}\ \ g_{k+2}\,], \qquad S_k = [\,\delta_{k-1}\ \ \delta_k\ \ \delta_{k+1}\ \ \delta_{k+2}\,]. $$

By $H_k\gamma_{k-1} = \delta_{k-1}$ and Lemma 2.1, we can get that

$$ \mathrm{Det}(H_k)\,\mathrm{Det}(G_k) = \mathrm{Det}\,[\,H_k\gamma_{k-1}\ \ H_kg_k\ \ H_kg_{k+1}\ \ H_kg_{k+2}\,] = \mathrm{Det}\,[\,\delta_{k-1}\ \ -\alpha_k^{-1}\delta_k\ \ -\alpha_{k+1}^{-1}\delta_{k+1}\ \ -\alpha_{k+2}^{-1}\delta_{k+2}\,] = -\alpha_k^{-1}\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\,\mathrm{Det}(S_k). \qquad (2.16) $$

Replacing $k$ with $k+1$ in the above yields

$$ \mathrm{Det}(H_{k+1})\,\mathrm{Det}(G_{k+1}) = -\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\alpha_{k+3}^{-1}\,\mathrm{Det}(S_{k+1}). \qquad (2.17) $$

On the other hand, due to the special forms of $\{g_k\}$ and $\{\delta_k\}$, we know that $G_{k+1} = P\,G_k$ and $S_{k+1} = M\,S_k$. Hence

$$ \mathrm{Det}(G_{k+1}) = \mathrm{Det}(P)\,\mathrm{Det}(G_k) = t^2\,\mathrm{Det}(G_k), \qquad \mathrm{Det}(S_{k+1}) = \mathrm{Det}(M)\,\mathrm{Det}(S_k) = t^2\,\mathrm{Det}(S_k). \qquad (2.18) $$

Due to the basic determinant relation of the BFGS update, (2.6) and (2.7), it is not difficult to see that

$$ \mathrm{Det}(H_{k+1}) = \frac{\delta_k^T H_k^{-1} \delta_k}{\delta_k^T \gamma_k}\,\mathrm{Det}(H_k) = \alpha_k\,\mathrm{Det}(H_k). \qquad (2.19) $$

If $\mathrm{Det}(S_1) \neq 0$, then (2.16) with $k = 1$ implies that $\mathrm{Det}(G_1) \neq 0$. Then, by (2.18),

$$ \mathrm{Det}(G_k) \neq 0, \quad \mathrm{Det}(S_k) \neq 0, \quad \text{for all } k \ge 1. \qquad (2.20) $$

Dividing (2.17) by (2.16) and using the above relations, we then obtain $\alpha_{k+3} = 1$. So the statement holds due to the arbitrariness of $k \ge 1$. Q.E.D.

Deleting the first finitely many iterations does not affect the example as a whole. Thus we will ask our counter-example to satisfy

$$ \mathrm{Det}(S_1) \neq 0 \qquad (2.21) $$

and

$$ \alpha_k \equiv 1. \qquad (2.22) $$

In this case, the updating formula (2.12) for $H_k$ can be simplified to

$$ H_{k+1} = H_k + \frac{\delta_k\delta_{k+1}^T + \delta_{k+1}\delta_k^T}{\delta_k^T\gamma_k} - \frac{g_{k+1}^T\delta_{k+1}}{g_k^T\delta_k}\cdot\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \qquad (2.23) $$
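A quick numerical check of the determinant relation (2.19) (a sketch, reusing bfgs_update from the earlier snippet; the random data here are illustrative, not the example's actual iterates):

```python
# Verifies det(H_{k+1}) = alpha_k det(H_k) under the line search condition (2.6).
import numpy as np
rng = np.random.default_rng(0)

A = rng.standard_normal((4, 4))
H = A @ A.T + 4 * np.eye(4)                  # a random SPD H_k
g = rng.standard_normal(4)                   # gradient g_k
alpha = 0.7                                  # any positive stepsize
delta = -alpha * H @ g                       # delta_k = -alpha_k H_k g_k

g_next = rng.standard_normal(4)
g_next -= (g_next @ delta) / (delta @ delta) * delta  # enforce g_{k+1}^T delta_k = 0
gamma = g_next - g                           # then delta_k^T gamma_k = -delta_k^T g_k > 0

H_next = bfgs_update(H, delta, gamma)
print(np.isclose(np.linalg.det(H_next), alpha * np.linalg.det(H)))  # True
```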

2.3 Consistency Conditions

Now we ask what other conditions, besides the requirement of unit stepsizes, have to be satisfied by the parameters in the definitions of $\{g_k\}$ and $\{\delta_k\}$ so that the steps $\{\delta_k;\ k \ge 1\}$ can be generated by the BFGS update. Our early idea was based on the observation from the updating formula (1.5) that $H_{k+1}$ is linear in $H_k$, and on the linear system

$$ H_{8j+9} = \mathrm{diag}(t^{-4}E,\ t^{4}E)\, H_{8j+1}\, \mathrm{diag}(t^{-4}E,\ t^{4}E), \qquad (2.24) $$

where $E$ is the two-dimensional identity matrix. However, this way seems to be quite complicated. Notice instead that the dimension is $n = 4$ and that, by Lemma 2.2, the assumption (2.21) implies (2.20). Then the matrix $H_{k+1}$ is uniquely determined by the equations giving $H_{k+1}\gamma_k$, $H_{k+1}g_{k+1}$, $H_{k+1}g_{k+2}$ and $H_{k+1}g_{k+3}$. As a matter of fact, we have that

$$ H_{k+1}\gamma_k = \delta_k \quad \text{(the secant equation)}, \qquad (2.25) $$

$$ H_{k+1}g_{k+1} = -\delta_{k+1} \quad \text{(by (2.7) and } \alpha_k \equiv 1\text{)}, \qquad (2.26) $$

$$ H_{k+1}g_{k+2} = -\delta_{k+2} + \frac{g_{k+2}^T\delta_{k+2}}{g_{k+1}^T\delta_{k+1}}\,\delta_{k+1} \quad \text{(by (2.10) and } \alpha_k \equiv 1\text{)}, \qquad (2.27) $$

$$ H_{k+1}g_{k+3} = -\delta_{k+3} + \left(\frac{g_{k+3}^T\delta_{k+3}}{g_{k+2}^T\delta_{k+2}} + \frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+2} - \left(\frac{g_{k+2}^T\delta_{k+2}}{g_{k+1}^T\delta_{k+1}}\right)\left(\frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+1}. \qquad (2.28) $$

The last equality is obtained by multiplying (2.23) by $g_{k+2}$, using (2.7), and finally replacing $k$ with $k+1$. Relations (2.25)-(2.28) provide a system of 16 equations, while the symmetric matrix $H_{k+1}$ has only 10 independent entries. How can we ensure that this linear system has a symmetric solution $H_{k+1}$? We have the following general lemma.

Lemma 2.3. Assume that $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ are two sets of $n$-dimensional linearly independent vectors. Then there exists a symmetric matrix $H \in \mathbb{R}^{n\times n}$ satisfying

$$ H u_i = v_i, \quad i = 1, 2, \ldots, n, \qquad (2.29) $$

if and only if

$$ u_i^T v_j = u_j^T v_i, \quad i, j = 1, 2, \ldots, n. \qquad (2.30) $$

Proof. The "only if" part: if $H = H^T$ satisfies (2.29), we have for all $i, j = 1, 2, \ldots, n$,

$$ u_i^T v_j = u_i^T H u_j = u_i^T H^T u_j = (H u_i)^T u_j = v_i^T u_j. $$

The "if" part: assume that (2.30) holds. Defining the matrices

$$ U = (u_1\ u_2\ \ldots\ u_n), \qquad V = (v_1\ v_2\ \ldots\ v_n), \qquad (2.31) $$

direct calculations show that

$$ U^T H U = U^T V = \begin{pmatrix} u_1^T v_1 & u_1^T v_2 & \cdots & u_1^T v_n \\ u_2^T v_1 & u_2^T v_2 & \cdots & u_2^T v_n \\ \vdots & \vdots & & \vdots \\ u_n^T v_1 & u_n^T v_2 & \cdots & u_n^T v_n \end{pmatrix} := A. \qquad (2.32) $$

By (2.30), $A$ is symmetric. So $H = U^{-T} A U^{-1}$ satisfies $H = H^T$ and (2.29). This completes our proof. Q.E.D.

By the above lemma, the following six conditions are sufficient for the linear system (2.25)-(2.28) to allow a symmetric solution matrix $H_{k+1}$:

$$ g_{k+1}^T (H_{k+1}\gamma_k) = (H_{k+1} g_{k+1})^T \gamma_k, \qquad g_{k+2}^T (H_{k+1}\gamma_k) = (H_{k+1} g_{k+2})^T \gamma_k, $$

$$ g_{k+3}^T (H_{k+1}\gamma_k) = (H_{k+1} g_{k+3})^T \gamma_k, \qquad g_{k+2}^T (H_{k+1}g_{k+1}) = (H_{k+1} g_{k+2})^T g_{k+1}, $$

$$ g_{k+3}^T (H_{k+1}g_{k+1}) = (H_{k+1} g_{k+3})^T g_{k+1}, \qquad g_{k+3}^T (H_{k+1}g_{k+2}) = (H_{k+1} g_{k+3})^T g_{k+2}. $$

Further, considering the whole sequence $\{H_{k+1};\ k \ge 1\}$ and combining the line search condition $g_{k+1}^T\delta_k = 0$, we can deduce the following four consistency conditions, which should be satisfied for all $k \ge 1$:

$$ g_{k+1}^T \delta_k = 0, \qquad (2.33) $$

$$ \delta_{k+1}^T \gamma_k = 0, \qquad (2.34) $$

$$ g_{k+2}^T \delta_k = -\delta_{k+2}^T \gamma_k, \qquad (2.35) $$

$$ g_{k+3}^T \delta_k = -\delta_{k+3}^T \gamma_k + \left(\frac{g_{k+3}^T\delta_{k+3}}{g_{k+2}^T\delta_{k+2}} + \frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+2}^T\gamma_k. \qquad (2.36) $$

The positive definiteness of the matrix $H_{k+1}$ will be considered further in Section 2.5.

2.4 Choosing the Parameters

In this subsection we focus on how to choose the decay parameter $t$ and suitable vectors $\delta_1$ and $g_1$ such that the consistency conditions (2.33), (2.34), (2.35) and (2.36) hold with $k = 1$. Due to the special structure of this example, it then follows that the consistency conditions hold for all $k$. Define $v = (\Delta_1, \Delta_2, \Delta_3, \Delta_4)^T$, where

$$ \Delta_1 = l_1\eta_1 + h_1\xi_1, \quad \Delta_2 = h_1\eta_1 - l_1\xi_1, \quad \Delta_3 = c_1\gamma_1 + d_1\tau_1, \quad \Delta_4 = d_1\gamma_1 - c_1\tau_1. \qquad (2.37) $$

As will be seen, the first three consistency conditions provide an under-determined linear system in $v$, from which we can get a general solution by Cramer's rule. The substitution of $v$ into the fourth condition (2.36), which is nonlinear, yields the desired value for the decay parameter $t$.
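Before choosing the parameters, a small numerical illustration of Lemma 2.3 (a sketch, not from the paper): when $A = U^TV$ is symmetric, $H = U^{-T}AU^{-1}$ is symmetric and maps each $u_i$ to $v_i$.

```python
# Illustrates Lemma 2.3: a symmetric H with H u_i = v_i exists iff U^T V is symmetric.
import numpy as np
rng = np.random.default_rng(1)

U = rng.standard_normal((4, 4))      # linearly independent u_1, ..., u_4 (columns)
S = rng.standard_normal((4, 4))
H_true = S + S.T                     # some symmetric matrix
V = H_true @ U                       # v_i = H u_i, so (2.30) holds automatically

A = U.T @ V
print(np.allclose(A, A.T))                            # the condition (2.30)
H = np.linalg.solve(U.T, A) @ np.linalg.inv(U)        # H = U^{-T} A U^{-1}
print(np.allclose(H, H.T), np.allclose(H @ U, V))     # symmetric, and H u_i = v_i
```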

First, the condition $\delta_1^T g_2 = \delta_1^T P g_1 = 0$, which is (2.33) with $k = 1$, asks that

$$ [\,t\cos\theta_1 \ \ \ -t\sin\theta_1 \ \ \ \cos\theta_2 \ \ \ -\sin\theta_2\,]\, v = 0. \qquad (2.38) $$

The condition $\delta_2^T\gamma_1 = \delta_1^T M^T(P - I)g_1 = \delta_1^T(tI - M^T)g_1 = 0$, which is (2.34) with $k = 1$, requires the vector $v$ to satisfy

$$ [\,t - \cos\theta_1 \ \ \ -\sin\theta_1 \ \ \ t(1 - \cos\theta_2) \ \ \ -t\sin\theta_2\,]\, v = 0. \qquad (2.39) $$

The requirement (2.35) with $k = 1$, that is,

$$ 0 = g_3^T\delta_1 + \delta_3^T\gamma_1 = \delta_1^T P^2 g_1 + \delta_1^T (M^T)^2 (P - I) g_1 = \delta_1^T \big[P^2 + tM^T - (M^T)^2\big] g_1, $$

yields the equation

$$ \big[\,t\cos\theta_1 + (t^2-1)\cos 2\theta_1 \ \ \ t\sin\theta_1 - (1+t^2)\sin 2\theta_1 \ \ \ t^2\cos\theta_2 + (1-t^2)\cos 2\theta_2 \ \ \ t^2\sin\theta_2 - (1+t^2)\sin 2\theta_2\,\big]\, v = 0. \qquad (2.40) $$

By (2.38), (2.39), (2.40) and the choices (2.3) of $\theta_1$ and $\theta_2$ (after multiplying each row by $\sqrt{2}$), we see that the $\Delta_i$'s must satisfy

$$ \begin{pmatrix} t & -t & -1 & -1 \\ \sqrt{2}\,t - 1 & -1 & (\sqrt{2}+1)\,t & -t \\ t & t - \sqrt{2}(1+t^2) & -t^2 & t^2 + \sqrt{2}(1+t^2) \end{pmatrix} v = 0. \qquad (2.41) $$

By Cramer's rule, to meet (2.41) we may choose the $\Delta_i$'s as follows:

$$ \begin{aligned} \Delta_1 &= \pm\big[(2+\sqrt{2})\,t^3 + (1+\sqrt{2})\,t + 1\big], \\ \Delta_2 &= \pm\big[(2+\sqrt{2})\,t^3 + (1+\sqrt{2})\,t - 1\big], \\ \Delta_3 &= \pm\big[-\sqrt{2}\,t^3 + 2t^2 + (1-\sqrt{2})\,t + 1\big], \\ \Delta_4 &= \pm\big[\sqrt{2}\,t^3 - 2t^2 + (1+\sqrt{2})\,t - 1\big]. \end{aligned} \qquad (2.42) $$

Since $g_1^T\delta_1 = \Delta_1 + \Delta_3 = \pm 2(t+1)(t^2+1)$ and $0 < t < 1$, we choose all the signs in (2.42) as minus, so that $g_1^T\delta_1 < 0$.

We now want to substitute the general solution into the fourth consistency condition (2.36). Noting that $\delta_3^T\gamma_1 = -g_3^T\delta_1$ and $g_4^T\delta_4 = t\, g_3^T\delta_3$, we know that (2.36) with $k = 1$ is equivalent to

$$ g_4^T\delta_1 + g_2^T\delta_4 - g_1^T\delta_4 + \left[\,t + \frac{g_4^T\delta_2}{g_2^T\delta_2}\right] g_3^T\delta_1 = 0. \qquad (2.43) $$

Further, using $g_2^T\delta_2 = t\,g_1^T\delta_1$, $g_4^T\delta_2 = t\,g_3^T\delta_1$ and $g_2^T\delta_4 = t\,g_1^T\delta_3$, (2.43) can be simplified to

$$ g_1^T\delta_1\,\big[\,g_4^T\delta_1 - g_1^T\delta_4 + t\,(g_3^T\delta_1 + g_1^T\delta_3)\,\big] + (g_3^T\delta_1)^2 = 0. \qquad (2.44) $$

Consequently, we obtain the following equation for the parameter $t$:

$$ (t+1)(t^2+1)\,\big[\,p^2(t) + t\,p(t) - q(t)\,\big] = 0, \qquad (2.45) $$

where $p(t)$ is an explicit quadratic polynomial and $q(t)$ an explicit cubic polynomial, both with coefficients involving $\sqrt{2}$. Further calculation shows that $p^2(t) + t\,p(t) - q(t)$ factors into the product of two quadratic polynomials in $t$. Considering the requirement that $t \in (-1, 1)$, one can deduce from (2.45) a quadratic equation (2.46), which has a unique root of $t$ in the interval $(-1, 1)$; this root is the value (2.47) taken for the decay parameter $t$, and it indeed lies in $(0, 1)$.

Therefore, if we choose the above $t$ and the vectors $\delta_1$ and $g_1$ such that (2.42) holds with minus signs, then the prefixed steps and gradients satisfy the consistency conditions (2.33), (2.34), (2.35) and (2.36) for all $k \ge 1$. There are many ways to choose $\delta_1$ and $g_1$ satisfying (2.42). Specifically, we choose $(\eta_1, \xi_1)^T$ and $(c_1, d_1)^T$ to be fixed vectors whose second components are zero (2.48). In this case, by (2.42) with minus signs and the definitions of the $\Delta_i$'s, the values of $(\gamma_1, \tau_1)^T$ and $(l_1, h_1)^T$ follow (2.49). The initial step $\delta_1$ and the initial gradient $g_1$ are then determined.

2.5 Quasi-Newton Updating Matrices

In this subsection we discuss the form of the BFGS quasi-Newton updating matrices implied by (2.25)-(2.28) under the consistency conditions (2.33)-(2.36). As a byproduct, we will know how to choose the initial matrix $H_1$ for the example. To this aim, we provide a follow-up lemma to Lemma 2.3.

Lemma 2.4. Assume that (2.30) holds and the matrix $H$ satisfies (2.29). If, further, the matrix $A$ in (2.32) is positive definite, there must exist nonsingular triangular matrices $T_1$ and $T_2$ such that

$$ A = V^T U = T_2^T T_1. \qquad (2.50) $$

Further, denoting $\hat V = V T_1^{-1} = (\hat v_1, \hat v_2, \ldots, \hat v_n)$ and the diagonal matrix $T_1 T_2^{-1} = \mathrm{diag}(t_1, t_2, \ldots, t_n)$, we have that

$$ H = \sum_{i=1}^{n} t_i\,\hat v_i \hat v_i^T. \qquad (2.51) $$

Proof. The existence of (2.50) is clear from the Cholesky factorization of positive definite matrices; moreover, since $A = A^T$, the matrix $T_1 T_2^{-1}$ is both triangular and symmetric, hence diagonal. Further, since $HU = V$ and $U$ is nonsingular, $H = VU^{-1}$; by (2.50), $U^{-1} = A^{-1}V^T = T_1^{-1}T_2^{-T}V^T$, and therefore

$$ H = V T_1^{-1} T_2^{-T} V^T = \hat V\,(T_1 T_2^{-1})^T\,\hat V^T = \sum_{i=1}^{n} t_i\,\hat v_i \hat v_i^T, $$

which with the definitions of the $\hat v_i$'s and $t_i$'s implies the truth of (2.51). Q.E.D.

By the above lemma, we can express the form of $H_{k+1}$ determined by (2.25)-(2.28) under the consistency conditions (2.33)-(2.36). At first, we make use of the steps $\{\delta_{k+i};\ i = 0, 1, 2, 3\}$ to introduce the following two vectors that are orthogonal to $\gamma_k$ and $g_{k+1}$:

$$ z_k = -\frac{\delta_{k+2}^T\gamma_k}{\delta_k^T\gamma_k}\,\delta_k - \frac{\delta_{k+2}^Tg_{k+1}}{\delta_{k+1}^Tg_{k+1}}\,\delta_{k+1} + \delta_{k+2}, \qquad w_k = -\frac{\delta_{k+3}^T\gamma_k}{\delta_k^T\gamma_k}\,\delta_k - \frac{\delta_{k+3}^Tg_{k+1}}{\delta_{k+1}^Tg_{k+1}}\,\delta_{k+1} + \delta_{k+3}. \qquad (2.52) $$

The vectors $z_k$ and $w_k$ are well defined because $\delta_k^T\gamma_k = -\delta_k^Tg_k > 0$ due to the descent property of $\delta_k$. Direct calculations show that

$$ z_k^T\gamma_k = z_k^Tg_{k+1} = 0, \qquad w_k^T\gamma_k = w_k^Tg_{k+1} = 0. \qquad (2.53) $$

Further, we use $z_k$ and $w_k$ to define the vector that is orthogonal to $g_{k+2}$:

$$ v_k = -\frac{w_k^Tg_{k+2}}{z_k^Tg_{k+2}}\,z_k + w_k. \qquad (2.54) $$

By the choice of $v_k$, it is easy to see that

$$ v_k^T\gamma_k = v_k^Tg_{k+1} = v_k^Tg_{k+2} = 0. \qquad (2.55) $$

Now we can express the matrix $H_{k+1}$ as

$$ H_{k+1} = -\frac{\delta_k\delta_k^T}{\delta_k^Tg_k} - \frac{\delta_{k+1}\delta_{k+1}^T}{\delta_{k+1}^Tg_{k+1}} - \frac{z_kz_k^T}{z_k^Tg_{k+2}} - \frac{v_kv_k^T}{v_k^Tg_{k+3}}. \qquad (2.56) $$

Since $\delta_k^Tg_k < 0$ for all $k \ge 1$, we know that the matrix $H_{k+1}$ is positive definite if and only if $z_k^Tg_{k+2} < 0$ and $v_k^Tg_{k+3} < 0$. Direct calculations show that $z_1^Tg_3 < 0$ and $v_1^Tg_4 < 0$. Therefore we know that the matrix $H_2$ is positive definite. In addition, it is not difficult to establish the relations $\delta_{k+1}^Tg_{k+1} = t\,\delta_k^Tg_k$, $z_{k+1}^Tg_{k+3} = t\,z_k^Tg_{k+2}$, $v_{k+1}^Tg_{k+4} = t\,v_k^Tg_{k+3}$, $z_{k+1} = Mz_k$ and $v_{k+1} = Mv_k$ for all $k \ge 1$. Thus we have, by (2.56) and $\delta_{k+1} = M\delta_k$, that

$$ H_{k+1} = \frac{1}{t}\,M H_k M^T \qquad (2.57) $$

holds for all $k \ge 2$, and hence the matrices $\{H_k : k \ge 2\}$ are all positive definite.

With the above procedure, we can calculate the values of $z_1$, $v_1$ and $H_2$. Further, by (2.23), we can obtain the initial matrix $H_1$ required by the example of this paper:

$$ H_1 = H_2 - \frac{\delta_1\delta_2^T + \delta_2\delta_1^T}{\delta_1^T\gamma_1} + \frac{g_2^T\delta_2}{g_1^T\delta_1}\cdot\frac{\delta_1\delta_1^T}{\delta_1^T\gamma_1}, \qquad (2.58) $$

where $H_1$ is a symmetric matrix whose ten entries $H_1(i,j)$, $1 \le i \le j \le 4$, are explicit constants linear in $t$ (2.59). Direct calculations show that (2.57) also holds with $k = 1$, and hence $H_1$ is a positive definite matrix.
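A sketch of why the recursion (2.57) propagates positive definiteness (not from the paper; it reuses $M$ and $t$ from the earlier snippet): $H_{k+1} = t^{-1} M H_k M^T$ is a congruence with the nonsingular matrix $M/\sqrt{t}$, so positive eigenvalues stay positive.

```python
# The example's BFGS matrices satisfy H_{k+1} = (1/t) M H_k M^T, a congruence
# transform; positive definiteness is therefore inherited at every iteration.
import numpy as np
H = np.diag([1.0, 2.0, 3.0, 4.0])           # any SPD starting matrix
for _ in range(5):
    H = (M @ H @ M.T) / t                   # the recursion (2.57)
    print(np.linalg.eigvalsh(H).min() > 0)  # stays positive definite
```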

3 Constructing a Suitable Objective Function

3.1 Recovering the Iterations and the Function Values

Denote the rotation matrices

$$ R_1 = \begin{pmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{pmatrix}, \qquad R_2 = \begin{pmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{pmatrix}. \qquad (3.1) $$

If we write the steps $\{\delta_k\}$ and gradients $\{g_k\}$ in the forms

$$ \delta_{8j+i} = \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j}\gamma_i \\ t^{8j}\tau_i \end{pmatrix}, \qquad g_{8j+i} = \begin{pmatrix} t^{8j}l_i \\ t^{8j}h_i \\ c_i \\ d_i \end{pmatrix}; \qquad i = 1, \ldots, 8, \qquad (3.2) $$

we have from (2.1), (2.2), (2.4) and (2.5) that

$$ \begin{pmatrix} \eta_{i+1} \\ \xi_{i+1} \end{pmatrix} = R_1 \begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix}, \quad \begin{pmatrix} \gamma_{i+1} \\ \tau_{i+1} \end{pmatrix} = t\,R_2 \begin{pmatrix} \gamma_i \\ \tau_i \end{pmatrix}, \quad \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix} = t\,R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix}, \quad \begin{pmatrix} c_{i+1} \\ d_{i+1} \end{pmatrix} = R_2 \begin{pmatrix} c_i \\ d_i \end{pmatrix}. \qquad (3.3) $$

For simplicity, we want the iterations $\{x_k\}$ to asymptotically turn around the eight vertices of some regular octagon $\Omega$ in the subspace spanned by the first and second coordinates. More exactly, such a regular octagon $\Omega$ has the origin as its center, and its eight vertices $V_i$ are given by $V_i = (a_i, b_i, 0, 0)^T$, where $(a_1, b_1)^T$ is a fixed starting vertex and

$$ \begin{pmatrix} a_{i+1} \\ b_{i+1} \end{pmatrix} = R_1 \begin{pmatrix} a_i \\ b_i \end{pmatrix}. \qquad (3.4) $$

Here we should note that, when there is no confusion, we also regard $\Omega$ and the $V_i$'s as defined in the subspace spanned by the first and second coordinates. To this aim, we ask the iterations $\{x_k\}$ to be of the form

$$ x_{8j+i} = \begin{pmatrix} a_i \\ b_i \\ t^{8j}p_i \\ t^{8j}q_i \end{pmatrix}, \qquad (3.5) $$

where

$$ \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = t\,R_2 \begin{pmatrix} p_i \\ q_i \end{pmatrix}. \qquad (3.6) $$
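A sketch of the octagon construction (3.4) (not from the paper; the concrete $(a_1, b_1)$ of the example is not used here, so a generic unit vertex stands in):

```python
# Generates the eight vertices V_i of the regular octagon Omega via (3.4);
# (a_1, b_1) = (1, 0) is an illustrative placeholder for the paper's value.
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1 = rot(np.pi / 4)
V = [np.array([1.0, 0.0])]
for _ in range(7):
    V.append(R1 @ V[-1])                 # V_{i+1} = R_1 V_i
print(np.round(np.vstack(V), 3))         # eight vertices centered at the origin
```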

To determine the values of $p_1$ and $q_1$, noting that $x_1 - V_1 = x_1 - \lim_{j\to\infty} x_{8j+1} = -\sum_{k\ge 1}\delta_k$, we have that

$$ \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = -\sum_{i=0}^{\infty}\,(tR_2)^i \begin{pmatrix} \gamma_1 \\ \tau_1 \end{pmatrix} = -(E - tR_2)^{-1} \begin{pmatrix} \gamma_1 \\ \tau_1 \end{pmatrix}, \qquad (3.7) $$

where again $E$ is the two-dimensional identity matrix; the resulting values of $p_1$ and $q_1$ are explicit constants (3.8).

Now, let us assume that the limit of $f(x_k)$ is $f^*$. Since for any $j$ the first two coordinates of $x_{8j+i}$ always stay the same as those of its limit $V_i$, we can get that

$$ f(x_{8j+i}) - f^* = t^{8j}\begin{pmatrix} c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} p_i \\ q_i \end{pmatrix} = t^{8j}\left[ R_2^{\,i-1}\begin{pmatrix} c_1 \\ d_1 \end{pmatrix} \right]^T \left[ (tR_2)^{i-1}\begin{pmatrix} p_1 \\ q_1 \end{pmatrix} \right] = t^{8j+i-1}\begin{pmatrix} c_1 \\ d_1 \end{pmatrix}^T \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = t^{8j+i-1}\,(f(x_1) - f^*), $$

where $f(x_1) - f^* = c_1p_1 + d_1q_1$ is a positive constant.

Now we examine the value of $g_{8j+i}^T\delta_{8j+i}$. Direct calculations show that

$$ g_{8j+i}^T\delta_{8j+i} = \begin{pmatrix} t^{8j}l_i \\ t^{8j}h_i \\ c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j}\gamma_i \\ t^{8j}\tau_i \end{pmatrix} = t^{8j}\,(l_i\eta_i + h_i\xi_i + c_i\gamma_i + d_i\tau_i) = t^{8j+i-1}\,(l_1\eta_1 + h_1\xi_1 + c_1\gamma_1 + d_1\tau_1) = t^{8j+i-1}\,g_1^T\delta_1, $$

where $g_1^T\delta_1 = l_1\eta_1 + h_1\xi_1 + c_1\gamma_1 + d_1\tau_1$ is a negative constant. Therefore

$$ \frac{f(x_{8j+i+1}) - f(x_{8j+i})}{\alpha_{8j+i}\, g_{8j+i}^T\delta_{8j+i}} = \frac{(f(x_{8j+i+1}) - f^*) - (f(x_{8j+i}) - f^*)}{g_{8j+i}^T\delta_{8j+i}} = \frac{(t^{8j+i} - t^{8j+i-1})\,(f(x_1) - f^*)}{t^{8j+i-1}\, g_1^T\delta_1} = \frac{(t-1)\,(f(x_1) - f^*)}{g_1^T\delta_1} \in (0, 1). \qquad (3.9) $$

The above relation, together with $g_{8j+i+1}^T\delta_{8j+i} = 0$, implies that the stepsize $\alpha_{8j+i} = 1$ can be accepted by the Wolfe line search, the Armijo line search and the Goldstein line search with suitable line search parameters.

3.2 Seeking a Suitable Form for the Objective Function

We now consider how to construct a smooth function $f$ such that its gradient at any point $x_{8j+i}$ given in (3.5) is the one given in (3.2); namely,

$$ \nabla f(x_{8j+i}) = g_{8j+i}, \quad \text{for all } j \ge 0 \text{ and } i = 1, \ldots, 8. \qquad (3.10) $$

To this aim, we assume that $f$ is of the form

$$ f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4, \qquad (3.11) $$

where $\lambda$ and $\mu$ are two-dimensional functions to be determined. Since the function (3.11) is linear in $x_3$ and $x_4$, we know that $f$ has no lower bound in $\mathbb{R}^n$. Consequently, for a sequence $\{x_k\}$ generated by some optimization method for the minimization of this function, one would expect $f(x_k)$ to tend to $-\infty$; but this will not happen in our examples.

With the prefixed form (3.11), we have that

$$ \nabla f(x_1, x_2, x_3, x_4) = \begin{pmatrix} \lambda_{x_1}(x_1, x_2)\,x_3 + \mu_{x_1}(x_1, x_2)\,x_4 \\ \lambda_{x_2}(x_1, x_2)\,x_3 + \mu_{x_2}(x_1, x_2)\,x_4 \\ \lambda(x_1, x_2) \\ \mu(x_1, x_2) \end{pmatrix}. \qquad (3.12) $$

Comparing the last two components of the right-hand vector in (3.12) with the gradients in (3.2), we must have that

$$ \lambda(V_i) = c_i, \qquad \mu(V_i) = d_i. \qquad (3.13) $$

Consequently, by (2.48) and (3.3), we can obtain the concrete values of $\lambda$ and $\mu$ at the eight vertices of $\Omega$, which are listed in the second and third rows of Table 3.1.
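To make the gradient structure (3.12) concrete, here is a sketch (not from the paper): the choice of $\lambda$ below is a placeholder polynomial that is merely odd in $x_1$; $\mu$ uses the symmetry $\mu(x_1, x_2) = \lambda(-x_2, x_1)$ imposed later in (3.19). The paper's actual polynomials are constructed in Sections 3.3-3.4.

```python
# Evaluates grad f for f(x) = lambda(x1,x2)*x3 + mu(x1,x2)*x4, cf. (3.11)-(3.12).
# lam is a placeholder polynomial, odd in x1; mu follows the symmetry (3.19).
import numpy as np

def lam(x1, x2):
    return x1**3 - 3.0 * x1 * x2**2          # placeholder, not the paper's lambda

def mu(x1, x2):
    return lam(-x2, x1)                      # mu(x1, x2) = lambda(-x2, x1)

def grad_f(x, h=1e-6):
    x1, x2, x3, x4 = x
    # central differences for the partial derivatives of lam and mu
    dlam = [(lam(x1 + h, x2) - lam(x1 - h, x2)) / (2 * h),
            (lam(x1, x2 + h) - lam(x1, x2 - h)) / (2 * h)]
    dmu = [(mu(x1 + h, x2) - mu(x1 - h, x2)) / (2 * h),
           (mu(x1, x2 + h) - mu(x1, x2 - h)) / (2 * h)]
    return np.array([dlam[0] * x3 + dmu[0] * x4,   # first two components of (3.12)
                     dlam[1] * x3 + dmu[1] * x4,
                     lam(x1, x2),                  # last two components are lam, mu
                     mu(x1, x2)])

print(grad_f(np.array([1.0, 0.5, 2.0, -1.0])))
```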

Further, for each $i$, let $J_i$ denote the matrix of partial derivatives of $(\lambda, \mu)$ at the vertex $V_i$:

$$ J_i = \begin{pmatrix} \lambda_{x_1} & \mu_{x_1} \\ \lambda_{x_2} & \mu_{x_2} \end{pmatrix}\Bigg|_{V_i}, \qquad J_1 = \begin{pmatrix} \omega_1 & \omega_2 \\ \omega_3 & \omega_4 \end{pmatrix}. \qquad (3.14) $$

The comparison of the first two components of the right-hand vector in (3.12) with the gradients in (3.2) leads to the relation

$$ J_i \begin{pmatrix} p_i \\ q_i \end{pmatrix} = \begin{pmatrix} l_i \\ h_i \end{pmatrix}. \qquad (3.15) $$

To make (3.15) hold for all $i$, we ask $J_{i+1}$ and $J_i$ to meet the condition

$$ J_{i+1} = R_1 J_i R_2^T \qquad (3.16) $$

for all $i \ge 1$. In this case, if (3.15) holds for some $i$, we have by (3.16), (3.6) and (3.3) that

$$ J_{i+1}\begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = (R_1 J_i R_2^T)\,(tR_2)\begin{pmatrix} p_i \\ q_i \end{pmatrix} = t\,R_1 J_i\begin{pmatrix} p_i \\ q_i \end{pmatrix} = t\,R_1\begin{pmatrix} l_i \\ h_i \end{pmatrix} = \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix}, $$

which means that (3.15) holds with $i+1$. Therefore, by the induction principle, to make (3.15) hold for all $i \ge 1$ it remains to choose $J_1$ such that (3.15) holds with $i = 1$. Noticing that there are still two degrees of freedom, we ask for the special relations

$$ \omega_2 = \omega_3, \qquad \omega_4 = -\omega_1. \qquad (3.17) $$

Then the values of $\omega_1$ and $\omega_3$ can be solved from (3.15) with $i = 1$ and (3.17); they are explicit constants linear in $t$ (3.18). Therefore, if we choose $J_1$ to be the one in (3.14) with the values in (3.17) and (3.18), and ask for the relation (3.16) for all $i \ge 1$, then we will have (3.15) for all $i \ge 1$. Now, using (3.13), (3.16) and (3.17), we can list the function values and gradients of $\lambda$ and $\mu$ at the vertices $V_i$ of the octagon $\Omega$ in Table 3.1.

Table 3.1. Function values and gradients of $\lambda$ and $\mu$ at the $V_i$'s. The $\lambda$ and $\mu$ rows record the values $c_i$ and $d_i$ from (3.13); the $\lambda_{x_1}$ and $\mu_{x_2}$ rows record $\pm\omega_1$, and the $\lambda_{x_2}$ and $\mu_{x_1}$ rows record $\pm\omega_3$, with the sign at each vertex $V_1, \ldots, V_8$ determined by the recursion (3.16).

By using the special choice of $\Omega$ and observing the values in Table 3.1, we can ask the functions $\lambda$ and $\mu$ to have the properties

$$ \mu(x_1, x_2) = \lambda(-x_2, x_1) \qquad (3.19) $$

and

$$ \lambda(x_1, x_2) = -\lambda(-x_1, x_2) \qquad (3.20) $$

for all $(x_1, x_2) \in \mathbb{R}^2$. Further, by (3.20), we could think of constructing the function $\lambda$ as a polynomial with only odd-order terms in $x_1$. In the next subsection, we construct a simple function that only meets the requirements (a) and (c) described in the first section. In Section 3.4, we construct a more complicated function that meets all the requirements (a), (b) and (c).
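A numerical check of the induction step around (3.15)-(3.16) (a sketch; rot, theta1, theta2 and t are as defined in the earlier snippets, and the data here are random stand-ins):

```python
# If J_{i+1} = R1 J_i R2^T and (3.15) holds at step i, then it holds at i+1,
# because (p,q) and (l,h) transform by t*R2 and t*R1, cf. (3.6) and (3.3).
import numpy as np
R1, R2 = rot(theta1), rot(theta2)
rng = np.random.default_rng(2)

J = rng.standard_normal((2, 2))      # some J_i
pq = rng.standard_normal(2)          # (p_i, q_i)
lh = J @ pq                          # so (3.15) holds at step i

J_next = R1 @ J @ R2.T               # the recursion (3.16)
pq_next = t * R2 @ pq                # (3.6)
lh_next = t * R1 @ lh                # (3.3)
print(np.allclose(J_next @ pq_next, lh_next))   # (3.15) holds at step i+1
```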

3.3 A Simple Function that Meets (a) and (c)

If we ignore the convexity requirement (b) on the line search function, we can construct a relatively simple function $\lambda(x_1, x_2)$ that meets the interpolation conditions listed in Table 3.1; the function $\mu(x_1, x_2)$ is then given by (3.19). In fact, we can check that an explicit polynomial $\lambda_f(x_1, x_2)$ has the values $0, 1, 2, 1$ at the vertices $V_1, V_2, V_3, V_4$, respectively, and zero derivatives at all the vertices $V_i$. Further, we can check that a second explicit polynomial $\lambda_g(x_1, x_2)$ has zero function values at all the vertices $V_i$, but can be used to interpolate the derivatives of $\lambda(x_1, x_2)$ at the $V_i$'s. Consequently, we know that

$$ \lambda(x_1, x_2) = \lambda_f(x_1, x_2) + \omega_1\,\lambda_g(x_1, x_2) + \omega_3\,\lambda_g(-x_2, x_1) \qquad (3.21) $$

meets all the interpolation conditions required of $\lambda(x_1, x_2)$ in Table 3.1. Consequently, we can say that for the function defined by (3.11), (3.21) and (3.19), the BFGS method with the Wolfe line search using the unit initial step does not converge.

Observing that $\Psi_{i+1}(\alpha) = t\,\Psi_i(\alpha)$ and $\Psi_{i+1}'(\alpha) = t\,\Psi_i'(\alpha)$ at $\alpha = 0, 1$, we would ideally wish the following relation between $\Psi_i(\alpha)$ and $\Psi_{i+1}(\alpha)$:

$$ \Psi_{i+1}(\alpha) = t\,\Psi_i(\alpha), \quad \text{for all } \alpha \ge 0, \qquad (3.22) $$

so that neighboring line search functions differ only by the constant multiplier $t$. However, the choice (3.21) of $\lambda(x_1, x_2)$ does not lead to (3.22). Nevertheless, we can propose an explicit compensation polynomial $\lambda_c(x_1, x_2)$ and define

$$ \hat\lambda(x_1, x_2) = \lambda(x_1, x_2) + c_1\,\lambda_c(x_1, x_2) + c_2\,\lambda_c(-x_2, x_1), \qquad (3.23) $$

where $c_1$ and $c_2$ are suitable constants.

In this case, the relation (3.22) will always hold, and the corresponding line search function at the first iteration is the sixth-degree polynomial

$$ \hat\Psi_1(\alpha) = \sum_{i=0}^{6} \rho_i\,\alpha^i, \qquad (3.24) $$

where each coefficient $\rho_i$ is an explicit constant linear in $t$; in particular, $\rho_1 = g_1^T\delta_1$ and $\rho_0 = f(x_1)$.

To sum up at this stage: for the function (3.11) with $\lambda(x_1, x_2)$ given by (3.21) or (3.23) and $\mu(x_1, x_2) = \lambda(-x_2, x_1)$, if the initial point is $x_1 = (a_1, b_1, p_1, q_1)^T$ (see (3.4), (3.8) and (2.47) for their values) and if the initial matrix is the $H_1$ given by (2.58) and (2.59), then the BFGS method (1.2) and (1.5) with $\alpha_k \equiv 1$ will generate the iterations in (3.5), whose gradients are given by (2.4)-(2.5). Therefore the method will asymptotically cycle around the eight vertices of a regular octagon without approaching a stationary point and without pushing $f(x_k)$ to $-\infty$.

3.4 A Complicated Function that Meets (a), (b) and (c)

To meet the requirement (b), this subsection gives a further compensation to the function $\hat\lambda(x_1, x_2)$ in (3.23) such that each line search function $\Psi_k(\alpha)$ is strongly convex. To this aim, we consider the straight line connecting $x_{8j+i}$ and $x_{8j+i+1}$:

$$ \ell(\alpha) = x_{8j+i} + \alpha\,\delta_{8j+i} = \begin{pmatrix} a_i + \alpha\,\eta_i \\ b_i + \alpha\,\xi_i \\ t^{8j}(p_i + \alpha\,\gamma_i) \\ t^{8j}(q_i + \alpha\,\tau_i) \end{pmatrix}. \qquad (3.25) $$

By (3.11), the line search function at the $(8j+i)$-th iteration is

$$ \Psi_{8j+i}(\alpha) = t^{8j}\big[\,\lambda(a_i + \alpha\eta_i,\ b_i + \alpha\xi_i)\,(p_i + \alpha\gamma_i) + \mu(a_i + \alpha\eta_i,\ b_i + \alpha\xi_i)\,(q_i + \alpha\tau_i)\,\big]. \qquad (3.26) $$


More information

On the Interaction and Competition among Internet Service Providers

On the Interaction and Competition among Internet Service Providers On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers

More information

Rotation Rate of a Trajectory of an Algebraic Vector Field Around an Algebraic Curve

Rotation Rate of a Trajectory of an Algebraic Vector Field Around an Algebraic Curve QUALITATIVE THEORY OF DYAMICAL SYSTEMS 2, 61 66 (2001) ARTICLE O. 11 Rotation Rate of a Trajectory of an Algebraic Vector Field Around an Algebraic Curve Alexei Grigoriev Department of Mathematics, The

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 10 Boundary Value Problems for Ordinary Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

More information

CHAPTER 1 Splines and B-splines an Introduction

CHAPTER 1 Splines and B-splines an Introduction CHAPTER 1 Splines and B-splines an Introduction In this first chapter, we consider the following fundamental problem: Given a set of points in the plane, determine a smooth curve that approximates the

More information

Overview of Math Standards

Overview of Math Standards Algebra 2 Welcome to math curriculum design maps for Manhattan- Ogden USD 383, striving to produce learners who are: Effective Communicators who clearly express ideas and effectively communicate with diverse

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Structure of the Root Spaces for Simple Lie Algebras

Structure of the Root Spaces for Simple Lie Algebras Structure of the Root Spaces for Simple Lie Algebras I. Introduction A Cartan subalgebra, H, of a Lie algebra, G, is a subalgebra, H G, such that a. H is nilpotent, i.e., there is some n such that (H)

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

A Distributed Line Search for Network Optimization

A Distributed Line Search for Network Optimization 01 American Control Conference Fairmont Queen Elizabeth, Montréal, Canada June 7-June 9, 01 A Distributed Line Search for Networ Optimization Michael Zargham, Alejandro Ribeiro, Ali Jadbabaie Abstract

More information

A note on companion matrices

A note on companion matrices Linear Algebra and its Applications 372 (2003) 325 33 www.elsevier.com/locate/laa A note on companion matrices Miroslav Fiedler Academy of Sciences of the Czech Republic Institute of Computer Science Pod

More information

The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23

The basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23 (copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, high-dimensional

More information

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

More information

Notes on Factoring. MA 206 Kurt Bryan

Notes on Factoring. MA 206 Kurt Bryan The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY

CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY January 10, 2010 CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY The set of polynomials over a field F is a ring, whose structure shares with the ring of integers many characteristics.

More information

ASEN 3112 - Structures. MDOF Dynamic Systems. ASEN 3112 Lecture 1 Slide 1

ASEN 3112 - Structures. MDOF Dynamic Systems. ASEN 3112 Lecture 1 Slide 1 19 MDOF Dynamic Systems ASEN 3112 Lecture 1 Slide 1 A Two-DOF Mass-Spring-Dashpot Dynamic System Consider the lumped-parameter, mass-spring-dashpot dynamic system shown in the Figure. It has two point

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks

Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks Thnkwell s Homeschool Precalculus Course Lesson Plan: 36 weeks Welcome to Thinkwell s Homeschool Precalculus! We re thrilled that you ve decided to make us part of your homeschool curriculum. This lesson

More information

MATH 551 - APPLIED MATRIX THEORY

MATH 551 - APPLIED MATRIX THEORY MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points

More information

South Carolina College- and Career-Ready (SCCCR) Pre-Calculus

South Carolina College- and Career-Ready (SCCCR) Pre-Calculus South Carolina College- and Career-Ready (SCCCR) Pre-Calculus Key Concepts Arithmetic with Polynomials and Rational Expressions PC.AAPR.2 PC.AAPR.3 PC.AAPR.4 PC.AAPR.5 PC.AAPR.6 PC.AAPR.7 Standards Know

More information

Linear Algebra I. Ronald van Luijk, 2012

Linear Algebra I. Ronald van Luijk, 2012 Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.

More information

Some Notes on Taylor Polynomials and Taylor Series

Some Notes on Taylor Polynomials and Taylor Series Some Notes on Taylor Polynomials and Taylor Series Mark MacLean October 3, 27 UBC s courses MATH /8 and MATH introduce students to the ideas of Taylor polynomials and Taylor series in a fairly limited

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

The Dirichlet Unit Theorem

The Dirichlet Unit Theorem Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if

More information