# A Perfect Example for The BFGS Method


Yu-Hong Dai

**Abstract.** Consider the BFGS quasi-Newton method applied to a general nonconvex function that has continuous second derivatives. This paper constructs a four-dimensional example such that the BFGS method need not converge. The example is perfect in the following sense: (a) all the stepsizes are exactly equal to one, and the unit stepsize is also accepted by various line searches including the Wolfe line search and the Armijo line search; (b) the objective function is strongly convex along each search direction, although it is not convex in itself, and the unit stepsize is the unique minimizer of each line search function, so the example also applies to the global line search and to the line search that always picks the first local minimizer; (c) the objective function is polynomial and hence infinitely continuously differentiable. If we relax the convexity requirement (b) on the line search function, we are able to construct a relatively simple polynomial example.

**Keywords.** unconstrained optimization, quasi-Newton method, nonconvex function, global convergence.

## 1 Introduction

Consider the unconstrained optimization problem

$$\min f(x), \quad x \in \mathbb{R}^n, \tag{1.1}$$

where $f$ is a general nonconvex function that has continuous second derivatives. The quasi-Newton method is a class of well-known and efficient methods for solving (1.1). It uses the iterative scheme

$$x_{k+1} = x_k - \alpha_k H_k g_k, \tag{1.2}$$

*This work was partially supported by Chinese NSF grants and the CAS grant kjcx-yw-s7-03. State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 719, Beijing, P.R. China.*

where $x_1$ is a starting point, $H_k$ is some approximation to the inverse Hessian $[\nabla^2 f(x_k)]^{-1}$, $g_k = \nabla f(x_k)$, and $\alpha_k$ is a stepsize obtained in some way. Defining the vectors

$$\delta_k = x_{k+1} - x_k, \qquad \gamma_k = g_{k+1} - g_k,$$

the quasi-Newton method asks the next approximation matrix $H_{k+1}$ to satisfy the secant equation

$$H_{k+1}\gamma_k = \delta_k. \tag{1.3}$$

The similarity of this to the identity $[\nabla^2 f(x_{k+1})]^{-1}\gamma_k = \delta_k$, which holds in the quadratic case, enables the quasi-Newton method to be superlinearly convergent (see Dennis and Moré [5], for example) and makes it very attractive in practical optimization. The first quasi-Newton method dates back to Davidon [4] and Fletcher and Powell [9]. The DFP method updates the approximation matrix $H_k$ to $H_{k+1}$ by the formula

$$H_{k+1} = H_k - \frac{H_k\gamma_k\gamma_k^T H_k}{\gamma_k^T H_k\gamma_k} + \frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \tag{1.4}$$

Nowadays, the most efficient quasi-Newton method is perhaps the BFGS method, which was proposed independently by Broyden [2], Fletcher [7], Goldfarb [10], and Shanno [24]. The matrix $H_{k+1}$ in the BFGS method is updated by

$$H_{k+1} = H_k - \frac{\delta_k\gamma_k^T H_k + H_k\gamma_k\delta_k^T}{\delta_k^T\gamma_k} + \left(1 + \frac{\gamma_k^T H_k\gamma_k}{\delta_k^T\gamma_k}\right)\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \tag{1.5}$$

To pass the positive definiteness of the matrix $H_k$ on to $H_{k+1}$, practical quasi-Newton algorithms make use of the Wolfe line search to calculate the stepsize $\alpha_k$, which ensures that a positive curvature is found at each iteration; namely, $\delta_k^T\gamma_k > 0$. More exactly, defining the one-dimensional line search function

$$\Psi_k(\alpha) = f(x_k - \alpha H_k g_k), \quad \alpha \ge 0, \tag{1.6}$$

the Wolfe line search consists in finding a stepsize $\alpha_k$ such that

$$\Psi_k(\alpha_k) \le \Psi_k(0) + \mu\,\Psi_k'(0)\,\alpha_k \tag{1.7}$$

and

$$\Psi_k'(\alpha_k) \ge \eta\,\Psi_k'(0), \tag{1.8}$$

where $\mu$ and $\eta$ are constants with $0 < \mu \le \eta < 1$.

There has been a large quantity of research devoted to the global convergence of the quasi-Newton method (see Yuan [28] and Sun and Yuan [26], for example). Specifically, Powell [18] showed that the DFP method with exact line searches is globally convergent for uniformly convex functions. In another paper [20], Powell established the global convergence of the BFGS method with the Wolfe line search for uniformly convex functions.
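The update (1.5) and the conditions (1.7)-(1.8) translate directly into code. The following minimal sketch (in Python with NumPy; the function names are ours, and (1.8) is implemented in its standard weak form) illustrates both.

```python
import numpy as np

def bfgs_update(H, delta, gamma):
    """BFGS update (1.5) of the inverse-Hessian approximation H_k -> H_{k+1}.

    Requires the curvature condition delta @ gamma > 0, which the Wolfe
    line search guarantees.
    """
    dg = delta @ gamma
    Hg = H @ gamma
    return (H
            - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
            + (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg)

def wolfe_conditions(f, grad, x, H, alpha, mu=1e-4, eta=0.9):
    """Check (1.7)-(1.8) for Psi_k(alpha) = f(x_k - alpha H_k g_k)."""
    d = -H @ grad(x)                       # search direction -H_k g_k
    slope0 = grad(x) @ d                   # Psi'_k(0), negative for descent
    sufficient = f(x + alpha * d) <= f(x) + mu * alpha * slope0   # (1.7)
    curvature = grad(x + alpha * d) @ d >= eta * slope0           # (1.8)
    return sufficient and curvature
```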

This result was extended by Byrd, Nocedal and Yuan [1] to the whole Broyden convex family of methods except the DFP method. It is therefore natural to ask the following question: does the DFP method with the Wolfe line search converge globally for uniformly convex functions? On the other hand, since the BFGS method with the Wolfe line search works well for both convex and nonconvex functions, we might ask another question: does the BFGS method with the Wolfe line search converge globally for general functions? The difficulty and importance of these two convergence problems has been addressed on many occasions, including Nocedal [15], Fletcher [8], and Yuan [28].

Recent studies provide a negative answer to the convergence problem of the BFGS method for nonconvex functions. As a matter of fact, in an early paper [21] analyzing the convergence properties of the conjugate gradient method, Powell mentioned that the BFGS method need not converge if the line search can pick any local minimizer of the line search function $\Psi_k(\alpha)$. After further studies on the two-dimensional example in [21], Dai [3] presented an example with six cycling points and showed by it that the BFGS method with the Wolfe line search may fail for nonconvex functions. Later, Mascarenhas [14] constructed a three-dimensional counter-example such that the BFGS method does not converge if the line search picks the global minimizer of the function $\Psi_k(\alpha)$. It should be noted that the stepsize in the counter-example of [14] also satisfies the Wolfe line search conditions. However, neither example is such that the stepsize is the first local minimizer of the line search function $\Psi_k(\alpha)$.

Surprisingly enough, if there are only two variables, and if the stepsize is chosen to be the first local minimizer of $\Psi_k(\alpha)$, namely

$$\alpha_k = \arg\min\,\{\alpha > 0 : \alpha \text{ is a local minimizer of } \Psi_k(\alpha)\}, \tag{1.9}$$

then Powell [22] established the global convergence of the BFGS method for general twice continuously differentiable functions. Powell's proof is by contradiction and is quite sophisticated. Assuming the nonconvergence of the method and the relation $\liminf_{k\to\infty}\|g_k\| \neq 0$, Powell showed that the limit points of the BFGS path

$$P = \{x : x \text{ lies on the line segment connecting } x_k \text{ and } x_{k+1} \text{ for some } k \ge 1\}$$

(which is exactly the same as the Polak-Ribière conjugate gradient path if $n = 2$ and $g_{k+1}^T\delta_k = 0$ for all $k$) form some line segment $L$. The use of the specific line search (1.9) ensures that the objective function is monotonically decreasing on the line segment $L_k$ that connects $x_k$ and $x_{k+1}$. It follows that $f(x) \ge \lim_{k\to\infty} f(x_k)$ for all $x \in L$. Consequently, the line segment $L_k$ cannot cross $L$, and all the points $x_k$ with large indices lie on the same side of $L$. Furthermore, Powell showed that, starting from around one end of the line segment $L$, the BFGS path cannot get any closer to the other end of $L$, and finally obtained a contradiction. Intuitively, it is difficult to extend Powell's result to the case when $n \ge 3$. This is because, even if it has been shown that the BFGS path

tends to some limit line segment $L$, each line segment $L_k$ could turn around $L$ when $n \ge 3$, which makes a similar analysis of the BFGS path complicated and nearly impossible.

The purpose of this paper is to construct a four-dimensional counter-example for the BFGS method with the following features:

(a) All the stepsizes in the example are exactly equal to one; namely,

$$\alpha_k \equiv 1. \tag{1.10}$$

For the search direction defined by the BFGS update in the example, the unit stepsize satisfies various line search conditions, including the Wolfe conditions and the Armijo conditions;

(b) The objective function is strongly convex along each search direction; namely, the line search function $\Psi_k(\alpha)$ is strongly convex for all $k$, although the objective function itself is not. The unit stepsize is the unique minimizer of $\Psi_k(\alpha)$. Hence the example also applies to the global line search and to the specific line search (1.9);

(c) The objective function is polynomial and hence infinitely continuously differentiable.

On the other hand, the objective function in our example is linear in the third and fourth variables and thus has no local minimizer. However, the iterations generated by the BFGS method tend to a non-stationary point. As seen from §3.4, the construction of the example is quite complicated: to make the line search function $\Psi_k(\alpha)$ strongly convex, a polynomial of degree 38 is introduced. Nevertheless, if we relax the convexity requirement on $\Psi_k(\alpha)$, namely (b) above, it is possible to construct a relatively simple polynomial example of low degree (see §3.3).

The construction of our examples can be divided into four procedures:

(1) Prefix the special forms of the steps $\{\delta_k\}$ and gradients $\{g_k\}$, leaving several parameters to be determined later. In this case, once $x_1$ is given, the whole sequence $\{x_k\}$ is fixed. As seen in §2.1, our steps $\{\delta_k\}$ and gradients $\{g_k\}$ are asked to possess certain symmetry and cyclic properties and to push the iterations $\{x_k\}$ toward the eight vertices of a regular octagon. This is very helpful in simplifying the construction of the examples, and we can focus our attention on the choice of the parameter $t$, which accounts for the decay of the last two components of $\delta_k$ and the first two components of $g_k$.

(2) To enable the BFGS method to generate those prefixed steps, investigate the consistency conditions on the steps $\{\delta_k\}$ and gradients $\{g_k\}$. With the prefixed forms of $\{\delta_k\}$ and $\{g_k\}$, we show in §2.2 that the unit stepsize is indispensable; note that this stepsize is usually used as the first trial stepsize in implementations of quasi-Newton methods. Four more consistency conditions on $\{\delta_k\}$ and $\{g_k\}$ are obtained in §2.3 by expressing the vectors $H_{k+1}\gamma_k$, $H_{k+1}g_{k+1}$, $H_{k+1}g_{k+2}$ and $H_{k+1}g_{k+3}$ in terms of some $\delta_k$'s and $g_k$'s instead of considering the quasi-Newton

matrix $H_{k+1}$ itself.

(3) Choose the parameters so as to satisfy the consistency conditions and other necessary conditions on the objective function. As seen in §2.4, the first three consistency conditions correspond to an under-determined linear system. Substituting its general solution, obtained by Cramer's rule, into the fourth consistency condition, we are led to a nonlinear equation from which the exact value of the decay parameter $t$ can be obtained. By suitably choosing the other parameters, we can express all the quasi-Newton updating matrices, including the initial choice $H_1$, in §2.5.

(4) Construct a suitable objective function $f$ whose gradients take the preassigned values, namely $\nabla f(x_k) = g_k$ for all $k \ge 1$, and for which the line search has the desired properties. This is done throughout the third section. To make full use of the symmetry and cyclic properties of the steps $\{\delta_k\}$ and gradients $\{g_k\}$, we carefully choose a special form of the objective function. In addition, the introduction of the element function $\phi$ in §3.4 helps us greatly to convexify the line search function and finally complete the perfect example that meets all the requirements (a), (b) and (c).

Some concluding remarks are given in the last section.

## 2 Looking for Consistent Steps and Gradients

### 2.1 The Forms of Steps and Gradients

Consider the case of dimension four. Inspired by [21], [3] and [14], we assume that the steps $\{\delta_k\}$ have the following form:

$$\delta_1 = (\eta_1, \xi_1, \gamma_1, \tau_1)^T; \qquad \delta_{k+1} = M\delta_k \quad (k \ge 1), \tag{2.1}$$

where $M$ is the $4\times 4$ block-diagonal matrix defined by

$$M = \begin{pmatrix} \cos\theta_1 & -\sin\theta_1 & 0 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 & 0 \\ 0 & 0 & t\cos\theta_2 & -t\sin\theta_2 \\ 0 & 0 & t\sin\theta_2 & t\cos\theta_2 \end{pmatrix}. \tag{2.2}$$

In the above, $t$ is a parameter satisfying $0 < t < 1$; it accounts for the decay of the last two components of the steps $\{\delta_k\}$. The angles $\theta_1$ and $\theta_2$ are chosen so that $m_1\theta_1$ and $m_2\theta_2$ are integer multiples of $2\pi$ for some positive integers $m_1$ and $m_2$. Specifically, we choose in this paper

$$\theta_1 = \tfrac14\pi, \qquad \theta_2 = \tfrac34\pi. \tag{2.3}$$

Consequently, the first two components of the steps $\{\delta_k\}$ return to the same values after every eight iterations, and the last two components shrink by a factor of $t^8$ after every eight iterations. Further, we can see that the iterations $\{x_k\}$ will tend to turn around the eight vertices of some regular octagon (see §3.1 for more details). Accordingly, the gradients $\{g_k\}$ are assumed to be of the form

$$g_1 = (l_1, h_1, c_1, d_1)^T; \qquad g_{k+1} = Pg_k \quad (k \ge 1), \tag{2.4}$$

where

$$P = \begin{pmatrix} t\cos\theta_1 & -t\sin\theta_1 & 0 & 0 \\ t\sin\theta_1 & t\cos\theta_1 & 0 & 0 \\ 0 & 0 & \cos\theta_2 & -\sin\theta_2 \\ 0 & 0 & \sin\theta_2 & \cos\theta_2 \end{pmatrix}. \tag{2.5}$$

Since $0 < t < 1$, the first two components of $\{g_k\}$ tend to zero, whereas the last two components of the gradients $\{g_k\}$ return to the same values after every eight iterations. It follows that none of the cluster points generated by the BFGS method is a stationary point.
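A short computation confirms the cyclic behaviour built into (2.1)-(2.5). The sketch below uses hypothetical values for $t$, $\delta_1$ and $g_1$ (the paper fixes them in §2.4) and checks that after eight iterations the first two step components repeat while the last two shrink by $t^8$, and conversely for the gradients.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t = 0.5                                   # placeholder decay parameter, 0 < t < 1
th1, th2 = np.pi / 4, 3 * np.pi / 4       # the choices (2.3)
Z = np.zeros((2, 2))
M = np.block([[rot(th1), Z], [Z, t * rot(th2)]])   # (2.2)
P = np.block([[t * rot(th1), Z], [Z, rot(th2)]])   # (2.5)

delta = np.array([1.0, 0.0, 1.0, 0.0])    # hypothetical delta_1
g = np.array([1.0, 0.0, 1.0, 0.0])        # hypothetical g_1
d8 = np.linalg.matrix_power(M, 8) @ delta
g8 = np.linalg.matrix_power(P, 8) @ g

assert np.allclose(d8[:2], delta[:2])             # steps: first two repeat
assert np.allclose(d8[2:], t**8 * delta[2:])      # ... last two shrink by t^8
assert np.allclose(g8[:2], t**8 * g[:2])          # gradients: first two shrink
assert np.allclose(g8[2:], g[2:])                 # ... last two repeat
```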

### 2.2 Unit Stepsizes

It is well known that the unit stepsize is usually used as the first trial stepsize in practical implementations of the BFGS method. Under suitable assumptions on $f$, the unit stepsize is accepted by the line search as the iterates tend to the solution, enabling superlinearly convergent steps (see [5], for example). In the following, we show that if the line search satisfies

$$g_{k+1}^T\delta_k = 0 \quad \text{for all } k \ge 1, \tag{2.6}$$

and if the steps generated by the BFGS method have the form (2.1)-(2.2) and the gradients have the form (2.4)-(2.5), then the use of unit stepsizes is indispensable.

To begin with, the line search condition (2.6), the secant equation (1.3) and the definition of the search direction imply that

$$H_{k+1}g_{k+1} = -\alpha_{k+1}^{-1}\delta_{k+1} \tag{2.7}$$

and

$$\delta_{k+1}^T\gamma_k = -\alpha_{k+1}\,g_{k+1}^TH_{k+1}\gamma_k = -\alpha_{k+1}\,g_{k+1}^T\delta_k = 0. \tag{2.8}$$

The relation (2.8) is sometimes called the conjugacy condition in the context of nonlinear conjugate gradient methods. Multiplying the BFGS updating formula (1.5) by $g_{k+1}$ and using (2.6) gives

$$H_{k+1}g_{k+1} = H_kg_{k+1} - \frac{\gamma_k^TH_kg_{k+1}}{\delta_k^T\gamma_k}\,\delta_k,$$

which with (2.7) gives

$$H_kg_{k+1} = -\alpha_{k+1}^{-1}\delta_{k+1} + \frac{\gamma_k^TH_kg_{k+1}}{\delta_k^T\gamma_k}\,\delta_k. \tag{2.9}$$

Multiplying (2.9) by $g_k^T$ and noticing that $g_k^TH_kg_{k+1} = 0$, we get

$$0 = -\alpha_{k+1}^{-1}\,g_k^T\delta_{k+1} + \frac{\gamma_k^TH_kg_{k+1}}{\delta_k^T\gamma_k}\,g_k^T\delta_k = -\alpha_{k+1}^{-1}\,g_k^T\delta_{k+1} - \gamma_k^TH_kg_{k+1},$$

since $g_k^T\delta_k = -\delta_k^T\gamma_k$. Thus $\gamma_k^TH_kg_{k+1} = -\alpha_{k+1}^{-1}\,g_k^T\delta_{k+1} = -\alpha_{k+1}^{-1}\,g_{k+1}^T\delta_{k+1}$. Hence, by (2.9),

$$H_kg_{k+1} = -\alpha_{k+1}^{-1}\delta_{k+1} - \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T\delta_{k+1}}{\delta_k^T\gamma_k}\,\delta_k. \tag{2.10}$$

It follows from (2.10) and (2.7) with $k$ replaced by $k-1$ that

$$H_k\gamma_k = -\alpha_{k+1}^{-1}\delta_{k+1} + \left[\alpha_k^{-1} - \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T\delta_{k+1}}{\delta_k^T\gamma_k}\right]\delta_k. \tag{2.11}$$

Further, substituting this into the BFGS updating formula (1.5) yields

$$H_{k+1} = H_k + \alpha_{k+1}^{-1}\,\frac{\delta_k\delta_{k+1}^T + \delta_{k+1}\delta_k^T}{\delta_k^T\gamma_k} + \left[1 - \alpha_k^{-1} + \alpha_{k+1}^{-1}\,\frac{g_{k+1}^T\delta_{k+1}}{\delta_k^T\gamma_k}\right]\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \tag{2.12}$$

This new updating formula involves the quantity $\delta_{k+1}$, which depends on $H_{k+1}$ itself, and hence has theoretical meaning only.

**Lemma 2.1.** Assume that (2.6) holds. Then for all $k \ge 1$ and $i \ge 0$, the vector $H_kg_{k+i} + \alpha_{k+i}^{-1}\delta_{k+i}$ belongs to the subspace spanned by $\delta_k, \delta_{k+1}, \ldots, \delta_{k+i-1}$; namely,

$$H_kg_{k+i} + \alpha_{k+i}^{-1}\delta_{k+i} \in \mathrm{Span}\{\delta_k, \delta_{k+1}, \ldots, \delta_{k+i-1}\}. \tag{2.13}$$

*Proof.* For convenience, we write (2.12) as

$$H_{k+1} = H_k + V(\delta_k, \delta_{k+1}), \tag{2.14}$$

where $V(\delta_k, \delta_{k+1})$ denotes the rank-two matrix on the right-hand side of (2.12). Therefore we have, for all $i \ge 1$,

$$H_{k+i} = H_k + \sum_{j=1}^{i}V(\delta_{k+j-1}, \delta_{k+j}). \tag{2.15}$$

The statement follows by multiplying this relation by $g_{k+i}$ and using (2.7) with $k$ replaced by $k+i-1$, together with $g_{k+i}^T\delta_{k+i-1} = 0$. Q.E.D.

**Lemma 2.2.** To construct a desired example with $\mathrm{Det}(S_1) \neq 0$, we must have $\alpha_k = 1$ for all $k \ge 4$.

*Proof.* Define the matrices

$$G_k = [\,\gamma_{k-1}\ \ g_k\ \ g_{k+1}\ \ g_{k+2}\,], \qquad S_k = [\,\delta_{k-1}\ \ \delta_k\ \ \delta_{k+1}\ \ \delta_{k+2}\,].$$

By $H_k\gamma_{k-1} = \delta_{k-1}$ and Lemma 2.1, we get

$$\mathrm{Det}(H_k)\,\mathrm{Det}(G_k) = \mathrm{Det}\big[\,H_k\gamma_{k-1}\ \ H_kg_k\ \ H_kg_{k+1}\ \ H_kg_{k+2}\,\big] = \mathrm{Det}\big[\,\delta_{k-1}\ \ -\alpha_k^{-1}\delta_k\ \ -\alpha_{k+1}^{-1}\delta_{k+1}\ \ -\alpha_{k+2}^{-1}\delta_{k+2}\,\big] = -\alpha_k^{-1}\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\,\mathrm{Det}(S_k). \tag{2.16}$$

Replacing $k$ with $k+1$ in the above yields

$$\mathrm{Det}(H_{k+1})\,\mathrm{Det}(G_{k+1}) = -\alpha_{k+1}^{-1}\alpha_{k+2}^{-1}\alpha_{k+3}^{-1}\,\mathrm{Det}(S_{k+1}). \tag{2.17}$$

On the other hand, due to the special forms of $\{g_k\}$ and $\{\delta_k\}$, we know that $G_{k+1} = PG_k$ and $S_{k+1} = MS_k$. Hence

$$\mathrm{Det}(G_{k+1}) = \mathrm{Det}(P)\,\mathrm{Det}(G_k) = t^2\,\mathrm{Det}(G_k), \qquad \mathrm{Det}(S_{k+1}) = \mathrm{Det}(M)\,\mathrm{Det}(S_k) = t^2\,\mathrm{Det}(S_k). \tag{2.18}$$

Due to the basic determinant relation of the BFGS update, together with (2.6) and (2.7), it is not difficult to see that

$$\mathrm{Det}(H_{k+1}) = \frac{\delta_k^TH_k^{-1}\delta_k}{\delta_k^T\gamma_k}\,\mathrm{Det}(H_k) = \alpha_k\,\mathrm{Det}(H_k). \tag{2.19}$$

If $\mathrm{Det}(S_1) \neq 0$, then (2.16) with $k = 1$ implies that $\mathrm{Det}(G_1) \neq 0$. Then, by (2.18),

$$\mathrm{Det}(G_k) \neq 0, \qquad \mathrm{Det}(S_k) \neq 0, \qquad \text{for all } k \ge 1. \tag{2.20}$$

Dividing (2.17) by (2.16) and using the above relations, we obtain $\alpha_{k+3} = 1$. The statement then holds by the arbitrariness of $k \ge 1$. Q.E.D.

The deletion of the first finitely many iterations does not affect the whole example. Thus we will ask our counter-example to satisfy

$$\mathrm{Det}(S_1) \neq 0 \tag{2.21}$$

and

$$\alpha_k \equiv 1. \tag{2.22}$$

In this case, the updating formula (2.12) for $H_k$ simplifies to

$$H_{k+1} = H_k + \frac{\delta_k\delta_{k+1}^T + \delta_{k+1}\delta_k^T}{\delta_k^T\gamma_k} - \frac{g_{k+1}^T\delta_{k+1}}{g_k^T\delta_k}\cdot\frac{\delta_k\delta_k^T}{\delta_k^T\gamma_k}. \tag{2.23}$$
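The determinant identity (2.19), which drives Lemma 2.2, is easy to check numerically. The sketch below uses synthetic random data (an assumption, purely for testing); under the exact-line-search condition $g_{k+1}^T\delta_k = 0$ the ratio would further reduce to $\alpha_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4 * np.eye(4)                 # random positive definite H_k
g = rng.standard_normal(4)                  # g_k
alpha = 0.7
delta = -alpha * H @ g                      # delta_k = -alpha_k H_k g_k
gamma = rng.standard_normal(4)              # any gamma_k; enforce curvature below
if delta @ gamma <= 0:
    gamma -= 2 * (delta @ gamma) / (delta @ delta) * delta

dg, Hg = delta @ gamma, H @ gamma           # BFGS update (1.5)
H_new = (H - (np.outer(delta, Hg) + np.outer(Hg, delta)) / dg
           + (1.0 + (gamma @ Hg) / dg) * np.outer(delta, delta) / dg)

ratio = np.linalg.det(H_new) / np.linalg.det(H)
expected = (delta @ np.linalg.solve(H, delta)) / dg     # (2.19)
assert np.isclose(ratio, expected)
```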

### 2.3 Consistency Conditions

We now ask what other conditions, besides the requirement of unit stepsizes, have to be satisfied by the parameters in the definitions of $\{g_k\}$ and $\{\delta_k\}$ so that the steps $\{\delta_k;\ k \ge 1\}$ can be generated by the BFGS update. Our first idea was based on the observation that, by the updating formula (1.5), $H_{k+1}$ is linear in $H_k$, and on the linear system

$$H_{8j+9} = \mathrm{diag}(t^{-4}E,\ t^{4}E)\,H_{8j+1}\,\mathrm{diag}(t^{-4}E,\ t^{4}E), \tag{2.24}$$

where $E$ is the two-dimensional identity matrix. However, this approach seems quite complicated. Notice instead that the dimension is $n = 4$ and that, by Lemma 2.2, the assumption (2.21) implies (2.20). The matrix $H_{k+1}$ can therefore be uniquely determined from the values of $H_{k+1}\gamma_k$, $H_{k+1}g_{k+1}$, $H_{k+1}g_{k+2}$ and $H_{k+1}g_{k+3}$. As a matter of fact, we have

$$H_{k+1}\gamma_k = \delta_k \quad \text{(the secant equation)}, \tag{2.25}$$

$$H_{k+1}g_{k+1} = -\delta_{k+1} \quad \text{(by (2.7) and } \alpha_k \equiv 1\text{)}, \tag{2.26}$$

$$H_{k+1}g_{k+2} = -\delta_{k+2} + \frac{g_{k+2}^T\delta_{k+2}}{g_{k+1}^T\delta_{k+1}}\,\delta_{k+1} \quad \text{(by (2.10) and } \alpha_k \equiv 1\text{)}, \tag{2.27}$$

$$H_{k+1}g_{k+3} = -\delta_{k+3} + \left(\frac{g_{k+3}^T\delta_{k+3}}{g_{k+2}^T\delta_{k+2}} + \frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+2} - \left(\frac{g_{k+2}^T\delta_{k+2}}{g_{k+1}^T\delta_{k+1}}\right)\left(\frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+1}. \tag{2.28}$$

The last equality is obtained by multiplying (2.23) by $g_{k+2}$, using (2.27), and finally replacing $k$ with $k+1$. Relations (2.25)-(2.28) provide a system of 16 equations, while the symmetric matrix $H_{k+1}$ has only 10 independent entries. How can we ensure that this linear system has a symmetric solution $H_{k+1}$? We have the following general lemma.

**Lemma 2.3.** Assume that $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ are two sets of $n$-dimensional linearly independent vectors. Then there exists a symmetric matrix $H \in \mathbb{R}^{n\times n}$ satisfying

$$Hu_i = v_i, \quad i = 1, 2, \ldots, n, \tag{2.29}$$

if and only if

$$u_i^Tv_j = u_j^Tv_i, \quad i, j = 1, 2, \ldots, n. \tag{2.30}$$

*Proof.* The "only if" part: if $H = H^T$ satisfies (2.29), then for all $i, j = 1, 2, \ldots, n$,

$$u_i^Tv_j = u_i^THu_j = u_i^TH^Tu_j = (Hu_i)^Tu_j = v_i^Tu_j.$$

The "if" part: assume that (2.30) holds. Defining the matrices

$$U = (u_1\ u_2\ \ldots\ u_n), \qquad V = (v_1\ v_2\ \ldots\ v_n), \tag{2.31}$$

direct calculation shows that

$$U^THU = U^TV = \begin{pmatrix} u_1^Tv_1 & u_1^Tv_2 & \cdots & u_1^Tv_n \\ u_2^Tv_1 & u_2^Tv_2 & \cdots & u_2^Tv_n \\ \vdots & \vdots & & \vdots \\ u_n^Tv_1 & u_n^Tv_2 & \cdots & u_n^Tv_n \end{pmatrix} := A. \tag{2.32}$$

By (2.30), $A$ is symmetric, so $H = U^{-T}AU^{-1}$ satisfies $H = H^T$ and (2.29). This completes the proof. Q.E.D.

By the above lemma, the following six conditions are sufficient for the linear system (2.25)-(2.28) to admit a symmetric solution matrix $H_{k+1}$:

$$\begin{aligned}
g_{k+1}^T(H_{k+1}\gamma_k) &= (H_{k+1}g_{k+1})^T\gamma_k, \qquad & g_{k+2}^T(H_{k+1}\gamma_k) &= (H_{k+1}g_{k+2})^T\gamma_k, \\
g_{k+3}^T(H_{k+1}\gamma_k) &= (H_{k+1}g_{k+3})^T\gamma_k, & g_{k+2}^T(H_{k+1}g_{k+1}) &= (H_{k+1}g_{k+2})^Tg_{k+1}, \\
g_{k+3}^T(H_{k+1}g_{k+1}) &= (H_{k+1}g_{k+3})^Tg_{k+1}, & g_{k+3}^T(H_{k+1}g_{k+2}) &= (H_{k+1}g_{k+3})^Tg_{k+2}.
\end{aligned}$$

Further, considering the whole sequence $\{H_{k+1};\ k \ge 1\}$ and combining the line search condition $g_{k+1}^T\delta_k = 0$, we can deduce the following four consistency conditions, which should hold for all $k \ge 1$:

$$g_{k+1}^T\delta_k = 0, \tag{2.33}$$

$$\delta_{k+1}^T\gamma_k = 0, \tag{2.34}$$

$$g_{k+2}^T\delta_k = -\delta_{k+2}^T\gamma_k, \tag{2.35}$$

$$g_{k+3}^T\delta_k = -\delta_{k+3}^T\gamma_k + \left(\frac{g_{k+3}^T\delta_{k+3}}{g_{k+2}^T\delta_{k+2}} + \frac{g_{k+3}^T\delta_{k+1}}{g_{k+1}^T\delta_{k+1}}\right)\delta_{k+2}^T\gamma_k. \tag{2.36}$$

The positive definiteness of the matrix $H_{k+1}$ will be considered further in §2.5.

### 2.4 Choosing The Parameters

In this subsection we focus on how to choose the decay parameter $t$ and suitable vectors $\delta_1$ and $g_1$ such that the consistency conditions (2.33), (2.34), (2.35) and (2.36) hold with $k = 1$. Due to the special structure of this example, it then follows that the consistency conditions hold for all $k$. Define $v = (\Delta_1\ \Delta_2\ \Delta_3\ \Delta_4)^T$, where

$$\Delta_1 = l_1\eta_1 + h_1\xi_1, \quad \Delta_2 = h_1\eta_1 - l_1\xi_1, \quad \Delta_3 = c_1\gamma_1 + d_1\tau_1, \quad \Delta_4 = d_1\gamma_1 - c_1\tau_1. \tag{2.37}$$

As will be seen, the first three consistency conditions provide an under-determined linear system in $v$, from which we can obtain a general solution by Cramer's rule. The substitution of $v$ into the fourth condition (2.36), which is nonlinear, yields the desired value for the decay parameter $t$.
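Lemma 2.3 is constructive: when the compatibility condition (2.30) holds, $H = U^{-T}AU^{-1}$ is the symmetric interpolant. A minimal sketch (synthetic test data, names ours):

```python
import numpy as np

def symmetric_interpolant(U, V):
    """Symmetric H with H u_i = v_i (Lemma 2.3); columns of U, V are u_i, v_i.

    Requires A = U^T V to be symmetric, i.e. condition (2.30).
    """
    A = U.T @ V
    if not np.allclose(A, A.T):
        raise ValueError("u_i^T v_j != u_j^T v_i: no symmetric solution")
    U_inv = np.linalg.inv(U)
    return U_inv.T @ A @ U_inv

# Sanity check on synthetic data: take V = S U for a random symmetric S,
# so the interpolant must recover S exactly.
rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4)); S = S + S.T
U = rng.standard_normal((4, 4))
H = symmetric_interpolant(U, S @ U)
assert np.allclose(H, H.T) and np.allclose(H @ U, S @ U)
```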

At first, the condition $\delta_1^Tg_2 = \delta_1^TPg_1 = 0$, which is (2.33) with $k = 1$, asks that

$$\big[\,t\cos\theta_1\ \ -t\sin\theta_1\ \ \cos\theta_2\ \ -\sin\theta_2\,\big]\,v = 0. \tag{2.38}$$

The condition $\delta_2^T\gamma_1 = \delta_1^TM^T(P-I)g_1 = \delta_1^T(tI - M^T)g_1 = 0$, which is (2.34) with $k = 1$, requires the vector $v$ to satisfy

$$\big[\,t-\cos\theta_1\ \ -\sin\theta_1\ \ t(1-\cos\theta_2)\ \ -t\sin\theta_2\,\big]\,v = 0. \tag{2.39}$$

The requirement (2.35) with $k = 1$, that is,

$$0 = \delta_1^Tg_3 + \delta_3^T\gamma_1 = \delta_1^TP^2g_1 + \delta_1^T(M^T)^2(P-I)g_1 = \delta_1^T\big[P^2 + tM^T - (M^T)^2\big]g_1,$$

yields the equation

$$\big[\,t\cos\theta_1 + (t^2-1)\cos 2\theta_1\ \ \ t\sin\theta_1 - (1+t^2)\sin 2\theta_1\ \ \ t^2(\cos\theta_2 - \cos 2\theta_2) + \cos 2\theta_2\ \ \ t^2(\sin\theta_2 - \sin 2\theta_2) - \sin 2\theta_2\,\big]\,v = 0. \tag{2.40}$$

By (2.38), (2.39), (2.40) and the choices (2.3) of $\theta_1$ and $\theta_2$, the vector $v = (\Delta_1\ \Delta_2\ \Delta_3\ \Delta_4)^T$ must satisfy the homogeneous $3\times 4$ linear system obtained by stacking these three rows with the values of $\theta_1$ and $\theta_2$ substituted (2.41). By Cramer's rule, to meet (2.41) we may choose the $\Delta_i$ to be, up to a common sign $\pm$, the four signed $3\times 3$ minors of its coefficient matrix; each is an explicit cubic polynomial in $t$ with coefficients involving $\sqrt 2$ (2.42). Since $g_1^T\delta_1 = \Delta_1 + \Delta_3 = \pm(t^2+1)(t+1)$ and $0 < t < 1$, we choose all the signs in (2.42) to be minus, so that $g_1^T\delta_1 < 0$.

We now substitute the general solution into the fourth consistency condition (2.36). Noting that $\delta_3^T\gamma_1 = -g_3^T\delta_1$ and $g_4^T\delta_4 = t\,g_3^T\delta_3$, we see that (2.36) with $k = 1$ is equivalent to

$$g_4^T\delta_1 + g_2^T\delta_4 - g_1^T\delta_4 + \left[\,t + \frac{g_4^T\delta_2}{g_2^T\delta_2}\,\right]g_3^T\delta_1 = 0. \tag{2.43}$$

Further, using $g_2^T\delta_2 = t\,g_1^T\delta_1$, $g_4^T\delta_2 = t\,g_3^T\delta_1$ and $g_2^T\delta_4 = t\,g_1^T\delta_3$, (2.43) can be simplified to

$$g_1^T\delta_1\,\big[\,g_4^T\delta_1 - g_1^T\delta_4 + t\,(g_3^T\delta_1 + g_1^T\delta_3)\,\big] + (g_3^T\delta_1)^2 = 0. \tag{2.44}$$
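The Cramer's-rule step behind (2.42) can be written down generically: the null direction of a rank-3 $3\times 4$ matrix is the vector of its signed maximal minors. A sketch (synthetic test matrix):

```python
import numpy as np

def null_direction_3x4(C):
    """Spanning vector of the null space of a rank-3 3x4 matrix C, via Cramer.

    Component j is the signed 3x3 minor obtained by deleting column j;
    the resulting v satisfies C @ v = 0, which is how (2.42) arises.
    """
    v = np.empty(4)
    for j in range(4):
        cols = [c for c in range(4) if c != j]
        v[j] = (-1) ** j * np.linalg.det(C[:, cols])
    return v

C = np.random.default_rng(2).standard_normal((3, 4))   # synthetic test matrix
assert np.allclose(C @ null_direction_3x4(C), 0)
```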

Consequently, we obtain the following equation for the parameter $t$:

$$(t^2+1)(t+1)\,\big[\,p^2(t) + t\,p(t) - q(t)\,\big] = 0, \tag{2.45}$$

where $p(t)$ and $q(t)$ are explicit quadratic and cubic polynomials in $t$ whose coefficients involve $\sqrt 2$. Further calculation shows that $p^2(t) + t\,p(t) - q(t)$ factors into the product of two quadratic polynomials in $t$. Considering the requirement that $t \in (-1, 1)$, one can deduce from (2.45) a quadratic equation for $t$ (2.46). This equation has a unique root $t$ in the interval $(-1, 1)$, given in closed form by the quadratic formula (2.47), and this root indeed lies in $(0, 1)$.

Therefore, if we choose the above $t$ and the vectors $\delta_1$ and $g_1$ such that (2.42) holds with the minus sign, then the prefixed steps and gradients satisfy the consistency conditions (2.33), (2.34), (2.35) and (2.36) for all $k \ge 1$. There are many ways to choose $\delta_1$ and $g_1$ satisfying (2.42). Specifically, we take $\eta_1 = 0$ and $c_1 = 0$, with $\xi_1$ and $d_1$ suitable nonzero constants (2.48). In this case, by (2.42) with minus signs and the definitions (2.37) of the $\Delta_i$, the remaining components $(\gamma_1, \tau_1)$ and $(l_1, h_1)$ of $\delta_1$ and $g_1$ are determined explicitly as functions of $t$ (2.49). The initial step $\delta_1$ and the initial gradient $g_1$ are then determined.

### 2.5 Quasi-Newton Updating Matrices

In this subsection we discuss the form of the BFGS quasi-Newton updating matrices implied by (2.25)-(2.28) under the consistency conditions (2.33)-(2.36). A byproduct is that we will know how to choose the initial matrix $H_1$ for the example. To this aim, we provide a follow-up lemma to Lemma 2.3.

**Lemma 2.4.** Assume that (2.30) holds and that the matrix $H$ satisfies (2.29). If, further, the matrix $A$ in (2.32) is positive definite, then there exist nonsingular triangular matrices $T_1$ and $T_2$ such that

$$A = V^TU = T_1^TT_2. \tag{2.50}$$

Further, denoting $\hat V = VT_2^{-1} = (\hat v_1, \hat v_2, \ldots, \hat v_n)$ and the diagonal matrix $T_2T_1^{-1} = \mathrm{diag}(t_1, t_2, \ldots, t_n)$, we have

$$H = \sum_{i=1}^{n} t_i\,\hat v_i\hat v_i^T. \tag{2.51}$$

*Proof.* The factorization (2.50) exists by the Cholesky factorization of the positive definite matrix $A$. Since $A = T_1^TT_2$ is symmetric, $T_2T_1^{-1} = T_1^{-T}T_2^T$ is both upper and lower triangular, hence diagonal. From $A = V^TU$ we get $U^{-1} = T_2^{-1}T_1^{-T}V^T$, so that, using $HU = V$,

$$H = VU^{-1} = VT_2^{-1}\,\big(T_1^{-T}T_2^T\big)\,T_2^{-T}V^T = \hat V\,\big(T_2T_1^{-1}\big)\,\hat V^T,$$

which with the definitions of the $\hat v_i$'s and $t_i$'s implies the truth of (2.51). Q.E.D.

By the above lemma, we can express the form of $H_{k+1}$ determined by (2.25)-(2.28) under the consistency conditions (2.33)-(2.36). At first, we make use of the steps $\{\delta_{k+i};\ i = 0, 1, 2, 3\}$ to introduce the following two vectors, which are orthogonal to $\gamma_k$ and $g_{k+1}$:

$$z_k = -\frac{\delta_{k+2}^T\gamma_k}{\delta_k^T\gamma_k}\,\delta_k - \frac{\delta_{k+2}^Tg_{k+1}}{\delta_{k+1}^Tg_{k+1}}\,\delta_{k+1} + \delta_{k+2}, \qquad w_k = -\frac{\delta_{k+3}^T\gamma_k}{\delta_k^T\gamma_k}\,\delta_k - \frac{\delta_{k+3}^Tg_{k+1}}{\delta_{k+1}^Tg_{k+1}}\,\delta_{k+1} + \delta_{k+3}. \tag{2.52}$$

The vectors $z_k$ and $w_k$ are well defined because $\delta_k^T\gamma_k = -\delta_k^Tg_k > 0$ by the descent property of $\delta_k$. Direct calculation shows that

$$z_k^T\gamma_k = z_k^Tg_{k+1} = 0, \qquad w_k^T\gamma_k = w_k^Tg_{k+1} = 0. \tag{2.53}$$

Further, we use $z_k$ and $w_k$ to define the vector

$$v_k = -\frac{w_k^Tg_{k+2}}{z_k^Tg_{k+2}}\,z_k + w_k, \tag{2.54}$$

which is orthogonal to $g_{k+2}$ as well. By the choice of $v_k$, it is easy to see that

$$v_k^T\gamma_k = v_k^Tg_{k+1} = v_k^Tg_{k+2} = 0. \tag{2.55}$$
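The vectors (2.52) and (2.54) are Gram-Schmidt-like combinations and translate directly into code; the sketch below (function name ours) makes the shared pattern explicit. The orthogonality relations (2.53) and (2.55) hold under the consistency conditions (2.33)-(2.34).

```python
import numpy as np

def zwv_vectors(d, g):
    """z_k, w_k of (2.52) and v_k of (2.54).

    d = [delta_k, ..., delta_{k+3}], g = [g_k, ..., g_{k+3}];
    gamma_k = g_{k+1} - g_k. The stated orthogonality to gamma_k, g_{k+1}
    and (for v_k) g_{k+2} presumes (2.33)-(2.34).
    """
    gamma = g[1] - g[0]
    def ortho(dd):                    # shared pattern of (2.52)
        return (dd
                - (dd @ gamma) / (d[0] @ gamma) * d[0]
                - (dd @ g[1]) / (d[1] @ g[1]) * d[1])
    z = ortho(d[2])                   # z_k
    w = ortho(d[3])                   # w_k
    v = w - (w @ g[2]) / (z @ g[2]) * z   # v_k, additionally orthogonal to g_{k+2}
    return z, w, v
```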

We can now express the matrix $H_{k+1}$ as

$$H_{k+1} = -\frac{\delta_k\delta_k^T}{\delta_k^Tg_k} - \frac{\delta_{k+1}\delta_{k+1}^T}{\delta_{k+1}^Tg_{k+1}} - \frac{z_kz_k^T}{z_k^Tg_{k+2}} - \frac{v_kv_k^T}{v_k^Tg_{k+3}}. \tag{2.56}$$

Since $\delta_k^Tg_k < 0$ for all $k \ge 1$, the matrix $H_{k+1}$ is positive definite if and only if $z_k^Tg_{k+2} < 0$ and $v_k^Tg_{k+3} < 0$. Direct calculation shows that $z_1^Tg_3 < 0$ and $v_1^Tg_4 < 0$ for the chosen parameter values, so the matrix $H_2$ is positive definite. In addition, it is not difficult to establish the relations $\delta_{k+1}^Tg_{k+1} = t\,\delta_k^Tg_k$, $z_{k+1}^Tg_{k+3} = t\,z_k^Tg_{k+2}$, $v_{k+1}^Tg_{k+4} = t\,v_k^Tg_{k+3}$, $z_{k+1} = Mz_k$ and $v_{k+1} = Mv_k$ for all $k \ge 1$. Thus, by (2.56) and $\delta_{k+1} = M\delta_k$, we have that

$$H_{k+1} = \frac{1}{t}\,MH_kM^T \tag{2.57}$$

holds for all $k \ge 2$, and hence the matrices $\{H_k : k \ge 2\}$ are all positive definite.

With the above procedure, we can calculate the values of $z_1$, $v_1$ and $H_2$. Further, by (2.23), we can obtain the initial matrix $H_1$ required by the example of this paper:

$$H_1 = H_2 - \frac{\delta_1\delta_2^T + \delta_2\delta_1^T}{\delta_1^T\gamma_1} + \frac{g_2^T\delta_2}{g_1^T\delta_1}\cdot\frac{\delta_1\delta_1^T}{\delta_1^T\gamma_1}, \tag{2.58}$$

a symmetric matrix whose ten independent entries are explicit linear expressions in $t$ (2.59). Direct calculation shows that (2.57) also holds with $k = 1$, and hence $H_1$ is a positive definite matrix.
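Given the vectors of (2.52) and (2.54), the assembly (2.56) is four rank-one terms, and positive definiteness can be verified with a Cholesky factorization. A sketch (names ours; `z` and `v` as produced by the previous sketch):

```python
import numpy as np

def H_next(d, g, z, v):
    """Assemble H_{k+1} by (2.56).

    d = [delta_k, delta_{k+1}], g = [g_k, ..., g_{k+3}]; z, v are z_k, v_k.
    The result is positive definite exactly when all four denominators
    d[0] @ g[0], d[1] @ g[1], z @ g[2], v @ g[3] are negative.
    """
    return -(np.outer(d[0], d[0]) / (d[0] @ g[0])
             + np.outer(d[1], d[1]) / (d[1] @ g[1])
             + np.outer(z, z) / (z @ g[2])
             + np.outer(v, v) / (v @ g[3]))

def is_positive_definite(H):
    """H is symmetric positive definite iff Cholesky succeeds on (H + H^T)/2."""
    try:
        np.linalg.cholesky((H + H.T) / 2)
        return True
    except np.linalg.LinAlgError:
        return False
```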

## 3 Constructing A Suitable Objective Function

### 3.1 Recovering The Iterations and The Function Values

Denote the rotation matrices

$$R_1 = \begin{pmatrix}\cos\theta_1 & -\sin\theta_1\\ \sin\theta_1 & \cos\theta_1\end{pmatrix}, \qquad R_2 = \begin{pmatrix}\cos\theta_2 & -\sin\theta_2\\ \sin\theta_2 & \cos\theta_2\end{pmatrix}. \tag{3.1}$$

If we write the steps $\{\delta_k\}$ and gradients $\{g_k\}$ in the forms

$$\delta_{8j+i} = \begin{pmatrix}\eta_i\\ \xi_i\\ t^{8j}\gamma_i\\ t^{8j}\tau_i\end{pmatrix}, \qquad g_{8j+i} = \begin{pmatrix}t^{8j}l_i\\ t^{8j}h_i\\ c_i\\ d_i\end{pmatrix}, \qquad i = 1, \ldots, 8, \tag{3.2}$$

then from (2.1), (2.2), (2.4) and (2.5) we have

$$\begin{pmatrix}\eta_{i+1}\\ \xi_{i+1}\end{pmatrix} = R_1\begin{pmatrix}\eta_i\\ \xi_i\end{pmatrix}, \qquad \begin{pmatrix}\gamma_{i+1}\\ \tau_{i+1}\end{pmatrix} = tR_2\begin{pmatrix}\gamma_i\\ \tau_i\end{pmatrix}, \qquad \begin{pmatrix}l_{i+1}\\ h_{i+1}\end{pmatrix} = tR_1\begin{pmatrix}l_i\\ h_i\end{pmatrix}, \qquad \begin{pmatrix}c_{i+1}\\ d_{i+1}\end{pmatrix} = R_2\begin{pmatrix}c_i\\ d_i\end{pmatrix}. \tag{3.3}$$

For simplicity, we want the iterations $\{x_k\}$ asymptotically to turn around the eight vertices of some regular octagon $\Omega$ in the subspace spanned by the first and second coordinates. More exactly, such a regular octagon $\Omega$ has the origin as its center, and its eight vertices $V_i$ are given by $V_i = (a_i, b_i, 0, 0)^T$, where $(a_1, b_1)$ is a designated first vertex and

$$\begin{pmatrix}a_{i+1}\\ b_{i+1}\end{pmatrix} = R_1\begin{pmatrix}a_i\\ b_i\end{pmatrix}. \tag{3.4}$$

Here we note that, where no confusion arises, we also regard $\Omega$ and the $V_i$'s as defined in the subspace spanned by the first and second coordinates. To this aim, we ask the iterations $\{x_k\}$ to be of the form

$$x_{8j+i} = \begin{pmatrix}a_i\\ b_i\\ t^{8j}p_i\\ t^{8j}q_i\end{pmatrix}, \tag{3.5}$$

where

$$\begin{pmatrix}p_{i+1}\\ q_{i+1}\end{pmatrix} = tR_2\begin{pmatrix}p_i\\ q_i\end{pmatrix}. \tag{3.6}$$
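The vertex recursion (3.4) and the iterate pattern (3.5)-(3.6) are immediate to generate. In the sketch below, $(a_1, b_1)$, $(p_1, q_1)$ and $t$ are placeholder values standing in for the paper's exact ones; the checks show the iterates cycling over the octagon while the last two coordinates decay.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1, R2 = rot(np.pi / 4), rot(3 * np.pi / 4)
t = 0.5                                         # placeholder decay parameter
ab = np.array([1.0, 1.0 + np.sqrt(2)])          # hypothetical first vertex (a_1, b_1)
pq = np.array([-0.5, -0.5])                     # hypothetical (p_1, q_1)

iterates = []
for k in range(24):                             # iterate index k corresponds to 8j + i
    j, i = divmod(k, 8)
    a, b = np.linalg.matrix_power(R1, i) @ ab   # vertex part of (3.5), via (3.4)
    p, q = np.linalg.matrix_power(t * R2, i) @ pq   # (3.6)
    iterates.append(np.array([a, b, t**(8 * j) * p, t**(8 * j) * q]))

# One full cycle later the first two coordinates repeat and the
# last two have shrunk by t^8: the iterates cluster at the vertices V_i.
assert np.allclose(iterates[8][:2], iterates[0][:2])
assert np.allclose(iterates[8][2:], t**8 * iterates[0][2:])
```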

To determine the values of $p_1$ and $q_1$, note that

$$x_1 - V_1 = x_1 - \lim_{j\to\infty}x_{8j+1} = -\sum_{i=1}^{\infty}\delta_i,$$

so that

$$\begin{pmatrix}p_1\\ q_1\end{pmatrix} = -\sum_{i=0}^{\infty}(tR_2)^i\begin{pmatrix}\gamma_1\\ \tau_1\end{pmatrix} = -(E - tR_2)^{-1}\begin{pmatrix}\gamma_1\\ \tau_1\end{pmatrix}, \tag{3.7}$$

where again $E$ is the two-dimensional identity matrix. Direct calculation then gives the explicit value of $(p_1, q_1)^T$ as a function of $t$ (3.8).

Now let us assume that the limit of $f(x_k)$ is $f^*$. Since, for any $j$, the first two coordinates of $x_{8j+i}$ always coincide with those of its limit $V_i$, we get

$$f(x_{8j+i}) - f^* = t^{8j}\begin{pmatrix}c_i\\ d_i\end{pmatrix}^T\begin{pmatrix}p_i\\ q_i\end{pmatrix} = t^{8j}\left[R_2^{i-1}\begin{pmatrix}c_1\\ d_1\end{pmatrix}\right]^T\left[(tR_2)^{i-1}\begin{pmatrix}p_1\\ q_1\end{pmatrix}\right] = t^{8j+i-1}\begin{pmatrix}c_1\\ d_1\end{pmatrix}^T\begin{pmatrix}p_1\\ q_1\end{pmatrix} = t^{8j+i-1}\big(f(x_1) - f^*\big),$$

where $f(x_1) - f^* = c_1p_1 + d_1q_1$ is an explicit positive constant depending on $t$.

Next we examine the value of $g_{8j+i}^T\delta_{8j+i}$. Direct calculation shows that

$$g_{8j+i}^T\delta_{8j+i} = \begin{pmatrix}t^{8j}l_i\\ t^{8j}h_i\\ c_i\\ d_i\end{pmatrix}^T\begin{pmatrix}\eta_i\\ \xi_i\\ t^{8j}\gamma_i\\ t^{8j}\tau_i\end{pmatrix} = t^{8j}\big(l_i\eta_i + h_i\xi_i + c_i\gamma_i + d_i\tau_i\big) = t^{8j+i-1}\big(l_1\eta_1 + h_1\xi_1 + c_1\gamma_1 + d_1\tau_1\big) = t^{8j+i-1}\,g_1^T\delta_1,$$

where $g_1^T\delta_1 = l_1\eta_1 + h_1\xi_1 + c_1\gamma_1 + d_1\tau_1 < 0$. Therefore

$$\frac{f(x_{8j+i+1}) - f(x_{8j+i})}{\alpha_{8j+i}\,g_{8j+i}^T\delta_{8j+i}} = \frac{\big(f(x_{8j+i+1}) - f^*\big) - \big(f(x_{8j+i}) - f^*\big)}{g_{8j+i}^T\delta_{8j+i}} = \frac{(t^{8j+i} - t^{8j+i-1})\big(f(x_1) - f^*\big)}{t^{8j+i-1}\,g_1^T\delta_1} = \frac{(t-1)\big(f(x_1) - f^*\big)}{g_1^T\delta_1}, \tag{3.9}$$

a positive constant independent of $j$ and $i$.
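The self-similarity in (3.9) is the whole point: the sufficient-decrease quotient is the same positive constant at every iteration, so one fixed Armijo parameter $\mu$ below that constant accepts the unit step forever. A numerical sketch with placeholder values:

```python
t = 0.5                # placeholder decay parameter
f1_gap = 3.0           # hypothetical f(x_1) - f^* > 0
g1d1 = -2.0            # hypothetical g_1^T delta_1 < 0

quotients = []
for k in range(1, 25):                     # k = 8j + i
    fk_gap = t ** (k - 1) * f1_gap         # f(x_k) - f^*
    fk1_gap = t ** k * f1_gap              # f(x_{k+1}) - f^*
    gkdk = t ** (k - 1) * g1d1             # g_k^T delta_k
    quotients.append((fk1_gap - fk_gap) / gkdk)

# The quotient of (3.9) is the same positive constant at every iteration.
assert all(abs(q - (t - 1) * f1_gap / g1d1) < 1e-12 for q in quotients)
```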

Relation (3.9), together with $g_{8j+i+1}^T\delta_{8j+i} = 0$, implies that the stepsize $\alpha_{8j+i} = 1$ can be accepted by the Wolfe line search, the Armijo line search and the Goldstein line search with suitable line search parameters.

### 3.2 Seeking A Suitable Form for The Objective Function

We now consider how to construct a smooth function $f$ such that its gradient at any point $x_{8j+i}$ given in (3.5) is the one given in (3.2); namely,

$$\nabla f(x_{8j+i}) = g_{8j+i} \quad \text{for all } j \ge 0 \text{ and } i = 1, \ldots, 8. \tag{3.10}$$

To this aim, we assume that $f$ is of the form

$$f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4, \tag{3.11}$$

where $\lambda$ and $\mu$ are two-dimensional functions to be determined. Since the function (3.11) is linear in $x_3$ and $x_4$, $f$ has no lower bound in $\mathbb{R}^n$. Consequently, for a sequence $\{x_k\}$ generated by some optimization method for the minimization of this function, one would expect $f(x_k)$ to tend to $-\infty$; but this will not happen in our examples. With the prefixed form (3.11), we have

$$\nabla f(x_1, x_2, x_3, x_4) = \begin{pmatrix} \lambda_{x_1}(x_1, x_2)\,x_3 + \mu_{x_1}(x_1, x_2)\,x_4 \\ \lambda_{x_2}(x_1, x_2)\,x_3 + \mu_{x_2}(x_1, x_2)\,x_4 \\ \lambda(x_1, x_2) \\ \mu(x_1, x_2) \end{pmatrix}. \tag{3.12}$$

Comparing the last two components of the right-hand vector in (3.12) with the gradients in (3.2), we must have

$$\lambda(V_i) = c_i, \qquad \mu(V_i) = d_i. \tag{3.13}$$

Consequently, by (2.48) and (3.3), we obtain the concrete values of $\lambda$ and $\mu$ at the eight vertices of $\Omega$, listed in the second and third rows of Table 3.1. Further, for each $i$, let $J_i$ collect the partial derivatives of $\lambda$ and $\mu$ at the vertex $V_i$, and write

$$J_i = \begin{pmatrix} \lambda_{x_1} & \mu_{x_1} \\ \lambda_{x_2} & \mu_{x_2} \end{pmatrix}\bigg|_{V_i}, \qquad J_1 = \begin{pmatrix} \omega_1 & \omega_2 \\ \omega_3 & \omega_4 \end{pmatrix}. \tag{3.14}$$

Comparing the first two components of the right-hand vector in (3.12) with the gradients in (3.2) leads to the relation

$$J_i\begin{pmatrix}p_i\\ q_i\end{pmatrix} = \begin{pmatrix}l_i\\ h_i\end{pmatrix}. \tag{3.15}$$

To make (3.15) hold for all $i$, we ask $J_{i+1}$ and $J_i$ to satisfy the condition

$$J_{i+1} = R_1\,J_i\,R_2^T \tag{3.16}$$

for all $i \ge 1$.
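The form (3.11) makes both $f$ and its gradient (3.12) trivial to evaluate once $\lambda$, $\mu$ and their partials are available. A sketch with the callables passed in as parameters (an illustration, not the paper's concrete functions):

```python
def make_objective(lam, mu, lam_x1, lam_x2, mu_x1, mu_x2):
    """f(x) = lam(x1,x2)*x3 + mu(x1,x2)*x4, eq. (3.11), and its gradient (3.12).

    The six callables are lambda, mu and their four partial derivatives.
    """
    def f(x):
        x1, x2, x3, x4 = x
        return lam(x1, x2) * x3 + mu(x1, x2) * x4

    def grad(x):
        x1, x2, x3, x4 = x
        return [lam_x1(x1, x2) * x3 + mu_x1(x1, x2) * x4,
                lam_x2(x1, x2) * x3 + mu_x2(x1, x2) * x4,
                lam(x1, x2),
                mu(x1, x2)]
    return f, grad
```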

In this case, if (3.15) holds for some $i$, then by (3.16), (3.6) and (3.3) we have

$$J_{i+1}\begin{pmatrix}p_{i+1}\\ q_{i+1}\end{pmatrix} = \big(R_1J_iR_2^T\big)(tR_2)\begin{pmatrix}p_i\\ q_i\end{pmatrix} = t\,R_1J_i\begin{pmatrix}p_i\\ q_i\end{pmatrix} = t\,R_1\begin{pmatrix}l_i\\ h_i\end{pmatrix} = \begin{pmatrix}l_{i+1}\\ h_{i+1}\end{pmatrix},$$

which means that (3.15) holds with $i+1$. Therefore, by the induction principle, to make (3.15) hold for all $i \ge 1$ it remains to choose $J_1$ such that (3.15) holds with $i = 1$. Noticing that there are still two degrees of freedom, we impose the special relations

$$\omega_2 = \omega_3, \qquad \omega_4 = -\omega_1. \tag{3.17}$$

Then the values of $\omega_1$ and $\omega_3$ can be solved from (3.15) with $i = 1$ together with (3.17); this yields the explicit values (3.18), which depend on $t$. Therefore, if we choose $J_1$ to be the matrix in (3.14) with the values in (3.17) and (3.18), and require the relation (3.16) for all $i \ge 1$, then (3.15) holds for all $i \ge 1$. Now, using (3.13), (3.16) and (3.17), we can list the function values and gradients of $\lambda$ and $\mu$ at the vertices $V_i$ of the octagon $\Omega$ in Table 3.1.

|  | $V_1$ | $V_2$ | $V_3$ | $V_4$ | $V_5$ | $V_6$ | $V_7$ | $V_8$ |
|---|---|---|---|---|---|---|---|---|
| $\lambda$ | $0$ | $1$ | $-\sqrt2$ | $1$ | $0$ | $-1$ | $\sqrt2$ | $-1$ |
| $\mu$ | $-\sqrt2$ | $1$ | $0$ | $-1$ | $\sqrt2$ | $-1$ | $0$ | $1$ |
| $\lambda_{x_1}$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ |
| $\lambda_{x_2}$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ |
| $\mu_{x_1}$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ | $\omega_3$ | $-\omega_3$ |
| $\mu_{x_2}$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ | $-\omega_1$ | $\omega_1$ |

Table 3.1. Function values and gradients of $\lambda$ and $\mu$ at the $V_i$'s.

By using the special choice of $\Omega$ and observing the values in Table 3.1, we may ask the functions $\lambda$ and $\mu$ to have the properties

$$\mu(x_1, x_2) = \lambda(-x_2, x_1) \tag{3.19}$$

and

$$\lambda(x_1, x_2) = -\lambda(-x_1, -x_2) \tag{3.20}$$

for all $(x_1, x_2) \in \mathbb{R}^2$. Further, by (3.20), we may think of constructing the function $\lambda$ as a polynomial with only odd-order terms. In the next subsection we construct a simple function that only meets the requirements (a) and (c) described in the first section. In §3.4 we construct a more complicated function that meets all the requirements (a), (b) and (c).
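The induction above can also be tested numerically: propagate $J_i$, $(p_i, q_i)$ and $(l_i, h_i)$ by (3.16), (3.6) and (3.3) and watch (3.15) persist. The starting data below are hypothetical but respect the symmetry (3.17).

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1, R2 = rot(np.pi / 4), rot(3 * np.pi / 4)
t = 0.5                                     # placeholder decay parameter

w1, w3 = 0.3, -0.7                          # hypothetical omega_1, omega_3
J = np.array([[w1, w3], [w3, -w1]])         # J_1 with the symmetry (3.17)
pq = np.array([1.0, 2.0])                   # hypothetical (p_1, q_1)
lh = J @ pq                                 # (l_1, h_1) so that (3.15) holds at i = 1

for i in range(8):                          # one full trip around the octagon
    J = R1 @ J @ R2.T                       # (3.16)
    pq = t * R2 @ pq                        # (3.6)
    lh = t * R1 @ lh                        # (3.3)
    assert np.allclose(J @ pq, lh)          # (3.15) is preserved
```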

### 3.3 A Simple Function that Meets (a) and (c)

If we ignore the convexity requirement (b) on the line search function, we can construct a relatively simple function $\lambda(x_1, x_2)$ to meet the interpolation conditions listed in Table 3.1, the function $\mu(x_1, x_2)$ then being given by (3.19). In fact, we can check that an explicit polynomial $\lambda_f(x_1, x_2)$ of low degree, with coefficients involving $\sqrt 2$, has the values $0$, $1$, $-\sqrt 2$, $1$ at the vertices $V_1$, $V_2$, $V_3$, $V_4$, respectively, and zero derivatives at all the vertices $V_i$. Further, we can check that a function of the form

$$\lambda_g(x_1, x_2) = c\,(x_1^2 - 1)\big[x_1^2 - (3 + 2\sqrt 2)\big]\,x_2,$$

with $c$ a suitable normalizing constant, has zero function values at all the vertices $V_i$ but can be used to interpolate the derivatives of $\lambda(x_1, x_2)$ at the $V_i$'s. Consequently,

$$\lambda(x_1, x_2) = \lambda_f(x_1, x_2) + \omega_1\,\lambda_g(x_1, x_2) + \omega_3\,\lambda_g(-x_2, x_1) \tag{3.21}$$

meets all the interpolation conditions required of $\lambda(x_1, x_2)$ in Table 3.1. Consequently, we may say that, for the function defined by (3.11), (3.21) and (3.19), the BFGS method with the Wolfe line search using the unit initial step does not converge.

Observing that $\Psi_{i+1}(\alpha) = t\,\Psi_i(\alpha)$ and $\Psi_{i+1}'(\alpha) = t\,\Psi_i'(\alpha)$ at $\alpha = 0, 1$, we would ideally like the following relation between $\Psi_i(\alpha)$ and $\Psi_{i+1}(\alpha)$:

$$\Psi_{i+1}(\alpha) = t\,\Psi_i(\alpha) \quad \text{for all } \alpha \ge 0, \tag{3.22}$$

so that neighboring line search functions differ only by the constant multiplier $t$. However, the choice (3.21) of $\lambda(x_1, x_2)$ does not lead to (3.22). Nevertheless, we can introduce a compensation function of the form

$$\lambda_c(x_1, x_2) = x_1\big[x_1^2 + x_2^2 - r\big],$$

for a suitable constant $r$, and define

$$\hat\lambda(x_1, x_2) = \lambda(x_1, x_2) + c_1\,\lambda_c(x_1, x_2) + c_2\,\lambda_c(-x_2, x_1), \tag{3.23}$$

where $c_1$ and $c_2$ are explicit constants depending on $t$.
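The interpolant (3.21) and its compensated version (3.23) assemble mechanically from the building blocks. A sketch with $\lambda_f$, $\lambda_g$, $\lambda_c$ and the constants passed in as parameters (an illustration under those assumptions):

```python
def make_lambda_mu(lam_f, lam_g, w1, w3):
    """lambda by (3.21) and mu by the quarter-turn symmetry (3.19).

    lam_f interpolates the function values at the vertices V_i, lam_g their
    derivatives; w1 and w3 are the constants omega_1, omega_3 of (3.18).
    """
    def lam(x1, x2):
        return lam_f(x1, x2) + w1 * lam_g(x1, x2) + w3 * lam_g(-x2, x1)

    def mu(x1, x2):
        return lam(-x2, x1)
    return lam, mu

def compensate(lam, lam_c, c1, c2):
    """hat-lambda by (3.23), added so that (3.22) holds exactly."""
    def lam_hat(x1, x2):
        return lam(x1, x2) + c1 * lam_c(x1, x2) + c2 * lam_c(-x2, x1)
    return lam_hat
```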

In this case, the relation (3.22) always holds, and the corresponding line search function at the first iteration is the sixth-degree polynomial

$$\hat\Psi_1(\alpha) = \sum_{i=0}^{6}\rho_i\,\alpha^i, \tag{3.24}$$

where $\rho_6, \ldots, \rho_2$ are explicit constants depending on $t$, $\rho_1 = g_1^T\delta_1$ and $\rho_0 = f(x_1)$.

To sum up at this stage: for the function (3.11) with $\lambda(x_1, x_2)$ given by (3.21) or (3.23) and $\mu(x_1, x_2) = \lambda(-x_2, x_1)$, if the initial point is $x_1 = (a_1, b_1, p_1, q_1)^T$ (see (3.4), (3.8) and (2.47) for the values) and the initial matrix is $H_1$ given by (2.58) and (2.59), then the BFGS method (1.2) and (1.5) with $\alpha_k \equiv 1$ generates the iterations (3.5), whose gradients are given by (2.4)-(2.5). Therefore the method asymptotically cycles around the eight vertices of a regular octagon without approaching a stationary point or driving $f(x_k)$ to $-\infty$.

### 3.4 A Complicated Function that Meets (a), (b) and (c)

To meet requirement (b), this subsection gives a further compensation to the function $\hat\lambda(x_1, x_2)$ in (3.23) such that each line search function $\Psi_k(\alpha)$ is strongly convex. To this aim, we consider the straight line connecting $x_{8j+i}$ and $x_{8j+i+1}$:

$$\ell(\alpha) = x_{8j+i} + \alpha\,\delta_{8j+i} = \begin{pmatrix} a_i + \alpha\eta_i \\ b_i + \alpha\xi_i \\ t^{8j}(p_i + \alpha\gamma_i) \\ t^{8j}(q_i + \alpha\tau_i) \end{pmatrix}. \tag{3.25}$$

By (3.11), the line search function at the $(8j+i)$-th iteration is

$$\Psi_{8j+i}(\alpha) = t^{8j}\Big[\lambda\big(a_i + \alpha\eta_i,\ b_i + \alpha\xi_i\big)\,(p_i + \alpha\gamma_i) + \mu\big(a_i + \alpha\eta_i,\ b_i + \alpha\xi_i\big)\,(q_i + \alpha\tau_i)\Big]. \tag{3.26}$$
