A Perfect Example for The BFGS Method


 Catherine Webster
 3 years ago
 Views:
Transcription
1 A Perfect Example for The BFGS Method YuHong Dai Abstract Consider the BFGS quasinewton method applied to a general nonconvex function that has continuous second derivatives. This paper aims to construct a fourdimensional example such that the BFGS method need not converge. The example is perfect in the following sense: (a All the stepsizes are exactly equal to one; the unit stepsize can also be accepted by various line searches including the Wolfe line search and the Arjimo line search; (b The objective function is strongly convex along each search direction although it is not in itself. The unit stepsize is the unique minimizer of each line search function. Hence the example also applies to the global line search and the line search that always picks the first local minimizer; (c The objective function is polynomial and hence is infinitely continuously differentiable. If relaxing the convexity requirement of the line search function; namely, (b, we are able to construct a relatively simple polynomial example. Keywords. unconstrained optimization, quasinewton method, nonconvex function, global convergence. 1 Introduction Consider the unconstrained optimization problem min f(x, x R n, (1.1 where f is a general nonconvex function that has continuous second derivatives. The quasinewton method is a class of wellknown and efficient methods for solving (1.1. It is of the iterative scheme x k+1 = x k α k H k g k, (1. This work was partially supported by the Chinese NSF grants and and the CAS grant kjcxyws703. State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 719, Beijing , P.R. China. 1
2 where x 1 is a starting point, H k is some approximation to the inverse Hessian [ f(x k ] 1, g k = f(x k, and α k is a stepsize obtained in some way. Defining the vectors δ k = x k+1 x k, γ k = g k+1 g k, the quasinewton method asks the next approximation matrix H k+1 to satisfy the secant equation H k+1 γ k = δ k. (1.3 This similarity to the identical equation [ f(x k+1 ] 1 γ k = δ k in the quadratic case enables the quasinewton method to be superlinearly convergent (see Dennis and Moré [5], for example and makes it very attractive in practical optimization. The first quasinewton method was dated back to Davidon [4] and Fletcher and Powell [9]. The DFP method updates the approximation matrix H k to H k+1 by the formula H k+1 = H k H kγ k γ T k H k γ T k H kγ k + δ kδ T k δ T k γ k. (1.4 Nowadays, the most efficient quasinewton method is perhaps the BFGS method, which was proposed by Broyden [], Fletcher [7], Goldfarb [10], and Shanno [4], independently. The matrix H k+1 in the BFGS method can be updated by the way H k+1 = H k δ kγ T k H k + H k γ k δ T ( k δ T γt k H kγ k δk δ T k k γ k δ T k γ k δ T. (1.5 k γ k To pass the positive definiteness of the matrix H k to H k+1, practical quasi Newton algorithms make use of the Wolfe line search to calculate the stepsize α k, which ensures a positive curvature to be found at each iteration; namely, δ T k γ k > 0. More exactly, defining the onedimensional line search function Ψ k (α = f(x k αh k g k, α 0, (1.6 the Wolfe line search consists in finding a stepsize α k such that and Ψ k (α k Ψ k (0 + µ Ψ k(0 α k (1.7 Ψ k(α k η Ψ k(0, (1.8 where µ and η are constants with 0 < µ η < 1. There have been a large quantity of research works devoting to the global convergence of the quasinewton method (see Yuan [8] and Sun and Yuan [6], for example. Specifically, Powell [18] showed that the DFP method with exact line searches is globally convergent for uniformly convex functions. In another paper [0], Powell established the global convergence of the BFGS method with the Wolfe line search for uniformly convex functions. This result was extended
3 by Byrd, Nocedal and Yuan [1] to the whole Broyden s convex family of methods except the DFP method. Therefore it is natural to ask the following question: Does the DFP method with the Wolfe line search converge globally for uniformly convex functions? On the other hand, since the BFGS method with the Wolfe line search works well for both convex and nonconvex functions, we might ask another question: Does the BFGS method with the Wolfe line search converge globally for general functions? The difficulty and importance of the two convergence problems has been addressed in many situations, including Nocedal [15], Fletcher [8], and Yuan [8]. Recent studies provide a negative answer to the convergence problem of the BFGS method for nonconvex functions. As a matter of fact, in an early paper [1], which analyzes the convergence properties of the conjugate gradient method, Powell mentioned that the BFGS method need not converge if the line search can pick any local minimizer of the line search function Ψ k (α. After further studies on the twodimensional example in [1], Dai [3] presented an example with six cycling points and showed by the example that the BFGS method with the Wolfe line search may fail for nonconvex functions. Later, Mascarenhas [14] constructed a threedimensional counterexample such that the BFGS method does not converge if the line search picks the global minimizer of the function Ψ k (α. It should be noted that the stepsize in the counterexample of [14] also satisfies the Wolfe line search conditions. However, neither examples are such that the stepsize is the first local minimizer of the line search function Ψ k (α. Surprisingly enough, if there are only two variables, and if the stepsize is chosen to be the first local minimizer of Ψ k (α; namely, α k = arg min {α > 0 : α is a local minimizer of Ψ k (α}, (1.9 Powell [] established the global convergence of the BFGS method for general twice continuously differentiable function. Powell s proof makes use of the principle of contradiction and is quite sophisticated. Assuming the nonconvergence of the method and the relation lim inf g k 0, Powell showed that the limit points of the BFGS path k P = {x : x lies on the line segment connecting x k and x k+1 for some k 1} (that is exactly the same as the PolakRibière conjugate gradient path if n = and gk+1 T δ k = 0 for all k forms some line segment L. The use of the specific line search (1.9 is such that the objective function is monotonically decreasing on the line segment L k that connects x k and x k+1. It follows that f(x lim f(x k for all x L. Consequently, the line segment L k cannot cross L k and all the points x k with large indices lies in the same side of L. Furthermore, Powell showed that starting around from one end of the line segment L, the BFGS path can not get any close to the other end of L and finally obtained a contradiction. Intuitively, it is difficult to extend Powell s result to the case when n 3. This is because, even if it has been shown that the BFGS path 3
4 tends to some limit line segment L, each line segment L k could turn around L providing that n 3, making the similar analysis of the BFGS path complicated and nearly impossible. The purpose of this paper is to construct a fourdimensional counterexample for the BFGS method with the following features: (a All the stepsizes in the example are exactly equal to one; namely, α k 1; (1.10 For the search direction defined by the BFGS update in the example, a unit stepsize satisfies various line search conditions, including the Wolfe conditions and the Armijo conditions; (b The objective function is strongly convex along each search direction; namely, the line search function Ψ k (α is strongly convex for all k, although the objective function is not in itself. The unit stepsize is the unique minimizer of Ψ k (α. Hence the example also applies to the global line search and the specific line search (1.9; (c The objective function is polynomial and hence is infinitely continuously differentiable. On the other hand, the objective function in our example is linear in the third and fourth variables and thus has no local minimizer. However, the iterations generated by the BFGS method tend to a nonstationary point. As seen from 3.4, the construction of the example is quite complicated. To be such that the line search function Ψ k (α is strongly convex, a polynomial of degree 38 is introduced. Nevertheless, if we relax the convexity requirement of Ψ k (α; namely, (b in the above, it is possible to construct a relatively simple polynomial example of low degree (see 3.3. The construction of our examples can be divided into four procedures: (1 Prefix the special forms of the steps {δ k } and gradients {g k }, leaving several parameters to be determined later. In this case, once x 1 is given, the whole sequence {x k } is fixed. As seen in.1, our steps {δ k } and gradients {g k } are asked to possess some symmetrical and cyclic properties and to push the iterations {x k } tend to the eight vertices of a regular octagon. This is very helpful in simplifying the construction of the examples and we can focus our attention on the choice of the parameter t, that answers for the decay of the last two components of δ k and the first two components of g k. ( To enable the BFGS method to generate those prefixed steps, investigate the consistency conditions on the steps {δ k } and gradients {g k }. With the prefixed forms of {δ k } and {g k }, we show in. that the unit stepsize is indispensable, whereas this stepsize is usually used as the first trial stepsize in the implementation of quasinewton methods. Four more consistency conditions on {δ k } and {g k } are obtained in.3 by expressing the vectors H k+1 γ k, H k+1 g k+1, H k+1 g k+ and H k+1 g k+3 by some δ k s and g k s instead of considering the quasinewton 4
5 matrix H k+1 itself. (3 Choose the parameters in some way to satisfy the consistency conditions and other necessary conditions on the objective function. As seen in.4, the first three consistency conditions are actually corresponding to some underdetermined linear system. Substituting its general solution by the Cramer s rule into the fourth consistency condition, we are led to a nonlinear equation from which the exact value can be obtained for the decay parameter t. By suitably choosing the other parameters, we can express all the quasinewton updating matrices including the initial choice for H 1 in.5. (4 Construct a suitable objective function f whose gradients are the preassigned values; namely, f(x k = g k for all k 1 and the line search has the desired properties. This will be done in the whole third section. To make full use of symmetrical and cyclic properties of the steps {δ k } and {g k }, we carefully choose a special form of the objective function. In addition, the introduction of element function φ in 3.4 helps us greatly to convexify the line search function and finally complete the perfect example that meets all the requirements (a, (b and (c. Some concluding remarks are given in the last section. Looking for Consistent Steps and Gradients.1 The Forms of Steps and Gradients Consider the case of four dimension. Inspired by [1], [3] and [14], we assume that the steps {δ k } have the following form: δ 1 = (η 1, ξ 1, γ 1, τ 1 T ; δ k+1 = M δ k (k 1, (.1 where M is the 4 4 matrix defined by ( cos θ1 sin θ 1 M = sin θ 1 cos θ 1 0 t 0 ( cos θ sin θ sin θ cos θ. (. In the above, t is a parameter satisfying 0 < t < 1. This parameter answers for the decay of the last two components of the steps {δ k }. The angles θ 1 and θ are chosen so that θ 1 = π m 1 and θ = π m for some positive integers m 1 and m. Specifically, we choose in this paper θ 1 = 1 4 π, θ = 3 π. (.3 4 Consequently, the first two components of the steps {δ k } turn to the same after every eight iterations, and the last two components will shrink at a factor of t 8 after every eight iterations. Further, we can see that the iterations {x k } will tend to turn around the eight vertices of some regular octagon (see 3.1 for more details. Accordingly, the gradients {g k } are assumed to be of the form g 1 = (l 1, h 1, c 1, d 1 T ; g k+1 = P g k (k 1, (.4 5
6 where P = ( cos θ1 sin θ t 1 sin θ 1 cos θ 1 0 ( cos θ 0 sin θ. (.5 sin θ cos θ Since 0 < t < 1, we see that the first two components of {g k } vanish, whereas the last two components of the gradient {g k }, turn to the same after eight iterations. It follows that none of the cluster points generated by the BFGS method are stationary points.. Unit Stepsizes It is well known that the unit stepsize is usually used as the first trial stepsize in practical implementations of the BFGS method. Under suitable assumptions on f, the unit stepsize will be accepted by the line search as the iterates tend to the solution and will enable a superlinear convergence step (see [5], for example. In the following, we are going to show that if the line search satisfies g T k+1δ k = 0, for all k 1, (.6 and if the steps generated by the BFGS method have the form (.1(. and the gradients have the form (.4(.5, then the use of unit stepsizes is also indispensable. To begin with, we see that the line search condition (.6, the secant equation (1.3 and the definition of the search direction indicate that H k+1 g k+1 = α 1 k+1 δ k+1 (.7 δ T k+1γ k = α k+1 g T k+1h k+1 γ k = α k+1 g T k+1δ k = 0. (.8 The above relation (.8 is sometimes called as the conjugacy condition in the context of nonlinear conjugate gradient methods. By multiplying the BFGS updating formula (1.5 with g k+1 and using (.6, which with (.7 gives H k+1 g k+1 = H k g k+1 γt k H kg k+1 δ T k γ k δ k, H k g k+1 = α 1 k+1 δ k+1 + γt k H kg k+1 δ T k γ k δ k. (.9 Multiplying (.9 by g T k and noticing gt k H kg k+1 = 0, we get that 0 = α 1 k+1 gt k δ k+1 + γt k H kg k+1 δ T gk T δ k = α 1 k+1 k γ gt k δ k+1 γ T k H k g k+1. k 6
7 Thus γ T k H kg k+1 = α 1 k+1 gt k δ k+1 = α 1 k+1 gt k+1 δ k+1. Hence, by (.9, H k g k+1 = α 1 k+1 δ k+1 α 1 gk+1 T δ k+1 k+1 δ T δ k. (.10 k γ k It follows from (.10 and (.7 with k replaced by k 1 that [ H k γ k = α 1 k+1 δ k+1 + α 1 k α 1 gk+1 T δ ] k+1 k+1 δ T δ k. (.11 k γ k Further, substituting this into the BFGS updating formula (1.5 yields [ H k+1 = H k + α 1 δ k δ T k+1 + δ k+1 δ T k k+1 δ T + 1 α 1 k + α 1 gk+1 T δ ] k+1 k+1 k γ k δ T k γ k δ k δ T k δ T k γ k. (.1 The above new updating formula requires the quantity δ k+1, that depends on H k+1 itself, and hence has theoretical meanings only. Lemma.1. Assume that (.6 holds. Then for all k 1 and i 0, the vector H k g k+i + α 1 k+i δ k+i belongs to the subspace spanned by δ k, δ k+1,..., δ k+i 1 ; namely, H k g k+i + α 1 k+i δ k+i Span{δ k, δ k+1,..., δ k+i 1 }. (.13 Proof. For convenience, we write (.1 as H k+1 = H k + V (s k, s k+1, (.14 where V (s k, s k+1 means the ranktwo matrix in the right hand of (.1. Therefore we have for all i 1, H k+i = H k + i V (s k+j 1, s k+j. (.15 j=1 The statement follows by multiplying the above relation with g k+i and using (.7 with k replaced by k + i 1 and g T k+i δ k+i 1 = 0. Q.E.D. Lemma.. To construct a desired example with Det (S 1 0, we must have that α k = 1, for all k 4. Proof. Define the following matrices G k = [ γ k 1 g k g k+1 g k+ ], S k = [ δ k 1 δ k δ k+1 δ k+ ]. 7
8 By H k γ k 1 = δ k 1 and Lemma.1, we can get that Det(H k Det(G k = Det [ ] H k γ k 1 H k g k H k g k+1 H k g k+ = Det [ δ k 1 α 1 k δ k α 1 k+1 δ k+1 α k+ δ ] k+ = α 1 k α 1 k+1 α 1 k+ Det(S k. (.16 Replacing k with k + 1 in the above yields Det(H k+1 Det(G k+1 = α 1 k+1 α 1 k+ α 1 k+3 Det(S k+1. (.17 On the other hand, due to the special forms of {g k } and {δ k }, we know that G k+1 = P G k and S k+1 = MS k. Hence Det(G k+1 = Det(P Det(G k = t Det(G k, Det(S k+1 = Det(M Det(S k = t Det(S k. (.18 Due to the basic determinant relation of the BFGS update, (.6 and (.7, it is not difficult to see that Det(H k+1 = δt k H 1 k δ k δ T Det(H k = α k Det(H k. (.19 k γ k If Det(S 1 0, (.16 with k = 1 implies that Det(G 1 0. Then by (.18, Det(G k 0, Det(S k 0, for all k 1. (.0 Dividing (.17 by (.16 and using the above relations, we then obtain α k+3 = 1. So the statement holds due to the arbitrariness of k 1. Q.E.D. The deletion of the first finite iterations does not influence the whole example. Thus we will ask our counterexample to satisfy and Det(S 1 0 (.1 α k 1. (. In this case, the updating formula (.1 of H k can be simplified as H k+1 = H k + δ kδ T k+1 + δ k+1 δ T k δ T gt k+1 δ k+1 k γ k gk T δ k δ k δ T k δ T k γ k. (.3 8
9 .3 Consistency Conditions Now we ask what else conditions, besides the requirement of unit stepsizes, have to be satisfied by the parameters in the definitions of {g k } and {δ k } so that the steps {s k ; k 1} can be generated by the BFGS update. Our early idea is based on the observation on the updating formula (1.5 that H k+1 is linear with H k and the linear system H 8j+9 = diag(t 4 E, t 4 E H 8j+1 diag(t 4 E, t 4 E, (.4 where E is the twodimensional identity matrix. However, this way seems to be quite complicated. Notice that the dimension is n = 4 and by Lemma., the assumption (.1 implies (.0. Then the matrix H k+1 can be uniquely defined by the equations given by H k+1 γ k, H k+1 g k+1, H k+1 g k+ and H k+1 g k+3. As a matter of fact, we have that H k+1 γ k = δ k (the secant equation, (.5 H k+1 g k+1 = δ k+1 ( by (.7 and α k 1, (.6 H k+1 g k+ = δ k+ + gt k+ δ k+ gk+1 T δ δ k+1 (by (.10 and α k 1, (.7 k+1 ( g T H k+1 g k+3 = δ k+3 + k+3 δ k+3 gk+ T δ + gt k+3 δ k+1 k+ gk+1 T δ δ k+ k+1 ( ( g T k+ δ k+ g T k+3 δ k+1 gk+1 T δ k+1 gk+1 T δ δ k+1. (.8 k+1 The last equality is obtained by multiplying (.3 by g k+, using (.7 and finally replacing k with k + 1. Relations (.5(.8 provide a system of 16 equations, while the symmetric matrix H k+1 only has 10 independent entries. How to ensure that this linear system has a symmetric solution H k+1? We have the following general lemma. Lemma.3. Assume that {u 1, u,..., u n } and {v 1, v,..., v n } are two sets of ndimensional linearly independent vectors. Then there exists a symmetric matrix H R n n satisfying Hu i = v i, i = 1,,..., n (.9 if and only if u T i v j = u T j v i, i, j = 1,,..., n. (.30 Proof. The only if part. If H = H T satisfies (.9, we have for all i, j = 1,,..., n, u T i v j = u T i Hu j = u T i HT u j = (Hu i T u j = v T i u j. The if part. Assume that (.30 holds. Defining the matrices U = (u 1 u... u n, V = (v 1 v... v n, (.31 9
10 direct calculations show that U T HU = U T V = u T 1 v 1 u T 1 v u T 1 v n u T v 1 u T v u T v n u T n v 1 u T n v u T n v n := A. (.3 By (.30, A is symmetric. So H = U T AU 1 satisfies H = H T and (.9. This completes our proof. Q.E.D. By the above lemma, the following six conditions are sufficient for the linear system (.5(.8 to allow a symmetric solution matrix H k+1. g T k+1 (H k+1y k = (H k+1 g k+1 T y k, g T k+ (H k+1y k = (H k+1 g k+ T y k, g T k+3 (H k+1y k = (H k+1 g k+3 T y k, g T k+ (H k+1g k+1 = (H k+1 g k+ T g k+1, g T k+3 (H k+1g k+1 = (H k+1 g k+3 T g k+1, g T k+3 (H k+1g k+ = (H k+1 g k+3 T g k+. Further, considering the whole sequence of {H k+1 ; k 1} and combining the line search condition g T k+1 δ k = 0, we can deduce the following four consistency conditions, which should be satisfied for all k 1, g T k+1δ k = 0, (.33 δ T k+1y k = 0, (.34 gk+δ T k = δ T k+y k, (.35 ( g gk+3δ T k = δ T T k+3y k + k+3 δ k+3 gk+ T δ + gt k+3 δ k+1 k+ gk+1 T δ δ T k+y k. (.36 k+1 The positive definiteness of the matrix H k+1 will be further considered in.5..4 Choosing The Parameters In this subsection we focus on how to choose the decay parameter t and suitable vectors δ 1 and g 1 such that the consistency conditions (.33, (.34, (.35 and (.36 hold with k = 1. Due to the special structure of this example, we then know that the consistency conditions hold for all k. Define v = ( T, where 1 = l 1 η 1 + h 1 ξ 1, = h 1 η 1 l 1 ξ 1, 3 = c 1 γ 1 + d 1 τ 1, 4 = d 1 γ 1 c 1 τ 1. (.37 As will be seen, the first three consistent conditions provide an underdetermined linear system with v, from which we can get a general solution by the Cramer s rule. The substitution of v into the fourth condition (.36, which is nonlinear, yields a desired value for the decay parameter t. 10
11 At first, the condition δ T 1 g = δ T 1 P g 1 = 0, that is (.33 with k = 1, asks [t cos θ 1 t sin θ 1 cos θ sin θ ] v = 0. (.38 The condition δ T γ 1 = δ T 1 M T (P Ig 1 = δ T 1 (ti M T g 1 = 0, that is (.34 with k = 1, requires the vector v to satisfy [t cos θ 1 sin θ 1 t(1 cos θ t sin θ ] v = 0. (.39 The requirement (.35 with k = 1, that is, 0 = δ T 1 g 3 + δ T 3 γ 1 = δ T 1 P g 1 + δ T 1 (M T (P Ig 1 = δ T 1 [P + tm T (M T ]g 1, yields the equation [ t cos θ1 + (t 1 cos θ 1 t sin θ 1 (1 + t sin θ 1 t (cos θ cos θ + cos θ t (sin θ sin θ sin θ ] v = 0. (.40 By (.38, (.39, (.40 and the choices of θ 1 and θ, we see that { i } must satisfy t t t (1 + t t t t (1 + t t ( + 1t By the Cramer s rule, to meet (.41, we may choose { i } as follows: 1 = ± [ ( + t 3 + (1 + t + 1 ], = ± [ ( + t 3 + (1 + t 1 ], 3 = ± [ t 3 + t + (1 t + 1 ], 4 = ± [ t 3 t + (1 + t 1 ]. = 0. (.41 (.4 Since g1 T δ 1 = = ±(t + 1(t + 1 and 0 < t < 1, we choose all the signs in (.4 as so that g1 T δ 1 < 0. We now want to substitute the general solution to the fourth consistent condition (.36. Noting that δ T 3 γ 1 = g3 T δ 1 and g4 T δ 4 = t g3 T δ 3, we know that (.36 with k = 1 is equivalent to [ ] g4 T δ 1 + g T δ 4 g1 T δ 4 + g3 T δ 1 t + gt 4 δ g T δ = 0. (.43 Further, using g T δ = t g T 1 δ 1, g T 4 δ = t g T 3 δ 1 and g T δ 4 = t g T 1 δ 3, (.43 can be simplified as g T 1 δ 1 [ g T 4 δ 1 g T 1 δ 4 + t(g T 3 δ 1 + g T 1 δ 3 ] + (g T 3 δ 1 = 0. (.44 11
12 Consequently, we obtain the following equation for the parameter t: where (t + 1 (t + 1 [ p (t + tp(t q(t ] = 0, (.45 p(t = ( + t + ( + t 1, q(t = ( + 3 t 3 (4 + 3 t + (3 + t. Further calculations provide ( [ p (t + tp(t q(t ] = [ t + (1 t + (1 ] [t + (1 3 t + ( ]. Considering the requirement that t ( 1, 1, one can deduce the following quadratic equation from (.45, ( t ( t = 0. (.46 The above equation has a unique root of t in the interval ( 1, 1, t = (.47 The numerical value of t is equal to approximately. Therefore if we choose the above t and the vectors δ 1 and g 1 such that (.4 holds with minus sign, then the prefixed steps and gradients satisfy the consistency conditions (.33, (.34, (.35 and (.36 for all k 1. There are many ways to choose δ 1 and g 1 satisfying (.4. Specifically, we choose ( η1 = ξ 1 ( 0, ( c1 = d 1 ( 0. (.48 In this case, by (.4 with minus signs and the definitions of i s, we can get that ( ( γ1 (17 8 t + ( = τ 1 ( t + (17 9, ( ( (.49 l1 ( 4 13 t + ( = h 1 ( 4 13 t + ( The initial step δ 1 and the initial gradient g 1 are then determined..5 QuasiNewton Updating Matrices In this subsection we discuss the form of the BFGS quasinewton updating matrices implied by (.5(.8 under the consistent conditions (.33(.36. A byproduct is that, we will know how to choose the initial matrix H 1 for the example. For this aim, we provide a followup lemma of Lemma.3. 1
13 Lemma.4. Assume that (.30 holds and the matrix H satisfies (.9. If further, the matrix A in (.3 is positive definite, there must exist nonsingular triangular matrices T 1 and T such that A = V T U = T T 1 T. (.50 Further, denoting ˆV = V T 1 = (ˆv 1, ˆv,..., ˆv n and the diagonal matrix T 1 1 T 1 = diag(t 1, t,..., t n, we have that H = n t i ˆv iˆv i T. (.51 i=1 Proof. The truth of (.50 is obvious by the Cholesky factorization of positive definite matrices. Further, by (.50 and the definition of ˆV, we get that Since HU = V, we obtain H = V U 1 = ( ( ˆV T 1 1 ˆV T U = T. T 1 ˆV T = ˆV ( T1 1 T 1 ˆV T, which with the definitions of ˆv i s and t i s implies the truth of (.51. Q.E.D. By the above lemma, we can express the form of H k+1 determined by (.5 (.6 under the consistent conditions (.33(.36. At first, we make use of the steps {δ k+i ; i = 0, 1,, 3} to introduce the following two vectors that are orthogonal to γ k and g k+1 : z k = δt k+γ k δ T k γ k δ k δt k+g k+1 δ T k+1g k+1 δ k+1 + δ k+, w k = δt k+3γ k δ T k γ k δ k δt k+3g k+1 δ T k+1g k+1 δ k+1 + δ k+3. (.5 The vectors z k and w k are well defined because δ T k γ k = δ T k g k > 0 is positive due to the descent property of δ k. Direct calculations show that z T k γ k = z T k g k+1 = 0, w T k γ k = w T k g k+1 = 0. (.53 Further, we use z k and w k to define the vector that is orthogonal to g k+ : By the choice of v k, it is easy to see that v k = wt k g k+ z T k g k+ z k + w k. (.54 v T k γ k = v T k g k+1 = v T k g k+ = 0. (.55 13
14 Now we could express the matrix H k+1 by H k+1 = δ kδ T k δ T k g k δ k+1δ T k+1 δ T k+1g k+1 z kz T k z T k g k+ v kv T k v T k g k+3. (.56 Since δ T k g k < 0 for all k 1, we know that the matrix H k+1 is positive definite if and only if z T k g k+ < 0 and v T k g k+3 < 0. Direct calculations show that z T 1 g 3 = ( t ( < 0, v T 1 g 4 = ( t + ( < 0. Therefore we know that the matrix H is positive definite. In addition, it is difficult to build the relations δ T k+1g k+1 = t δ T k g k, z T k+1 g k+3 = t z T k g k+, v T k+1 g k+4 = t v T k g k+3, z k+1 = Mz k and v k+1 = M v k for all k 1. Thus we have by (.56 and δ k+1 = M δ k that H k+1 = 1 t MH km T, (.57 holds for all k and hence {H k : k } are all positive definite. With the above procedure, we can calculate the values of z 1, v 1 and H. Further, by (.3, we can obtain the initial matrix H 1 required by the example of this paper. H 1 = H δ 1δ T + δ δ T 1 δ T 1 γ 1 + gt δ g T 1 δ 1 δ 1 δ T 1 δ T 1 γ 1 := H , (.58 where H 1 is a symmetric matrix with entries H 1 (1, 1 = ( t + ( , H 1 (1, = ( t ( , H 1 (1, 3 = ( t ( , H 1 (1, 4 = ( t + ( H 1 (, = ( t + ( , H 1 (, 3 = 1068 t ( , H 1 (, 4 = ( t ( , H 1 (3, 3 = ( t + ( , H 1 (3, 4 = ( t + ( , H 1 (4, 4 = ( t + ( (.59 Direct calculations show that (.57 also holds with k = 1 and hence H 1 is a positive definite matrix. 14
15 3 Constructing A Suitable Objective Function 3.1 Recovering The Iterations and The Function Values Denote the rotation matrices ( cos θ1 sin θ R 1 = 1, R sin θ 1 cos θ = 1 ( cos θ sin θ sin θ cos θ. (3.1 If we write the steps {δ k } and gradients {g k } into the forms η i t 8j l i δ 8j+i = ξ i t 8j γ i, g 8j+i = t 8j h i c i ; i = 1,..., 8, (3. t 8j τ i d i we have from (.1, (., (.4 and (.5 that ( ( ηi+1 ηi = R ξ 1, i+1 ξ i ( ( li+1 li = t R h 1, i+1 h i ( γi+1 τ i+1 ( γi = t R, τ i ( ( ci+1 ci = R d. i+1 d i (3.3 For simplicity, we want the iterations {x k } asymptotically to turn around the eight vertices of some regular octagon Ω of the subspace spanned by the first and second coordinates. More exactly, such a regular octagon Ω has the origin as its center and its eight vertices V i are given by V i = (a i, b i, 0, 0 T with ( = b 1 1, ( a1 ( ai+1 b i+1 ( ai = R 1. (3.4 b i Here we should note that if there is no confusion, we also regard that Ω and V i s are defined in the subspace spanned by the first and second coordinates. To this aim, we ask the iterations {x k } to be of the form a i x 8j+i = b i t 8j p i, (3.5 t 8j q i where ( pi+1 q i+1 To decide the values of p 1 and q 1, noting that x 1 V 1 = x 1 lim j x 8j+1 = ( pi = t R. (3.6 q i 15 δ i, i=0
16 we have that ( p1 q 1 = i=0 ( γi τ i ( = (E tr 1 γ1, (3.7 τ 1 where again E is the twodimensional identity matrix. Direct calculations show that ( p1 = 1 ( ( t + ( q ( t + ( (3.8 Now, let us assume that the limit of f(x k is f. Since for any j, the first two coordinates of {x 8j+i } always keep the same as those of its limit V i, we can get that ( T ( f(x 8j+i f = t 8j ci pi d i q i where = t 8j [ R i 1 ( T ( = t 8j+i 1 c1 p1 d 1 q 1 = t 8j+i 1 (f(x 1 f, ( c1 d 1 ] T [ (tr i 1 ( p1 q 1 ] f(x 1 f = c 1 p 1 + d 1 q 1 = ( t + ( Now we see the value of g T 8j+i δ 8j+i. Direct calculations show that t 8j T T l i η i l i η i g8j+i T δ 8j+i = t 8j h i ξ i c i t 8j γ i = t8j h i ξ i c i γ i d i t 8j τ i d i τ i = t 8j+i 1 (l 1 η 1 + h 1 ξ 1 + c 1 γ 1 + d 1 τ 1 = t 8j+i 1 g1 T δ 1, where g T 1 δ 1 = l 1 η 1 + h 1 ξ 1 + c 1 γ 1 + d 1 τ 1 = ( t + ( Therefore f(x 8j+i+1 f(x 8j+i α 8j+i g T 8j+iδ 8j+i = (f(x 8j+i+1 f (f(x 8j+i f g T 8j+iδ 8j+i = (t8j+i t 8j+i 1 (f(x 1 f t 8j+i 1 g T 1 δ 1 = (t 1(f(x 1 f g T 1 δ E 0. (3.9 16
17 The above relation, together with g T 8j+i+1 δ 8j+i = 0, implies that the stepsize α 8j+i = 1 can be accepted by the Wolfe line search, the Armijo line search and the Goldstein line search with suitable line search parameters. 3. Seeking A Suitable Form for The Objective Function We now consider how to construct a smooth function f such that its gradient at any point x 8j+i given in (3.5 are the one given in (3.; namely, f(x 8j+i = g 8j+i, for all j 0 and i = 1,..., 8. (3.10 To this aim, we assume that f is of the form f(x 1, x, x 3, x 4 = λ(x 1, x x 3 + µ(x 1, x x 4, (3.11 where λ and µ are twodimensional functions to be determined. Since the function (3.11 is linear with x 3 and x 4, we know that f has no lower bound in R n. Consequently, for the sequence {x k } generated by some optimization method for the minimization of this function, it is expected that f(x k tends to, but this will not happen in our examples. With the prefixed form (3.11, we have that f(x 1, x, x 3, x 4 = λ x 1 x 3 + µ x 1 x 4 λ x x 3 + µ x x 4 λ(x 1, x µ(x 1, x. (3.1 Comparing the last two components of the right hand vector in (3.1 with the gradients in (3., we must have that λ(v i = c i, µ(v i = d i. (3.13 Consequently, by (.48 and (3.3, we can obtain the concrete values of λ and µ at the eight vertices of Ω, which are listed in the second and third rows in Table 3.1. Further, for each i, let J i be the Jacobian of (λ, µ at vertex V i and denote J i = ( λ x 1 λ x 1 µ µ x x Vi, J 1 = ( ω1 ω. (3.14 ω 3 ω 4 The comparison of the first two components of the right hand vector in (3.1 with the gradients in (3. leads to the relation ( ( pi li J i =. (3.15 q i To be such that (3.15 always holds, we ask J i+1 and J i to meet the condition h i J i+1 = R 1 J i R T (
18 for all i 1. In this case, if (3.15 holds for some i, we have by this, (3.6 and (3.3 that ( ( ( ( ( pi+1 J i+1 = (R q 1 J i R T pi pi li li+1 (tr = tr i+1 q 1 J i = tr i q 1 =, i h i h i+1 which means that (3.15 holds with i + 1. Therefore by the induction principle, to be such that (3.15 holds for all i 1, it remains to choose J 1 such that (3.15 holds with i = 1. Noticing that there are still two degrees of freedom, we ask the special relations ω = ω 3, ω 4 = ω 1. (3.17 Then the values of ω 1 and ω 3 can be solved from (3.15 with i = 1 and (3.17, ω 1 = ( t ( , ω 3 = ( t + ( (3.18 Therefore if we choose J 1 to be the one in (3.14 with the values in (3.17 and (3.18 and ask the relation (3.16 for all i 1, then we will have (3.15 for all i 1. Now, using (3.13, (3.16 and (3.17, we can list the function values and gradients of λ and µ at the vertices V i s of the octagon Ω into Table 3.1. V 1 V V 3 V 4 V 5 V 6 V 7 V 8 λ µ λ x 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 λ x ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 µ x 1 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 µ x ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 Table 3.1. Function values and gradients of λ and µ at V i s By using the special choice of Ω and observing the values in Table 3.1, we can ask the functions λ and µ to have the properties and µ(x 1, x = λ( x, x 1 (3.19 λ(x 1, x = λ( x 1, x (3.0 for all (x 1, x R. Further, by (3.0, we could think of constructing the function λ by a polynomial function with only odd orders. In the next subsection, we will construct a simple function that only meets the requirements (a and (c that are described in the first section. In 3.4, we construct a complicated function that can meet all the requirements (a, (b and (c. 18
19 3.3 A Simple Function that Meets (a and (c If we ignore the convexity requirement (b of the line search function, we can construct a relatively simple function λ(x 1, x to meet the interpolation conditions listed in Table 3.1 and the function µ(x 1, x is then given by (3.19. In fact, we can check that the following function λ f (x 1, x =(15 1 x (6 9 (x 4 1 x + x 1x 3 + ( x 3 1 +( 1 + (6x 1x + x 3 + ( x1 9 8 x. has values of 0, 1,, 1 at vertices V 1, V, V 3, V 4, respectively, and zero derivatives at all the vertices V i s. Further, we can check the following function λ g (x 1, x = 3 (x 1 1[x 1 (3 + ]x 4 has zero function values at all the vertices V i s, but could be used to interpolate the derivatives of λ(x 1, x at V i s. Consequently, we know that λ(x 1, x = λ f (x 1, x + ω 1 λ g (x 1, x + ω 3 λ g ( x, x 1 (3.1 meets all the interpolation conditions required by λ(x 1, x in Table 3.1. Consequently, we could say that for the function defined by (3.11, (3.1 and (3.19, the BFGS method with the Wolfe line search using the unit initial step does not converge. Observing that Ψ i (α = tψ i (α, Ψ i(α = tψ i(α at α = 0, 1, we ideally wish there is the following relation between Ψ i (α and Ψ i+1 (α: Ψ i+1 (α = t Ψ i (α, for all α 0 (3. so that the neighboring line search functions only differ up to a constant multiplier of t. However, the choice (3.1 of λ(x 1, x does not lead to (3.. Nevertheless, we can propose the following compensation function and define λ c (x 1, x = x 1 [ x 1 + x ( + ] ˆλ(x 1, x = λ(x 1, x + c 1 λ c (x 1, x + c λ c ( x, x 1, (3.3 where c 1 = ( t ( , c = ( t + (
20 In this case, the relation (3. will always hold and the corresponding line search function at the first iteration is 6 ˆΨ 1 (α = ρ i α i, (3.4 where i=0 ρ 6 = ( t + ( , ρ 5 = ( t ( , ρ 4 = ( t + ( , ρ 3 = ( t ( , ρ = ( t + ( , ρ 1 = g T 1 δ 1 = ( t + (40 18, ρ 0 = f(x 1 = ( t + ( To sum up at this stage, for the function (3.11 with λ(x 1, x given in (3.1 or (3.3 and µ(x 1, x = λ( x, x 1, if the initial point is x 1 = (a 1, b 1, p 1, q 1 T (see (3.4, (3.8, (.47 for their values and if the initial matrix is H 1 given by (.58 and (.59, then the BFGS method (1. and (1.5 with α k 1 will generate the iterations in (3.5 whose gradients are given by (.4(.5. Therefore the method will asymptotically cycle around the eight vertices of a regular octagon without approaching a stationary point or pushing f(x k. 3.4 A Complicated Function that Meets (a, (b and (c To meet the requirement (b, this subsection gives a further compensation to the function ˆλ(x 1, x in (3.3 such that each line search function Ψ k (α is strongly convex. To this aim, we consider the straight line connecting x 8j+i and x 8j+i+1 : a i + α η i l i (α = x 8j+i + α δ 8j+i = b i + α ξ i t 8j (p i + α γ i. (3.5 t 8j (q i + α τ i By (3.1, the line search function at the (8j + ith iteration is Ψ 8j+i (α = t 8j[ λ(a i +α η i, b i +α ξ i ( p i +α γ i +µ(ai +α η i, b i +α ξ i ( q i +α τ i ]. (3.6 0
(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationt := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).
1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction
More informationThe Characteristic Polynomial
Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationIRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible
More informationSummary of week 8 (Lectures 22, 23 and 24)
WEEK 8 Summary of week 8 (Lectures 22, 23 and 24) This week we completed our discussion of Chapter 5 of [VST] Recall that if V and W are inner product spaces then a linear map T : V W is called an isometry
More informationInner Product Spaces and Orthogonality
Inner Product Spaces and Orthogonality week 34 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationIntroduction to Flocking {Stochastic Matrices}
Supelec EECI Graduate School in Control Introduction to Flocking {Stochastic Matrices} A. S. Morse Yale University Gif sur  Yvette May 21, 2012 CRAIG REYNOLDS  1987 BOIDS The Lion King CRAIG REYNOLDS
More informationMathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 19967 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
More informationOutline. Optimization scheme Linear search methods Gradient descent Conjugate gradient Newton method QuasiNewton methods
Outline 1 Optimization without constraints Optimization scheme Linear search methods Gradient descent Conjugate gradient Newton method QuasiNewton methods 2 Optimization under constraints Lagrange Equality
More information1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
More informationChapter 15 Introduction to Linear Programming
Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2014 WeiTa Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of
More informationLINEAR ALGEBRA W W L CHEN
LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,
More informationClassification of Cartan matrices
Chapter 7 Classification of Cartan matrices In this chapter we describe a classification of generalised Cartan matrices This classification can be compared as the rough classification of varieties in terms
More informationEXPLICIT ABS SOLUTION OF A CLASS OF LINEAR INEQUALITY SYSTEMS AND LP PROBLEMS. Communicated by Mohammad Asadzadeh. 1. Introduction
Bulletin of the Iranian Mathematical Society Vol. 30 No. 2 (2004), pp 2138. EXPLICIT ABS SOLUTION OF A CLASS OF LINEAR INEQUALITY SYSTEMS AND LP PROBLEMS H. ESMAEILI, N. MAHDAVIAMIRI AND E. SPEDICATO
More informationWe call this set an ndimensional parallelogram (with one vertex 0). We also refer to the vectors x 1,..., x n as the edges of P.
Volumes of parallelograms 1 Chapter 8 Volumes of parallelograms In the present short chapter we are going to discuss the elementary geometrical objects which we call parallelograms. These are going to
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationISOMETRIES OF R n KEITH CONRAD
ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x
More information13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.
3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in threespace, we write a vector in terms
More informationQuadratic Functions, Optimization, and Quadratic Forms
Quadratic Functions, Optimization, and Quadratic Forms Robert M. Freund February, 2004 2004 Massachusetts Institute of echnology. 1 2 1 Quadratic Optimization A quadratic optimization problem is an optimization
More informationNotes on Symmetric Matrices
CPSC 536N: Randomized Algorithms 201112 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.
More informationOn the representability of the biuniform matroid
On the representability of the biuniform matroid Simeon Ball, Carles Padró, Zsuzsa Weiner and Chaoping Xing August 3, 2012 Abstract Every biuniform matroid is representable over all sufficiently large
More informationRecall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.
ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the ndimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?
More informationMATH 590: Meshfree Methods
MATH 590: Meshfree Methods Chapter 7: Conditionally Positive Definite Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH 590 Chapter
More informationx1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.
Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability
More informationAN ITERATIVE METHOD FOR THE HERMITIANGENERALIZED HAMILTONIAN SOLUTIONS TO THE INVERSE PROBLEM AX = B WITH A SUBMATRIX CONSTRAINT
Bulletin of the Iranian Mathematical Society Vol. 39 No. 6 (2013), pp 1291260. AN ITERATIVE METHOD FOR THE HERMITIANGENERALIZED HAMILTONIAN SOLUTIONS TO THE INVERSE PROBLEM AX = B WITH A SUBMATRIX CONSTRAINT
More informationMatrix Algebra LECTURE 1. Simultaneous Equations Consider a system of m linear equations in n unknowns: y 1 = a 11 x 1 + a 12 x 2 + +a 1n x n,
LECTURE 1 Matrix Algebra Simultaneous Equations Consider a system of m linear equations in n unknowns: y 1 a 11 x 1 + a 12 x 2 + +a 1n x n, (1) y 2 a 21 x 1 + a 22 x 2 + +a 2n x n, y m a m1 x 1 +a m2 x
More informationIdeal Class Group and Units
Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals
More informationMATH10212 Linear Algebra. Systems of Linear Equations. Definition. An ndimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0534405967. Systems of Linear Equations Definition. An ndimensional vector is a row or a column
More informationAbsolute Value Programming
Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationNumerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
More informationME128 ComputerAided Mechanical Design Course Notes Introduction to Design Optimization
ME128 Computerided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation
More informationThe NEWUOA software for unconstrained optimization without derivatives 1. M.J.D. Powell
DAMTP 2004/NA05 The NEWUOA software for unconstrained optimization without derivatives 1 M.J.D. Powell Abstract: The NEWUOA software seeks the least value of a function F(x), x R n, when F(x) can be calculated
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
More informationON CERTAIN DOUBLY INFINITE SYSTEMS OF CURVES ON A SURFACE
i93 c J SYSTEMS OF CURVES 695 ON CERTAIN DOUBLY INFINITE SYSTEMS OF CURVES ON A SURFACE BY C H. ROWE. Introduction. A system of co 2 curves having been given on a surface, let us consider a variable curvilinear
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationLINEAR ALGEBRA. September 23, 2010
LINEAR ALGEBRA September 3, 00 Contents 0. LUdecomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................
More informationAlgebra 2 Chapter 1 Vocabulary. identity  A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity  A statement that equates two equivalent expressions. verbal model A word equation that represents a reallife problem. algebraic expression  An expression with variables.
More informationNumerical Methods for Solving Systems of Nonlinear Equations
Numerical Methods for Solving Systems of Nonlinear Equations by Courtney Remani A project submitted to the Department of Mathematical Sciences in conformity with the requirements for Math 4301 Honour s
More informationHomework One Solutions. Keith Fratus
Homework One Solutions Keith Fratus June 8, 011 1 Problem One 1.1 Part a In this problem, we ll assume the fact that the sum of two complex numbers is another complex number, and also that the product
More informationMATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.
MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. Vector space A vector space is a set V equipped with two operations, addition V V (x,y) x + y V and scalar
More informationRoots of Polynomials
Roots of Polynomials (Com S 477/577 Notes) YanBin Jia Sep 24, 2015 A direct corollary of the fundamental theorem of algebra is that p(x) can be factorized over the complex domain into a product a n (x
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More information3. INNER PRODUCT SPACES
. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.
More informationMATH 240 Fall, Chapter 1: Linear Equations and Matrices
MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS
More informationMATH 2030: SYSTEMS OF LINEAR EQUATIONS. ax + by + cz = d. )z = e. while these equations are not linear: xy z = 2, x x = 0,
MATH 23: SYSTEMS OF LINEAR EQUATIONS Systems of Linear Equations In the plane R 2 the general form of the equation of a line is ax + by = c and that the general equation of a plane in R 3 will be we call
More informationDecember 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in twodimensional space (1) 2x y = 3 describes a line in twodimensional space The coefficients of x and y in the equation
More informationInteger Factorization using the Quadratic Sieve
Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a nonempty
More informationDiagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions
Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential
More informationDuality of linear conic problems
Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least
More informationDirect Methods for Solving Linear Systems. Linear Systems of Equations
Direct Methods for Solving Linear Systems Linear Systems of Equations Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,
More informationChapter 8. Matrices II: inverses. 8.1 What is an inverse?
Chapter 8 Matrices II: inverses We have learnt how to add subtract and multiply matrices but we have not defined division. The reason is that in general it cannot always be defined. In this chapter, we
More informationLecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
More informationFactorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More information1 Short Introduction to Time Series
ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The
More informationSection 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
More informationCONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation
Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in
More informationTHE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS
THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear
More informationBy choosing to view this document, you agree to all provisions of the copyright laws protecting it.
This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal
More informationCS3220 Lecture Notes: QR factorization and orthogonal transformations
CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss
More informationThe Open University s repository of research publications and other research outputs
Open Research Online The Open University s repository of research publications and other research outputs The degreediameter problem for circulant graphs of degree 8 and 9 Journal Article How to cite:
More informationRoots and Coefficients of a Quadratic Equation Summary
Roots and Coefficients of a Quadratic Equation Summary For a quadratic equation with roots α and β: Sum of roots = α + β = and Product of roots = αβ = Symmetrical functions of α and β include: x = and
More informationMathematics Notes for Class 12 chapter 3. Matrices
1 P a g e Mathematics Notes for Class 12 chapter 3. Matrices A matrix is a rectangular arrangement of numbers (real or complex) which may be represented as matrix is enclosed by [ ] or ( ) or Compact form
More informationMath Review. for the Quantitative Reasoning Measure of the GRE revised General Test
Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important
More informationASEN 3112  Structures. MDOF Dynamic Systems. ASEN 3112 Lecture 1 Slide 1
19 MDOF Dynamic Systems ASEN 3112 Lecture 1 Slide 1 A TwoDOF MassSpringDashpot Dynamic System Consider the lumpedparameter, massspringdashpot dynamic system shown in the Figure. It has two point
More informationFurther Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1
Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing
More informationMath 215 HW #6 Solutions
Math 5 HW #6 Solutions Problem 34 Show that x y is orthogonal to x + y if and only if x = y Proof First, suppose x y is orthogonal to x + y Then since x, y = y, x In other words, = x y, x + y = (x y) T
More informationINTRODUCTORY LINEAR ALGEBRA WITH APPLICATIONS B. KOLMAN, D. R. HILL
SOLUTIONS OF THEORETICAL EXERCISES selected from INTRODUCTORY LINEAR ALGEBRA WITH APPLICATIONS B. KOLMAN, D. R. HILL Eighth Edition, Prentice Hall, 2005. Dr. Grigore CĂLUGĂREANU Department of Mathematics
More informationU.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory
More informationBindel, Fall 2012 Matrix Computations (CS 6210) Week 8: Friday, Oct 12
Why eigenvalues? Week 8: Friday, Oct 12 I spend a lot of time thinking about eigenvalue problems. In part, this is because I look for problems that can be solved via eigenvalues. But I might have fewer
More information9 Matrices, determinants, inverse matrix, Cramer s Rule
AAC  Business Mathematics I Lecture #9, December 15, 2007 Katarína Kálovcová 9 Matrices, determinants, inverse matrix, Cramer s Rule Basic properties of matrices: Example: Addition properties: Associative:
More informationThe Inverse of a Matrix
The Inverse of a Matrix 7.4 Introduction In number arithmetic every number a ( 0) has a reciprocal b written as a or such that a ba = ab =. Some, but not all, square matrices have inverses. If a square
More informationSMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)
SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI (B.Sc.(Hons.), BUAA) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS NATIONAL
More informationCURVES WHOSE SECANT DEGREE IS ONE IN POSITIVE CHARACTERISTIC. 1. Introduction
Acta Math. Univ. Comenianae Vol. LXXXI, 1 (2012), pp. 71 77 71 CURVES WHOSE SECANT DEGREE IS ONE IN POSITIVE CHARACTERISTIC E. BALLICO Abstract. Here we study (in positive characteristic) integral curves
More information1 Sets and Set Notation.
LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most
More informationPUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationSECRET sharing schemes were introduced by Blakley [5]
206 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006 Secret Sharing Schemes From Three Classes of Linear Codes Jin Yuan Cunsheng Ding, Senior Member, IEEE Abstract Secret sharing has
More informationSolving polynomial least squares problems via semidefinite programming relaxations
Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose
More informationBIG DATA PROBLEMS AND LARGESCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION
BIG DATA PROBLEMS AND LARGESCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION Ş. İlker Birbil Sabancı University Ali Taylan Cemgil 1, Hazal Koptagel 1, Figen Öztoprak 2, Umut Şimşekli
More informationDeterminants. Dr. Doreen De Leon Math 152, Fall 2015
Determinants Dr. Doreen De Leon Math 52, Fall 205 Determinant of a Matrix Elementary Matrices We will first discuss matrices that can be used to produce an elementary row operation on a given matrix A.
More informationPresentation 3: Eigenvalues and Eigenvectors of a Matrix
Colleen Kirksey, Beth Van Schoyck, Dennis Bowers MATH 280: Problem Solving November 18, 2011 Presentation 3: Eigenvalues and Eigenvectors of a Matrix Order of Presentation: 1. Definitions of Eigenvalues
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 10 Boundary Value Problems for Ordinary Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at UrbanaChampaign
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 918/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More informationIterative Methods for Computing Eigenvalues and Eigenvectors
The Waterloo Mathematics Review 9 Iterative Methods for Computing Eigenvalues and Eigenvectors Maysum Panju University of Waterloo mhpanju@math.uwaterloo.ca Abstract: We examine some numerical iterative
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., RichardPlouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More information24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NPcomplete. Then one can conclude according to the present state of science that no
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 3 Linear Least Squares Prof. Michael T. Heath Department of Computer Science University of Illinois at UrbanaChampaign Copyright c 2002. Reproduction
More informationPractical Numerical Training UKNum
Practical Numerical Training UKNum 7: Systems of linear equations C. Mordasini Max Planck Institute for Astronomy, Heidelberg Program: 1) Introduction 2) Gauss Elimination 3) Gauss with Pivoting 4) Determinants
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationOn the Interaction and Competition among Internet Service Providers
On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers
More information