A Perfect Example for The BFGS Method


 Catherine Webster
 2 years ago
 Views:
Transcription
1 A Perfect Example for The BFGS Method YuHong Dai Abstract Consider the BFGS quasinewton method applied to a general nonconvex function that has continuous second derivatives. This paper aims to construct a fourdimensional example such that the BFGS method need not converge. The example is perfect in the following sense: (a All the stepsizes are exactly equal to one; the unit stepsize can also be accepted by various line searches including the Wolfe line search and the Arjimo line search; (b The objective function is strongly convex along each search direction although it is not in itself. The unit stepsize is the unique minimizer of each line search function. Hence the example also applies to the global line search and the line search that always picks the first local minimizer; (c The objective function is polynomial and hence is infinitely continuously differentiable. If relaxing the convexity requirement of the line search function; namely, (b, we are able to construct a relatively simple polynomial example. Keywords. unconstrained optimization, quasinewton method, nonconvex function, global convergence. 1 Introduction Consider the unconstrained optimization problem min f(x, x R n, (1.1 where f is a general nonconvex function that has continuous second derivatives. The quasinewton method is a class of wellknown and efficient methods for solving (1.1. It is of the iterative scheme x k+1 = x k α k H k g k, (1. This work was partially supported by the Chinese NSF grants and and the CAS grant kjcxyws703. State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 719, Beijing , P.R. China. 1
2 where x 1 is a starting point, H k is some approximation to the inverse Hessian [ f(x k ] 1, g k = f(x k, and α k is a stepsize obtained in some way. Defining the vectors δ k = x k+1 x k, γ k = g k+1 g k, the quasinewton method asks the next approximation matrix H k+1 to satisfy the secant equation H k+1 γ k = δ k. (1.3 This similarity to the identical equation [ f(x k+1 ] 1 γ k = δ k in the quadratic case enables the quasinewton method to be superlinearly convergent (see Dennis and Moré [5], for example and makes it very attractive in practical optimization. The first quasinewton method was dated back to Davidon [4] and Fletcher and Powell [9]. The DFP method updates the approximation matrix H k to H k+1 by the formula H k+1 = H k H kγ k γ T k H k γ T k H kγ k + δ kδ T k δ T k γ k. (1.4 Nowadays, the most efficient quasinewton method is perhaps the BFGS method, which was proposed by Broyden [], Fletcher [7], Goldfarb [10], and Shanno [4], independently. The matrix H k+1 in the BFGS method can be updated by the way H k+1 = H k δ kγ T k H k + H k γ k δ T ( k δ T γt k H kγ k δk δ T k k γ k δ T k γ k δ T. (1.5 k γ k To pass the positive definiteness of the matrix H k to H k+1, practical quasi Newton algorithms make use of the Wolfe line search to calculate the stepsize α k, which ensures a positive curvature to be found at each iteration; namely, δ T k γ k > 0. More exactly, defining the onedimensional line search function Ψ k (α = f(x k αh k g k, α 0, (1.6 the Wolfe line search consists in finding a stepsize α k such that and Ψ k (α k Ψ k (0 + µ Ψ k(0 α k (1.7 Ψ k(α k η Ψ k(0, (1.8 where µ and η are constants with 0 < µ η < 1. There have been a large quantity of research works devoting to the global convergence of the quasinewton method (see Yuan [8] and Sun and Yuan [6], for example. Specifically, Powell [18] showed that the DFP method with exact line searches is globally convergent for uniformly convex functions. In another paper [0], Powell established the global convergence of the BFGS method with the Wolfe line search for uniformly convex functions. This result was extended
3 by Byrd, Nocedal and Yuan [1] to the whole Broyden s convex family of methods except the DFP method. Therefore it is natural to ask the following question: Does the DFP method with the Wolfe line search converge globally for uniformly convex functions? On the other hand, since the BFGS method with the Wolfe line search works well for both convex and nonconvex functions, we might ask another question: Does the BFGS method with the Wolfe line search converge globally for general functions? The difficulty and importance of the two convergence problems has been addressed in many situations, including Nocedal [15], Fletcher [8], and Yuan [8]. Recent studies provide a negative answer to the convergence problem of the BFGS method for nonconvex functions. As a matter of fact, in an early paper [1], which analyzes the convergence properties of the conjugate gradient method, Powell mentioned that the BFGS method need not converge if the line search can pick any local minimizer of the line search function Ψ k (α. After further studies on the twodimensional example in [1], Dai [3] presented an example with six cycling points and showed by the example that the BFGS method with the Wolfe line search may fail for nonconvex functions. Later, Mascarenhas [14] constructed a threedimensional counterexample such that the BFGS method does not converge if the line search picks the global minimizer of the function Ψ k (α. It should be noted that the stepsize in the counterexample of [14] also satisfies the Wolfe line search conditions. However, neither examples are such that the stepsize is the first local minimizer of the line search function Ψ k (α. Surprisingly enough, if there are only two variables, and if the stepsize is chosen to be the first local minimizer of Ψ k (α; namely, α k = arg min {α > 0 : α is a local minimizer of Ψ k (α}, (1.9 Powell [] established the global convergence of the BFGS method for general twice continuously differentiable function. Powell s proof makes use of the principle of contradiction and is quite sophisticated. Assuming the nonconvergence of the method and the relation lim inf g k 0, Powell showed that the limit points of the BFGS path k P = {x : x lies on the line segment connecting x k and x k+1 for some k 1} (that is exactly the same as the PolakRibière conjugate gradient path if n = and gk+1 T δ k = 0 for all k forms some line segment L. The use of the specific line search (1.9 is such that the objective function is monotonically decreasing on the line segment L k that connects x k and x k+1. It follows that f(x lim f(x k for all x L. Consequently, the line segment L k cannot cross L k and all the points x k with large indices lies in the same side of L. Furthermore, Powell showed that starting around from one end of the line segment L, the BFGS path can not get any close to the other end of L and finally obtained a contradiction. Intuitively, it is difficult to extend Powell s result to the case when n 3. This is because, even if it has been shown that the BFGS path 3
4 tends to some limit line segment L, each line segment L k could turn around L providing that n 3, making the similar analysis of the BFGS path complicated and nearly impossible. The purpose of this paper is to construct a fourdimensional counterexample for the BFGS method with the following features: (a All the stepsizes in the example are exactly equal to one; namely, α k 1; (1.10 For the search direction defined by the BFGS update in the example, a unit stepsize satisfies various line search conditions, including the Wolfe conditions and the Armijo conditions; (b The objective function is strongly convex along each search direction; namely, the line search function Ψ k (α is strongly convex for all k, although the objective function is not in itself. The unit stepsize is the unique minimizer of Ψ k (α. Hence the example also applies to the global line search and the specific line search (1.9; (c The objective function is polynomial and hence is infinitely continuously differentiable. On the other hand, the objective function in our example is linear in the third and fourth variables and thus has no local minimizer. However, the iterations generated by the BFGS method tend to a nonstationary point. As seen from 3.4, the construction of the example is quite complicated. To be such that the line search function Ψ k (α is strongly convex, a polynomial of degree 38 is introduced. Nevertheless, if we relax the convexity requirement of Ψ k (α; namely, (b in the above, it is possible to construct a relatively simple polynomial example of low degree (see 3.3. The construction of our examples can be divided into four procedures: (1 Prefix the special forms of the steps {δ k } and gradients {g k }, leaving several parameters to be determined later. In this case, once x 1 is given, the whole sequence {x k } is fixed. As seen in.1, our steps {δ k } and gradients {g k } are asked to possess some symmetrical and cyclic properties and to push the iterations {x k } tend to the eight vertices of a regular octagon. This is very helpful in simplifying the construction of the examples and we can focus our attention on the choice of the parameter t, that answers for the decay of the last two components of δ k and the first two components of g k. ( To enable the BFGS method to generate those prefixed steps, investigate the consistency conditions on the steps {δ k } and gradients {g k }. With the prefixed forms of {δ k } and {g k }, we show in. that the unit stepsize is indispensable, whereas this stepsize is usually used as the first trial stepsize in the implementation of quasinewton methods. Four more consistency conditions on {δ k } and {g k } are obtained in.3 by expressing the vectors H k+1 γ k, H k+1 g k+1, H k+1 g k+ and H k+1 g k+3 by some δ k s and g k s instead of considering the quasinewton 4
5 matrix H k+1 itself. (3 Choose the parameters in some way to satisfy the consistency conditions and other necessary conditions on the objective function. As seen in.4, the first three consistency conditions are actually corresponding to some underdetermined linear system. Substituting its general solution by the Cramer s rule into the fourth consistency condition, we are led to a nonlinear equation from which the exact value can be obtained for the decay parameter t. By suitably choosing the other parameters, we can express all the quasinewton updating matrices including the initial choice for H 1 in.5. (4 Construct a suitable objective function f whose gradients are the preassigned values; namely, f(x k = g k for all k 1 and the line search has the desired properties. This will be done in the whole third section. To make full use of symmetrical and cyclic properties of the steps {δ k } and {g k }, we carefully choose a special form of the objective function. In addition, the introduction of element function φ in 3.4 helps us greatly to convexify the line search function and finally complete the perfect example that meets all the requirements (a, (b and (c. Some concluding remarks are given in the last section. Looking for Consistent Steps and Gradients.1 The Forms of Steps and Gradients Consider the case of four dimension. Inspired by [1], [3] and [14], we assume that the steps {δ k } have the following form: δ 1 = (η 1, ξ 1, γ 1, τ 1 T ; δ k+1 = M δ k (k 1, (.1 where M is the 4 4 matrix defined by ( cos θ1 sin θ 1 M = sin θ 1 cos θ 1 0 t 0 ( cos θ sin θ sin θ cos θ. (. In the above, t is a parameter satisfying 0 < t < 1. This parameter answers for the decay of the last two components of the steps {δ k }. The angles θ 1 and θ are chosen so that θ 1 = π m 1 and θ = π m for some positive integers m 1 and m. Specifically, we choose in this paper θ 1 = 1 4 π, θ = 3 π. (.3 4 Consequently, the first two components of the steps {δ k } turn to the same after every eight iterations, and the last two components will shrink at a factor of t 8 after every eight iterations. Further, we can see that the iterations {x k } will tend to turn around the eight vertices of some regular octagon (see 3.1 for more details. Accordingly, the gradients {g k } are assumed to be of the form g 1 = (l 1, h 1, c 1, d 1 T ; g k+1 = P g k (k 1, (.4 5
6 where P = ( cos θ1 sin θ t 1 sin θ 1 cos θ 1 0 ( cos θ 0 sin θ. (.5 sin θ cos θ Since 0 < t < 1, we see that the first two components of {g k } vanish, whereas the last two components of the gradient {g k }, turn to the same after eight iterations. It follows that none of the cluster points generated by the BFGS method are stationary points.. Unit Stepsizes It is well known that the unit stepsize is usually used as the first trial stepsize in practical implementations of the BFGS method. Under suitable assumptions on f, the unit stepsize will be accepted by the line search as the iterates tend to the solution and will enable a superlinear convergence step (see [5], for example. In the following, we are going to show that if the line search satisfies g T k+1δ k = 0, for all k 1, (.6 and if the steps generated by the BFGS method have the form (.1(. and the gradients have the form (.4(.5, then the use of unit stepsizes is also indispensable. To begin with, we see that the line search condition (.6, the secant equation (1.3 and the definition of the search direction indicate that H k+1 g k+1 = α 1 k+1 δ k+1 (.7 δ T k+1γ k = α k+1 g T k+1h k+1 γ k = α k+1 g T k+1δ k = 0. (.8 The above relation (.8 is sometimes called as the conjugacy condition in the context of nonlinear conjugate gradient methods. By multiplying the BFGS updating formula (1.5 with g k+1 and using (.6, which with (.7 gives H k+1 g k+1 = H k g k+1 γt k H kg k+1 δ T k γ k δ k, H k g k+1 = α 1 k+1 δ k+1 + γt k H kg k+1 δ T k γ k δ k. (.9 Multiplying (.9 by g T k and noticing gt k H kg k+1 = 0, we get that 0 = α 1 k+1 gt k δ k+1 + γt k H kg k+1 δ T gk T δ k = α 1 k+1 k γ gt k δ k+1 γ T k H k g k+1. k 6
7 Thus γ T k H kg k+1 = α 1 k+1 gt k δ k+1 = α 1 k+1 gt k+1 δ k+1. Hence, by (.9, H k g k+1 = α 1 k+1 δ k+1 α 1 gk+1 T δ k+1 k+1 δ T δ k. (.10 k γ k It follows from (.10 and (.7 with k replaced by k 1 that [ H k γ k = α 1 k+1 δ k+1 + α 1 k α 1 gk+1 T δ ] k+1 k+1 δ T δ k. (.11 k γ k Further, substituting this into the BFGS updating formula (1.5 yields [ H k+1 = H k + α 1 δ k δ T k+1 + δ k+1 δ T k k+1 δ T + 1 α 1 k + α 1 gk+1 T δ ] k+1 k+1 k γ k δ T k γ k δ k δ T k δ T k γ k. (.1 The above new updating formula requires the quantity δ k+1, that depends on H k+1 itself, and hence has theoretical meanings only. Lemma.1. Assume that (.6 holds. Then for all k 1 and i 0, the vector H k g k+i + α 1 k+i δ k+i belongs to the subspace spanned by δ k, δ k+1,..., δ k+i 1 ; namely, H k g k+i + α 1 k+i δ k+i Span{δ k, δ k+1,..., δ k+i 1 }. (.13 Proof. For convenience, we write (.1 as H k+1 = H k + V (s k, s k+1, (.14 where V (s k, s k+1 means the ranktwo matrix in the right hand of (.1. Therefore we have for all i 1, H k+i = H k + i V (s k+j 1, s k+j. (.15 j=1 The statement follows by multiplying the above relation with g k+i and using (.7 with k replaced by k + i 1 and g T k+i δ k+i 1 = 0. Q.E.D. Lemma.. To construct a desired example with Det (S 1 0, we must have that α k = 1, for all k 4. Proof. Define the following matrices G k = [ γ k 1 g k g k+1 g k+ ], S k = [ δ k 1 δ k δ k+1 δ k+ ]. 7
8 By H k γ k 1 = δ k 1 and Lemma.1, we can get that Det(H k Det(G k = Det [ ] H k γ k 1 H k g k H k g k+1 H k g k+ = Det [ δ k 1 α 1 k δ k α 1 k+1 δ k+1 α k+ δ ] k+ = α 1 k α 1 k+1 α 1 k+ Det(S k. (.16 Replacing k with k + 1 in the above yields Det(H k+1 Det(G k+1 = α 1 k+1 α 1 k+ α 1 k+3 Det(S k+1. (.17 On the other hand, due to the special forms of {g k } and {δ k }, we know that G k+1 = P G k and S k+1 = MS k. Hence Det(G k+1 = Det(P Det(G k = t Det(G k, Det(S k+1 = Det(M Det(S k = t Det(S k. (.18 Due to the basic determinant relation of the BFGS update, (.6 and (.7, it is not difficult to see that Det(H k+1 = δt k H 1 k δ k δ T Det(H k = α k Det(H k. (.19 k γ k If Det(S 1 0, (.16 with k = 1 implies that Det(G 1 0. Then by (.18, Det(G k 0, Det(S k 0, for all k 1. (.0 Dividing (.17 by (.16 and using the above relations, we then obtain α k+3 = 1. So the statement holds due to the arbitrariness of k 1. Q.E.D. The deletion of the first finite iterations does not influence the whole example. Thus we will ask our counterexample to satisfy and Det(S 1 0 (.1 α k 1. (. In this case, the updating formula (.1 of H k can be simplified as H k+1 = H k + δ kδ T k+1 + δ k+1 δ T k δ T gt k+1 δ k+1 k γ k gk T δ k δ k δ T k δ T k γ k. (.3 8
9 .3 Consistency Conditions Now we ask what else conditions, besides the requirement of unit stepsizes, have to be satisfied by the parameters in the definitions of {g k } and {δ k } so that the steps {s k ; k 1} can be generated by the BFGS update. Our early idea is based on the observation on the updating formula (1.5 that H k+1 is linear with H k and the linear system H 8j+9 = diag(t 4 E, t 4 E H 8j+1 diag(t 4 E, t 4 E, (.4 where E is the twodimensional identity matrix. However, this way seems to be quite complicated. Notice that the dimension is n = 4 and by Lemma., the assumption (.1 implies (.0. Then the matrix H k+1 can be uniquely defined by the equations given by H k+1 γ k, H k+1 g k+1, H k+1 g k+ and H k+1 g k+3. As a matter of fact, we have that H k+1 γ k = δ k (the secant equation, (.5 H k+1 g k+1 = δ k+1 ( by (.7 and α k 1, (.6 H k+1 g k+ = δ k+ + gt k+ δ k+ gk+1 T δ δ k+1 (by (.10 and α k 1, (.7 k+1 ( g T H k+1 g k+3 = δ k+3 + k+3 δ k+3 gk+ T δ + gt k+3 δ k+1 k+ gk+1 T δ δ k+ k+1 ( ( g T k+ δ k+ g T k+3 δ k+1 gk+1 T δ k+1 gk+1 T δ δ k+1. (.8 k+1 The last equality is obtained by multiplying (.3 by g k+, using (.7 and finally replacing k with k + 1. Relations (.5(.8 provide a system of 16 equations, while the symmetric matrix H k+1 only has 10 independent entries. How to ensure that this linear system has a symmetric solution H k+1? We have the following general lemma. Lemma.3. Assume that {u 1, u,..., u n } and {v 1, v,..., v n } are two sets of ndimensional linearly independent vectors. Then there exists a symmetric matrix H R n n satisfying Hu i = v i, i = 1,,..., n (.9 if and only if u T i v j = u T j v i, i, j = 1,,..., n. (.30 Proof. The only if part. If H = H T satisfies (.9, we have for all i, j = 1,,..., n, u T i v j = u T i Hu j = u T i HT u j = (Hu i T u j = v T i u j. The if part. Assume that (.30 holds. Defining the matrices U = (u 1 u... u n, V = (v 1 v... v n, (.31 9
10 direct calculations show that U T HU = U T V = u T 1 v 1 u T 1 v u T 1 v n u T v 1 u T v u T v n u T n v 1 u T n v u T n v n := A. (.3 By (.30, A is symmetric. So H = U T AU 1 satisfies H = H T and (.9. This completes our proof. Q.E.D. By the above lemma, the following six conditions are sufficient for the linear system (.5(.8 to allow a symmetric solution matrix H k+1. g T k+1 (H k+1y k = (H k+1 g k+1 T y k, g T k+ (H k+1y k = (H k+1 g k+ T y k, g T k+3 (H k+1y k = (H k+1 g k+3 T y k, g T k+ (H k+1g k+1 = (H k+1 g k+ T g k+1, g T k+3 (H k+1g k+1 = (H k+1 g k+3 T g k+1, g T k+3 (H k+1g k+ = (H k+1 g k+3 T g k+. Further, considering the whole sequence of {H k+1 ; k 1} and combining the line search condition g T k+1 δ k = 0, we can deduce the following four consistency conditions, which should be satisfied for all k 1, g T k+1δ k = 0, (.33 δ T k+1y k = 0, (.34 gk+δ T k = δ T k+y k, (.35 ( g gk+3δ T k = δ T T k+3y k + k+3 δ k+3 gk+ T δ + gt k+3 δ k+1 k+ gk+1 T δ δ T k+y k. (.36 k+1 The positive definiteness of the matrix H k+1 will be further considered in.5..4 Choosing The Parameters In this subsection we focus on how to choose the decay parameter t and suitable vectors δ 1 and g 1 such that the consistency conditions (.33, (.34, (.35 and (.36 hold with k = 1. Due to the special structure of this example, we then know that the consistency conditions hold for all k. Define v = ( T, where 1 = l 1 η 1 + h 1 ξ 1, = h 1 η 1 l 1 ξ 1, 3 = c 1 γ 1 + d 1 τ 1, 4 = d 1 γ 1 c 1 τ 1. (.37 As will be seen, the first three consistent conditions provide an underdetermined linear system with v, from which we can get a general solution by the Cramer s rule. The substitution of v into the fourth condition (.36, which is nonlinear, yields a desired value for the decay parameter t. 10
11 At first, the condition δ T 1 g = δ T 1 P g 1 = 0, that is (.33 with k = 1, asks [t cos θ 1 t sin θ 1 cos θ sin θ ] v = 0. (.38 The condition δ T γ 1 = δ T 1 M T (P Ig 1 = δ T 1 (ti M T g 1 = 0, that is (.34 with k = 1, requires the vector v to satisfy [t cos θ 1 sin θ 1 t(1 cos θ t sin θ ] v = 0. (.39 The requirement (.35 with k = 1, that is, 0 = δ T 1 g 3 + δ T 3 γ 1 = δ T 1 P g 1 + δ T 1 (M T (P Ig 1 = δ T 1 [P + tm T (M T ]g 1, yields the equation [ t cos θ1 + (t 1 cos θ 1 t sin θ 1 (1 + t sin θ 1 t (cos θ cos θ + cos θ t (sin θ sin θ sin θ ] v = 0. (.40 By (.38, (.39, (.40 and the choices of θ 1 and θ, we see that { i } must satisfy t t t (1 + t t t t (1 + t t ( + 1t By the Cramer s rule, to meet (.41, we may choose { i } as follows: 1 = ± [ ( + t 3 + (1 + t + 1 ], = ± [ ( + t 3 + (1 + t 1 ], 3 = ± [ t 3 + t + (1 t + 1 ], 4 = ± [ t 3 t + (1 + t 1 ]. = 0. (.41 (.4 Since g1 T δ 1 = = ±(t + 1(t + 1 and 0 < t < 1, we choose all the signs in (.4 as so that g1 T δ 1 < 0. We now want to substitute the general solution to the fourth consistent condition (.36. Noting that δ T 3 γ 1 = g3 T δ 1 and g4 T δ 4 = t g3 T δ 3, we know that (.36 with k = 1 is equivalent to [ ] g4 T δ 1 + g T δ 4 g1 T δ 4 + g3 T δ 1 t + gt 4 δ g T δ = 0. (.43 Further, using g T δ = t g T 1 δ 1, g T 4 δ = t g T 3 δ 1 and g T δ 4 = t g T 1 δ 3, (.43 can be simplified as g T 1 δ 1 [ g T 4 δ 1 g T 1 δ 4 + t(g T 3 δ 1 + g T 1 δ 3 ] + (g T 3 δ 1 = 0. (.44 11
12 Consequently, we obtain the following equation for the parameter t: where (t + 1 (t + 1 [ p (t + tp(t q(t ] = 0, (.45 p(t = ( + t + ( + t 1, q(t = ( + 3 t 3 (4 + 3 t + (3 + t. Further calculations provide ( [ p (t + tp(t q(t ] = [ t + (1 t + (1 ] [t + (1 3 t + ( ]. Considering the requirement that t ( 1, 1, one can deduce the following quadratic equation from (.45, ( t ( t = 0. (.46 The above equation has a unique root of t in the interval ( 1, 1, t = (.47 The numerical value of t is equal to approximately. Therefore if we choose the above t and the vectors δ 1 and g 1 such that (.4 holds with minus sign, then the prefixed steps and gradients satisfy the consistency conditions (.33, (.34, (.35 and (.36 for all k 1. There are many ways to choose δ 1 and g 1 satisfying (.4. Specifically, we choose ( η1 = ξ 1 ( 0, ( c1 = d 1 ( 0. (.48 In this case, by (.4 with minus signs and the definitions of i s, we can get that ( ( γ1 (17 8 t + ( = τ 1 ( t + (17 9, ( ( (.49 l1 ( 4 13 t + ( = h 1 ( 4 13 t + ( The initial step δ 1 and the initial gradient g 1 are then determined..5 QuasiNewton Updating Matrices In this subsection we discuss the form of the BFGS quasinewton updating matrices implied by (.5(.8 under the consistent conditions (.33(.36. A byproduct is that, we will know how to choose the initial matrix H 1 for the example. For this aim, we provide a followup lemma of Lemma.3. 1
13 Lemma.4. Assume that (.30 holds and the matrix H satisfies (.9. If further, the matrix A in (.3 is positive definite, there must exist nonsingular triangular matrices T 1 and T such that A = V T U = T T 1 T. (.50 Further, denoting ˆV = V T 1 = (ˆv 1, ˆv,..., ˆv n and the diagonal matrix T 1 1 T 1 = diag(t 1, t,..., t n, we have that H = n t i ˆv iˆv i T. (.51 i=1 Proof. The truth of (.50 is obvious by the Cholesky factorization of positive definite matrices. Further, by (.50 and the definition of ˆV, we get that Since HU = V, we obtain H = V U 1 = ( ( ˆV T 1 1 ˆV T U = T. T 1 ˆV T = ˆV ( T1 1 T 1 ˆV T, which with the definitions of ˆv i s and t i s implies the truth of (.51. Q.E.D. By the above lemma, we can express the form of H k+1 determined by (.5 (.6 under the consistent conditions (.33(.36. At first, we make use of the steps {δ k+i ; i = 0, 1,, 3} to introduce the following two vectors that are orthogonal to γ k and g k+1 : z k = δt k+γ k δ T k γ k δ k δt k+g k+1 δ T k+1g k+1 δ k+1 + δ k+, w k = δt k+3γ k δ T k γ k δ k δt k+3g k+1 δ T k+1g k+1 δ k+1 + δ k+3. (.5 The vectors z k and w k are well defined because δ T k γ k = δ T k g k > 0 is positive due to the descent property of δ k. Direct calculations show that z T k γ k = z T k g k+1 = 0, w T k γ k = w T k g k+1 = 0. (.53 Further, we use z k and w k to define the vector that is orthogonal to g k+ : By the choice of v k, it is easy to see that v k = wt k g k+ z T k g k+ z k + w k. (.54 v T k γ k = v T k g k+1 = v T k g k+ = 0. (.55 13
14 Now we could express the matrix H k+1 by H k+1 = δ kδ T k δ T k g k δ k+1δ T k+1 δ T k+1g k+1 z kz T k z T k g k+ v kv T k v T k g k+3. (.56 Since δ T k g k < 0 for all k 1, we know that the matrix H k+1 is positive definite if and only if z T k g k+ < 0 and v T k g k+3 < 0. Direct calculations show that z T 1 g 3 = ( t ( < 0, v T 1 g 4 = ( t + ( < 0. Therefore we know that the matrix H is positive definite. In addition, it is difficult to build the relations δ T k+1g k+1 = t δ T k g k, z T k+1 g k+3 = t z T k g k+, v T k+1 g k+4 = t v T k g k+3, z k+1 = Mz k and v k+1 = M v k for all k 1. Thus we have by (.56 and δ k+1 = M δ k that H k+1 = 1 t MH km T, (.57 holds for all k and hence {H k : k } are all positive definite. With the above procedure, we can calculate the values of z 1, v 1 and H. Further, by (.3, we can obtain the initial matrix H 1 required by the example of this paper. H 1 = H δ 1δ T + δ δ T 1 δ T 1 γ 1 + gt δ g T 1 δ 1 δ 1 δ T 1 δ T 1 γ 1 := H , (.58 where H 1 is a symmetric matrix with entries H 1 (1, 1 = ( t + ( , H 1 (1, = ( t ( , H 1 (1, 3 = ( t ( , H 1 (1, 4 = ( t + ( H 1 (, = ( t + ( , H 1 (, 3 = 1068 t ( , H 1 (, 4 = ( t ( , H 1 (3, 3 = ( t + ( , H 1 (3, 4 = ( t + ( , H 1 (4, 4 = ( t + ( (.59 Direct calculations show that (.57 also holds with k = 1 and hence H 1 is a positive definite matrix. 14
15 3 Constructing A Suitable Objective Function 3.1 Recovering The Iterations and The Function Values Denote the rotation matrices ( cos θ1 sin θ R 1 = 1, R sin θ 1 cos θ = 1 ( cos θ sin θ sin θ cos θ. (3.1 If we write the steps {δ k } and gradients {g k } into the forms η i t 8j l i δ 8j+i = ξ i t 8j γ i, g 8j+i = t 8j h i c i ; i = 1,..., 8, (3. t 8j τ i d i we have from (.1, (., (.4 and (.5 that ( ( ηi+1 ηi = R ξ 1, i+1 ξ i ( ( li+1 li = t R h 1, i+1 h i ( γi+1 τ i+1 ( γi = t R, τ i ( ( ci+1 ci = R d. i+1 d i (3.3 For simplicity, we want the iterations {x k } asymptotically to turn around the eight vertices of some regular octagon Ω of the subspace spanned by the first and second coordinates. More exactly, such a regular octagon Ω has the origin as its center and its eight vertices V i are given by V i = (a i, b i, 0, 0 T with ( = b 1 1, ( a1 ( ai+1 b i+1 ( ai = R 1. (3.4 b i Here we should note that if there is no confusion, we also regard that Ω and V i s are defined in the subspace spanned by the first and second coordinates. To this aim, we ask the iterations {x k } to be of the form a i x 8j+i = b i t 8j p i, (3.5 t 8j q i where ( pi+1 q i+1 To decide the values of p 1 and q 1, noting that x 1 V 1 = x 1 lim j x 8j+1 = ( pi = t R. (3.6 q i 15 δ i, i=0
16 we have that ( p1 q 1 = i=0 ( γi τ i ( = (E tr 1 γ1, (3.7 τ 1 where again E is the twodimensional identity matrix. Direct calculations show that ( p1 = 1 ( ( t + ( q ( t + ( (3.8 Now, let us assume that the limit of f(x k is f. Since for any j, the first two coordinates of {x 8j+i } always keep the same as those of its limit V i, we can get that ( T ( f(x 8j+i f = t 8j ci pi d i q i where = t 8j [ R i 1 ( T ( = t 8j+i 1 c1 p1 d 1 q 1 = t 8j+i 1 (f(x 1 f, ( c1 d 1 ] T [ (tr i 1 ( p1 q 1 ] f(x 1 f = c 1 p 1 + d 1 q 1 = ( t + ( Now we see the value of g T 8j+i δ 8j+i. Direct calculations show that t 8j T T l i η i l i η i g8j+i T δ 8j+i = t 8j h i ξ i c i t 8j γ i = t8j h i ξ i c i γ i d i t 8j τ i d i τ i = t 8j+i 1 (l 1 η 1 + h 1 ξ 1 + c 1 γ 1 + d 1 τ 1 = t 8j+i 1 g1 T δ 1, where g T 1 δ 1 = l 1 η 1 + h 1 ξ 1 + c 1 γ 1 + d 1 τ 1 = ( t + ( Therefore f(x 8j+i+1 f(x 8j+i α 8j+i g T 8j+iδ 8j+i = (f(x 8j+i+1 f (f(x 8j+i f g T 8j+iδ 8j+i = (t8j+i t 8j+i 1 (f(x 1 f t 8j+i 1 g T 1 δ 1 = (t 1(f(x 1 f g T 1 δ E 0. (3.9 16
17 The above relation, together with g T 8j+i+1 δ 8j+i = 0, implies that the stepsize α 8j+i = 1 can be accepted by the Wolfe line search, the Armijo line search and the Goldstein line search with suitable line search parameters. 3. Seeking A Suitable Form for The Objective Function We now consider how to construct a smooth function f such that its gradient at any point x 8j+i given in (3.5 are the one given in (3.; namely, f(x 8j+i = g 8j+i, for all j 0 and i = 1,..., 8. (3.10 To this aim, we assume that f is of the form f(x 1, x, x 3, x 4 = λ(x 1, x x 3 + µ(x 1, x x 4, (3.11 where λ and µ are twodimensional functions to be determined. Since the function (3.11 is linear with x 3 and x 4, we know that f has no lower bound in R n. Consequently, for the sequence {x k } generated by some optimization method for the minimization of this function, it is expected that f(x k tends to, but this will not happen in our examples. With the prefixed form (3.11, we have that f(x 1, x, x 3, x 4 = λ x 1 x 3 + µ x 1 x 4 λ x x 3 + µ x x 4 λ(x 1, x µ(x 1, x. (3.1 Comparing the last two components of the right hand vector in (3.1 with the gradients in (3., we must have that λ(v i = c i, µ(v i = d i. (3.13 Consequently, by (.48 and (3.3, we can obtain the concrete values of λ and µ at the eight vertices of Ω, which are listed in the second and third rows in Table 3.1. Further, for each i, let J i be the Jacobian of (λ, µ at vertex V i and denote J i = ( λ x 1 λ x 1 µ µ x x Vi, J 1 = ( ω1 ω. (3.14 ω 3 ω 4 The comparison of the first two components of the right hand vector in (3.1 with the gradients in (3. leads to the relation ( ( pi li J i =. (3.15 q i To be such that (3.15 always holds, we ask J i+1 and J i to meet the condition h i J i+1 = R 1 J i R T (
18 for all i 1. In this case, if (3.15 holds for some i, we have by this, (3.6 and (3.3 that ( ( ( ( ( pi+1 J i+1 = (R q 1 J i R T pi pi li li+1 (tr = tr i+1 q 1 J i = tr i q 1 =, i h i h i+1 which means that (3.15 holds with i + 1. Therefore by the induction principle, to be such that (3.15 holds for all i 1, it remains to choose J 1 such that (3.15 holds with i = 1. Noticing that there are still two degrees of freedom, we ask the special relations ω = ω 3, ω 4 = ω 1. (3.17 Then the values of ω 1 and ω 3 can be solved from (3.15 with i = 1 and (3.17, ω 1 = ( t ( , ω 3 = ( t + ( (3.18 Therefore if we choose J 1 to be the one in (3.14 with the values in (3.17 and (3.18 and ask the relation (3.16 for all i 1, then we will have (3.15 for all i 1. Now, using (3.13, (3.16 and (3.17, we can list the function values and gradients of λ and µ at the vertices V i s of the octagon Ω into Table 3.1. V 1 V V 3 V 4 V 5 V 6 V 7 V 8 λ µ λ x 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 λ x ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 µ x 1 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 ω 3 µ x ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 ω 1 Table 3.1. Function values and gradients of λ and µ at V i s By using the special choice of Ω and observing the values in Table 3.1, we can ask the functions λ and µ to have the properties and µ(x 1, x = λ( x, x 1 (3.19 λ(x 1, x = λ( x 1, x (3.0 for all (x 1, x R. Further, by (3.0, we could think of constructing the function λ by a polynomial function with only odd orders. In the next subsection, we will construct a simple function that only meets the requirements (a and (c that are described in the first section. In 3.4, we construct a complicated function that can meet all the requirements (a, (b and (c. 18
19 3.3 A Simple Function that Meets (a and (c If we ignore the convexity requirement (b of the line search function, we can construct a relatively simple function λ(x 1, x to meet the interpolation conditions listed in Table 3.1 and the function µ(x 1, x is then given by (3.19. In fact, we can check that the following function λ f (x 1, x =(15 1 x (6 9 (x 4 1 x + x 1x 3 + ( x 3 1 +( 1 + (6x 1x + x 3 + ( x1 9 8 x. has values of 0, 1,, 1 at vertices V 1, V, V 3, V 4, respectively, and zero derivatives at all the vertices V i s. Further, we can check the following function λ g (x 1, x = 3 (x 1 1[x 1 (3 + ]x 4 has zero function values at all the vertices V i s, but could be used to interpolate the derivatives of λ(x 1, x at V i s. Consequently, we know that λ(x 1, x = λ f (x 1, x + ω 1 λ g (x 1, x + ω 3 λ g ( x, x 1 (3.1 meets all the interpolation conditions required by λ(x 1, x in Table 3.1. Consequently, we could say that for the function defined by (3.11, (3.1 and (3.19, the BFGS method with the Wolfe line search using the unit initial step does not converge. Observing that Ψ i (α = tψ i (α, Ψ i(α = tψ i(α at α = 0, 1, we ideally wish there is the following relation between Ψ i (α and Ψ i+1 (α: Ψ i+1 (α = t Ψ i (α, for all α 0 (3. so that the neighboring line search functions only differ up to a constant multiplier of t. However, the choice (3.1 of λ(x 1, x does not lead to (3.. Nevertheless, we can propose the following compensation function and define λ c (x 1, x = x 1 [ x 1 + x ( + ] ˆλ(x 1, x = λ(x 1, x + c 1 λ c (x 1, x + c λ c ( x, x 1, (3.3 where c 1 = ( t ( , c = ( t + (
20 In this case, the relation (3. will always hold and the corresponding line search function at the first iteration is 6 ˆΨ 1 (α = ρ i α i, (3.4 where i=0 ρ 6 = ( t + ( , ρ 5 = ( t ( , ρ 4 = ( t + ( , ρ 3 = ( t ( , ρ = ( t + ( , ρ 1 = g T 1 δ 1 = ( t + (40 18, ρ 0 = f(x 1 = ( t + ( To sum up at this stage, for the function (3.11 with λ(x 1, x given in (3.1 or (3.3 and µ(x 1, x = λ( x, x 1, if the initial point is x 1 = (a 1, b 1, p 1, q 1 T (see (3.4, (3.8, (.47 for their values and if the initial matrix is H 1 given by (.58 and (.59, then the BFGS method (1. and (1.5 with α k 1 will generate the iterations in (3.5 whose gradients are given by (.4(.5. Therefore the method will asymptotically cycle around the eight vertices of a regular octagon without approaching a stationary point or pushing f(x k. 3.4 A Complicated Function that Meets (a, (b and (c To meet the requirement (b, this subsection gives a further compensation to the function ˆλ(x 1, x in (3.3 such that each line search function Ψ k (α is strongly convex. To this aim, we consider the straight line connecting x 8j+i and x 8j+i+1 : a i + α η i l i (α = x 8j+i + α δ 8j+i = b i + α ξ i t 8j (p i + α γ i. (3.5 t 8j (q i + α τ i By (3.1, the line search function at the (8j + ith iteration is Ψ 8j+i (α = t 8j[ λ(a i +α η i, b i +α ξ i ( p i +α γ i +µ(ai +α η i, b i +α ξ i ( q i +α τ i ]. (3.6 0
(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationt := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).
1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction
More informationThe Characteristic Polynomial
Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationIRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible
More informationInner Product Spaces and Orthogonality
Inner Product Spaces and Orthogonality week 34 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More informationMathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 19967 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More information1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
More informationClassification of Cartan matrices
Chapter 7 Classification of Cartan matrices In this chapter we describe a classification of generalised Cartan matrices This classification can be compared as the rough classification of varieties in terms
More informationLINEAR ALGEBRA W W L CHEN
LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationISOMETRIES OF R n KEITH CONRAD
ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x
More information13 MATH FACTS 101. 2 a = 1. 7. The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.
3 MATH FACTS 0 3 MATH FACTS 3. Vectors 3.. Definition We use the overhead arrow to denote a column vector, i.e., a linear segment with a direction. For example, in threespace, we write a vector in terms
More informationNotes on Symmetric Matrices
CPSC 536N: Randomized Algorithms 201112 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.
More informationRecall the basic property of the transpose (for any A): v A t Aw = v w, v, w R n.
ORTHOGONAL MATRICES Informally, an orthogonal n n matrix is the ndimensional analogue of the rotation matrices R θ in R 2. When does a linear transformation of R 3 (or R n ) deserve to be called a rotation?
More informationQuadratic Functions, Optimization, and Quadratic Forms
Quadratic Functions, Optimization, and Quadratic Forms Robert M. Freund February, 2004 2004 Massachusetts Institute of echnology. 1 2 1 Quadratic Optimization A quadratic optimization problem is an optimization
More informationWe call this set an ndimensional parallelogram (with one vertex 0). We also refer to the vectors x 1,..., x n as the edges of P.
Volumes of parallelograms 1 Chapter 8 Volumes of parallelograms In the present short chapter we are going to discuss the elementary geometrical objects which we call parallelograms. These are going to
More informationMATH 590: Meshfree Methods
MATH 590: Meshfree Methods Chapter 7: Conditionally Positive Definite Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH 590 Chapter
More informationOn the representability of the biuniform matroid
On the representability of the biuniform matroid Simeon Ball, Carles Padró, Zsuzsa Weiner and Chaoping Xing August 3, 2012 Abstract Every biuniform matroid is representable over all sufficiently large
More informationIdeal Class Group and Units
Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals
More informationx1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.
Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability
More informationNumerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
More informationMATH10212 Linear Algebra. Systems of Linear Equations. Definition. An ndimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0534405967. Systems of Linear Equations Definition. An ndimensional vector is a row or a column
More informationThe NEWUOA software for unconstrained optimization without derivatives 1. M.J.D. Powell
DAMTP 2004/NA05 The NEWUOA software for unconstrained optimization without derivatives 1 M.J.D. Powell Abstract: The NEWUOA software seeks the least value of a function F(x), x R n, when F(x) can be calculated
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationAN ITERATIVE METHOD FOR THE HERMITIANGENERALIZED HAMILTONIAN SOLUTIONS TO THE INVERSE PROBLEM AX = B WITH A SUBMATRIX CONSTRAINT
Bulletin of the Iranian Mathematical Society Vol. 39 No. 6 (2013), pp 1291260. AN ITERATIVE METHOD FOR THE HERMITIANGENERALIZED HAMILTONIAN SOLUTIONS TO THE INVERSE PROBLEM AX = B WITH A SUBMATRIX CONSTRAINT
More informationLINEAR ALGEBRA. September 23, 2010
LINEAR ALGEBRA September 3, 00 Contents 0. LUdecomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationON CERTAIN DOUBLY INFINITE SYSTEMS OF CURVES ON A SURFACE
i93 c J SYSTEMS OF CURVES 695 ON CERTAIN DOUBLY INFINITE SYSTEMS OF CURVES ON A SURFACE BY C H. ROWE. Introduction. A system of co 2 curves having been given on a surface, let us consider a variable curvilinear
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationAlgebra 2 Chapter 1 Vocabulary. identity  A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity  A statement that equates two equivalent expressions. verbal model A word equation that represents a reallife problem. algebraic expression  An expression with variables.
More informationRoots of Polynomials
Roots of Polynomials (Com S 477/577 Notes) YanBin Jia Sep 24, 2015 A direct corollary of the fundamental theorem of algebra is that p(x) can be factorized over the complex domain into a product a n (x
More informationMATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.
MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. Vector space A vector space is a set V equipped with two operations, addition V V (x,y) x + y V and scalar
More information3. INNER PRODUCT SPACES
. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.
More informationDecember 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in twodimensional space (1) 2x y = 3 describes a line in twodimensional space The coefficients of x and y in the equation
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Linesearch Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationMATH 240 Fall, Chapter 1: Linear Equations and Matrices
MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS
More informationDiagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions
Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential
More informationNumerical Methods for Solving Systems of Nonlinear Equations
Numerical Methods for Solving Systems of Nonlinear Equations by Courtney Remani A project submitted to the Department of Mathematical Sciences in conformity with the requirements for Math 4301 Honour s
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a nonempty
More information1 Short Introduction to Time Series
ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The
More informationDuality of linear conic problems
Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least
More informationFactorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
More informationCONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation
Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in
More informationInteger Factorization using the Quadratic Sieve
Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More informationLecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
More informationThe Open University s repository of research publications and other research outputs
Open Research Online The Open University s repository of research publications and other research outputs The degreediameter problem for circulant graphs of degree 8 and 9 Journal Article How to cite:
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationCS3220 Lecture Notes: QR factorization and orthogonal transformations
CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss
More informationSection 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
More informationBy choosing to view this document, you agree to all provisions of the copyright laws protecting it.
This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal
More informationFurther Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1
Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing
More information24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NPcomplete. Then one can conclude according to the present state of science that no
More informationMathematics Notes for Class 12 chapter 3. Matrices
1 P a g e Mathematics Notes for Class 12 chapter 3. Matrices A matrix is a rectangular arrangement of numbers (real or complex) which may be represented as matrix is enclosed by [ ] or ( ) or Compact form
More informationChapter 8. Matrices II: inverses. 8.1 What is an inverse?
Chapter 8 Matrices II: inverses We have learnt how to add subtract and multiply matrices but we have not defined division. The reason is that in general it cannot always be defined. In this chapter, we
More informationMath 215 HW #6 Solutions
Math 5 HW #6 Solutions Problem 34 Show that x y is orthogonal to x + y if and only if x = y Proof First, suppose x y is orthogonal to x + y Then since x, y = y, x In other words, = x y, x + y = (x y) T
More informationSMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)
SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI (B.Sc.(Hons.), BUAA) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS NATIONAL
More informationMath Review. for the Quantitative Reasoning Measure of the GRE revised General Test
Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important
More informationU.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory
More informationSolving polynomial least squares problems via semidefinite programming relaxations
Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose
More informationSECRET sharing schemes were introduced by Blakley [5]
206 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006 Secret Sharing Schemes From Three Classes of Linear Codes Jin Yuan Cunsheng Ding, Senior Member, IEEE Abstract Secret sharing has
More informationRoots and Coefficients of a Quadratic Equation Summary
Roots and Coefficients of a Quadratic Equation Summary For a quadratic equation with roots α and β: Sum of roots = α + β = and Product of roots = αβ = Symmetrical functions of α and β include: x = and
More information1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
More informationBIG DATA PROBLEMS AND LARGESCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION
BIG DATA PROBLEMS AND LARGESCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION Ş. İlker Birbil Sabancı University Ali Taylan Cemgil 1, Hazal Koptagel 1, Figen Öztoprak 2, Umut Şimşekli
More informationCURVES WHOSE SECANT DEGREE IS ONE IN POSITIVE CHARACTERISTIC. 1. Introduction
Acta Math. Univ. Comenianae Vol. LXXXI, 1 (2012), pp. 71 77 71 CURVES WHOSE SECANT DEGREE IS ONE IN POSITIVE CHARACTERISTIC E. BALLICO Abstract. Here we study (in positive characteristic) integral curves
More informationPUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
More information1 Sets and Set Notation.
LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 10 Boundary Value Problems for Ordinary Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at UrbanaChampaign
More informationPresentation 3: Eigenvalues and Eigenvectors of a Matrix
Colleen Kirksey, Beth Van Schoyck, Dennis Bowers MATH 280: Problem Solving November 18, 2011 Presentation 3: Eigenvalues and Eigenvectors of a Matrix Order of Presentation: 1. Definitions of Eigenvalues
More informationThe Inverse of a Matrix
The Inverse of a Matrix 7.4 Introduction In number arithmetic every number a ( 0) has a reciprocal b written as a or such that a ba = ab =. Some, but not all, square matrices have inverses. If a square
More informationDeterminants. Dr. Doreen De Leon Math 152, Fall 2015
Determinants Dr. Doreen De Leon Math 52, Fall 205 Determinant of a Matrix Elementary Matrices We will first discuss matrices that can be used to produce an elementary row operation on a given matrix A.
More informationOn the Interaction and Competition among Internet Service Providers
On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,
More informationSolution of Linear Systems
Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start
More informationTHE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS
THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., RichardPlouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationINTRODUCTORY LINEAR ALGEBRA WITH APPLICATIONS B. KOLMAN, D. R. HILL
SOLUTIONS OF THEORETICAL EXERCISES selected from INTRODUCTORY LINEAR ALGEBRA WITH APPLICATIONS B. KOLMAN, D. R. HILL Eighth Edition, Prentice Hall, 2005. Dr. Grigore CĂLUGĂREANU Department of Mathematics
More informationStructure of the Root Spaces for Simple Lie Algebras
Structure of the Root Spaces for Simple Lie Algebras I. Introduction A Cartan subalgebra, H, of a Lie algebra, G, is a subalgebra, H G, such that a. H is nilpotent, i.e., there is some n such that (H)
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 918/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationLinear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
More informationNotes on Factoring. MA 206 Kurt Bryan
The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor
More information(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.
Theorem.7.: (Properties of Triangular Matrices) (a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular. (b) The product
More informationASEN 3112  Structures. MDOF Dynamic Systems. ASEN 3112 Lecture 1 Slide 1
19 MDOF Dynamic Systems ASEN 3112 Lecture 1 Slide 1 A TwoDOF MassSpringDashpot Dynamic System Consider the lumpedparameter, massspringdashpot dynamic system shown in the Figure. It has two point
More informationCHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY
January 10, 2010 CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY The set of polynomials over a field F is a ring, whose structure shares with the ring of integers many characteristics.
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationCHAPTER 1 Splines and Bsplines an Introduction
CHAPTER 1 Splines and Bsplines an Introduction In this first chapter, we consider the following fundamental problem: Given a set of points in the plane, determine a smooth curve that approximates the
More informationFacts About Eigenvalues
Facts About Eigenvalues By Dr David Butler Definitions Suppose A is an n n matrix An eigenvalue of A is a number λ such that Av = λv for some nonzero vector v An eigenvector of A is a nonzero vector v
More informationThe Dirichlet Unit Theorem
Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if
More informationOverview of Math Standards
Algebra 2 Welcome to math curriculum design maps for Manhattan Ogden USD 383, striving to produce learners who are: Effective Communicators who clearly express ideas and effectively communicate with diverse
More informationUNIT 2 MATRICES  I 2.0 INTRODUCTION. Structure
UNIT 2 MATRICES  I Matrices  I Structure 2.0 Introduction 2.1 Objectives 2.2 Matrices 2.3 Operation on Matrices 2.4 Invertible Matrices 2.5 Systems of Linear Equations 2.6 Answers to Check Your Progress
More informationA note on companion matrices
Linear Algebra and its Applications 372 (2003) 325 33 www.elsevier.com/locate/laa A note on companion matrices Miroslav Fiedler Academy of Sciences of the Czech Republic Institute of Computer Science Pod
More informationEigenvalues, Eigenvectors, Matrix Factoring, and Principal Components
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they
More informationThe basic unit in matrix algebra is a matrix, generally expressed as: a 11 a 12. a 13 A = a 21 a 22 a 23
(copyright by Scott M Lynch, February 2003) Brief Matrix Algebra Review (Soc 504) Matrix algebra is a form of mathematics that allows compact notation for, and mathematical manipulation of, highdimensional
More information3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field
3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field 77 3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field Overview: The antiderivative in one variable calculus is an important
More information