Elementary Gradient-Based Parameter Estimation
Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA)
Department of Music, Stanford University, Stanford, California, USA

Abstract

This section defines some of the basic terms involved in the optimization techniques known as gradient descent and Newton's method. Terms defined include metric space, linear space, norm, pseudo-norm, normed linear space, Banach space, L^p space, Hilbert space, functional, convex norm, concave norm, local minimizer, global minimizer, and Taylor series expansion.

Contents

1  Vector Space Concepts
   1.1  Specific Norms
2  Concavity (Convexity)
   2.1  Concave Norms
3  Gradient Descent
4  Taylor's Theorem
5  Newton's Method
1  Vector Space Concepts

Definition. A set X of objects is called a metric space if with any two points p and q of X there is associated a real number d(p, q), called the distance from p to q, such that

(a) d(p, q) > 0 if p ≠ q; d(p, p) = 0,
(b) d(p, q) = d(q, p),
(c) d(p, q) ≤ d(p, r) + d(r, q), for any r ∈ X [6].

Definition. A linear space is a set of vectors X together with a field of scalars S, an addition operation + : X × X → X, and a multiplication operation taking S × X → X, with the following properties: If x, y, and z are in X, and α, β are in S, then

1. x + y = y + x.
2. x + (y + z) = (x + y) + z.
3. There exists θ in X such that 0x = θ for all x in X.
4. α(βx) = (αβ)x.
5. (α + β)x = αx + βx.
6. 1x = x.
7. α(x + y) = αx + αy.

The element θ is written as 0, thus coinciding with the notation for the real number zero. A linear space is sometimes called a linear vector space, or simply a vector space.

Definition. A normed linear space is a linear space X on which there is defined a real-valued function of x ∈ X called a norm, denoted ‖x‖, satisfying the following three properties:

1. ‖x‖ ≥ 0, and ‖x‖ = 0 ⟺ x = 0.
2. ‖cx‖ = |c| ‖x‖, c a scalar.
3. ‖x₁ + x₂‖ ≤ ‖x₁‖ + ‖x₂‖.

The functional ‖x − y‖ serves as a distance function on X, so a normed linear space is also a metric space. Note that when X is the space of continuous complex functions on the unit circle in the complex plane, the norm of a function is not changed when it is multiplied by a function of modulus 1 on the unit circle. In signal processing terms, the norm is insensitive to multiplication by a unity-gain allpass filter (also known as a Blaschke product).

Definition. A pseudo-norm is a real-valued function of x ∈ X satisfying the following three properties:

1. ‖x‖ ≥ 0, and x = 0 ⟹ ‖x‖ = 0.
2. ‖cx‖ = |c| ‖x‖, c a scalar.
3. ‖x₁ + x₂‖ ≤ ‖x₁‖ + ‖x₂‖.
A pseudo-norm differs from a norm in that a pseudo-norm can be zero for nonzero vectors (functions).

Definition. A Banach space is a complete normed linear space, that is, a normed linear space in which every Cauchy sequence¹ converges to an element of the space.

Definition. A function H(e^{jω}) is said to belong to the space L^p if

    ∫_{−π}^{π} |H(e^{jω})|^p dω/2π < ∞.

Definition. A function H(e^{jω}) is said to belong to the space H^p if it is in L^p and if its analytic continuation H(z) is analytic for |z| < 1. H(z) is said to be in H̄^p if H(z⁻¹) ∈ H^p.

Theorem. (Riesz-Fischer) The L^p spaces are complete.

Proof. See Royden [5].

Definition. A Hilbert space is a Banach space with a symmetric bilinear inner product ⟨x, y⟩ defined such that the inner product of a vector with itself is the square of its norm, ⟨x, x⟩ = ‖x‖².

1.1  Specific Norms

The L^p norms are defined on the space L^p by

    ‖F‖_p = [ (1/2π) ∫_{−π}^{π} |F(e^{jω})|^p dω ]^{1/p},   p ≥ 1.   (1)

L^p norms are technically pseudo-norms; if functions in L^p are replaced by equivalence classes containing all functions equal almost everywhere, then a norm is obtained. Since all practical desired frequency responses arising in digital filter design problems are bounded on the unit circle, it follows that {H(e^{jω})} forms a Banach space under any L^p norm.

The weighted L^p norms are defined by

    ‖F‖_p = [ (1/2π) ∫_{−π}^{π} |F(e^{jω})|^p W(e^{jω}) dω ]^{1/p},   p ≥ 1,   (2)

where W(e^{jω}) is real, positive, and integrable. Typically, W ≡ 1. If W(e^{jω}) = 0 on a set of nonzero measure, then a pseudo-norm results.

The case p = 2 gives the popular root mean square norm, and ‖·‖₂² can be interpreted as the total energy of F in many physical contexts. An advantage of working in L² is that the norm is provided by an inner product,

    ⟨H, G⟩ = ∫_{−π}^{π} H(e^{jω}) G̅(e^{jω}) dω/2π.

The norm of a vector H ∈ L² is then given by

    ‖H‖ = √⟨H, H⟩.

¹ A sequence H_n(e^{jω}) is said to be a Cauchy sequence if for each ε > 0 there is an N such that ‖H_n(e^{jω}) − H_m(e^{jω})‖ < ε for all n and m larger than N.
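The L² norm and its inner product can be checked numerically in a discrete setting by sampling the frequency response on a uniform grid and replacing the integral in Eq. (1) by a Riemann sum. The sketch below is not part of the original text; it uses NumPy and a one-pole example whose closed-form norm is known.

```python
import numpy as np

# One-pole example: F(e^{jw}) = 1 / (1 - a e^{-jw}), impulse response f(n) = a^n.
a = 0.5
w = np.linspace(-np.pi, np.pi, 20001, endpoint=False)  # frequency grid
F = 1.0 / (1.0 - a * np.exp(-1j * w))

# Eq. (1) with p = 2, approximated by a Riemann sum over the grid:
L2 = np.mean(np.abs(F) ** 2) ** 0.5

# Closed form for this example: ||F||_2 = 1 / sqrt(1 - a^2).
assert abs(L2 - 1.0 / np.sqrt(1.0 - a**2)) < 1e-3

# The norm is induced by the inner product: ||F|| = sqrt(<F, F>).
inner = np.mean(F * np.conj(F)).real
assert abs(np.sqrt(inner) - L2) < 1e-12
```

The same grid-based sum approximates the weighted norms of Eq. (2) by inserting samples of W(e^{jω}) inside the mean.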
As p approaches infinity in Eq. (1), the error measure is dominated by the largest values of |F(e^{jω})|. Accordingly, it is customary to define

    ‖F‖_∞ = max_{−π < ω ≤ π} |F(e^{jω})|,   (3)

and this is often called the Chebyshev or uniform norm.

Suppose the L¹ norm of F(e^{jω}) is finite, and let

    f(n) = (1/2π) ∫_{−π}^{π} F(e^{jω}) e^{jωn} dω

denote the Fourier coefficients of F(e^{jω}). When F(e^{jω}) is a filter frequency response, f(n) is the corresponding impulse response. The filter F is said to be causal if f(n) = 0 for n < 0.

The norms for impulse response sequences ‖f‖_p are defined in a manner exactly analogous with the frequency response norms ‖F‖_p, viz.,

    ‖f‖_p = [ ∑_{n=−∞}^{∞} |f(n)|^p ]^{1/p}.

These time-domain norms are called l^p norms. The L^p and l^p norms are strictly concave functionals for 1 < p < ∞ (see below). By Parseval's theorem, we have ‖F‖₂ = ‖f‖₂, i.e., the L^p and l^p norms are the same for p = 2.

The Frobenius norm of an m × n matrix A is defined as

    ‖A‖_F = √( ∑_{i=1}^{m} ∑_{j=1}^{n} |a_{ij}|² ).

That is, the Frobenius norm is the L² norm applied to the elements of the matrix. For this norm there exists the following.

Theorem. The unique m × n rank-k matrix B which minimizes ‖A − B‖_F is given by B = U Σ_k V*, where A = U Σ V* is a singular value decomposition of A, and Σ_k is formed from Σ by setting to zero all but the k largest singular values.

Proof. See Golub and Kahan [3].

The induced norm of a matrix A is defined in terms of the norm defined for the vectors x on which it operates,

    ‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.

For the L² norm, we have

    ‖A‖₂² = sup_{x ≠ 0} (xᵀAᵀAx) / (xᵀx),

and this is called the spectral norm of the matrix A.
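The optimality of the truncated SVD in the Frobenius norm is easy to verify numerically. A minimal NumPy sketch (not from the original text; the random test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Best rank-k approximation in the Frobenius norm:
# keep the k largest singular values, zero the rest.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = (U[:, :k] * s[:k]) @ Vt[:k, :]

err_best = np.linalg.norm(A - B, "fro")

# The error equals the square root of the sum of the discarded
# squared singular values.
assert abs(err_best - np.sqrt(np.sum(s[k:] ** 2))) < 1e-10

# Any other rank-k matrix does worse, e.g. one built from the
# *smallest* singular values instead:
C = (U[:, -k:] * s[-k:]) @ Vt[-k:, :]
assert np.linalg.norm(A - C, "fro") > err_best
```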
The Hankel matrix corresponding to a time series f is defined by Γ(f)[i, j] = f(i + j), i.e.,

    Γ(f) = ⎡ f(0)  f(1)  f(2)  ⋯ ⎤
           ⎢ f(1)  f(2)  f(3)  ⋯ ⎥
           ⎢ f(2)  f(3)  f(4)  ⋯ ⎥
           ⎣  ⋮     ⋮     ⋮    ⋱ ⎦

Note that the Hankel matrix involves only the causal components of the time series. The Hankel norm of a filter frequency response is defined as the spectral norm of the Hankel matrix of its impulse response,

    ‖F(e^{jω})‖_H = ‖Γ(f)‖₂.

The Hankel norm is truly a norm only if H(z) ∈ H^p, i.e., only if it is causal. For noncausal filters, it is a pseudo-norm.

If F is strictly stable, then |F(e^{jω})| is finite for all ω, and all norms defined thus far are finite. Also, the Hankel matrix Γ(f) is a bounded linear operator in this case. The Hankel norm is bounded below by the L² norm, and bounded above by the L^∞ norm [1],

    ‖F‖₂ ≤ ‖F‖_H ≤ ‖F‖_∞,

with equality iff F is an allpass filter (i.e., |F(e^{jω})| constant).

2  Concavity (Convexity)

Definition. A set S is said to be concave if for every pair of vectors x and y in S, λx + (1 − λ)y is in S for all 0 ≤ λ ≤ 1. In other words, all points on the line segment between two points of S lie in S.

Definition. A functional is a mapping from a vector space to the real numbers R. Thus, for example, every norm is a functional.

Definition. A linear functional is a functional f such that for each x and y in the linear space X, and for all scalars α and β, we have f(αx + βy) = αf(x) + βf(y).

Definition. The norm of a linear functional f defined on the normed linear space X is given by

    ‖f‖ = sup_{x ≠ 0} |f(x)| / ‖x‖.

Definition. A functional f defined on a concave subset S of a vector space X is said to be concave on S if for every pair of vectors x and y in S,

    λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y),   0 ≤ λ ≤ 1.

A concave functional has the property that its values along a line segment lie below or on the line between its values at the end points. The functional is strictly concave on S if strict inequality holds above for λ ∈ (0, 1). Finally, f is uniformly concave on S if there exists c > 0 such that for all x, y ∈ S,

    λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y) + cλ(1 − λ)‖x − y‖²,   0 ≤ λ ≤ 1.
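The ordering ‖F‖₂ ≤ ‖F‖_H ≤ ‖F‖_∞ can be checked numerically for a simple causal, strictly stable filter by truncating the (infinite) Hankel matrix. A NumPy sketch, not part of the original text; the truncation order N is an assumption that works because the impulse response decays geometrically:

```python
import numpy as np

# One-pole filter: f(n) = a^n (causal, strictly stable for |a| < 1).
a = 0.5
N = 60                        # truncation order; the tail beyond 2N is negligible
n = np.arange(2 * N)
f = a ** n

# Hankel matrix Gamma(f)[i, j] = f(i + j), truncated to N x N.
Gamma = f[np.add.outer(np.arange(N), np.arange(N))]

hankel_norm = np.linalg.norm(Gamma, 2)      # spectral norm of Gamma(f)
L2_norm = np.sqrt(np.sum(f[:N] ** 2))       # ||F||_2 = ||f||_2 by Parseval
Linf_norm = 1.0 / (1.0 - a)                 # max of |F(e^{jw})|, attained at w = 0

# Check the ordering ||F||_2 <= ||F||_H <= ||F||_inf from the text.
assert L2_norm <= hankel_norm <= Linf_norm
```

For this rank-one Hankel matrix the spectral norm is Σ a^{2n} = 1/(1 − a²), strictly between 1/√(1 − a²) and 1/(1 − a), so both inequalities are strict (the filter is not allpass).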
We have

    Uniformly Concave ⟹ Strictly Concave ⟹ Concave.

Definition. A local minimizer of a real-valued function f(x) is any x* such that f(x*) < f(x) for all x ≠ x* in some neighborhood of x*.

Definition. A global minimizer of a real-valued function f(x) on a set S is any x* ∈ S such that f(x*) < f(x) for all x ∈ S, x ≠ x*.

Definition. A cluster point x* of a sequence x_n is any point such that every neighborhood of x* contains x_n for infinitely many n.

Definition. The concave hull of a set S in a metric space is the smallest concave set containing S.

2.1  Concave Norms

A desirable property of the error norm minimized by a filter-design technique is concavity of the error norm with respect to the filter coefficients. When this holds, the error surface looks like a bowl, and the global minimum can be found by iteratively moving the parameters in the downhill (negative gradient) direction. The advantages of concavity are evident from the following classical results.

Theorem. If X is a vector space, S a concave subset of X, and f a concave functional on S, then any local minimizer of f is a global minimizer of f in S.

Theorem. If X is a normed linear space, S a concave subset of X, and f a strictly concave functional on S, then f has at most one minimizer in S.

Theorem. Let S be a closed and bounded subset of R^n. If f : R^n → R¹ is continuous on S, then f has at least one minimizer in S.

Theorem (2.1) bears directly on the existence of a solution to the general filter design problem in the frequency domain. Replacing "closed and bounded" with "compact," it becomes true for a functional on an arbitrary metric space (Rudin [6], Thm. 14). (In R^n, "compact" is equivalent to "closed and bounded" [5].) Theorem (2.1) implies that only compactness of Θ̂ = {b̂₀, ..., b̂_{n_b}, â₁, ..., â_{n_a}} and continuity of the error norm J(θ̂) on Θ̂ need to be shown to prove existence of a solution to the general frequency-domain filter design problem.
3  Gradient Descent

Concavity is valuable in connection with the Gradient Method of minimizing J(θ̂) with respect to θ̂.

Definition. The gradient of the error measure J(θ̂) is defined as the N̂ × 1 column vector

    J′(θ̂) = ∂J(θ)/∂θ (θ̂) = [ ∂J/∂b₀ (θ̂), ..., ∂J/∂b_{n_b} (θ̂), ∂J/∂a₁ (θ̂), ..., ∂J/∂a_{n_a} (θ̂) ]ᵀ.

Definition. The Gradient Method (Cauchy) is defined as follows.
Given θ̂₀ ∈ Θ̂, compute

    θ̂_{n+1} = θ̂_n − t_n J′(θ̂_n),   n = 0, 1, 2, ...,

where J′(θ̂_n) is the gradient of J at θ̂_n, and t_n ∈ R is chosen as the smallest nonnegative local minimizer of

    Φ_n(t) = J(θ̂_n − t J′(θ̂_n)).

Cauchy originally proposed to find the value of t_n ≥ 0 which gave a global minimum of Φ_n(t). This, however, is not always feasible in practice. Some general results regarding the Gradient Method are given below.

Theorem. If θ̂₀ is a local minimizer of J(θ̂), and J′(θ̂₀) exists, then J′(θ̂₀) = 0.

Theorem. The gradient method is a descent method, i.e., J(θ̂_{n+1}) ≤ J(θ̂_n).

Definition. J : Θ̂ → R¹, Θ̂ ⊂ R^N̂, is said to be in the class C^k(Θ̂) if all kth-order partial derivatives of J(θ̂) with respect to the components of θ̂ are continuous on Θ̂.

Definition. The Hessian J″(θ̂) of J at θ̂ is defined as the matrix of second-order partial derivatives,

    J″(θ̂)[i, j] = ∂²J(θ) / (∂θ[i] ∂θ[j]) (θ̂),

where θ[i] denotes the ith component of θ, i = 1, ..., N̂ = n_a + n_b + 1, and [i, j] denotes the matrix entry at the ith row and jth column. The Hessian of every element of C²(Θ̂) is a symmetric matrix [7]. This is because continuous second-order partials satisfy

    ∂²/∂x₁∂x₂ = ∂²/∂x₂∂x₁.

Theorem. If J ∈ C¹(Θ̂), then any cluster point θ̂* of the gradient sequence θ̂_n is necessarily a stationary point, i.e., J′(θ̂*) = 0.

Theorem. Let Θ̄ denote the concave hull of Θ̂ ⊂ R^N̂. If J ∈ C²(Θ̄), and there exist positive constants c and C such that

    c‖η‖² ≤ ηᵀ J″(θ̂) η ≤ C‖η‖²   (4)

for all θ̂ ∈ Θ̄ and for all η ∈ R^N̂, then the gradient method beginning with any point in Θ̂ converges to a point θ̂*. Moreover, θ̂* is the unique global minimizer of J in R^N̂.

By the norm equivalence theorem [4], Eq. (4) is satisfied whenever √(ηᵀ J″(θ̂) η) is a norm on R^N̂ for each θ̂ ∈ Θ̄. Since J belongs to C²(Θ̄), J″ is a symmetric matrix. It is also bounded, since it is continuous over a compact set. Thus a sufficient requirement is that J″ be positive definite on Θ̄.
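The Gradient Method with an exact line search can be sketched for a quadratic error surface, where the minimizer of Φ_n(t) has a closed form. This is an illustrative example, not from the original text; the particular matrix A and vector b are arbitrary choices making the Hessian positive definite, so Eq. (4) holds:

```python
import numpy as np

# Quadratic error surface J(theta) = 0.5 theta^T A theta - b^T theta,
# with A symmetric positive definite, so Eq. (4) holds with c, C equal to
# the smallest and largest eigenvalues of A = J''.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

grad = lambda th: A @ th - b          # J'(theta)

theta = np.array([10.0, -10.0])       # theta_0
for _ in range(100):
    g = grad(theta)
    if np.linalg.norm(g) < 1e-12:
        break
    # For a quadratic, the exact line-search step minimizing Phi_n(t) is:
    t = (g @ g) / (g @ (A @ g))
    theta = theta - t * g

# The iterates converge to the unique global minimizer, A^{-1} b.
assert np.allclose(theta, np.linalg.solve(A, b), atol=1e-8)
```

Each step moves downhill along −J′(θ̂_n), and the descent property J(θ̂_{n+1}) ≤ J(θ̂_n) holds by construction of t_n.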
Positive definiteness of J″ can be viewed as positive curvature of J at each point of Θ̄, which corresponds to strict concavity of J on Θ̄.

4  Taylor's Theorem

Theorem. (Taylor) Every functional J : R^N̂ → R¹ in C²(R^N̂) has the representation

    J(θ̂ + η) = J(θ̂) + J′(θ̂)ᵀ η + ½ ηᵀ J″(θ̂ + λη) η
for some λ between 0 and 1, where J′(θ̂) is the N̂ × 1 gradient vector evaluated at θ̂ ∈ R^N̂, and J″(θ̂) is the N̂ × N̂ Hessian matrix of J at θ̂, i.e.,

    J′(θ̂) = ∂J(θ)/∂θ (θ̂)   (5)
    J″(θ̂) = ∂²J(θ)/∂θ² (θ̂).   (6)

Proof. See Goldstein [2]. The Taylor infinite series is treated in Williamson and Crowell [7]. The present form is typically more useful for computing bounds on the error incurred by neglecting higher-order terms in the Taylor expansion.

5  Newton's Method

The gradient method is based on the first-order term in the Taylor expansion of J(θ̂). By taking the second-order term as well and solving the resulting quadratic minimization problem iteratively, Newton's method for functional minimization is obtained. Essentially, Newton's method requires the error surface to be close to quadratic, and its effectiveness is directly tied to the accuracy of this assumption. For most problems, the error surface can be well approximated by a quadratic form near the solution. For this reason, Newton's method tends to give very rapid ("quadratic") convergence in the last stages of iteration.

Newton's method is derived as follows. The Taylor expansion of J about θ̂ gives

    J(θ̂*) = J(θ̂) + J′(θ̂)ᵀ (θ̂* − θ̂) + ½ (θ̂* − θ̂)ᵀ J″(λθ̂ + λ̄θ̂*) (θ̂* − θ̂),

for some 0 ≤ λ ≤ 1, where λ̄ = 1 − λ. It is now necessary to assume that J″(λθ̂ + λ̄θ̂*) ≈ J″(θ̂). Differentiating with respect to θ̂*, where J(θ̂*) is presumed to be a minimum, this becomes

    0 = J′(θ̂) + J″(θ̂) (θ̂* − θ̂).

Solving for θ̂* yields

    θ̂* = θ̂ − [J″(θ̂)]⁻¹ J′(θ̂).   (7)

Applying Eq. (7) iteratively, we obtain the following.

Definition. Newton's method is defined by

    θ̂_{n+1} = θ̂_n − [J″(θ̂_n)]⁻¹ J′(θ̂_n),   n = 0, 1, 2, ...,   (8)

where θ̂₀ is given as an initial condition. When J″(λθ̂ + λ̄θ̂*) = J″(θ̂), the answer is obtained after the first iteration. In particular, when the error surface J(θ̂) is a quadratic form in θ̂, Newton's method produces θ̂* in one iteration, i.e., θ̂₁ = θ̂* for every θ̂₀.
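The one-iteration claim for quadratic surfaces can be sketched directly: for J(θ) = ½θᵀAθ − bᵀθ we have J′(θ) = Aθ − b and J″(θ) = A, so a single update of Eq. (8) lands on the minimizer A⁻¹b. The matrices below are arbitrary illustrative choices, not from the original text:

```python
import numpy as np

# Quadratic form J(theta) = 0.5 theta^T A theta - b^T theta:
# J'(theta) = A theta - b, and J''(theta) = A (constant Hessian).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

theta0 = np.array([100.0, -37.0])            # arbitrary starting point
grad = A @ theta0 - b
theta1 = theta0 - np.linalg.solve(A, grad)   # one Newton step, Eq. (8)

# theta_1 equals the minimizer A^{-1} b, regardless of theta_0.
assert np.allclose(theta1, np.linalg.solve(A, b))
assert np.allclose(A @ theta1 - b, 0.0)      # the gradient vanishes there
```

Solving the linear system A x = J′(θ̂₀) is preferred over forming the inverse [J″]⁻¹ explicitly; the result is the same.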
For Newton's method, there is the following general result on the existence of a critical point (i.e., a point at which the gradient vanishes) within a sphere of a Banach space.
Theorem. (Kantorovich) Let θ̂₀ be a point in Θ̂ for which [J″(θ̂₀)]⁻¹ exists, and set

    R₀ = ‖[J″(θ̂₀)]⁻¹ J′(θ̂₀)‖.

Let S denote the sphere {θ̂ ∈ Θ̂ : ‖θ̂ − θ̂₀‖ ≤ 2R₀}, and set C₀ = ‖[J″(θ̂₀)]⁻¹‖. If there exists a number M such that

    ‖J″(θ̂₁) − J″(θ̂₂)‖ ≤ M ‖θ̂₁ − θ̂₂‖

for all θ̂₁, θ̂₂ in S, and such that

    C₀ R₀ M = h₀ ≤ 1/2,

then J′(θ̂*) = 0 for some θ̂* in S, and the Newton sequence Eq. (8) converges to it. Furthermore, the rate of convergence is quadratic, satisfying

    ‖θ̂* − θ̂_n‖ ≤ 2^{−(n−1)} (2h₀)^{2^n − 1} R₀.

Proof. See Goldstein [2].
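The quadratic convergence rate can be observed in one dimension. The example below is an illustration added here, not from the original text: J(θ) = cosh(θ) has J′ = sinh, J″ = cosh, a unique minimizer at θ* = 0, and the Newton update of Eq. (8) becomes θ ← θ − tanh(θ). Each error is then at most the square of the previous one:

```python
import math

# 1-D illustration of quadratic convergence for Newton's method:
# J(theta) = cosh(theta), J'(theta) = sinh(theta), J''(theta) = cosh(theta),
# minimizer theta* = 0.  Eq. (8) reads: theta <- theta - tanh(theta).
theta = 1.0
errors = [abs(theta)]
for _ in range(4):
    theta = theta - math.tanh(theta)
    errors.append(abs(theta))

# Quadratic convergence: e_{n+1} <= e_n^2 along the whole sequence.
for e_prev, e_next in zip(errors, errors[1:]):
    if e_prev > 0:
        assert e_next <= e_prev ** 2
```

After only four iterations the error has dropped from 1 to below 10⁻¹⁰, consistent with the doubling of correct digits per step that the Kantorovich bound describes.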
References

[1] Y. Genin, "An introduction to the model reduction problem with Hankel norm criterion," in Proc. European Conf. Circuit Theory and Design, The Hague, Aug.

[2] A. A. Goldstein, Constructive Real Analysis, New York: Harper and Row.

[3] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd Edition, Baltimore: The Johns Hopkins University Press.

[4] J. M. Ortega, Numerical Analysis, New York: Academic Press.

[5] H. L. Royden, Real Analysis, New York: Macmillan.

[6] W. Rudin, Principles of Mathematical Analysis, New York: McGraw-Hill.

[7] R. E. Williamson, R. H. Crowell, and H. F. Trotter, Calculus of Vector Functions, Englewood Cliffs, NJ: Prentice-Hall.
F Matrix Calculus F 1
F Matrix Calculus F 1 Appendix F: MATRIX CALCULUS TABLE OF CONTENTS Page F1 Introduction F 3 F2 The Derivatives of Vector Functions F 3 F21 Derivative of Vector with Respect to Vector F 3 F22 Derivative
Linear Algebra: Vectors
A Linear Algebra: Vectors A Appendix A: LINEAR ALGEBRA: VECTORS TABLE OF CONTENTS Page A Motivation A 3 A2 Vectors A 3 A2 Notational Conventions A 4 A22 Visualization A 5 A23 Special Vectors A 5 A3 Vector
CSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
Lecture 2: Homogeneous Coordinates, Lines and Conics
Lecture 2: Homogeneous Coordinates, Lines and Conics 1 Homogeneous Coordinates In Lecture 1 we derived the camera equations λx = P X, (1) where x = (x 1, x 2, 1), X = (X 1, X 2, X 3, 1) and P is a 3 4
Inner products on R n, and more
Inner products on R n, and more Peyam Ryan Tabrizian Friday, April 12th, 2013 1 Introduction You might be wondering: Are there inner products on R n that are not the usual dot product x y = x 1 y 1 + +
Lecture L3 - Vectors, Matrices and Coordinate Transformations
S. Widnall 16.07 Dynamics Fall 2009 Lecture notes based on J. Peraire Version 2.0 Lecture L3 - Vectors, Matrices and Coordinate Transformations By using vectors and defining appropriate operations between
Adaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.
Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α
What is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
1.3. DOT PRODUCT 19. 6. If θ is the angle (between 0 and π) between two non-zero vectors u and v,
1.3. DOT PRODUCT 19 1.3 Dot Product 1.3.1 Definitions and Properties The dot product is the first way to multiply two vectors. The definition we will give below may appear arbitrary. But it is not. It
Nonlinear Algebraic Equations Example
Nonlinear Algebraic Equations Example Continuous Stirred Tank Reactor (CSTR). Look for steady state concentrations & temperature. s r (in) p,i (in) i In: N spieces with concentrations c, heat capacities
Fourth-Order Compact Schemes of a Heat Conduction Problem with Neumann Boundary Conditions
Fourth-Order Compact Schemes of a Heat Conduction Problem with Neumann Boundary Conditions Jennifer Zhao, 1 Weizhong Dai, Tianchan Niu 1 Department of Mathematics and Statistics, University of Michigan-Dearborn,
Systems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
Class Meeting # 1: Introduction to PDEs
MATH 18.152 COURSE NOTES - CLASS MEETING # 1 18.152 Introduction to PDEs, Fall 2011 Professor: Jared Speck Class Meeting # 1: Introduction to PDEs 1. What is a PDE? We will be studying functions u = u(x
CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e.
CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e. This chapter contains the beginnings of the most important, and probably the most subtle, notion in mathematical analysis, i.e.,
Practical Guide to the Simplex Method of Linear Programming
Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear
Principles of Scientific Computing Nonlinear Equations and Optimization
Principles of Scientific Computing Nonlinear Equations and Optimization David Bindel and Jonathan Goodman last revised March 6, 2006, printed March 6, 2009 1 1 Introduction This chapter discusses two related
Numerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 [email protected] 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)
Finite Dimensional Hilbert Spaces and Linear Inverse Problems
Finite Dimensional Hilbert Spaces and Linear Inverse Problems ECE 174 Lecture Supplement Spring 2009 Ken Kreutz-Delgado Electrical and Computer Engineering Jacobs School of Engineering University of California,
Vector Spaces. 2 3x + 4x 2 7x 3) + ( 8 2x + 11x 2 + 9x 3)
Vector Spaces The idea of vectors dates back to the middle 1800 s, but our current understanding of the concept waited until Peano s work in 1888. Even then it took many years to understand the importance
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES CHRISTOPHER HEIL 1. Cosets and the Quotient Space Any vector space is an abelian group under the operation of vector addition. So, if you are have studied
1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
Scalar Valued Functions of Several Variables; the Gradient Vector
Scalar Valued Functions of Several Variables; the Gradient Vector Scalar Valued Functions vector valued function of n variables: Let us consider a scalar (i.e., numerical, rather than y = φ(x = φ(x 1,
Spazi vettoriali e misure di similaritá
Spazi vettoriali e misure di similaritá R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 April 10, 2009 Outline Outline Spazi vettoriali a valori reali Operazioni tra vettori Indipendenza Lineare
Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points
Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a
α = u v. In other words, Orthogonal Projection
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
