Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh


Peter Richtárik
Week 3: Randomized Coordinate Descent with Arbitrary Sampling
January 27

The Problem

We first consider the following problem:

    minimize f(x)   subject to   x = (x_1, ..., x_n) ∈ R^n.   (1)

We will assume that f is:
- smooth (will be made precise later),
- strongly convex (will be made precise later).

So, this is unconstrained minimization of a smooth, strongly convex function.

Randomized Coordinate Descent with Arbitrary Sampling

NSync Algorithm (R. and Takáč 2014, [4])

Input: initial point x^0 ∈ R^n; subset probabilities {p_S} for each S ⊆ [n] = {1, 2, ..., n}; stepsize parameters v_1, ..., v_n > 0.

for k = 0, 1, 2, ... do
  a) Select a random set of coordinates S_k ⊆ [n] following the law P(S_k = S) = p_S, S ⊆ [n].
  b) Update (possibly in parallel) the selected coordinates:

       x^{k+1} = x^k - Σ_{i ∈ S_k} (1/v_i) (e_i^T ∇f(x^k)) e_i

end for

(e_i is the i-th unit coordinate vector.)

Remark: The NSync algorithm, introduced in [4], was the first coordinate descent algorithm using arbitrary sampling.

Two More Ways of Writing the Update Step

1. Coordinate-by-coordinate:

     x_i^{k+1} = x_i^k                          if i ∉ S_k,
     x_i^{k+1} = x_i^k - (1/v_i) (∇f(x^k))_i    if i ∈ S_k.

2. Via projection onto a subset of blocks: if for h ∈ R^n and S ⊆ [n] we write

     h_S = Σ_{i ∈ S} h_i e_i,   (2)

   then

     x^{k+1} = x^k + h_{S_k}   for   h = -(Diag(v))^{-1} ∇f(x^k).   (3)

Depending on context, we shall interchangeably denote the i-th partial derivative of f at x by ∇_i f(x) = e_i^T ∇f(x) = (∇f(x))_i.
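To make the update step concrete, here is a minimal NumPy sketch of the NSync iteration as written in (3). The quadratic objective, the sampling routine, the stepsize vector v and all numerical values are hypothetical choices made for this illustration; they are not part of the lecture.

```python
import numpy as np

def nsync(grad, sample_set, v, x0, iters=2000):
    """Sketch of NSync: x^{k+1} = x^k - sum_{i in S_k} (1/v_i) * grad_i f(x^k) * e_i."""
    x = x0.copy()
    for _ in range(iters):
        S = sample_set()          # a) draw the random coordinate set S_k
        g = grad(x)               # b) only the coordinates in S_k are actually needed
        x[S] -= g[S] / v[S]       #    update the selected coordinates (parallelizable)
    return x

# Hypothetical example: f(x) = 1/2 ||Ax - b||^2 with a 2-out-of-5 sampling.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
v = 2.0 * np.sum(A * A, axis=0)   # placeholder stepsizes, assumed to be valid ESO weights
sample = lambda: rng.choice(5, size=2, replace=False)
x = nsync(grad, sample, v, np.zeros(5))
print("f(x) =", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```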

Samplings

Definition 1 (Sampling). By the name "sampling" we refer to a set-valued random mapping with values being subsets of [n] = {1, 2, ..., n}. For a sampling Ŝ we define the probability vector p = (p_1, ..., p_n)^T by

    p_i = P(i ∈ Ŝ).   (4)

We say that Ŝ is proper if p_i > 0 for all i.

A sampling Ŝ is uniquely characterized by the probability mass function

    p_S = P(Ŝ = S),  S ⊆ [n];   (5)

that is, by assigning probabilities to all subsets of [n]. Later on it will be useful to also consider the probability matrix P = P(Ŝ) = (p_ij) given by

    p_ij = P(i ∈ Ŝ, j ∈ Ŝ) = Σ_{S : {i,j} ⊆ S} p_S.   (6)

Samplings: A Basic Identity

Lemma 2 ([3]). For any sampling Ŝ we have

    Σ_{i=1}^n p_i = E[|Ŝ|].   (7)

Proof.

    Σ_{i=1}^n p_i  =(4)+(5)  Σ_{i=1}^n Σ_{S ⊆ [n] : i ∈ S} p_S  =  Σ_{S ⊆ [n]} p_S Σ_{i : i ∈ S} 1  =  Σ_{S ⊆ [n]} p_S |S|  =  E[|Ŝ|].
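As a quick illustration of Definition 1 and Lemma 2, the following sketch builds a sampling on three coordinates from an explicit (made-up) probability mass function {p_S}, computes the probability vector (4) and probability matrix (6), and checks identity (7) numerically.

```python
import numpy as np

n = 3
# Hypothetical pmf over subsets of {0, 1, 2} (0-based indices); the values must sum to 1.
pmf = {(0,): 0.2, (1,): 0.1, (0, 1): 0.3, (0, 1, 2): 0.4}

# Probability vector (4): p_i = P(i in S-hat).
p = np.array([sum(q for S, q in pmf.items() if i in S) for i in range(n)])

# Probability matrix (6): p_ij = P(i in S-hat, j in S-hat).
P = np.array([[sum(q for S, q in pmf.items() if i in S and j in S)
               for j in range(n)] for i in range(n)])

# Identity (7): sum_i p_i = E[|S-hat|].
expected_size = sum(len(S) * q for S, q in pmf.items())
print(p, P, sep="\n")
assert abs(p.sum() - expected_size) < 1e-12
```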

Sampling Zoo - Part I

Why consider different samplings?

1. Basic considerations. It is important that each block i has a positive probability of being chosen; otherwise NSync will not be able to update some blocks and hence will not converge to the optimum. For technical/sanity reasons, we define:
   - proper sampling: p_i = P(i ∈ Ŝ) > 0 for all i ∈ [n],
   - nil sampling: P(Ŝ = ∅) = 1,
   - vacuous sampling: P(Ŝ = ∅) > 0.

2. Parallelism. The choice of sampling affects the level of parallelism: E[|Ŝ|] is the average number of updates performed in parallel in one iteration, and is hence closely related to the number of iterations.
   - serial sampling: picks one block: P(|Ŝ| = 1) = 1. We call this sampling serial, although nothing prevents us from computing the actual update to the block, and/or applying the update, in parallel.

Sampling Zoo - Part II

   - fully parallel sampling: always picks all blocks: P(Ŝ = {1, 2, ..., n}) = 1.

3. Processor reliability. The sampling may be induced/informed by the computing environment:
   - Reliable/dedicated processors. If one has reliable processors, it is sensible to choose a sampling Ŝ such that P(|Ŝ| = τ) = 1 for some τ related to the number of processors.
   - Unreliable processors. If the processors given a computing task are busy or unreliable, they return the answer late or not at all; it is then sensible to ignore such updates and move on. This means that |Ŝ| varies from iteration to iteration.

4. Distributed computing. In a distributed computing environment it is sensible:
   - to allow each compute node as much autonomy as possible, so as to minimize communication cost,
   - to make sure all nodes are busy at all times.

Sampling Zoo - Part III

This suggests a strategy where the set of blocks is partitioned, with each node owning one part of the partition and, at each iteration, independently picking a chunky subset of its blocks to update, ideally based on local information.

5. Uniformity. It may or may not make sense to update some blocks more often than others:
   - uniform samplings: P(i ∈ Ŝ) = P(j ∈ Ŝ) for all i, j ∈ [n],
   - doubly uniform (DU) samplings: characterized by |S'| = |S''| ⟹ P(Ŝ = S') = P(Ŝ = S'') for all S', S'' ⊆ [n],
   - τ-nice: a DU sampling with the additional property that P(|Ŝ| = τ) = 1,
   - distributed τ-nice: will be defined later,
   - independent sampling: a union of independent uniform serial samplings,
   - nonuniform samplings.

Sampling Zoo - Part IV

6. Complexity of generating a sampling. Some samplings are computationally more efficient to generate than others:
   - the potential benefits of a sampling may be completely ruined by the difficulty of generating sets according to the sampling's distribution,
   - a τ-nice sampling can be well approximated by an independent sampling, which is easy to generate,
   - in general, many samplings will be hard to generate.
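The lecture does not give generation code, but the following sketch shows one simple way a τ-nice sampling and an independent sampling (a union of τ independent uniform serial samplings) could be drawn, and why the latter only approximates the former: its size can be smaller than τ.

```python
import numpy as np

rng = np.random.default_rng(42)

def tau_nice(n, tau):
    """tau-nice sampling: a uniformly random subset of {0, ..., n-1} of size exactly tau."""
    return set(rng.choice(n, size=tau, replace=False))

def independent(n, tau):
    """Independent sampling: union of tau independent uniform serial samplings.
    Coordinates may collide, so |S| <= tau; for n >> tau collisions are rare."""
    return set(rng.integers(0, n, size=tau))

sizes = [len(independent(100, 10)) for _ in range(10000)]
print("average |S| of the independent sampling:", np.mean(sizes))  # slightly below tau = 10
```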

Assumption: Strong Convexity

Assumption 1 (Strong convexity). The function f is differentiable and λ-strongly convex (with λ > 0) with respect to the standard Euclidean norm

    ‖h‖ = ( Σ_{i=1}^n h_i^2 )^{1/2}.

That is, we assume that for all x, h ∈ R^n,

    f(x + h) ≥ f(x) + ⟨∇f(x), h⟩ + (λ/2) ‖h‖^2.   (8)

Assumption: Expected Separable Overapproximation

Assumption 2 (ESO). Assume Ŝ is proper and that for some vector of positive weights v = (v_1, ..., v_n) and all x, h ∈ R^n,

    E[f(x + h_Ŝ)] ≤ f(x) + ⟨∇f(x), h⟩_p + (1/2) ‖h‖^2_{p∘v}.   (9)

Note that the ESO parameters v, p depend on both f and Ŝ. For simplicity, instead of (9) we will often use the compact notation

    (f, Ŝ) ∼ ESO(v).

Notation used above:

    h_S = Σ_{i ∈ S} h_i e_i ∈ R^n           (projection of h ∈ R^n onto the coordinates i ∈ S)
    ⟨g, h⟩_p = Σ_{i=1}^n p_i g_i h_i ∈ R     (weighted inner product)
    p∘v = (p_1 v_1, ..., p_n v_n) ∈ R^n      (Hadamard product)
    ‖h‖^2_w = Σ_{i=1}^n w_i h_i^2            (weighted squared norm)

Assumption: Expected Separable Overapproximation

Here is the ESO inequality again, now written without the simplifying notation:

    E[ f( x + Σ_{i ∈ Ŝ} h_i e_i ) ] ≤ f(x) + Σ_{i=1}^n p_i ∇_i f(x) h_i + (1/2) Σ_{i=1}^n p_i v_i h_i^2.

The left-hand side is the complicated part; the first sum on the right is linear in h, and the second sum is quadratic and separable in h.

Complexity of NSync

Theorem 3 (R. and Takáč 2013, [4]). Let x* be a minimizer of f. Let Assumptions 1 and 2 be satisfied for a proper sampling Ŝ (that is, (f, Ŝ) ∼ ESO(v)). Choose a starting point x^0 ∈ R^n, error tolerance 0 < ε < f(x^0) - f(x*) and confidence level 0 < ρ < 1. If {x^k} are the random iterates generated by NSync, where the random sets S_k are iid following the distribution of Ŝ, then

    K ≥ (Ω/λ) log( (f(x^0) - f(x*)) / (ε ρ) )   ⟹   P( f(x^K) - f(x*) ≤ ε ) ≥ 1 - ρ,   (10)

where

    Ω = max_{i=1,...,n} v_i / p_i ≥ (1/E[|Ŝ|]) Σ_{i=1}^n v_i.   (11)
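To see how the quantities in Theorem 3 interact, the following sketch plugs in made-up values of v, p, λ, ε and ρ and evaluates Ω from (11), its lower bound, and the iteration count suggested by (10); none of the numbers come from the lecture.

```python
import numpy as np

v = np.array([4.0, 2.0, 1.0, 3.0])      # hypothetical ESO stepsize parameters v_i
p = np.array([0.5, 0.25, 0.25, 0.5])    # hypothetical marginals p_i = P(i in S-hat)
lam = 0.1                                # strong convexity constant lambda
gap0 = 10.0                              # f(x^0) - f(x*)
eps, rho = 1e-3, 0.05                    # tolerance and confidence level

omega = np.max(v / p)                                        # (11)
omega_lower = v.sum() / p.sum()                              # (1/E|S-hat|) * sum_i v_i, using (7)
K = int(np.ceil(omega / lam * np.log(gap0 / (eps * rho))))   # iteration count from (10)
print(f"Omega = {omega:.2f}, lower bound = {omega_lower:.2f}, K = {K}")
```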

What does this mean?

- Linear convergence. NSync converges linearly (i.e., the dependence on ε is logarithmic).
- High confidence is not a problem. ρ appears inside the logarithm, so it is easy to achieve high confidence by running the method longer; there is no need to restart.
- Focus on the leading term. The leading term is Ω, and we have a closed-form expression for it in terms of
  - the parameters v_1, ..., v_n (which depend on f and Ŝ),
  - the parameters p_1, ..., p_n (which depend on Ŝ).
- Parallelization speedup. The lower bound in (11) suggests that if the parameters v_i did not grow with increasing τ = E[|Ŝ|] (the average number of updates per iteration), then we could potentially be getting linear speedup in τ. So we shall study the dependence of v_i on τ (this will depend on f and Ŝ). As we shall see, speedup is often guaranteed for sparse or well-conditioned problems.

Question: How should we design the sampling Ŝ so that Ω is minimized?

Analysis of the Algorithm (Proof of Theorem 3)

Tool: Markov's Inequality

Theorem 4 (Markov's Inequality). Let X be a nonnegative random variable. Then for any ε > 0,

    P(X ≥ ε) ≤ E[X]/ε.

Proof. Let 1_{X ≥ ε} be the random variable which is equal to 1 if X ≥ ε and 0 otherwise. Then 1_{X ≥ ε} ≤ X/ε. By taking expectations of all terms, we obtain

    P(X ≥ ε) = E[1_{X ≥ ε}] ≤ E[X/ε] = E[X]/ε.

Tool: Tower Property of Expectations (Motivation)

Example 5. Consider discrete random variables X and Y:
- X has 2 outcomes: x_1 and x_2,
- Y has 3 outcomes: y_1, y_2 and y_3.

Their joint probability mass function is given in this table:

            y_1     y_2     y_3
    x_1     0.05    0.20    0.03
    x_2     0.25    0.30    0.17

Obviously, E[X] = 0.28 x_1 + 0.72 x_2. But we can also write

    E[X] = (0.05 x_1 + 0.25 x_2) + (0.20 x_1 + 0.30 x_2) + (0.03 x_1 + 0.17 x_2)
         = 0.30 ( (0.05/0.30) x_1 + (0.25/0.30) x_2 ) + 0.50 ( (0.20/0.50) x_1 + (0.30/0.50) x_2 ) + 0.20 ( (0.03/0.20) x_1 + (0.17/0.20) x_2 )
         = P(Y=y_1) E[X | Y=y_1] + P(Y=y_2) E[X | Y=y_2] + P(Y=y_3) E[X | Y=y_3]
         = E[ E[X | Y] ].
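The arithmetic in Example 5 is easy to check mechanically. The sketch below recomputes E[X] directly and via E[E[X | Y]] from the joint table, with arbitrary numbers standing in for the symbolic outcomes x_1 and x_2.

```python
import numpy as np

# Joint pmf from Example 5: rows correspond to x_1, x_2; columns to y_1, y_2, y_3.
joint = np.array([[0.05, 0.20, 0.03],
                  [0.25, 0.30, 0.17]])
x_vals = np.array([1.7, -0.4])     # arbitrary numeric stand-ins for x_1 and x_2

p_x = joint.sum(axis=1)            # marginal of X: (0.28, 0.72)
p_y = joint.sum(axis=0)            # marginal of Y: (0.30, 0.50, 0.20)

E_direct = x_vals @ p_x            # E[X] = 0.28 x_1 + 0.72 x_2
E_cond = x_vals @ (joint / p_y)    # E[X | Y = y_j] for j = 1, 2, 3
E_tower = E_cond @ p_y             # E[E[X | Y]]
print(E_direct, E_tower)           # the two values agree
assert np.isclose(E_direct, E_tower)
```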

Tower Property

Lemma 6 (Tower Property / Iterated Expectation). For any random variables X and Y, we have

    E[X] = E[ E[X | Y] ].

Proof. We shall only prove this for discrete random variables; the proof is more technical in the continuous case.

    E[X] = Σ_x x P(X=x)
         = Σ_x x Σ_y P(X=x, Y=y)
         = Σ_y Σ_x x P(X=x, Y=y)
         = Σ_y P(Y=y) Σ_x x P(X=x, Y=y)/P(Y=y)
         = Σ_y P(Y=y) Σ_x x P(X=x | Y=y)
         = Σ_y P(Y=y) E[X | Y=y]
         = E[ E[X | Y] ].

Proof of Theorem 3 - Part I

If we let μ = λ/Ω, then

    f(x + h) ≥(8) f(x) + ⟨∇f(x), h⟩ + (λ/2) ‖h‖^2 ≥ f(x) + ⟨∇f(x), h⟩ + (μ/2) ‖h‖^2_{v∘p^{-1}}.   (12)

Indeed, one can easily verify that μ is defined to be the largest number for which λ ‖h‖^2 ≥ μ ‖h‖^2_{v∘p^{-1}} holds for all h. Hence, f is μ-strongly convex with respect to the norm ‖·‖_{v∘p^{-1}}.

Let x* be a minimizer of f, i.e., an optimal solution of (1). Minimizing both sides of (12) in h, we get

    f(x*) - f(x) ≥(12) min_{h ∈ R^n} ⟨∇f(x), h⟩ + (μ/2) ‖h‖^2_{v∘p^{-1}} = -(1/(2μ)) ‖∇f(x)‖^2_{p∘v^{-1}}.   (13)
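The closed-form value of the minimum in (13) follows by minimizing over each coordinate separately; a short verification, using only the weighted norms defined earlier, reads:

```latex
\[
\min_{h\in\mathbb{R}^n}\ \langle \nabla f(x), h\rangle + \tfrac{\mu}{2}\,\|h\|^2_{v\circ p^{-1}}
 = \sum_{i=1}^n \min_{h_i\in\mathbb{R}}\Big( \nabla_i f(x)\, h_i + \tfrac{\mu}{2}\cdot\tfrac{v_i}{p_i}\, h_i^2 \Big)
 = -\sum_{i=1}^n \frac{p_i}{2\mu v_i}\big(\nabla_i f(x)\big)^2
 = -\frac{1}{2\mu}\,\|\nabla f(x)\|^2_{p\circ v^{-1}},
\]
```

with the per-coordinate minimizer h_i = -(p_i/(μ v_i)) ∇_i f(x).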

Proof of Theorem 3 - Part II

Let h^k = -(Diag(v))^{-1} ∇f(x^k). Then, in view of (3), we have x^{k+1} = x^k + h^k_{S_k}. Utilizing Assumption 2, we get

    E[ f(x^{k+1}) | x^k ] = E[ f(x^k + h^k_{S_k}) | x^k ]
                          ≤(9) f(x^k) + ⟨∇f(x^k), h^k⟩_p + (1/2) ‖h^k‖^2_{p∘v}
                          = f(x^k) - (1/2) ‖∇f(x^k)‖^2_{p∘v^{-1}}
                          ≤(13) f(x^k) - μ ( f(x^k) - f(x*) ).

Taking expectations in the last inequality, using the tower property, and subtracting f(x*) from both sides of the inequality, we get

    E[ f(x^{k+1}) - f(x*) ] ≤ (1 - μ) E[ f(x^k) - f(x*) ].

Unrolling the recurrence, we get

    E[ f(x^k) - f(x*) ] ≤ (1 - μ)^k ( f(x^0) - f(x*) ).   (14)

Proof of Theorem 3 - Part III

Using Markov's inequality, (14), and the definition of K, we get

    P( f(x^K) - f(x*) ≥ ε ) ≤ E[ f(x^K) - f(x*) ]/ε ≤(14) (1 - μ)^K ( f(x^0) - f(x*) )/ε ≤(10) ρ.

Finally, let us now establish the lower bound on Ω. Minimizing over all vectors p ≥ 0 with Σ_{i=1}^n p_i = E[|Ŝ|] (a constraint satisfied by the probability vector of any sampling, by (7)), we have

    Ω =(11) max_i v_i/p_i ≥ min_{p ≥ 0, Σ_i p_i = E[|Ŝ|]} max_i v_i/p_i = (1/E[|Ŝ|]) Σ_{i=1}^n v_i,

where the last equality follows since the optimal p_i is proportional to v_i.
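The last equality above uses the fact that, subject to p ≥ 0 and Σ_i p_i = E[|Ŝ|], the quantity max_i v_i/p_i is minimized by taking p_i proportional to v_i. A tiny numerical sketch (with made-up v and budget) illustrating this choice:

```python
import numpy as np

v = np.array([4.0, 2.0, 1.0, 3.0])   # hypothetical ESO parameters
budget = 1.5                          # E[|S-hat|] = sum_i p_i

p_opt = budget * v / v.sum()          # p_i proportional to v_i
omega_opt = np.max(v / p_opt)         # equals sum(v) / budget for this choice
print(omega_opt, v.sum() / budget)

# Any other feasible p gives an Omega that is at least as large.
rng = np.random.default_rng(1)
for _ in range(5):
    q = rng.random(4)
    q = budget * q / q.sum()
    print(np.max(v / q) >= omega_opt - 1e-12)   # always True
```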

How to compute the ESO stepsize parameters v_1, ..., v_n?

C^1(A) Functions

By definition, the ESO parameters v depend on both f and Ŝ. This is also highlighted by the notation we use: (f, Ŝ) ∼ ESO(v). Hence, in order to compute these parameters, we need to specify f and Ŝ. The following definition describes a wide class of functions, often appearing in computational practice, for which we will be able to do this.

Definition 7 (C^1(A) functions). Let A ∈ R^{m×n}. By C^1(A) we denote the set of continuously differentiable functions f : R^n → R satisfying the following inequality for all x, h ∈ R^n:

    f(x + h) ≤ f(x) + (∇f(x))^T h + (1/2) h^T A^T A h.   (15)

Example 8 (Least Squares). The function f(x) = (1/2) ‖Ax - b‖^2 satisfies (15) with equality. Hence, f ∈ C^1(A).
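Example 8 can be verified numerically: for f(x) = (1/2)‖Ax - b‖², both sides of (15) agree exactly. A quick sketch with random data (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))
b = rng.standard_normal(7)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: A.T @ (A @ x - b)

x = rng.standard_normal(4)
h = rng.standard_normal(4)
lhs = f(x + h)
rhs = f(x) + grad(x) @ h + 0.5 * h @ (A.T @ A) @ h   # right-hand side of (15)
print(lhs, rhs)                                       # equal up to rounding
assert np.isclose(lhs, rhs)
```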

Are There More C^1(A) Functions?

Functions we wish to minimize in machine learning often have a finite-sum structure:

    f(x) = Σ_{j=1}^J φ_j(M_j x),   (16)

where M_j ∈ R^{m×n} are data matrices and φ_j : R^m → R are loss functions. The next result says that under certain assumptions on the loss functions, f ∈ C^1(A) for some A, and describes what this A is.

Theorem 9 ([5]). Assume that for each j, the function φ_j is γ_j-smooth:

    ‖∇φ_j(s) - ∇φ_j(s')‖ ≤ γ_j ‖s - s'‖   for all s, s' ∈ R^m.

Then f ∈ C^1(A), where A satisfies A^T A = Σ_{j=1}^J γ_j M_j^T M_j.

Remark: The above theorem also says what A^T A is. This is important, as will be clear from the next theorem.

ESO for C^1(A) Functions and an Arbitrary Sampling

Theorem 10 ([5]). Assume f ∈ C^1(A) for some real matrix A ∈ R^{m×n}, let Ŝ be an arbitrary sampling and let P = P(Ŝ) be its probability matrix. If v = (v_1, ..., v_n) satisfies

    P(Ŝ) ∘ A^T A ⪯ Diag(p∘v),

then (f, Ŝ) ∼ ESO(v).
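Theorem 10 turns the computation of v into a matrix inequality that can be checked numerically. The sketch below uses a τ-nice sampling on a small dense matrix; the candidate v (twice the diagonal of AᵀA) is just a guess to be tested against the condition, not a formula taken from the lecture.

```python
import numpy as np

def probability_matrix_tau_nice(n, tau):
    """P(S-hat) for the tau-nice sampling: p_ii = tau/n, p_ij = tau(tau-1)/(n(n-1)) for i != j."""
    P = np.full((n, n), tau * (tau - 1) / (n * (n - 1)))
    np.fill_diagonal(P, tau / n)
    return P

def satisfies_theorem_10(A, P, p, v, tol=1e-10):
    """Check the condition of Theorem 10: P o A^T A <= Diag(p o v) in the semidefinite order."""
    M = np.diag(p * v) - P * (A.T @ A)    # '*' is the elementwise (Hadamard) product
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

rng = np.random.default_rng(7)
n, tau = 5, 2
A = rng.standard_normal((10, n))
P = probability_matrix_tau_nice(n, tau)
p = np.full(n, tau / n)

v_candidate = 2.0 * np.sum(A * A, axis=0)   # guess: 2 * diag(A^T A); validity is verified below
print(satisfies_theorem_10(A, P, p, v_candidate))
```

If the check fails for some candidate, one can simply scale v up until it passes: increasing v only increases Diag(p∘v), and for a proper sampling the diagonal term eventually dominates.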

Sampling Identity for a Quadratic

In order to prove Theorem 10, we will need the following lemma.

Lemma 11 ([5]). Let G be any real n×n matrix and Ŝ an arbitrary sampling. Then for any h ∈ R^n we have

    E[ h_Ŝ^T G h_Ŝ ] = h^T ( P(Ŝ) ∘ G ) h,   (17)

where ∘ denotes the Hadamard (elementwise) product of matrices, and P(Ŝ) is the probability matrix of Ŝ.

Proof of Lemma 11

Proof. Let 1_{ij} be the indicator random variable of the event {i ∈ Ŝ & j ∈ Ŝ}:

    1_{ij} = 1 if i ∈ Ŝ & j ∈ Ŝ, and 1_{ij} = 0 otherwise.

Note that E[1_{ij}] = P(i ∈ Ŝ, j ∈ Ŝ) = p_ij. We now have

    E[ h_Ŝ^T G h_Ŝ ] =(2) E[ Σ_{i ∈ Ŝ} Σ_{j ∈ Ŝ} G_ij h_i h_j ] = E[ Σ_{i=1}^n Σ_{j=1}^n 1_{ij} G_ij h_i h_j ]
                     = Σ_{i=1}^n Σ_{j=1}^n E[1_{ij}] G_ij h_i h_j = Σ_{i=1}^n Σ_{j=1}^n p_ij G_ij h_i h_j
                     = h^T ( P(Ŝ) ∘ G ) h.
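Identity (17) is easy to sanity-check by simulation. The sketch below (toy dimensions, and an arbitrary sampling defined by an explicit pmf chosen for the example) compares a Monte Carlo estimate of E[h_Ŝᵀ G h_Ŝ] with hᵀ(P(Ŝ) ∘ G) h.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
h = rng.standard_normal(n)

# An arbitrary sampling given by an explicit pmf over subsets (0-based indices).
pmf = {(0,): 0.3, (1, 2): 0.4, (0, 2, 3): 0.3}
subsets = list(pmf.keys())
probs = np.array(list(pmf.values()))

# Probability matrix (6).
P = np.array([[sum(q for S, q in pmf.items() if i in S and j in S)
               for j in range(n)] for i in range(n)])

exact = h @ (P * G) @ h             # right-hand side of (17)

trials = 100000                     # Monte Carlo estimate of the left-hand side
estimate = 0.0
for _ in range(trials):
    S = list(subsets[rng.choice(len(subsets), p=probs)])
    hS = np.zeros(n)
    hS[S] = h[S]
    estimate += hS @ G @ hS
print(exact, estimate / trials)     # close, up to Monte Carlo error
```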

Proof of Theorem 10

Having established Lemma 11, we are now ready to prove Theorem 10.

Proof. Fixing any x, h ∈ R^n, we have

    E[ f(x + h_Ŝ) ] ≤(15) E[ f(x) + ⟨∇f(x), h_Ŝ⟩ + (1/2) h_Ŝ^T A^T A h_Ŝ ]
                    = f(x) + ⟨∇f(x), h⟩_p + (1/2) E[ h_Ŝ^T A^T A h_Ŝ ]
                    =(Lemma 11) f(x) + ⟨∇f(x), h⟩_p + (1/2) h^T ( P ∘ A^T A ) h
                    ≤ f(x) + ⟨∇f(x), h⟩_p + (1/2) h^T Diag(p∘v) h
                    = f(x) + ⟨∇f(x), h⟩_p + (1/2) ‖h‖^2_{p∘v}.

References

[1] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2), 2012.
[2] Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(2):1-38, 2014.
[3] Peter Richtárik and Martin Takáč. Parallel coordinate descent methods for big data optimization. Mathematical Programming, 1-52, 2015.
[4] Peter Richtárik and Martin Takáč. On optimal probabilities in stochastic coordinate descent methods. Optimization Letters, 2015.
[5] Zheng Qu and Peter Richtárik. Coordinate descent with arbitrary sampling II: expected separable overapproximation. arXiv preprint, 2014.
