Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Peter Richtárik

Week 3: Randomized Coordinate Descent With Arbitrary Sampling
January 27

The Problem

We first consider the following problem:

    minimize    f(x)                                                    (1)
    subject to  x = (x_1, ..., x_n) ∈ R^n

We will assume that f is:
- smooth (will be made precise later)
- strongly convex (will be made precise later)

So, this is unconstrained minimization of a smooth convex function.
Randomized Coordinate Descent with Arbitrary Sampling

NSync Algorithm (R. and Takáč 2014, [4])

Input:
- initial point x^0 ∈ R^n
- subset probabilities {p_S} for each S ⊆ [n] = {1, 2, ..., n}
- stepsize parameters v_1, ..., v_n > 0

for k = 0, 1, 2, ... do
  a) Select a random set of coordinates S_k ⊆ [n] following the law
         P(S_k = S) = p_S,   S ⊆ [n]
  b) Update (possibly in parallel) the selected coordinates:
         x^{k+1} = x^k − Σ_{i ∈ S_k} (1/v_i) (e_i^T ∇f(x^k)) e_i
end for

(e_i is the ith unit coordinate vector)

Remark: The NSync algorithm [4] was the first coordinate descent algorithm to use an arbitrary sampling.

Two More Ways of Writing the Update Step

1. Coordinate-by-coordinate:

    x_i^{k+1} = x_i^k                           if i ∉ S_k,
    x_i^{k+1} = x_i^k − (1/v_i) (∇f(x^k))_i     if i ∈ S_k.

2. Via projection onto a subset of blocks: If for h ∈ R^n and S ⊆ [n] we write

    h_S = Σ_{i ∈ S} h_i e_i,                                            (2)

then

    x^{k+1} = x^k + h_{S_k}   for   h = −(Diag(v))^{−1} ∇f(x^k).        (3)

Depending on the context, we shall interchangeably denote the ith partial derivative of f at x by

    ∇_i f(x) = e_i^T ∇f(x) = (∇f(x))_i.
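To make the iteration concrete, here is a minimal NumPy sketch of the NSync loop on a toy separable quadratic, assuming full gradient access and a τ-nice sampling; all names (`nsync`, `tau_nice`, the toy problem) are illustrative, not from the course code.

```python
import numpy as np

def nsync(grad, v, sample, x0, iters=1000, rng=None):
    """Sketch of NSync: at step k, draw a random coordinate set S_k and
    update only the selected coordinates: x_i <- x_i - (1/v_i) * grad_i."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        S = sample(rng)          # random subset of {0, ..., n-1}
        g = grad(x)              # in practice one computes only the coords in S
        for i in S:
            x[i] -= g[i] / v[i]  # x^{k+1} = x^k - sum_{i in S_k} (1/v_i) grad_i e_i
    return x

# Toy problem: f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b and v_i = 1 works.
n, tau = 10, 3
b = np.arange(n, dtype=float)
tau_nice = lambda rng: rng.choice(n, size=tau, replace=False)  # tau-nice sampling
x = nsync(lambda x: x - b, np.ones(n), tau_nice, np.zeros(n))
print(np.allclose(x, b))  # True: every coordinate gets updated eventually
```

Since the toy objective is separable with unit curvature, each selected coordinate is solved exactly in one touch, so the iterates reach the minimizer once every coordinate has been sampled at least once.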
Samplings

Definition 1 (Sampling)
By the name "sampling" we refer to a set-valued random mapping with values being subsets of [n] = {1, 2, ..., n}.

For a sampling Ŝ we define the probability vector p = (p_1, ..., p_n)^T by

    p_i = P(i ∈ Ŝ).                                                     (4)

We say that Ŝ is proper if p_i > 0 for all i.

A sampling Ŝ is uniquely characterized by its probability mass function

    p_S = P(Ŝ = S),   S ⊆ [n];                                          (5)

that is, by assigning probabilities to all subsets of [n]. Later on it will be useful to also consider the probability matrix P = P(Ŝ) = (p_ij) given by

    p_ij = P(i ∈ Ŝ, j ∈ Ŝ) = Σ_{S : {i,j} ⊆ S} p_S.                     (6)

Samplings: A Basic Identity

Lemma 2 ([3])
For any sampling Ŝ we have

    Σ_{i=1}^n p_i = E[|Ŝ|].                                             (7)

Proof.

    Σ_{i=1}^n p_i  =(4)+(5)  Σ_{i=1}^n Σ_{S ⊆ [n] : i ∈ S} p_S
                   = Σ_{S ⊆ [n]} p_S Σ_{i : i ∈ S} 1
                   = Σ_{S ⊆ [n]} p_S |S|
                   = E[|Ŝ|].
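The probability vector (4), probability matrix (6), and identity (7) can be checked mechanically on a small explicit sampling; the pmf below is a made-up example over the subsets of [n] with n = 3.

```python
import numpy as np

# A made-up sampling on [n] = {0, 1, 2}, given by its pmf over subsets, eq. (5).
n = 3
pmf = {frozenset(): 0.1, frozenset({0}): 0.2,
       frozenset({0, 1}): 0.3, frozenset({0, 1, 2}): 0.4}

# Probability vector, eq. (4): p_i = P(i in S_hat).
p = np.array([sum(q for S, q in pmf.items() if i in S) for i in range(n)])

# Probability matrix, eq. (6): p_ij = P(i in S_hat, j in S_hat).
P = np.array([[sum(q for S, q in pmf.items() if {i, j} <= S)
               for j in range(n)] for i in range(n)])

# Lemma 2, eq. (7): sum_i p_i = E[|S_hat|]  (= 2.0 for this pmf).
E_size = sum(q * len(S) for S, q in pmf.items())
print(abs(p.sum() - E_size) < 1e-12)   # True: sum_i p_i = E|S_hat|
print(np.allclose(np.diag(P), p))      # True: the diagonal of P is p
```

Note that p_ii = p_i by definition, which the last check confirms.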
Sampling Zoo - Part I

Why consider different samplings?

1. Basic considerations. It is important that each block i has a positive probability of being chosen; otherwise NSync will never update some blocks and hence will not converge to the optimum. For technical/sanity reasons, we define:
   - Proper sampling: p_i = P(i ∈ Ŝ) > 0 for all i ∈ [n].
   - Nil sampling: P(Ŝ = ∅) = 1.
   - Vacuous sampling: P(Ŝ = ∅) > 0.

2. Parallelism. The choice of sampling affects the level of parallelism: E[|Ŝ|] is the average number of updates performed in parallel in one iteration, and is hence closely related to the number of iterations.
   - Serial sampling: picks one block: P(|Ŝ| = 1) = 1. We call this sampling serial although nothing prevents us from computing the actual update to the block, and/or applying the update, in parallel.

Sampling Zoo - Part II

   - Fully parallel sampling: always picks all blocks: P(Ŝ = {1, 2, ..., n}) = 1.

3. Processor reliability. The sampling may be induced/informed by the computing environment:
   - Reliable/dedicated processors. If one has reliable processors, it is sensible to choose a sampling Ŝ such that P(|Ŝ| = τ) = 1 for some τ related to the number of processors.
   - Unreliable processors. If the processors given a computing task are busy or unreliable, they return the answer late or not at all; it is then sensible to ignore such updates and move on. This means that |Ŝ| varies from iteration to iteration.

4. Distributed computing. In a distributed computing environment it is sensible:
   - to allow each compute node as much autonomy as possible, so as to minimize communication cost,
   - to make sure all nodes are busy at all times.
Sampling Zoo - Part III

This suggests a strategy where the set of blocks is partitioned, with each node owning one partition and, at each iteration, independently picking a chunky subset of its blocks to update, ideally based on local information.

5. Uniformity. It may or may not make sense to update some blocks more often than others:
   - uniform samplings: P(i ∈ Ŝ) = P(j ∈ Ŝ) for all i, j ∈ [n]
   - doubly uniform (DU) samplings, characterized by: |S'| = |S''| implies P(Ŝ = S') = P(Ŝ = S'') for all S', S'' ⊆ [n]
   - τ-nice: a DU sampling with the additional property that P(|Ŝ| = τ) = 1
   - distributed τ-nice: will be defined later
   - independent sampling: a union of independent uniform serial samplings
   - nonuniform samplings

Sampling Zoo - Part IV

6. Complexity of generating a sampling. Some samplings are computationally more efficient to generate than others:
   - the potential benefits of a sampling may be completely ruined by the difficulty of generating sets according to the sampling's distribution,
   - a τ-nice sampling can be well approximated by an independent sampling, which is easy to generate,
   - in general, many samplings will be hard to generate.
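As a sketch of how two of the easy-to-generate samplings might be implemented (function names are mine): τ-nice draws a uniformly random subset of size exactly τ, while the independent sampling takes the union of τ independent uniform serial picks, so its cardinality can be smaller than τ.

```python
import numpy as np

def tau_nice(n, tau, rng):
    """tau-nice sampling: a uniformly random subset of [n] with |S| = tau."""
    return set(rng.choice(n, size=tau, replace=False).tolist())

def independent(n, tau, rng):
    """Independent sampling: union of tau independent uniform serial
    samplings (one coordinate each), so |S| may be smaller than tau."""
    return set(rng.integers(0, n, size=tau).tolist())

rng = np.random.default_rng(0)
n, tau, trials = 10, 3, 20000
sizes = {len(tau_nice(n, tau, rng)) for _ in range(trials)}
print(sizes)  # {3}: tau-nice always returns exactly tau coordinates
mean = np.mean([len(independent(n, tau, rng)) for _ in range(trials)])
# For the independent sampling, E|S| = n(1 - (1 - 1/n)^tau) = 2.71 here.
print(abs(mean - 2.71) < 0.05)  # True
```

The empirical mean shows why the independent sampling is only an approximation of τ-nice: collisions among the τ serial picks shrink the expected set size below τ.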
Assumption: Strong Convexity

Assumption 1 (Strong convexity)
Function f is differentiable and λ-strongly convex (with λ > 0) with respect to the standard Euclidean norm

    ‖h‖ = ( Σ_{i=1}^n h_i^2 )^{1/2}.

That is, we assume that for all x, h ∈ R^n,

    f(x + h) ≥ f(x) + ⟨∇f(x), h⟩ + (λ/2) ‖h‖^2.                         (8)

Assumption: Expected Separable Overapproximation

Assumption 2 (ESO)
Assume Ŝ is proper and that for some vector of positive weights v = (v_1, ..., v_n) and all x, h ∈ R^n,

    E[f(x + h_Ŝ)] ≤ f(x) + ⟨∇f(x), h⟩_p + (1/2) ‖h‖^2_{p∘v}.            (9)

Note that the ESO parameters v, p depend on both f and Ŝ. For simplicity, instead of (9) we will often use the compact notation

    (f, Ŝ) ~ ESO(v).

Notation used above:

    h_S = Σ_{i ∈ S} h_i e_i ∈ R^n          (projection of h ∈ R^n onto the coordinates in S)
    ⟨g, h⟩_p = Σ_{i=1}^n p_i g_i h_i ∈ R   (weighted inner product; ‖h‖^2_w = ⟨h, h⟩_w)
    p∘v = (p_1 v_1, ..., p_n v_n) ∈ R^n    (Hadamard product)
Assumption: Expected Separable Overapproximation

Here is the ESO inequality again, now without the simplifying notation:

    E[ f( x + Σ_{i ∈ Ŝ} h_i e_i ) ]  ≤  f(x) + Σ_{i=1}^n p_i ∇_i f(x) h_i + (1/2) Σ_{i=1}^n p_i v_i h_i^2
        (complicated)                            (linear in h)              (quadratic and separable in h)

Complexity of NSync

Theorem 3 (R. and Takáč 2013, [4])
Let x* be a minimizer of f. Let Assumptions 1 and 2 be satisfied for a proper sampling Ŝ (that is, (f, Ŝ) ~ ESO(v)). Choose a starting point x^0 ∈ R^n, error tolerance 0 < ε < f(x^0) − f(x*), and confidence level 0 < ρ < 1. If {x^k} are the random iterates generated by NSync, where the random sets S_k are iid following the distribution of Ŝ, then

    K ≥ (Ω/λ) log( (f(x^0) − f(x*)) / (ερ) )   implies   P(f(x^K) − f(x*) ≤ ε) ≥ 1 − ρ,    (10)

where

    Ω = max_{i=1,...,n} v_i/p_i  ≥  (Σ_{i=1}^n v_i) / E[|Ŝ|].           (11)
What does this mean?

- Linear convergence. NSync converges linearly (i.e., the dependence on ε is logarithmic).
- High confidence is not a problem. ρ appears inside the logarithm, so it is easy to achieve high confidence by running the method longer; there is no need to restart.
- Focus on the leading term. The leading term is Ω, and we have a closed-form expression for it in terms of:
  - the parameters v_1, ..., v_n (which depend on f and Ŝ),
  - the parameters p_1, ..., p_n (which depend on Ŝ).
- Parallelization speedup. Write τ = E[|Ŝ|] for the average number of updates per iteration. The lower bound in (11) suggests that if the parameters v_i did not grow with increasing τ, then we could potentially get linear speedup in τ. So we shall study the dependence of v_i on τ (this will depend on f and Ŝ). As we shall see, speedup is often guaranteed for sparse or well-conditioned problems.

Question: How do we design the sampling Ŝ so that Ω is minimized?

Analysis of the Algorithm (Proof of Theorem 3)
Tool: Markov's Inequality

Theorem 4 (Markov's Inequality)
Let X be a nonnegative random variable. Then for any ε > 0,

    P(X ≥ ε) ≤ E[X]/ε.

Proof.
Let 1_{X ≥ ε} be the random variable which is equal to 1 if X ≥ ε and 0 otherwise. Then 1_{X ≥ ε} ≤ X/ε. By taking expectations of all terms, we obtain

    P(X ≥ ε) = E[1_{X ≥ ε}] ≤ E[X/ε] = E[X]/ε.

Tool: Tower Property of Expectations (Motivation)

Example 5
Consider discrete random variables X and Y:
- X has 2 outcomes: x_1 and x_2
- Y has 3 outcomes: y_1, y_2 and y_3

Their joint probability mass function is given in this table:

          y_1    y_2    y_3
    x_1   0.05   0.20   0.03
    x_2   0.25   0.30   0.17

Obviously, E[X] = 0.28 x_1 + 0.72 x_2. But we can also write:

    E[X] = (0.05 x_1 + 0.25 x_2) + (0.20 x_1 + 0.30 x_2) + (0.03 x_1 + 0.17 x_2)
         = 0.30 ( (0.05/0.30) x_1 + (0.25/0.30) x_2 )        [P(Y = y_1) · E[X | Y = y_1]]
           + 0.50 ( (0.20/0.50) x_1 + (0.30/0.50) x_2 )
           + 0.20 ( (0.03/0.20) x_1 + (0.17/0.20) x_2 )
         = E[E[X | Y]].
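The decomposition in Example 5 can be replayed numerically. The joint table below uses the entries recoverable from the example (the x_1-row entries and the column sums P(Y = y_j) pin down the rest), and the outcome values for X are arbitrary placeholders.

```python
import numpy as np

# Joint pmf from Example 5 (rows: x_1, x_2; columns: y_1, y_2, y_3).
J = np.array([[0.05, 0.20, 0.03],
              [0.25, 0.30, 0.17]])
x_vals = np.array([1.0, 5.0])   # arbitrary placeholder outcomes for X

EX = J.sum(axis=1) @ x_vals                            # E[X] from the marginal of X
pY = J.sum(axis=0)                                     # P(Y = y_j) = (0.30, 0.50, 0.20)
E_X_given_Y = (J * x_vals[:, None]).sum(axis=0) / pY   # E[X | Y = y_j]
EEX = pY @ E_X_given_Y                                 # E[E[X | Y]]
print(np.isclose(EX, EEX))  # True: the tower property holds
```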
Tower Property

Lemma 6 (Tower Property / Iterated Expectation)
For any random variables X and Y, we have

    E[X] = E[E[X | Y]].

Proof.
We shall only prove this for discrete random variables; the proof is more technical in the continuous case.

    E[X] = Σ_x x P(X = x)
         = Σ_x x Σ_y P(X = x & Y = y)
         = Σ_y Σ_x x P(X = x & Y = y)
         = Σ_y P(Y = y) Σ_x x P(X = x & Y = y) / P(Y = y)
         = Σ_y P(Y = y) Σ_x x P(X = x | Y = y)          [inner sum = E[X | Y = y]]
         = E[E[X | Y]].

Proof of Theorem 3 - Part I

If we let μ = λ/Ω, then

    f(x + h) ≥(8) f(x) + ⟨∇f(x), h⟩ + (λ/2)‖h‖^2 ≥ f(x) + ⟨∇f(x), h⟩ + (μ/2)‖h‖^2_{v∘p^{−1}}.    (12)

Indeed, one can easily verify that μ is defined to be the largest number for which λ‖h‖^2 ≥ μ‖h‖^2_{v∘p^{−1}} holds for all h. Hence, f is μ-strongly convex with respect to the norm ‖·‖_{v∘p^{−1}}.

Let x* be a minimizer of f, i.e., an optimal solution of (1). Minimizing both sides of (12) in h, we get

    f(x*) − f(x) ≥(12) min_{h ∈ R^n} ⟨∇f(x), h⟩ + (μ/2)‖h‖^2_{v∘p^{−1}} = −(1/(2μ)) ‖∇f(x)‖^2_{p∘v^{−1}}.    (13)
Proof of Theorem 3 - Part II

Let h^k = −(Diag(v))^{−1} ∇f(x^k). Then in view of (3), we have x^{k+1} = x^k + h^k_{S_k}. Utilizing Assumption 2, we get

    E[f(x^{k+1}) | x^k] = E[f(x^k + h^k_{S_k}) | x^k]
        ≤(9) f(x^k) + ⟨∇f(x^k), h^k⟩_p + (1/2)‖h^k‖^2_{p∘v}
        = f(x^k) − (1/2)‖∇f(x^k)‖^2_{p∘v^{−1}}
        ≤(13) f(x^k) − μ(f(x^k) − f(x*)).

Taking expectations in the last inequality, using the tower property, and subtracting f(x*) from both sides of the inequality, we get

    E[f(x^{k+1}) − f(x*)] ≤ (1 − μ) E[f(x^k) − f(x*)].

Unrolling the recurrence, we get

    E[f(x^k) − f(x*)] ≤ (1 − μ)^k (f(x^0) − f(x*)).                      (14)

Proof of Theorem 3 - Part III

Using Markov's inequality, (14), and the definition of K, we get

    P(f(x^K) − f(x*) ≥ ε) ≤ E[f(x^K) − f(x*)]/ε ≤(14) (1 − μ)^K (f(x^0) − f(x*))/ε ≤(10) ρ.

Finally, let us now establish the lower bound on Ω. Letting

    Δ = { p ∈ R^n : p ≥ 0, Σ_{i=1}^n p_i = E[|Ŝ|] },

we have

    Ω =(11) max_i v_i/p_i ≥(7) min_{p ∈ Δ} max_i v_i/p_i = (1/E[|Ŝ|]) Σ_{i=1}^n v_i,

where the last equality follows since the optimal p_i is proportional to v_i.
How to compute the ESO stepsize parameters v_1, ..., v_n?

C^1(A) Functions

By definition, the ESO parameters v depend on both f and Ŝ. This is also highlighted by the notation we use:

    (f, Ŝ) ~ ESO(v).

Hence, in order to compute these parameters, we need to specify f and Ŝ. The following definition describes a wide class of functions, often appearing in computational practice, for which we will be able to do this.

Definition 7 (C^1(A) functions)
Let A ∈ R^{m×n}. By C^1(A) we denote the set of continuously differentiable functions f : R^n → R satisfying the following inequality for all x, h ∈ R^n:

    f(x + h) ≤ f(x) + (∇f(x))^T h + (1/2) h^T A^T A h.                  (15)

Example 8 (Least Squares)
The function f(x) = (1/2)‖Ax − b‖^2 satisfies (15) with equality. Hence, f ∈ C^1(A).
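Example 8 is easy to verify numerically: for least squares, the second-order Taylor expansion makes (15) an identity. A quick check on random data (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: A.T @ (A @ x - b)

x = rng.standard_normal(n)
h = rng.standard_normal(n)
lhs = f(x + h)                                        # left-hand side of (15)
rhs = f(x) + grad(x) @ h + 0.5 * h @ (A.T @ A) @ h    # right-hand side of (15)
print(np.isclose(lhs, rhs))  # True: least squares attains (15) with equality
```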
Are There More C^1(A) Functions?

Functions we wish to minimize in machine learning often have a finite-sum structure:

    f(x) = Σ_j φ_j(M_j x),                                              (16)

where M_j ∈ R^{m×n} are data matrices and φ_j : R^m → R are loss functions. The next result says that under certain assumptions on the loss functions, f ∈ C^1(A) for some A, and describes what this A is.

Theorem 9 ([5])
Assume that for each j, the function φ_j is γ_j-smooth:

    ‖∇φ_j(s) − ∇φ_j(s')‖ ≤ γ_j ‖s − s'‖   for all s, s' ∈ R^m.

Then f ∈ C^1(A), where A satisfies A^T A = Σ_{j=1}^J γ_j M_j^T M_j.

Remark: The above theorem also says what A^T A is. This is important, as will be clear from the next theorem.

ESO for C^1(A) Functions and an Arbitrary Sampling

Theorem 10 ([5])
Assume f ∈ C^1(A) for some real matrix A ∈ R^{m×n}, let Ŝ be an arbitrary sampling, and let P = P(Ŝ) be its probability matrix. If v = (v_1, ..., v_n) satisfies

    P ∘ A^T A ⪯ Diag(p∘v),

then (f, Ŝ) ~ ESO(v).
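A sketch of how Theorem 10 can be used to compute stepsize parameters: for the τ-nice sampling the probability matrix has the standard closed form p_ii = τ/n and p_ij = τ(τ−1)/(n(n−1)) for i ≠ j, and the smallest uniform v with P∘AᵀA ⪯ Diag(p∘v) is an extreme eigenvalue of a diagonally scaled matrix. The data below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, tau = 6, 5, 2
A = rng.standard_normal((m, n))
M = A.T @ A

# Probability matrix of the tau-nice sampling, in closed form:
# p_ii = tau/n and p_ij = tau(tau-1)/(n(n-1)) for i != j.
p = np.full(n, tau / n)
P = np.full((n, n), tau * (tau - 1) / (n * (n - 1)))
np.fill_diagonal(P, tau / n)

# Smallest uniform v with P∘M ⪯ v·Diag(p): take the largest eigenvalue
# after the congruence Diag(p)^{-1/2} (P∘M) Diag(p)^{-1/2}.
D = np.diag(1.0 / np.sqrt(p))
v = np.linalg.eigvalsh(D @ (P * M) @ D).max()

# Verify the matrix inequality of Theorem 10: Diag(p∘v) - P∘M ⪰ 0.
gap = np.linalg.eigvalsh(v * np.diag(p) - P * M).min()
print(gap >= -1e-8)  # True, so (f, S_hat) ~ ESO(v·1) by Theorem 10
```

A uniform v is only one choice; per-coordinate v_i satisfying the same matrix inequality can give tighter stepsizes.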
Sampling Identity for a Quadratic

In order to prove Theorem 10, we will need the following lemma.

Lemma 11 ([5])
Let G be any real n×n matrix and Ŝ an arbitrary sampling. Then for any h ∈ R^n we have

    E[ h_Ŝ^T G h_Ŝ ] = h^T ( P(Ŝ) ∘ G ) h,                              (17)

where ∘ denotes the Hadamard (elementwise) product of matrices, and P(Ŝ) is the probability matrix of Ŝ.

Proof of Lemma 11

Proof.
Let 1_{ij} be the indicator random variable of the event i ∈ Ŝ & j ∈ Ŝ:

    1_{ij} = 1 if i ∈ Ŝ & j ∈ Ŝ,   0 otherwise.

Note that E[1_{ij}] = P(i ∈ Ŝ & j ∈ Ŝ) = p_ij. We now have

    E[ h_Ŝ^T G h_Ŝ ] =(2) E[ Σ_{i ∈ Ŝ} Σ_{j ∈ Ŝ} G_ij h_i h_j ]
                     = E[ Σ_{i=1}^n Σ_{j=1}^n 1_{ij} G_ij h_i h_j ]
                     = Σ_{i=1}^n Σ_{j=1}^n E[1_{ij}] G_ij h_i h_j
                     = Σ_{i=1}^n Σ_{j=1}^n p_ij G_ij h_i h_j
                     = h^T ( P(Ŝ) ∘ G ) h.
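Identity (17) is easy to sanity-check by Monte Carlo for a τ-nice sampling, whose probability matrix is known in closed form; all data below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 5, 2
G = rng.standard_normal((n, n))
h = rng.standard_normal(n)

# Probability matrix of the tau-nice sampling (closed form).
P = np.full((n, n), tau * (tau - 1) / (n * (n - 1)))
np.fill_diagonal(P, tau / n)

exact = h @ (P * G) @ h          # right-hand side of (17)

# Monte Carlo estimate of E[h_S^T G h_S] over draws of the sampling.
trials = 50000
acc = 0.0
for _ in range(trials):
    S = rng.choice(n, size=tau, replace=False)
    hS = np.zeros(n)
    hS[S] = h[S]                 # h_S: h restricted to the coordinates in S
    acc += hS @ G @ hS
print(abs(acc / trials - exact) < 0.05)  # True, up to Monte Carlo error
```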
Proof of Theorem 10

Having established Lemma 11, we are now ready to prove Theorem 10.

Proof.
Fixing any x, h ∈ R^n, we have

    E[ f(x + h_Ŝ) ] ≤(15) E[ f(x) + ⟨∇f(x), h_Ŝ⟩ + (1/2) h_Ŝ^T A^T A h_Ŝ ]
                    = f(x) + ⟨∇f(x), h⟩_p + (1/2) E[ h_Ŝ^T A^T A h_Ŝ ]
                    =(Lemma 11) f(x) + ⟨∇f(x), h⟩_p + (1/2) h^T ( P ∘ A^T A ) h
                    ≤ f(x) + ⟨∇f(x), h⟩_p + (1/2) h^T Diag(p∘v) h,

and the last term equals (1/2)‖h‖^2_{p∘v}.

References

[1] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2), 2012.
[2] Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(2):1-38, 2014.
[3] Peter Richtárik and Martin Takáč. Parallel coordinate descent methods for big data optimization. Mathematical Programming, 1-52, 2015.
[4] Peter Richtárik and Martin Takáč. On optimal probabilities in stochastic coordinate descent methods. Optimization Letters, 2015.
[5] Zheng Qu and Peter Richtárik. Coordinate descent with arbitrary sampling II: expected separable overapproximation. arXiv preprint, 2014.
More informationGeometrical Characterization of RN-operators between Locally Convex Vector Spaces
Geometrical Characterization of RN-operators between Locally Convex Vector Spaces OLEG REINOV St. Petersburg State University Dept. of Mathematics and Mechanics Universitetskii pr. 28, 198504 St, Petersburg
More informationLevel Set Framework, Signed Distance Function, and Various Tools
Level Set Framework Geometry and Calculus Tools Level Set Framework,, and Various Tools Spencer Department of Mathematics Brigham Young University Image Processing Seminar (Week 3), 2010 Level Set Framework
More informationLinear Algebra I. Ronald van Luijk, 2012
Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.
More informationLectures on Stochastic Processes. William G. Faris
Lectures on Stochastic Processes William G. Faris November 8, 2001 2 Contents 1 Random walk 7 1.1 Symmetric simple random walk................... 7 1.2 Simple random walk......................... 9 1.3
More informationMath 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationAn Additive Neumann-Neumann Method for Mortar Finite Element for 4th Order Problems
An Additive eumann-eumann Method for Mortar Finite Element for 4th Order Problems Leszek Marcinkowski Department of Mathematics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland, Leszek.Marcinkowski@mimuw.edu.pl
More informationSummer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.
Summer course on Convex Optimization Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.Minnesota Interior-Point Methods: the rebirth of an old idea Suppose that f is
More informationAssociativity condition for some alternative algebras of degree three
Associativity condition for some alternative algebras of degree three Mirela Stefanescu and Cristina Flaut Abstract In this paper we find an associativity condition for a class of alternative algebras
More informationSF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance
More informationError estimates for nearly degenerate finite elements
Error estimates for nearly degenerate finite elements Pierre Jamet In RAIRO: Analyse Numérique, Vol 10, No 3, March 1976, p. 43 61 Abstract We study a property which is satisfied by most commonly used
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationThe Multiplicative Weights Update method
Chapter 2 The Multiplicative Weights Update method The Multiplicative Weights method is a simple idea which has been repeatedly discovered in fields as diverse as Machine Learning, Optimization, and Game
More informationNMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
More informationSome stability results of parameter identification in a jump diffusion model
Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss
More informationInfluences in low-degree polynomials
Influences in low-degree polynomials Artūrs Bačkurs December 12, 2012 1 Introduction In 3] it is conjectured that every bounded real polynomial has a highly influential variable The conjecture is known
More informationMessage-passing sequential detection of multiple change points in networks
Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationGI01/M055 Supervised Learning Proximal Methods
GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators
More information1 Portfolio mean and variance
Copyright c 2005 by Karl Sigman Portfolio mean and variance Here we study the performance of a one-period investment X 0 > 0 (dollars) shared among several different assets. Our criterion for measuring
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationDiscussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne
More informationA Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data Faming Liang University of Florida August 9, 2015 Abstract MCMC methods have proven to be a very powerful tool for analyzing
More informationLecture notes: single-agent dynamics 1
Lecture notes: single-agent dynamics 1 Single-agent dynamic optimization models In these lecture notes we consider specification and estimation of dynamic optimization models. Focus on single-agent models.
More information1 Introduction. 2 Prediction with Expert Advice. Online Learning 9.520 Lecture 09
1 Introduction Most of the course is concerned with the batch learning problem. In this lecture, however, we look at a different model, called online. Let us first compare and contrast the two. In batch
More informationPerron vector Optimization applied to search engines
Perron vector Optimization applied to search engines Olivier Fercoq INRIA Saclay and CMAP Ecole Polytechnique May 18, 2011 Web page ranking The core of search engines Semantic rankings (keywords) Hyperlink
More informationModeling and Performance Evaluation of Computer Systems Security Operation 1
Modeling and Performance Evaluation of Computer Systems Security Operation 1 D. Guster 2 St.Cloud State University 3 N.K. Krivulin 4 St.Petersburg State University 5 Abstract A model of computer system
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationTHE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS
THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear
More information