# The Expectation Maximization Algorithm: A Short Tutorial


Sean Borman

Comments and corrections to: em-tut at seanborman dot com

Last updated January 09, 2009

Revision history:

- Corrected grammar in the paragraph which precedes Equation (7). Changed datestamp format in the revision history.
- Corrected caption for Figure (2). Added conditioning on $\theta_n$ for $l$ in the convergence discussion in Section (3.2).
- Changed contact info to reduce spam.
- Added explanation and disambiguating parentheses in the development leading to Equation (14). Minor corrections.
- Added Figure (1). Corrected typo above Equation (5). Minor corrections. Added hyperlinks.
- Minor corrections.
- Initial revision.

## 1 Introduction

This tutorial discusses the Expectation Maximization (EM) algorithm of Dempster, Laird and Rubin [1]. The approach taken follows that of an unpublished note by Stuart Russell, but fleshes out some of the gory details. In order to ensure that the presentation is reasonably self-contained, some of the results on which the derivation of the algorithm is based are presented prior to the main results. The EM algorithm has become a popular tool in statistical estimation problems involving incomplete data, or in problems which can be posed in a similar form, such as mixture estimation [3, 4]. The EM algorithm has also been used in various motion estimation frameworks [5], and variants of it have been used in multiframe superresolution restoration methods which combine motion estimation along the lines of [2].

Figure 1: $f$ is convex on $[a, b]$ if $f(\lambda x_1 + (1-\lambda)x_2) \leq \lambda f(x_1) + (1-\lambda)f(x_2)$ for all $x_1, x_2 \in [a, b]$, $\lambda \in [0, 1]$.

## 2 Convex Functions

**Definition 1** Let $f$ be a real valued function defined on an interval $I = [a, b]$. $f$ is said to be convex on $I$ if for all $x_1, x_2 \in I$ and $\lambda \in [0, 1]$,

$$f(\lambda x_1 + (1-\lambda)x_2) \leq \lambda f(x_1) + (1-\lambda)f(x_2).$$

$f$ is said to be strictly convex if the inequality is strict. Intuitively, this definition states that the function falls below (strictly convex) or is never above (convex) the straight line (the secant) from points $(x_1, f(x_1))$ to $(x_2, f(x_2))$. See Figure (1).

**Definition 2** $f$ is concave (strictly concave) if $-f$ is convex (strictly convex).

**Theorem 1** If $f(x)$ is twice differentiable on $[a, b]$ and $f''(x) \geq 0$ on $[a, b]$, then $f(x)$ is convex on $[a, b]$.

Proof: For $x \leq y \in [a, b]$ and $\lambda \in [0, 1]$, let $z = \lambda y + (1-\lambda)x$. By definition, $f$ is convex iff $f(\lambda y + (1-\lambda)x) \leq \lambda f(y) + (1-\lambda)f(x)$, that is, iff $f(z) \leq \lambda f(y) + (1-\lambda)f(x)$. Noting that $f(z) = \lambda f(z) + (1-\lambda)f(z)$, and rearranging terms, an equivalent definition for convexity can be obtained: $f$ is convex iff

$$\lambda[f(y) - f(z)] \geq (1-\lambda)[f(z) - f(x)]. \tag{1}$$

By the mean value theorem, there exists $s$, $x \leq s \leq z$, such that

$$f(z) - f(x) = f'(s)(z - x). \tag{2}$$

Similarly, applying the mean value theorem to $f(y) - f(z)$, there exists $t$, $z \leq t \leq y$, such that

$$f(y) - f(z) = f'(t)(y - z). \tag{3}$$

Thus we have the situation $x \leq s \leq z \leq t \leq y$. By assumption, $f''(x) \geq 0$ on $[a, b]$, so

$$f'(s) \leq f'(t) \quad \text{since } s \leq t. \tag{4}$$

Also note that we may rewrite $z = \lambda y + (1-\lambda)x$ in the form

$$(1-\lambda)(z - x) = \lambda(y - z). \tag{5}$$

Finally, combining the above we have,

$$
\begin{aligned}
(1-\lambda)[f(z) - f(x)] &= (1-\lambda)f'(s)(z - x) && \text{by Equation (2)} \\
&\leq f'(t)(1-\lambda)(z - x) && \text{by Equation (4)} \\
&= \lambda f'(t)(y - z) && \text{by Equation (5)} \\
&= \lambda[f(y) - f(z)] && \text{by Equation (3).}
\end{aligned}
$$

**Proposition 1** $-\ln(x)$ is strictly convex on $(0, \infty)$.

Proof: With $f(x) = -\ln(x)$, we have $f''(x) = \frac{1}{x^2} > 0$ for $x \in (0, \infty)$. By Theorem (1), $-\ln(x)$ is strictly convex on $(0, \infty)$. Also, by Definition (2), $\ln(x)$ is strictly concave on $(0, \infty)$.

The notion of convexity can be extended to apply to $n$ points. This result is known as Jensen's inequality.

**Theorem 2 (Jensen's inequality)** Let $f$ be a convex function defined on an interval $I$. If $x_1, x_2, \ldots, x_n \in I$ and $\lambda_1, \lambda_2, \ldots, \lambda_n \geq 0$ with $\sum_{i=1}^{n} \lambda_i = 1$, then

$$f\left(\sum_{i=1}^{n} \lambda_i x_i\right) \leq \sum_{i=1}^{n} \lambda_i f(x_i).$$

Proof: For $n = 1$ this is trivial. The case $n = 2$ corresponds to the definition of convexity. To show that this is true for all natural numbers, we proceed by induction. Assume the theorem is true for some $n$; then,

$$
\begin{aligned}
f\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) &= f\left(\lambda_{n+1} x_{n+1} + \sum_{i=1}^{n} \lambda_i x_i\right) \\
&= f\left(\lambda_{n+1} x_{n+1} + (1-\lambda_{n+1}) \frac{1}{1-\lambda_{n+1}} \sum_{i=1}^{n} \lambda_i x_i\right)
\end{aligned}
$$

$$
\begin{aligned}
&\leq \lambda_{n+1} f(x_{n+1}) + (1-\lambda_{n+1}) f\left(\frac{1}{1-\lambda_{n+1}} \sum_{i=1}^{n} \lambda_i x_i\right) \\
&= \lambda_{n+1} f(x_{n+1}) + (1-\lambda_{n+1}) f\left(\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i\right) \\
&\leq \lambda_{n+1} f(x_{n+1}) + (1-\lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} f(x_i) \\
&= \lambda_{n+1} f(x_{n+1}) + \sum_{i=1}^{n} \lambda_i f(x_i) \\
&= \sum_{i=1}^{n+1} \lambda_i f(x_i).
\end{aligned}
$$

The second inequality follows from the inductive hypothesis, since the weights $\lambda_i / (1-\lambda_{n+1})$, $i = 1, \ldots, n$, are nonnegative and sum to one.

Since $\ln(x)$ is concave, we may apply Jensen's inequality to obtain the useful result,

$$\ln \sum_{i=1}^{n} \lambda_i x_i \geq \sum_{i=1}^{n} \lambda_i \ln(x_i). \tag{6}$$

This allows us to lower-bound the logarithm of a sum, a result that is used in the derivation of the EM algorithm.

Jensen's inequality provides a simple proof that the arithmetic mean is greater than or equal to the geometric mean.

**Proposition 2**

$$\frac{1}{n} \sum_{i=1}^{n} x_i \geq \sqrt[n]{x_1 x_2 \cdots x_n}.$$

Proof: If $x_1, x_2, \ldots, x_n \geq 0$ then, since $\ln(x)$ is concave, we have

$$\ln \frac{1}{n} \sum_{i=1}^{n} x_i \geq \sum_{i=1}^{n} \frac{1}{n} \ln(x_i) = \frac{1}{n} \ln(x_1 x_2 \cdots x_n) = \ln (x_1 x_2 \cdots x_n)^{1/n}.$$

Thus, we have

$$\frac{1}{n} \sum_{i=1}^{n} x_i \geq \sqrt[n]{x_1 x_2 \cdots x_n}.$$
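As a quick numerical sanity check (not part of the original tutorial), the following Python snippet verifies Equation (6) for the concave $\ln$ and the arithmetic-geometric mean inequality of Proposition 2 on randomly generated positive data:

```python
import math
import random

random.seed(0)

# Random positive points x_i and convex weights lambda_i summing to 1.
n = 5
x = [random.uniform(0.1, 10.0) for _ in range(n)]
lam = [random.random() for _ in range(n)]
total = sum(lam)
lam = [l / total for l in lam]

# Jensen's inequality for the concave ln (Equation (6)):
#   ln(sum_i lambda_i x_i) >= sum_i lambda_i ln(x_i)
lhs = math.log(sum(l * xi for l, xi in zip(lam, x)))
rhs = sum(l * math.log(xi) for l, xi in zip(lam, x))
assert lhs >= rhs

# Proposition 2 (AM-GM) is the special case lambda_i = 1/n.
am = sum(x) / n
gm = math.prod(x) ** (1.0 / n)
assert am >= gm
```

Equality holds only when all the $x_i$ coincide, which a check like this will essentially never hit with random draws.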

## 3 The Expectation-Maximization Algorithm

The EM algorithm is an efficient iterative procedure to compute the Maximum Likelihood (ML) estimate in the presence of missing or hidden data. In ML estimation, we wish to estimate the model parameter(s) for which the observed data are the most likely.

Each iteration of the EM algorithm consists of two processes: the E-step and the M-step. In the expectation, or E-step, the missing data are estimated given the observed data and the current estimate of the model parameters. This is achieved using the conditional expectation, explaining the choice of terminology. In the M-step, the likelihood function is maximized under the assumption that the missing data are known. The estimate of the missing data from the E-step is used in lieu of the actual missing data.

Convergence is assured since the algorithm is guaranteed to increase the likelihood at each iteration.

### 3.1 Derivation of the EM-algorithm

Let $\mathbf{X}$ be a random vector which results from a parameterized family. We wish to find $\theta$ such that $\mathcal{P}(\mathbf{X}|\theta)$ is a maximum. This is known as the Maximum Likelihood (ML) estimate for $\theta$. In order to estimate $\theta$, it is typical to introduce the log likelihood function defined as,

$$L(\theta) = \ln \mathcal{P}(\mathbf{X}|\theta). \tag{7}$$

The likelihood function is considered to be a function of the parameter $\theta$ given the data $\mathbf{X}$. Since $\ln(x)$ is a strictly increasing function, the value of $\theta$ which maximizes $\mathcal{P}(\mathbf{X}|\theta)$ also maximizes $L(\theta)$.

The EM algorithm is an iterative procedure for maximizing $L(\theta)$. Assume that after the $n$th iteration the current estimate for $\theta$ is given by $\theta_n$. Since the objective is to maximize $L(\theta)$, we wish to compute an updated estimate $\theta$ such that,

$$L(\theta) > L(\theta_n). \tag{8}$$

Equivalently, we want to maximize the difference,

$$L(\theta) - L(\theta_n) = \ln \mathcal{P}(\mathbf{X}|\theta) - \ln \mathcal{P}(\mathbf{X}|\theta_n). \tag{9}$$

So far, we have not considered any unobserved or missing variables. In problems where such data exist, the EM algorithm provides a natural framework for their inclusion. Alternately, hidden variables may be introduced purely as an artifice for making the maximum likelihood estimation of $\theta$ tractable. In this case, it is assumed that knowledge of the hidden variables will make the maximization of the likelihood function easier. Either way, denote the hidden random vector by $\mathbf{Z}$ and a given realization by $\mathbf{z}$. The total probability $\mathcal{P}(\mathbf{X}|\theta)$ may be written in terms of the hidden variables $\mathbf{z}$ as,

$$\mathcal{P}(\mathbf{X}|\theta) = \sum_{\mathbf{z}} \mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta). \tag{10}$$
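To make the marginalization in Equation (10) concrete, here is a small Python sketch (not from the original tutorial; the mixture weights and Gaussian components are hypothetical, chosen only for illustration) that computes the total probability of an observation by summing over the hidden variable:

```python
import math

# Hypothetical model: hidden z in {0, 1} with P(z|theta) = (0.3, 0.7),
# and P(x|z, theta) Gaussian with means (0, 4) and unit variance.
weights = [0.3, 0.7]
means = [0.0, 4.0]

def norm_pdf(x, mu):
    """Unit-variance Gaussian density."""
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

x = 1.5
# Equation (10): total probability as a sum over the hidden variable z.
p_total = sum(w * norm_pdf(x, mu) for w, mu in zip(weights, means))
assert p_total > 0
```

In mixture models the hidden variable is exactly this component label, which is why mixture estimation is the canonical EM application.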

We may then rewrite Equation (9) as,

$$L(\theta) - L(\theta_n) = \ln \sum_{\mathbf{z}} \mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta) - \ln \mathcal{P}(\mathbf{X}|\theta_n). \tag{11}$$

Notice that this expression involves the logarithm of a sum. In Section (2), using Jensen's inequality, it was shown that,

$$\ln \sum_{i=1}^{n} \lambda_i x_i \geq \sum_{i=1}^{n} \lambda_i \ln(x_i)$$

for constants $\lambda_i \geq 0$ with $\sum_{i=1}^{n} \lambda_i = 1$. This result may be applied to Equation (11) provided that the constants $\lambda_i$ can be identified. Consider letting the constants be of the form $\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)$. Since $\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)$ is a probability measure, we have that $\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \geq 0$ and that $\sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) = 1$ as required.

Then starting with Equation (11), the constants $\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)$ are introduced as,

$$
\begin{aligned}
L(\theta) - L(\theta_n) &= \ln \sum_{\mathbf{z}} \mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta) - \ln \mathcal{P}(\mathbf{X}|\theta_n) \\
&= \ln \sum_{\mathbf{z}} \mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta) \cdot \frac{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)}{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)} - \ln \mathcal{P}(\mathbf{X}|\theta_n) \\
&= \ln \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \left( \frac{\mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta)}{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)} \right) - \ln \mathcal{P}(\mathbf{X}|\theta_n) \\
&\geq \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \left( \frac{\mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta)}{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)} \right) - \ln \mathcal{P}(\mathbf{X}|\theta_n) \tag{12} \\
&= \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \left( \frac{\mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta)}{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \mathcal{P}(\mathbf{X}|\theta_n)} \right) \tag{13} \\
&\triangleq \Delta(\theta|\theta_n). \tag{14}
\end{aligned}
$$

In going from Equation (12) to Equation (13) we made use of the fact that $\sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) = 1$, so that $\ln \mathcal{P}(\mathbf{X}|\theta_n) = \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \mathcal{P}(\mathbf{X}|\theta_n)$, which allows the term $\ln \mathcal{P}(\mathbf{X}|\theta_n)$ to be brought into the summation.

We continue by writing

$$L(\theta) \geq L(\theta_n) + \Delta(\theta|\theta_n) \tag{15}$$

and for convenience define,

$$l(\theta|\theta_n) \triangleq L(\theta_n) + \Delta(\theta|\theta_n)$$

so that the relationship in Equation (15) can be made explicit as,

$$L(\theta) \geq l(\theta|\theta_n).$$

Figure 2: Graphical interpretation of a single iteration of the EM algorithm: The function $l(\theta|\theta_n)$ is bounded above by the likelihood function $L(\theta)$. The functions are equal at $\theta = \theta_n$. The EM algorithm chooses $\theta_{n+1}$ as the value of $\theta$ for which $l(\theta|\theta_n)$ is a maximum. Since $L(\theta) \geq l(\theta|\theta_n)$, increasing $l(\theta|\theta_n)$ ensures that the value of the likelihood function $L(\theta)$ is increased at each step.

We now have a function, $l(\theta|\theta_n)$, which is bounded above by the likelihood function $L(\theta)$. Additionally, observe that,

$$
\begin{aligned}
l(\theta_n|\theta_n) &= L(\theta_n) + \Delta(\theta_n|\theta_n) \\
&= L(\theta_n) + \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \frac{\mathcal{P}(\mathbf{X}|\mathbf{z}, \theta_n) \mathcal{P}(\mathbf{z}|\theta_n)}{\mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \mathcal{P}(\mathbf{X}|\theta_n)} \\
&= L(\theta_n) + \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \frac{\mathcal{P}(\mathbf{X}, \mathbf{z}|\theta_n)}{\mathcal{P}(\mathbf{X}, \mathbf{z}|\theta_n)} \\
&= L(\theta_n) + \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln 1 \\
&= L(\theta_n), \tag{16}
\end{aligned}
$$

so for $\theta = \theta_n$ the functions $l(\theta|\theta_n)$ and $L(\theta)$ are equal.

Our objective is to choose a value of $\theta$ so that $L(\theta)$ is maximized. We have shown that the function $l(\theta|\theta_n)$ is bounded above by the likelihood function $L(\theta)$ and that the values of the functions $l(\theta|\theta_n)$ and $L(\theta)$ are equal at the current estimate $\theta = \theta_n$. Therefore, any $\theta$ which increases $l(\theta|\theta_n)$ will also increase $L(\theta)$. In order to achieve the greatest possible increase in the value of $L(\theta)$, the EM algorithm calls for selecting $\theta$ such that $l(\theta|\theta_n)$ is maximized. We denote this updated value as $\theta_{n+1}$. This process is illustrated in Figure (2).

Formally, we have,

$$
\begin{aligned}
\theta_{n+1} &= \arg\max_{\theta} \left\{ l(\theta|\theta_n) \right\} \\
&= \arg\max_{\theta} \left\{ L(\theta_n) + \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \frac{\mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta)}{\mathcal{P}(\mathbf{X}|\theta_n) \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n)} \right\}
\end{aligned}
$$

Now drop terms which are constant w.r.t. $\theta$:

$$
\begin{aligned}
&= \arg\max_{\theta} \left\{ \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \mathcal{P}(\mathbf{X}|\mathbf{z}, \theta) \mathcal{P}(\mathbf{z}|\theta) \right\} \\
&= \arg\max_{\theta} \left\{ \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \frac{\mathcal{P}(\mathbf{X}, \mathbf{z}, \theta)}{\mathcal{P}(\mathbf{z}, \theta)} \frac{\mathcal{P}(\mathbf{z}, \theta)}{\mathcal{P}(\theta)} \right\} \\
&= \arg\max_{\theta} \left\{ \sum_{\mathbf{z}} \mathcal{P}(\mathbf{z}|\mathbf{X}, \theta_n) \ln \mathcal{P}(\mathbf{X}, \mathbf{z}|\theta) \right\} \\
&= \arg\max_{\theta} \left\{ \mathrm{E}_{\mathbf{Z}|\mathbf{X}, \theta_n} \left\{ \ln \mathcal{P}(\mathbf{X}, \mathbf{z}|\theta) \right\} \right\} \tag{17}
\end{aligned}
$$

In Equation (17) the expectation and maximization steps are apparent. The EM algorithm thus consists of iterating the:

1. E-step: Determine the conditional expectation $\mathrm{E}_{\mathbf{Z}|\mathbf{X}, \theta_n} \{ \ln \mathcal{P}(\mathbf{X}, \mathbf{z}|\theta) \}$.
2. M-step: Maximize this expression with respect to $\theta$.

At this point it is fair to ask what has been gained, given that we have simply traded the maximization of $L(\theta)$ for the maximization of $l(\theta|\theta_n)$. The answer lies in the fact that $l(\theta|\theta_n)$ takes into account the unobserved or missing data $\mathbf{Z}$. In the case where we wish to estimate these variables, the EM algorithm provides a framework for doing so. Also, as alluded to earlier, it may be convenient to introduce such hidden variables so that the maximization of $l(\theta|\theta_n)$ is simplified given knowledge of the hidden variables (as compared with a direct maximization of $L(\theta)$).

### 3.2 Convergence of the EM Algorithm

The convergence properties of the EM algorithm are discussed in detail by McLachlan and Krishnan [3]. In this section we discuss the general convergence of the algorithm. Recall that $\theta_{n+1}$ is the estimate for $\theta$ which maximizes the difference $\Delta(\theta|\theta_n)$. Starting with the current estimate for $\theta$, that is, $\theta_n$, we had that $\Delta(\theta_n|\theta_n) = 0$. Since $\theta_{n+1}$ is chosen to maximize $\Delta(\theta|\theta_n)$, we then have that $\Delta(\theta_{n+1}|\theta_n) \geq \Delta(\theta_n|\theta_n) = 0$, so for each iteration the likelihood $L(\theta)$ is nondecreasing.

When the algorithm reaches a fixed point for some $\theta_n$, the value $\theta_n$ maximizes $l(\theta|\theta_n)$. Since $L$ and $l$ are equal at $\theta_n$, if $L$ and $l$ are differentiable at $\theta_n$, then $\theta_n$ must be a stationary point of $L$. The stationary point need not, however, be a local maximum: in [3] it is shown that it is possible for the algorithm to converge to local minima or saddle points in unusual cases.
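The E- and M-steps above can be sketched in code. The following Python example (not from the original tutorial) runs EM for a two-component one-dimensional Gaussian mixture, the classic mixture-estimation setting mentioned in the introduction; the component count, initialization, and synthetic data are illustrative assumptions:

```python
import math
import random

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    E-step: responsibilities P(z | x_i, theta_n).
    M-step: closed-form weighted ML updates of the mixing weight,
    means and variances.
    """
    # Crude initialization from the data.
    pi = 0.5
    mu1, mu2 = min(x), max(x)
    mean = sum(x) / len(x)
    var1 = var2 = max(1e-6, sum((xi - mean) ** 2 for xi in x) / len(x))

    def norm_pdf(xi, mu, var):
        return math.exp(-(xi - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    log_liks = []
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r = []
        for xi in x:
            p1 = pi * norm_pdf(xi, mu1, var1)
            p2 = (1 - pi) * norm_pdf(xi, mu2, var2)
            r.append(p1 / (p1 + p2))
        # M-step: weighted maximum-likelihood updates.
        n1 = sum(r)
        n2 = len(x) - n1
        pi = n1 / len(x)
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / n1
        mu2 = sum((1 - ri) * xi for ri, xi in zip(r, x)) / n2
        var1 = max(1e-6, sum(ri * (xi - mu1) ** 2 for ri, xi in zip(r, x)) / n1)
        var2 = max(1e-6, sum((1 - ri) * (xi - mu2) ** 2 for ri, xi in zip(r, x)) / n2)
        # Observed-data log-likelihood L(theta) = sum_i ln P(x_i | theta).
        log_liks.append(sum(math.log(pi * norm_pdf(xi, mu1, var1)
                                     + (1 - pi) * norm_pdf(xi, mu2, var2))
                            for xi in x))
    return (pi, mu1, var1, mu2, var2), log_liks

# Synthetic data from two well-separated Gaussians.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(5.0, 1.0) for _ in range(200)]
params, log_liks = em_gmm_1d(data)
# The likelihood should be (numerically) nondecreasing, per Section 3.2.
assert all(b >= a - 1e-7 for a, b in zip(log_liks, log_liks[1:]))
```

Tracking the observed-data log-likelihood at each iteration lets one check the monotonicity argument of Section 3.2 numerically; the sequence climbs and then plateaus at a stationary point.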

### 3.3 The Generalized EM Algorithm

In the formulation of the EM algorithm described above, $\theta_{n+1}$ was chosen as the value of $\theta$ for which $\Delta(\theta|\theta_n)$ was maximized. While this ensures the greatest increase in $L(\theta)$, it is possible to relax the requirement of maximization to one of simply increasing $\Delta(\theta|\theta_n)$, so that $\Delta(\theta_{n+1}|\theta_n) \geq \Delta(\theta_n|\theta_n)$. This approach, to simply increase and not necessarily maximize $\Delta(\theta_{n+1}|\theta_n)$, is known as the Generalized Expectation Maximization (GEM) algorithm and is often useful in cases where the maximization is difficult. The convergence of the GEM algorithm can be argued as above.

## References

[1] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1):1–38, November 1977.

[2] R. C. Hardie, K. J. Barnard, and E. E. Armstrong. Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE Transactions on Image Processing, 6(12):1621–1633, December 1997.

[3] Geoffrey McLachlan and Thriyambakam Krishnan. The EM Algorithm and Extensions. John Wiley & Sons, New York, 1996.

[4] Geoffrey McLachlan and David Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.

[5] Yair Weiss. Bayesian motion estimation and segmentation. PhD thesis, Massachusetts Institute of Technology, May 1998.


### Testing Cost Inefficiency under Free Entry in the Real Estate Brokerage Industry

Web Appendix to Testing Cost Inefficiency under Free Entry in the Real Estate Brokerage Industry Lu Han University of Toronto lu.han@rotman.utoronto.ca Seung-Hyun Hong University of Illinois hyunhong@ad.uiuc.edu

### Diploma Plus in Certificate in Advanced Engineering

Diploma Plus in Certificate in Advanced Engineering Mathematics New Syllabus from April 2011 Ngee Ann Polytechnic / School of Interdisciplinary Studies 1 I. SYNOPSIS APPENDIX A This course of advanced

### Convex analysis and profit/cost/support functions

CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m

### Lecture 24: Series finale

Lecture 24: Series finale Nathan Pflueger 2 November 20 Introduction This lecture will discuss one final convergence test and error bound for series, which concerns alternating series. A general summary

### The Not-Formula Book for C1

Not The Not-Formula Book for C1 Everything you need to know for Core 1 that won t be in the formula book Examination Board: AQA Brief This document is intended as an aid for revision. Although it includes

### No-Betting Pareto Dominance

No-Betting Pareto Dominance Itzhak Gilboa, Larry Samuelson and David Schmeidler HEC Paris/Tel Aviv Yale Interdisciplinary Center Herzlyia/Tel Aviv/Ohio State May, 2014 I. Introduction I.1 Trade Suppose

### The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

### Table 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.

Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

### Introduction to Convex Optimization for Machine Learning

Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning

### 1. R In this and the next section we are going to study the properties of sequences of real numbers.

+a 1. R In this and the next section we are going to study the properties of sequences of real numbers. Definition 1.1. (Sequence) A sequence is a function with domain N. Example 1.2. A sequence of real

### On a comparison result for Markov processes

On a comparison result for Markov processes Ludger Rüschendorf University of Freiburg Abstract A comparison theorem is stated for Markov processes in polish state spaces. We consider a general class of

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### Microeconometrics Blundell Lecture 1 Overview and Binary Response Models

Microeconometrics Blundell Lecture 1 Overview and Binary Response Models Richard Blundell http://www.ucl.ac.uk/~uctp39a/ University College London February-March 2016 Blundell (University College London)

### U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory

### Notes V General Equilibrium: Positive Theory. 1 Walrasian Equilibrium and Excess Demand

Notes V General Equilibrium: Positive Theory In this lecture we go on considering a general equilibrium model of a private ownership economy. In contrast to the Notes IV, we focus on positive issues such

### Metric Spaces. Chapter 7. 7.1. Metrics

Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some