# princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks

Save this PDF as:

Size: px
Start display at page:

Download "princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks"

## Transcription

1 princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks Lecturer: Sanjeev Arora Scribe:Elena Nabieva 1 Basics A Markov chain is a discrete-time stochastic process on n states defined in terms of a transition probability matrix (M) with rows i and columns j. M = ( P ij ) A transition probability P ij corresponds to the probability that the state at time step t + 1 will be j, given that the state at time t is i. Therefore, each row in the matrix M is a distribution and i, j SP ij 0 and j P ij = 1. Let the initial distribution be given by the row vector x R n, x i 0 and i x i = 1. After one step, the new distribution is xm. It is easy to see that xm is again a distribution. Sometimes it is useful to think of x as describing a certain amount fluid sitting at each node, such that the sum of the amounts is 1. After one step, the fluid sitting at node i distributes to its neighbors, such that P ij fraction goes to j. We stress that the evolution of a Markov chain is memoryless: the transition probability P ij depends only on the state i and not on the time t or the sequence of transititions taken before this time. Suppose we take two steps in this Markov chain. The memoryless property implies that the probability of going from i to j is k P ikp kj, which is just the (i, j)th entry of the matrix M 2. In general taking t steps in the Markov chain corresponds to the matrix M t. Definition 1 A distribution π for the Markov chain M is a stationary distribution if πm = π. Note that an alternative statement is that π is an eigenvector which has all nonnegative coordinates and whose corresponding eigenvalue is 1. Example 1 Consider a Markov chain defined by the following random walk on the nodes of an n-cycle. At each step, stay at the same node with probability 1/2. Go left with probability 1/4 and right with probability 1/4. The uniform distribution, which assigns probability 1/n to each node, is a stationary distribution for this chain, since it is unchanged after applying one step of the chain. Definition 2 A Markov chain M is ergodic if there exists a unique stationary distribution π and for every (initial) distribution x the limit lim t xm t = π. Theorem 1 The following are necessary and sufficient conditions for ergodicity: 1. connectivity: i, j : M t (i, j) > 0 for some t. 1

2 2 2. aperiodicity: i : gcd{t : M t (i, j) > 0} = 1. Remark 1 Clearly, these conditions are necessary. If the Markov chain is disconnected it cannot have a unique stationary distribution there is a different stationary distribution for each connected component. Similarly, a bipartite graph does not have a unique distribution: if the initial distribution places all probability on one side of the bipartite graph, then the distribution at time t oscillates between the two sides depending on whether t is odd or even. Note that in a bipartite graph gcd{t : M t (i, j) > 0} 2. The sufficiency of these conditions is proved using eigenvalue techniques (for inspiration see the analysis of mixing time later on). Both conditions are easily satisfied in practice. In particular, any Markov chain can be made aperiodic by adding self-loops assigned probability 1/2. Definition 3 An ergodic Markov chain is reversible if the stationary distribution π satisfies for all i, j, π i P ij = π j P ji. Uses of Markov Chains. A Markov Chain is a very convenient way to model many situations where the memoryless property makes sense. Examples including communication theory (Markovian sources), linguistics (Markovian models of language production), speech recognition, internet search (Google s Pagerank algorithm is based upon a Markovian model of a random surfer). 2 Mixing Times Informally, the mixing time of a Markov chain is the time it takes to reach nearly uniform distribution from any arbitrary starting distribution. Definition 4 The mixing time of an ergodic Markov chain M is t if for every starting distribution x, the distribution xm t satisfies xm t π 1 1/4. (Here 1 denotes the l 1 norm and the constant 1/4 is arbitrary.) The next exercise clarifies why we are interested in l 1 norm. Exercise 1 For any distribution π on {1, 2,..., N}, and S {1, 2,..., N} let π(s) = i S π i. Show that for any two distributions π, π, π π 1 = 2 max π(s) π (S). (1) S {1,...,N} Here is another way to restate the property in (1). Suppose A is some deterministic algorithm (we place no bounds on its complexity) that, given any number i {1, 2,..., N}, outputs Yes or No. If π π 1 ɛ then the probability that A outputs Yes on a random input drawn according to π cannot be too different from the probability it outputs Yes on an input drawn according to π. For this reason, l 1 distance is also called statistical difference. We are interested in analysing the mixing time so that we can draw a sample from the stationary distribution.

3 3 Example 2 (Mixing time of a cycle) Consider an n-cycle, i.e., a Markov chain with n states where, at each state, Pr(left) = Pr(right) = Pr(stay) = 1/3. Suppose the initial distribution concentrates all probability at state 0. Then t steps correspond to about 2t/3 random coin tosses and the index of the final state is (#(Heads) #(Tails)) (mod n). Clearly, it takes Ω(n 2 ) steps for the walk to reach the other half of the circle with any reasonable probability, and the mixing time is Ω(n 2 ). We will later see that this lowerbound is fairly tight. 2.1 Approximate Counting and Sampling Markov chains allow one to sample from very nontrivial sets, provided we know how to find at least one element of this set. The idea is to define a Markov chain whose state space is the same as this set. The Markov chain is such that it has a unique stationary distribution, which is uniform. We know how to find one element of the set. We do a walk according to the Markov chain with this as the starting point, and after T = O(mixing time) steps, output the node we are at. This is approximately a random sample from the set. We illustrate this idea later. First we discuss why sampling from large sets is important. Usually this set has exponential size set and it is only given implicitly. We give a few examples of some interesting sets. Example 3 (Perfect matchings) For some given graph G = (V, E) the set of all perfect matchings in G could be exponentially large compared to the size of the graph, and is only know implicitly. We know how to generate some element of this set, since we can find a perfect matching (if one exists) in polynomial time. But how do we generate a random element? Example 4 (0 1 knapsack) Given a 1... a n, b Z +, the set of vectors (x i,..., x n ) s.t. ai x i b. In both cases, determining the exact size of the set is in #P (the complexity class corresponding to counting the number of solutions to an NP problem). In fact, we have the following. Theorem 2 (Valiant, late 1970s) If there exist a polynomial-time algorithm for counting the number of perfect matchings or the number of solutions to the 0 1 knapsack counting problem, then P = NP (in fact, P = P #P ). Valiant s Theorem does not rule out finding good approximations to this problem. Definition 5 A Fully Polynomial Randomized Approximation Scheme (FPRAS) is a randomized algorithm, which for any ɛ finds an answer in time polynomial in ( n ɛ log 1 δ ) that is correct within a multiplicative factor (1+ɛ) with probability (1-δ). It turns out that approximate counting is equivalent to approximate sampling.

4 4 Theorem 3 (Jerrum, Valiant, Vazirani, 1984) If we can sample almost uniformly, in polynomial time, from A = {(x 1,..., x n ): a i x i b}, then we can design an FPRAS for the knapsack counting problem. Conversely, given an FPRAS for knapsack counting, we can draw an almost uniform sample from A. Remark 2 By sampling almost uniformly we mean having a sampling algorithm whose output distribution has l 1 distance exp( n 2 ) (say) from the uniform distribution. For ease of exposition, we think of this as a uniform sample. Proof: We first show how to count approximately assuming there is a polynomial time sampling algorithm. The idea is simple though the details require some care (which we suppress here). Suppose we have a sampling algorithm for knapsack. Draw a few samples from A, and observe what fraction feature x 1 = 0. Say it is p. Let A 0 be the set of solutions with x 1 = 0. Then p = A 0 / A. Now since A 0 is the set of (x 2,..., x n ) such that i 2 a ix i b a 1 x 1, it is also the set of solutions of a knapsack problem, but with one fewer variable. Using the algorithm recursively, assume we can calculate A 0. Then we can calculate A = A 0 /p. Now if we do not know A 0, p accurately but up to some accuracy, say (1 + ɛ). So we will only know A up to accuracy (1 + ɛ) ɛ. Actually the above is not accurate, since it ignores the possibility that p is so small that we never see an element of A 0 when we draw poly(n) samples from A. However, in that case, the set A 1 = A \ A 0 must be at least 1/2 of A and we can estimate its size. Then we proceed in the rest of the algorithm using A 1. Therefore, by choosing ɛ appropriately so that (1 + ɛ) n is small, and using the Chernoff bound, we can achieve the desired bound on the error in polynomial time. The converse is similar. To turn a counting algorithm into a sampling algorithm, we need to show how to output a random member of A. We do this bit by bit, first outputting x 1, then x 2, and so on. To output x 1, output 0 with probability p and 1 with probablity 1 p, where p = A 0 / A is calculated by calling the counting algorithm twice. Having output x 1 with the correct probability, we are left with a sampling problem on n 1 variables, which we solve recursively. Again, we need some care because we only have an approximate counting algorithm instead of an exact algorithm. Since we need to count the approximate counting algorithm only 2n times, an error of (1 + ɛ) each time could turn into an error of (1 + ɛ) 2n, which is about 1 + 2ɛ. Thus to count approximately, it suffices to sample from the uniform distribution. We define a Markov chain M on A whose stationary distribution is uniform. Then we show that its mixing time is poly(n). The Markov chain is as follows. If the current node is (x 1,..., x n ) (note a 1 x 1 + a 2 x a n x n b) then 1. with probability 1/2 remain at the same node 2. else pick i {1,..., n}. Let y = (x 1,..., x i 1, 1 x i, x i+1,..., x n ). If y A, go there. Else stay put.

5 5 Note that M is 1. aperiodic because of self-loops 2. connected because every sequence can be turned into the zero vector in a finite number of transformations, i.e., every node is connected to 0. Therefore, M is ergodic, i.e., has a unique stationary distribution. Since the uniform distribution is stationary, it follows that the stationary distribution of M is uniform. Now the question is: how fast does M converge to the uniform distribution? If M mixes fast, we can get an efficient approximation algorithm for the knapsack counting: we get the solution by running M for the mixing time and sampling from the resulting distribution after the mixing time has elapsed. Theorem 4 (Morris-Sinclair, 1999): The mixing time for M is O(n 8 ). Fact (see our remark later in our analysis of mixing time): running the M for a bit longer than the mixing time results in a distribution that is extremely close to uniform. Thus, we get the following sampling algorithm: 1. Start with the zero vector as the initial distribution of M. 2. Run M for O(n 9 ) time. 3. output the node at which the algorithm stops. This results in a uniform sampling from A. Thus Markov chains are useful for sampling from a distribution. Often, we are unable to prove any useful bounds on the mixing time (this is the case for many Markov chains used in simulated annealing and the Metropolis algorithm of statistical physics) but nevertheless in practice the chains are found to mix rapidly. Thus they are useful even though we do not have a proof that they work. 3 Bounding the mixing time For simplicity we restrict attention to regular graphs. Let M be a Markov chain on a d-regular undirected graph with an adjacency matrix A. Assume that M is ergodic and that d includes any self-loops. Then, clearly M = 1 d A. Since M is ergodic, and since 1 n 1 is a stationary distribution, then 1 n 1 is the unique stationary distribution for M. The question is how fast does M convege to 1 n 1? Note that if x is a distribution, x can be written as x = 1 1 n + α i e i

6 6 where e i are the eigenvectors of M which form an orthogonal basis and 1 is the first eigenvector with eigenvalue 1. (Clearly, x can be written as a combination of the eigenvectors; the observation here is that the coefficient in front of the first eigenvector 1 is 1 x/ which is 1 n i x i = 1 n.) Also M t x = M t 1 (Mx) = M t 1 ( 1 n 1 + α i λ i e i ) = M t 2 (M( 1 n = 1 n 1 + α i λ t ie i α i λ i e i )) α i λ t ie i 2 λ t max where λ max is the second largest eigenvalue of M. (Note that we are using the fact that the total l 2 norm of any distribution is i x2 i i x i = 1.) Thus we have proved M t x 1 n 1 2 λ t max. Mixing times were defined using l 1 distance, but Cauchy Schwartz inequality relates the l 2 and l 1 distances: p 1 n p 2. So we have proved: Theorem 5 The mixing time is at most O( log n λ max ). Note also that if we let the Markov chain run for O(k log n/λ max ) steps then the distance to uniform distribution drops to exp( k). This is why we were not very fussy about the constant 1/4 in the definition of the mixing time earlier. Finally, we recall from the last lecture: for S V, V ol(s) = i S d i, where d i is the degree of node i, the Cheeger Constant is h G = min S V,vol(S) V ol(v ) 2 E(S, S V ol(s) If µ is the smallest nonzero eigenvalue of the Laplacian L of M, then The Laplacian for our graph is Therefore, 2h G µ h2 G 2 L = I M

7 7 spec(l) = {0 = µ 1 µ 2... µ n } and Note that λ max = (1 µ 2 ) t. Therefore, spec(m) = {1 = 1 µ 1 1 µ µ n }. α i λ t ie i 2 (1 h2 G 2 )t n and we obtain the Jerrum-Sinclair inequality: M t x 1 n 1 2 (1 h2 G 2 ) t n. Examples: 1. For n-cycle: λ max = (1 c n 2 ) t, mixing time O(n 2 log n) (c is some constant). 2. For a hypercube on 2 n nodes (with self-loops added), λ max = (1 c n ) (this was a homework problem), so mixing time O(n log n) (c is some constant). Observe that the mixing time is much smaller than the number of nodes, i.e., the random walk does not visit all nodes. Finally, we note that random walks also give a randomized way to check s t connectivity (for undirected graphs) in logarithmic space, a surprising result since the usual method of checking s t connectivity, namely, breadth-first-search, seems to inherently require linear space. The main idea is that a random walk on a connected graph on n nodes mixes in O(n 4 ) time (the Cheeger constant must be at least 1/n 2 ) and so a logarithmic space algorithm can just do a random walk for O(n 2 log n) steps (note that space O(log n) is required is just to store the current node and a counter for the number of steps) starting from s, and if it never sees t, it can output reject. This answer will be correct with high probability. This application of random walks by Alleliunas, Karp, Karmarkar, Lipton and Lovasz 1979 was probably one of the first in theoretical computer science and it has been the subject of much further work recently. 4 Analysis of Mixing Time for General Markov Chains Thanks to Satyen Kale for providing this additional note In the class we only analysed random walks on d-regular graphs and showed that they converge exponentially fast with rate given by the second largest eigenvalue of the transition matrix. Here, we prove the same fact for general ergodic Markov chains. We need a lemma first.

8 8 Lemma 6 Let M be the transition matrix of an ergodic Markov chain with stationary distribution π and eigenvalues λ 1 (= 1) λ 2... λ n, corresponding to eigenvectors v 1 (= π), v 2,... v n. Then for any k 2, v k 1 = 0. Proof: We have v k M = λ k v k. Mulitplying by 1 and noting that M 1 = 1, we get v k 1 = λ k v k 1. Since the Markov chain is ergodic, λ k 1, so v k 1 = 0 as required. We are now ready to prove the main result concerning the exponentially fast convergence of a general ergodic Markov chain: Theorem 7 In the setup of the lemma above, let λ = max { λ 2, λ n }. Then for any initial distribution x, we have xm t π 2 λ t x 2. Proof: Write x in terms of v 1, v 2,..., v n as x = α 1 π + α i v i. Multiplying the above equation by 1, we get α 1 = 1 (since x 1 = π 1 = 1). Therefore xm t = π + n α iλ t i v i, and hence as needed. xm t π 2 α i λ t iv i 2 (2) λ t α α2 n (3) λ t x 2, (4)

### HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

Math 7 Fall 205 HOMEWORK 5 SOLUTIONS Problem. 2008 B2 Let F 0 x = ln x. For n 0 and x > 0, let F n+ x = 0 F ntdt. Evaluate n!f n lim n ln n. By directly computing F n x for small n s, we obtain the following

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of

### Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

### CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

### Lecture 3: Linear Programming Relaxations and Rounding

Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can

### Lecture 1: Course overview, circuits, and formulas

Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik

### Matrix Norms. Tom Lyche. September 28, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Matrix Norms Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 28, 2009 Matrix Norms We consider matrix norms on (C m,n, C). All results holds for

### (67902) Topics in Theory and Complexity Nov 2, 2006. Lecture 7

(67902) Topics in Theory and Complexity Nov 2, 2006 Lecturer: Irit Dinur Lecture 7 Scribe: Rani Lekach 1 Lecture overview This Lecture consists of two parts In the first part we will refresh the definition

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### Definition 11.1. Given a graph G on n vertices, we define the following quantities:

Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define

### Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

### Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Conductance, the Normalized Laplacian, and Cheeger s Inequality

Spectral Graph Theory Lecture 6 Conductance, the Normalized Laplacian, and Cheeger s Inequality Daniel A. Spielman September 21, 2015 Disclaimer These notes are not necessarily an accurate representation

### Chapter 6. Orthogonality

6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be

### 8.1 Makespan Scheduling

600.469 / 600.669 Approximation Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programing: Min-Makespan and Bin Packing Date: 2/19/15 Scribe: Gabriel Kaptchuk 8.1 Makespan Scheduling Consider an instance

### 1. LINEAR EQUATIONS. A linear equation in n unknowns x 1, x 2,, x n is an equation of the form

1. LINEAR EQUATIONS A linear equation in n unknowns x 1, x 2,, x n is an equation of the form a 1 x 1 + a 2 x 2 + + a n x n = b, where a 1, a 2,..., a n, b are given real numbers. For example, with x and

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

31 Kragujevac J. Math. 25 (2003) 31 49. SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH Kinkar Ch. Das Department of Mathematics, Indian Institute of Technology, Kharagpur 721302, W.B.,

### LECTURE 4. Last time: Lecture outline

LECTURE 4 Last time: Types of convergence Weak Law of Large Numbers Strong Law of Large Numbers Asymptotic Equipartition Property Lecture outline Stochastic processes Markov chains Entropy rate Random

Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

### A Sublinear Bipartiteness Tester for Bounded Degree Graphs

A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite

### Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding

### Presentation 3: Eigenvalues and Eigenvectors of a Matrix

Colleen Kirksey, Beth Van Schoyck, Dennis Bowers MATH 280: Problem Solving November 18, 2011 Presentation 3: Eigenvalues and Eigenvectors of a Matrix Order of Presentation: 1. Definitions of Eigenvalues

### ! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### CONTINUED FRACTIONS AND PELL S EQUATION. Contents 1. Continued Fractions 1 2. Solution to Pell s Equation 9 References 12

CONTINUED FRACTIONS AND PELL S EQUATION SEUNG HYUN YANG Abstract. In this REU paper, I will use some important characteristics of continued fractions to give the complete set of solutions to Pell s equation.

### Lecture 13 - Basic Number Theory.

Lecture 13 - Basic Number Theory. Boaz Barak March 22, 2010 Divisibility and primes Unless mentioned otherwise throughout this lecture all numbers are non-negative integers. We say that A divides B, denoted

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### Systems of Linear Equations

Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

### 1 Eigenvalues and Eigenvectors

Math 20 Chapter 5 Eigenvalues and Eigenvectors Eigenvalues and Eigenvectors. Definition: A scalar λ is called an eigenvalue of the n n matrix A is there is a nontrivial solution x of Ax = λx. Such an x

### Eigenvalues and eigenvectors of a matrix

Eigenvalues and eigenvectors of a matrix Definition: If A is an n n matrix and there exists a real number λ and a non-zero column vector V such that AV = λv then λ is called an eigenvalue of A and V is

### 2.1 Complexity Classes

15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look

### 1 Solving LPs: The Simplex Algorithm of George Dantzig

Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

### The Second Eigenvalue of the Google Matrix

0 2 The Second Eigenvalue of the Google Matrix Taher H Haveliwala and Sepandar D Kamvar Stanford University taherh,sdkamvar @csstanfordedu Abstract We determine analytically the modulus of the second eigenvalue

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### 10.3 POWER METHOD FOR APPROXIMATING EIGENVALUES

58 CHAPTER NUMERICAL METHODS. POWER METHOD FOR APPROXIMATING EIGENVALUES In Chapter 7 you saw that the eigenvalues of an n n matrix A are obtained by solving its characteristic equation n c nn c nn...

### Math 115A HW4 Solutions University of California, Los Angeles. 5 2i 6 + 4i. (5 2i)7i (6 + 4i)( 3 + i) = 35i + 14 ( 22 6i) = 36 + 41i.

Math 5A HW4 Solutions September 5, 202 University of California, Los Angeles Problem 4..3b Calculate the determinant, 5 2i 6 + 4i 3 + i 7i Solution: The textbook s instructions give us, (5 2i)7i (6 + 4i)(

### ! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of

### x a x 2 (1 + x 2 ) n.

Limits and continuity Suppose that we have a function f : R R. Let a R. We say that f(x) tends to the limit l as x tends to a; lim f(x) = l ; x a if, given any real number ɛ > 0, there exists a real number

### 2 Complex Functions and the Cauchy-Riemann Equations

2 Complex Functions and the Cauchy-Riemann Equations 2.1 Complex functions In one-variable calculus, we study functions f(x) of a real variable x. Likewise, in complex analysis, we study functions f(z)

### by the matrix A results in a vector which is a reflection of the given

Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that

### Notes on Complexity Theory Last updated: August, 2011. Lecture 1

Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look

### Notes on Factoring. MA 206 Kurt Bryan

The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

### Lecture 8: Random Walk vs. Brownian Motion, Binomial Model vs. Log-Normal Distribution

Lecture 8: Random Walk vs. Brownian Motion, Binomial Model vs. Log-ormal Distribution October 4, 200 Limiting Distribution of the Scaled Random Walk Recall that we defined a scaled simple random walk last

### Factoring & Primality

Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount

### I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called

### Lecture 13: Factoring Integers

CS 880: Quantum Information Processing 0/4/0 Lecture 3: Factoring Integers Instructor: Dieter van Melkebeek Scribe: Mark Wellons In this lecture, we review order finding and use this to develop a method

### APPLICATIONS OF THE ORDER FUNCTION

APPLICATIONS OF THE ORDER FUNCTION LECTURE NOTES: MATH 432, CSUSM, SPRING 2009. PROF. WAYNE AITKEN In this lecture we will explore several applications of order functions including formulas for GCDs and

### 1 Formulating The Low Degree Testing Problem

6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,

### Linear Codes. In the V[n,q] setting, the terms word and vector are interchangeable.

Linear Codes Linear Codes In the V[n,q] setting, an important class of codes are the linear codes, these codes are the ones whose code words form a sub-vector space of V[n,q]. If the subspace of V[n,q]

### 8.1 Min Degree Spanning Tree

CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

### Lecture 2: Universality

CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

### Load Balancing and Switch Scheduling

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

### Lecture 1: Time Complexity

Computational Complexity Theory, Fall 2010 August 25 Lecture 1: Time Complexity Lecturer: Peter Bro Miltersen Scribe: Søren Valentin Haagerup 1 Introduction to the course The field of Computational Complexity

### Lecture 4: AC 0 lower bounds and pseudorandomness

Lecture 4: AC 0 lower bounds and pseudorandomness Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Jason Perry and Brian Garnett In this lecture,

### DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

### 1 if 1 x 0 1 if 0 x 1

Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

### Factoring Algorithms

Factoring Algorithms The p 1 Method and Quadratic Sieve November 17, 2008 () Factoring Algorithms November 17, 2008 1 / 12 Fermat s factoring method Fermat made the observation that if n has two factors

### (x + a) n = x n + a Z n [x]. Proof. If n is prime then the map

22. A quick primality test Prime numbers are one of the most basic objects in mathematics and one of the most basic questions is to decide which numbers are prime (a clearly related problem is to find

### Lecture 12: Chain Matrix Multiplication

Lecture 12: Chain Matrix Multiplication CLRS Section 15.2 Outline of this Lecture Recalling matrix multiplication. The chain matrix multiplication problem. A dynamic programming algorithm for chain matrix

### FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL

FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint

### Negative Examples for Sequential Importance Sampling of. Binary Contingency Tables

Negative Examples for Sequential Importance Sampling of Binary Contingency Tables Ivona Bezáková Alistair Sinclair Daniel Štefankovič Eric Vigoda April 2, 2006 Keywords: Sequential Monte Carlo; Markov

### 2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

### Near Optimal Solutions

Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.

### IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

### a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

### Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.

Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry

### Diagonal, Symmetric and Triangular Matrices

Contents 1 Diagonal, Symmetric Triangular Matrices 2 Diagonal Matrices 2.1 Products, Powers Inverses of Diagonal Matrices 2.1.1 Theorem (Powers of Matrices) 2.2 Multiplying Matrices on the Left Right by

### 3 Does the Simplex Algorithm Work?

Does the Simplex Algorithm Work? In this section we carefully examine the simplex algorithm introduced in the previous chapter. Our goal is to either prove that it works, or to determine those circumstances

### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### Diagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions

Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential

### The SIS Epidemic Model with Markovian Switching

The SIS Epidemic with Markovian Switching Department of Mathematics and Statistics University of Strathclyde Glasgow, G1 1XH (Joint work with A. Gray, D. Greenhalgh and J. Pan) Outline Motivation 1 Motivation

### MATH 551 - APPLIED MATRIX THEORY

MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points

### L1 vs. L2 Regularization and feature selection.

L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### 2.5 Elementary Row Operations and the Determinant

2.5 Elementary Row Operations and the Determinant Recall: Let A be a 2 2 matrtix : A = a b. The determinant of A, denoted by det(a) c d or A, is the number ad bc. So for example if A = 2 4, det(a) = 2(5)

### Optimal File Sharing in Distributed Networks

Optimal File Sharing in Distributed Networks Moni Naor Ron M. Roth Abstract The following file distribution problem is considered: Given a network of processors represented by an undirected graph G = (V,

### LS.6 Solution Matrices

LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions

### IEOR 6711: Stochastic Models, I Fall 2012, Professor Whitt, Final Exam SOLUTIONS

IEOR 6711: Stochastic Models, I Fall 2012, Professor Whitt, Final Exam SOLUTIONS There are four questions, each with several parts. 1. Customers Coming to an Automatic Teller Machine (ATM) (30 points)

### Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study

### x if x 0, x if x < 0.

Chapter 3 Sequences In this chapter, we discuss sequences. We say what it means for a sequence to converge, and define the limit of a convergent sequence. We begin with some preliminary results about the

### THE \$25,000,000,000 EIGENVECTOR THE LINEAR ALGEBRA BEHIND GOOGLE

THE \$5,,, EIGENVECTOR THE LINEAR ALGEBRA BEHIND GOOGLE KURT BRYAN AND TANYA LEISE Abstract. Google s success derives in large part from its PageRank algorithm, which ranks the importance of webpages according

### Analysis of Mean-Square Error and Transient Speed of the LMS Adaptive Algorithm

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 7, JULY 2002 1873 Analysis of Mean-Square Error Transient Speed of the LMS Adaptive Algorithm Onkar Dabeer, Student Member, IEEE, Elias Masry, Fellow,

### On the communication and streaming complexity of maximum bipartite matching

On the communication and streaming complexity of maximum bipartite matching Ashish Goel Michael Kapralov Sanjeev Khanna November 27, 20 Abstract Consider the following communication problem. Alice holds

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### 33 Cauchy Integral Formula

33 AUHY INTEGRAL FORMULA October 27, 2006 PROOF Use the theorem to write f ()d + 1 f ()d + 1 f ()d = 0 2 f ()d = f ()d f ()d. 2 1 2 This is the deformation principle; if you can continuously deform 1 to

### Series Convergence Tests Math 122 Calculus III D Joyce, Fall 2012

Some series converge, some diverge. Series Convergence Tests Math 22 Calculus III D Joyce, Fall 202 Geometric series. We ve already looked at these. We know when a geometric series converges and what it

### Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions

### 1. Let P be the space of all polynomials (of one real variable and with real coefficients) with the norm

Uppsala Universitet Matematiska Institutionen Andreas Strömbergsson Prov i matematik Funktionalanalys Kurs: F3B, F4Sy, NVP 005-06-15 Skrivtid: 9 14 Tillåtna hjälpmedel: Manuella skrivdon, Kreyszigs bok

### Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number

### Lecture 6 Online and streaming algorithms for clustering

CSE 291: Unsupervised learning Spring 2008 Lecture 6 Online and streaming algorithms for clustering 6.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

### Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time

Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time Edward Bortnikov & Ronny Lempel Yahoo! Labs, Haifa Class Outline Link-based page importance measures Why

### Minimum Makespan Scheduling

Minimum Makespan Scheduling Minimum makespan scheduling: Definition and variants Factor 2 algorithm for identical machines PTAS for identical machines Factor 2 algorithm for unrelated machines Martin Zachariasen,

### REMARKS ON ABSOLUTE CONTINUITY IN THE CONTEXT OF FREE PROBABILITY AND RANDOM MATRICES arxiv: v2 [math.pr] 8 Feb 2015

REMARKS ON ABSOLUTE CONTINUITY IN THE CONTEXT OF FREE PROBABILITY AND RANDOM MATRICES arxiv:409.227v2 [math.pr] 8 Feb 205 ARIJIT CHAKRABARTY AND RAJAT SUBHRA HAZRA Abstract. In this note, we show that