Markov Chains

These notes contain material prepared by colleagues who have also presented this course at Cambridge, especially James Norris. The material mainly comes from books of Norris, Grimmett & Stirzaker, Ross, Aldous & Fill, and Grinstead & Snell. Many of the examples are classic and ought to occur in any sensible course on Markov chains.

Contents

Schedules
1 Definitions, basic properties, the transition matrix
1.1 An example and some interesting questions
1.2 Definitions
1.3 Where do Markov chains come from?
1.4 How can we simulate them?
1.5 The n-step transition matrix
1.6 P^(n) for a two-state Markov chain
2 Calculation of n-step transition probabilities, class structure, absorption, and irreducibility
2.1 Example: a three-state Markov chain
2.2 Example: use of symmetry
2.3 Markov property
2.4 Class structure
2.5 Closed classes
2.6 Irreducibility
3 Hitting probabilities and mean hitting times
3.1 Absorption probabilities and mean hitting times
3.2 Calculation of hitting probabilities and mean hitting times
3.3 Absorption probabilities are minimal solutions to RHEs
3.4 Gambler's ruin
4 Survival probability for birth and death chains, stopping times and strong Markov property
4.1 Survival probability for birth death chains
4.2 Mean hitting times are minimal solutions to RHEs
4.3 Stopping times
4.4 Strong Markov property
5 Recurrence and transience
5.1 Recurrence and transience
5.2 Equivalence of recurrence and certainty of return
5.3 Equivalence of transience and summability of n-step transition probabilities
5.4 Recurrence as a class property
5.5 Relation with closed classes
6 Random walks in dimensions one, two and three
6.1 Simple random walk on Z
6.2 Simple symmetric random walk on Z^2
6.3 Simple symmetric random walk on Z^3
6.4 *A continuized analysis of random walk on Z^3*
6.5 *Feasibility of wind instruments*
7 Invariant distributions
7.1 Examples of invariant distributions
7.2 Notation
7.3 What does an invariant measure or distribution tell us?
7.4 Invariant distribution is the solution to LHEs
7.5 Stationary distributions
7.6 Equilibrium distributions
8 Existence and uniqueness of invariant distribution, mean return time, positive and null recurrence
8.1 Existence and uniqueness up to constant multiples
8.2 Mean return time, positive and null recurrence
8.3 Random surfer
9 Convergence to equilibrium for ergodic chains
9.1 Equivalence of positive recurrence and the existence of an invariant distribution
9.2 Aperiodic chains
9.3 Convergence to equilibrium *and proof by coupling*
10 Long-run proportion of time spent in given state
10.1 Ergodic theorem
10.2 *Kemeny's constant and the random target lemma*
11 Time reversal, detailed balance, reversibility, random walk on a graph
11.1 Time reversal
11.2 Detailed balance
11.3 Reversibility
11.4 Random walk on a graph
12 Concluding problems and recommendations for further study
12.1 Reversibility and Ehrenfest's urn model
12.2 Reversibility and the M/M/1 queue
12.3 *The probabilistic abacus*
12.4 *Random walks and electrical networks*
12.5 Probability courses in Part II
Appendix
A Probability spaces
B Historical notes
C The probabilistic abacus for absorbing chains
Index

Topics marked *...* are non-examinable. Furthermore, only Appendix A is examinable.

Richard Weber, October 0
Schedules

Definition and basic properties, the transition matrix. Calculation of n-step transition probabilities. Communicating classes, closed classes, absorption, irreducibility. Calculation of hitting probabilities and mean hitting times; survival probability for birth and death chains. Stopping times and statement of the strong Markov property. [5]

Recurrence and transience; equivalence of transience and summability of n-step transition probabilities; equivalence of recurrence and certainty of return. Recurrence as a class property, relation with closed classes. Simple random walks in dimensions one, two and three. [3]

Invariant distributions, statement of existence and uniqueness up to constant multiples. Mean return time, positive recurrence; equivalence of positive recurrence and the existence of an invariant distribution. Convergence to equilibrium for irreducible, positive recurrent, aperiodic chains *and proof by coupling*. Long-run proportion of time spent in given state. [3]

Time reversal, detailed balance, reversibility; random walk on a graph. [1]

Learning outcomes

A Markov process is a random process for which the future (the next step) depends only on the present state; it has no memory of how the present state was reached. A typical example is a random walk (in two dimensions, the drunkard's walk). The course is concerned with Markov chains in discrete time, including periodicity and recurrence. For example, a random walk on a lattice of integers returns to the initial position with probability one in one or two dimensions, but in three or more dimensions the probability of recurrence is zero. Some Markov chains settle down to an equilibrium state and these are the next topic in the course. The material in this course will be essential if you plan to take any of the applicable courses in Part II.
By the end of this course, you should:

- understand the notion of a discrete-time Markov chain and be familiar with both the finite state-space case and some simple infinite state-space cases, such as random walks and birth-and-death chains;
- know how to compute for simple examples the n-step transition probabilities, hitting probabilities, expected hitting times and invariant distribution;
- understand the notions of recurrence and transience, and the stronger notion of positive recurrence;
- understand the notion of time-reversibility and the role of the detailed balance equations;
- know under what conditions a Markov chain will converge to equilibrium in long time;
- be able to calculate the long-run proportion of time spent in a given state.
1 Definitions, basic properties, the transition matrix

Markov chains were introduced in 1906 by Andrei Andreyevich Markov (1856-1922) and were named in his honor.

1.1 An example and some interesting questions

Example 1.1. A frog hops about on 7 lily pads. The numbers next to arrows show the probabilities with which, at the next jump, he jumps to a neighbouring lily pad (and when out-going probabilities sum to less than 1 he stays where he is with the remaining probability).

[The diagram and the 7 x 7 transition matrix P are not reproduced here.]

There are 7 states (lily pads). In matrix P the element p_57 (= /) is the probability that, when starting in state 5, the next jump takes the frog to state 7. We would like to know where we go, how long it takes to get there, and what happens in the long run. Specifically:

(a) Starting in state 1, what is the probability that we are still in state 1 after 3 steps? (p_11^(3) = /4) after 5 steps? (p_11^(5) = 3/6) or after 1000 steps? (about /5, as lim_n p_11^(n) = /5)

(b) Starting in state 4, what is the probability that we ever reach state 7? (/3)

(c) Starting in state 4, how long on average does it take to reach either 3 or 7? (/3)

(d) Starting in state 1, what is the long-run proportion of time spent in state 3? (/5)

Markov chain models and methods are useful in answering questions such as: How long does it take to shuffle a deck of cards? How likely is a queue to overflow its buffer? How long does it take for a knight making random moves on a chessboard to return to his initial square? (Answer: 168 if starting in a corner, 42 if starting near the centre.) What do the hyperlinks between web pages say about their relative popularities?
1.2 Definitions

Let I be a countable set, {i, j, k, ...}. Each i ∈ I is called a state and I is called the state-space.

We work in a probability space (Ω, F, P). Here Ω is a set of outcomes, F is a set of subsets of Ω, and for A ∈ F, P(A) is the probability of A (see Appendix A).

The object of our study is a sequence of random variables X_0, X_1, ... (taking values in I) whose joint distribution is determined by simple rules. Recall that a random variable X with values in I is a function X : Ω → I.

A row vector λ = (λ_i : i ∈ I) is called a measure if λ_i ≥ 0 for all i. If Σ_i λ_i = 1 then it is a distribution (or probability measure). We start with an initial distribution over I, specified by {λ_i : i ∈ I} such that 0 ≤ λ_i ≤ 1 for all i and Σ_{i∈I} λ_i = 1. The special case that with probability 1 we start in state i is denoted λ = δ_i = (0, ..., 1, ..., 0).

We also have a transition matrix P = (p_ij : i, j ∈ I) with p_ij ≥ 0 for all i, j. It is a stochastic matrix, meaning that p_ij ≥ 0 for all i, j ∈ I and Σ_{j∈I} p_ij = 1 (i.e. each row of P is a distribution over I).

Definition 1.2. We say that (X_n)_{n≥0} is a Markov chain with initial distribution λ and transition matrix P if for all n ≥ 0 and i_0, ..., i_{n+1} ∈ I,

(i) P(X_0 = i_0) = λ_{i_0};
(ii) P(X_{n+1} = i_{n+1} | X_0 = i_0, ..., X_n = i_n) = P(X_{n+1} = i_{n+1} | X_n = i_n) = p_{i_n i_{n+1}}.

For short, we say (X_n)_{n≥0} is Markov(λ, P). Checking conditions (i) and (ii) is usually the most helpful way to determine whether or not a given random process (X_n)_{n≥0} is a Markov chain. However, it can also be helpful to have the alternative description which is provided by the following theorem.

Theorem 1.3. (X_n)_{n≥0} is Markov(λ, P) if and only if for all n ≥ 0 and i_0, ..., i_n ∈ I,

P(X_0 = i_0, ..., X_n = i_n) = λ_{i_0} p_{i_0 i_1} ... p_{i_{n-1} i_n}.   (1.1)

Proof. Suppose (X_n)_{n≥0} is Markov(λ, P). Then

P(X_0 = i_0, ..., X_n = i_n)
  = P(X_n = i_n | X_0 = i_0, ..., X_{n-1} = i_{n-1}) P(X_0 = i_0, ..., X_{n-1} = i_{n-1})
  = P(X_0 = i_0) P(X_1 = i_1 | X_0 = i_0) ... P(X_n = i_n | X_0 = i_0, ..., X_{n-1} = i_{n-1})
  = λ_{i_0} p_{i_0 i_1} ... p_{i_{n-1} i_n}.

On the other hand, if (1.1)
holds, we can sum it over all i_1, ..., i_n to give P(X_0 = i_0) = λ_{i_0}, i.e. (i). Then summing (1.1) over i_n we get

P(X_0 = i_0, ..., X_{n-1} = i_{n-1}) = λ_{i_0} p_{i_0 i_1} ... p_{i_{n-2} i_{n-1}}.

Hence

P(X_n = i_n | X_0 = i_0, ..., X_{n-1} = i_{n-1}) = P(X_0 = i_0, ..., X_n = i_n) / P(X_0 = i_0, ..., X_{n-1} = i_{n-1}) = p_{i_{n-1} i_n},
which establishes (ii).

1.3 Where do Markov chains come from?

At each time we apply some new randomness to determine the next step, in a way that is a function only of the current state. We might take U_1, U_2, ... as some i.i.d. random variables taking values in a set E and a function F : I × E → I. Take i ∈ I. Set X_0 = i and define recursively X_{n+1} = F(X_n, U_{n+1}), n ≥ 0. Then (X_n)_{n≥0} is Markov(δ_i, P) where p_ij = P(F(i, U) = j).

1.4 How can we simulate them?

Use a computer to simulate U_1, U_2, ... as i.i.d. U[0,1]. Define F(i, U) = J by the rule

U ∈ [ Σ_{k=1}^{j-1} p_ik, Σ_{k=1}^{j} p_ik )  =>  J = j.

1.5 The n-step transition matrix

Let A be an event. A convenient notation is P_i(A) = P(A | X_0 = i). For example, P_i(X_1 = j) = p_ij.

Given the initial distribution λ, let us treat it as a row vector. Then

P(X_1 = j) = Σ_{i∈I} λ_i P_i(X_1 = j) = Σ_{i∈I} λ_i p_ij.

Similarly,

P_i(X_2 = j) = Σ_k P_i(X_1 = k, X_2 = j) = Σ_k p_ik p_kj = (P^2)_ij,
P(X_2 = j) = Σ_{i,k} λ_i P_i(X_1 = k, X_2 = j) = Σ_{i,k} λ_i p_ik p_kj = (λP^2)_j.

Continuing in this way,

P_i(X_n = j) = (δ_i P^n)_j = (P^n)_ij = p_ij^(n),
P(X_n = j) = Σ_{i_0,...,i_{n-1}} λ_{i_0} p_{i_0 i_1} ... p_{i_{n-1} j} = (λP^n)_j.

Thus P^(n) = (p_ij^(n)), the n-step transition matrix, is simply P^n (P raised to the power n). Also, for all i, j and n, m ≥ 0, the (obvious) Chapman-Kolmogorov equations hold:

p_ij^(n+m) = Σ_{k∈I} p_ik^(n) p_kj^(m),
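The recipe of Sections 1.4 and 1.5 can be sketched in code. This is my own illustration, not from the notes: the two-state matrix P and the number of steps are arbitrary example values. Each step is drawn from the current row of P (the function F(i, U) above), and the n-step transition matrix is computed as a matrix power.

```python
import numpy as np

# A minimal sketch (an addition, not from the notes): simulate Markov(lambda, P)
# by drawing each step from the current row of P, and compute n-step
# transition probabilities as matrix powers, since P^(n) = P^n.

def simulate_chain(P, lam, n_steps, rng):
    """Sample X_0, ..., X_{n_steps}: X_0 ~ lam, then X_{k+1} ~ row P[X_k]."""
    states = np.arange(len(lam))
    x = rng.choice(states, p=lam)
    path = [int(x)]
    for _ in range(n_steps):
        x = rng.choice(states, p=P[x])   # F(i, U): pick j with probability p_ij
        path.append(int(x))
    return path

# Illustrative two-state chain started in state 0 (initial distribution delta_0).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
lam = np.array([1.0, 0.0])

rng = np.random.default_rng(1)
path = simulate_chain(P, lam, 20, rng)

Pn = np.linalg.matrix_power(P, 50)   # the 50-step transition matrix
print(Pn[0])                         # row 0 approaches the equilibrium (0.8, 0.2)
```

The printed row illustrates the convergence to equilibrium studied later in the course: for this P the limit distribution works out to (0.8, 0.2).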
named for their independent formulation by Chapman (a Trinity College graduate, 1888-1970), and Kolmogorov (1903-1987).

1.6 P^(n) for a two-state Markov chain

Example 1.4 (A two-state Markov chain).

P = [ 1-α   α  ]
    [  β   1-β ]

The eigenvalues are 1 and 1-α-β. So we can write

P = U [ 1    0  ] U^{-1}   =>   P^n = U [ 1       0      ] U^{-1}.
      [ 0  1-α-β ]                      [ 0   (1-α-β)^n ]

So p_11^(n) = A + B(1-α-β)^n for some A and B. But p_11^(0) = 1 = A + B and p_11^(1) = 1-α = A + B(1-α-β). So (A, B) = (β, α)/(α+β), i.e.

P_1(X_n = 1) = p_11^(n) = β/(α+β) + (α/(α+β))(1-α-β)^n.

Note that this tends exponentially fast to a limit of β/(α+β).

Other components of P^n can be computed similarly, and we have

P^n = [ β/(α+β) + (α/(α+β))(1-α-β)^n    α/(α+β) - (α/(α+β))(1-α-β)^n ]
      [ β/(α+β) - (β/(α+β))(1-α-β)^n    α/(α+β) + (β/(α+β))(1-α-β)^n ]

Note. We might have reached the same answer by arguing:

p_11^(n) = p_11^(n-1) p_11 + p_12^(n-1) p_21 = p_11^(n-1)(1-α) + (1 - p_11^(n-1))β.

This gives the linear recurrence relation

p_11^(n) = β + (1-α-β) p_11^(n-1),

to which the solution is of the form p_11^(n) = A + B(1-α-β)^n.
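The closed form above is easy to check numerically against a direct matrix power. This is my own check, with illustrative values α = 0.3, β = 0.2 that are not from the notes:

```python
import numpy as np

# Quick numerical check (an addition) of the two-state closed form of
# Example 1.4: p_11^(n) = beta/(alpha+beta) + (alpha/(alpha+beta))(1-alpha-beta)^n.

alpha, beta = 0.3, 0.2          # arbitrary example values
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

n = 7
p11_n = np.linalg.matrix_power(P, n)[0, 0]
closed_form = beta/(alpha+beta) + (alpha/(alpha+beta)) * (1 - alpha - beta)**n
print(p11_n, closed_form)       # the two values agree
```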
2 Calculation of n-step transition probabilities, class structure, absorption, and irreducibility

2.1 Example: a three-state Markov chain

Example 2.1.

P = [  0    1    0  ]
    [  0   1/2  1/2 ]
    [ 1/2   0   1/2 ]

Now 0 = det(xI - P) = x(x - 1/2)^2 - 1/4 = (1/4)(x - 1)(4x^2 + 1). Eigenvalues are 1, ±i/2. This means that

p_11^(n) = A + B(i/2)^n + C(-i/2)^n.

We make the substitution (±i/2)^n = (1/2)^n e^{±inπ/2} = (1/2)^n (cos(nπ/2) ± i sin(nπ/2)). Thus for some B and C,

p_11^(n) = A + (1/2)^n ( B cos(nπ/2) + C sin(nπ/2) ).

We then use the facts that p_11^(0) = 1, p_11^(1) = 0, p_11^(2) = 0 to fix A, B and C and get

p_11^(n) = 1/5 + (1/2)^n [ (4/5) cos(nπ/2) - (2/5) sin(nπ/2) ].

Note that the second term is exponentially decaying. We have

p_11^(5) = 3/16,  p_11^(10) = 51/256,  |p_11^(n) - 1/5| < (1/2)^n.

More generally, for a chain with m states, and states i and j:

(i) Compute the eigenvalues μ_1, ..., μ_m of the m × m matrix P.

(ii) If the eigenvalues are distinct then p_ij^(n) has the form (remembering that μ_1 = 1)

p_ij^(n) = a_1 + a_2 μ_2^n + ... + a_m μ_m^n

for some constants a_1, ..., a_m. If an eigenvalue μ is repeated k times then there will be a term of (b_0 + b_1 n + ... + b_{k-1} n^{k-1}) μ^n.

(iii) Complex eigenvalues come in conjugate pairs and these can be written using sines and cosines, as in the example. However, sometimes there is a more clever way.
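A numerical check of Example 2.1 (my own addition): the eigenvalues can be confirmed with a linear-algebra routine, and the closed form for p_11^(n) compared against matrix powers of P.

```python
import numpy as np
import math

# Check (an addition, not from the notes) of Example 2.1: P below has
# eigenvalues 1 and +-i/2, and its (1,1) entry after n steps matches
# p_11^(n) = 1/5 + (1/2)^n ((4/5) cos(n*pi/2) - (2/5) sin(n*pi/2)).

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

def p11(n):
    return np.linalg.matrix_power(P, n)[0, 0]

def formula(n):
    return 0.2 + 0.5**n * (0.8 * math.cos(n * math.pi / 2)
                           - 0.4 * math.sin(n * math.pi / 2))

print(sorted(np.linalg.eigvals(P), key=abs))   # two eigenvalues of modulus 1/2, and 1
print(p11(5), p11(10))                         # 3/16 = 0.1875 and 51/256
```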
2.2 Example: use of symmetry

Example 2.2 (Random walk on vertices of a complete graph). Consider a random walk on the complete graph K4 (vertices of a tetrahedron), with

P = [  0   1/3  1/3  1/3 ]
    [ 1/3   0   1/3  1/3 ]
    [ 1/3  1/3   0   1/3 ]
    [ 1/3  1/3  1/3   0  ]

Eigenvalues are {1, -1/3, -1/3, -1/3}, so the general solution is p_11^(n) = A + (-1/3)^n (a + bn + cn^2). However, we may use symmetry in a helpful way. Notice that for i ≠ j, p_ij^(n) = (1/3)(1 - p_ii^(n)). So

p_11^(n) = Σ_{j≠1} p_1j^(n-1) p_j1 = (1/3)(1 - p_11^(n-1)).

Thus we have just a first-order recurrence equation, to which the general solution is of the form p_11^(n) = A + B(-1/3)^n. Since A + B = 1 and A - (1/3)B = 0, we have A = 1/4, B = 3/4, and so

p_11^(n) = 1/4 + (3/4)(-1/3)^n.

Obviously, p_11^(n) → 1/4 as n → ∞.

Do we expect the same sort of thing for random walk on the corners of a cube? (No, p_11^(n) does not tend to a limit, since p_11^(n) = 0 if n is odd.)

2.3 Markov property

Theorem 2.3 (Markov property). Let (X_n)_{n≥0} be Markov(λ, P). Then conditional on X_m = i, (X_{m+n})_{n≥0} is Markov(δ_i, P) and is independent of the random variables X_0, X_1, ..., X_m.

Proof (non-examinable). We must show that for any event A determined by X_0, ..., X_m we have

P({X_m = i_m, ..., X_{m+n} = i_{m+n}} ∩ A | X_m = i)
  = δ_{i i_m} p_{i_m i_{m+1}} ... p_{i_{m+n-1} i_{m+n}} P(A | X_m = i);   (2.1)

then the result follows from Theorem 1.3. First consider the case of elementary events like A = {X_0 = i_0, ..., X_m = i_m}. In that case we have to show that

P(X_0 = i_0, ..., X_{m+n} = i_{m+n} and i = i_m) / P(X_m = i)
  = δ_{i i_m} p_{i_m i_{m+1}} ... p_{i_{m+n-1} i_{m+n}} P(X_0 = i_0, ..., X_m = i_m and i = i_m) / P(X_m = i),

which is true by Theorem 1.3. In general, any event A determined by X_0, ..., X_m may be written as the union of a countable number of disjoint elementary events

A = ∪_{k=1}^∞ A_k.
The desired identity (2.1) follows by summing the corresponding identities for the A_k.

2.4 Class structure

It is sometimes possible to break a Markov chain into smaller pieces, each of which is relatively easy to understand, and which together give an understanding of the whole. This is done by identifying the communicating classes of the chain. Recall the initial example (the frog walk of Example 1.1; diagram not reproduced here).

We say that i leads to j, and write i → j, if P_i(X_n = j for some n ≥ 0) > 0. We say i communicates with j, and write i ↔ j, if both i → j and j → i.

Theorem 2.4. For distinct states i and j the following are equivalent.

(i) i → j;
(ii) p_{i_0 i_1} p_{i_1 i_2} ... p_{i_{n-1} i_n} > 0 for some states i_0, i_1, ..., i_n, where n ≥ 1, i_0 = i and i_n = j;
(iii) p_ij^(n) > 0 for some n ≥ 1.

Proof. Observe that

p_ij^(n) ≤ P_i(X_n = j for some n ≥ 0) ≤ Σ_{n=0}^∞ p_ij^(n),

which proves the equivalence of (i) and (iii). Also,

p_ij^(n) = Σ_{i_1,...,i_{n-1}} p_{i i_1} p_{i_1 i_2} ... p_{i_{n-1} j},

so that (ii) and (iii) are equivalent.
2.5 Closed classes

It is clear from (ii) that i → j and j → k imply i → k, and also that i → i. So ↔ satisfies the conditions for an equivalence relation on I, and so partitions I into communicating classes. We say that a class C is closed if

i ∈ C, i → j  =>  j ∈ C.

A closed class is one from which there is no escape. A state i is absorbing if {i} is a closed class. If C is not closed then it is open and there exist i ∈ C and j ∉ C with i → j (you can escape).

Example 2.5. Find the classes in P and say whether they are open or closed.

[The 6 × 6 matrix P and its diagram are not reproduced here.]

The solution is obvious from the diagram. The classes are {1, 2, 3}, {4} and {5, 6}, with only {5, 6} being closed.

2.6 Irreducibility

A chain or transition matrix P in which I is a single class is called irreducible. It is easy to detect. We just have to check that i → j for every i, j.
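For a finite state space, the check "i → j for every i, j" is a reachability computation on the graph of positive transition probabilities. The sketch below is my own addition (assuming a finite chain given as a matrix); it builds the set of states reachable in at most n steps and tests whether every pair communicates.

```python
import numpy as np

# Sketch (an addition): irreducibility test for a finite chain, by boolean
# reachability on the directed graph with an edge i -> j whenever p_ij > 0.

def is_irreducible(P):
    A = (np.asarray(P, dtype=float) > 0).astype(int)   # adjacency of positive entries
    n = len(A)
    reach = np.eye(n, dtype=int)                       # paths of length 0
    for _ in range(n):                                 # extend paths one step at a time
        reach = ((reach + reach @ A) > 0).astype(int)
    return bool((reach > 0).all())                     # every i leads to every j?

P_irred = [[0, 1], [1, 0]]         # two states that communicate
P_red   = [[1, 0], [0.5, 0.5]]     # state 0 is absorbing: not irreducible
print(is_irreducible(P_irred), is_irreducible(P_red))
```

Repeated squaring of the reachability matrix would be faster for large chains; the plain loop keeps the sketch short.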
3 Hitting probabilities and mean hitting times

3.1 Absorption probabilities and mean hitting times

Example 3.1.

P = [ 1-p   p ]
    [  0    1 ]

Let us calculate the probability of absorption in state 2, starting from state 1.

P_1(hit 2) = Σ_{n=1}^∞ P(hit 2 at time n) = Σ_{n=1}^∞ (1-p)^{n-1} p = 1.

Similarly we can find the mean time to hit state 2:

E_1(time to hit 2) = Σ_{n=1}^∞ n P(hit 2 at time n) = Σ_{n=1}^∞ n(1-p)^{n-1} p = -p d/dp Σ_{n=1}^∞ (1-p)^n = 1/p.

Alternatively, set h = P_1(hit 2), k = E_1(time to hit 2). Conditioning on the first step,

h = (1-p) P_1(hit 2 | X_1 = 1) + p P_1(hit 2 | X_1 = 2) = (1-p)h + p  =>  h = 1,
k = 1 + (1-p)k + p·0  =>  k = 1/p.

3.2 Calculation of hitting probabilities and mean hitting times

Let (X_n)_{n≥0} be a Markov chain with transition matrix P. The first hitting time of a subset A of I is the random variable H^A : Ω → {0, 1, ...} ∪ {∞} given by

H^A(ω) = inf{n ≥ 0 : X_n(ω) ∈ A},

where we agree that the infimum of the empty set is ∞. The probability starting from i that (X_n)_{n≥0} ever hits A is

h_i^A = P_i(H^A < ∞).

When A is a closed class, h_i^A is called an absorption probability. The mean hitting time for (X_n)_{n≥0} reaching A is given by

k_i^A = E_i(H^A) = Σ_{n<∞} n P_i(H^A = n) + ∞·P_i(H^A = ∞).

Informally, h_i^A = P_i(hit A), k_i^A = E_i(time to hit A). Remarkably, these quantities can be calculated from certain simple linear equations. Let us consider an example.
Example 3.2. Symmetric random walk on the integers 0, 1, 2, 3, with absorption at 0 and 3.

Starting from 1, what is the probability of absorption at 3? How long does it take until the chain is absorbed in 0 or 3?

Let h_i = P_i(hit 3), k_i = E_i(time to hit {0, 3}). Clearly,

h_0 = 0                        k_0 = 0
h_1 = (1/2)h_0 + (1/2)h_2      k_1 = 1 + (1/2)k_0 + (1/2)k_2
h_2 = (1/2)h_1 + (1/2)h_3      k_2 = 1 + (1/2)k_1 + (1/2)k_3
h_3 = 1                        k_3 = 0

These are easily solved to give h_1 = 1/3, h_2 = 2/3, and k_1 = k_2 = 2.

Example 3.3 (Gambler's ruin on 0, ..., N). Asymmetric random walk on the integers 0, 1, ..., N, with absorption at 0 and N, moving up with probability p and down with probability q, where 0 < p = 1-q < 1.

Let h_i = P_i(hit 0). So h_0 = 1, h_N = 0 and

h_i = q h_{i-1} + p h_{i+1},   1 ≤ i ≤ N-1.

The characteristic equation is px^2 - x + q = (x-1)(px-q), so the roots are {1, q/p}, and if p ≠ q the general solution is h_i = A + B(q/p)^i. Using the boundary conditions we find

h_i = ( (q/p)^i - (q/p)^N ) / ( 1 - (q/p)^N ).

If p = q the general solution is h_i = A + Bi, and using the boundary conditions we get h_i = 1 - i/N. Similarly, you should be able to show that if p = q,

k_i = E_i(time to hit {0, N}) = i(N - i).
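The linear equations of Example 3.2 can be solved by any linear-algebra routine once the boundary values are substituted. The sketch below is my own addition: the unknowns are (h_1, h_2) and (k_1, k_2), and the absorbing states only contribute to the right hand sides.

```python
import numpy as np

# Solving Example 3.2 numerically (an addition, not from the notes).
# Symmetric walk on {0,1,2,3}, absorbing at 0 and 3; h_i = P_i(hit 3).
# h_1 = (h_0 + h_2)/2 and h_2 = (h_1 + h_3)/2, with h_0 = 0, h_3 = 1.

A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])
h1, h2 = np.linalg.solve(A, np.array([0.0, 0.5]))   # rhs from h_0 = 0, h_3 = 1
print(h1, h2)                                        # 1/3 and 2/3

# Mean absorption times: k_1 = 1 + (k_0 + k_2)/2, k_2 = 1 + (k_1 + k_3)/2,
# with k_0 = k_3 = 0.
k1, k2 = np.linalg.solve(A, np.array([1.0, 1.0]))
print(k1, k2)                                        # both equal 2
```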
3.3 Absorption probabilities are minimal solutions to RHEs

For a finite state space, as in the examples thus far, the equations have a unique solution. But if I is infinite there may be many solutions. The absorption probabilities are then given by the minimal solution.

Theorem 3.4. The vector of hitting probabilities h^A = (h_i^A : i ∈ I) is the minimal non-negative solution to the system of linear equations

h_i^A = 1                     for i ∈ A,
h_i^A = Σ_j p_ij h_j^A        for i ∉ A.        (3.1)

(Minimality means that if x = (x_i : i ∈ I) is another solution with x_i ≥ 0 for all i, then x_i ≥ h_i^A for all i.)

We call (3.1) a set of right hand equations (RHEs) because h^A occurs on the r.h.s. of P in h^A = P h^A (where P is the matrix obtained by deleting rows for which i ∈ A).

Proof. First we show that h^A satisfies (3.1). If X_0 = i ∈ A, then H^A = 0, so h_i^A = 1. If X_0 ∉ A, then H^A ≥ 1, so by the Markov property

h_i^A = P_i(H^A < ∞) = Σ_{j∈I} P_i(H^A < ∞, X_1 = j)
      = Σ_{j∈I} P_i(H^A < ∞ | X_1 = j) P_i(X_1 = j) = Σ_{j∈I} p_ij h_j^A

(as P_i(H^A < ∞ | X_1 = j) = P_j(H^A < ∞) = h_j^A).

Suppose now that x = (x_i : i ∈ I) is any non-negative solution to (3.1). Then h_i^A = x_i = 1 for i ∈ A. Suppose i ∉ A; then

x_i = Σ_j p_ij x_j = Σ_{j∈A} p_ij + Σ_{j∉A} p_ij x_j.

Substituting for x_j we obtain

x_i = Σ_{j∈A} p_ij + Σ_{j∉A} p_ij ( Σ_{k∈A} p_jk + Σ_{k∉A} p_jk x_k )
    = P_i(X_1 ∈ A) + P_i(X_1 ∉ A, X_2 ∈ A) + Σ_{j,k∉A} p_ij p_jk x_k.

By repeated substitution for x in the final term we obtain

x_i = P_i(X_1 ∈ A) + P_i(X_1 ∉ A, X_2 ∈ A) + ... + P_i(X_1 ∉ A, ..., X_{n-1} ∉ A, X_n ∈ A)
      + Σ_{j_1,...,j_n ∉ A} p_{i j_1} p_{j_1 j_2} ... p_{j_{n-1} j_n} x_{j_n}.

So since x_{j_n} is non-negative, x_i ≥ P_i(H^A ≤ n). This implies

x_i ≥ lim_{n→∞} P_i(H^A ≤ n) = P_i(H^A < ∞) = h_i^A.
Notice that if we try to use this theorem to solve Example 3.2 then (3.1) does not provide the information that h_0 = 0, since the second part of (3.1) only says h_0 = h_0. However, h_0 = 0 follows from the minimality condition.

3.4 Gambler's ruin

Example 3.5 (Gambler's ruin on 0, 1, ...). Asymmetric random walk on the non-negative integers, where 0 < p = 1-q < 1. The transition probabilities are

p_00 = 1,
p_{i,i-1} = q,  p_{i,i+1} = p   for i = 1, 2, ...

Set h_i = P_i(hit 0); then h is the minimal non-negative solution to

h_0 = 1,
h_i = p h_{i+1} + q h_{i-1}   for i = 1, 2, ...

If p ≠ q this recurrence has general solution h_i = A + B(q/p)^i.

If p < q then the fact that 0 ≤ h_i ≤ 1 for all i forces B = 0 (since (q/p)^i → ∞). So h_i = 1 for all i.

If p > q then in seeking a minimal solution we will wish to take B as large as possible, consistent with h_i ≥ 0 for all i. So A = 0, B = 1, and h_i = (q/p)^i.

Finally, if p = q = 1/2 the recurrence relation has general solution h_i = A + Bi, and the restriction 0 ≤ h_i ≤ 1 forces B = 0. Thus h_i = 1 for all i.

So even if you find a fair casino you are certain to end up broke. This apparent paradox is called gambler's ruin.
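As a sanity check (my own addition, with p = 0.6 as an arbitrary example value), one can verify directly that the claimed minimal solution satisfies the right hand equations of Example 3.5:

```python
# Check (an addition) that h_i = (q/p)^i satisfies h_i = p*h_{i+1} + q*h_{i-1}
# when p > q; algebraically, p(q/p)^{i+1} + q(q/p)^{i-1} = (q/p)^i (q + p).

p, q = 0.6, 0.4                      # illustrative values with p > q
h = lambda i: (q / p) ** i
for i in range(1, 10):
    assert abs(h(i) - (p * h(i + 1) + q * h(i - 1))) < 1e-12

# When p <= q the minimal solution is h_i = 1, which trivially satisfies
# 1 = p*1 + q*1 since p + q = 1.
print("recurrence verified")
```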
4 Survival probability for birth and death chains, stopping times and strong Markov property

4.1 Survival probability for birth death chains

Example 4.1 (Birth and death chains). Consider the Markov chain on {0, 1, 2, ...} with transition probabilities p_{i,i+1} = p_i and p_{i,i-1} = q_i, where, for i = 1, 2, ..., we have 0 < p_i = 1 - q_i < 1. State i is that in which the population is i. As in previous examples, 0 is an absorbing state and we wish to calculate the absorption probability starting from state i. So now h_i = P_i(hit 0) is the extinction probability starting from state i. We write down the usual system of r.h. equations:

h_0 = 1,
h_i = p_i h_{i+1} + q_i h_{i-1}   for i = 1, 2, ...

This recurrence relation has variable coefficients so the usual technique fails. But consider u_i = h_{i-1} - h_i. Then p_i u_{i+1} = q_i u_i, so

u_{i+1} = (q_i/p_i) u_i = (q_i q_{i-1} ... q_1)/(p_i p_{i-1} ... p_1) u_1 = γ_i u_1,

where the final equality defines γ_i. Then

u_1 + ... + u_i = h_0 - h_i,

so h_i = 1 - u_1 (γ_0 + ... + γ_{i-1}), where γ_0 = 1. At this point u_1 remains to be determined. Since we know h is the minimal solution to the right hand equations, we want to choose u_1 to be as large as possible. In the case Σ_{i=0}^∞ γ_i = ∞, the restriction 0 ≤ h_i ≤ 1 forces u_1 = 0 and h_i = 1 for all i. But if Σ_{i=0}^∞ γ_i < ∞ then we can take u_1 > 0 so long as 1 - u_1(γ_0 + ... + γ_{i-1}) ≥ 0 for all i. Thus the minimal non-negative solution occurs when u_1 = (Σ_{i=0}^∞ γ_i)^{-1}, and then

h_i = Σ_{j=i}^∞ γ_j / Σ_{j=0}^∞ γ_j.

In this case, for i = 1, 2, ..., we have h_i < 1, so the population survives with positive probability.
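The formula for h_i is easy to evaluate numerically. The sketch below is my own addition; the constant rates p_i = 2/3, q_i = 1/3 are an illustrative assumption, chosen so that γ_j = (1/2)^j and the answer can be cross-checked against gambler's ruin, where h_i = (q/p)^i = (1/2)^i.

```python
# Sketch (an addition): extinction probabilities for a birth-death chain via
# h_i = (sum_{j>=i} gamma_j) / (sum_{j>=0} gamma_j), as in Example 4.1.
# Assumed rates p_i = 2/3, q_i = 1/3 give gamma_j = (1/2)^j, so h_i = (1/2)^i.

p, q = 2/3, 1/3
N = 200                                      # truncate the geometric tails
gamma = [(q / p) ** j for j in range(N)]     # gamma_0 = 1, gamma_j = (q/p)^j here
total = sum(gamma)

def h(i):
    return sum(gamma[i:]) / total

print(h(1), h(2), h(3))   # approximately 1/2, 1/4, 1/8
```

For general variable rates one would build `gamma` by the running product of q_j/p_j instead of a closed form; the truncation at N is only valid when the tail of Σ γ_j is negligible.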
4.2 Mean hitting times are minimal solutions to RHEs

A similar result to Theorem 3.4 can be proved for mean hitting times.

Theorem 4.2. The vector of mean hitting times k^A = (k_i^A : i ∈ I) is the minimal non-negative solution to the system of linear equations

k_i^A = 0                          for i ∈ A,
k_i^A = 1 + Σ_{j∉A} p_ij k_j^A     for i ∉ A.        (4.1)

Proof. First we show that k^A satisfies (4.1). If X_0 = i ∈ A, then H^A = 0, so k_i^A = 0. If X_0 ∉ A, then H^A ≥ 1, so by the Markov property

E_i(H^A | X_1 = j) = 1 + E_j(H^A) = 1 + k_j^A,

and so, for i ∉ A,

k_i^A = Σ_{t=1}^∞ P_i(H^A ≥ t)
      = Σ_{t=1}^∞ Σ_{j∈I} P_i(H^A ≥ t | X_1 = j) P_i(X_1 = j)
      = Σ_{j∈I} ( Σ_{t=1}^∞ P_i(H^A ≥ t | X_1 = j) ) P_i(X_1 = j)
      = Σ_{j∈I} E_i(H^A | X_1 = j) P_i(X_1 = j)
      = Σ_{j∈I} p_ij (1 + k_j^A)
      = 1 + Σ_{j∉A} p_ij k_j^A.

In the third line above we use the fact that we can swap the order of the sums over t and j ∈ I for countable sums of non-negative terms (Fubini's theorem).

Suppose now that y = (y_i : i ∈ I) is any non-negative solution to (4.1). Then k_i^A = y_i = 0 for i ∈ A. Suppose i ∉ A; then

y_i = 1 + Σ_{j∉A} p_ij y_j
    = 1 + Σ_{j∉A} p_ij ( 1 + Σ_{k∉A} p_jk y_k )
    = P_i(H^A ≥ 1) + P_i(H^A ≥ 2) + Σ_{j,k∉A} p_ij p_jk y_k.

By repeated substitution we obtain

y_i = P_i(H^A ≥ 1) + ... + P_i(H^A ≥ n) + Σ_{j_1,...,j_n ∉ A} p_{i j_1} p_{j_1 j_2} ... p_{j_{n-1} j_n} y_{j_n}.
So since y_{j_n} is non-negative,

y_i ≥ lim_{n→∞} [ P_i(H^A ≥ 1) + ... + P_i(H^A ≥ n) ] = E_i(H^A) = k_i^A.

4.3 Stopping times

The Markov property says that for each time m, conditional on X_m = i, the process after time m is a Markov chain that begins afresh from i. What if we simply waited for the process to hit state i at some random time H?

A random variable T : Ω → {0, 1, 2, ...} ∪ {∞} is called a stopping time if the event {T = n} depends only on X_0, ..., X_n, for n = 0, 1, 2, .... Alternatively, T is a stopping time if {T = n} ∈ F_n for all n, where this means {T = n} = {(X_0, ..., X_n) ∈ B_n} for some B_n ⊆ I^{n+1}.

Intuitively, this means that by watching the process you can tell when T occurs. If asked to stop at T, you know when to stop.

Examples

1. The hitting time H_i = inf{n ≥ 0 : X_n = i} is a stopping time.
2. T = H_i + 1 is a stopping time.
3. T = H_i - 1 is not a stopping time.
4. T = L_i = sup{n ≥ 0 : X_n = i} is not a stopping time.

4.4 Strong Markov property

Theorem 4.4 (Strong Markov property). Let (X_n)_{n≥0} be Markov(λ, P) and let T be a stopping time of (X_n)_{n≥0}. Then conditional on T < ∞ and X_T = i, (X_{T+n})_{n≥0} is Markov(δ_i, P) and independent of the random variables X_0, X_1, ..., X_T.

Proof (not examinable). If B is an event determined by X_0, X_1, ..., X_T, then B ∩ {T = m} is an event determined by X_0, X_1, ..., X_m, so, by the Markov property at time m,

P({X_T = j_0, X_{T+1} = j_1, ..., X_{T+n} = j_n} ∩ B ∩ {T = m} ∩ {X_T = i})
  = P_i({X_0 = j_0, X_1 = j_1, ..., X_n = j_n}) P(B ∩ {T = m} ∩ {X_T = i}),

where we have used the condition T = m to replace m by T. Now sum over m = 0, 1, 2, ... and divide by P(T < ∞, X_T = i) to obtain

P({X_T = j_0, X_{T+1} = j_1, ..., X_{T+n} = j_n} ∩ B | T < ∞, X_T = i)
  = P_i({X_0 = j_0, X_1 = j_1, ..., X_n = j_n}) P(B | T < ∞, X_T = i).
Example 4.5 (Gambler's ruin again). Asymmetric random walk on the non-negative integers with p_{i,i+1} = p, p_{i,i-1} = q, where 0 < p = 1-q < 1, and 0 absorbing.

Let h_i = P_i(hit 0). Then

h_1 = p h_2 + q,   h_2 = h_1^2.

The fact that h_2 = h_1^2 is explained as follows. Firstly, by the strong Markov property,

P_2(hit 0) = P_2(hit 1) P_1(hit 0).

This is because any path which starts in state 2 and eventually hits state 0 must at some first time hit state 1, and then from state 1 reach state 0. Secondly,

P_2(hit 1) = P_1(hit 0).

This is because paths that go from 2 to 1 are in one-to-one correspondence with paths that go from 1 to 0 (just shifted to the right). So P_2(hit 0) = P_1(hit 0)^2.

So h_1 = p h_1^2 + q, and this implies h_1 = 1 or h_1 = q/p. We take the minimal solution.

Similarly, let k_i = E_i(time to hit 0). Assuming q > p,

k_1 = 1 + p k_2,
k_2 = 2 k_1   (by the strong Markov property),

=>  k_1 = 1/(1 - 2p) = 1/(q - p).
5 Recurrence and transience

5.1 Recurrence and transience

Let (X_n)_{n≥0} be a Markov chain with transition matrix P. Define

H_i = inf{n ≥ 0 : X_n = i} = hitting time of i,
T_i = inf{n ≥ 1 : X_n = i} = first passage time to i.

Note that H_i and T_i differ only if X_0 = i. Let

V_i = Σ_{n=0}^∞ 1{X_n = i} = number of visits to i,
f_i = P_i(T_i < ∞) = return probability to i,
m_i = E_i(T_i) = mean return time to i.

We say that i is recurrent if P_i(V_i = ∞) = 1. Otherwise i is transient. A recurrent state is one that you keep coming back to. A transient state is one that you eventually leave forever. In fact, as we shortly see, P_i(V_i = ∞) can only take the values 0 and 1 (not, say, 1/2).

5.2 Equivalence of recurrence and certainty of return

Lemma 5.1. For all k ≥ 0, P_i(V_i ≥ k+1) = (f_i)^k.

Proof. This is true for k = 0. Assume it is true for k-1. The kth visit to i is a stopping time, so by the strong Markov property

P_i(V_i ≥ k+1) = P_i(V_i ≥ k+1 | V_i ≥ k) P_i(V_i ≥ k) = P_i(T_i < ∞)(f_i)^{k-1} = (f_i)^k.

Hence the lemma holds by induction.

Theorem 5.2. We have the dichotomy

i is recurrent  <=>  f_i = 1,
i is transient  <=>  f_i < 1.

Proof. Observe that

P_i(V_i < ∞) = Σ_{k=1}^∞ P_i(V_i = k) = Σ_{k=1}^∞ (1 - f_i) f_i^{k-1} = { 0 if f_i = 1; 1 if f_i < 1 }.

So i is recurrent iff f_i = 1, and transient iff f_i < 1.
5.3 Equivalence of transience and summability of n-step transition probabilities

Theorem 5.3. We have the dichotomy

i is recurrent  <=>  Σ_{n=0}^∞ p_ii^(n) = ∞  <=>  f_i = 1,
i is transient  <=>  Σ_{n=0}^∞ p_ii^(n) < ∞  <=>  f_i < 1.

Proof. If i is recurrent, this means P_i(V_i = ∞) = 1, and so

Σ_{n=0}^∞ p_ii^(n) = Σ_{n=0}^∞ E_i(1{X_n = i}) = E_i( Σ_{n=0}^∞ 1{X_n = i} ) = E_i(V_i) = ∞.

If i is transient then (by Theorem 5.2) f_i < 1 and

Σ_{n=0}^∞ p_ii^(n) = E_i(V_i) = Σ_{r=0}^∞ P_i(V_i > r) = Σ_{r=0}^∞ f_i^r = 1/(1 - f_i) < ∞.

5.4 Recurrence as a class property

Now we can show that recurrence and transience are class properties.

Theorem 5.4. Let C be a communicating class. Then either all states in C are transient or all are recurrent.

Proof. Take any pair of states i, j ∈ C and suppose that i is transient. There exist n, m ≥ 0 with p_ij^(n) > 0 and p_ji^(m) > 0, and, for all r ≥ 0,

p_ii^(n+m+r) ≥ p_ij^(n) p_jj^(r) p_ji^(m),

so

Σ_{r=0}^∞ p_jj^(r) ≤ ( 1 / (p_ij^(n) p_ji^(m)) ) Σ_{r=0}^∞ p_ii^(n+m+r) < ∞,

and so j is transient by Theorem 5.3.

In the light of this theorem it is natural to speak of a recurrent or transient class.
The following gives an alternative way to reach the conclusion that recurrence and transience are class properties. In lectures we will probably omit this. But it is nice to see the contrast with the matrix method used in Theorem 5.4.

Theorem 5.5. Suppose i is recurrent and i → j. Then (a) P_j(H_i < ∞) = 1, (b) P_i(H_j < ∞) = 1, (c) j is recurrent.

Note that (c) implies that recurrence and transience are class properties (see also Theorem 5.4). Also, (a) implies that every recurrent class is closed (see also Theorem 5.6).

Proof. (a) By the strong Markov property,

0 = P_i(H_i = ∞) ≥ P_i(H_j < ∞) P_j(H_i = ∞).

But P_i(H_j < ∞) > 0, so P_j(H_i = ∞) = 0, i.e. P_j(H_i < ∞) = 1, which is (a).

(b) Let T_i^(0) = 0 and let T_i^(k) be the time of the kth return to i. Consider the events

A_k = { X_n = j for some T_i^(k-1) < n < T_i^(k) }.

Let ρ = P_i(A_k). The A_k are independent, so

P_i(H_j < ∞) = P_i( ∪_k A_k ) = 1 - P_i( ∩_k A_k^c ) = { 0 if ρ = 0; 1 if ρ > 0 }.

Now i → j implies ρ > 0, which forces P_i(H_j < ∞) = 1, which is (b).

Finally, by (a), (b) and the strong Markov property,

P_j(T_j < ∞) ≥ P_j(H_i < ∞) P_i(H_j < ∞) = 1.

5.5 Relation with closed classes

Theorem 5.6. Every recurrent class is closed.

Proof. If C is not closed then there exist i ∈ C and j ∉ C with i → j and m ≥ 1 such that P_i(X_m = j) > 0. Now since j does not lead back to i,

P_i(V_i = ∞ | X_m = j) = 0,

and this implies that

P_i(V_i = ∞) = Σ_k P_i(V_i = ∞ | X_m = k) P_i(X_m = k) < 1.

So i is not recurrent, and so neither is C.
Theorem 5.7. Every finite closed class is recurrent.

Proof. Let C be such a class. Pick any initial distribution on C. Then Σ_{i∈C} V_i = ∞. Since C is finite, some state must be visited infinitely often. So

1 = P( ∪_{i∈C} {V_i = ∞} ) ≤ Σ_{i∈C} P(V_i = ∞).

So for some i,

0 < P(V_i = ∞) = P(H_i < ∞) P_i(V_i = ∞).

But since P_i(V_i = ∞) can only take values 0 or 1, we must have P_i(V_i = ∞) = 1. Thus i is recurrent, and so also C is recurrent.

We will need the following result for Theorem 9.8.

Theorem 5.8. Suppose P is irreducible and recurrent. Then for all j ∈ I we have P(T_j < ∞) = 1.

Proof. By the Markov property we have

P(T_j < ∞) = Σ_{i∈I} P(X_0 = i) P_i(T_j < ∞),

so it suffices to show that P_i(T_j < ∞) = 1 for all i ∈ I. Choose m with p_ji^(m) > 0. By Theorem 5.3 we have

1 = P_j(X_n = j for infinitely many n)
  ≤ P_j(X_n = j for some n ≥ m+1)
  = Σ_{k∈I} P_j(X_n = j for some n ≥ m+1 | X_m = k) P_j(X_m = k)
  = Σ_{k∈I} P_k(T_j < ∞) p_jk^(m).

But Σ_{k∈I} p_jk^(m) = 1, so we must have P_i(T_j < ∞) = 1.

We conclude with a criterion by which to tell whether an irreducible chain is transient.

Theorem 5.9. An irreducible Markov chain is transient iff for some state i there exists a nonzero vector y such that y_j = Σ_{k≠i} p_jk y_k for all j ≠ i, and |y_j| ≤ 1 for all j.

*Proof*. If the chain is transient then we may take y_j = P_j(T_i = ∞), which is nonzero and satisfies the equations. On the other hand, if there exists a y as described in the theorem statement, then |y_j| ≤ Σ_{k≠i} p_jk |y_k|. By repeated substitution of this into itself (in the manner of the proofs of Theorems 3.4 and 4.2) we can show

|y_j| ≤ Σ_{k_1,...,k_m ≠ i} p_{j k_1} p_{k_1 k_2} ... p_{k_{m-1} k_m} = P_j(T_i > m).

Taking a limit as m → ∞, we have P_j(T_i = ∞) > 0, and so the chain is transient.
6 Random walks in dimensions one, two and three

6.1 Simple random walk on Z

Example 6.1 (Simple random walk on Z). The simple random walk on Z moves right with probability p and left with probability q, where 0 < p = 1-q < 1.

Consider f_0 = P_0(return to 0). Let h_i = P_i(hit 0). Then

f_0 = q h_{-1} + p h_1.

Using the strong Markov property, h_1 = q + p h_1^2. The smallest solution is h_1 = min{1, q/p}; similarly h_{-1} = min{1, p/q}. So if q = p = 1/2 then h_1 = h_{-1} = 1, hence f_0 = 1 and the random walk is recurrent. But if q < p then h_1 < 1, which implies f_0 < 1 and transience. Similarly, if q > p then h_{-1} < 1 and we have transience.

We may also analyse this random walk using Theorem 5.3, testing to see whether Σ_n p_00^(n) is finite or infinite. This is a more complicated method, but we now do it as a warm-up for our subsequent analysis of random walk on Z^2.

Suppose we start at 0. Obviously, we cannot return to 0 after an odd number of steps, so p_00^(2n+1) = 0 for all n. Any given sequence of steps of length 2n from 0 to 0 occurs with probability p^n q^n, there being n steps right and n steps left, and the number of such sequences is the number of ways of choosing the n right steps from 2n. Thus

p_00^(2n) = C(2n, n) p^n q^n.

Stirling's approximation provides an approximation to n! for large n: it is known that n! ~ A √n (n/e)^n as n → ∞ for some A ∈ [1, ∞) (we do not need here that A = √(2π)). Thus

p_00^(2n) = (2n)!/(n!)^2 (pq)^n ~ (4pq)^n / (A √(n/2)) as n → ∞.

In the symmetric case p = q = 1/2 we have 4pq = 1, and so for some N and all n ≥ N,

p_00^(2n) ≥ 1/(A √n),
26 so n=n p (n) 00 A n=n n = which show that the random walk is recurrent. On the other hand, if p q then 4pq = r <, so by a similar argument, for some N, n=n and so the random walk is transient. p (n) 00 A r n < n=n 6. Simple symmetric random walk on Z Example 6. (Simple symmetric random walk on Z ). The simple random walk on Z has diagram and transition probabilities 4 p ij = { /4 if i j = 0 otherwise. Suppose we start at 0. Let us call the walk X n and write X n + and X n of the walk on the diagonal lines y = ±x. for the projection X + n X n X n ThenX n + andxn areindependent symmetricrandomwalkson / Z, and X n = 0 if and only if X n + = 0 = X n. This makes clear that for X n we have ( (n )( ) ) n p (n) 00 = n A n as n
by Stirling's formula. Then Σ_{n=0}^∞ p^(2n)_{00} = ∞ by comparison with Σ_{n=1}^∞ 1/n, and the walk is recurrent.

6.3 Simple symmetric random walk on Z³

Example 6.3 (Simple symmetric random walk on Z³). The transition probabilities are given by

p_ij = 1/6 if |i − j| = 1, and 0 otherwise.

Thus the chain jumps to each of its six nearest neighbours with equal probability. There is no easy way to map this into independent walks on Z as we have done above for the random walk on Z² (although see §6.4 below). Suppose we start at 0. We can only return to zero after an even number of steps, say 2n. Of these 2n steps there must be i up, i down, j north, j south, k east, k west, for some i, j, k ≥ 0 with i + j + k = n. By counting the ways this can be done we get

p^(2n)_{00} = Σ_{i,j,k≥0, i+j+k=n} (2n)! / (i! j! k!)² (1/6)^{2n}
            = C(2n, n) (1/2)^{2n} Σ_{i,j,k≥0, i+j+k=n} ( n! / (i! j! k!) )² (1/3)^{2n}.

Now

Σ_{i,j,k≥0, i+j+k=n} ( n! / (i! j! k!) ) (1/3)^n = 1,

the left-hand side being the total probability of all the ways of placing n balls in 3 boxes. For the case where n = 3m we have

n! / (i! j! k!) ≤ n! / (m! m! m!)

for all i, j, k with i + j + k = n, so

p^(2n)_{00} ≤ C(2n, n) (1/2)^{2n} ( n! / (m! m! m!) ) (1/3)^n ∼ (1/(2A³)) (6/n)^{3/2} as n → ∞.

Hence Σ_{m=0}^∞ p^(6m)_{00} < ∞ by comparison with Σ_{n=1}^∞ n^{−3/2}. But p^(6m)_{00} ≥ (1/6)² p^(6m−2)_{00} and p^(6m)_{00} ≥ (1/6)⁴ p^(6m−4)_{00}, so for all m we must have Σ_{n=0}^∞ p^(2n)_{00} < ∞, and the walk is transient.

In fact, it can be shown that the probability of return to the origin is about 0.34. The above results were first proved in 1921 by the Hungarian mathematician George Pólya (1887–1985).
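Both calculations in this lecture can be checked numerically. The sketch below (function names and parameters are my own, not from the notes) computes the exact 2n-step return probabilities in d = 1, 2, 3 via the formulas above, and estimates the return probability f_0 of the asymmetric walk by a seeded simulation; for p = 0.7 the minimal-solution formula gives f_0 = q·1 + p·(q/p) = 2q = 0.6.

```python
import random
from math import comb, factorial

def p00(d, n):
    """Exact P(simple symmetric random walk on Z^d is at 0 after 2n steps)."""
    if d == 1:
        return comb(2 * n, n) / 4 ** n
    if d == 2:
        # the two independent diagonal projections must both return
        return (comb(2 * n, n) / 4 ** n) ** 2
    if d == 3:
        s = sum((factorial(n) // (factorial(i) * factorial(j) * factorial(n - i - j))) ** 2
                for i in range(n + 1) for j in range(n + 1 - i))
        return comb(2 * n, n) * s / 6 ** (2 * n)
    raise ValueError("d must be 1, 2 or 3")

def return_prob(p):
    """f0 = q*h(-1) + p*h(1) for simple random walk on Z, using the
    minimal solutions h(1) = min(1, q/p) and h(-1) = min(1, p/q)."""
    q = 1 - p
    return q * min(1.0, p / q) + p * min(1.0, q / p)

def simulate_return(p, trials=4000, horizon=400, seed=1):
    """Seeded Monte Carlo estimate of P_0(return to 0 within `horizon` steps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = 0
        for _ in range(horizon):
            x += 1 if rng.random() < p else -1
            if x == 0:
                hits += 1
                break
    return hits / trials

print(p00(1, 1), p00(2, 1), p00(3, 1))   # 0.5, 0.25, 0.1666...
print(return_prob(0.7))                  # 0.6 = 1 - |p - q|
print(simulate_return(0.7))              # close to 0.6
```

Partial sums of p00(d, n) over n diverge for d = 1, 2 (recurrence) and converge for d = 3 (transience), in line with Theorem 5.3.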
6.4 *A continuized analysis of random walk on Z³*

There is a nice way to analyse the random walk on Z³ that avoids the need for the intricacies above. Consider six independent processes X⁺(t), X⁻(t), Y⁺(t), Y⁻(t), Z⁺(t), Z⁻(t). Each is nondecreasing in continuous time on the states 0, 1, 2, ..., making a transition +1 at times that are separated as i.i.d. exponential(1/2) random variables. Consequently, successive jumps (in one of the 6 directions) are separated as i.i.d. exponential(3) random variables (i.e. as events in a Poisson process of rate 3). Also, P(X⁺(t) = i) = (t/2)^i e^{−t/2} / i!, and similarly for the others. At time t,

P(X⁺(t) = X⁻(t)) = Σ_{i=0}^∞ ( (t/2)^i e^{−t/2} / i! )² = I₀(t) e^{−t} ∼ (2πt)^{−1/2},

where I₀ is a modified Bessel function of the first kind of order 0. Thus, for this continuous-time process,

p_{00}(t) ∼ (2πt)^{−3/2}.

Notice that this process observed at its jump times is equivalent to the discrete-time random walk on Z³. Since the mean time between the jumps of the continuous process is 1/3, we find

Σ_{n=0}^∞ p^(n)_{00} = 3 ∫_0^∞ p_{00}(t) dt < ∞.

6.5 *Feasibility of wind instruments*

Example 6.4. Lord Rayleigh, in On the theory of resonance (1899), proposed a model for wind instruments in which the creation of resonance through a vibrating column of air requires repeated expansion and contraction of a mass of air at the mouth of the instrument, air being modelled as an incompressible fluid.

Think instead about an infinite rectangular lattice of cities. City (0,0) wishes to expand its tax base, and does this by inducing a business from a neighbouring city to relocate to it. The impoverished city does the same (choosing to beggar-its-neighbour randomly amongst its 4 neighbours, since "beggars can't be choosers"), and this continues, just like a 2-D random walk. Unfortunately, this means that with probability 1 the walk returns to the origin city, which eventually finds that one of its own businesses is induced away by one of its neighbours, leaving it no better off than at the start.
We might say that it is infinitely hard to expand the tax base by a beggar-your-neighbour policy. However, in 3-D there is a positive probability (about 0.66) that the city (0,0) will never be beggared by one of its 6 neighbours. By analogy, we see that in 2-D it will be infinitely hard to expand the air at the mouth of the wind instrument, but in 3-D the energy required is finite. That is why Doyle and Snell say that wind instruments are possible in our 3-dimensional world, but not in Flatland. We will learn, in a later lecture, something more about the method that Rayleigh used to show that the energy required to create a vibrating column of air in 3-D is finite.
7 Invariant distributions

7.1 Examples of invariant distributions

Many of the long-time properties of Markov chains are connected with the notion of an invariant distribution or measure. Remember that a measure λ is any row vector (λ_i : i ∈ I) with non-negative entries. A measure λ is an invariant measure if λP = λ. An invariant measure λ is an invariant distribution if Σ_i λ_i = 1. The terms equilibrium, stationary, and steady state are also used to mean the same thing.

Example 7.1. Here are some invariant distributions for some 2-state Markov chains; in each case one checks λP = λ:

P = ( 0    1   ;  1    0   ),   λ = (1/2, 1/2);
P = ( 1/2  1/2 ;  1/2  1/2 ),   λ = (1/2, 1/2);
P = ( 1/2  1/2 ;  1/4  3/4 ),   λ = (1/3, 2/3).

(A) Does an invariant measure always exist? Can there be more than one?
(B) Does an invariant distribution always exist? Can there be more than one?
(C) What does an invariant measure or distribution tell me about the chain?
(D) How do I calculate π? (Left hand equations, detailed balance, symmetry.)

7.2 Notation

We repeat and add to the notation definitions of Section 5.

T_i = inf{n ≥ 1 : X_n = i} = first passage time to i
m_i = E_i(T_i) = mean return time to i
V_i(n) = Σ_{k=0}^{n−1} 1{X_k = i} = number of visits to i before time n
V_i^k = V_i(T_k) = number of visits to i before the first return to k
γ_i^k = E_k(V_i^k) = mean number of visits to i between successive visits to k

Notice that if X_0 = k then V_k^k = 1, and hence γ_k^k = 1.
7.3 What does an invariant measure or distribution tell us?

Suppose π is an invariant distribution and λ is an invariant measure. Under suitable conditions,

m_i = 1/π_i,   γ_i^j = π_i/π_j = λ_i/λ_j,

and

P( V_j(n)/n → π_j as n → ∞ ) = 1,

or, as we say, V_j(n)/n → π_j almost surely.

Example 7.2. Consider the two-state chain with

P = ( 1−α  α ;  β  1−β ).

Then

P^n → ( β/(α+β)  α/(α+β) ;  β/(α+β)  α/(α+β) ) = ( π_1  π_2 ;  π_1  π_2 ),

and π = (π_1, π_2) = (β/(α+β), α/(α+β)) is an invariant distribution.

7.4 Invariant distribution is the solution to LHEs

How do we calculate the π_i?

Example 7.3. Consider again Example 1.1, with

P = ( 0    1    0   ;
      0    1/2  1/2 ;
      1/2  0    1/2 ).

To find an invariant distribution we write down the components of the vector equation π = πP. (We call these left hand equations (LHEs) as π appears on the left of P.)

π_1 = (1/2) π_3
π_2 = π_1 + (1/2) π_2
π_3 = (1/2) π_2 + (1/2) π_3

The right-hand sides give the probabilities for X_1, when X_0 has distribution π, and the equations require X_1 also to have distribution π. The equations are homogeneous,
so one of them is redundant, and another equation is required to fix π uniquely. That equation is

π_1 + π_2 + π_3 = 1,

and so we find that π = (1/5, 2/5, 2/5). Recall that we found earlier that p^(n)_{11} → 1/5 as n → ∞, so this confirms Theorem 7.6, below. Alternatively, knowing that p^(n)_{11} has the form

p^(n)_{11} = a + (1/2)^n ( b cos(nπ/2) + c sin(nπ/2) ),

we can use Theorem 7.6 and knowledge of π to identify a = 1/5, instead of computing p^(n)_{11} from scratch as before.

Alternatively, we may sometimes find π from the detailed balance equations, π_i p_ij = π_j p_ji for all i, j. If these equations have a solution then it is an invariant distribution for P (since by summing over i we get Σ_i π_i p_ij = π_j Σ_i p_ji = π_j). For example, in Example 7.2 the detailed balance equation is π_1 α = π_2 β. Together with π_1 + π_2 = 1 this gives π = (β, α)/(α+β). We will say more about this in the lecture on time reversal and detailed balance.

Example 7.4. Consider a success-run chain on {0, 1, 2, ...}, whose transition probabilities are given by

p_{i,i+1} = p_i,   p_{i0} = q_i = 1 − p_i;

from state i the chain either moves up to i+1 (with probability p_i) or falls back to 0 (with probability q_i). Then the left-hand (invariant measure) equations read

π_0 = Σ_{i=0}^∞ π_i q_i,
π_i = π_{i−1} p_{i−1}, for i ≥ 1.

Let r_0 = 1 and r_i = p_0 p_1 ⋯ p_{i−1}. So π_i = r_i π_0. We now show that an invariant measure may not exist. Suppose we choose p_i converging sufficiently rapidly to 1 so that

r := Π_{i=0}^∞ p_i > 0

(which is equivalent to Σ_{i=0}^∞ q_i < ∞).
So

q_i π_i = (1 − p_i) r_i π_0 = (r_i − r_{i+1}) π_0.

Also

π_0 = Σ_{i=0}^∞ π_i q_i = lim_{n→∞} Σ_{i=0}^n π_i q_i = lim_{n→∞} Σ_{i=0}^n (r_i − r_{i+1}) π_0 = lim_{n→∞} (r_0 − r_{n+1}) π_0 = (1 − r) π_0,

so π_0 = 0 and there is no invariant measure.

7.5 Stationary distributions

The following result explains why invariant distributions are also called stationary distributions.

Theorem 7.5. Let (X_n)_{n≥0} be Markov(λ, P) and suppose that λ is invariant for P. Then (X_{m+n})_{n≥0} is also Markov(λ, P).

Proof. By the Markov property we have P(X_m = i) = (λP^m)_i = λ_i for all i, and clearly, conditional on X_{m+n} = i, X_{m+n+1} is independent of X_m, X_{m+1}, ..., X_{m+n} and has distribution (p_ij : j ∈ I).

7.6 Equilibrium distributions

The next result explains why invariant distributions are also called equilibrium distributions.

Theorem 7.6. Let I be finite. Suppose for some i ∈ I that p^(n)_{ij} → π_j as n → ∞, for all j ∈ I. Then π = (π_j : j ∈ I) is an invariant distribution.

Proof. We have

Σ_{j∈I} π_j = Σ_{j∈I} lim_{n→∞} p^(n)_{ij} = lim_{n→∞} Σ_{j∈I} p^(n)_{ij} = 1

and

π_j = lim_{n→∞} p^(n+1)_{ij} = lim_{n→∞} Σ_{k∈I} p^(n)_{ik} p_{kj} = Σ_{k∈I} lim_{n→∞} p^(n)_{ik} p_{kj} = Σ_{k∈I} π_k p_{kj},

where we have used the finiteness of I to justify the interchange of summation and limit operations. Hence π is an invariant distribution.

Notice that for any of the symmetric random walks discussed previously we have p^(n)_{ij} → 0 as n → ∞ for all i, j ∈ I. The limit is certainly invariant, but it is not a distribution. Theorem 7.6 is not a very useful result, but it serves to indicate a relationship between invariant distributions and n-step transition probabilities. In Theorem 9.8 we shall prove a sort of converse, which is much more useful.
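Theorem 7.6 also suggests a numerical route to an invariant distribution: push any initial distribution through the chain until it stops changing. A minimal sketch (pure Python; function names are mine), using the three-state matrix that I take Example 7.3 to have:

```python
def step(dist, P):
    """One application of the left-hand equations: dist -> dist P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0, 1, 0], [0, 0.5, 0.5], [0.5, 0, 0.5]]   # assumed matrix of Example 7.3
pi = [1.0, 0.0, 0.0]                            # any starting distribution
for _ in range(200):
    pi = step(pi, P)

print(pi)   # close to (1/5, 2/5, 2/5)
```

The subdominant eigenvalues of this P have modulus 1/2, so the iteration converges geometrically fast.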
8 Existence and uniqueness of invariant distribution, mean return time, positive and null recurrence

8.1 Existence and uniqueness up to constant multiples

Example 8.1 (Non-uniqueness of invariant distribution).

P = ( 1  0 ;  0  1 ).

All distributions are invariant. (The Example Sheet has a less trivial example.)

Example 8.2 (Non-uniqueness with irreducibility). Suppose p ≠ q, and consider the random walk on Z that jumps right with probability p and left with probability q = 1 − p. Invariance requires

π_i = π_{i−1} p + π_{i+1} q.

Set λ_i = 1 and μ_i = (p/q)^i, i ∈ Z. Then both λ and μ are invariant measures. However, this Markov chain is transient.

The next theorem shows that an invariant measure with 0 < λ_i < ∞ for some i must have 0 < λ_j < ∞ for all j.

Theorem 8.3. Suppose P is irreducible, λ ≥ 0 and λ = λP. Then λ ≡ 0, or (0 < λ_i < ∞ for all i), or λ ≡ ∞.

Proof. λ = λP = λP^n. Given i, j there exists n with p^(n)_{ij} > 0 (since P is irreducible), so

λ_j = Σ_k λ_k p^(n)_{kj} ≥ λ_i p^(n)_{ij}.

So λ_j < ∞ implies λ_i < ∞, and λ_i > 0 implies λ_j > 0.

In the next two results we shall show that every irreducible and recurrent stochastic matrix P has an essentially unique positive invariant measure. The proofs rely heavily on the probabilistic interpretation, so it is worth noting at the outset that, for a finite state-space I, the existence of an invariant row vector is a simple piece of linear algebra: the row sums of P are all 1, so the column vector of ones is an eigenvector with eigenvalue 1, and so P must also have a row eigenvector with eigenvalue 1.

Recall that for a fixed state k, we define for each i the expected time spent in i between visits to k:

γ_i^k = E_k Σ_{n=0}^{T_k − 1} 1{X_n = i}.
Here the sum of indicator functions counts the number of times at which X_n = i before the first passage to k at time T_k.

Theorem 8.4 (Existence of an invariant measure). Let P be irreducible and recurrent. Then

(i) γ_k^k = 1.
(ii) γ^k = (γ_i^k : i ∈ I) satisfies γ^k P = γ^k.
(iii) 0 < γ_i^k < ∞ for all i.

Proof. (i) is obvious. (ii) For n = 1, 2, ... the event {n ≤ T_k} depends only on X_0, ..., X_{n−1}, so, by the Markov property at time n−1,

P_k(X_{n−1} = i, X_n = j and n ≤ T_k) = P_k(X_{n−1} = i and n ≤ T_k) p_ij.   (8.1)

Since P is recurrent, we have P_k(T_k < ∞) = 1. This means that we can partition the entire sample space by events of the form {T_k = t}, t = 1, 2, .... Also, X_0 = X_{T_k} = k with probability 1. So for all j (including j = k),

γ_j^k = E_k Σ_{n=1}^{T_k} 1{X_n = j}
      = E_k Σ_{t=1}^∞ Σ_{n=1}^t 1{X_n = j and T_k = t}
      = E_k Σ_{n=1}^∞ Σ_{t=n}^∞ 1{X_n = j and T_k = t}
      = E_k Σ_{n=1}^∞ 1{X_n = j and n ≤ T_k}   (since T_k is finite)
      = Σ_{n=1}^∞ Σ_{i∈I} P_k(X_{n−1} = i, X_n = j and n ≤ T_k)
      = Σ_{i∈I} p_ij Σ_{n=1}^∞ P_k(X_{n−1} = i and n ≤ T_k)   (using (8.1))
      = Σ_{i∈I} p_ij E_k Σ_{n=1}^∞ 1{X_{n−1} = i and n ≤ T_k}
      = Σ_{i∈I} p_ij E_k Σ_{m=0}^∞ 1{X_m = i and m ≤ T_k − 1}
      = Σ_{i∈I} p_ij E_k Σ_{m=0}^{T_k − 1} 1{X_m = i}
      = Σ_{i∈I} γ_i^k p_ij.

The two middle steps, in which the sum is partitioned according to the value of T_k, are sort of optional, but they help you to see how we are using the assumption that T_k is finite, i.e. P_k(T_k < ∞) = 1.

(iii) Since P is irreducible, for each i there exist n, m ≥ 0 with p^(n)_{ik}, p^(m)_{ki} > 0. Then, by (i) and (ii), γ_i^k ≥ γ_k^k p^(m)_{ki} > 0 and γ_i^k p^(n)_{ik} ≤ γ_k^k = 1, so γ_i^k < ∞.
Theorem 8.5 (Uniqueness of an invariant measure). Let P be irreducible and let λ be an invariant measure for P with λ_k = 1. Then λ ≥ γ^k. If in addition P is recurrent, then λ = γ^k.

Proof. For each j ∈ I we have

λ_j = Σ_{i_0∈I} λ_{i_0} p_{i_0 j} = p_{kj} + Σ_{i_0≠k} λ_{i_0} p_{i_0 j}
    = p_{kj} + Σ_{i_0≠k} p_{k i_0} p_{i_0 j} + Σ_{i_0,i_1≠k} λ_{i_1} p_{i_1 i_0} p_{i_0 j}
    = ⋯
    = p_{kj} + Σ_{i_0≠k} p_{k i_0} p_{i_0 j} + ⋯ + Σ_{i_0,...,i_{n−1}≠k} p_{k i_{n−1}} ⋯ p_{i_1 i_0} p_{i_0 j}
      + Σ_{i_0,...,i_n≠k} λ_{i_n} p_{i_n i_{n−1}} ⋯ p_{i_0 j}
    ≥ P_k(X_1 = j and T_k ≥ 1) + P_k(X_2 = j and T_k ≥ 2) + ⋯ + P_k(X_n = j and T_k ≥ n)
    → γ_j^k as n → ∞.

So λ ≥ γ^k. If P is recurrent, then γ^k is invariant by Theorem 8.4, so μ = λ − γ^k is also invariant and μ ≥ 0. Since P is irreducible, given i ∈ I, we have p^(n)_{ik} > 0 for some n, and 0 = μ_k = Σ_j μ_j p^(n)_{jk} ≥ μ_i p^(n)_{ik}, so μ_i = 0.

8.2 Mean return time, positive and null recurrence

Recall that a state is recurrent if P_i(V_i = ∞) = 1, i.e. P_i(X_n = i for infinitely many n) = 1. We showed in Theorem 5.3 that this is equivalent to f_i = 1, i.e. P_i(T_i < ∞) = 1. If in addition the mean return time (or expected return time) m_i = E_i(T_i) is finite, then we say that i is positive recurrent. A recurrent state that fails to have this property, i.e. with m_i = ∞, is called null recurrent. We show in the next lecture that the following trichotomy holds for an irreducible Markov chain:

- P is transient;
- P is null recurrent, with m_i = ∞ for all i;
- P is positive recurrent, with an invariant distribution π, and m_i (= 1/π_i) < ∞ for all i.
8.3 Random surfer

Example 8.6 (Random surfer and PageRank). The designer of a search engine tries to please three constituencies: searchers (who want useful search results), paying advertisers (who want to see "click-throughs"), and the search provider itself (who wants to maximize advertising revenue). The order in which search results are listed is a very important part of the design.

Google PageRank lists search results according to the probabilities with which a person randomly clicking on links will arrive at any particular page. Let us model the web as a graph G = (V, E) in which the vertices of the graph correspond to web pages. Suppose |V| = n. There is a directed edge from i to j if and only if page i contains a link to page j. Imagine that a random surfer jumps from his current page i by randomly choosing an outgoing link from amongst the L(i) links if L(i) > 0, or, if L(i) = 0, chooses a random page. Let

p̂_ij = 1/L(i) if L(i) > 0 and (i, j) ∈ E;   p̂_ij = 1/n if L(i) = 0;   p̂_ij = 0 otherwise.

To avoid the possibility that P̂ = (p̂_ij) might not be irreducible or aperiodic (a complicating property that we discuss in the next lecture) we make a small adjustment, choosing α ∈ [0, 1), and set

p_ij = α p̂_ij + (1 − α)(1/n).

In other words, we imagine that at each click the surfer chooses, with probability α, a random page amongst those that are linked from his current page (or a random page if there are no such links), and with probability 1 − α chooses a completely random page.

The invariant distribution satisfying πP = π tells us the proportions of time that a random surfer will spend on the various pages. So if π_i > π_j then page i is more important than page j and should be ranked higher. The solution of πP = π was at the heart of Google's (original) page-ranking algorithm. In fact, it is not too difficult to estimate π, because P^n converges very quickly to its limit Π, the matrix in which every row is π:

Π = ( π_1  π_2  ⋯  π_n ;
      π_1  π_2  ⋯  π_n ;
       ⋮    ⋮        ⋮  ;
      π_1  π_2  ⋯  π_n ).

These days the ranking algorithm is more complicated and a trade secret.
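A minimal PageRank sketch on a hypothetical four-page web (the link structure and the choice α = 0.85 are my own illustration, not from the notes): build p_ij = α p̂_ij + (1−α)/n, including the dangling-page rule, and find π by power iteration.

```python
n = 4
links = {0: [1, 2], 1: [2], 2: [0], 3: []}   # page 3 has no outgoing links
alpha = 0.85

P = [[0.0] * n for _ in range(n)]
for i in range(n):
    out = links[i]
    for j in range(n):
        # phat: uniform over outgoing links, or uniform over all pages if dangling
        phat = (1 / len(out)) if out and j in out else (0.0 if out else 1 / n)
        P[i][j] = alpha * phat + (1 - alpha) / n

pi = [1 / n] * n
for _ in range(200):                          # power iteration: pi <- pi P
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

print(pi)        # invariant distribution; well-linked pages rank higher
print(sum(pi))   # 1.0 up to rounding
```

Page 3 receives visits only through teleportation, so it ends up with the smallest PageRank.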
9 Convergence to equilibrium for ergodic chains

9.1 Equivalence of positive recurrence and the existence of an invariant distribution

Theorem 9.1. Let P be irreducible. Then the following are equivalent:

(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) P has an invariant distribution, π say.

Moreover, when (iii) holds we have m_i = 1/π_i for all i.

Proof. (i) ⇒ (ii) is obvious.

(ii) ⇒ (iii) If i is positive recurrent, it is certainly recurrent, so P is recurrent. By Theorem 8.4, γ^i is then invariant. But Σ_{j∈I} γ_j^i = m_i < ∞, so π_k = γ_k^i / m_i defines an invariant distribution.

(iii) ⇒ (i) Since Σ_{i∈I} π_i = 1, we have π_k > 0 for some k. By Theorem 8.3 we must therefore have π_k > 0 for all k. (Alternatively: for any k, we must have π_k = Σ_{i∈I} π_i p^(n)_{ik} > 0 for some n.) Pick any k and set λ_i = π_i/π_k. Then λ is an invariant measure with λ_k = 1, so by Theorem 8.5, λ ≥ γ^k. Hence

m_k = Σ_{i∈I} γ_i^k ≤ Σ_{i∈I} π_i/π_k = 1/π_k < ∞,   (9.1)

and so k is positive recurrent for every k, which is (i).

Finally, we have just shown that (iii) implies that P is recurrent. So by Theorem 8.5, λ = γ^k, and the inequality in (9.1) is in fact an equality. Thus m_k = 1/π_k.

Example 9.2 (Simple symmetric random walk on Z). The simple symmetric random walk on Z is clearly irreducible, and by Example 6.1 it is recurrent. Consider the measure λ_i = 1 for all i. Then

λ_i = λ_{i−1} (1/2) + λ_{i+1} (1/2),

and so λ is invariant. Any invariant measure must be a scalar multiple of λ. Since Σ_{i∈I} λ_i = ∞, there can be no invariant distribution, and so the walk is null recurrent.

Example 9.3. The existence of an invariant measure does not guarantee recurrence. Consider, for example, the simple symmetric random walk on Z³, which is transient, but has invariant measure λ given by λ_i = 1 for all i.
9.2 Aperiodic chains

We next turn to an investigation of the limiting behaviour of the n-step transition probabilities p^(n)_{ij} as n → ∞. As we saw in Theorem 7.6, if the state-space is finite and if for some i the limit p^(n)_{ij} → π_j exists for all j, then π must be an invariant distribution. But, as the following example shows, the limit does not always exist.

Example 9.4. Consider the two-state chain with transition matrix

P = ( 0  1 ;  1  0 ).

Then P² = I, so P^(2n) = I and P^(2n+1) = P. Thus p^(n)_{ij} fails to converge for all i, j.

Let us call a state i aperiodic if p^(n)_{ii} > 0 for all sufficiently large n. The behaviour of the chain in Example 9.4 is connected with its periodicity.

Lemma 9.5. A state i is aperiodic if there exist n_1, ..., n_k, k ≥ 1, with greatest common divisor 1 and such that p^(n_j)_{ii} > 0 for all j = 1, ..., k.

Proof. For all sufficiently large n we can write n = a_1 n_1 + ⋯ + a_k n_k using some non-negative integers a_1, ..., a_k. So then

p^(n)_{ii} ≥ ( p^(n_1)_{ii} )^{a_1} ⋯ ( p^(n_k)_{ii} )^{a_k} > 0.

Similarly, if d is the greatest common divisor of all those n for which p^(n)_{ii} > 0, and d ≥ 2, then for all n that are sufficiently large and divisible by d we can write n = a_1 n_1 + ⋯ + a_k n_k for some n_1, ..., n_k such that p^(n_1)_{ii}, ..., p^(n_k)_{ii} > 0. It follows that for all n sufficiently large, p^(n)_{ii} > 0 if and only if n is divisible by d. This shows that a state i which is not aperiodic is rightly called periodic, with period d. One can also show that if two states communicate then they must have the same period.

Lemma 9.6. Suppose P is irreducible and has an aperiodic state i. Then, for all states j and k, p^(n)_{jk} > 0 for all sufficiently large n. In particular, all states are aperiodic.

Proof. There exist r, s ≥ 0 with p^(r)_{ji}, p^(s)_{ik} > 0. Then

p^(r+n+s)_{jk} ≥ p^(r)_{ji} p^(n)_{ii} p^(s)_{ik} > 0

for all sufficiently large n.

A Markov chain is called a regular chain if some power of the transition matrix has only positive elements.
We have shown above that if a finite state-space Markov chain is aperiodic and irreducible then it is regular. An aperiodic, irreducible and positive recurrent Markov chain is called an ergodic chain. Theorem 9.8 is about ergodic chains.
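Theorem 9.8 below can be previewed numerically: for an ergodic chain, the powers P^n converge to the matrix whose rows are all π. A sketch (pure Python; names mine) for the two-state chain of Example 7.2 with α = 0.3, β = 0.2, so that π = (β, α)/(α+β) = (0.4, 0.6):

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

alpha, beta = 0.3, 0.2
P = [[1 - alpha, alpha], [beta, 1 - beta]]

Pn = P
for _ in range(49):          # compute P^50
    Pn = matmul(Pn, P)

pi = (beta / (alpha + beta), alpha / (alpha + beta))   # (0.4, 0.6)
print(Pn[0], Pn[1])          # both rows close to (0.4, 0.6)
```

The second eigenvalue of P is 1 − α − β = 0.5, so the rows of P^n agree with π to within about 0.5^n.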
9.3 Convergence to equilibrium *and proof by coupling*

We begin this section with a magic trick.

Example 9.7. (from Grinstead and Snell's Introduction to Probability, 2nd edition; also in Grimmett and Stirzaker) Here is a trick to try on your friends. Shuffle a deck of cards and deal them out one at a time. Count the face cards each as ten. Ask your friend to look at one of the first ten cards; if this card is a six, she is to look at the card that turns up six cards later; if this card is a three, she is to look at the card that turns up three cards later, and so forth. Eventually she will reach a point where she is to look at a card that turns up x cards later but there are not x cards left. You then tell her the last card that she looked at, even though you did not know her starting point. You tell her you do this by watching her, and that she cannot disguise the times that she looks at the cards. In fact you just do the same procedure and, even though you do not start at the same point as she does, you will most likely end at the same point. Why?

Using a computer, I have simulated this game 10,000 times. In each game I have computed whether or not the two players finish with the same last card, doing this for each of the 100 possible pairs of their initial choices of numbers. Between games I have shuffled the pack of cards by applying a random number of in- and out-shuffles. I find that the probability that the two players' last cards match one another is already high with a single deck; it increases to 0.90 if using two decks of cards, and to 0.96 with four decks of cards.

The proof of Theorem 9.8 that now follows has similarities to the magic trick, in that it works by coupling two Markov chains. It is beautiful and ingenious. I like what J. Michael Steele says: "Coupling is one of the most powerful of the genuinely probabilistic techniques."
Here by "genuinely probabilistic" we mean something that works directly with random variables rather than with their analytical co-travelers (like distributions, densities, or characteristic functions).

If you understand the proof of Theorem 9.8 and have a good picture in your mind of what is going on, then I think you are developing excellent intuitive probabilistic insight.

Theorem 9.8 (Convergence to equilibrium). Let P be the transition matrix of an ergodic Markov chain (i.e. irreducible, aperiodic and positive recurrent), with invariant distribution π. Then (for any initial distribution) P(X_n = j) → π_j as n → ∞ for all j. In particular, p^(n)_{ij} → π_j as n → ∞ for all i, j.

Proof. We use a coupling argument that is due to the French mathematician Vincent Doeblin (1937). Let (X_n)_{n≥0} and (Y_n)_{n≥0} be independent and Markov(λ, P) and Markov(π, P), respectively. Fix a reference state b and set

T = inf{n : X_n = Y_n = b}.
Step 1. We show P(T < ∞) = 1. To see this, observe that the process W_n = (X_n, Y_n) is a Markov chain on I × I with transition probabilities

p̃_{(i,k)(j,l)} = p_ij p_kl

and initial distribution μ_{(i,k)} = λ_i π_k. Since P is aperiodic, for all states i, j, k, l we have

p̃^(n)_{(i,k)(j,l)} = p^(n)_{ij} p^(n)_{kl} > 0

for all sufficiently large n, so P̃ is irreducible. Also, P̃ has an invariant distribution π̃_{(i,k)} = π_i π_k. So by Theorem 9.1, P̃ is positive recurrent. But T is the first passage time of W_n to (b, b), and so P(T < ∞) = 1 by Theorem 5.8.

Step 2.

P(X_n = i) = P(X_n = i, n ≥ T) + P(X_n = i, n < T)
           = P(Y_n = i, n ≥ T) + P(X_n = i, n < T)
           = P(Y_n = i) − P(Y_n = i, n < T) + P(X_n = i, n < T)
           = π_i − P(Y_n = i, n < T) + P(X_n = i, n < T),

where the second line follows from X_T = Y_T and the strong Markov property, and the last line follows from the fact that the initial distribution of Y_0 is the invariant distribution. Now P(Y_n = i, n < T) ≤ P(n < T), and P(n < T) → P(T = ∞) = 0 as n → ∞. Hence P(Y_n = i, n < T) → 0. Similarly, P(X_n = i, n < T) → 0. This proves P(X_n = i) → π_i.

To understand this proof one should see what goes wrong when P is not aperiodic. Consider the two-state chain of Example 9.4, which has (1/2, 1/2) as its unique invariant distribution. We start (X_n)_{n≥0} from state 1 and (Y_n)_{n≥0} with equal probability from state 1 or 2. However, if Y_0 = 2, then, because of periodicity, (X_n)_{n≥0} and (Y_n)_{n≥0} will never meet, and the proof fails.
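The coupling in Step 1 can be watched empirically. A sketch (the chain, reference state and names are my own choices): run independent copies X and Y of an ergodic two-state chain, with Y started from π, and record the first time both sit at the reference state b = 0. Every run couples, and quickly.

```python
import random

P = [[0.7, 0.3], [0.2, 0.8]]       # ergodic two-state chain, pi = (0.4, 0.6)

def move(state, rng):
    """One step of the chain from `state`."""
    return 0 if rng.random() < P[state][0] else 1

def coupling_time(rng, horizon=1000):
    """First n >= 1 with X_n = Y_n = 0, for X_0 = 0 and Y_0 ~ pi."""
    x, y = 0, (0 if rng.random() < 0.4 else 1)
    for n in range(1, horizon + 1):
        x, y = move(x, rng), move(y, rng)
        if x == y == 0:
            return n
    return None                    # did not couple within the horizon

rng = random.Random(2)
times = [coupling_time(rng) for _ in range(2000)]
print(all(t is not None for t in times))   # every run coupled
print(sum(times) / len(times))             # mean coupling time, a small number
```

At stationarity the pair chain sits at (0, 0) with probability π_0² = 0.16 on each step, so the mean coupling time is of order 1/0.16.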
10 Long-run proportion of time spent in given state

10.1 Ergodic theorem

Theorem 10.1 (Strong law of large numbers). Let Y_1, Y_2, ... be a sequence of independent and identically distributed non-negative random variables with E(Y_i) = μ. Then

P( (Y_1 + ⋯ + Y_n)/n → μ as n → ∞ ) = 1.

We do not give a proof.

Recall the definition V_i(n) = Σ_{k=0}^{n−1} 1{X_k = i} = number of visits to i before time n.

Theorem 10.2 (Ergodic theorem). Let P be irreducible and let λ be any distribution. Suppose (X_n)_{n≥0} is Markov(λ, P). Then

P( V_i(n)/n → 1/m_i as n → ∞ ) = 1,

where m_i = E_i(T_i) is the expected return time to state i. Moreover, in the positive recurrent case, for any bounded function f : I → R we have

P( (1/n) Σ_{k=0}^{n−1} f(X_k) → f̄ as n → ∞ ) = 1,

where f̄ = Σ_i π_i f(i) and where (π_i : i ∈ I) is the unique invariant distribution. Notice that for this result we do not need P to be aperiodic.

Proof. We prove only the first part. If P is transient then, with probability 1, the total number V_i of visits to i is finite, so

V_i(n)/n ≤ V_i/n → 0 = 1/m_i.

Suppose then that P is recurrent and fix a state i. For T = T_i we have P(T < ∞) = 1 by Theorem 5.8, and (X_{T+n})_{n≥0} is Markov(δ_i, P) and independent of X_0, X_1, ..., X_T by the strong Markov property. The long-run proportion of time spent in i is the same for (X_{T+n})_{n≥0} and (X_n)_{n≥0}, so it suffices to consider the case λ = δ_i.

Recall that the first passage time to state i is a random variable T_i defined by

T_i(ω) = inf{n ≥ 1 : X_n(ω) = i},
where inf ∅ = ∞. We now define inductively the rth passage time T_i^(r) to state i by

T_i^(0)(ω) = 0,   T_i^(1)(ω) = T_i(ω),

and, for r = 1, 2, ...,

T_i^(r+1)(ω) = inf{ n ≥ T_i^(r)(ω) + 1 : X_n(ω) = i }.

The length of the rth excursion is then

S_i^(r) = T_i^(r) − T_i^(r−1) if T_i^(r−1) < ∞, and 0 otherwise.

[Figure: a sample path divided into successive excursions to i, of lengths S_i^(1), S_i^(2), ..., S_i^(V_i(n)), with time n falling inside the last of them.]

One can see, using the strong Markov property, that the non-negative random variables S_i^(1), S_i^(2), ... are independent and identically distributed with E_i(S_i^(r)) = m_i. Now

S_i^(1) + ⋯ + S_i^(V_i(n)−1) ≤ n − 1,

the left-hand side being the time of the last visit to i before n. Also

S_i^(1) + ⋯ + S_i^(V_i(n)) ≥ n,

the left-hand side being the time of the first visit to i after n − 1. Hence

( S_i^(1) + ⋯ + S_i^(V_i(n)−1) ) / V_i(n) ≤ n / V_i(n) ≤ ( S_i^(1) + ⋯ + S_i^(V_i(n)) ) / V_i(n).   (10.1)

By the strong law of large numbers

P( (S_i^(1) + ⋯ + S_i^(n)) / n → m_i as n → ∞ ) = 1,

and, since P is recurrent,

P( V_i(n) → ∞ as n → ∞ ) = 1.

So, letting n → ∞ in (10.1), we get

P( n / V_i(n) → m_i as n → ∞ ) = 1,

which implies

P( V_i(n)/n → 1/m_i as n → ∞ ) = 1.
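The ergodic theorem is easy to see numerically. A sketch (the chain is my own example, with invariant distribution π = (0.4, 0.6)): the fraction of time spent in state 0 approaches 1/m_0 = π_0 = 0.4.

```python
import random

P = [[0.7, 0.3], [0.2, 0.8]]       # two-state chain with pi = (0.4, 0.6)

rng = random.Random(3)
x, visits, n = 0, 0, 200_000
for _ in range(n):
    visits += (x == 0)             # V_0(n): visits to state 0 before time n
    x = 0 if rng.random() < P[x][0] else 1

print(visits / n)                  # V_0(n)/n, close to pi_0 = 0.4
```

No aperiodicity is needed here: the time-average converges even for periodic chains, unlike the n-step probabilities of Theorem 9.8.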
10.2 *Kemeny's constant and the random target lemma*

In this section we explore a bit further the calculation of mean recurrence times. Define

m_ij = E_i(T_j) = E_i( inf{n ≥ 1 : X_n = j} ),

so m_ii = m_i. How can we compute these mean recurrence times? Throughout this section we suppose that the Markov chain is irreducible and positive recurrent. Suppose (X_n)_{n≥0} is Markov(δ_i, P) and (Y_n)_{n≥0} is Markov(π, P), where π is the invariant distribution. Define

z_ij := E_i[ Σ_{n=0}^∞ ( 1{X_n = j} − 1{Y_n = j} ) ] = Σ_{n=0}^∞ ( p^(n)_{ij} − π_j ).   (10.2)

An application of the coupling idea (that eventually X_n = Y_n) can be used to show that this sum is finite. Then, splitting the sum at T_j and using the strong Markov property,

z_ij = E_i Σ_{n=0}^{T_j − 1} ( 1{X_n = j} − π_j ) + E_j Σ_{n=0}^∞ ( 1{X_n = j} − π_j )
     = δ_ij − π_j E_i(T_j) + z_jj.

Hence, by putting i = j, we get m_i = E_i(T_i) = 1/π_i, as we have seen before. For i ≠ j,

m_ij = E_i(T_j) = ( z_jj − z_ij ) / π_j.   (10.3)

We will figure out how to compute the z_ij shortly. First, let us consider the question of how long it takes to reach equilibrium. Suppose j is a random state that is chosen according to the invariant distribution. Then, starting from state i, the expected number of steps required to hit state j (possibly 0 if i = j) is

Σ_{j≠i} π_j E_i(T_j) = Σ_{j≠i} ( z_jj − z_ij ) = Σ_j ( z_jj − z_ij ) = Σ_j z_jj,   (10.4)

since from (10.2) we see that Σ_j z_ij = 0. The right-hand side of (10.4) is (surprisingly!) a constant K (called Kemeny's constant) which is independent of i. This is known as the random target lemma. There is a nice direct proof also.

Lemma 10.3 (random target lemma). Starting in state i, the expected time of hitting a state j that is chosen randomly according to the invariant distribution is Σ_{j≠i} π_j E_i(T_j) = K, a constant independent of i.

Proof. We wish to compute k_i = Σ_j π_j m_ij − π_i m_i. The first step takes us closer to the target state, unless (with probability π_i) we start in the target. So

k_i = 1 + Σ_j p_ij k_j − π_i m_i = Σ_j p_ij k_j,

since π_i m_i = 1.
Writing k = (k_1, ..., k_n), this says k = Pk. Since each k_i is some weighted average of k_1, ..., k_n, we see that all the k_i must be equal to a constant, say k_i = K.

Now we show how to compute the matrix Z = (z_ij). Let Π be the matrix in which each row is π. Suppose that the Markov chain is ergodic (aperiodic, irreducible and positive recurrent). Then we know that P^n → Π. It is easy to see that ΠP = PΠ = ΠΠ = Π, and so (P − Π)^k = P^k − Π. Let us define Z̃ by

Z̃ := Z + Π = I + (P − Π) + (P² − Π) + ⋯
   = I + (P − Π) + (P − Π)² + ⋯   (10.5)
   = ( I − (P − Π) )^{−1} = ( I − P + Π )^{−1}.

The series in (10.5) converges since its terms approach 0 geometrically fast. Equivalently, we see that (I − P + Π) is non-singular and so has an inverse. For if there were to exist x such that (I − P + Π)x = 0, then

π(I − P + Π)x = πx − πx + πx = πx = 0 ⟹ Πx = 0 ⟹ (I − P)x = 0.

The only way this can happen is if x is a constant vector (by the same argument as above, that k = Pk implies k is constant). Since Πx = 0 and π has only positive components, this implies x = 0.

We have seen in (10.3) that m_ij = (z_jj − z_ij)/π_j, and in (10.4) that

K = trace(Z) = trace(Z̃) − trace(Π) = trace(Z̃) − 1.

The eigenvalues of Z̃ are the reciprocals of the eigenvalues of Z̃^{−1} = I − P + Π, and these are 1, 1 − λ_2, ..., 1 − λ_N, where 1, λ_2, ..., λ_N are the eigenvalues of P (assuming there are N states). Since the trace of a matrix is the sum of its eigenvalues,

K = Σ_{i=2}^N 1/(1 − λ_i).

Recall that in the two-state chain of Example 1.4 the eigenvalues are (1, 1 − α − β). So in this chain K = 1/(α + β).
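For the two-state chain the whole computation fits in a few lines. A sketch (names mine) checking K = trace(Z̃) − 1 = 1/(α+β) with a hand-rolled 2×2 inverse, and recovering a mean hitting time from (10.3):

```python
alpha, beta = 0.3, 0.2
P = [[1 - alpha, alpha], [beta, 1 - beta]]
pi = [beta / (alpha + beta), alpha / (alpha + beta)]   # (0.4, 0.6)

# M = I - P + Pi, where every row of Pi is pi
M = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(2)]
     for i in range(2)]

# Ztilde = M^{-1}, by the 2x2 cofactor formula
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Zt = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

K = Zt[0][0] + Zt[1][1] - 1
print(K, 1 / (alpha + beta))   # both 2.0, up to rounding

# m_21 = (z_11 - z_21)/pi_1, with z_ij = Ztilde_ij - pi_j  (state 1 is index 0)
m21 = (Zt[0][0] - Zt[1][0]) / pi[0]
print(m21)                     # 5.0 = 1/beta, as a direct calculation confirms
```

The direct check: from state 2 the chain jumps to state 1 with probability β each step, so the hitting time is geometric with mean 1/β = 5.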
11 Time reversal, detailed balance, reversibility, random walk on a graph

11.1 Time reversal

For Markov chains, the past and future are independent given the present. This property is symmetrical in time and suggests looking at Markov chains with time running backwards. On the other hand, convergence to equilibrium shows behaviour which is asymmetrical in time: a highly organised state such as a point mass decays to a disorganised one, the invariant distribution. This is an example of entropy increasing. So if we want complete time-symmetry we must begin in equilibrium. The next result shows that a Markov chain in equilibrium, run backwards, is again a Markov chain. The transition matrix may however be different.

Theorem 11.1. Let P be irreducible and have an invariant distribution π. Suppose that (X_n)_{0≤n≤N} is Markov(π, P), and set Y_n = X_{N−n}. Then (Y_n)_{0≤n≤N} is Markov(π, P̂), where P̂ is given by

π_j p̂_ji = π_i p_ij,

and P̂ is also irreducible with invariant distribution π.

Proof. First we check that P̂ is a stochastic matrix:

Σ_i p̂_ji = (1/π_j) Σ_i π_i p_ij = 1,

since π is invariant for P. Next we check that π is invariant for P̂:

Σ_j π_j p̂_ji = Σ_j π_i p_ij = π_i,

since P is a stochastic matrix. We have

P(Y_0 = i_0, Y_1 = i_1, ..., Y_N = i_N) = P(X_0 = i_N, X_1 = i_{N−1}, ..., X_N = i_0)
  = π_{i_N} p_{i_N i_{N−1}} ⋯ p_{i_1 i_0} = π_{i_0} p̂_{i_0 i_1} ⋯ p̂_{i_{N−1} i_N},

so by Theorem 1.3, (Y_n)_{0≤n≤N} is Markov(π, P̂). Finally, since P is irreducible, for each pair of states i, j there is a chain of states i_0 = i, i_1, ..., i_{n−1}, i_n = j with p_{i_0 i_1} ⋯ p_{i_{n−1} i_n} > 0. So

p̂_{i_n i_{n−1}} ⋯ p̂_{i_1 i_0} = π_{i_0} p_{i_0 i_1} ⋯ p_{i_{n−1} i_n} / π_{i_n} > 0,

and thus P̂ is also irreducible.
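A quick numerical check of Theorem 11.1 (using the three-state chain I take Example 7.3 to be, with π = (1/5, 2/5, 2/5)): compute p̂_ji = π_i p_ij / π_j and verify that P̂ is stochastic with invariant distribution π. Here P̂ ≠ P, which anticipates §11.3: this chain is not reversible.

```python
P = [[0, 1, 0], [0, 0.5, 0.5], [0.5, 0, 0.5]]   # an irreducible 3-state chain
pi = [0.2, 0.4, 0.4]                            # its invariant distribution

n = 3
# reversed-chain matrix: Phat[j][i] = pi[i] * P[i][j] / pi[j]
Phat = [[pi[i] * P[i][j] / pi[j] for i in range(n)] for j in range(n)]

print(Phat)        # each row sums to 1, and pi * Phat = pi
print(Phat == P)   # False: running this chain backwards looks different
```

Concretely, the forward chain can jump 1 → 2 but never 2 → 1, so the reversed chain must do the opposite, and detailed balance (π_1 p_12 = π_2 p_21) fails.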
11.2 Detailed balance

A stochastic matrix P and a measure λ are said to be in detailed balance if

λ_i p_ij = λ_j p_ji for all i, j.

Though obvious, the following result is worth remembering because when a solution λ to the detailed balance equations exists, it is often easier to find via the detailed balance equations than via the equation λ = λP.

Lemma 11.2. If P and λ are in detailed balance, then λ is invariant for P.

Proof. We have (λP)_i = Σ_j λ_j p_ji = Σ_j λ_i p_ij = λ_i.

Examples 11.3.

(a) A two-state chain like

P = ( 1/2  1/2 )
    ( 1/3  2/3 ).

(b) Any walk on {0, ..., N} with (p_{i,i−1}, p_{i,i+1}) = (q_i, p_i):

[diagram: birth-death chain on states 0, 1, ..., i−1, i, i+1, ..., N, with up-probabilities p_i and down-probabilities q_i]

We have π_i p_i = π_{i+1} q_{i+1}, so

π_i = (p_{i−1} ··· p_0 / (q_i ··· q_1)) π_0.

(c) Consider a walk on a triangle with

P = ( 0     a     1−a )
    ( 1−a   0     a   )
    ( a     1−a   0   ).

[diagram: triangle on states 1, 2, 3, with clockwise probability a and anticlockwise probability 1−a]

The detailed balance equations are

π_1 a = π_2 (1 − a),  π_2 a = π_3 (1 − a),  π_3 a = π_1 (1 − a).

So there is a solution if a = 1/2. But not if a ≠ 1/2.
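A quick numerical sketch of (b) and (c) (the probabilities used are hypothetical): for the birth-death walk we build π from the detailed balance recursion, and for the triangle we check that the three equations are consistent only when a = 1/2.

```python
# (b) Birth-death walk: pi_i p_i = pi_{i+1} q_{i+1} determines pi up to
# normalisation; we then verify the balance equations hold.

def birth_death_pi(p, q):
    """p[i] = P(i -> i+1) for i = 0..N-1; q[i] = P(i+1 -> i). Returns pi."""
    pi = [1.0]
    for i in range(len(p)):
        pi.append(pi[-1] * p[i] / q[i])
    s = sum(pi)
    return [x / s for x in pi]

p = [0.5, 0.4, 0.3]        # hypothetical up-probabilities p_0, p_1, p_2
q = [0.2, 0.3, 0.6]        # hypothetical down-probabilities q_1, q_2, q_3
pi = birth_death_pi(p, q)
print(all(abs(pi[i] * p[i] - pi[i + 1] * q[i]) < 1e-12 for i in range(3)))  # True

# (c) Triangle: chain the first two equations to get pi_3, then test the third.
def triangle_balance_ok(a):
    r = a / (1 - a)
    pi1 = 1.0
    pi2 = pi1 * r            # from pi_1 a = pi_2 (1-a)
    pi3 = pi2 * r            # from pi_2 a = pi_3 (1-a)
    return abs(pi3 * a - pi1 * (1 - a)) < 1e-12

print(triangle_balance_ok(0.5), triangle_balance_ok(0.3))  # True False
```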
11.3 Reversibility

Let (X_n)_{n≥0} be Markov(λ, P), with P irreducible. We say that (X_n)_{n≥0} is reversible if, for all N, (X_{N−n})_{0≤n≤N} is also Markov(λ, P).

Theorem 11.4. Let P be an irreducible stochastic matrix and let λ be a distribution. Suppose that (X_n)_{n≥0} is Markov(λ, P). Then the following are equivalent:

(a) (X_n)_{n≥0} is reversible;
(b) P and λ are in detailed balance.

Proof. (a) holds if and only if X_N has distribution λ for all N and p̂_ij = (λ_j/λ_i) p_ji = p_ij for all i, j, which is exactly (b).

11.4 Random walk on a graph

Example 11.5. Random walk on the vertices of a graph G = (V, E). The states are the vertices (i ∈ V), some of which are joined by edges ((i, j) ∈ E).

[figure: a graph on four vertices, 1, 2, 3, 4]

Thus a graph is a partially drawn Markov chain diagram. There is a natural way to complete the diagram which gives rise to the random walk on G. The valence (or degree) v_i of vertex i is the number of edges at i. We assume that every vertex has finite valence. The random walk on G picks edges with equal probability: thus the transition probabilities are given by

p_ij = 1/v_i if (i, j) ∈ E, and p_ij = 0 otherwise.

Assume G is connected, so that P is irreducible. It is easy to see that P is in detailed balance with v = (v_i : i ∈ G), i.e. v_i p_ij = v_i (1/v_i) = 1 = v_j p_ji. So, if the total valence σ = Σ_i v_i is finite, then π_i = v_i/σ is invariant and P is reversible. The invariant distribution in this example is π = (1/10)(2, 3, 2, 3). So, for example, m_11 = 1/π_1 = 5.
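This can be sketched in a few lines; the edge list below is a hypothetical 4-vertex graph chosen so that the valences are (2, 3, 2, 3), matching the invariant distribution quoted above.

```python
# Random walk on a graph: pi_i = v_i / sigma, where v_i is the valence of i.

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]   # hypothetical graph
v = {i: 0 for i in (1, 2, 3, 4)}
for i, j in edges:
    v[i] += 1
    v[j] += 1

sigma = sum(v.values())
pi = {i: v[i] / sigma for i in v}
print(v)                   # {1: 2, 2: 3, 3: 2, 4: 3}
print(pi[1], 1 / pi[1])    # pi_1 = 0.2, so the mean return time m_11 = 5.0
```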
48 Example.6. Random walk on a tree with arbitrary transition probabilities Clearly, we can solve detailed balance equations if there are no cycles. E.g. 4 π = π Example.7 (Random chess board knight). A random knight makes each permissible move with equal probability. If it starts in a corner, how long on average will it take to return? This is an example of a random walk on a graph: the vertices are the squares of the chess board and the edges are the moves that the knight can take. The diagram shows valences of the 64 vertices of the graph. We know by Theorem 9. and the preceding example that i E c (T c ) = /π c = v i v c so all we have to do is, identify valences. The four corner squares have valence ; and the eight squaresadjacent tothe cornershavevalence3. Thereare0squaresofvalence 4; 6 of valence 6; and the 6 central squares have valence 8. Hence, E c (T c ) = = 68. Obviously this is much easier than solving sets of 64 simultaneous linear equations to find π from πp = π, or calculating E c (T c ) using Theorem
12 Concluding problems and recommendations for further study

12.1 Reversibility and Ehrenfest's urn model

Example 12.1 (Ehrenfest's urn model). In Ehrenfest's model for gas diffusion N balls are placed in two urns. Every second, a ball is chosen at random and moved to the other urn (Paul Ehrenfest, 1880–1933). At each second we let the number of balls in the first urn, i, be the state. From state i we can pass only to states i − 1 and i + 1, and the transition probabilities are given by

p_{i,i+1} = (N − i)/N,  p_{i,i−1} = i/N.

[diagram: birth-death chain on 0, 1, ..., N with these transition probabilities]

This defines the transition matrix of an irreducible Markov chain. Since each ball moves independently of the others and is ultimately equally likely to be in either urn, we can see that the invariant distribution π is the binomial distribution with parameters N and 1/2. It is easy to check that this is correct (from the detailed balance equations). So,

π_i = (N choose i) 2^{−N}, and m_i = 2^N / (N choose i).

Consider in particular the central term i = N/2. We have seen that this term is approximately 1/√(πN/2). Thus we may approximate m_{N/2} by √(πN/2).

Assume that we let our system run until it is in equilibrium. At this point, a video is made, shown to you, and you are asked to say if it is shown in the forward or the reverse direction. It would seem that there should always be a tendency to move toward an equal proportion of balls, so that the correct order of time should be the one with the most transitions from i to i − 1 when i > N/2 and from i to i + 1 when i < N/2. However, the reversibility of the process means this is not the case.

The following chart shows {X_n}, 300 ≤ n ≤ 500, in a simulation of the urn, started at X_0 = 0. Next to it the data is shown reversed. There is no apparent difference.

[charts: the simulated sample path and the same data reversed]
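The claim that Bin(N, 1/2) solves the detailed balance equations is a one-liner to verify; a sketch for N = 20 (our choice of N):

```python
# Ehrenfest urn: check pi_i p(i, i+1) = pi_{i+1} p(i+1, i) for pi = Bin(N, 1/2),
# where p(i, i+1) = (N - i)/N and p(i+1, i) = (i + 1)/N.
from math import comb, sqrt, pi

N = 20
w = [comb(N, i) / 2**N for i in range(N + 1)]
balanced = all(abs(w[i] * (N - i) / N - w[i + 1] * (i + 1) / N) < 1e-12
               for i in range(N))
print(balanced)                              # True
print(1 / w[N // 2], sqrt(pi * N / 2))       # m_{N/2} and its approximation
```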
12.2 Reversibility and the M/M/1 queue

Example 12.2. Consider the following random walk with 0 < p = 1 − q < 1/2.

[diagram: walk on 0, 1, 2, ... with p_{i,i+1} = p, p_{i,i−1} = q for i ≥ 1, and p_{0,0} = q]

This is a discrete-time version of the M/M/1 queue. At each discrete time instant there is either an arrival (with probability p) or a potential service completion (with probability q). The potential service completion is an actual service completion if the queue is nonempty.

Suppose we do not observe X_n, but we hear a beep at time n + 1 if X_{n+1} = X_n + 1. We hear a chirp if X_{n+1} = X_n − 1. There is silence at time n + 1 if X_n = X_{n+1} = 0. What can we say about the pattern of sounds produced by beeps and chirps?

If we hear a beep at time n then the time of the next beep is n + Y, where Y is a geometric random variable with parameter p, such that P(Y = k) = q^{k−1} p, k = 1, 2, .... Clearly EY = 1/p. As for chirps, if we hear a chirp at time n, then the time of the next chirp is n + Z, with P(Z = k) = p^{k−1} q and EZ = 1/q, but only if X_n > 0. If X_n = 0 then the time until the next chirp is n + T, where T has the distribution of Y + Z. This suggests that we might be able to hear some difference in the soundtracks of beeps and chirps.

However, (X_n)_{n≥0} is reversible, with p π_i = q π_{i+1}, i = 0, 1, .... So once we have reached equilibrium the reversed process has the same statistics as the forward process. Steps of i → i + 1 and i + 1 → i are interchanged when we run the process in reverse. This means that the sound pattern of the chirps must be the same as that of the beeps, and we must conclude that the times between successive chirps are also i.i.d. geometric random variables with parameter p. This is surprising!

[chart: a simulated soundtrack, marking beeps, silences and chirps]

Burke's Theorem (1956) is a version of this result for the continuous-time Markov process known as the M/M/1 queue. This is a single-server queue to which customers arrive as a Poisson process of rate λ, and in which service times are independent and exponentially distributed with mean 1/µ (and µ > λ).
Burke's theorem says that the steady-state departure process is a Poisson process with rate parameter λ. This means
that an observer who sees only the departure process cannot tell whether or not there is a queue located between the input and output! Equivalently, if each arrival produces a beep and each departure produces a chirp, we cannot tell by just listening which sound is being called a beep and which sound is being called a chirp.

12.3 *The probabilistic abacus*

Peter Winkler (Mathematical Mind-Benders, 2007) poses the following puzzle. In a circle are n children, each holding some gumdrops. The teacher gives an additional gumdrop to each child who has an odd number; then each child passes half of his or her gumdrops to the child on his left. These two steps are repeated until they no longer have any effect. Prove that this process will indeed eventually terminate, with all kids having the same (even) number of gumdrops.

To solve this puzzle, let m_i(t) be the number of gumdrops held by child i just before they pass for the (t + 1)th time, t = 0, 1, .... Let M be the least even number with m_i(0) ≤ M for all i. After the teacher's round-up each child still holds at most M, and after passing each child holds the average of two numbers that are at most M; so inductively m_i(t) ≤ M for all i and t. But until the game ends the teacher gives out at least 1 gumdrop per round. Hence the game must terminate within a finite number of rounds.

Grinstead and Snell describe a similar algorithm which works for more general Markov chains as follows. Let P = (p_ij) be the n × n transition matrix of an irreducible and aperiodic Markov chain on n states, with all entries rational. A class of n children plays a game of successive rounds. Initially, child i has m_i(0) gumdrops, and at the end of round t she has m_i(t) gumdrops. The teacher then hands out gumdrops so that each child has n_i(t) gumdrops, where n_i(t) is the least number not less than m_i(t) such that p_ij n_i(t) is an integer, for every j. Finally, for each i and j, child i passes p_ij n_i(t) gumdrops to child j. That completes the round; child i now has m_i(t + 1) gumdrops, and the game continues to round t + 1.
Again, we can see that in a finite number of rounds we are guaranteed to reach a situation in which the teacher does not have to give out any more gumdrops, and that the holdings of gumdrops then correspond to an invariant measure. (This is the algorithm for computing an invariant distribution that is described by Arthur Engel, Wahrscheinlichkeitsrechnung und Statistik, vol. 2 (Stuttgart: Klett Verlag, 1976). He also describes an algorithm for an absorbing Markov chain that calculates the absorption probabilities. See Appendix C.)

12.4 *Random walks and electrical networks*

Recall the problem of finding absorption probabilities. Consider random walk on a graph and let h_x = P_x(H_a < H_b), the probability when starting from x of hitting a before b.
[figure: a graph containing vertices a, b, x, y and a link L, drawn twice — once as a random walk diagram, and once as an electrical network in which a battery fixes v_a = 1 and v_b = 0]

This satisfies the right hand equations h = Ph, with boundary conditions, i.e.

h_x = Σ_y p_xy h_y = Σ_y (1/d_x) h_y, x ≠ a, b, with h_a = 1, h_b = 0,   (12.1)

where d_x is the degree (or valence) of node x. Now consider the electrical network in which a 1 ohm resistor replaces each link and a battery holds the voltages at a and b to 1 and 0 respectively. By Ohm's Law the current from x to y is

i_xy = (voltage drop from x to y)/(resistance between x and y) = v_x − v_y.

By Kirchhoff's laws, the current into x must be equal to the current out. So for x ≠ a, b,

0 = Σ_y i_xy = Σ_y (v_x − v_y)  ⟹  v_x = Σ_y (1/d_x) v_y.

Thus v satisfies the same equations (12.1) as does h, and so h = v. A function h which satisfies h = Ph is said to be a harmonic function.

We can now consider the following questions: (i) what is the probability, starting at a, that we return to a before hitting b? (ii) how does this probability change if we remove a link, such as L?

We have seen that to answer (i) we just put a battery between a and b, to fix v_a = 1 and v_b = 0, and unit resistors at each link. Then v_y is the probability that a walker who starts at y hits a before b. Also, Σ_y p_ay v_y is the probability that a walker who starts at a returns to a before reaching b. The current flowing out of a (and in total around the whole network) is

Σ_y i_ay = Σ_y (v_a − v_y) = d_a (1 − Σ_y p_ay v_y) = d_a (1 − P_a(H_a < H_b)).

Rayleigh's Monotonicity Law says that if we increase the resistance of any resistor then the resistance between any two points can only increase. So if we remove a link, such as L (which is equivalent to increasing the resistance of the corresponding resistor to infinity), then the resistance between points a and b increases, and so a voltage
difference of 1 drives less current around the network, and thus the quantity Σ_y p_ay v_y increases, i.e. P_a(H_a < H_b) increases. It is perhaps a bit surprising that the removal of a link always has this effect, no matter where that link is located in the graph.

We may let the size of the network increase towards that of the infinite lattice of resistors and ask: what happens to the effective resistance between the node at the centre and those nodes around the outer boundary as the distance between them tends to infinity? By what we know of the recurrence properties of symmetric random walk, it must be that in 1 and 2 dimensions the effective resistance tends to infinity, whereas in 3 dimensions it tends to something finite.

(This next bit will probably not be lectured.) We can also give a nice interpretation of the current in a link as the net flow of electrons across the link as the electrons are imagined to perform random walks that start at a and continue until reaching b. Specifically, suppose a walker begins at a and performs a random walk until reaching b; note that if he returns to a before reaching b, he keeps on going. Let u_x = E_a[V_x(T_b)] be the expected number of visits to state x before reaching b. Clearly, for x ≠ a, b,

u_x = Σ_y u_y p_yx, which is equivalent to u_x/d_x = Σ_y p_xy (u_y/d_y).

Thus u_x/d_x is harmonic, with u_b = 0. So u_x/d_x = v_x for a network in which we fix v_b = 0. The current from x to an adjacent node y is

i_xy = v_x − v_y = u_x/d_x − u_y/d_y = u_x p_xy − u_y p_yx.

Now u_x p_xy is the expected value of the number of times that the walker will pass along the edge from x to y. Similarly, u_y p_yx is the expected value of the number of times that the walker will pass along the edge from y to x. So the current i_xy has an interpretation as the expected value of the net number of times that the walker will pass along the edge from x to y, when starting at a and continuing until hitting b.
We should fix v_a so that Σ_y i_ay = 1 since, given a start at a, this net outflow is necessarily 1.
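The equations h = Ph with h_a = 1, h_b = 0 are easy to solve by relaxation — repeatedly replacing each interior value by the average of its neighbours. The little graph below is a hypothetical example, not one drawn above.

```python
# Hitting probabilities as voltages: h_x = P_x(H_a < H_b) solves h = Ph with
# h_a = 1, h_b = 0.  For random walk on a graph this says h_x is the average
# of h over the neighbours of x, so relaxation converges to the answer.

adj = {                       # hypothetical graph: path a - x - y - b, plus edge x - b
    'a': ['x'],
    'x': ['a', 'y', 'b'],
    'y': ['x', 'b'],
    'b': ['x', 'y'],
}
h = {'a': 1.0, 'x': 0.0, 'y': 0.0, 'b': 0.0}
for _ in range(200):          # relax the interior (non-boundary) nodes
    for node in ('x', 'y'):
        h[node] = sum(h[nbr] for nbr in adj[node]) / len(adj[node])
print(round(h['x'], 6), round(h['y'], 6))   # 0.4 0.2
```

The exact solution here is h_x = 2/5, h_y = 1/5, as one can check by hand from h_x = (1 + h_y + 0)/3 and h_y = h_x/2.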
12.5 Probability courses in Part II

There are several courses in Part II of the tripos that should interest you if you have enjoyed this course. All of these build on Probability IA and the knowledge of Markov Chains which you have now gained (although Markov Chains is not essential).

Probability and Measure: this course develops important concepts such as expectation and the strong law of large numbers more rigorously and generally than in Probability IA. Limits are a key theme. For example, one can prove that certain tail events can only have probabilities of 0 or 1. We have seen an example of this in our course: P_i(X_n makes infinitely many returns to i) takes values 0 or 1 (never 1/2). The Measure Theory that is in this course is important for many mathematicians, not just those who are specializing in optimization/probability/statistics.

Applied Probability: this course is about Markov processes in continuous time, with applications in queueing, communication networks, insurance ruin, and epidemics. Imagine our frog in Example 1.1 waits for an exponentially distributed time before hopping to a new lily pad. What now is p_57(t), t ≥ 0? As you might guess, we find this by solving differential equations, in place of the recurrence relations we had in discrete time.

Optimization and Control: we add to Markov chains notions of cost, reward, and optimization. Suppose a controller can pick, as a function of the current state x, the transition matrix P, to be one of a set of k possible matrices, say P(a), a ∈ {a_1, ..., a_k}. We have seen this type of problem in asking how a gambler should bet when she wishes to maximize her chance of reaching a target sum, or how many flowers a theatre owner should send to a disgruntled opera singer. Perhaps we would like to steer the frog to arrive at some particular lily pad in the least possible time, or with the least cost. Suppose three frogs are placed at different vertices of Z.
At each step we can choose one of the frogs to make a random hop to one of its neighbouring vertices. We wish to minimize the expected time until we first have a frog at the origin. One could think of this as playing golf with more than one ball.

Stochastic Financial Models: this is about random walks, Brownian motion, and other stochastic models that are useful in modelling financial products. We might like to design optimal bets on where the frog will end up, or hedge the risk that he ends up in an unfavourable place. For example, suppose that our frog starts in state i and does a biased random walk on {0, 1, ..., 10}, eventually hitting state 0 or 10, where she then wins a prize worth 0 or 10. How much would we be willing to pay at time 0 for the right (option) to buy her final prize for s?

I hope some of you will have enjoyed Markov Chains to such an extent that you will want to do all these courses in Part II!
Appendix A Probability spaces

The models in this course take place in a probability space (Ω, F, P). Let us briefly review what this means.

Ω is a set of outcomes (ω ∈ Ω).

F is a nonempty set of subsets of Ω (corresponding to possible events), such that

(i) Ω ∈ F,
(ii) A ∈ F ⟹ A^c ∈ F,
(iii) A_n ∈ F for n ∈ N ⟹ ∪_{n∈N} A_n ∈ F (a union of countably many members of F).

Note that these imply ∩_{n∈N} A_n ∈ F, because ∩_n A_n = (∪_n A_n^c)^c. A collection of subsets with these properties is called a σ-algebra or σ-field. ("Sigma" refers to the fact we are dealing with countable unions.)

P : F → [0, 1] satisfies P(Ω) = 1 and

A_n ∈ F for n ∈ N, A_n disjoint ⟹ P(∪_n A_n) = Σ_{n=1}^∞ P(A_n).

Note that this sum is well-defined, independently of how we enumerate the A_n. For if (B_n : n ∈ N) is an enumeration of the same sets, but in a different order, then given any n there exists an m such that

∪_{i=1}^n A_i ⊆ ∪_{i=1}^m B_i

and it follows that Σ_{i=1}^n P(A_i) ≤ Σ_{i=1}^∞ P(B_i), whence Σ_{i=1}^∞ P(A_i) ≤ Σ_{i=1}^∞ P(B_i). Clearly ≥ is also true, so the sums are equal.

Another important consequence of these axioms is that we can take limits.

Proposition A.1. If (A_n)_{n∈N} is a sequence of events and lim_{n→∞} A_n = A, then lim_{n→∞} P(A_n) = P(A).

We use this repeatedly. For example, if V_i is the number of visits to i, then

P_i(V_i = ∞) = P_i(lim_k [V_i ≥ k]) = lim_k P_i(V_i ≥ k) = lim_k f_i^k,

which is 0 or 1, since 0 ≤ f_i ≤ 1 (we cannot have P_i(V_i = ∞) = 1/2).

X : Ω → I is a random variable if

{X = i} = {ω ∈ Ω : X(ω) = i} ∈ F for all i ∈ I.

Then P(X = i) = P({X = i}). This is why we require {X = i} ∈ F.
B Historical notes

Grinstead and Snell (Introduction to Probability, 1997) say the following:

Markov chains were introduced by Andrei Andreyevich Markov (1856–1922) and were named in his honor. He was a talented undergraduate who received a gold medal for his undergraduate thesis at St. Petersburg University. Besides being an active research mathematician and teacher, he was also active in politics and participated in the liberal movement in Russia at the beginning of the twentieth century. In 1913, when the government celebrated the 300th anniversary of the House of Romanov family, Markov organized a counter-celebration of the 200th anniversary of Bernoulli's discovery of the Law of Large Numbers.

Markov was led to develop Markov chains as a natural extension of sequences of independent random variables. In his first paper, in 1906, he proved that for a Markov chain with positive transition probabilities and numerical states the average of the outcomes converges to the expected value of the limiting distribution (the fixed vector). In a later paper he proved the central limit theorem for such chains. Writing about Markov, A. P. Youschkevitch remarks:

Markov arrived at his chains starting from the internal needs of probability theory, and he never wrote about their applications to physical science. For him the only real examples of the chains were literary texts, where the two states denoted the vowels and consonants. (Dictionary of Scientific Biography, ed. C. C. Gillespie (New York: Scribner's Sons, 1970).)

In a paper written in 1913, Markov chose a sequence of 20,000 letters from Pushkin's Eugene Onegin to see if this sequence can be approximately considered a simple chain. He obtained the Markov chain with transition matrix

( 0.128  0.872 )
( 0.663  0.337 )

(rows and columns indexed vowel, consonant). The fixed vector for this chain is (.432, .568), indicating that we should expect about 43.2 percent vowels and 56.8 percent consonants in the novel, which was borne out by the actual count. See A. A.
Markov, An Example of Statistical Analysis of the Text of Eugene Onegin Illustrating the Association of Trials into a Chain, Bulletin de l'Académie Impériale des Sciences de St.-Pétersbourg, ser. 6, vol. 7 (1913).
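The fixed vector of a two-state chain is a one-line computation; taking the transition matrix quoted above (as given in Grinstead & Snell), we recover (.432, .568):

```python
# Markov's vowel/consonant chain from Eugene Onegin: the fixed vector of a
# two-state chain is pi = (p21, p12) / (p12 + p21).

P = [[0.128, 0.872],
     [0.663, 0.337]]
p12, p21 = P[0][1], P[1][0]
pi = (p21 / (p12 + p21), p12 / (p12 + p21))
print(round(pi[0], 3), round(pi[1], 3))   # 0.432 0.568
```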
C The probabilistic abacus for absorbing chains

This section is about Engel's probabilistic abacus, an algorithmic method of computing the absorption probabilities of an absorbing Markov chain. The version of the abacus that calculates the invariant distribution of a recurrent Markov chain has already been described in 12.3. These algorithms are of no practical value! However, they are interesting because it is challenging to understand why they work, and thinking about that will help to build your intuition for Markov processes.

Consider an absorbing Markov chain with rational transition probabilities, as in

P = ( 1/2  1/4  1/4  0    0   )
    ( 1/3  0    1/3  1/3  0   )
    ( 0    0    1/3  0    2/3 )
    ( 0    0    0    1    0   )
    ( 0    0    0    0    1   )

Clearly {1, 2, 3} are transient and {4, 5} are absorbing. Suppose we want to find α_4, the probability that, starting in state 1, absorption takes place in state 4. We can do this by Engel's algorithm, which is equivalent to playing a so-called chip-firing game. It works like this. We create one node for each state and put some chips (or tokens) at the nodes corresponding to the non-absorbing states, {1, 2, 3}. Suppose that there are integers r_{i1}, r_{i2}, ..., r_{in} such that p_ij = r_ij / r_i for all j. If there were r_i chips at node i we could fire, or make a move in, node i. This means taking r_i chips from node i and moving r_ij of them to node j, for each j. The critical loading is one in which each node has one less chip than it needs to fire, i.e. c_i = r_i − 1. So c_1 = 3 and c_2 = c_3 = 2.

We start with a critical loading by placing tokens at nodes 1, 2, 3, 4, 5 in numbers (3, 2, 2, 0, 0), as shown above, and then add a large number of tokens at another node 0. Firing node 0 means moving one token from node 0 to node 1. Engel's algorithm also imposes the rule that node 0 may be fired only if no other node can fire. Starting from the critical loading we fire node 0 and then node 1:

(3, 2, 2, 0, 0) →0 (4, 2, 2, 0, 0) →1 (2, 3, 3, 0, 0)
Now nodes 2 or 3 could fire. Suppose we fire 3 and then 2:

(2, 3, 3, 0, 0) →3 (2, 3, 1, 0, 2) →2 (3, 0, 2, 1, 2)

(If instead we fire 2 first then we reach (3, 0, 4, 1, 0), and we next must fire 3; but ultimately the final configuration will turn out to be the same.)

Now we fire the sequence 0, 1, 3, 0, 0, 1, 0:

(3, 0, 2, 1, 2) →0 (4, 0, 2, 1, 2) →1 (2, 1, 3, 1, 2) →3 (2, 1, 1, 1, 4) →0 (3, 1, 1, 1, 4) →0 (4, 1, 1, 1, 4) →1 (2, 2, 2, 1, 4) →0 (3, 2, 2, 1, 4)

At this point we stop, because nodes 1, 2 and 3 now have exactly the same loading as at the start. We are at (3, 2, 2, 0, 0) + (0, 0, 0, 1, 4). We have fired 0 five times and ended up back at the critical loading, but with 1 token in node 4 and 4 tokens in node 5. Thus α_4 = 1/5 and α_5 = 4/5.

Why does this algorithm work? It is fairly obvious that if the initially critically loaded configuration of the transient states reoccurs then the numbers of tokens that have appeared in the nodes that correspond to the absorbing states must be in quantities that are in proportion to the absorption probabilities, α_j. But why is the initial
critically loaded configuration guaranteed to eventually reappear? This puzzled Engel in 1976, and was proved circa 1979 by Peter Doyle, whose proof we now present. It is much more difficult than proving that the abacus described in 12.3 terminates. (The proof is interesting. There is, in fact, a rich literature on the properties of chip-firing games, and this proof generalises to show that many chip-firing games have the property that the termination state does not depend on the order in which moves are made.)

Proof that the critical loading reoccurs.

1. At times that we fire node 0, the loadings of the nodes corresponding to the non-absorbing states are no greater than their critical loadings. There are only finitely many such possible loadings. So some loading must reoccur. Call this loading L.

2. Start by loading L in white tokens. By hypothesis, if we continue from this initial loading then we will eventually return to it after some sequence of firings, say f = (f_1, f_2, ..., f_n), of which m are firings of node 0. Let us add red tokens to each node so as to take the loading up to the critical loading. Put just m white tokens at node 0. We now compare the results of two different strategies of play. Strategy I is to follow Engel's algorithm until we run out of tokens in node 0. Strategy II is to make the firing sequence f, using only white tokens. Here we no longer impose the rule that node 0 can fire only if no other node can. When we run out of tokens in node 0, L reoccurs and so we are back at the critical loading.

The proof is finished by proving that, from any initial loading, the loading obtained at the end, when the m tokens in node 0 have run out, is independent of any choices that we make along the way about which of several available nodes we might fire.
Assuming this is true, then by starting in the critical loading and following Engel's algorithm until the tokens in node 0 run out, we end in the same place as Strategy II, which we know ends in the critical loading. So the critical loading reoccurs under Engel's algorithm also.

3. Finally, we prove our claim. Suppose that, up to the point at which no more firings become possible, Strategy I fires sequentially the nodes e_1, ..., e_N, and Strategy II fires sequentially the nodes f_1, ..., f_M. We now show that these are the same set of firings arranged in different orders, and hence N = M and the terminal loadings are identical.

Consider firing e_1. If none of f_1, ..., f_M is node e_1 then the sequence f cannot have terminated, because there must be enough tokens in node e_1 to fire it. So some f_{i_1} = e_1. Suppose we have proved that e_1, ..., e_k are in correspondence with some f_{i_1}, ..., f_{i_k}. Consider e_{k+1}. If there is a firing of node e_{k+1} amongst the f_i which occurs in addition to f_{i_1}, ..., f_{i_k}, and before these are all carried out, then we can choose to match it to e_{k+1}. Suppose there is no such f_i. The firing of e_{k+1} was made possible by e_1, ..., e_k. It must also have been made possible by f_{i_1}, ..., f_{i_k}. Any other firings that have been made can only help, by adding tokens to node e_{k+1}. So there must be at least one more firing subsequent to all of f_{i_1}, ..., f_{i_k} that is of node e_{k+1}. So again, we can make a match. Inductively, the e_i's are contained within the f_i's. By the same argument the reverse is true, and we conclude that the two sets of firings are the same.
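The abacus's answer can also be confirmed by ordinary first-step analysis. A sketch (using the transition matrix of the worked example in this appendix):

```python
# First-step analysis: h_i = P_i(absorbed at state 4) satisfies h = Ph with
# h_4 = 1, h_5 = 0.  Iterating the equations converges since {1,2,3} are transient.

P = [[1/2, 1/4, 1/4, 0,   0  ],
     [1/3, 0,   1/3, 1/3, 0  ],
     [0,   0,   1/3, 0,   2/3],
     [0,   0,   0,   1,   0  ],
     [0,   0,   0,   0,   1  ]]
h = [0.0, 0.0, 0.0, 1.0, 0.0]
for _ in range(500):
    h = [sum(P[i][j] * h[j] for j in range(5)) for i in range(3)] + [1.0, 0.0]
print([round(x, 9) for x in h[:3]])   # [0.2, 0.4, 0.0]: alpha_4 = 1/5 from state 1
```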
Index

absorbing, 8
absorption probability, 9
almost surely, 6
aperiodic, 34
birth-death chain, 3
Burke's theorem, 46
Chapman-Kolmogorov equations, 3
chip firing game, 53
class property, 8
closed class, 8, 9
communicating classes, 8
convergence to equilibrium, see ergodic theorem
coupling, 35
degree, 43
detailed balance, 7, 4
distribution,
Doblin, V., 35
Ehrenfest's urn model, 45
electrical networks, 47
Engel's probabilistic abacus, 47, 53
equilibrium distribution, 5, 8
equivalence relation, 8
ergodic chain, 34
ergodic theorem, 37
evening out the gumdrops, 47
expected return time, see mean return time
first passage time, 7, 5, 37
gambler's ruin, 6
harmonic function, 48
hitting time, 9, 7
initial distribution,
invariant distribution, 5 33
  existence and uniqueness, 9
invariant measure, 5
irreducible, 8
Kemeny's constant, see random target lemma
left hand equations, 6
magic trick, 35
Markov chain, simulation of, 3
Markov property, 6
Markov, A. A., 5
mean hitting time, 9
mean recurrence times, 39
mean return time, 7, 5, 3
measure,
M/M/1 queue, 46
n-step transition matrix, 3
null recurrent, 3
number of visits, 7
open class, 8
periodic, 34
Pólya, G., 3
positive recurrent, 3
probabilistic abacus, 47, 53
probability space, 5
random surfer, 3
random target lemma, 39
random variable, 5
random walk, 0, 33
  and electrical networks,
  of a knight on a chessboard, 44
  on a complete graph, 6
  on a graph, 43
  on a triangle, 4
  on {0, 1, ..., N}, 0
  on {0, 1, ...}, 56
  on Z^d, 4
Rayleigh's Monotonicity Law, 48
recurrent, 7
regular chain, 34
return probability, 7
reversibility, 4 47
reversible, 43
right hand equations,
rth excursion, 38
rth passage time, 38
σ-algebra, 5
σ-field, 5
state,
state-space,
stationary distribution, 5, 8
steady state distribution, 5
stochastic matrix,
stopping time, 5
strong law of large numbers, 37
strong Markov property, 5
success-run chain, 7
transient, 7
transition matrix,
two-state Markov chain, 4
valence, 43, 48
1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
LINEAR ALGEBRA W W L CHEN
LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,
3. INNER PRODUCT SPACES
. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.
Basic Probability Concepts
page 1 Chapter 1 Basic Probability Concepts 1.1 Sample and Event Spaces 1.1.1 Sample Space A probabilistic (or statistical) experiment has the following characteristics: (a) the set of all possible outcomes
Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time
Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time Edward Bortnikov & Ronny Lempel Yahoo! Labs, Haifa Class Outline Link-based page importance measures Why
Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way
Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)
How To Prove The Dirichlet Unit Theorem
Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if
Math 312 Homework 1 Solutions
Math 31 Homework 1 Solutions Last modified: July 15, 01 This homework is due on Thursday, July 1th, 01 at 1:10pm Please turn it in during class, or in my mailbox in the main math office (next to 4W1) Please
Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected].
Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 [email protected] This paper contains a collection of 31 theorems, lemmas,
Chapter 17. Orthogonal Matrices and Symmetries of Space
Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length
Solving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
Wald s Identity. by Jeffery Hein. Dartmouth College, Math 100
Wald s Identity by Jeffery Hein Dartmouth College, Math 100 1. Introduction Given random variables X 1, X 2, X 3,... with common finite mean and a stopping rule τ which may depend upon the given sequence,
1 VECTOR SPACES AND SUBSPACES
1 VECTOR SPACES AND SUBSPACES What is a vector? Many are familiar with the concept of a vector as: Something which has magnitude and direction. an ordered pair or triple. a description for quantities such
Homework set 4 - Solutions
Homework set 4 - Solutions Math 495 Renato Feres Problems R for continuous time Markov chains The sequence of random variables of a Markov chain may represent the states of a random system recorded at
Vectors 2. The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996.
Vectors 2 The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996. Launch Mathematica. Type
Analysis of a Production/Inventory System with Multiple Retailers
Analysis of a Production/Inventory System with Multiple Retailers Ann M. Noblesse 1, Robert N. Boute 1,2, Marc R. Lambrecht 1, Benny Van Houdt 3 1 Research Center for Operations Management, University
PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
Direct Methods for Solving Linear Systems. Matrix Factorization
Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
(67902) Topics in Theory and Complexity Nov 2, 2006. Lecture 7
(67902) Topics in Theory and Complexity Nov 2, 2006 Lecturer: Irit Dinur Lecture 7 Scribe: Rani Lekach 1 Lecture overview This Lecture consists of two parts In the first part we will refresh the definition
CONTROLLABILITY. Chapter 2. 2.1 Reachable Set and Controllability. Suppose we have a linear system described by the state equation
Chapter 2 CONTROLLABILITY 2 Reachable Set and Controllability Suppose we have a linear system described by the state equation ẋ Ax + Bu (2) x() x Consider the following problem For a given vector x in
Math 181 Handout 16. Rich Schwartz. March 9, 2010
Math 8 Handout 6 Rich Schwartz March 9, 200 The purpose of this handout is to describe continued fractions and their connection to hyperbolic geometry. The Gauss Map Given any x (0, ) we define γ(x) =
Probability Generating Functions
page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence
Random variables, probability distributions, binomial random variable
Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that
CONTINUED FRACTIONS AND FACTORING. Niels Lauritzen
CONTINUED FRACTIONS AND FACTORING Niels Lauritzen ii NIELS LAURITZEN DEPARTMENT OF MATHEMATICAL SCIENCES UNIVERSITY OF AARHUS, DENMARK EMAIL: [email protected] URL: http://home.imf.au.dk/niels/ Contents
SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH
31 Kragujevac J. Math. 25 (2003) 31 49. SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH Kinkar Ch. Das Department of Mathematics, Indian Institute of Technology, Kharagpur 721302, W.B.,
LOOKING FOR A GOOD TIME TO BET
LOOKING FOR A GOOD TIME TO BET LAURENT SERLET Abstract. Suppose that the cards of a well shuffled deck of cards are turned up one after another. At any time-but once only- you may bet that the next card
1 Gambler s Ruin Problem
Coyright c 2009 by Karl Sigman 1 Gambler s Ruin Problem Let N 2 be an integer and let 1 i N 1. Consider a gambler who starts with an initial fortune of $i and then on each successive gamble either wins
NOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
Stationary random graphs on Z with prescribed iid degrees and finite mean connections
Stationary random graphs on Z with prescribed iid degrees and finite mean connections Maria Deijfen Johan Jonasson February 2006 Abstract Let F be a probability distribution with support on the non-negative
Section 6.1 - Inner Products and Norms
Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,
MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
9 MATRICES AND TRANSFORMATIONS
9 MATRICES AND TRANSFORMATIONS Chapter 9 Matrices and Transformations Objectives After studying this chapter you should be able to handle matrix (and vector) algebra with confidence, and understand the
Mathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.
Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize
WHERE DOES THE 10% CONDITION COME FROM?
1 WHERE DOES THE 10% CONDITION COME FROM? The text has mentioned The 10% Condition (at least) twice so far: p. 407 Bernoulli trials must be independent. If that assumption is violated, it is still okay
Linear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
GENERATING SETS KEITH CONRAD
GENERATING SETS KEITH CONRAD 1 Introduction In R n, every vector can be written as a unique linear combination of the standard basis e 1,, e n A notion weaker than a basis is a spanning set: a set of vectors
Essentials of Stochastic Processes
i Essentials of Stochastic Processes Rick Durrett 70 60 50 10 Sep 10 Jun 10 May at expiry 40 30 20 10 0 500 520 540 560 580 600 620 640 660 680 700 Almost Final Version of the 2nd Edition, December, 2011
MATH 10034 Fundamental Mathematics IV
MATH 0034 Fundamental Mathematics IV http://www.math.kent.edu/ebooks/0034/funmath4.pdf Department of Mathematical Sciences Kent State University January 2, 2009 ii Contents To the Instructor v Polynomials.
MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform
MATH 433/533, Fourier Analysis Section 11, The Discrete Fourier Transform Now, instead of considering functions defined on a continuous domain, like the interval [, 1) or the whole real line R, we wish
Approximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
minimal polyonomial Example
Minimal Polynomials Definition Let α be an element in GF(p e ). We call the monic polynomial of smallest degree which has coefficients in GF(p) and α as a root, the minimal polyonomial of α. Example: We
6.042/18.062J Mathematics for Computer Science December 12, 2006 Tom Leighton and Ronitt Rubinfeld. Random Walks
6.042/8.062J Mathematics for Comuter Science December 2, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Random Walks Gambler s Ruin Today we re going to talk about one-dimensional random walks. In
DETERMINANTS IN THE KRONECKER PRODUCT OF MATRICES: THE INCIDENCE MATRIX OF A COMPLETE GRAPH
DETERMINANTS IN THE KRONECKER PRODUCT OF MATRICES: THE INCIDENCE MATRIX OF A COMPLETE GRAPH CHRISTOPHER RH HANUSA AND THOMAS ZASLAVSKY Abstract We investigate the least common multiple of all subdeterminants,
Inner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
4: SINGLE-PERIOD MARKET MODELS
4: SINGLE-PERIOD MARKET MODELS Ben Goldys and Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2015 B. Goldys and M. Rutkowski (USydney) Slides 4: Single-Period Market
Markov random fields and Gibbs measures
Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).
7 Gaussian Elimination and LU Factorization
7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method
Matrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products
Matrix Calculations: Applications of Eigenvalues and Eigenvectors; Inner Products H. Geuvers Institute for Computing and Information Sciences Intelligent Systems Version: spring 2015 H. Geuvers Version:
SUBGROUPS OF CYCLIC GROUPS. 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by
SUBGROUPS OF CYCLIC GROUPS KEITH CONRAD 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by g = {g k : k Z}. If G = g, then G itself is cyclic, with g as a generator. Examples
Similarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form
Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...
1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
Chapter 2 Discrete-Time Markov Models
Chapter Discrete-Time Markov Models.1 Discrete-Time Markov Chains Consider a system that is observed at times 0;1;;:::: Let X n be the state of the system at time n for n D 0;1;;:::: Suppose we are currently
x1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.
Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability
A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions
A Second Course in Mathematics Concepts for Elementary Teachers: Theory, Problems, and Solutions Marcel B. Finan Arkansas Tech University c All Rights Reserved First Draft February 8, 2006 1 Contents 25
CONTINUED FRACTIONS AND PELL S EQUATION. Contents 1. Continued Fractions 1 2. Solution to Pell s Equation 9 References 12
CONTINUED FRACTIONS AND PELL S EQUATION SEUNG HYUN YANG Abstract. In this REU paper, I will use some important characteristics of continued fractions to give the complete set of solutions to Pell s equation.
Adding vectors We can do arithmetic with vectors. We ll start with vector addition and related operations. Suppose you have two vectors
1 Chapter 13. VECTORS IN THREE DIMENSIONAL SPACE Let s begin with some names and notation for things: R is the set (collection) of real numbers. We write x R to mean that x is a real number. A real number
Catalan Numbers. Thomas A. Dowling, Department of Mathematics, Ohio State Uni- versity.
7 Catalan Numbers Thomas A. Dowling, Department of Mathematics, Ohio State Uni- Author: versity. Prerequisites: The prerequisites for this chapter are recursive definitions, basic counting principles,
Statistics 100A Homework 4 Solutions
Chapter 4 Statistics 00A Homework 4 Solutions Ryan Rosario 39. A ball is drawn from an urn containing 3 white and 3 black balls. After the ball is drawn, it is then replaced and another ball is drawn.
Systems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
11 Ideals. 11.1 Revisiting Z
11 Ideals The presentation here is somewhat different than the text. In particular, the sections do not match up. We have seen issues with the failure of unique factorization already, e.g., Z[ 5] = O Q(
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
T ( a i x i ) = a i T (x i ).
Chapter 2 Defn 1. (p. 65) Let V and W be vector spaces (over F ). We call a function T : V W a linear transformation form V to W if, for all x, y V and c F, we have (a) T (x + y) = T (x) + T (y) and (b)
The Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory
1 = (a 0 + b 0 α) 2 + + (a m 1 + b m 1 α) 2. for certain elements a 0,..., a m 1, b 0,..., b m 1 of F. Multiplying out, we obtain
Notes on real-closed fields These notes develop the algebraic background needed to understand the model theory of real-closed fields. To understand these notes, a standard graduate course in algebra is
NOV - 30211/II. 1. Let f(z) = sin z, z C. Then f(z) : 3. Let the sequence {a n } be given. (A) is bounded in the complex plane
Mathematical Sciences Paper II Time Allowed : 75 Minutes] [Maximum Marks : 100 Note : This Paper contains Fifty (50) multiple choice questions. Each question carries Two () marks. Attempt All questions.
Lecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
1 Local Brouwer degree
1 Local Brouwer degree Let D R n be an open set and f : S R n be continuous, D S and c R n. Suppose that the set f 1 (c) D is compact. (1) Then the local Brouwer degree of f at c in the set D is defined.
THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS
THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear
Goal Problems in Gambling and Game Theory. Bill Sudderth. School of Statistics University of Minnesota
Goal Problems in Gambling and Game Theory Bill Sudderth School of Statistics University of Minnesota 1 Three problems Maximizing the probability of reaching a goal. Maximizing the probability of reaching
CITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION
No: CITY UNIVERSITY LONDON BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION ENGINEERING MATHEMATICS 2 (resit) EX2005 Date: August
Mathematics Review for MS Finance Students
Mathematics Review for MS Finance Students Anthony M. Marino Department of Finance and Business Economics Marshall School of Business Lecture 1: Introductory Material Sets The Real Number System Functions,
