Lecture 2: Absorbing states in Markov chains. Mean time to absorption. Wright-Fisher Model. Moran Model.

Antonina Mitrofanova, NYU, Department of Computer Science

December 8, 2007

1 Higher Order Transition Probabilities

Very often we are interested in the probability of going from state $i$ to state $j$ in $n$ steps, which we denote as $p_{ij}^{(n)}$. For example, the probability of going from state $i$ to state $j$ in two steps is

$$p_{ij}^{(2)} = \sum_k p_{ik}\, p_{kj},$$

where $k$ ranges over the set of all possible states. In other words, it consists of the probabilities of going from state $i$ to any other possible state (in one step) and then going from that state to $j$. Interestingly, the probability $p_{ij}^{(2)}$ corresponds to the $(i, j)$th entry of the matrix $P^2 = P \cdot P$.

Similarly, the probability of going from $i$ to $j$ in $n$ steps is defined as

$$p_{ij}^{(n)} = \sum_k p_{ik}\, p_{kj}^{(n-1)} = \sum_k p_{ik}^{(n-1)}\, p_{kj},$$

and corresponds to the $(i, j)$th entry of the matrix $P^n$ (therefore we can compute transition probabilities by taking matrix powers).

To define the $n$-step transition probabilities for the Markov chain, note that since

$$P[X_{n+1} = k_1, \dots, X_{n+m} = k_m \mid X_n = i] = P[X_1 = k_1, \dots, X_m = k_m \mid X_0 = i],$$

we have

$$P[X_n = j \mid X_0 = i] = P[X_{n+m} = j \mid X_m = i].$$

In other words, the probability that a path started at $i$ ends at $j$ does not depend on the time at which it is initiated. Therefore

$$p_{ij}^{(n)} = P[X_n = j \mid X_0 = i].$$

We can also write

$$p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)}\, p_{kj}^{(m)},$$

which is called the Chapman-Kolmogorov equation. It follows that we can compute the unconditional probability (of $X_n$ taking the value $j$) as

$$P[X_n = j] = p_j^{(n)} = \sum_i p_i\, p_{ij}^{(n)},$$

where $p_i = P[X_0 = i]$ is the initial distribution.

Frequently we are interested in the time it takes the system to go from some initial state to some terminal critical state, called an absorbing state.
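The matrix-power view of $n$-step transition probabilities can be checked numerically. The following is a minimal sketch in plain Python; the 3-state transition matrix `P`, the initial distribution `p0`, and the helper names `mat_mul` and `mat_pow` are illustrative assumptions, not part of the lecture.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """Return P**n, whose (i, j) entry is the n-step probability p_ij^(n)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical 3-state chain; states 0 and 2 are absorbing (p_ii = 1).
P = [[1.0, 0.0, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]

# Two-step probability p_10^(2) computed two ways: as an entry of P^2
# and as the explicit sum over intermediate states k.
P2 = mat_pow(P, 2)
p10_two_step = sum(P[1][k] * P[k][0] for k in range(3))

# Chapman-Kolmogorov: P^(n+m) = P^n P^m, checked here for n = 2, m = 3.
lhs = mat_pow(P, 5)
rhs = mat_mul(mat_pow(P, 2), mat_pow(P, 3))
ck_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))

# Unconditional distribution of X_4 from an assumed initial distribution p0.
p0 = [0.2, 0.5, 0.3]
P4 = mat_pow(P, 4)
dist_n = [sum(p0[i] * P4[i][j] for i in range(3)) for j in range(3)]

print(P2[1][0], p10_two_step, ck_gap, dist_n)
```

Repeated multiplication is used for clarity; for large $n$ one would normally switch to exponentiation by squaring.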

2 Absorption Probabilities

The state $i$ is called absorbing if $p_{ii} = 1$. In other words, once the system hits state $i$, it stays there forever, not being able to escape. The most interesting absorbing states which arise in population genetics are at 0 and at $M$. Let us assume a population of $2N$ haploids, each having either allele $A_1$ or allele $A_2$. In this case $M = 2N$. Let $X$ be a random variable which describes the number of copies of allele $A_1$ in a given population. In population genetics, it is of interest to find out how fast the allele will go to either absorbing state (0 or $2N$), given that the population started with $i$ alleles $A_1$.

Many functionals on a Markov chain (including absorption probabilities) are evaluated by a technique called first step analysis. This method proceeds by analyzing the possibilities that can arise at the end of the first transition. Let us now fix $k$ as an absorbing state. The probability of absorption in this state depends on the initial state $X_0 = i$. Let us define $U_{ik}$ as the probability of eventually reaching state $k$ given that we started in state $i$. The first possibility is to go from state $X_0 = i$ to $X_1 = k$ immediately, which has probability $p_{ik}$. However, if the state $k$ is not entered at $X_1$, then we must go to some other state $j \neq k$. Once we enter state $j$, the probability of ultimate absorption in $k$ is $U_{jk}$ by definition. Weighting all the possibilities gives

$$U_{ik} = p_{ik} + \sum_{j=0,\, j \neq k}^{M} p_{ij}\, U_{jk} = \sum_{j=0}^{M} p_{ij}\, U_{jk},$$

since $U_{kk} = 1$. If we write $u_i$ as the probability of absorption (say, in $M$), then we write

$$u_i = \sum_{j=0}^{M} p_{ij}\, u_j.$$

3 Mean time until absorption

It is more difficult to assess the properties of the (random) time until absorption. However, it is common to evaluate the mean time until $X$ reaches 0 or $M$ starting from $i$ (it is called the mean absorption time). We wish to calculate the mean time until absorption for a general starting point $i$.

We now introduce a concept which is central in calculating the mean absorption time. Let us observe that, starting from $i$, the system will visit state $j$ some number of times before absorption. This fact is true for all $j$ (except 0 and $M$).
Therefore, if we know the number of times the system visits state $j$ (for all $j$) before absorption, then we can obtain the average time until absorption by summing, over all states, the average time the system spends in each state.

Let us now formally define the mean number of times that $X$ takes the value $j$ before absorption in 0 or $M$ (given that it started in $i$) as $\{\bar t_{ij}\}$. Then the mean time to absorption, given that we started at state $i$, is the sum

$$\bar t_i = \sum_{j=1}^{M-1} \bar t_{ij}.$$

We will proceed by a first step analysis: if the system starts at $i$ and then proceeds to $k$ at its next step, then the mean number of visits to state $j$ prior to absorption, starting from state $k$, is $\bar t_{kj}$ by definition. However, observe that this count misses the initial visit when $i = j$. Therefore, we need an additional variable $\delta_{ij}$, equal to 1 if $i = j$ and 0 otherwise. As a result, weighting by the probability of going to a state $k$ at the first step, we obtain

$$\bar t_{ij} = \sum_{k=0}^{M} p_{ik}\, \bar t_{kj} + \delta_{ij}, \qquad \bar t_{0j} = \bar t_{Mj} = 0.$$

Summing the mean times the system spends in state $j$ over all $j$ gives the mean absorption time $\bar t_i$.

By definition, $\bar t_i = \sum_{j=1}^{M-1} \bar t_{ij}$, so that

$$\bar t_i = \sum_{k=0}^{M} p_{ik}\, \bar t_k + \sum_{j=1}^{M-1} \delta_{ij} = \sum_{k=0}^{M} p_{ik}\, \bar t_k + 1, \qquad \bar t_0 = \bar t_M = 0.$$

Observe that since $\delta_{ij}$ equals 1 only once (when $j = i$), $\sum_j \delta_{ij} = 1$ in the above equation.

Now we will discuss the two most important models in population genetics which describe the evolution of allele frequencies: the Wright-Fisher model and the Moran model.

4 Wright-Fisher Model

Let us assume a simple haploid model of a population of $2N$ genes (or alternatively $N$ diploid organisms) with random reproduction, each gene being either allele $A_1$ or allele $A_2$. Let us, for a start, disregard mutation pressures and selective forces. At every time step, each gene (allele) gives birth to some number of offspring (which are exact copies of itself) and dies immediately after that, thus living only one generation. This process describes how the genes get transmitted from one generation to the next. However, the processes of birth and death will remain unseen behind the curtain for the moment. Instead, we will only observe how the frequency of alleles changes from generation to generation. We will fix our attention on the frequency of allele $A_1$ in the population of $2N$ haploids.

Let us think of this process of going from one generation to the next as a Markov chain, where the state $X$ of the chain corresponds to the number of haploids (genes) of type $A_1$. Clearly, in any generation $X$ takes one of the values $0, 1, \dots, 2N$, which constitute the state space. We will denote the value taken by $X$ in generation $t$ as $X_t$.

The model assumes that the genes for generation $t + 1$ are derived by sampling with replacement from the genes of generation $t$. Thus, the make-up of the next generation is determined by $2N$ independent Bernoulli trials, so that $X_{t+1}$ is a binomial random variable. Let the initial generation consist of $i$ genes of type $A_1$ and $2N - i$ genes of type $A_2$. Then we define the probability of success $p_i$ (resulting in allele $A_1$) and the probability of failure $q_i$ (resulting in allele $A_2$) for each Bernoulli trial as

$$p_i = \frac{i}{2N}, \qquad q_i = 1 - \frac{i}{2N}.$$

We generate a Markov chain $\{X_n\}$, where $X_n$ is the number of $A_1$ genes in the $n$th generation, among a constant population size of $2N$ individuals.
Basically, $X_{t+1}$ is a binomial random variable with index $2N$ and parameter (probability of success) $X_t / 2N$. Observe that the transition probabilities from $X_t = i$ to $X_{t+1} = j$ for this Markov chain are computed according to the binomial distribution as

$$P(X_{t+1} = j \mid X_t = i) = p_{ij} = \binom{2N}{j} p_i^{\,j}\, q_i^{\,2N-j} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left\{1 - \frac{i}{2N}\right\}^{2N-j}.$$

Observe now that states 0 and $2N$ are absorbing. In other words, no matter what the value of $X_0$ is, eventually $X_t$ will take the value 0 or $2N$. Once this happens, $X$ will stay in that state forever. In the case of $X_t = 0$, the population will consist only of $A_2$ genes, while in the case of $X_t = 2N$ the population will be a purely $A_1$-gene population.

In this model, the allele frequency of the next generation is driven mainly by genetic drift (genetic drift can be defined as a force that reduces heterozygosity by the random loss of alleles).
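The binomial transition probabilities above can be tabulated directly. The following is a minimal sketch, assuming a small hypothetical population with $N = 5$ (so $2N = 10$ genes); the function name `wright_fisher_matrix` is illustrative.

```python
from math import comb


def wright_fisher_matrix(N):
    """Return the (2N+1) x (2N+1) matrix with entries
    p_ij = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j)."""
    M = 2 * N
    P = []
    for i in range(M + 1):
        p = i / M  # probability that a sampled gene is A_1
        P.append([comb(M, j) * p**j * (1 - p)**(M - j) for j in range(M + 1)])
    return P


N = 5  # assumed value, chosen only to keep the example small
P = wright_fisher_matrix(N)

# Each row is a probability distribution over the next state.
row_sums = [sum(row) for row in P]

# The mean of the next state equals the current state: E(X_{t+1} | X_t = i) = i.
mean_next = [sum(j * P[i][j] for j in range(2 * N + 1)) for i in range(2 * N + 1)]

print(P[0][0], P[2 * N][2 * N])  # rows 0 and 2N are the absorbing states
```

The `mean_next` check illustrates the absence of any systematic pressure on the allele count: only random sampling moves the chain.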

4.1 Absorption probability in Wright-Fisher model

Let us discuss absorption probabilities in the Wright-Fisher model. The population can attain fixation and be composed of only $A_1$ genes ($X_t = 2N$) or only $A_2$ genes ($X_t = 0$). It can be shown that, with probability one, one of the absorbing states (either 0 or $2N$) is eventually entered (and this is true for both the Wright-Fisher and Moran models). Therefore $\lim_{t \to \infty} P(X_t = j) = 0$ for $j = 1, \dots, 2N - 1$. We will discuss two cases of absorption: at 0 and at $2N$.

4.1.1 Absorption at zero

We write the probability of extinction (absorption at 0) of a gene, given that it started with $i$ copies, as

$$u_{i,0} = \lim_{n \to \infty} P(X_n = 0 \mid X_0 = i).$$

Let us find the probability of absorption in state 0 by using $E(X_n)$. We express $E(X_n)$ by using the expectation of a conditional expectation:

$$E(X_n) = E[E(X_n \mid X_{n-1})] = E(X_{n-1}) = E(X_{n-2}) = \dots = E(X_0) = i.$$

This property is called the constancy of expectation (and it also holds for the Moran model). Further, in the limit $n \to \infty$ we can write $E(X_n)$ as

$$\lim_{n \to \infty} E(X_n) = 0 \cdot u_{i,0} + 2N\,(1 - u_{i,0}).$$

Now, since $\lim_{n \to \infty} E(X_n) = i$,

$$i = 0 \cdot u_{i,0} + 2N\,(1 - u_{i,0}),$$

and therefore

$$u_{i,0} = 1 - \frac{i}{2N}.$$

Observe that we ignore $P(X_n = j)$ for $j = 1, \dots, 2N - 1$, since these probabilities tend to zero as $n$ goes to infinity.

4.1.2 Absorption at 2N

We want to calculate the probability that $A_1$ eventually becomes fixed in the population (absorption at $2N$), and we follow a similar argument:

$$i = 0 \cdot (1 - u_{i,2N}) + 2N\, u_{i,2N},$$

so that

$$u_{i,2N} = \frac{i}{2N}.$$

A different argument (which is more relevant from a genetical point of view) is that eventually every gene in the population is descended from one unique gene in generation zero. The probability that such a gene (allele) is $A_1$ is simply the initial fraction of $A_1$ alleles, namely $i / 2N$, and this must also be the fixation probability of allele $A_1$.

4.1.3 Absorption starting with one $A_1$ allele

Suppose that in a population of pure $A_2$ alleles a single new mutant $A_1$ allele (gene) arises. There are no more new mutations, and therefore we can assume that we start with a population with one $A_1$ allele and $2N - 1$ $A_2$ alleles. According to the previous result, the probability of fixation for this allele is $u_{1,2N} = 1/(2N)$. On the other hand, the probability that the allele is lost is $1 - 1/(2N)$.
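The fixation probability $i/2N$ can also be recovered numerically from the first step equations $u_i = \sum_j p_{ij} u_j$ with $u_0 = 0$ and $u_{2N} = 1$. The sketch below assumes $N = 5$ and uses a small hand-written Gaussian elimination so that it needs only the standard library; the names `wf_matrix`, `solve` and `fixation_probabilities` are illustrative.

```python
from math import comb


def wf_matrix(N):
    """Wright-Fisher transition matrix for 2N genes."""
    M = 2 * N
    return [[comb(M, j) * (i / M)**j * (1 - i / M)**(M - j) for j in range(M + 1)]
            for i in range(M + 1)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fixation_probabilities(P):
    """Solve u_i = sum_j p_ij u_j for interior i, with u_0 = 0 and u_M = 1."""
    M = len(P) - 1
    interior = range(1, M)
    # Moving the interior unknowns to the left gives (I - Q) u = r,
    # where Q is P restricted to interior states and r_i = p_iM.
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in interior] for i in interior]
    b = [P[i][M] for i in interior]
    return [0.0] + solve(A, b) + [1.0]


N = 5  # assumed value for illustration
u = fixation_probabilities(wf_matrix(N))
print([round(value, 6) for value in u])
```

The computed values can be compared entry by entry with $i/2N$; the same routine works for any chain whose two end states are absorbing.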

4.2 Mean time until absorption in Wright-Fisher model

The calculation of the mean time until absorption for the Wright-Fisher model is computationally very expensive. Therefore, it is useful to approximate this quantity, as described in the next section. However, it is relatively simple to derive the mean time until absorption starting with one allele of type $A_1$ (before the mutant is lost or before the mutant is fixed). For this calculation we use the same analysis as before: we use the expected number of visits to each state along the path to absorption, starting from state $X_0 = 1$. We denote the mean number of generations to absorption in 0 or $2N$, given that we started with one allele $A_1$, as $\bar t_1$. We need to sum the expected number of such visits over all $j$, excluding states 0 and $2N$:

$$\bar t_1 = \sum_{j=1}^{2N-1} \bar t_{1,j},$$

where $\bar t_{1,j}$ is the mean number of times the number of $A_1$ alleles takes the value $j$ (the system is in state $j$) before reaching either 0 or $2N$. Both Fisher and Wright found that

$$\bar t_{1,j} \approx \frac{2}{j}, \qquad j = 1, \dots, 2N - 1.$$

Since $\sum_{j=1}^{n} 1/j \approx \log(n) + \gamma$, where $\gamma$ is Euler's constant ($0.5772\dots$), we derive

$$\bar t_1 = \sum_{j=1}^{2N-1} \bar t_{1,j} \approx \sum_{j=1}^{2N-1} \frac{2}{j} \approx 2\,(\log(2N - 1) + \gamma).$$

4.3 Approximating Mean Time until Absorption

Even though in principle we can find the mean time to absorption for a general $i$, in practice these solutions seem extremely difficult, and simple expressions for these mean times have not yet been found. Let us now present a simple approximation for $\bar t_i$. We again apply first step analysis, where we start from state $i$ and in the first step move to some intermediate state $k$. We define $M = 2N$, $i/M = x$, $k/M = x + \delta x$, and $\bar t_i = \bar t(x)$. Then we can rewrite the equation

$$\bar t_i = \sum_{k=0}^{M} p_{ik}\, \bar t_k + 1$$

as

$$\bar t(x) = \sum_{\delta x} P\{x \to x + \delta x\}\; \bar t(x + \delta x) + 1 = E\{\bar t(x + \delta x)\} + 1.$$

Now, assuming that $\bar t(x)$ is a twice differentiable function of a continuous variable $x$, we can use a Taylor series to approximate the above quantity. The Taylor series states that

$$f(y) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(y - a)^n = \frac{f(a)}{0!}(y - a)^0 + \frac{f'(a)}{1!}(y - a) + \frac{f''(a)}{2!}(y - a)^2 + \dots = f(a) + f'(a)\,(y - a) + \frac{1}{2} f''(a)\,(y - a)^2 + \dots$$

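We rewrite $\bar t(x)$ by applying the Taylor series (showing the three leading terms):

$$\bar t(x) = E\{\bar t(x + \delta x)\} + 1 \approx \bar t(x) + E(\delta x)\,\bar t'(x) + \frac{1}{2} E(\delta x)^2\, \bar t''(x) + 1.$$

All expectations are conditional on $x$. Using the fact that the expectation of a binomial random variable is $E(Y) = np$, we can write

$$E(x + \delta x) = E\!\left(\frac{k}{M}\right) = \frac{E(k)}{M} = \frac{M p_i}{M} = \frac{i}{M} = x,$$

where $p_i = i/M$. We can equally use the fact established before that $E(X_n \mid X_{n-1}) = X_{n-1}$. Since $x = i/M$ is fixed, $E(x) = x$, and therefore $E(\delta x) = 0$. As a result, the term $E(\delta x)\,\bar t'(x) = 0$.

Let us calculate $E(\delta x)^2$. In our case $E(\delta x)^2 = Var(\delta x)$, since $Var(\delta x) = E(\delta x)^2 - [E(\delta x)]^2$ and $[E(\delta x)]^2 = 0$ by the previous result. The variance of a binomial random variable is $Var(Y) = np(1 - p)$. Because $x$ is fixed, $Var(\delta x) = Var(x + \delta x)$, and the above gives

$$Var(x + \delta x) = Var\!\left(\frac{k}{2N}\right) = \frac{Var(k)}{4N^2} = \frac{2N\,x(1 - x)}{4N^2} = \frac{x(1 - x)}{2N}.$$

Substituting into the expansion,

$$\bar t(x) \approx \bar t(x) + \frac{x(1 - x)}{4N}\, \bar t''(x) + 1, \qquad \text{so that} \qquad x(1 - x)\, \bar t''(x) \approx -4N.$$

The solution to this equation, subject to the boundary conditions $\bar t(0) = \bar t(1) = 0$, is obtained by integrating twice:

$$\bar t''(x) = -\frac{4N}{x(1 - x)} = -4N \left(\frac{1}{x} + \frac{1}{1 - x}\right),$$

$$\bar t'(x) = -4N \left(\ln(x) - \ln(1 - x)\right) + C_1,$$

$$\bar t(x) = -4N \left\{(x \ln(x) - x) + ((1 - x)\ln(1 - x) - (1 - x))\right\} + C_1 x + C_2.$$

The boundary conditions give $C_1 = 0$ and $C_2 = -4N$, so that

$$\bar t(x) \approx -4N \left\{x \log x + (1 - x) \log(1 - x)\right\},$$

where $x = i/(2N)$ is the initial frequency of allele $A_1$. This is called the diffusion approximation to the mean absorption time. When we initially start with one $A_1$ allele, we have $x = 1/(2N)$.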
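The diffusion approximation can be compared with the exact mean absorption times, which solve the first step equations $\bar t_i = \sum_k p_{ik} \bar t_k + 1$ with $\bar t_0 = \bar t_{2N} = 0$. The sketch below assumes $N = 25$ and solves the linear system with a small hand-written Gaussian elimination; the function names and the rounded value used for Euler's constant are illustrative choices.

```python
from math import comb, log


def wf_matrix(N):
    """Wright-Fisher transition matrix for 2N genes."""
    M = 2 * N
    return [[comb(M, j) * (i / M)**j * (1 - i / M)**(M - j) for j in range(M + 1)]
            for i in range(M + 1)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mean_absorption_times(P):
    """Solve t_i = sum_k p_ik t_k + 1 for interior i, with t_0 = t_M = 0."""
    M = len(P) - 1
    interior = range(1, M)
    A = [[(1.0 if i == k else 0.0) - P[i][k] for k in interior] for i in interior]
    return [0.0] + solve(A, [1.0] * (M - 1)) + [0.0]


def diffusion_time(x, N):
    """Diffusion approximation -4N {x log x + (1 - x) log(1 - x)}."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # boundary conditions t(0) = t(1) = 0
    return -4 * N * (x * log(x) + (1 - x) * log(1 - x))


N = 25  # assumed population parameter (2N = 50 genes)
P = wf_matrix(N)
exact = mean_absorption_times(P)

GAMMA = 0.5772156649  # Euler's constant, rounded
print("i = 1:", exact[1], diffusion_time(1 / (2 * N), N),
      2 + 2 * log(2 * N), 2 * (log(2 * N - 1) + GAMMA))
print("i = N:", exact[N], diffusion_time(0.5, N))
```

No particular level of agreement is asserted here; the script simply prints the exact values next to the approximations so that the gap can be inspected for a chosen $N$.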

In that case the mean time to absorption is

$$\bar t\!\left(\frac{1}{2N}\right) \approx -4N \left\{\frac{1}{2N} \log\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) \log\left(1 - \frac{1}{2N}\right)\right\} \approx 2 + 2\log(2N)$$

generations. At the same time, when $x = 1/2$, then $\bar t(1/2) \approx 2.8N$ generations. Observe that for equal initial frequencies ($x = 1/2$), the mean time is relatively long.

5 Moran model

In each generation of the Moran model, one gene is chosen at random to give 2 offspring and one gene is chosen to die (all other genes survive to the next generation). As opposed to the Wright-Fisher model, the Moran model has overlapping generations. This model is also known as a birth-and-death model. We still consider a constant population size of $2N$ haploids, each of which has either allele $A_1$ or allele $A_2$. Let us (for now) ignore mutation and selection pressures. Again, we define $X$ to be a random variable which represents the number of alleles of type $A_1$ in the population.

It is of interest to calculate the transition probabilities for the implied Markov chain. Suppose that in population $t$ (which corresponds to state $X_t$ in the underlying Markov chain) the number of alleles $A_1$ is $i$. Then in population $t + 1$, the number of alleles $A_1$ can be either $i - 1$, $i + 1$, or $i$. The system can go from $i$ to $i - 1$ if an $A_2$ individual is chosen to give 2 offspring and an $A_1$ individual is chosen to die:

$$p_{i,i-1} = \left(\frac{2N - i}{2N}\right) \left(\frac{i}{2N}\right).$$

To go from $i$ to $i + 1$, the opposite should be true: an $A_2$ is chosen to die and an $A_1$ is chosen to reproduce:

$$p_{i,i+1} = \left(\frac{i}{2N}\right) \left(\frac{2N - i}{2N}\right),$$

and to stay at $i$, either an $A_1$ is chosen to reproduce and an $A_1$ is chosen to die, or an $A_2$ is chosen to reproduce and an $A_2$ is chosen to die:

$$p_{i,i} = \left(\frac{i}{2N}\right)^2 + \left(\frac{2N - i}{2N}\right)^2 = \frac{i^2 + (2N - i)^2}{(2N)^2}.$$

Observe that $p_{ij} = 0$ for all other values of $j$, since it is impossible to make other transitions.

5.1 Properties of a Continuant matrix in Moran model

In the case of the Moran model, the transition probability matrix is a continuant, which means that $p_{ij} = 0$ if $|i - j| > 1$. Now we can apply the standard continuant matrix theory to our model, so that the probability of fixation and the mean time to absorption can be found explicitly. In particular, we can use birth-and-death process concepts to calculate the desired quantities. The birth-death process is a special case of a continuous-time Markov process where the states represent the current size of a population and where the transitions are limited to births and deaths.
When a birth occurs, the process goes from state $i$ to state $i + 1$, defined by the birth rate $p_{i,i+1} = \lambda_i$. When a death occurs, the process goes from state $i$ to state $i - 1$, defined by the death rate $p_{i,i-1} = \mu_i$. For now, we will just use facts from the theory of birth-death processes without proving them; however, we would like to return to formal definitions and proofs in one of the future lectures. We define

$$\rho_i = \frac{\mu_1 \mu_2 \cdots \mu_i}{\lambda_1 \lambda_2 \cdots \lambda_i}.$$

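We also set $\rho_0 = 1$. If 0 and $M$ are both absorbing states, then the probability $u_i$ of absorption in $M$, starting from state $i$, is

$$u_i = \frac{\sum_{k=0}^{i-1} \rho_k}{\sum_{k=0}^{M-1} \rho_k}.$$

Proceeding further with the above argument, we can calculate the mean number of times the system is in state $j$, given that it started in state $i$, as

$$\bar t_{ij} = (1 - u_i)\, \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j}, \qquad (j = 1, \dots, i),$$

$$\bar t_{ij} = u_i\, \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j}, \qquad (j = i + 1, \dots, M - 1).$$

From this we can derive

$$\bar t_i = \sum_{j=1}^{M-1} \bar t_{ij} = (1 - u_i) \sum_{j=1}^{i} \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j} + u_i \sum_{j=i+1}^{M-1} \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j}.$$

5.2 Probability of fixation in Moran model

Let us now observe that in the Moran model

$$\lambda_i = \mu_i = \frac{i\,(2N - i)}{(2N)^2},$$

so that

$$\rho_i = \frac{\mu_1}{\lambda_1} \cdot \frac{\mu_2}{\lambda_2} \cdots \frac{\mu_i}{\lambda_i} = 1$$

for $i = 0, 1, \dots, 2N - 1$. It can be shown that, similarly to the Wright-Fisher model, $E(X_t \mid X_{t-1}) = X_{t-1}$. Following all of the above, the probability of fixation (given that we started with $i$ copies of $A_1$) is

$$u_i = \frac{i}{2N},$$

given that $M = 2N$.

5.3 Expected absorption time

Using the fact that $\rho_k = 1$, we can derive the following. For $j = 1, \dots, i$,

$$\bar t_{ij} = (1 - u_i)\, \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j} = \left(\frac{2N - i}{2N}\right) \frac{j}{j\,(2N - j)/(2N)^2} = \frac{2N\,(2N - i)}{2N - j}.$$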

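For $j = i + 1, \dots, M - 1$,

$$\bar t_{ij} = u_i\, \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j} = \left(\frac{i}{2N}\right) \frac{2N - j}{j\,(2N - j)/(2N)^2} = \frac{2N\, i}{j}.$$

Now we can calculate the expected time to absorption as

$$\bar t_i = \sum_{j=1}^{M-1} \bar t_{ij} = (1 - u_i) \sum_{j=1}^{i} \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j} + u_i \sum_{j=i+1}^{M-1} \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j} = 2N\,(2N - i) \sum_{j=1}^{i} \frac{1}{2N - j} + 2N\, i \sum_{j=i+1}^{2N-1} \frac{1}{j}.$$

It is possible to condition on the event that, say, $A_1$ eventually fixes. In this case, it is not difficult to derive simpler expressions for the mean absorption times. Such derivations will be discussed in future lectures.

References

[1] Warren Ewens, Mathematical Population Genetics, Second edition, 2004, pp 20-23, 86-91, 92-99, 104-109.

[2] Howard Taylor and Samuel Karlin, An Introduction to Stochastic Modeling, Third edition, 1998, pp 95-138.

[3] Sidney Resnick, Adventures in Stochastic Processes, 1992, pp 72-6.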
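As a numerical check on the Moran model results of Section 5, the closed-form mean absorption time and the fixation probability $u_i = i/2N$ can be substituted back into the first step equations. The sketch below uses exact rational arithmetic and assumes a hypothetical population of $M = 2N = 20$ genes; the function names are illustrative.

```python
from fractions import Fraction


def moran_step(i, M):
    """Return (down, stay, up) one-step probabilities from state i with M genes."""
    move = Fraction(i * (M - i), M * M)  # p_{i,i-1} = p_{i,i+1} = i(M - i)/M^2
    return move, 1 - 2 * move, move


def moran_mean_absorption_time(i, M):
    """Closed form M(M - i) sum_{j=1}^{i} 1/(M - j) + M i sum_{j=i+1}^{M-1} 1/j,
    valid for interior states 1 <= i <= M - 1."""
    below = sum(Fraction(1, M - j) for j in range(1, i + 1))
    above = sum(Fraction(1, j) for j in range(i + 1, M))
    return M * (M - i) * below + M * i * above


M = 20  # assumed number of genes (2N with N = 10)

# Mean absorption times, with the boundary values t_0 = t_M = 0.
t = [Fraction(0)] + [moran_mean_absorption_time(i, M) for i in range(1, M)] + [Fraction(0)]
# Fixation probabilities u_i = i / M.
u = [Fraction(i, M) for i in range(M + 1)]

time_residuals = []
prob_residuals = []
for i in range(1, M):
    down, stay, up = moran_step(i, M)
    # First step equation for the mean time: t_i = sum_k p_ik t_k + 1.
    time_residuals.append(down * t[i - 1] + stay * t[i] + up * t[i + 1] + 1 - t[i])
    # First step equation for the fixation probability: u_i = sum_k p_ik u_k.
    prob_residuals.append(down * u[i - 1] + stay * u[i] + up * u[i + 1] - u[i])

print(float(t[1]), float(t[M // 2]))
```

Using `Fraction` avoids rounding, so the residuals of both first step equations can be tested for exact equality with zero.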