Lecture 2: Absorbing states in Markov chains. Mean time to absorption. Wright-Fisher Model. Moran Model.

Antonina Mitrofanova, NYU, Department of Computer Science
December 8, 2007

1 Higher Order Transition Probabilities

Very often we are interested in the probability of going from state $i$ to state $j$ in $n$ steps, which we denote by $p_{ij}^{(n)}$. For example, the probability of going from state $i$ to state $j$ in two steps is

\[ p_{ij}^{(2)} = \sum_k p_{ik}\, p_{kj}, \]

where $k$ ranges over the set of all possible states. In other words, it is built from the probabilities of going from state $i$ to any other possible state $k$ (in one step) and then going from $k$ to $j$. The probability $p_{ij}^{(2)}$ is exactly the $(i,j)$th entry of the matrix $P^2 = P \cdot P$. Similarly, the probability of going from $i$ to $j$ in $n$ steps is

\[ p_{ij}^{(n)} = \sum_k p_{ik}\, p_{kj}^{(n-1)} = \sum_k p_{ik}^{(n-1)}\, p_{kj}, \]

and corresponds to the $(i,j)$th entry of the matrix $P^n$ (therefore we can compute transition probabilities by taking matrix powers).

We define the $n$-step transition probabilities of the Markov chain by $p_{ij}^{(n)} = P[X_n = j \mid X_0 = i]$. Since we know that

\[ P[X_{n+1} = k_1, \ldots, X_{n+m} = k_m \mid X_n = i] = P[X_1 = k_1, \ldots, X_m = k_m \mid X_0 = i], \]

then

\[ P[X_n = j \mid X_0 = i] = P[X_{n+m} = j \mid X_m = i]. \]

In other words, the probability that a path started at $i$ ends at $j$ after $n$ steps does not depend on the time at which it is initiated. Therefore

\[ p_{ij}^{(n)} = P[X_n = j \mid X_0 = i]. \]

We can also write

\[ p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)}\, p_{kj}^{(m)}, \]

which is called the Chapman-Kolmogorov equation. It follows that we can compute the unconditional probability of $X_n$ taking the value $j$ as

\[ P[X_n = j] = p_j^{(n)} = \sum_i p_i\, p_{ij}^{(n)}, \]

where $p_i = P[X_0 = i]$ is the initial distribution.

Frequently we are interested in the time it takes the system to go from some initial state to some terminal critical state, called an absorbing state.
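The matrix-power view of the n-step probabilities can be checked numerically. The following is a minimal sketch only: the three-state matrix `P` is a hypothetical example (not taken from the lecture), and the helper names `mat_mul` and `mat_pow` are assumptions introduced for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return the n-th power of p (n >= 0) by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Hypothetical chain: states 0 and 2 are absorbing, state 1 is transient.
P = [[1.0, 0.0, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.0, 1.0]]

P2 = mat_pow(P, 2)   # two-step transition probabilities
P3 = mat_pow(P, 3)
P5 = mat_pow(P, 5)

# Chapman-Kolmogorov: the (2+3)-step matrix equals P^2 times P^3.
ck = mat_mul(P2, P3)

# Unconditional distribution of X_5 for a chain started in state 1.
init = [0.0, 1.0, 0.0]
dist5 = [sum(init[i] * P5[i][j] for i in range(3)) for j in range(3)]

print("two-step row for state 1:", P2[1])
print("distribution of X_5:", dist5)
```

Repeated multiplication keeps the sketch short; for large `n` one would normally use exponentiation by squaring.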

2 Absorption Probabilities

A state $i$ is called absorbing if $p_{ii} = 1$. In other words, once the system hits state $i$, it stays there forever and cannot escape. The most interesting absorbing states that arise in population genetics are at $0$ and at $M$. Let us assume a population of $2N$ haploids, each having either allele $A_1$ or allele $A_2$. In this case $M = 2N$. Let $X$ be a random variable which describes the number of copies (the frequency) of allele $A_1$ in a given population. In population genetics, it is of interest to find out how fast the allele will reach either absorbing state ($0$ or $2N$), given that the population started with $i$ alleles $A_1$.

Many functionals on a Markov chain (including absorption probabilities) are evaluated by a technique called first step analysis. This method proceeds by analyzing the possibilities that can arise at the end of the first transition. Let us now fix $k$ as an absorbing state. The probability of absorption in this state depends on the initial state $X_0 = i$. Let us define $U_{ik}$ as the probability of eventually reaching state $k$ given that we started in state $i$. The first possibility is to go from state $X_0 = i$ to $X_1 = k$ immediately, which has probability $p_{ik}$. However, if the state $k$ is not entered at $X_1$, then we must go to some other state $j \neq k$. Once we enter state $j$, the probability of ultimate absorption in $k$ is $U_{jk}$ by definition. Weighting all the possibilities gives

\[ U_{ik} = p_{ik} + \sum_{j=0,\, j \neq k}^{M} p_{ij}\, U_{jk} = \sum_{j=0}^{M} p_{ij}\, U_{jk}, \]

since $U_{kk} = 1$. If we write $u_i$ for the probability of absorption in one particular absorbing state (say, in $M$) starting from $i$, then

\[ u_i = \sum_{j=0}^{M} p_{ij}\, u_j. \]

3 Mean time until absorption

It is more difficult to assess the properties of the (random) time until absorption. However, it is common to evaluate the mean time until $X$ reaches $0$ or $2N$ starting from $i$ (it is called the mean absorption time). We wish to calculate the mean time until absorption for a general starting point $i$. We now introduce a concept which is central in calculating the mean absorption time. Let us observe that, starting from $i$, the system will visit state $j$ some number of times before absorption. This is true for all $j$ (except $0$ and $2N$).
Therefore, if we know the number of times the system visits state $j$ (for all $j$) before absorption, then we can obtain the average time until absorption by summing, over all states, the average time the system spends in each state. Let us now formally define $\bar t_{ij}$ as the mean number of times that $X$ takes the value $j$ before absorption in $0$ or $2N$ (given that it started in $i$). Then the mean time to absorption, given that we started at state $i$, is the sum

\[ \bar t_i = \sum_{j=1}^{2N-1} \bar t_{ij}. \]

We proceed by a first step analysis: if the system starts at $i$ and then moves to $k$ at its next step, then the mean number of visits to state $j$ prior to absorption, starting from state $k$, is $\bar t_{kj}$ by definition. However, this count misses the initial visit when $i = j$. Therefore, we need an additional variable $\delta_{ij}$, equal to $1$ if $i = j$ and $0$ otherwise. Weighting by the probability of going to state $k$ at the first step, we obtain

\[ \bar t_{ij} = \sum_{k=0}^{M} p_{ik}\, \bar t_{kj} + \delta_{ij}, \qquad \bar t_{0j} = \bar t_{Mj} = 0. \]

Summing the mean times the system spends in state $j$ over all $j$, and using the definition $\bar t_i = \sum_j \bar t_{ij}$, we obtain

\[ \bar t_i = \sum_{k=0}^{M} p_{ik}\, \bar t_k + \sum_j \delta_{ij} = \sum_{k=0}^{M} p_{ik}\, \bar t_k + 1, \qquad \bar t_0 = \bar t_M = 0. \]

Observe that since $\delta_{ij}$ equals $1$ only once (when $j = i$), we have $\sum_j \delta_{ij} = 1$ in the above equation.

Now we will discuss the two most important models in population genetics which describe the evolution of allele frequencies: the Wright-Fisher model and the Moran model.

4 Wright-Fisher Model

Let us assume a simple haploid model of a population of $2N$ genes (or, alternatively, $N$ diploid organisms) with random reproduction, each haploid possessing either allele $A_1$ or allele $A_2$. To start, let us disregard mutation pressures and selective forces. At every time step, each gene (allele) gives birth to some number of offspring (exact copies of itself) and dies immediately after that, thus living for only one generation. This process describes how the genes are transmitted from one generation to the next. However, the processes of birth and death will remain behind the curtain for the moment. Instead, we only observe how the frequency of alleles changes from generation to generation. We fix our attention on the frequency of allele $A_1$ in the population of $2N$ haploids.

Let us think of the passage from one generation to the next as a Markov chain, where the state $X$ of the chain corresponds to the number of haploids (genes) of type $A_1$. Clearly, in any generation $X$ takes one of the values $0, 1, \ldots, 2N$, which constitute the state space. We denote the value taken by $X$ in generation $t$ by $X_t$. The model assumes that the genes of generation $t+1$ are derived by sampling with replacement from the genes of generation $t$. Thus, the makeup of the next generation is determined by $2N$ independent Bernoulli trials, so that $X_{t+1}$ is a binomial random variable. Let the initial generation consist of $i$ genes of type $A_1$ and $2N - i$ genes of type $A_2$. Then we define the probability of success $p$ (resulting in allele $A_1$) and the probability of failure $q$ (resulting in allele $A_2$) for each Bernoulli trial as

\[ p = \frac{i}{2N}, \qquad q = 1 - \frac{i}{2N}. \]

We generate a Markov chain $\{X_n\}$, where $X_n$ is the number of $A_1$ genes in the $n$th generation, among a constant population size of $2N$ individuals. In short, $X_{t+1}$ is a binomial random variable with index $2N$ and parameter (probability of success) $X_t / 2N$. The transition probabilities from $X_t = i$ to $X_{t+1} = j$ for this Markov chain are therefore computed from the binomial distribution as

\[ P(X_{t+1} = j \mid X_t = i) = p_{ij} = \binom{2N}{j} p^{j} q^{2N-j} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left\{1 - \frac{i}{2N}\right\}^{2N-j}. \]

Observe now that the states $0$ and $2N$ are absorbing. In other words, no matter what the value of $X_0$ is, eventually $X_t$ will take the value $0$ or $2N$, and once this happens, $X$ stays in that state forever. In the case $X_t = 0$, the population consists only of $A_2$ genes, while in the case $X_t = 2N$ the population is purely an $A_1$-gene population. In this model, the allele frequency of the next generation is driven mainly by genetic drift (genetic drift can be defined as a force that reduces heterozygosity by the random loss of alleles).
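The binomial transition probabilities can be tabulated directly for a small population. The sketch below is illustrative only: the function name `wright_fisher_matrix` and the choice of `M = 10` genes are assumptions, not part of the lecture.

```python
from math import comb


def wright_fisher_matrix(m):
    """Transition matrix for m = 2N genes.

    Row i is the Binomial(m, i/m) distribution of the number of A1 genes
    in the next generation, given i copies of A1 in the current one.
    """
    rows = []
    for i in range(m + 1):
        p = i / m
        rows.append([comb(m, j) * p**j * (1 - p)**(m - j)
                     for j in range(m + 1)])
    return rows


M = 10  # assumed population of 2N = 10 genes
P = wright_fisher_matrix(M)

# Each row is a probability distribution.
row_sums = [sum(row) for row in P]

# Conditional mean of the next state, E(X_{t+1} | X_t = i), for every i.
means = [sum(j * P[i][j] for j in range(M + 1)) for i in range(M + 1)]

print("p_00 =", P[0][0], " p_MM =", P[M][M])
print("conditional means:", [round(x, 6) for x in means])
```

The first and last rows put all their mass on the diagonal, which is the absorbing behaviour of the states 0 and 2N, and the conditional mean of the next state equals the current state.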

4.1 Absorption probability in Wright-Fisher model

Let us discuss absorption probabilities in the Wright-Fisher model. The population can attain fixation and be composed of only $A_1$ genes ($X_t = 2N$) or only $A_2$ genes ($X_t = 0$). It can be shown that, with probability one, one of the absorbing states (either $0$ or $2N$) is eventually entered (and this is true for both the Wright-Fisher and Moran models). Therefore $\lim_{t \to \infty} P(X_t = j) = 0$ for every $j$ with $0 < j < 2N$. We will discuss two cases of absorption: at $0$ and at $2N$.

4.1.1 Absorption at zero

We write the probability of extinction (absorption at $0$) of a gene, given that it started with $i$ copies, as

\[ u_{i,0} = \lim_{n \to \infty} P(X_n = 0 \mid X_0 = i). \]

Let us find the probability of absorption in state $0$ by using $E(X_n)$. We express $E(X_n)$ as the expectation of a conditional expectation:

\[ E(X_n) = E[E(X_n \mid X_{n-1})] = E(X_{n-1}) = E(X_{n-2}) = \cdots = E(X_0) = i. \]

This property is called the constancy of expectation (and it also holds for the Moran model). Further, as $n \to \infty$ only the two absorbing states carry probability, so

\[ \lim_{n \to \infty} E(X_n) = 0 \cdot u_{i,0} + 2N\,(1 - u_{i,0}). \]

Now, since $\lim_{n \to \infty} E(X_n) = i$,

\[ i = 0 \cdot u_{i,0} + 2N\,(1 - u_{i,0}), \]

and therefore

\[ u_{i,0} = 1 - \frac{i}{2N}. \]

Observe that we ignore the terms $P(X_n = j)$ with $0 < j < 2N$, since they tend to zero as $n$ goes to infinity.

4.1.2 Absorption at 2N

We want to calculate the probability that $A_1$ eventually becomes fixed in the population (absorption at $2N$), and follow a similar argument:

\[ i = 0 \cdot (1 - u_{i,2N}) + 2N\, u_{i,2N}, \]

so that

\[ u_{i,2N} = \frac{i}{2N}. \]

A different argument (which is more relevant from a genetical point of view) is that eventually every gene in the population is descended from one unique gene in generation zero. The probability that such a gene (allele) is $A_1$ is simply the initial fraction of $A_1$ alleles, namely $i/2N$, and this must also be the fixation probability of allele $A_1$.

4.1.3 Absorption starting with one $A_1$ allele

Suppose that in a population of pure $A_2$ alleles a single new mutant $A_1$ allele (gene) arises. There are no further mutations, and therefore we can assume that we start with a population with one $A_1$ allele and $2N - 1$ $A_2$ alleles. According to the previous result, the probability of fixation of this allele is $u_{1,2N} = 1/2N$. On the other hand, the probability that the allele is lost is $1 - 1/2N$.
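The fixation probability $i/2N$ can also be recovered numerically from the first-step equations $u_i = \sum_j p_{ij} u_j$ with $u_0 = 0$ and $u_{2N} = 1$. The following is a sketch under stated assumptions: the helper names (`wright_fisher_matrix`, `solve`, `fixation_probabilities`) and the population size `M = 10` are illustrative choices, and the linear solver is a plain Gaussian elimination rather than anything prescribed by the lecture.

```python
from math import comb


def wright_fisher_matrix(m):
    """Row i is the Binomial(m, i/m) distribution (m = 2N genes)."""
    return [[comb(m, j) * (i / m)**j * (1 - i / m)**(m - j)
             for j in range(m + 1)] for i in range(m + 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (true for the systems used here).
    """
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fixation_probabilities(p):
    """Probability of absorption in the last state, for every start state.

    For transient i: u_i - sum_{j transient} p_ij u_j = p_{i,m}.
    """
    m = len(p) - 1
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in range(1, m)]
         for i in range(1, m)]
    b = [p[i][m] for i in range(1, m)]
    return [0.0] + solve(a, b) + [1.0]


M = 10  # assumed population of 2N = 10 genes
u = fixation_probabilities(wright_fisher_matrix(M))
for i, value in enumerate(u):
    print(i, round(value, 6), i / M)
```

Each printed row shows the start state, the numerically computed fixation probability, and the value $i/2N$ obtained from the constancy-of-expectation argument.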

4.2 Mean time until absorption in Wright-Fisher model

The calculation of the mean time until absorption for the Wright-Fisher model is computationally very expensive. Therefore, it is useful to approximate this quantity, as described in the next section. However, it is relatively simple to derive the mean time until absorption starting with one allele of type $A_1$ (before the mutant is lost or fixed). For this calculation we use the same analysis as before: the expected number of visits to a state $j$ along the path to absorption, starting from state $X_0 = 1$. We denote the mean number of generations to absorption in $0$ or $2N$, given that we started with one allele $A_1$, by $\bar t_1$. We need to sum the expected number of such visits over all $j$, avoiding the states $0$ and $2N$:

\[ \bar t_1 = \sum_{j=1}^{2N-1} \bar t_{1,j}, \]

where $\bar t_{1,j}$ is the mean number of times the number of $A_1$ alleles takes the value $j$ (the system is in state $j$) before reaching either $0$ or $2N$. Both Fisher and Wright found that $\bar t_{1,j} \approx 2/j$ when starting at $i = 1$. Since $\sum_{j=1}^{N} 1/j \approx \log(N) + \gamma$, where $\gamma$ is Euler's constant ($0.5772\ldots$), we derive

\[ \bar t_1 = \sum_{j=1}^{2N-1} \bar t_{1,j} \approx \sum_{j=1}^{2N-1} \frac{2}{j} = 2 \sum_{j=1}^{2N-1} \frac{1}{j} \approx 2\,(\log(2N-1) + \gamma). \]

4.3 Approximating Mean Time until Absorption

Even though in principle we can solve for the mean time to absorption for a general $i$, in practice these solutions are extremely difficult to obtain, and simple expressions for these mean times have not yet been found. Let us now present a simple approximation for $\bar t_i$. We again apply first step analysis, where we start from state $i$ and in the first step move to some intermediate state $k$. We define $M = 2N$, $i/M = x$, $k/M = x + \delta x$, and $\bar t_i = \bar t(x)$. Then we can rewrite the equation

\[ \bar t_i = \sum_{k=0}^{M} p_{ik}\, \bar t_k + 1 \]

as

\[ \bar t(x) = \sum P\{x \to x + \delta x\}\, \bar t(x + \delta x) + 1 = E\{\bar t(x + \delta x)\} + 1. \]

Now, assuming that $\bar t(x)$ is a twice differentiable function of a continuous variable $x$, we can use a Taylor series to approximate the above quantity. The Taylor series states that

\[ f(y) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (y-a)^n = \frac{f(a)}{0!}(y-a)^0 + \frac{f'(a)}{1!}(y-a) + \frac{f''(a)}{2!}(y-a)^2 + \cdots = f(a) + f'(a)(y-a) + \frac{1}{2} f''(a)(y-a)^2 + \cdots \]
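For small populations the first-step equation $\bar t_i = \sum_k p_{ik}\,\bar t_k + 1$ with $\bar t_0 = \bar t_{2N} = 0$ can still be solved exactly by machine, which gives reference values against which an approximation can be judged. The sketch below is illustrative: the helper names and the population size `M = 20` are assumptions, and the harmonic sum is printed next to the exact value only for comparison, with no claim about how close the two should be.

```python
from math import comb


def wright_fisher_matrix(m):
    """Row i is the Binomial(m, i/m) distribution (m = 2N genes)."""
    return [[comb(m, j) * (i / m)**j * (1 - i / m)**(m - j)
             for j in range(m + 1)] for i in range(m + 1)]


def solve(a, b):
    """Gaussian elimination with partial pivoting (non-singular input)."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mean_absorption_times(p):
    """Solve t_i = 1 + sum_k p_ik t_k for transient i, with t_0 = t_m = 0."""
    m = len(p) - 1
    a = [[(1.0 if i == k else 0.0) - p[i][k] for k in range(1, m)]
         for i in range(1, m)]
    b = [1.0] * (m - 1)
    return [0.0] + solve(a, b) + [0.0]


M = 20  # assumed population of 2N = 20 genes
P = wright_fisher_matrix(M)
t = mean_absorption_times(P)

# Harmonic-sum approximation for a single initial copy: 2 * sum_{j<2N} 1/j.
harmonic = 2 * sum(1 / j for j in range(1, M))

print("exact mean absorption time from i = 1:", round(t[1], 4))
print("2 * sum 1/j for j = 1..2N-1:", round(harmonic, 4))
print("exact mean absorption time from i = N:", round(t[M // 2], 4))
```

The cost of this direct solve grows roughly with the cube of the population size, which is one concrete sense in which the exact calculation becomes expensive for realistic values of $2N$.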

We rewrite $\bar t(x)$ by applying the Taylor series (showing the three leading terms):

\[ \bar t(x) = E\{\bar t(x + \delta x)\} + 1 \approx \bar t(x) + E(\delta x)\, \bar t'(x) + \frac{1}{2} E(\delta x)^2\, \bar t''(x) + 1. \]

All expectations are conditional on $x$. Using the fact that the expectation of a binomial random variable is $E(Y) = np$, we can write

\[ E(x + \delta x) = E(k/M) = \frac{E(k)}{M} = \frac{M \cdot (i/M)}{M} = \frac{i}{M}, \]

where $p = i/M$. We can equally use the fact established before that $E(X_n \mid X_{n-1}) = X_{n-1}$. At the same time $x = i/M$ and $E(x) = x$; therefore $E(\delta x) = 0$. As a result, the term $E(\delta x)\, \bar t'(x)$ vanishes.

Let us calculate $E(\delta x)^2$. In our case $E(\delta x)^2 = Var(\delta x)$, since $Var(\delta x) = E(\delta x)^2 - [E(\delta x)]^2$ and $[E(\delta x)]^2 = 0$ by the previous result. The variance of a binomial random variable is $Var(Y) = np(1-p)$. This gives

\[ Var(x + \delta x) = Var(k/2N) = \frac{Var(k)}{4N^2} = \frac{2N\, x(1-x)}{4N^2} = \frac{x(1-x)}{2N}, \]

and, since $x$ is fixed, $Var(\delta x) = Var(x + \delta x)$. Substituting,

\[ \bar t(x) \approx \bar t(x) + \frac{1}{2} \cdot \frac{x(1-x)}{2N}\, \bar t''(x) + 1, \]

so that

\[ x(1-x)\, \bar t''(x) \approx -4N. \]

The solution of this equation, subject to the boundary conditions $\bar t(0) = \bar t(1) = 0$, is obtained by integrating twice:

\[ \bar t''(x) = -\frac{4N}{x(1-x)} = -4N \left( \frac{1}{x} + \frac{1}{1-x} \right), \]

\[ \bar t'(x) = -4N \int \left( \frac{1}{x} + \frac{1}{1-x} \right) dx = -4N\, (\ln(x) - \ln(1-x)) + C, \]

\[ \bar t(x) = -4N \int \{\ln(x) - \ln(1-x)\}\, dx + Cx = -4N \{(x\ln(x) - x) + ((1-x)\ln(1-x) - (1-x))\} + Cx + D. \]

The boundary conditions give $D = -4N$ and $C = 0$, so that

\[ \bar t(x) = -4N \{x \log x + (1-x)\log(1-x)\}, \]

where $x = i/2N$ is the initial frequency of allele $A_1$. This is called the diffusion approximation to the mean absorption time. When we initially start with one $A_1$ allele, $x = 1/2N$, the mean time to absorption is

\[ \bar t\!\left(\frac{1}{2N}\right) = -4N \left\{ \frac{1}{2N} \log\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) \log\left(1 - \frac{1}{2N}\right) \right\} \approx 2 + 2\log(2N). \]

At the same time, when $x = 1/2$, then $\bar t(1/2) = 4N \log 2 \approx 2.8N$. Observe that for equal initial frequencies ($x = 1/2$), the mean time is relatively long.

5 Moran model

In each step of the Moran model, one gene is chosen at random to reproduce and one gene is chosen at random to die (all other genes survive to the next step). As opposed to the Wright-Fisher model, the Moran model has overlapping generations. This model is also known as a birth-and-death model. We still consider a constant population size of $2N$ haploids, each of which has either allele $A_1$ or allele $A_2$. Let us (for now) ignore mutation and selection pressures. Again, we define $X$ to be a random variable which represents the number of alleles of type $A_1$ in the population.

It is of interest to calculate the transition probabilities of the implied Markov chain. Suppose that in population $t$ (which corresponds to state $X_t$ of the underlying Markov chain) the number of $A_1$ alleles is $i$. Then in population $t+1$ the number of $A_1$ alleles can be $i-1$, $i+1$, or $i$. The system goes from $i$ to $i-1$ if an $A_2$ individual is chosen to reproduce and an $A_1$ individual is chosen to die:

\[ p_{i,i-1} = \left(\frac{2N-i}{2N}\right) \left(\frac{i}{2N}\right). \]

To go from $i$ to $i+1$, the opposite must happen: $A_2$ is chosen to die and $A_1$ is chosen to reproduce:

\[ p_{i,i+1} = \left(\frac{i}{2N}\right) \left(\frac{2N-i}{2N}\right). \]

To stay at $i$, either an $A_1$ is chosen both to reproduce and to die, or an $A_2$ is chosen both to reproduce and to die:

\[ p_{i,i} = \frac{i^2 + (2N-i)^2}{(2N)^2}. \]

Observe that $p_{ij} = 0$ for all other values of $j$, since no other transitions are possible.

5.1 Properties of a Continuant matrix in Moran model

In the case of the Moran model, the transition probability matrix is a continuant, which means that $p_{ij} = 0$ if $|i - j| > 1$. We can therefore apply standard continuant matrix theory to our model, so that the probability of fixation and the mean time to absorption can be found explicitly. In particular, we can use concepts from birth-and-death processes to calculate the desired quantities. The birth-death process is a special case of a continuous-time Markov process in which the states represent the current size of a population and the transitions are limited to births and deaths.
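The three Moran transition probabilities given above can be assembled into the tridiagonal (continuant) matrix and sanity-checked. This is a minimal sketch: the function name `moran_matrix` and the size `M = 8` are hypothetical choices made only for illustration.

```python
def moran_matrix(m):
    """Transition matrix of the Moran model with m = 2N genes.

    p[i][i-1] = p[i][i+1] = i (m - i) / m^2 and
    p[i][i]   = (i^2 + (m - i)^2) / m^2; all other entries are zero.
    """
    p = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        step = i * (m - i) / m**2
        if i > 0:
            p[i][i - 1] = step
        if i < m:
            p[i][i + 1] = step
        p[i][i] = (i**2 + (m - i)**2) / m**2
    return p


M = 8  # assumed population of 2N = 8 genes
P = moran_matrix(M)

row_sums = [sum(row) for row in P]
# Conditional mean of the next state given the current state i.
means = [sum(j * P[i][j] for j in range(M + 1)) for i in range(M + 1)]

for i in range(M + 1):
    print(i, [round(x, 4) for x in P[i]])
```

Because the up and down probabilities are equal in every state, the conditional mean of the next state equals the current state, the same constancy of expectation that holds in the Wright-Fisher model.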
When a birth occurs, the process goes from state $i$ to state $i+1$, with birth rate $p_{i,i+1} = \lambda_i$. When a death occurs, the process goes from state $i$ to state $i-1$, with death rate $p_{i,i-1} = \mu_i$. For now, we will just use facts from the theory of birth-death processes without proving them; we would like to return to formal definitions and proofs in one of the future lectures. We define

\[ \rho_i = \frac{\mu_1 \mu_2 \cdots \mu_i}{\lambda_1 \lambda_2 \cdots \lambda_i} \]

and $\rho_0 = 1$. If $0$ and $M$ are both absorbing states, then the probability of absorption at $M$, starting from $i$, is

\[ u_i = \frac{\sum_{k=0}^{i-1} \rho_k}{\sum_{k=0}^{M-1} \rho_k}. \]

Proceeding further with the above argument, we can calculate the mean number of times the system is in state $j$, given that it started in state $i$, as

\[ \bar t_{ij} = (1 - u_i)\, \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j}, \qquad j = 1, \ldots, i, \]

\[ \bar t_{ij} = u_i\, \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j}, \qquad j = i+1, \ldots, M-1, \]

and from this we can derive

\[ \bar t_i = \sum_{j=1}^{M-1} \bar t_{ij}. \]

5.2 Probability of fixation in Moran model

Let us now observe that in the Moran model

\[ \lambda_i = \mu_i = \frac{i\,(2N-i)}{(2N)^2}, \]

so that

\[ \rho_i = \frac{\mu_1 \mu_2 \cdots \mu_i}{\lambda_1 \lambda_2 \cdots \lambda_i} = 1 \]

for $i = 0, 1, \ldots, 2N$. It can be shown that, similarly to the Wright-Fisher model, $E(X_t \mid X_{t-1}) = X_{t-1}$. It follows from the above that the probability of fixation (given that we started with $i$ copies of $A_1$) is

\[ u_i = \frac{i}{2N}, \]

given that $M = 2N$.

5.3 Expected absorption time

Using the fact that $\rho_i = 1$, we can derive the following:

\[ \bar t_{ij} = (1 - u_i)\, \frac{\sum_{k=0}^{j-1} \rho_k}{\rho_{j-1}\, \mu_j} = \left(1 - \frac{i}{2N}\right) \frac{j\,(2N)^2}{j\,(2N-j)} = \frac{2N\,(2N-i)}{2N-j} \]

for $j = 1, \ldots, i$, and

\[ \bar t_{ij} = u_i\, \frac{\sum_{k=j}^{M-1} \rho_k}{\rho_j\, \lambda_j} = \frac{i}{2N} \cdot \frac{(2N-j)\,(2N)^2}{j\,(2N-j)} = \frac{2N\, i}{j} \]

for $j = i+1, \ldots, M-1$. Now we can calculate the expected time to absorption as

\[ \bar t_i = \sum_{j=1}^{2N-1} \bar t_{ij} = \sum_{j=1}^{i} \frac{2N\,(2N-i)}{2N-j} + \sum_{j=i+1}^{2N-1} \frac{2N\, i}{j} = 2N\,(2N-i) \sum_{j=1}^{i} \frac{1}{2N-j} + 2N\, i \sum_{j=i+1}^{2N-1} \frac{1}{j}. \]

It is possible to condition on the event that, say, $A_1$ eventually fixes. In this case, it is not difficult to derive similar expressions for the mean absorption times. Such derivations will be discussed in future lectures.
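The closed-form Moran results, fixation probability $i/2N$ and mean absorption time $2N(2N-i)\sum_{j=1}^{i} 1/(2N-j) + 2N\,i\sum_{j=i+1}^{2N-1} 1/j$, can be cross-checked against a direct numerical solution of the first-step equations. The sketch below makes several assumptions for illustration: the helper names, the population size `M = 12`, and the use of a plain Gaussian elimination; the closed-form helper is meant for transient states $1 \le i \le 2N-1$ only.

```python
def moran_matrix(m):
    """Tridiagonal Moran transition matrix for m = 2N genes."""
    p = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        step = i * (m - i) / m**2
        if i > 0:
            p[i][i - 1] = step
        if i < m:
            p[i][i + 1] = step
        p[i][i] = (i**2 + (m - i)**2) / m**2
    return p


def solve(a, b):
    """Gaussian elimination with partial pivoting (non-singular input)."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def first_step_solution(p, rhs):
    """Solve (I - Q) x = rhs on the transient states 1..m-1."""
    m = len(p) - 1
    a = [[(1.0 if i == k else 0.0) - p[i][k] for k in range(1, m)]
         for i in range(1, m)]
    return solve(a, rhs)


def moran_mean_time(m, i):
    """Closed-form mean absorption time for a transient state 1 <= i <= m-1."""
    left = sum(1 / (m - j) for j in range(1, i + 1))
    right = sum(1 / j for j in range(i + 1, m))
    return m * (m - i) * left + m * i * right


M = 12  # assumed population of 2N = 12 genes
P = moran_matrix(M)

# Mean absorption times: right-hand side is 1 for every transient state.
t_numeric = first_step_solution(P, [1.0] * (M - 1))
# Fixation probabilities: right-hand side is the one-step probability of hitting M.
u_numeric = first_step_solution(P, [P[i][M] for i in range(1, M)])

for i in range(1, M):
    print(i, round(t_numeric[i - 1], 4), round(moran_mean_time(M, i), 4),
          round(u_numeric[i - 1], 4))
```

Each printed row shows the start state, the numerically solved mean absorption time, the closed-form value, and the numerically solved fixation probability. Note that time here is counted in single birth-death events, not in Wright-Fisher generations.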