Bayesian coalescent inference of population size history




Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39

BEAST tutorials. Population inference methods using BEAST2 that we will cover: 1. Extended Bayesian skyline plot (EBSP); 2. Structured coalescent (MultiTypeTree); 3. SNP-based species coalescent inference (SNAPP). Not being covered: divergence time estimation; relaxed molecular clocks; multi-locus species coalescent inference (*BEAST); epidemiological phylodynamic models; fossilized birth-death model, sampled ancestors; programming new models in BEAST. 2 / 39

The coalescent. Data: a small genetic sample from a large background population. The coalescent is a model of the ancestral relationships of a sample of individuals taken from a larger population. It describes a probability distribution on ancestral genealogies (trees) given a population history, N(t), so it can convert information from ancestral genealogies into information about population history and vice versa. It is a model of ancestral genealogies, not sequences, and in its simplest form it assumes neutral evolution. In a Bayesian setting it can be thought of as a prior on the tree. 4 / 39

Theoretical population genetics. Most of theoretical population genetics is based on the idealized Wright-Fisher model of a population, which assumes: constant population size N; discrete generations; complete mixing. For the purposes of this presentation the population will be assumed to be haploid, as is the case for many pathogens. 5 / 39

Kingman's n-coalescent. Consider tracing the ancestry of a sample of k individuals from the present, back into the past. This process eventually coalesces to a single common ancestor (concestor) of the sample of individuals. Kingman's n-coalescent describes the statistical properties of such an ancestry when k is small compared to the total population size N. 6 / 39

The coalescence of two ancestral lineages. First, consider two random members from a population of fixed size N. By perfect mixing, the probability that they share a concestor in the previous generation is 1/N. The probability that the concestor is t generations back is $\Pr\{t\} = \frac{1}{N}\left(1 - \frac{1}{N}\right)^{t-1}$. It follows that $g = t - 1$ has a geometric distribution with a success rate of $\lambda = 1/N$, so $t$ has mean $N$ and variance $N(N-1)$, which is close to $N^2$ for large $N$. 7 / 39
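
The pairwise coalescence probability can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the value N = 10 are chosen here for illustration, and the mean and variance use the standard geometric-distribution formulas 1/p and (1 - p)/p^2 with p = 1/N.

```python
from fractions import Fraction


def pr_coalesce_at(t, N):
    """Probability that two lineages first share an ancestor exactly t generations back."""
    p = Fraction(1, N)  # chance of picking the same parent in one generation
    return p * (1 - p) ** (t - 1)


def mean_and_variance(N):
    """Mean and variance of t for a geometric distribution with success probability 1/N."""
    p = Fraction(1, N)
    return 1 / p, (1 - p) / p**2


N = 10  # illustrative population size (an assumption, not from the slides)
mean, var = mean_and_variance(N)
print(mean, var)  # 10 90
```

Exact fractions are used so that the probabilities can be compared without floating-point rounding.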

The coalescence of k lineages. With k lineages the time to the first coalescence is derived in the same way, only now there are $\binom{k}{2}$ possible pairs that may coalesce, resulting in a success rate of $\lambda = \binom{k}{2}/N$ and a mean time to first coalescence ($t_k$) of $E[t_k] = N / \binom{k}{2}$. This implicitly assumes that N is much larger than $O(k^2)$, so that the probability of two coalescent events in the same generation is small. 8 / 39
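
As a rough illustration of the expected waiting times, the sketch below evaluates E[t_k] = N / C(k, 2) for each k and sums the intervals to get the expected time to the most recent common ancestor. The function names and the values N = 1000 and n = 5 are assumptions made for this example; the closed form 2N(1 - 1/n) is a standard result that does not appear on the slide.

```python
from math import comb


def expected_interval(k, N):
    """E[t_k] = N / C(k, 2): expected generations while k lineages remain."""
    return N / comb(k, 2)


def expected_tmrca(n, N):
    """Sum of E[t_k] for k = 2..n, which equals 2N(1 - 1/n)."""
    return sum(expected_interval(k, N) for k in range(2, n + 1))


N, n = 1000, 5  # illustrative values, not from the slides
for k in range(n, 1, -1):
    print(k, expected_interval(k, N))
print("expected TMRCA:", expected_tmrca(n, N))
```

The last interval (k = 2) alone accounts for N generations, more than half of the expected tree height.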

The coalescent likelihood for a genealogy. For a genealogy with known coalescent times $\mathbf{t} = \{t_2, t_3, \ldots, t_n\}$ we can write the likelihood: $f(\mathbf{t} \mid N) = \frac{1}{N^{n-1}} \prod_{k=2}^{n} \exp\left(-\binom{k}{2}\frac{t_k}{N}\right)$. 9 / 39
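
A minimal sketch of this likelihood on the log scale, assuming t_k denotes the length of time during which the genealogy has k lineages (as on the previous slide). The function names and the interval values are hypothetical. The maximum-likelihood estimate shown follows from setting the derivative of the log-likelihood with respect to N to zero; it is not stated on the slide.

```python
from math import comb, log
from typing import Mapping


def coalescent_log_likelihood(intervals: Mapping[int, float], N: float) -> float:
    """log f(t | N) = -(n - 1) log N - sum_k C(k, 2) t_k / N.

    intervals maps k (number of lineages) to t_k for every k = 2..n.
    """
    n = max(intervals)
    if sorted(intervals) != list(range(2, n + 1)):
        raise ValueError("intervals must have one entry for each k = 2..n")
    scaled = sum(comb(k, 2) * t for k, t in intervals.items())
    return -(n - 1) * log(N) - scaled / N


def mle_population_size(intervals: Mapping[int, float]) -> float:
    """Value of N maximising the likelihood: sum_k C(k, 2) t_k / (n - 1)."""
    n = max(intervals)
    return sum(comb(k, 2) * t for k, t in intervals.items()) / (n - 1)


intervals = {3: 50.0, 2: 100.0}  # hypothetical interval lengths in generations
print(coalescent_log_likelihood(intervals, 100.0))
print(mle_population_size(intervals))  # 125.0
```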

The coalescent with serial samples. Many epidemiological agents, like RNA viruses, evolve very rapidly, so that the effect of sampling the population at different times becomes important. Ancient DNA also requires careful handling of sampling times. [Figure panels: constant size; exponential growth.] 10 / 39

Bayesian integration of uncertainty in genealogies How similar are these two trees? Both of them are plausible given the data. We can use Bayesian Markov-chain Monte Carlo to average the coalescent over all plausible trees. 11 / 39

A case study of the coalescent approach to molecular epidemiology. Hepatitis C virus (HCV): identified in 1989; 9.6 kb single-stranded RNA genome; polyprotein cleaved by proteases; tissue culture system only recently developed. 13 / 39

How important is Hepatitis C? More than 185 million people infected worldwide; 80% of infections are chronic; liver cirrhosis and cancer risk; 10,000 deaths per year in the USA; no protective immunity? [Figure: relative prevalence of each HCV genotype by GBD region. Size of pie charts is proportional to the number of seroprevalent cases as estimated by Hanafiah et al. Source: Messina et al., Hepatology, January 2015.] 14 / 39

HCV transmission. By percutaneous exposure to infected blood: blood transfusion / blood products; injecting and nasal drug use; sexual and vertical transmission; unsafe injections; unidentified routes. 15 / 39

Coalescent population inference of HCV. Pybus et al (2003) Molecular Biology and Evolution. Egyptian HCV gene sequences: n = 61; E1 gene, 411 bp; all sequences contemporaneous. Egypt has the highest prevalence of HCV worldwide (10-20%), but prevalence is low in neighbouring states. Why is Egypt so seriously affected? Parenteral antischistosomal therapy (PAT)? 16 / 39

Demographic model. The coalescent can be extended to model any integrable function of varying population size. The model we used was a const-exp-const model. A Bayesian MCMC method was developed to sample the gene genealogy, the substitution model and demographic function simultaneously. $N(t) = \begin{cases} N_C & \text{if } t \le x \\ N_C e^{-r(t-x)} & \text{if } x < t < y \\ N_A & \text{if } t \ge y \end{cases}$ 17 / 39
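
The const-exp-const function can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: t is taken to be time before the present, the function is assumed to be continuous so that the ancestral size is N_A = N_C exp(-r(y - x)), and the parameter values in the usage lines are hypothetical.

```python
from math import exp


def const_exp_const(t, n_c, r, x, y):
    """Population size at time t before the present under a const-exp-const model.

    n_c: current (constant) size for t <= x
    r:   growth rate during the exponential phase x < t < y
    The ancestral size for t >= y is assumed to be n_c * exp(-r * (y - x)).
    """
    if not 0 <= x <= y:
        raise ValueError("require 0 <= x <= y")
    if t <= x:
        return n_c
    if t < y:
        return n_c * exp(-r * (t - x))
    return n_c * exp(-r * (y - x))


# Hypothetical parameters: constant for 10 time units, growth phase until t = 30.
for t in (0, 10, 20, 30, 50):
    print(t, const_exp_const(t, n_c=1000.0, r=0.1, x=10.0, y=30.0))
```

Deriving N_A from the other parameters keeps the curve continuous at t = y; treating N_A as a free parameter would allow a jump there.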

Estimated population history of HCV in Egypt 18 / 39

Uncertainty in parameter estimates. Demographic parameters: growth rate of the growth phase (grey box is the prior). Mutational parameters: rates at different codon positions, all significantly different. 19 / 39

Full Bayesian estimation. Marginalized over uncertainty in genealogy and mutational processes. The yellow band represents the time over which PAT was employed in Egypt. 20 / 39

Virus Phylodynamics 22 / 39

Skyline coalescent model 23 / 39

The generalized skyline plot - simulated data. [Figure panels: constant population size, $N(t) = N_0$; exponential growth, $N(t) = N_0 e^{-rt}$.] 24 / 39
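
The slide does not spell out the estimator, so the sketch below uses the classic (ungrouped) skyline estimate as a stand-in: because E[t_k] = N / C(k, 2), each interval gives the moment estimate C(k, 2) * t_k. The generalized skyline plot additionally pools short adjacent intervals, which is not done here. Function names, the random seed, and the simulated values are assumptions, and waiting times are drawn from the continuous-time (exponential) approximation.

```python
import random
from math import comb


def classic_skyline(intervals):
    """Map each k to the per-interval estimate C(k, 2) * t_k."""
    return {k: comb(k, 2) * t for k, t in intervals.items()}


def simulate_intervals(n, N, rng):
    """Draw t_k ~ Exponential(rate = C(k, 2) / N) for k = n..2 under constant size N."""
    return {k: rng.expovariate(comb(k, 2) / N) for k in range(n, 1, -1)}


rng = random.Random(1)  # fixed seed so the run is repeatable
N_true = 1000.0         # hypothetical constant population size
estimates = [
    value
    for _ in range(2000)
    for value in classic_skyline(simulate_intervals(10, N_true, rng)).values()
]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # close to N_true on average
```

Individual interval estimates are very noisy, which is the motivation for grouping intervals in the generalized version.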

The generalized skyline plot - HIV-1 group M. The tree used here was estimated in Yusim et al (2001) Phil. Trans. Roy. Soc. Lond. B 356:855-866. The black curve is a parametric coalescent estimate obtained from the same data under the expansion model, $N(t) = (N_0 - N_A)e^{-rt} + N_A$. 25 / 39

The Bayesian skyline plot. Drummond et al (2005) Molecular Biology and Evolution. The Bayesian skyline plot estimates a demographic function that has a certain fixed number of steps (in this example 15) and then integrates over all possible positions of the break points, and population sizes within each epoch. 26 / 39
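
One state of such a stepwise demographic function can be sketched as below. This is an illustrative parameterisation rather than BEAST code: it assumes the break points coincide with coalescent events, that `group_sizes` gives the number of coalescent intervals in each epoch, and that `pop_sizes` gives one population size per epoch. All names and numbers are hypothetical. An MCMC sampler would propose changes to the group sizes and population sizes and average over them.

```python
from bisect import bisect_left
from itertools import accumulate


def skyline_population_size(t, coalescent_times, group_sizes, pop_sizes):
    """Piecewise-constant N(t) with epoch boundaries at selected coalescent events.

    coalescent_times: increasing times (before present) of the n - 1 coalescent events
    group_sizes:      number of coalescent intervals in each epoch (sums to n - 1)
    pop_sizes:        population size for each epoch
    """
    if sum(group_sizes) != len(coalescent_times):
        raise ValueError("group sizes must sum to the number of coalescent events")
    if len(group_sizes) != len(pop_sizes):
        raise ValueError("need one population size per group")
    # Epoch j ends at the last coalescent event assigned to group j.
    ends = [coalescent_times[i - 1] for i in accumulate(group_sizes)]
    j = bisect_left(ends, t)
    return pop_sizes[min(j, len(pop_sizes) - 1)]  # beyond the root, reuse the last epoch


times = [1.0, 2.5, 4.0, 7.0]  # hypothetical genealogy with 5 tips
print(skyline_population_size(0.5, times, [2, 2], [500.0, 100.0]))  # 500.0
print(skyline_population_size(3.0, times, [2, 2], [500.0, 100.0]))  # 100.0
```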

Validating the Bayesian skyline plot 27 / 39

Comparison of BSP to parametric coalescent 28 / 39

Modeling complex demographic history: Dengue 4 in Puerto Rico. Model 1: $N(t) = N_0 \exp(-rt)$, log marginal likelihood = -10566.421. Model 2: N(t) = scaled-translated case data, log marginal likelihood = -10478.572. 29 / 39
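
The two log marginal likelihoods can be compared through their difference, the log Bayes factor. The short sketch below only does that arithmetic with the values from the slide; the function name is chosen here for illustration, and a positive result favours the first argument.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of model A over model B from their log marginal likelihoods."""
    return log_ml_a - log_ml_b


# Model A: scaled-translated case data; model B: exponential growth.
log_bf = log_bayes_factor(-10478.572, -10566.421)
print(round(log_bf, 3))  # 87.849
```

A difference of this size on the log scale is strong support for the demographic function derived from the case data over simple exponential growth.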

Comparing BSP to incidence data Dengue 4 in Puerto Rico 30 / 39

Extended BSP: Stochastic Variable Selection Heled and Drummond (2008) 31 / 39

Comparison of EBSP with BSP on Egypt HCV 32 / 39

Detecting evolutionary bottlenecks using EBSP 480 contemporaneous samples from a single locus 33 / 39

Detecting evolutionary bottlenecks using EBSP 16 contemporaneous samples from each of 32 loci 34 / 39

Detecting evolutionary bottlenecks using EBSP 480 samples sampled through time from a single locus 35 / 39

The population genetic dynamics of Influenza A Rambaut et al (2008) Nature 453:615-620 36 / 39

Conclusions. The coalescent: provides a mathematical framework for accurately modelling small genetic samples and the stochastic outcomes of genetic drift due to random mating; provides backward probabilities for population parameters; in most implementations does not directly model small populations or stochastically fluctuating populations; can be extended to non-parametric fluctuating populations, structured populations and gene conversion. 37 / 39

BEAST book theory / practice / programming 38 / 39

Taming the BEAST: Bayesian evolutionary analysis by sampling trees. A summer school on Bayesian phylogenetic and phylodynamic analyses in BEAST2 with invited talks, lectures and tutorials by leading and renowned experts in the field: Alexei Drummond (University of Auckland), Tracy Heath (Iowa State University), Oliver Pybus (University of Oxford), Tanja Stadler (ETH Zürich), Tim Vaughan (University of Auckland). 26 June - 1 July 2016, Engelberg, Switzerland. For further information and to apply: http://www.bsse.ethz.ch/cevo/taming-the-beast.html