Reconstructing phylogenies: how? how well? why?
Joe Felsenstein, Department of Genome Sciences and Department of Biology, University of Washington, Seattle

A review that asks these questions
What are some of the strengths and weaknesses of different ways of reconstructing evolutionary trees (phylogenies)?
How can we find out how accurate we may have been in reconstructing the phylogeny?
Why do we want to reconstruct it? What are phylogenies used for?

What does tree space (with branch lengths) look like?
An example: three species (A, B, C) with a clock.
[Figure: clocklike tree of A, B, C with times t1 and t2; labels: trifurcation, "not possible", "OK".]
When we consider all three possible topologies, the space looks like:
[Figure: tree space for all three topologies, with axes t1 and t2.]

For one tree topology
The space of trees varying all 2n − 3 branch lengths, each a nonnegative number, defines an "orthant" (open corner) of a (2n − 3)-dimensional real space.
[Figure: unrooted tree of six species A to F with branch lengths v1 to v9, drawn beside an orthant with a floor and two walls.]

Through the looking-glass
Shrinking one of the n − 3 interior branches to 0, we arrive at a trifurcation.
[Figure: the six-species tree (A to F, branch lengths v1 to v9) with one interior branch shrunk to zero, together with the neighboring topologies that share that trifurcation.]
Here, as we pass "through the looking glass", we also touch the space for two other tree topologies, and we could decide to enter either.

The graph of all trees of 5 species
The space of all these orthants, one for each topology, connecting ones that share faces (looking glasses).
[Figure: the Schoenberg graph: all 15 trees of 5 species (A to E), connected by NNIs.]

There are very large numbers of trees
For 21 species, the number of possible unrooted tree topologies is 3 × 5 × 7 × ... × 37 = 8,200,794,532,637,891,559,375 (about 8.2 × 10^21), which is within two orders of magnitude of Avogadro's Number, and that's not even asking about how hard it is to optimize the 39 branch lengths for each of these trees.
Correspondingly, finding the best tree is NP-hard under most criteria, and not easy to approximate either.
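The count on this slide can be checked directly. The sketch below assumes the standard formula for unrooted bifurcating trees, the product of odd integers 3 × 5 × ... × (2n − 5), and the 2n − 3 branch count used on the earlier slides; the function names are illustrative only.

```python
def count_unrooted_topologies(n_species: int) -> int:
    """Number of unrooted bifurcating topologies: 3 * 5 * ... * (2n - 5)."""
    if n_species < 3:
        raise ValueError("need at least 3 species")
    count = 1
    # Adding the k-th species to a tree with k - 1 tips can be done on any
    # of its 2(k - 1) - 3 = 2k - 5 branches.
    for odd in range(3, 2 * n_species - 5 + 1, 2):
        count *= odd
    return count


def count_branches(n_species: int) -> int:
    """Number of branches in an unrooted bifurcating tree: 2n - 3."""
    return 2 * n_species - 3


for n in (5, 10, 21):
    print(n, count_unrooted_topologies(n), count_branches(n))
```

For n = 5 this gives the 15 trees of the graph shown earlier, and for n = 21 it gives the 22-digit number and the 39 branch lengths quoted on the slide.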

Parsimony methods
[Figure: tree with tips in the order Alpha, Delta, Gamma, Beta, Epsilon.]
Species    Sites
           1  2  3  4  5  6
Alpha      T  A  G  C  A  T
Beta       C  A  A  G  C  T
Gamma      T  C  G  G  C  T
Delta      T  C  G  C  A  A
Epsilon    C  A  A  C  A  T
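To make the parsimony criterion concrete, here is a minimal sketch that counts the changes each site requires on one tree, using Fitch's set-intersection rule. The data are the six sites tabulated on this slide. The tree shape ((Alpha, Delta), (Gamma, (Beta, Epsilon))) is an assumption: the slide preserves only the order of the tips, not the branching.

```python
# Data from the slide: five species, six sites.
SEQUENCES = {
    "Alpha": "TAGCAT",
    "Beta": "CAAGCT",
    "Gamma": "TCGGCT",
    "Delta": "TCGCAA",
    "Epsilon": "CAACAT",
}

# Assumed topology (hypothetical): a tip is a name, an interior node a pair.
TREE = (("Alpha", "Delta"), ("Gamma", ("Beta", "Epsilon")))


def fitch_site(tree, states):
    """Return (possible states at this node, changes needed below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change needed.
        return common, left_changes + right_changes
    # They cannot agree: one change is forced on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_changes_per_site(tree, sequences):
    """Minimum number of changes at each site on the given tree."""
    n_sites = len(next(iter(sequences.values())))
    per_site = []
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in sequences.items()}
        per_site.append(fitch_site(tree, states)[1])
    return per_site


changes = parsimony_changes_per_site(TREE, SEQUENCES)
print(changes, sum(changes))
```

A parsimony search would evaluate this total for many candidate trees and keep the one with the fewest changes; no branch lengths are involved, which is why the criterion is quick to compute.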

Advantages and disadvantages of parsimony methods
Disadvantage: not model-based, so people think it makes no assumptions.
Advantage: reasonably fast; no search of branch lengths is needed and the criterion is quick to compute.
Advantage: good statistical properties when amounts of change are small.
Disadvantage: statistical misbehavior (inconsistency) when some nearby branches on the tree are long (Long Branch Attraction).
Disadvantage: likely to make you think you have William of Ockham's endorsement.
Disadvantage: may lead to the delusion that you know exactly what happened in evolution, in detail.

Distance matrix methods
The sequences:
A  CCTAACCTCTGACCC...
B  CGTAACCTCCGGCCC...
C  CGTAACCTCTGGCCC...
D  CGCAACCTCTGGCTC...
E  CCTAACCTCTGGCCC...
yield distances:
[Table: observed pairwise distances among A, B, C, D, E.]
A suggested tree:
[Figure: tree with tips A, C, D, B, E.]
predicts:
[Table: tree-predicted pairwise distances among A, B, C, D, E.]
Compare the two tables, and alter the tree until the predictions match the observed distances as closely as possible.
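The first step of a distance method, turning sequences into a distance matrix, can be sketched as follows. This is an illustration only: it uses just the 15-base fragments visible on the slide (the full sequences are truncated), assumes they belong to A through E in the order shown, and uses the Jukes-Cantor correction as one possible model, which the slide does not specify. The `sum_of_squares` helper is a hypothetical stand-in for the "compare" step.

```python
import math
from itertools import combinations

# Fragments visible on the slide; assignment to A..E in order is assumed.
FRAGMENTS = {
    "A": "CCTAACCTCTGACCC",
    "B": "CGTAACCTCCGGCCC",
    "C": "CGTAACCTCTGGCCC",
    "D": "CGCAACCTCTGGCTC",
    "E": "CCTAACCTCTGGCCC",
}


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance (expected changes per site) from p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor formula")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_table(sequences):
    """Corrected distance for every unordered pair of sequences."""
    return {
        (n1, n2): jukes_cantor(p_distance(sequences[n1], sequences[n2]))
        for n1, n2 in combinations(sorted(sequences), 2)
    }


def sum_of_squares(observed, predicted):
    """Unweighted least-squares misfit between two distance tables."""
    return sum((observed[pair] - predicted[pair]) ** 2 for pair in observed)


observed = distance_table(FRAGMENTS)
for pair, d in observed.items():
    print(pair, round(d, 4))
```

A least-squares distance method would compute the path lengths between tips on a candidate tree, score them with something like `sum_of_squares`, and adjust the branch lengths and topology to reduce that score.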

Advantages and disadvantages of distance methods
Advantage: model-based, so assumptions are clearer.
Advantage: it's geometry, so mathematical scientists love it.
Advantage: often fast (especially the Neighbor-Joining method); can handle large numbers of sequences.
Disadvantage: does not use the data with full statistical efficiency.
Advantage: when tested by simulation, found to be surprisingly efficient anyway.
Disadvantage: cannot easily propagate some information about local features in the sequences from one distance calculation to another.
Disadvantage: it's geometry, so mathematical scientists hang onto it beyond the point of reason.
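
Maximum likelihood
[Figure: tree with tips A, C, C, C, G, interior nodes x, y, z, w, and branches t_1 to t_8. The t_i are "branch lengths" (rate × time).]
To compute the likelihood for one site, sum over all possible states (bases) at interior nodes:
\[
L^{(i)} = \sum_{x} \sum_{y} \sum_{z} \sum_{w}
\mathrm{Prob}(w)\, \mathrm{Prob}(x \mid w, t_7)\, \mathrm{Prob}(A \mid x, t_1)\, \mathrm{Prob}(C \mid x, t_2)\,
\mathrm{Prob}(z \mid w, t_8)\, \mathrm{Prob}(C \mid z, t_3)\, \mathrm{Prob}(y \mid z, t_6)\, \mathrm{Prob}(C \mid y, t_4)\, \mathrm{Prob}(G \mid y, t_5)
\]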
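The sum on this slide can be evaluated numerically once a substitution model and branch lengths are chosen. The sketch below assumes the Jukes-Cantor model with equal base frequencies and uses made-up branch lengths; neither comes from the slide. The first function is the literal four-fold sum. The second rearranges the same sum so each interior node is summed once (the usual "pruning" rearrangement), which is an addition here rather than something shown on the slide.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical branch lengths t_1..t_8 (expected changes per site).
T = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.30, 6: 0.05, 7: 0.20, 8: 0.10}


def jc_prob(parent: str, child: str, t: float) -> float:
    """Jukes-Cantor probability of child state given parent state after t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay


def site_likelihood_brute_force(t) -> float:
    """Literal sum over all states of interior nodes x, y, z, w."""
    total = 0.0
    for x, y, z, w in product(BASES, repeat=4):
        total += (
            0.25  # Prob(w): equal base frequencies at the root
            * jc_prob(w, x, t[7]) * jc_prob(x, "A", t[1]) * jc_prob(x, "C", t[2])
            * jc_prob(w, z, t[8]) * jc_prob(z, "C", t[3])
            * jc_prob(z, y, t[6]) * jc_prob(y, "C", t[4]) * jc_prob(y, "G", t[5])
        )
    return total


def site_likelihood_pruning(t) -> float:
    """Same likelihood, summing each interior node once from tips to root."""
    below_y = {y: jc_prob(y, "C", t[4]) * jc_prob(y, "G", t[5]) for y in BASES}
    below_z = {
        z: jc_prob(z, "C", t[3])
        * sum(jc_prob(z, y, t[6]) * below_y[y] for y in BASES)
        for z in BASES
    }
    below_x = {x: jc_prob(x, "A", t[1]) * jc_prob(x, "C", t[2]) for x in BASES}
    return sum(
        0.25
        * sum(jc_prob(w, x, t[7]) * below_x[x] for x in BASES)
        * sum(jc_prob(w, z, t[8]) * below_z[z] for z in BASES)
        for w in BASES
    )


print(site_likelihood_brute_force(T), site_likelihood_pruning(T))
```

The brute-force version visits 4^4 combinations for this tree and grows exponentially with the number of interior nodes, while the rearranged version grows linearly, which is what makes likelihood on larger trees feasible.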

Advantages and disadvantages of likelihood
Advantage: uses a model, so assumptions are clear.
Advantage: fully statistically efficient.
Disadvantage: computationally slower.
Advantage: statistical testing by likelihood ratio tests (LRTs) is available.
Disadvantage: can't use the LRT to test tree topologies.

Bayesian inference methods
Basically uses the likelihood machinery, and adds priors on parameters and on trees. Implemented by Markov chain Monte Carlo (MCMC) methods to sample from the posterior on trees (or parameters, or both). Very popular right now.
Advantage: interpretation is straightforward, once the assumptions are met.
Advantage: gives you what you want, the probability of the result.
Disadvantage: how long is long enough to run the MCMC?
Disadvantage: where do we get priors from, and what effect do they have?
Disadvantage: they keep chanting in unison "We are the statisticians of Bayes; you will be assimilated."

Aren't these graphical models?
[Figure: the tree with branch lengths v1 to v9 repeated in layers, one layer per site, with a state x at every node.]
(You have to imagine it going back 500 layers or so.)
The problem is to use the data, which is at the tips but not available for the interior nodes, to infer the topology and branch lengths of the tree that is shared by all sites.

Could we use graphical model machinery here?
Like Molière's character who is delighted to discover that he's been speaking prose all his life, we found we had already been using the relevant graphical model machinery since about [...]. So alas there was nothing to gain.
The same thing is true for statistical genetics, where the graphical model machinery reinvents the standard "peeling" algorithms for computing likelihoods on pedigrees, in use since [...].

Bootstrap sampling of phylogenies
[Figure: the original data, a matrix of sequences by sites, and the estimate of the tree made from it.]

Draw columns randomly with replacement
[Figure: bootstrap sample #1, a matrix of sequences by sites built by sampling the same number of sites, with replacement, from the original data.]

Make a tree from that resampled data set
[Figure: bootstrap estimate of the tree, #1.]

Draw another bootstrap sample
[Figure: bootstrap sample #2, again sampling the same number of sites, with replacement.]

... and get a tree for it too. And so on.
[Figure: bootstrap estimate of the tree, #2 (and so on).]
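The resampling step in these slides can be sketched in a few lines. The alignment below is a made-up toy example and the function name is illustrative; the tree-building step that would follow each resample is left out, since any of the methods above could be plugged in.

```python
import random

# Hypothetical toy alignment: sequences (rows) by sites (columns).
ALIGNMENT = {
    "Alpha": "TAGCAT",
    "Beta": "CAAGCT",
    "Gamma": "TCGGCT",
    "Delta": "TCGCAA",
}


def bootstrap_sample(alignment, rng):
    """Draw as many columns as the original has, with replacement.

    Returns the resampled alignment and the list of chosen column indices.
    Whole columns are drawn, so every sequence gets the same columns.
    """
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    resampled = {
        name: "".join(alignment[name][c] for c in columns) for name in names
    }
    return resampled, columns


rng = random.Random(1)  # fixed seed so the sketch is repeatable
for replicate in range(1, 3):
    sample, columns = bootstrap_sample(ALIGNMENT, rng)
    print("Bootstrap sample #%d, columns %s" % (replicate, columns))
    for name, seq in sample.items():
        print("  %-6s %s" % (name, seq))
```

Each resampled data set would then be passed to the same tree-estimation method used on the original data, giving one bootstrap estimate of the tree per replicate.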

Summarizing the cloud of trees by support for branches
[Figure: tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, Crab E.Mac, BarbMacaq, Tarsier, Lemur, summarizing bootstrap support for branches.]

Some alternatives to bootstrapping
Parametric bootstrapping: the same, but simulate data sets from our best estimate of the tree instead of sampling sites.
Bayesian inference: of course gets statistical support information from the posterior.
The Kishino-Hasegawa-Templeton test (KHT test): compares prespecified trees to each other by paired-sites tests.

Why want to know the tree?
It affects all parts of the genome: it is the essential part of propagating information about the evolution of one part of the genome to inquiries about another part.
The standard method for finding functional regions of the genome now uses PhyloHMMs, which combine Hidden Markov Model machinery with phylogenies to find regions that have unusually low rates of evolution.

Another kind of tree: the coalescent
Coalescent trees are trees of ancestry of copies of a single gene locus within a species. They are only weakly inferable, as most have only a few sites (SNPs) varying among individuals.
Since each coalescent tree applies to a very short region of genome, maybe as little as one gene, there is less interest in the tree itself. But the trees do illuminate the values of parameters such as population size, migration rates, recombination rates, etc. This allows us to accumulate information across different loci (genes).
To do this we have to sum over our uncertainty about the tree by using MCMC methods, accumulating the information (as log likelihood or using Bayesian machinery) to make inferences about the parameters.
This is the interface between within-species population genetics and between-species work on phylogenies. It is also the statistical foundation of inferences from mitochondrial genealogies ("mitochondrial Eve") and Y chromosome genealogies, and of the samples from the rest of the genome that are now being added to these.
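To see why coalescent trees carry information about population size, it may help to simulate one. The sketch below assumes the standard Kingman coalescent for a sample from a single random-mating diploid population of constant size N, with no migration or recombination: while k lineages remain, the waiting time to the next coalescence is exponential with mean 4N / (k(k − 1)) generations. These assumptions and all names are illustrative, not taken from the slides.

```python
import random


def simulate_coalescent_times(n_samples, pop_size, rng):
    """Times (in generations before present) of the n - 1 coalescences.

    Assumes a constant-size diploid population of pop_size individuals,
    so k lineages coalesce at rate k(k - 1) / (4 * pop_size) per generation.
    """
    times = []
    current_time = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (4.0 * pop_size)
        current_time += rng.expovariate(rate)
        times.append(current_time)
    return times


rng = random.Random(7)  # fixed seed so the sketch is repeatable
for pop_size in (1000, 10000):
    depths = [simulate_coalescent_times(5, pop_size, rng)[-1] for _ in range(2000)]
    mean_depth = sum(depths) / len(depths)
    expected = 4.0 * pop_size * (1.0 - 1.0 / 5)
    print(pop_size, round(mean_depth), round(expected))
```

Tree depth scales with N, which is the sense in which the genealogies inform the population size. Any single tree is very noisy, so inference needs to pool many loci and average over the unknown trees.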

A coalescent
[Figure: a coalescent tree drawn against a time axis.]

Yet another kind of tree: trees of gene families
Gene duplications in evolution create new genes. Both the new gene and the original one then evolve.
[Figure: a tree of genes drawn inside the species tree of Frog, Human, Monkey, Squirrel, with gene copies a and b; labels: species boundary, gene duplication, tree of genes. The subtree of a copies and the subtree of b copies are marked "These two trees should be identical".]
Some forks are gene duplications, leading to subtrees that are all supposed to have the same phylogeny, as they are in the same set of species. Example: hemoglobin proteins.

References
Felsenstein, J. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Book you and all your friends must rush out and buy]
Semple, C. and M. Steel. Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications, 24. Oxford University Press. [More rigorous mathematical treatment]
Yang, Z. Computational Molecular Evolution. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford. [Careful survey of molecular phylogeny methods, from a leader]
For a list of 348 phylogeny programs, many available free, see [...]