Lecture 3. Phylogeny methods: Branch and bound, distance methods
1 Lecture 3. Phylogeny methods: Branch and bound, distance methods
Joe Felsenstein, Department of Genome Sciences and Department of Biology
2 Greedy search by sequential addition
Greedy search adds the species in a fixed order (A, B, C, D, E), placing each one in the best place on the tree built so far.
3 Goloboff's time-saving trick
[Figure: Goloboff's economy in computing scores of rearranged trees.]
Once the views have been computed, they can be taken to represent subtrees, without going inside those subtrees.
4 Star decomposition
[Figure: "star decomposition" search for the best tree, which can happen in multiple ways.]
5 Disk-covering
[Figure: "disk covering" assembly of a tree from overlapping estimated subtrees.]
6 Shortest Hamiltonian path problem
[Figure: four panels, (a) to (d).]
7 Search tree for this problem
[Figure: the search tree. From "start", each level adds one more point to the path ("add 1", "add 2", "add 3", ...), and the tips are complete orderings such as (1,2,3,4,5,6,7,8,9,10), (1,2,3,4,5,6,7,8,10,9) and (1,2,3,4,5,6,7,10,9,8).]
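The search tree above can be explored with branch and bound. The following is a minimal sketch, not the lecture's own code: the function name `shortest_hamiltonian_path`, the use of Euclidean distances between points, and the nearest-first ordering of branches are all assumptions made for illustration. A partial path is abandoned as soon as its length already reaches the best complete path found so far.

```python
import math


def shortest_hamiltonian_path(points):
    """Branch and bound over visiting orders; returns (length, order).

    `points` is a list of (x, y) tuples. The name and the Euclidean
    metric are illustrative assumptions, not taken from the lecture.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    best = [math.inf, None]  # best complete path length and its order

    def extend(path, length, unused):
        # Bound: a partial path that is already as long as the best
        # complete path cannot lead to an improvement, so prune it.
        if length >= best[0]:
            return
        if not unused:
            best[0], best[1] = length, tuple(path)
            return
        last = path[-1]
        # Branch: try the nearest unused point first, so that a good
        # bound is found early and more of the search tree is pruned.
        for nxt in sorted(unused, key=lambda k: dist[last][k]):
            path.append(nxt)
            extend(path, length + dist[last][nxt], unused - {nxt})
            path.pop()

    for start in range(n):
        extend([start], 0.0, frozenset(range(n)) - {start})
    return best[0], best[1]
```

Calling `shortest_hamiltonian_path([(0, 0), (3, 0), (1, 0), (2, 0)])` visits the four collinear points in left-to-right order. The result is guaranteed optimal because a branch is discarded only when it provably cannot beat the current best.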
8 Search tree of trees
9 The same, with parsimony scores in place of trees
10 Polynomial time and exponential time
[Figure: time plotted against problem size for a polynomial curve, $n + 4n^3$, and an exponential curve, $e^{0.5n}$.]
How does the time taken by an algorithm depend on the size of the problem? If the time is a polynomial in the problem size (even one with big coefficients), then for a big enough problem the algorithm is faster than one whose time grows exponentially with the size.
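To make the comparison concrete, here is a small sketch that finds where the exponential curve overtakes the polynomial one. It takes the two curve labels as they survive in the transcription, $n + 4n^3$ and $e^{0.5n}$; the function names are hypothetical.

```python
import math


def polynomial_time(n):
    # Polynomial curve as labeled in the transcribed figure (assumed).
    return n + 4 * n**3


def exponential_time(n):
    # Exponential curve as labeled in the transcribed figure (assumed).
    return math.exp(0.5 * n)


def first_crossover(start=1):
    """Smallest integer n >= start at which the exponential is larger."""
    n = start
    while exponential_time(n) <= polynomial_time(n):
        n += 1
    return n
```

With these two curves, `first_crossover()` returns 22: below that size the exponential algorithm is actually the quicker one, which is why the slide stresses "a big enough case".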
11 NP completeness and NP hardness
[Figure: diagram of the classes P, NP, NP-complete and NP-hard, annotated "does this part exist?" and "is P = NP?". This diagram is not quite correct; see the diagrams on the Wikipedia page for "NP-hard".]
P = problems that can be solved by a polynomial-time algorithm.
NP-complete = problems for which a proposed solution can be checked in polynomial time, but for which it can be proven that if one of them is in P, all are.
NP-hard = problems at least as hard as the NP-complete ones: if any of them could be solved in polynomial time, every NP-complete problem could be too. They might not be solvable in polynomial time, and a proposed solution need not be checkable in polynomial time.
12 Distance methods
These have been attractive, particularly to mathematical scientists who love geometry. This has its good and bad effects.
1. Take the sequences in all pairs.
2. For each pair, compute a distance. (As we will see, this is best thought of as the length of the 2-species tree for those species.)
3. Try to find the tree that best fits the table of distances.
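Steps 1 and 2 can be sketched as follows. This is only an illustration: it uses the raw proportion of differing sites as the distance, which is an uncorrected stand-in for the branch-length distance the lecture has in mind, and the names `p_distance` and `distance_table` are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def distance_table(alignment):
    """Map each unordered pair of species names to its distance.

    `alignment` is a dict from species name to aligned sequence.
    """
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }
```

For example, `distance_table({"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGTACGA"})` gives one entry per pair of species. Step 3, the search for the best-fitting tree, then works only from this table and never looks at the sequences again.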
13 A phylogeny with branch lengths and the pairwise distances it predicts
14 A phylogeny with branch lengths
[Figure: a tree whose seven branches are labeled $v_1$ to $v_7$.]
15 Least squares trees
Least squares methods minimize
\[ Q = \sum_{i=1}^{n} \sum_{j \neq i} w_{ij} (D_{ij} - d_{ij})^2 \]
over all trees, where $D_{ij}$ are the observed distances and $d_{ij}$ are the distances that the tree predicts. Cavalli-Sforza and Edwards suggested $w_{ij} = 1$; Fitch and Margoliash suggested $w_{ij} = 1/D_{ij}^2$.
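The criterion is easy to evaluate once the observed and predicted distances are in hand. The sketch below is illustrative only: the function name, the dictionary layout and the `weighting` keyword are assumptions. It sums over unordered pairs, so it equals half of the double sum over $i$ and $j \neq i$, which does not change which tree is best.

```python
def least_squares_q(observed, predicted, weighting="unweighted"):
    """Weighted sum of squared differences between D_ij and d_ij.

    `observed` and `predicted` map the same (i, j) pairs to distances.
    """
    q = 0.0
    for pair, big_d in observed.items():
        small_d = predicted[pair]
        if weighting == "unweighted":
            w = 1.0  # Cavalli-Sforza and Edwards: w_ij = 1
        elif weighting == "fitch-margoliash":
            w = 1.0 / big_d**2  # Fitch and Margoliash: w_ij = 1 / D_ij^2
        else:
            raise ValueError("unknown weighting: " + weighting)
        q += w * (big_d - small_d) ** 2
    return q
```

The Fitch-Margoliash weighting penalizes a given absolute error more heavily on short distances than on long ones.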
16 Statistical assumptions of least squares trees
The implicit assumption is that the distances are (independently?) Normally distributed with expectation $d_{ij}$ and variance proportional to $1/w_{ij}$:
\[ D_{ij} \sim N(d_{ij}, K/w_{ij}) \]
Thus the different weightings correspond to different assumptions about the error in the distances. Also, there is assumed to be no covariance of distances. In fact, the distances will covary, since a change in an interior branch of the tree increases (or decreases) all distances whose paths go through that branch.
17 Matrix approach to fitting branch lengths
If we stack the distances up into a column vector,
\[ \mathbf{D}^T = (D_{12}, D_{13}, D_{14}, D_{15}, D_{23}, D_{24}, D_{25}, D_{34}, D_{35}, D_{45}), \]
we can solve the least squares equations (obtained by taking derivatives of the quadratic form $Q$):
\[ X^T \mathbf{D} = X^T X \mathbf{v}, \]
where the design matrix $X$ for the given tree topology has 1's whenever a given branch lies on the path between those two species. Here is the design matrix for the tree we just saw.
[Design matrix $X$ and tree figure with branches $v_1$ to $v_7$: entries not recoverable from the transcription.]
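As a sketch of how such a design matrix is built and used, the code below assumes a hypothetical five-species topology ((1,2),3,(4,5)), which is not necessarily the tree on the slide: branches $v_1$ to $v_5$ lead to the tips, $v_6$ joins the (1,2) pair to the central node, and $v_7$ joins the (4,5) pair. It solves the unweighted normal equations $X^T \mathbf{D} = X^T X \mathbf{v}$ by plain Gauss-Jordan elimination; all function names are illustrative.

```python
from itertools import combinations

# Branches on the path from each tip to the central node of the assumed
# tree ((1,2),3,(4,5)). This topology is an illustration only.
PATH_TO_CENTER = {1: {1, 6}, 2: {2, 6}, 3: {3}, 4: {4, 7}, 5: {5, 7}}


def design_matrix():
    """Rows follow the pair order 12, 13, 14, 15, 23, 24, 25, 34, 35, 45."""
    rows = []
    for i, j in combinations(range(1, 6), 2):
        # The path between two tips is the symmetric difference of
        # their paths to the central node.
        on_path = PATH_TO_CENTER[i] ^ PATH_TO_CENTER[j]
        rows.append([1.0 if b in on_path else 0.0 for b in range(1, 8)])
    return rows


def predicted_distances(x, v):
    """d = X v: sum the branch lengths along each path."""
    return [sum(xi * vi for xi, vi in zip(row, v)) for row in x]


def fit_branch_lengths(x, big_d):
    """Solve X^T D = X^T X v for v by Gauss-Jordan elimination."""
    n = len(x[0])
    # Augmented matrix [X^T X | X^T D].
    a = [
        [sum(r[p] * r[q] for r in x) for q in range(n)]
        + [sum(r[p] * d for r, d in zip(x, big_d))]
        for p in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [s - factor * t for s, t in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]
```

If the observed distances are exactly those predicted by some set of branch lengths, `fit_branch_lengths` recovers those lengths. With noisy distances it returns the least squares estimates for this one topology, and the fit would have to be repeated for each topology being compared.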
18 The Jukes-Cantor model for DNA
[Figure: the four bases A, G, C and T, with each base changing to each of the other three at rate $u/3$.]
19 Derivation of the probability of change
1. Imagine events occurring at rate $\frac{4}{3}u$ per unit time which replace a base by one of the 4 bases chosen at random.
2. Persuade yourself that this is no different in outcome from events occurring at rate $u$ per unit time that replace it by one of the other 3 chosen at random. (A random draw from all 4 bases gives a different base 3/4 of the time, and $\frac{4}{3}u \times \frac{3}{4} = u$.)
3. The probability that a branch of length $t$ has none of these (first kind of) events is $\exp(-\frac{4}{3}ut)$. (Think of the zero term of a Poisson distribution.)
4. If it does have one or more of these events, you end up with one of the 4 bases chosen at random.
5. Therefore the probability of a net change is the probability of at least one event times the 3/4 chance that the final base differs from the original:
\[ \frac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right) \]
20 The distance for the Jukes-Cantor model
[Figure: differences per site plotted against branch length.]
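The curve relating differences per site to branch length can be sketched in both directions. The forward function is the probability of net change $\frac{3}{4}(1 - e^{-\frac{4}{3}ut})$ from the Jukes-Cantor model; the inverse, obtained by solving that expression for $ut$, is the usual Jukes-Cantor distance correction. The function names are hypothetical, and the branch length is taken to be the product $ut$.

```python
import math


def jc_expected_difference(branch_length):
    """Expected proportion of differing sites for branch length u*t."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


def jc_distance(p):
    """Branch length u*t implied by an observed proportion p of differences.

    Inverts jc_expected_difference; only defined for 0 <= p < 3/4,
    because the expected difference saturates at 3/4.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because multiple changes at the same site hide one another, the corrected distance always exceeds the raw proportion of differences for any $p > 0$, and the gap widens as $p$ approaches 3/4.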
21 If you don't correct for multiple hits
Left: the true tree. Right: a tree fitting the uncorrected distances.
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationSTATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239
STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent
More informationAn experimental study comparing linguistic phylogenetic reconstruction methods *
An experimental study comparing linguistic phylogenetic reconstruction methods * François Barbançon, a Steven N. Evans, b Luay Nakhleh c, Don Ringe, d and Tandy Warnow, e, a Palantir Technologies, 100
More informationComputer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li
Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order
More informationProgramming Using Python
Introduction to Computation and Programming Using Python Revised and Expanded Edition John V. Guttag The MIT Press Cambridge, Massachusetts London, England CONTENTS PREFACE xiii ACKNOWLEDGMENTS xv 1 GETTING
More informationFinding Clusters in Phylogenetic Trees: A Special Type of Cluster Analysis
Finding lusters in Phylogenetic Trees: Special Type of luster nalysis Why try to identify clusters in phylogenetic trees? xample: origin of HIV. NUMR: Why are there so many distinct clusters? LUR04-7 SYNHRONY:
More informationNetwork Protocol Analysis using Bioinformatics Algorithms
Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More informationIntroduction to Multivariate Analysis
Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief
More informationNEW GENERATION OF COMPUTER AIDED DESIGN IN SPACE PLANNING METHODS A SURVEY AND A PROPOSAL
NEW GENERATION OF COMPUTER AIDED DESIGN IN SPACE PLANNING METHODS A SURVEY AND A PROPOSAL YING-CHUN HSU, ROBERT J. KRAWCZYK Illinois Institute of Technology, Chicago, IL USA Email address: hsuying1@iit.edu
More informationPartial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:
Partial Fractions Combining fractions over a common denominator is a familiar operation from algebra: From the standpoint of integration, the left side of Equation 1 would be much easier to work with than
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More information