Lecture 3. Phylogeny methods: Branch and bound, distance methods


1 Lecture 3. Phylogeny methods: Branch and bound, distance methods
Joe Felsenstein, Department of Genome Sciences and Department of Biology

2 Greedy search by sequential addition
Greedy search: add the species in a fixed order (A, B, C, D, E), putting each one in the best place on the current tree each time.
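
The greedy search above can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes Fitch parsimony as the criterion for "best place" (the slide does not name one), represents trees as rooted nested tuples, and uses hypothetical names (`fitch_score`, `placements`, `sequential_addition`) and toy sequences. Because the parsimony score does not depend on where the root is, a few placements describe the same unrooted tree; that only costs a little redundant work.

```python
def fitch_site(node, seqs, site):
    """Return (state set, number of changes) for one site in the subtree at node."""
    if isinstance(node, str):                      # a leaf: its observed base
        return {seqs[node][site]}, 0
    left_set, left_cost = fitch_site(node[0], seqs, site)
    right_set, right_cost = fitch_site(node[1], seqs, site)
    common = left_set & right_set
    if common:                                     # no change needed at this node
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree, seqs):
    """Parsimony length of the tree, summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, s)[1] for s in range(n_sites))


def placements(tree, new_leaf):
    """Yield every tree obtained by attaching new_leaf to one branch of tree."""
    yield (tree, new_leaf)                         # branch just above this node
    if not isinstance(tree, str):
        left, right = tree
        for t in placements(left, new_leaf):
            yield (t, right)
        for t in placements(right, new_leaf):
            yield (left, t)


def sequential_addition(order, seqs):
    """Add species in the given fixed order, each in its best place."""
    tree = (order[0], order[1])
    for name in order[2:]:
        tree = min(placements(tree, name), key=lambda t: fitch_score(t, seqs))
    return tree


# Toy data (an assumption for illustration): A and B agree, C and D agree.
seqs = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
tree = sequential_addition(["A", "B", "C", "D"], seqs)
print(tree, fitch_score(tree, seqs))
```

The result depends on the order of addition, which is why such searches are usually followed by rearrangements or repeated with different orders.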

3 Goloboff's time-saving trick
[Figure: Goloboff's economy in computing scores of rearranged trees]
Once the views have been computed, they can be taken to represent subtrees, without going inside those subtrees.

4 Star decomposition
[Figure: "Star decomposition" search for the best tree can happen in multiple ways]

5 Disk-covering
[Figure: "Disk covering" assembly of a tree from overlapping estimated subtrees]

6 Shortest Hamiltonian path problem
[Figure: panels (a), (b), (c), (d)]

7 Search tree for this problem
[Figure: search tree that begins at "start" and adds one point at each level (add 1, add 2, add 3, ...). Its tips are complete orderings such as (1,2,3,4,5,6,7,8,9,10), (1,2,3,4,5,6,7,8,10,9), (1,2,3,4,5,6,7,9,8,10), (1,2,3,4,5,6,7,9,10,8), (1,2,3,4,5,6,7,10,8,9) and (1,2,3,4,5,6,7,10,9,8).]
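
A small sketch of branch and bound on such a search tree, for the shortest Hamiltonian path through points in the plane. It assumes Euclidean distances and a depth-first walk in which each level adds one more point to the path; the function name and the example points are hypothetical. A branch is abandoned as soon as its partial path is already at least as long as the best complete path found so far.

```python
import math


def shortest_hamiltonian_path(points):
    """Return (length, order of point indices) of a shortest path visiting every point once."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    best = {"length": math.inf, "path": None}

    def extend(path, length, remaining):
        if length >= best["length"]:
            return                                  # bound: cannot beat the best so far
        if not remaining:
            best["length"], best["path"] = length, path
            return
        last = path[-1]
        # Try the nearest points first so that a good bound is found early.
        for nxt in sorted(remaining, key=lambda j: dist[last][j]):
            extend(path + [nxt], length + dist[last][nxt], remaining - {nxt})

    for start in range(n):
        extend([start], 0.0, frozenset(range(n)) - {start})
    return best["length"], best["path"]


# Four points on a line, listed out of order (illustrative data).
points = [(0, 0), (3, 0), (1, 0), (2, 0)]
length, path = shortest_hamiltonian_path(points)
print(length, path)
```

The bound never discards the optimum, so the answer is exact; in the worst case, though, the search still visits a number of nodes that grows exponentially with the number of points.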

8 Search tree of trees

9 Same, with parsimony scores in place of trees

10 Polynomial time and exponential time
[Figure: time plotted against problem size for a polynomial curve and the exponential curve $e^{0.5n}$]
How does the time taken by an algorithm depend on the size of the problem? If the dependence is polynomial (even with big coefficients), then for a big enough case the algorithm is faster than one whose time depends on the size exponentially.
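
To make the comparison concrete, the sketch below finds the problem size at which an exponential running time overtakes a polynomial one. The exponential $e^{0.5n}$ is the one named on the slide; the polynomial $1000\,n^3$ is a hypothetical stand-in with a deliberately large coefficient, not the slide's curve.

```python
import math


def polynomial_time(n):
    """Hypothetical polynomial cost with a big coefficient (an assumption)."""
    return 1000.0 * n ** 3


def exponential_time(n):
    """Exponential cost from the slide's figure."""
    return math.exp(0.5 * n)


def crossover(max_n=1000):
    """Smallest n at which the exponential cost exceeds the polynomial cost."""
    for n in range(1, max_n + 1):
        if exponential_time(n) > polynomial_time(n):
            return n
    return None


n0 = crossover()
print(n0, polynomial_time(n0), exponential_time(n0))
```

Making the coefficient larger only delays the crossover; it does not remove it.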

11 NP completeness and NP hardness
[Figure: diagram of the classes P, NP, NP-complete and NP-hard, with the questions "does this part exist?" and "is P = NP?"]
(This diagram is not quite correct; see the diagrams on the Wikipedia page for NP-hard.)
P = problems that can be solved by a polynomial-time algorithm.
NP-complete = problems for which a proposed solution can be checked in polynomial time, and for which it can be proven that if one of them is in P, all are.
NP-hard = problems at least as hard as the NP-complete ones: a polynomial-time algorithm for one of them would give one for every problem in NP. They might not be solvable in polynomial time, and a proposed solution need not be checkable in polynomial time.

12 Distance methods
These have been attractive, particularly to mathematical scientists who love geometry. This has its good and bad effects.
1. Take the sequences in all pairs.
2. For each pair compute a distance. (As we will see, this is best thought of as the length of the 2-species tree for those species.)
3. Try to find the tree that best fits the table of distances.
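
Steps 1 and 2 can be sketched as below. The distance used here is the simplest one, the fraction of aligned sites that differ, with no correction for multiple hits; the function names and the toy alignment are illustrative assumptions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_table(seqs):
    """Distance for every unordered pair of species."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


example = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"}
table = distance_table(example)
for pair, d in table.items():
    print(pair, d)
```

Step 3, the search for the tree that best fits this table, needs a fitting criterion such as least squares.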

13 A phylogeny with branch lengths and the pairwise distances it predicts

14 A phylogeny with branch lengths
[Figure: tree with branch lengths $v_1, v_2, v_3, v_4, v_5, v_6, v_7$]

15 Least squares trees
Least squares methods minimize
$Q = \sum_{i=1}^{n} \sum_{j \ne i} w_{ij} (D_{ij} - d_{ij})^2$
over all trees, where the $D_{ij}$ are the observed distances and the $d_{ij}$ are the distances that the tree predicts. Cavalli-Sforza and Edwards suggested $w_{ij} = 1$; Fitch and Margoliash suggested $w_{ij} = 1/D_{ij}^2$.
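
A short sketch of the criterion Q under the two weightings. The dictionaries hold each unordered pair once, so the sum is doubled to match the double sum over i and j ≠ i; the names and numbers are hypothetical.

```python
def unweighted(D):
    """Cavalli-Sforza and Edwards weighting: w_ij = 1."""
    return 1.0


def fitch_margoliash(D):
    """Fitch and Margoliash weighting: w_ij = 1 / D_ij^2."""
    return 1.0 / D ** 2


def least_squares_q(observed, predicted, weight=unweighted):
    """Q = sum over i and j != i of w_ij * (D_ij - d_ij)^2."""
    total = 0.0
    for pair, D in observed.items():
        d = predicted[pair]
        total += weight(D) * (D - d) ** 2
    return 2.0 * total          # each unordered pair appears twice in the double sum


observed = {("A", "B"): 0.2, ("A", "C"): 0.5, ("B", "C"): 0.4}
predicted = {("A", "B"): 0.25, ("A", "C"): 0.5, ("B", "C"): 0.4}
q_cse = least_squares_q(observed, predicted)
q_fm = least_squares_q(observed, predicted, weight=fitch_margoliash)
print(q_cse, q_fm)
```

With the Fitch and Margoliash weights the same absolute error counts for more when the observed distance is small.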

16 Statistical assumptions of least squares trees
The implicit assumption is that the distances are (independently?) Normally distributed with expectation $d_{ij}$ and variance proportional to $1/w_{ij}$:
$D_{ij} \sim N(d_{ij}, K/w_{ij})$
Thus the different weightings correspond to different assumptions about the error in the distances. Also, there is assumed to be no covariance of distances. In fact, the distances will covary, since a change in an interior branch of the tree increases (or decreases) all distances whose paths go through that branch.

17 Matrix approach to fitting branch lengths
If we stack the distances up into a column vector,
$D^T = (D_{12}, D_{13}, D_{14}, D_{15}, D_{23}, D_{24}, D_{25}, D_{34}, D_{35}, D_{45})$,
we can solve the least squares equation (obtained by taking derivatives of the quadratic form Q):
$X^T D = X^T X v$,
where the design matrix X for the given tree topology has 1's whenever a given branch lies on the path between those two species. Here is the design matrix for the tree we just saw.
[Design matrix X: one row per pair of species, one column per branch $v_1, \ldots, v_7$; shown next to the tree figure]
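
The normal equations $X^T D = X^T X v$ can be solved directly. The sketch below does so in plain Python for a smaller, hypothetical case than the slide's: a four-species tree ((1,2),(3,4)) with terminal branches $v_1, \ldots, v_4$ and one interior branch $v_5$. The design matrix and branch lengths are assumptions made for illustration, and the helper names are not from the lecture.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: branch lengths not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_branch_lengths(X, D):
    """Unweighted least squares branch lengths from X^T X v = X^T D."""
    Xt = [list(col) for col in zip(*X)]
    XtX = [[sum(x * y for x, y in zip(r, c)) for c in Xt] for r in Xt]
    XtD = [sum(x * d for x, d in zip(r, D)) for r in Xt]
    return solve(XtX, XtD)


# Rows: D12, D13, D14, D23, D24, D34.  Columns: v1, v2, v3, v4, v5 (interior).
X = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
true_v = [0.1, 0.2, 0.3, 0.4, 0.5]
D = [sum(x * v for x, v in zip(row, true_v)) for row in X]   # exactly tree-like distances
fitted_v = fit_branch_lengths(X, D)
print(fitted_v)
```

Here the distances fit the tree exactly, so the fitted lengths reproduce the ones used to generate them. With real data the fit is not exact, and some fitted branch lengths can come out negative unless they are constrained.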

18 The Jukes-Cantor model for DNA
[Figure: the four bases A, G, C and T, with each base changing to each of the other three at rate u/3]

19 Derivation of the probability of change
1. Imagine events occurring at rate $\frac{4}{3}u$ per unit time which replace a base by one of the 4 bases chosen at random.
2. Persuade yourself that this is no different in outcome from events at rate $u$ per unit time that replace it by one of the other 3 chosen at random.
3. The probability that a branch of length $t$ has none of these (first kind of) events is $\exp(-\frac{4}{3}ut)$. (Think of the zero term of a Poisson distribution.)
4. If it does have one or more of these events, you end up with one of the 4 bases chosen at random.
5. Therefore the probability of a net change is $\frac{3}{4}\left(1 - e^{-\frac{4}{3}ut}\right)$.
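
The derivation can be followed numerically. The sketch below codes steps 3 to 5 literally: the Poisson zero term gives the chance of no event, and if there is at least one event the base ends up different from the original with probability 3/4. The function names are illustrative.

```python
import math


def prob_no_event(u, t):
    """Zero term of a Poisson distribution with mean (4/3) u t."""
    return math.exp(-4.0 * u * t / 3.0)


def prob_net_change(u, t):
    """Probability that the base at the end of the branch differs from the start."""
    # At least one event, then a random draw of 4 bases, 3 of which are different.
    return 0.75 * (1.0 - prob_no_event(u, t))


for ut in (0.01, 0.1, 1.0, 10.0):
    print(ut, prob_net_change(1.0, ut))
```

For short branches the probability is close to $ut$; for long branches it levels off at 3/4, which is why raw differences understate long distances.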

20 The distance for the Jukes-Cantor model
[Figure: differences per site plotted against branch length]
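
The curve can be inverted to turn an observed fraction of differing sites $p$ into a branch length. Solving $p = \frac{3}{4}(1 - e^{-\frac{4}{3}ut})$ for $ut$ gives $ut = -\frac{3}{4}\ln(1 - \frac{4}{3}p)$. The sketch below assumes this inversion; the function name is illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Branch length u*t implied by a fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the distance to be finite")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.3, 0.6, 0.7):
    print(p, jukes_cantor_distance(p))
```

The correction is small when $p$ is small and grows without bound as $p$ approaches 3/4, so estimates for very divergent sequences are sensitive to sampling error in $p$.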

21 If you don't correct for multiple hits
[Figure] Left: the true tree. Right: a tree fitting the uncorrected distances.

22 References, page 1
Maddison, D. R. The discovery and importance of multiple islands of most-parsimonious trees. Systematic Zoology 40. [Discusses heuristic search strategy involving ties, multiple starts]
Farris, J. S. Methods for computing Wagner trees. Systematic Zoology 19. [Early parsimony algorithms paper is one of first to mention sequential addition strategy]
Saitou, N., and M. Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4. [First mention of star-decomposition search for best trees, sort of]
Strimmer, K., and A. von Haeseler. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13. [Assembles trees out of quartets]
Huson, D., S. Nettles, L. Parida, T. Warnow, and S. Yooseph. The disk-covering method for tree reconstruction. In Proceedings of Algorithms and Experiments (ALEX98), Trento, Italy, Feb. 9-11, 1998, ed. R. Battiti and A. A. Bertossi. [Disk-covering method for long stringy trees]

23 References, page 2
Foulds, L. R. and R. L. Graham. The Steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics 3. [Parsimony is NP-hard]
Graham, R. L. and L. R. Foulds. Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time. Mathematical Biosciences 60. [... and more]
Hendy, M. D. and D. Penny. Branch and bound algorithms to determine minimal evolutionary trees. Mathematical Biosciences 60. [Introduced branch-and-bound for phylogenies]
Felsenstein, J. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [For this lecture the material is chapters 4 and 5]
Semple, C. and M. Steel. Phylogenetics. Oxford University Press, Oxford. [Also covers search strategies]

24 References, page 3
Felsenstein, J. Distance methods for inferring phylogenies: a justification. Evolution 38. [Argument for statistical interpretation of distance methods]
Farris, J. S. Distance data revisited. Cladistics 1. [Reply to my 1984 paper]
Felsenstein, J. Distance methods: reply to Farris. Cladistics 2. [Reply to Farris 1985]
Farris, J. S. Distances and statistics. Cladistics 2. [Debate was cut off after this]

25 References, page 4
Bryant, D., and P. Waddell. Rapid evaluation of least-squares and minimum-evolution criteria on phylogenetic trees. Molecular Biology and Evolution 15. [Quicker least squares distance trees]
Felsenstein, J. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [See chapter 11]
Semple, C. and M. Steel. Phylogenetics. Oxford University Press, Oxford. [See pp. ]
Yang, Z. Computational Molecular Evolution. Oxford University Press, Oxford. [See pages 89-93]


Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Dynamic programming. Doctoral course Optimization on graphs - Lecture 4.1. Giovanni Righini. January 17 th, 2013

Dynamic programming. Doctoral course Optimization on graphs - Lecture 4.1. Giovanni Righini. January 17 th, 2013 Dynamic programming Doctoral course Optimization on graphs - Lecture.1 Giovanni Righini January 1 th, 201 Implicit enumeration Combinatorial optimization problems are in general NP-hard and we usually

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:

More information

Visualization of Phylogenetic Trees and Metadata

Visualization of Phylogenetic Trees and Metadata Visualization of Phylogenetic Trees and Metadata November 27, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical

More information

P vs NP problem in the field anthropology

P vs NP problem in the field anthropology Research Article P vs NP problem in the field anthropology Michael.A. Popov, Oxford, UK Email Michael282.eps@gmail.com Keywords P =?NP - complexity anthropology - M -decision - quantum -like game - game-theoretical

More information

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

More information

Some Computer Organizations and Their Effectiveness. Michael J Flynn. IEEE Transactions on Computers. Vol. c-21, No.

Some Computer Organizations and Their Effectiveness. Michael J Flynn. IEEE Transactions on Computers. Vol. c-21, No. Some Computer Organizations and Their Effectiveness Michael J Flynn IEEE Transactions on Computers. Vol. c-21, No.9, September 1972 Introduction Attempts to codify a computer have been from three points

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Data Structures. Chapter 8

Data Structures. Chapter 8 Chapter 8 Data Structures Computer has to process lots and lots of data. To systematically process those data efficiently, those data are organized as a whole, appropriate for the application, called a

More information

Indiana State Core Curriculum Standards updated 2009 Algebra I

Indiana State Core Curriculum Standards updated 2009 Algebra I Indiana State Core Curriculum Standards updated 2009 Algebra I Strand Description Boardworks High School Algebra presentations Operations With Real Numbers Linear Equations and A1.1 Students simplify and

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

5.1 Bipartite Matching

5.1 Bipartite Matching CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary

More information

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions. Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

More information

Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239 STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent

More information

An experimental study comparing linguistic phylogenetic reconstruction methods *

An experimental study comparing linguistic phylogenetic reconstruction methods * An experimental study comparing linguistic phylogenetic reconstruction methods * François Barbançon, a Steven N. Evans, b Luay Nakhleh c, Don Ringe, d and Tandy Warnow, e, a Palantir Technologies, 100

More information

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li Computer Algorithms NP-Complete Problems NP-completeness The quest for efficient algorithms is about finding clever ways to bypass the process of exhaustive search, using clues from the input in order

More information

Programming Using Python

Programming Using Python Introduction to Computation and Programming Using Python Revised and Expanded Edition John V. Guttag The MIT Press Cambridge, Massachusetts London, England CONTENTS PREFACE xiii ACKNOWLEDGMENTS xv 1 GETTING

More information

Finding Clusters in Phylogenetic Trees: A Special Type of Cluster Analysis

Finding Clusters in Phylogenetic Trees: A Special Type of Cluster Analysis Finding lusters in Phylogenetic Trees: Special Type of luster nalysis Why try to identify clusters in phylogenetic trees? xample: origin of HIV. NUMR: Why are there so many distinct clusters? LUR04-7 SYNHRONY:

More information

Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

Introduction to Multivariate Analysis

Introduction to Multivariate Analysis Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief

More information

NEW GENERATION OF COMPUTER AIDED DESIGN IN SPACE PLANNING METHODS A SURVEY AND A PROPOSAL

NEW GENERATION OF COMPUTER AIDED DESIGN IN SPACE PLANNING METHODS A SURVEY AND A PROPOSAL NEW GENERATION OF COMPUTER AIDED DESIGN IN SPACE PLANNING METHODS A SURVEY AND A PROPOSAL YING-CHUN HSU, ROBERT J. KRAWCZYK Illinois Institute of Technology, Chicago, IL USA Email address: hsuying1@iit.edu

More information

Partial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:

Partial Fractions. Combining fractions over a common denominator is a familiar operation from algebra: Partial Fractions Combining fractions over a common denominator is a familiar operation from algebra: From the standpoint of integration, the left side of Equation 1 would be much easier to work with than

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information