Hierarchical Classification:

Genome Bioinformatics: Protein Families, Annotation, Phylogeny I

Molecule: Compare domains? Compare expression? Similar proteins? Expression

What is a phylogenetic tree? How to make a phylogenetic tree?

TLR

Some of the slides in this lecture are courtesy of Jaap Heringa, Anders Gorm Pedersen and Michael Rosenberg.

Hierarchical Classification: Linnaeus
Tree: depiction (formalization) of classification
Carl Linnaeus

Theory of evolution
The only figure in Darwin's On the Origin of Species is a tree diagram.
Charles Darwin

Phylogenetic trees
Historical pattern of relationships among organisms: interpretation of a tree, e.g. flow of time.
How to read a phylogenetic tree? Ancestors

Trees are useful in bioinformatics beyond the phylogeny of species.
Where else can phylogenetic trees be used?

Progressive multiple alignment: general principles
Scores to distances: pairwise scores (e.g. Score 4-5) -> similarity matrix -> guide tree -> multiple alignment
Other trees (= clusters): gene expression

Phylogenetic trees: unrooted, rooted

Unrooted vs rooted trees
Trees and evolutionary time
Phylogenies using characters
Faster evolution
Molecular phylogeny changed taxonomy

Three main classes of phylogenetic methods
Distance based: uses pairwise distances; fastest approach
Parsimony: fewest number of evolutionary events (mutations); attempts to construct the maximum parsimony tree
Maximum likelihood

Phylogenetic tree by distance methods (clustering)
Similarity criterion: scores -> multiple alignment -> distance matrix -> phylogenetic tree

Distances
Evolutionary sequence distance = sequence dissimilarity

Human         -KITVVGVGAVGMACAISILMKDLADELALVDVIEDKLKGEMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQ
Chicken       -KISVVGVGAVGMACAISILMKDLADELTLVDVVEDKLKGEMMDLQHGSLFLKTPKITSGKDYSVTAHSKLVIVTAGARQ
Dogfish       KITVVGVGAVGMACAISILMKDLADEVALVDVMEDKLKGEMMDLQHGSLFLHTAKIVSGKDYSVSAGSKLVVITAGARQ
Lamprey       SKVTIVGVGQVGMAAAISVLLRDLADELALVDVVEDRLKGEMMDLLHGSLFLKTAKIVADKDYSVTAGSRLVVVTAGARQ
Barley        TKISVIGAGNVGMAIAQTILTQNLADEIALVDALPDKLRGEALDLQHAAAFLPRVRI-SGTDAAVTKNSDLVIVTAGARQ
Maizey        TKVSVIGAGNVGMAIAQTILTRDLADEIALVDAVPDKLRGEMLDLQHAAAFLPRTRLVSGTDMSVTRGSDLVIVTAGARQ
Lacto_casei   -KVILVGDGAVGSSYAYAMVLQGIAQEIGIVDIFKDKTKGDAIDLSNALPFTSPKKIYSA-EYSDAKDADLVVITAGAPQ
Bacillus_stea -RVVVIGAGFVGASYVFALMNQGIADEIVLIDANESKAIGDAMDFNHGKVFAPKPVDIWHGDYDDCRDADLVVICAGANQ
Lacto_plant   QKVVLVGDGAVGSSYAFAMAQQGIAEEFVIVDVVKDRTKGDALDLEDAQAFTAPKKIYSG-EYSDCKDADLVVITAGAPQ
Therma_mari   MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG-DYADLKGSDVVIVAAGVPQ
Bifido        -KLAVIGAGAVGSTLAFAAAQRGIAREIVLEDIAKERVEAEVLDMQHGSSFYPTVSIDGSDDPEICRDADMVVITAGPRQ
Thermus_aqua  MKVGIVGSGFVGSATAYALVLQGVAREVVLVDLDRKLAQAHAEDILHATPFAHPVWVRSGW-YEDLEGARVVIVAAGVAQ
Mycoplasma    -KIALIGAGNVGNSFLYAAMNQGLASEYGIIDINPDFADGNAFDFEDASASLPFPISVSRYEYKDLKDADFIVITAGRPQ

[Distance matrix over Human, Chicken, Dogfish, Lamprey, Barley, Maizey, Lacto_casei, Bacillus_stea, Lacto_plant, Therma_mari, Bifido, Thermus_aqua, Mycoplasma]
NB: because the distances are evolutionary distances, we obtain a phylogenetic tree.

Clustering
Scores -> distance matrix -> phylogenetic tree
Cluster criterion:
Single linkage: nearest neighbour
Complete linkage: furthest neighbour
Group averaging: UPGMA
Neighbour joining

Clustering algorithm: UPGMA
[Figure: UPGMA worked step by step on human, mouse, fugu and yeast]

Evolutionary clock speeds
Uniform clock: ultrametric distances lead to identical distances from root to leaves.
UPGMA trees would be correct if evolution had a uniform clock, but it often did not!

Neighbour-Joining (Saitou and Nei, 1987)
Global: keeps total branch length minimal
At each step, join the two nodes that are closest, considering their respective distances to all other nodes
Leads to an unrooted tree
Non-uniform evolutionary clock: leaves have different distances to the root
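
The distance and UPGMA slides above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own implementation: the function names `p_distance` and `upgma` are made up here, the distance used is the plain fraction of differing columns (one simple choice of "sequence dissimilarity"), and the human/mouse/fugu/yeast distances are invented ultrametric values, since the numbers on the original slide did not survive.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned columns in which two sequences differ.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Group-average clustering (UPGMA).

    names: list of leaf names.
    dist:  dict mapping (name_a, name_b) -> distance, one entry per pair.
    Returns (tree, root_height) with the tree as nested 2-tuples.
    """
    sizes = {name: 1 for name in names}          # cluster -> number of leaves
    heights = {name: 0.0 for name in names}      # cluster -> node height
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        heights[merged] = d[frozenset((a, b))] / 2
        for c in sizes:
            if c not in (a, b):
                # Size-weighted average keeps every leaf pair equally weighted.
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    (root,) = sizes
    return root, heights[root]


# Illustrative (invented) ultrametric distances.
toy = {
    ("human", "mouse"): 2, ("human", "fugu"): 6, ("mouse", "fugu"): 6,
    ("human", "yeast"): 10, ("mouse", "yeast"): 10, ("fugu", "yeast"): 10,
}
tree, height = upgma(["human", "mouse", "fugu", "yeast"], toy)
print(tree, height)
```

With ultrametric input like this, every leaf ends up at the same distance from the root, which is exactly the uniform-clock assumption the slide warns about.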

Neighbour joining
At each step all possible neighbour joinings are checked and the one corresponding to the minimal total tree length (calculated by adding all branch lengths) is taken.
[Figure: neighbour-joining steps on human, mouse, fugu and yeast; labels: leaf, internal node (ancestor), branch, root]

Introduce a root

How to root a tree
Outgroup: place the root between a distant (still homologous) sequence and the rest group
Midpoint: place the root at the midpoint of the longest path (sum of branches between any two leaves)
Gene duplication: place the root between paralogous gene copies
[Figure: rooting examples with leaves Y (yeast), f (fugu), m (mouse), h (human) and paralogs f-α, f-β, h-α, h-β]
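
One neighbour-joining step can be sketched as follows. The slides describe the criterion in words (minimal total tree length); the sketch below uses the Q-criterion found in standard descriptions of the Saitou and Nei algorithm, which is an assumption about the formulation rather than something the slides spell out. The function name `nj_step` and the five-taxon distances are illustrative.

```python
from itertools import combinations


def nj_step(names, d):
    """Perform one neighbour-joining step.

    names: list of current nodes (at least three).
    d:     dict mapping frozenset({x, y}) -> distance for every pair.
    Returns (new_node, (branch_i, branch_j), new_names, new_d).
    """
    n = len(names)
    # Sum of distances from each node to all other nodes.
    total = {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}

    def q(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - total[i] - total[j]

    # The pair with the lowest Q value is joined.
    i, j = min(combinations(names, 2), key=q)
    dij = d[frozenset((i, j))]
    # Branch lengths from i and j to the new internal node.
    li = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
    lj = dij - li
    node = (i, j)
    # Keep distances not involving i or j, then add distances to the new node.
    new_d = {key: val for key, val in d.items() if i not in key and j not in key}
    for k in names:
        if k not in (i, j):
            new_d[frozenset((node, k))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - dij
            ) / 2
    new_names = [k for k in names if k not in (i, j)] + [node]
    return node, (li, lj), new_names, new_d


# Illustrative distances between five taxa.
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in raw.items()}
node, branches, names, dist2 = nj_step(["a", "b", "c", "d", "e"], dist)
print(node, branches)
```

Note that the pair joined first here is (a, b), even though d and e are closer in raw distance: the criterion corrects each distance for how far the two nodes are from everything else. Repeating the step until three nodes remain gives the unrooted tree; a root is then introduced separately, for example with an outgroup.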

Orthologs and paralogs

Gene duplication and gene loss
Simple real-life example: Kinase-5, essential for centrosome separation in mitosis
Gene duplication: divergence of a gene within one genome

Let's tell a story: Vertebrate Toll-Like Receptors
Spanish Flu (1918)
Roach, Jared C. et al. (2005) Proc. Natl. Acad. Sci. USA 102

Parsimony
A  a c a t g a a
B  a c t t g a a
C  a c a t g t a
D  a c a t g t a

Parsimony
Informative sites are the sites where at least two different characters each occur at least twice.

Another example
Human      c c t t g a a
Chimp      c c t t g a a
Gorilla    c c t a g t a
Gibbon     t c a a g a a
Orangutan  t c a a g a t
[Figure: resulting tree with leaves Chimp, Gibbon, Human, Gorilla, Orangutan]

Maximum likelihood
If data = alignment and hypothesis = tree, then under a given evolutionary model (e.g. a substitution matrix): compute the likelihood that the hypothesis (= tree), given the model, results in the observed data (= multiple sequence alignment).
Maximum likelihood selects the hypothesis (tree) that maximises the probability of the observed data.
Extremely time-consuming method
Best approach to find the true tree
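
The definition of an informative site can be checked directly against the two example alignments on these slides. The sketch below is illustrative only; the function name `informative_sites` is made up, and columns are numbered from 1.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 1-based positions of parsimony-informative columns.

    alignment: dict mapping sequence name -> aligned sequence string.
    A column is informative when at least two different characters
    each occur at least twice.
    """
    rows = list(alignment.values())
    sites = []
    for position, column in enumerate(zip(*rows), start=1):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(position)
    return sites


four_taxa = {
    "A": "acatgaa",
    "B": "acttgaa",
    "C": "acatgta",
    "D": "acatgta",
}
apes = {
    "Human": "ccttgaa",
    "Chimp": "ccttgaa",
    "Gorilla": "cctagta",
    "Gibbon": "tcaagaa",
    "Orangutan": "tcaagat",
}
print(informative_sites(four_taxa))  # [6]
print(informative_sites(apes))       # [1, 3, 4]
```

In the four-taxon example, column 3 varies but is not informative, because the t occurs only once and therefore costs one mutation on every possible tree.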

Parsimony, Maximum Likelihood or Neighbor-Joining?
Common practice: use all methods and compare trees
Data is of greater importance than method
As with alignments, one must remember that a phylogenetic tree is a hypothesis of the true evolutionary history. As a hypothesis it could be right or wrong or a bit of both. If we knew the true tree of life, we would also know which method is best.

How to assess confidence in a tree
Distance method bootstrap:
Select multiple alignment columns with replacement
Recalculate the tree
Compare branches with the original tree
Repeat many times, so calculate many different trees
How often is the branching preserved for each internal node?
Uses samples of the data

The Bootstrap
Original
C C V K V I Y S
M A V R L I F S
M A L R L L F S

Scrambled
V K V S I I S I
V R V S I I S I
L R L T L L S L

Nonsupportive
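
The bootstrap procedure above can be sketched with the three-sequence alignment from the slide. This is a simplified illustration: instead of rebuilding a full tree for every replicate, the stand-in `closest_pair` only reports which two sequences group together first, and all function names are made up here. A real analysis would rebuild the tree with the chosen method and count how often each internal branch recurs.

```python
import random
from itertools import combinations


def resample_columns(rows, rng):
    """Draw alignment columns with replacement, keeping the alignment width."""
    ncols = len(rows[0])
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(row[c] for c in picks) for row in rows]


def closest_pair(rows):
    """Stand-in for tree building: indices of the two most similar rows."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(rows[a], rows[b]))
    return frozenset(min(combinations(range(len(rows)), 2), key=mismatches))


def bootstrap_support(rows, replicates=100, seed=0):
    """Fraction of replicates that recover the grouping seen in the original."""
    rng = random.Random(seed)
    reference = closest_pair(rows)
    hits = sum(
        closest_pair(resample_columns(rows, rng)) == reference
        for _ in range(replicates)
    )
    return reference, hits / replicates


rows = ["CCVKVIYS", "MAVRLIFS", "MALRLLFS"]
group, support = bootstrap_support(rows, replicates=100, seed=42)
print(sorted(group), support)
```

The support value is the fraction of resampled alignments in which the original grouping is preserved, which corresponds to the "how often is the branching preserved" question on the slide.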

The Bootstrap
Bootstrap example
[Figure: a branching found 85 times and an alternative found 5 times; the branch is labelled 85]

Horizontal (lateral) gene transfer: the evolutionary history of a gene is not always consistent with the history of the species!

Detecting HGT in trees
Discovering horizontal gene transfer by comparing the phylogenetic tree of the species (SSU rRNA) with that of the gene in question.
Be careful, however! The sequences have to be orthologous to each other. Ancient gene duplications followed by differential loss can also give rise to trees that look like horizontal gene transfer.

Leucine aminoacyl-tRNA synthetase
[Figure: tree with Eukaryotes, Archaea and Bacteria]
No apparent horizontal gene transfer in the evolution of leucine aminoacyl-tRNA synthetase (the phylogeny of the sequences fits more or less the species phylogeny).

Proline aminoacyl-tRNA synthetase
[Figure: tree with Archaea, Eukaryotes and Bacteria?]

Detecting HGT in trees
[Figure: tree with Archaea, Eukaryotes and Bacteria]
Apparent horizontal gene transfer to the parasites Bbu (B. burgdorferi) and Mge, Mpe (Mycoplasmas) from the eukaryotes represented by Cel (C. elegans) and Sce (S. cerevisiae)

Let's tell a story: MHC molecules

Another use of phylogenies


Clustering. 15-381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

More information

Human-Mouse Synteny in Functional Genomics Experiment

Human-Mouse Synteny in Functional Genomics Experiment Human-Mouse Synteny in Functional Genomics Experiment Ksenia Krasheninnikova University of the Russian Academy of Sciences, JetBrains krasheninnikova@gmail.com September 18, 2012 Ksenia Krasheninnikova

More information

1. Over the past century, several scientists around the world have made the following observations:

1. Over the past century, several scientists around the world have made the following observations: Evolution Keystone Review 1. Over the past century, several scientists around the world have made the following observations: New mitochondria and plastids can only be generated by old mitochondria and

More information

PHYLOGENY AND COMPARATIVE METHODS SYMBIOMICS WORKSHOP

PHYLOGENY AND COMPARATIVE METHODS SYMBIOMICS WORKSHOP PHYLOGENY AND COMPARATIVE METHODS SYMBIOMICS WORKSHOP March 4-7, 2013 Valencia, Spain Parc Cientific of the University of Valencia Goals The aim of this workshop is to provide the attendees with a broad

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analsis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining b Tan, Steinbach, Kumar Clustering Algorithms K-means and its variants Hierarchical clustering

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

DNA Sequence Alignment Analysis

DNA Sequence Alignment Analysis Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes

More information

DnaSP, DNA polymorphism analyses by the coalescent and other methods.

DnaSP, DNA polymorphism analyses by the coalescent and other methods. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Author affiliation: Julio Rozas 1, *, Juan C. Sánchez-DelBarrio 2,3, Xavier Messeguer 2 and Ricardo Rozas 1 1 Departament de Genètica,

More information

AP Biology 2015 Free-Response Questions

AP Biology 2015 Free-Response Questions AP Biology 2015 Free-Response Questions College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

UCHIME in practice Single-region sequencing Reference database mode

UCHIME in practice Single-region sequencing Reference database mode UCHIME in practice Single-region sequencing UCHIME is designed for experiments that perform community sequencing of a single region such as the 16S rrna gene or fungal ITS region. While UCHIME may prove

More information

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

Molecular typing of VTEC: from PFGE to NGS-based phylogeny Molecular typing of VTEC: from PFGE to NGS-based phylogeny Valeria Michelacci 10th Annual Workshop of the National Reference Laboratories for E. coli in the EU Rome, November 5 th 2015 Molecular typing

More information

Unraveling protein networks with Power Graph Analysis

Unraveling protein networks with Power Graph Analysis Unraveling protein networks with Power Graph Analysis PLoS Computational Biology, 2008 Loic Royer Matthias Reimann Bill Andreopoulos Michael Schroeder Schroeder Group Bioinformatics 1 Complex Networks

More information

CCR Biology - Chapter 10 Practice Test - Summer 2012

CCR Biology - Chapter 10 Practice Test - Summer 2012 Name: Class: Date: CCR Biology - Chapter 10 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. What is the term for a feature

More information

2011.008a-cB. Code assigned:

2011.008a-cB. Code assigned: This form should be used for all taxonomic proposals. Please complete all those modules that are applicable (and then delete the unwanted sections). For guidance, see the notes written in blue and the

More information

Section 3 Comparative Genomics and Phylogenetics

Section 3 Comparative Genomics and Phylogenetics Section 3 Section 3 Comparative enomics and Phylogenetics At the end of this section you should be able to: Describe what is meant by DNA sequencing. Explain what is meant by Bioinformatics and Comparative

More information

Hierarchical Bayesian Modeling of the HIV Response to Therapy

Hierarchical Bayesian Modeling of the HIV Response to Therapy Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and

More information

AP Biology Essential Knowledge Student Diagnostic

AP Biology Essential Knowledge Student Diagnostic AP Biology Essential Knowledge Student Diagnostic Background The Essential Knowledge statements provided in the AP Biology Curriculum Framework are scientific claims describing phenomenon occurring in

More information

Systematics - BIO 615

Systematics - BIO 615 Outline - and introduction to phylogenetic inference 1. Pre Lamarck, Pre Darwin Classification without phylogeny 2. Lamarck & Darwin to Hennig (et al.) Classification with phylogeny but without a reproducible

More information

Human Genome Organization: An Update. Genome Organization: An Update

Human Genome Organization: An Update. Genome Organization: An Update Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion

More information

Exploratory data analysis for microarray data

Exploratory data analysis for microarray data Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization

More information

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788

MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,

More information

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University

Master's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary

More information

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1 The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches Mukul S. Bansal,

More information

IsoBase: a database of functionally related proteins across PPI networks

IsoBase: a database of functionally related proteins across PPI networks D295 D300 doi:10.1093/nar/gkq1234 IsoBase: a database of functionally related proteins across PPI networks Daniel Park 1,2, Rohit Singh 1, Michael Baym 1,3,4, Chung-Shou Liao 5 and Bonnie Berger 1,4, *

More information

Amino Acids and Their Properties

Amino Acids and Their Properties Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

More information

Data for phylogenetic analysis

Data for phylogenetic analysis Data for phylogenetic analysis The data that are used to estimate the phylogeny of a set of tips are the characteristics of those tips. Therefore the success of phylogenetic inference depends in large

More information