Hierarchical Classification:
- Rolf Thornton
- 7 years ago
[Overview figure: Genome, Bioinformatics, Protein Families, Annotation, Phylogeny I, Molecule, Expression, with the questions "Compare domains?", "Compare expression?" and "Similar proteins?"]
What is a phylogenetic tree? How to make a phylogenetic tree? TLR (Toll-like receptors)
Some of the slides in this lecture are courtesy of Jaap Heringa, Anders Gorm Pedersen and Michael Rosenberg.
Hierarchical classification: Linnaeus. A tree is a depiction (formalization) of a classification. [Portrait: Carl Linnaeus]
Theory of evolution: the only figure in Darwin's On the Origin of Species is a tree. [Portrait: Charles Darwin]
Phylogenetic trees: the historical pattern of relationships among organisms. [Figure: interpretation of a tree, e.g. the flow of time]
How to read a phylogenetic tree? [Figure: ancestors]
Trees are useful in bioinformatics beyond the phylogeny of species. Where else can phylogenetic trees be used?
Progressive multiple alignment, general principles: pairwise scores (Score 1-2, ..., Score 4-5) → similarity matrix → scores converted to distances → guide tree → multiple alignment
Other trees (= clusters): gene expression
Phylogenetic trees: unrooted and rooted
Unrooted vs rooted trees
Trees and evolutionary time
Phylogenies using characters [Figure: faster evolution]
Molecular phylogeny changed taxonomy
Three main classes of phylogenetic methods
- Distance based: uses pairwise distances; fastest approach
- Parsimony: fewest evolutionary events (mutations); attempts to construct the maximum parsimony tree
- Maximum likelihood
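Why the number of taxa matters: the count of possible trees grows explosively, which is one reason methods that score whole trees (parsimony, maximum likelihood) are slower than distance methods, which build a single tree directly. A minimal Python sketch using the standard counting formulas, (2n-5)!! for unrooted and (2n-3)!! for rooted fully bifurcating trees; the function names are our own.

```python
def double_factorial(k: int) -> int:
    """k * (k - 2) * (k - 4) * ... down to 1 (taken as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_taxa: int) -> int:
    """Distinct unrooted, fully bifurcating trees on n_taxa labelled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa: int) -> int:
    """Distinct rooted, fully bifurcating trees on n_taxa labelled leaves: (2n - 3)!!.

    Equivalently: an unrooted tree has 2n - 3 branches and the root can sit on any of them.
    """
    if n_taxa < 2:
        raise ValueError("a rooted binary tree needs at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10, 20):
    print(f"{n:>2} taxa: {n_unrooted_trees(n):,} unrooted, {n_rooted_trees(n):,} rooted")
```

For four taxa this gives 3 unrooted and 15 rooted trees; with 20 taxa there are already more than 10^20 unrooted trees.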
Phylogenetic tree by distance methods (clustering)
Multiple alignment → (similarity criterion) scores → distance matrix → phylogenetic tree
Distances
Evolutionary sequence distance = sequence dissimilarity
Human          -KITVVGVGAVGMACAISILMKDLADELALVDVIEDKLKGEMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQ
Chicken        -KISVVGVGAVGMACAISILMKDLADELTLVDVVEDKLKGEMMDLQHGSLFLKTPKITSGKDYSVTAHSKLVIVTAGARQ
Dogfish        -KITVVGVGAVGMACAISILMKDLADEVALVDVMEDKLKGEMMDLQHGSLFLHTAKIVSGKDYSVSAGSKLVVITAGARQ
Lamprey        SKVTIVGVGQVGMAAAISVLLRDLADELALVDVVEDRLKGEMMDLLHGSLFLKTAKIVADKDYSVTAGSRLVVVTAGARQ
Barley         TKISVIGAGNVGMAIAQTILTQNLADEIALVDALPDKLRGEALDLQHAAAFLPRVRI-SGTDAAVTKNSDLVIVTAGARQ
Lacto_casei    -KVILVGDGAVGSSYAYAMVLQGIAQEIGIVDIFKDKTKGDAIDLSNALPFTSPKKIYSA-EYSDAKDADLVVITAGAPQ
Maize          TKVSVIGAGNVGMAIAQTILTRDLADEIALVDAVPDKLRGEMLDLQHAAAFLPRTRLVSGTDMSVTRGSDLVIVTAGARQ
Bacillus_stea  -RVVVIGAGFVGASYVFALMNQGIADEIVLIDANESKAIGDAMDFNHGKVFAPKPVDIWHGDYDDCRDADLVVICAGANQ
Lacto_plant    QKVVLVGDGAVGSSYAFAMAQQGIAEEFVIVDVVKDRTKGDALDLEDAQAFTAPKKIYSG-EYSDCKDADLVVITAGAPQ
Therma_mari    MKIGIVGLGRVGSSTAFALLMKGFAREMVLIDVDKKRAEGDALDLIHGTPFTRRANIYAG-DYADLKGSDVVIVAAGVPQ
Bifido         -KLAVIGAGAVGSTLAFAAAQRGIAREIVLEDIAKERVEAEVLDMQHGSSFYPTVSIDGSDDPEICRDADMVVITAGPRQ
Thermus_aqua   MKVGIVGSGFVGSATAYALVLQGVAREVVLVDLDRKLAQAHAEDILHATPFAHPVWVRSGW-YEDLEGARVVIVAAGVAQ
Mycoplasma     -KIALIGAGNVGNSFLYAAMNQGLASEYGIIDINPDFADGNAFDFEDASASLPFPISVSRYEYKDLKDADFIVITAGRPQ
[Distance matrix table for Human, Chicken, Dogfish, Lamprey, Barley, Maize, Lacto_casei, Bacillus_stea, Lacto_plant, Therma_mari, Bifido, Thermus_aqua and Mycoplasma]
NB: because these are evolutionary distances, we obtain a phylogenetic tree.
Clustering
Scores → distance matrix → (cluster criterion) → phylogenetic tree. Cluster criteria:
- Single linkage: nearest neighbour
- Complete linkage: furthest neighbour
- Group averaging: UPGMA
- Neighbour joining
Clustering algorithm: UPGMA [Figure: worked example clustering human, mouse, fugu and yeast]
Evolutionary clock speeds
Uniform clock: ultrametric distances lead to identical distances from the root to all leaves. UPGMA trees would be correct if evolution had a uniform clock, but it often did not!
Neighbour-Joining (Saitou and Nei, 1987)
Global: keeps the total branch length minimal. At each step, join the two nodes that are closest, considering their respective distances to all other nodes. Leads to an unrooted tree.
Non-uniform evolutionary clock: leaves have different distances to the root.
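The distance-method pipeline on these slides (alignment → distance matrix → tree) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: distances are plain p-distances (fraction of differing residues, gap columns skipped) rather than corrected evolutionary distances, the clustering step is UPGMA, and the input is only the first 20 alignment columns of four of the sequences above. The function names are our own.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues, skipping columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def upgma(names, dist):
    """UPGMA (group-average) clustering.

    names: leaf labels; dist: dict mapping frozenset({x, y}) -> distance between leaves.
    Returns (newick, merges): a rooted Newick string and the cluster pairs joined, in order.
    """
    clusters = {frozenset([n]): n for n in names}  # cluster (set of leaves) -> Newick text
    height = {c: 0.0 for c in clusters}            # node height above the leaves

    def group_average(c1, c2):
        # Mean distance over all leaf pairs that span the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: group_average(*pair))
        h = group_average(c1, c2) / 2.0  # uniform clock: the new node sits halfway
        merged = c1 | c2
        clusters[merged] = (f"({clusters[c1]}:{h - height[c1]:.3f},"
                            f"{clusters[c2]}:{h - height[c2]:.3f})")
        height[merged] = h
        del clusters[c1], clusters[c2]
        merges.append((c1, c2))
    (newick,) = clusters.values()
    return newick + ";", merges


# First 20 alignment columns of four sequences from the slide
seqs = {
    "Human": "-KITVVGVGAVGMACAISIL",
    "Chicken": "-KISVVGVGAVGMACAISIL",
    "Lamprey": "SKVTIVGVGQVGMAAAISVL",
    "Barley": "TKISVIGAGNVGMAIAQTIL",
}
dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]]) for p in combinations(seqs, 2)}
tree, merges = upgma(list(seqs), dist)
print(tree)
```

UPGMA always returns a rooted, ultrametric tree; when lineages evolve at different rates this can misplace sequences, which is the situation neighbour joining is meant to handle.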
Neighbour joining
At each step all possible neighbour joinings are checked, and the one corresponding to the minimal total tree length (calculated by adding all branch lengths) is taken. [Figure, over three slides: successive joining steps]
Neighbour joining: introduce a root [Figure: tree for yeast, fugu, mouse and human, marking branches, the root, internal nodes (ancestors) and leaves]
How to root a tree
- Outgroup: place the root between a distant (but still homologous) sequence and the rest of the group
- Midpoint: place the root at the midpoint of the longest path (sum of branches between any two leaves)
- Gene duplication: place the root between paralogous gene copies [Figure: trees for yeast (Y), fugu (f), mouse (m) and human (h), with duplicated copies f-α, f-β, h-α and h-β]
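The neighbour-joining step can be sketched in Python. The slides describe NJ as trying every possible join and keeping the one that gives the smallest total tree length; a standard way to implement that choice is the Q-matrix criterion, Q(i, j) = (n - 2)·d(i, j) - Σk d(i, k) - Σk d(j, k), which Studier and Keppler showed selects the same pair as Saitou and Nei's total-length criterion. This is a minimal sketch; the function name and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Neighbour joining with the Q-matrix criterion.

    names: at least 3 leaf labels; dist: dict mapping frozenset({x, y}) -> distance.
    Returns an unrooted tree as Newick text (the last three nodes meet at one node).
    """
    d = dict(dist)                  # working copy; new internal nodes get new entries
    label = {n: n for n in names}   # node -> Newick text of the subtree below it
    nodes = list(names)
    n_internal = 0

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}  # net divergence
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent u
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        n_internal += 1
        u = ("internal", n_internal)  # a tuple id cannot clash with a string leaf name
        label[u] = f"({label[i]}:{li:.3f},{label[j]}:{lj:.3f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({label[a]}:{la:.3f},{label[b]}:{lb:.3f},{label[c]}:{lc:.3f});"


# Hypothetical additive distance matrix for five taxa
toy = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
       ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbour_joining(["a", "b", "c", "d", "e"], {frozenset(k): v for k, v in toy.items()})
print(tree)  # (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

Because this toy matrix is additive (it fits a tree exactly), NJ recovers that tree, and no uniform clock is assumed. The result is unrooted, so it still needs an outgroup or midpoint root as described above.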
Orthologs and paralogs
Gene duplication and gene loss
Simple real-life example: Kinase-5, essential for centrosome separation in mitosis
Gene duplication: divergence of a gene within one genome
Let's tell a story: vertebrate Toll-like receptors. Spanish flu (1918). Roach, Jared C. et al. (2005) Proc. Natl. Acad. Sci. USA 102.
Three main classes of phylogenetic methods
Parsimony
A  a c a t g a a
B  a c t t g a a
C  a c a t g t a
D  a c a t g t a
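Scoring a tree by parsimony means counting the minimum number of changes it requires. A common way to do this is Fitch's small-parsimony algorithm (our choice; the slides do not name an algorithm). The minimal Python sketch below writes trees as nested tuples (an assumption of this sketch), scores all three possible unrooted trees for the four sequences A to D above, and lists the parsimony-informative columns, i.e. those where at least two different characters each occur at least twice.

```python
from collections import Counter


def fitch_site(tree, states):
    """Fitch small parsimony for one column.

    tree: a rooted binary tree as nested 2-tuples of leaf names; states: leaf -> character.
    Returns (candidate ancestral states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):                      # leaf
        return {states[tree]}, 0
    left, right = tree
    s_left, c_left = fitch_site(left, states)
    s_right, c_right = fitch_site(right, states)
    shared = s_left & s_right
    if shared:                                     # children can agree: no extra change
        return shared, c_left + c_right
    return s_left | s_right, c_left + c_right + 1  # disagreement costs one change


def parsimony_score(tree, alignment):
    """Minimum number of changes the tree needs, summed over all columns.
    The score does not depend on where the tree is rooted."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )


def informative_sites(alignment):
    """0-based columns where at least two different characters each occur at least twice."""
    length = len(next(iter(alignment.values())))
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            result.append(col)
    return result


alignment = {"A": "acatgaa", "B": "acttgaa", "C": "acatgta", "D": "acatgta"}
# The three possible unrooted trees for four taxa, each written with an arbitrary root
trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, alignment) for name, t in trees.items()}
print(scores)                        # {'AB|CD': 2, 'AC|BD': 3, 'AD|BC': 3}
print(informative_sites(alignment))  # [5]
```

Only column 5 (0-based) distinguishes the trees, and it favours grouping A with B, so AB|CD is the maximum parsimony tree.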
Informative sites are the sites where at least two different characters each occur at least twice.
Another example
Human      c c t t g a a
Chimp      c c t t g a a
Gorilla    c c t a g t a
Gibbon     t c a a g a a
Orangutan  t c a a g a t
[Figure: tree relating Chimp, Gibbon, Human, Gorilla and Orangutan]
Three main classes of phylogenetic methods
Maximum likelihood
If data = alignment and hypothesis = tree, then under a given evolutionary model (e.g. a substitution matrix): compute the likelihood that the hypothesis (= tree), given the model, results in the observed data (= multiple sequence alignment). Maximum likelihood selects the hypothesis (tree) that maximises the likelihood of the observed data.
Extremely time-consuming method
Best approach to find the true tree
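For two sequences there is only one possible tree, a single branch, so the maximum-likelihood principle reduces to choosing the branch length that makes the observed alignment most probable. The minimal Python sketch below assumes the Jukes-Cantor (JC69) substitution model (our choice; the slides only say "e.g. a substitution matrix"), finds the best branch length by brute-force grid search, and compares it with the known closed-form answer t = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. Function names are our own.

```python
import math


def jc69_site_probability(same: bool, t: float) -> float:
    """Probability of one aligned site for two sequences separated by branch length t
    (expected substitutions per site) under JC69. The leading 0.25 is the equilibrium
    frequency of the first base."""
    e = math.exp(-4.0 * t / 3.0)
    if same:
        return 0.25 * (0.25 + 0.75 * e)
    return 0.25 * (0.25 - 0.25 * e)  # probability of one specific different base


def log_likelihood(seq_a: str, seq_b: str, t: float) -> float:
    """Log-probability of the observed pair of sequences given branch length t."""
    return sum(math.log(jc69_site_probability(a == b, t)) for a, b in zip(seq_a, seq_b))


def ml_branch_length(seq_a: str, seq_b: str, step: float = 0.001, t_max: float = 3.0) -> float:
    """Grid search for the branch length that maximises the likelihood."""
    grid = [step * k for k in range(1, int(round(t_max / step)) + 1)]
    return max(grid, key=lambda t: log_likelihood(seq_a, seq_b, t))


human, gorilla = "ccttgaa", "cctagta"  # two rows of the primate example above
t_hat = ml_branch_length(human, gorilla)
p = sum(a != b for a, b in zip(human, gorilla)) / len(human)
print(f"grid-search MLE t = {t_hat:.3f}; closed form = {-0.75 * math.log(1 - 4 * p / 3):.3f}")
```

With more than two sequences the same likelihood has to be computed over the whole tree (usually with Felsenstein's pruning algorithm) for every candidate topology and its branch lengths, which is why the method is so time-consuming.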
Parsimony, Maximum Likelihood or Neighbour-Joining?
Common practice: use all methods and compare the trees. Data is of greater importance than method.
As with alignments, one must remember that a phylogenetic tree is a hypothesis of the true evolutionary history. As a hypothesis it could be right, wrong, or a bit of both. If we knew the true tree of life, we would also know which method is best.
How to assess confidence in a tree
Distance method: bootstrap
- Select multiple alignment columns with replacement
- Recalculate the tree
- Compare branches with the original tree
- Repeat x times, so as to calculate x different trees
- How often is the branching preserved for each internal node?
- Uses samples of the data
The Bootstrap
Original:
C C V K V I Y S
M A V R L I F S
M A L R L L F S
Scrambled:
V K V S I I S I
V R V S I I S I
L R L T L L S L
[Figure label: Nonsupportive]
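The resampling procedure above can be sketched in Python. To stay short and self-contained, the tree-building step here is deliberately simple: for four taxa, maximum parsimony reduces to counting the columns in which two states each occur twice, since each such column supports exactly one of the three possible splits. A real analysis would rebuild the tree with the same method used for the original tree; the function names and the choice of parsimony as the tree builder are assumptions of this sketch.

```python
import random
from collections import Counter


def supported_pair(column, names):
    """For four taxa: the pair containing names[0] that an informative column supports,
    or None. Only columns with two states each seen twice are informative."""
    if sorted(Counter(column).values()) != [2, 2]:
        return None
    partner = next(names[i] for i in range(1, 4) if column[i] == column[0])
    return frozenset((names[0], partner))


def parsimony_quartet(alignment, names):
    """Maximum-parsimony split for four taxa: the one backed by the most informative
    columns (None if there is no informative column or the best two are tied)."""
    columns = zip(*(alignment[n] for n in names))
    votes = Counter(p for p in (supported_pair(c, names) for c in columns) if p is not None)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def bootstrap_support(alignment, names, replicates=1000, seed=0):
    """Fraction of bootstrap replicates (columns resampled with replacement)
    that recover the split found on the original alignment."""
    rng = random.Random(seed)
    original = parsimony_quartet(alignment, names)
    if original is None:
        raise ValueError("the original alignment does not resolve the quartet")
    length = len(alignment[names[0]])
    hits = 0
    for _ in range(replicates):
        picks = [rng.randrange(length) for _ in range(length)]  # same number of columns
        replicate = {n: "".join(alignment[n][c] for c in picks) for n in names}
        if parsimony_quartet(replicate, names) == original:
            hits += 1
    return original, hits / replicates


alignment = {"A": "acatgaa", "B": "acttgaa", "C": "acatgta", "D": "acatgta"}
split, support = bootstrap_support(alignment, ["A", "B", "C", "D"], replicates=2000)
print(sorted(split), support)
```

Here the AB|CD grouping rests on a single informative column, so it is recovered only in replicates that happen to draw that column, about 1 - (6/7)^7 ≈ 0.66 of the time; in a real analysis this value would be written on the corresponding internal node.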
The Bootstrap
Bootstrap example [Figure: replicate counts (85 times, 5 times) and a node labelled 85]
Horizontal (lateral) gene transfer: the evolutionary history of a gene is not always consistent with the history of the species!
Detecting HGT in trees
Horizontal gene transfer can be discovered by comparing the phylogenetic tree of the species (SSU rRNA) with that of the gene in question. Be careful, however! The sequences have to be orthologous to each other. Ancient gene duplications followed by differential loss can also give rise to trees that look like horizontal gene transfer.
[Figure: leucine aminoacyl-tRNA synthetase tree with Eukaryotes, Archaea and Bacteria]
No apparent horizontal gene transfer in the evolution of leucine aminoacyl-tRNA synthetase (the phylogeny of the sequences more or less fits the species phylogeny).
[Figure: proline aminoacyl-tRNA synthetase tree with Archaea, Eukaryotes and Bacteria?]
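Comparing the gene tree with the species tree can be sketched as a set comparison: list the groupings (clades) of the gene tree and flag those absent from the species tree. A minimal Python sketch on rooted trees written as nested tuples; both trees are hypothetical, loosely modelled on the slides' example of B. burgdorferi (Bbu) grouping with eukaryotes (Sce, Cel), and the extra labels Mja and Eco are illustrative only. A flagged clade is only a lead: as the slide warns, the sequences must be orthologous, and ancient duplication followed by differential loss can give the same pattern; in practice the conflicting grouping should also have good bootstrap support.

```python
def clades(tree):
    """All clades (leaf sets below each internal node) of a rooted tree as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def conflicting_clades(gene_tree, species_tree):
    """Clades in the gene tree that the species tree lacks (same leaf set assumed)."""
    return clades(gene_tree) - clades(species_tree)


# Hypothetical trees: Sce and Cel eukaryotes, Mja an archaeon, Eco and Bbu bacteria
species_tree = ((("Sce", "Cel"), "Mja"), ("Eco", "Bbu"))
gene_tree = (((("Sce", "Cel"), "Bbu"), "Mja"), "Eco")  # Bbu groups with the eukaryotes

for clade in sorted(conflicting_clades(gene_tree, species_tree), key=len):
    print(sorted(clade))
```

Both flagged clades contain Bbu together with the eukaryotes, which is the kind of pattern the slides interpret as possible horizontal gene transfer.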
Detecting HGT in trees [Figure: Archaea, Eukaryotes, Bacteria]
Apparent horizontal gene transfer to the parasites Bbu (B. burgdorferi) and Mge, Mpe (Mycoplasmas) from the eukaryotes, represented by Cel (C. elegans) and Sce (S. cerevisiae).
Let's tell a story: MHC molecules
Another use of phylogenies
More informationHuman Genome Organization: An Update. Genome Organization: An Update
Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion
More informationExploratory data analysis for microarray data
Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization
More informationCore Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:
More informationA Non-Linear Schema Theorem for Genetic Algorithms
A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationBioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
More informationMATCH Commun. Math. Comput. Chem. 61 (2009) 781-788
MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,
More informationMaster's projects at ITMO University. Daniil Chivilikhin PhD Student @ ITMO University
Master's projects at ITMO University Daniil Chivilikhin PhD Student @ ITMO University General information Guidance from our lab's researchers Publishable results 2 Research areas Research at ITMO Evolutionary
More informationIEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1
IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1 The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches Mukul S. Bansal,
More informationIsoBase: a database of functionally related proteins across PPI networks
D295 D300 doi:10.1093/nar/gkq1234 IsoBase: a database of functionally related proteins across PPI networks Daniel Park 1,2, Rohit Singh 1, Michael Baym 1,3,4, Chung-Shou Liao 5 and Bonnie Berger 1,4, *
More informationAmino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that
More informationData for phylogenetic analysis
Data for phylogenetic analysis The data that are used to estimate the phylogeny of a set of tips are the characteristics of those tips. Therefore the success of phylogenetic inference depends in large
More information