# Performance study of supertree methods

## Transcription

Journal of Undergraduate Research, Summer 2013 Spring

### Q&A

How did you become involved in doing research?

I applied for an REU (Research Experiences for Undergraduates) at KU last summer and worked with Dr. Mark Holder for ten weeks. It was an amazing experience!

How is the research process different from what you expected?

The amount of trial and error used in fixing everything. It's not as simple as going from point A to B. There are a lot of tear-downs in between.

What is your favorite part of doing research?

Running simulations on the supercomputer.

Patrick Niehaus

- HOMETOWN: Hilton Head Island, South Carolina
- MAJORS: Computational Science, at University of South Carolina Beaufort
- ACADEMIC LEVEL: Senior
- RESEARCH MENTOR: Mark Holder, Associate Professor of Ecology & Evolutionary Biology

### INTRODUCTION

Phylogenetic trees depict the relationships among various species [1]. A simple example of this is a family tree. As Figure 1 shows, the three most important components of a tree are the nodes (denoted by squares), the branches (denoted by lines), and the tips (denoted by circles). The nodes represent hypothetical ancestors, the branches represent the evolution of a lineage, and the tips represent the species being studied. Most phylogenetic trees are estimated from DNA sequence data.

A supertree is a phylogenetic tree estimate produced by merging smaller phylogenetic tree estimates [2-4]. By combining several phylogenetic trees, we can gain a better understanding of how species are related to one another. One example would be combining the human and primate phylogenetic trees. This study compares two supertree methods: MRP (matrix representation with parsimony) and TAG (tree alignment graphs).

MRP works by translating each branch of each individual tree into a matrix column; this process is repeated until a character matrix is formed [5,6]. All taxa that descend from the branch are represented in the matrix with a 1, other taxa that are found in the tree are represented with a 0, and taxa that aren't found in the tree are represented with the missing-data symbol "?". Each input tree is thus encoded as a series of columns, one per branch. Once complete, the character matrix includes all of the phylogenetic relationships contained in the source trees. The supertree is estimated from this matrix using the same parsimony approach that is used when estimating a tree from two-state character data. Refer to Figure 2 for additional reference.

TAG exploits the fact that trees are a special kind of graph. Phylogenetic trees are directed and require that each node has at most one parent. TAG relaxes these requirements, which allows multiple trees to be combined into one common graph [1]. Through this process, computational time and resources are believed to be greatly reduced. Refer to Figure 3 for additional reference.

MRP is one of the more accurate supertree methods for problems with smaller taxon sets. When faced with a larger taxon set, MRP loses accuracy. TAG, however, is designed to work best with larger inputs and is hypothesized to be more accurate and efficient when working with a larger input taxonomy.

Figure 1. This figure is a visual example of a tree.

### METHODS

This study first requires an input taxonomy. A true tree is generated from the taxonomy by randomly resolving areas of uncertainty and then altering the tree to mimic the effects of taxonomic errors. Several input trees are generated from the true tree to mimic a set of inputs that could be obtained from several overlapping phylogenetic studies. The input trees sub-sample the taxa in the taxonomy, and they also differ from the true tree because the simulator introduces random changes to the topology. The input trees are then run through MRP and TAG individually, and each resulting estimate is compared to the true tree. Whichever estimate is closest to the true tree (i.e., has the fewest false negatives and false positives) and takes the least time is considered the more accurate and efficient algorithm. Refer to Figure 4 for additional information.

Figure 2. This figure is a visual example of the MRP process.

Figure 3. This figure is a visual example of the TAG process.

Figure 4. This figure is a visual representation of my model.
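The MRP encoding described in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: trees are written as nested tuples of tip names, the function names (`leaves`, `internal_clades`, `mrp_matrix`) are hypothetical, and only internal branches are encoded because a branch leading to a single tip groups nothing. The parsimony search that would follow is not shown.

```python
def leaves(tree):
    """Return the set of tip labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def internal_clades(tree):
    """Yield the tip set below each internal branch (the root is skipped)."""
    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            yield leaves(node)
        for child in node:
            yield from walk(child, False)
    yield from walk(tree, True)


def mrp_matrix(trees):
    """Encode input trees as a character matrix: taxon -> string of 1, 0, ?."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        in_tree = leaves(tree)
        for clade in internal_clades(tree):  # one column per branch
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")  # descends from this branch
                elif taxon in in_tree:
                    rows[taxon].append("0")  # in the tree, not below the branch
                else:
                    rows[taxon].append("?")  # absent from this input tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two small, overlapping input trees (made-up example data).
tree_a = (("human", "chimp"), "gorilla")
tree_b = (("chimp", "gorilla"), "orangutan")
matrix = mrp_matrix([tree_a, tree_b])
for taxon, row in matrix.items():
    print(taxon, row)
```

Each input tree here contributes one column, so every taxon gets a two-character row; a taxon missing from one of the trees receives "?" in that tree's column.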

### RESULTS

At this point of the study, MRP and TAG have only been compared with the primates taxonomy, totaling around 1400 input lines. With this smaller taxonomy input, there are many differences in the outputs produced by MRP and TAG. The differences show up as false positives, false negatives, and differences between each method's output and the consensus tree (CON). A false positive occurs when the input tree has a branch that isn't found in the true tree, and a false negative occurs when the true tree has a branch that isn't found in the input tree. A consensus tree is a tree that has all of the branches common to the input trees. This is particularly helpful in finding how similar the outputs of MRP and TAG are.

As of now, it appears that MRP far surpasses TAG for a smaller taxon input, such as the primates taxonomy. Figures 5 and 6 show how the number of input trees affects the outputs of MRP and TAG in terms of false negatives and false positives. MRP appears to become more effective as the number of input trees increases. TAG, however, appears to become less effective as the number of trees increases, and appears to level off at about the same number of false negatives and false positives beyond roughly 60 input trees.

Figures 5 and 6. These two figures show how increasing the number of input trees affects the number of false positives and false negatives. Each dot represents an average of 10 different simulations for a given number (x) of input trees. This was done to avoid any large fluctuations in the outputs of MRP and TAG. The number of input trees (x-axis) goes in the order of 5, 10, 20, 40, 80, and 200.

To explain why the outputs of MRP and TAG vary so greatly when the number of input trees was increased, detailed bar graphs were generated for 10 and 200 total input trees (Figures 7 and 8). In Figure 7, it appears that TAG surpasses MRP in terms of false positives and false negatives. However, MRPtoTAG shows that the two methods are returning very different outputs, and TAGtoMRPCON confirms this. Figure 8 shows that MRP far surpasses TAG in both false positives and false negatives, and in its comparison to the consensus tree. It therefore appears that MRP requires a larger number of input trees, while TAG, when finished, may be able to work with fewer input trees.

Figures 7 and 8. These two figures show a detailed method comparison for MRP and TAG. These two outputs were chosen at random. This was done to avoid any bias.

Figures 9 and 10 show what happens when we increase the SPR value for a total of 80 input trees. SPR stands for Subtree Prune and Regraft. In essence, this cuts off a branch of the tree and reconnects it elsewhere. The larger the SPR value, the further along the tree the branch is placed. Interestingly, this caused MRP to return more false positives as SPR was increased, while TAG's false positives remained below 30. In terms of false negatives, TAG remains consistent, and MRP slowly rises. At this time, we're unsure why TAG returns so many false negatives despite the lower SPR value. This will be investigated at a later stage in the study.

Figures 9 and 10. These two figures show how changes in SPR affect the number of false positives and false negatives. Each dot represents an increase in SPR for a total of 80 input trees over an average of 10 simulations.

### CONCLUSION

While MRP seems to be the more effective algorithm in this part of the study, it is important to remember that TAG was designed to work with inputs in the form of many large overlapping phylogenetic trees with a complementary taxonomic hierarchy. It is unclear how best to simulate a realistic taxonomic input in our study system, which may result in suboptimal performance by the algorithm. In future work, MRP and TAG will be compared with a larger taxonomy (e.g. the Tree of Life, which is around 2.7 million species).

This study is being conducted with the principles of open source code. All non-proprietary software is publicly viewable through the links listed below.

- https://github.com/mtholder/supertree-study holds the code which ran the entire model illustrated in Figure 4.
- https://github.com/opentreeoflife/big-tree-collection-simulator holds the algorithm that takes the input taxonomy and creates a true tree and the input tree(s). Refer to Figure 4 for additional information.
- https://github.com/opentreeoflife/treemachine holds the code of the TAG algorithm.
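The false positive and false negative counts used in the results can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the study's code: trees are nested tuples of tip names, both trees contain the same tips, a branch is identified by the set of tips below it, and the names `clades` and `branch_errors` are hypothetical.

```python
def clades(tree):
    """Return the tip sets below every internal branch of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)  # the root groups every tip, so it is uninformative
    return found


def branch_errors(compared_tree, true_tree):
    """Count branches only in the compared tree (FP) and only in the true tree (FN)."""
    compared, truth = clades(compared_tree), clades(true_tree)
    false_positives = len(compared - truth)
    false_negatives = len(truth - compared)
    return false_positives, false_negatives


# Made-up example: the compared tree misplaces chimp and gorilla.
true_tree = ((("human", "chimp"), "gorilla"), "orangutan")
estimate = ((("human", "gorilla"), "chimp"), "orangutan")
fp, fn = branch_errors(estimate, true_tree)
print("false positives:", fp, "false negatives:", fn)
```

A tree that leaves a relationship unresolved, rather than resolving it wrongly, would add a false negative without adding a false positive, which is one reason the two counts are reported separately.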

### Bibliography

1. Smith SA, Brown JW, Hinchliff CE (2013) Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs. PLoS Comput Biol 9(9): e doi: /journal.pcbi
2. Bininda-Emonds OR, editor (2004) Phylogenetic supertrees: Combining information to reveal the Tree of Life (Computational Biology). Springer, 564 pp.
3. Davis KE, Hill J (2010) The Supertree Tool Kit. BMC Research Notes 3:
4. Von Haeseler A (2012) Do we still need supertrees? BMC Biology 10: 13. doi: /preaccept
5. Hill J, Davis K (2014) The Supertree Toolkit 2: a new and improved software package with a Graphical User Interface for supertree construction. Biodiversity Data Journal 2: e1053. doi: /BDJ.2.e1053


### Using a Genetic Algorithm to Solve Crossword Puzzles. Kyle Williams

Using a Genetic Algorithm to Solve Crossword Puzzles Kyle Williams April 8, 2009 Abstract In this report, I demonstrate an approach to solving crossword puzzles by using a genetic algorithm. Various values

### Using Neural Networks to Create an Adaptive Character Recognition System

Using Neural Networks to Create an Adaptive Character Recognition System Alexander J. Faaborg Cornell University, Ithaca NY (May 14, 2002) Abstract A back-propagation neural network with one hidden layer

### Contents. Principles of Taxonomy. Préface to the First Edition. Préface to the Second Edition. Acknowledgments for the First Edition

Contents Préface to the First Edition Préface to the Second Edition Acknowledgments for the First Edition Acknowledgments for the Second Edition xv xvii xix xxi PART ONE Principles of Taxonomy SECTION

### Chapter 8 QUARTET SUPERTREES. 1. Introduction. Raul Piaggio-Talice, J. Gordon Burleigh, and Oliver Eulenstein

Chapter 8 QUARTET SUPERTREES Raul Piaggio-Talice, J. Gordon Burleigh, and Oliver Eulenstein Abstract: Keywords: We introduce two supertree methods that produce unrooted supertrees from unrooted input trees.

### Identifying Market Price Levels using Differential Evolution

Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand mmayo@waikato.ac.nz WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary

### IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1

IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 6, NO. 2, APRIL-JUNE 2009 1 The Gene-Duplication Problem: Near-Linear Time Algorithms for NNI-Based Local Searches Mukul S. Bansal,

### 6.080 / 6.089 Great Ideas in Theoretical Computer Science Spring 2008

MIT OpenCourseWare http://ocw.mit.edu 6.080 / 6.089 Great Ideas in Theoretical Computer Science Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

### Lecture 7: Genetic Algorithms

Lecture 7: Genetic Algorithms Cognitive Systems - Machine Learning Part II: Special Aspects of Concept Learning Genetic Algorithms, Genetic Programming, Models of Evolution last change November 26, 2014