# Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment

2 A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need to know... Substitution matrices Phylogenetic trees Multiple sequence alignment We ll start with substitution matrices.

3 Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of mutation (or log-odds ratio) Derived from multiple sequence alignments Two commonly used matrices: PAM and BLOSUM PAM = percent accepted mutations (Dayhoff) BLOSUM = Blocks substitution matrix (Henikoff)

4 PAM M Dayhoff, 1978 Evolutionary time is measured in Percent Accepted Mutations, or PAMs One PAM of evolution means 1% of the residues/bases have changed, averaged over all 20 amino acids. To get the relative frequency of each type of mutation, we count the times it was observed in a database of multiple sequence alignments. Based on global alignments Assumes a Markov model for evolution. Margaret Oakley Dayhoff

5 BLOSUM Henikoff & Henikoff, 1992 Based on database of ungapped local alignments (BLOCKS) Alignments have lower similarity than PAM alignments. BLOSUM number indicates the percent identity level of sequences in the alignment. For example, for BLOSUM62 sequences with approximately 62% identity were counted. Some BLOCKS represent functional units, providing validation of the alignment. Steven Henikoff

6 Multiple Sequence Alignment A multiple sequence alignment is made using many pairwise sequence alignments

7 Columns in a MSA have a common evolutionary history a phylogenetic tree for one position in the alignment? N S A By aligning the sequences, we are asserting that the aligned residues in each column had a common ancestor.

8 A tree shows the evolutionary history of a single position Ancestral characters can be inferred by parsimony analysis. W N W W worm clam bird goat fish 8

9 Counting mutations without knowing ancestral sequences Naíve way: Assume any of the characters could be the ancestral one. Assume equal distance to the ancestor from each taxon. L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If was the ancestor, then it mutated to a W twice, to N once, and stayed three times.

10 We could have picked W as the ancestor... L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If W was the ancestor, then it mutated to a four times, to N once, and stayed W once. * *FYI: This is how you draw a phylogenetic tree when the branch order is not known.

11 Subsitution matrices are symmetrical Since we don't know which sequence came first, we don't know whether w or...is correct. So we count this as one mutation of each type. P(-->W) and P(W-->) are the same number. (That's why we only show the upper triangle) w

12 Summing the substitution counts We assume the ancestor is one of the observed amino acids, but we don't know which, so we try them all. W W N N N W one column of a MSA W symmetrical matrix

13 Next possible ancestor, again. We already counted this, so ignore it. W W N N N W W

14 W W N N N W 2 1 W 1

15 W W N N N W 2 1 W 0

16 N W W W N N W

17 Next... again W W N Counting as the ancestor many times as it appears recognizes the increased likelihood that (the most frequent aa at this position) is the true ancestor. N W N W 1 0 0

18 W W N (no counts for last seq.) N W N W 0 0 0

19 o to next column. Continue summing. W W N P P I N P P A Continue doing this for every column in every multiple sequence alignment... N W N W TOTAL=21 1

20 Probability ratios are expressed as log odds Substitutions (and many other things in bioinformatics) are expressed as a "likelihood ratio", or "odds ratio" of the observed data over the expected value. Likelihood and odds are synomyms for Probability. So Log Odds is the log (usually base 2) of the odds ratio. log odds ratio = log 2 (observed/expected )

21 How do you calculate log-odds? P() = 4/7 = 0.57 Observed probability of -> q = P(->)=6/21 = 0.29 Expected probability of ->, e = 0.57*0.57 = 0.33 odds ratio = q /e = 0.29/0.33 log odds ratio = log 2 (q /e ) If the lod is < 0., then the mutation is less likely than expected by chance. If it is > 0., it is more likely.

22 Same amino acids, different distribution, different outcome. P()=0.50 e = 0.25 q = 9/42 =0.21 lod = log 2 (0.21/0.25) = 0.2 A W W A N A A s spread over many columns P()=0.50 e = 0.25 q = 21/42 =0.5 lod = log 2 (0.50/0.25) = 1 W A W A W A A s concentrated

23 Different observations, same expectation P()=0.50, P(W)=0.14 e W = 0.07 q W = 7/42 =0.17 lod = log 2 (0.17/0.07) = 1.3 A W A W N A A and W seen together more often than expected. P()=0.50, P(W)=0.14 e W = 0.07 q = 3/42 =0.07 lod = log 2 (0.07/0.07) = 0 W A W A W A A s and W s not seen together.

24 In class exercise: et the substitution value for P->Q PQPP QQQP QQPP QPPP QQQP P Q P Q P(P)=, P(Q)= e PQ = q PQ = / = lod = log 2 (q PQ /e PQ ) = sequence alignment database. substitution counts expected (e), versus observed (q) for P->Q

25 PAM assumes Markovian evolution A Markov process is one where the likelihood of the next "state" depends only on the current state. The inference that evolution is Markovian assumes that base changes (or amino acid changes) occur at a constant rate and depend only on the identity of the current base (or amino acid). transition likelihood / MY current aa -> ->A ->V V->V V-> A V V millions of years (MY)

26 Markovian evolution is an extrapolation Start with one sequence. One position. Say ly. Wait 1 million years. What amino acids are now found at that position? PAM1 = Wait another million years. PAM1 = But,... PAM1 is just PAM1 PAM1

27 250 million years? PAM1 PAM1 PAM1 = 250 PAM250 = PAM1 The number after PAM denotes the power to which PAM1 was taken.

28 NOTE OF CLARIFICATION: PAM does not stand for Plus A Million years (or anything like that). It stands for Percent Accepted Mutations. One PAM1 unit does not correspond to 1 million years of evolution. There is no timescale associated with PAM. PAM1 corresponds to 1% mutations. (or 99% identity). The timescale depends on the species. 28

29 Differences between PAM and BLOSUM PAM PAM matrices are based on global alignments of closely related proteins. The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1 using an assumed Markov chain. BLOSUM BLOSUM matrices are based on local alignments. BLOSUM 62 is a matrix calculated from comparisons of sequences with approx 62% identity. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins. BLOSUM 62 is the default matrix in BLAST (the database search program). It is tailored for comparisons of moderately distant proteins. Alignment of distant relatives may be more accurate with a different matrix.

30 PAM250

31 BLOSUM62

32 In class exercise: Which substitution matrix favors... PAM250 BLOSUM62 conservation of polar residues conservation of non-polar residues conservation of C, Y, or W polar-to-nonpolar mutations polar-to-polar mutations

33 Protein versus DNA alignments Are protein alignment better? Protein alphabet = 20, DNA alphabet = 4. Protein alignment is more informative Less chance of homoplasy with proteins. Homology detectable at greater edit distance Protein alignment more informative Better old Standard alignments are available for proteins. Better statistics from.s. alignments. On the other hand, DNA alignments are more sensitive to short evolutionary distances. 33

34 DNA evolutionary models: P-distance What is the relationship between time and the %identity? p = D L evolutionary time 0 p 1 p is a good measure of time only when p is small. 34

35 DNA evolutionary models: Poisson correction Corrects for multiple mutations at the same site. Unobserved mutations. p = D L 1-p = e -2rt d P = 2rt dp = -ln(1-p) The fraction unchanged decays according to the Poisson function. In the time t since the common ancestor, 2rt mutations have occurred, where r is the mutation rate (r = genetic drift * selection pressure) evolutionary time 0 p 1 Poisson correction assumes p goes to 1 at t=. Where should it really go? 35

36 DNA evolutionary models: Jukes-Cantor What is the relationship between true evolutionary distance and p-distance? Prob(mutation in one unit of time) = α Prob(no mutation) = 1-3α A C α << 1. T A 1-3α α α α At time t, fraction identical is q(t). Fraction non-identical is p(t). p(t) + q(t) = 1 In time t+1, each of q(t) positions stays same with prob = 1-3α. C T α α α 1-3α α α α 1-3α α α α 1-3α Prob that both sequences do not mutate = (1-3α) 2 =(1-6α+9α 2 ) (1-6α). ( Since α<<1, we can safely neglect α 2. ) Prob that a mismatch mutates back to an identity = 2αp(t) q(t+1) = q(t)(1-6α) + 2α(1-q(t)) d q(t)/dt q(t+1) - q(t) = 2α - 8α q(t) Integrating: q(t) = (1/4)(1 + 3exp(-8αt)) Solving for d JC = 6αt = -(3/4)ln(1 - (4/3)p), where p is the p-distance. 36

37 DNA evolutionary models evolutionary time 0 p 1 p-distance Poisson correction Jukes-Cantor In Jukes-Cantor, p limits to p=0.75 at infinite evolutionary distance. 37

38 Transitions/transversions A C In DNA replication, errors can be transitions (purine for purine, pyrimidine for pyrimidine) or transversions (purine for pyrimidine & vice versa) R = transitions/transversions. R would be 1/2 if all mutations were equally likely. In DNA alignments, R is observed to be about 4. Transitions are greatly favored over transversions. T 38

39 Jukes-Cantor with correction for transitions/ transversions (Kimura 2-parameter model, dk2p) A C A C T 1-2β-α β β α β 1-2β-α α β β α β β α 1-2β-α β 1-2β-α Split changes (D) into the two types, transition (P) and transversion (Q) p-distance = D/L = P + Q P = transitions/l, Q=transversions/L The the corrected evolutionary distance is... d K2P =-(1/2)ln(1-2P-Q)-(1/4)ln(1-2Q) 39

40 Further corrections are possible, but... Do we need a nucleotide substitution matrix? A C T A C Additional corrections possible for: Sequence position (gamma) Isochores (C-rich, AT-rich regions) Methylated, not methylated.

41 Review questions What are we asserting about ancestry by aligning the sequences? What kind of data is used to generate a substitution matrix? What makes the PAM matrix Markovian? When should you align amino acid sequence, when DNA? When is the P-distance an accurate measure of time? Why is time versus p-distance Poisson? Why are transitions more common the transversions? What does Kimura do that Jukes-Cantor does not? 41

### Amino Acids and Their Properties

Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

### Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

### Introduction to Phylogenetic Analysis

Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.

### Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues

### Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

### How to Build a Phylogenetic Tree

How to Build a Phylogenetic Tree Phylogenetics tree is a structure in which species are arranged on branches that link them according to their relationship and/or evolutionary descent. A typical rooted

### Pairwise sequence alignments

Pairwise sequence alignments Volker Flegel Vassilios Ioannidis VI - 2004 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs

### Pairwise Sequence Alignment

Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

### Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

### FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES

FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES Robert Hirt Department of Zoology, The Natural History Museum, London Agenda Remind you that molecular phylogenetics is complex the more you know about the

### Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST

Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some

### Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

### Clone Manager. Getting Started

Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software

### Overview. bayesian coalescent analysis (of viruses) The coalescent. Coalescent inference

Overview bayesian coalescent analysis (of viruses) Introduction to the Coalescent Phylodynamics and the Bayesian skyline plot Molecular population genetics for viruses Testing model assumptions Alexei

### Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

### RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

### An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

### Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

### BLAST. Anders Gorm Pedersen & Rasmus Wernersson

BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise

### Algorithms for Sequence Alignment. Dynamic programming

Algorithms for Sequence Alignment Previous lectures Global alignment (Needleman-Wunsch algorithm) Local alignment (Smith-Waterman algorithm) Heuristic method BLAST Statistics of BLAST scores Dynamic programming

### Sequence Database Searching (Basic Tools and Advanced Methods)

I519 Introduction to Bioinformatics Sequence Database Searching (Basic Tools and Advanced Methods) Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Basics of DB search BLAST Table of

### Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

### Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene

Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes

### PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

### Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

### BIOL 3306 Evolutionary Biology

BIOL 3306 Evolutionary Biology Maximum score: 22.5 Time Allowed: 90 min Total Problems: 17 1. The figure below shows two different hypotheses for the phylogenetic relationships among several major groups

### DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

### The Central Dogma of Molecular Biology

Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines

### Bayesian Phylogeny and Measures of Branch Support

Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The

### Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

### Constructing Phylogenetic Trees Using Maximum Likelihood

Claremont Colleges Scholarship @ Claremont Scripps Senior Theses Scripps Student Scholarship 2012 Constructing Phylogenetic Trees Using Maximum Likelihood Anna Cho Scripps College Recommended Citation

### A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

### Database searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999

Dr Clare Sansom works part time at Birkbeck College, London, and part time as a freelance computer consultant and science writer At Birkbeck she coordinates an innovative graduate-level Advanced Certificate

### Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

### THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

### A guided tutorial and Jalview clinic

A guided tutorial and Jalview clinic Jim Procter Barton Group, College of Life Sciences University of Dundee j.procter@dundee.ac.uk FASTA GFF Bioinformatics data is not fun to read.. PDB Newick CSV Alignment

### Student Guide for Mesquite

MESQUITE Student User Guide 1 Student Guide for Mesquite This guide describes how to 1. create a project file, 2. construct phylogenetic trees, and 3. map trait evolution on branches (e.g. morphological

### BLAST: Basic Local Alignment Search Tool. Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology

BLAST: Basic Local Alignment Search Tool Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology 1 Topics to be covered: " BLAST as a Sequence Alignment Tool " Uses of BLAST " Types of

### An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

### Protein Structure II. (end I3+I4-I6)

Protein Structure II (end I3+I4-I6) Amino Acid Substitution Let us define amino acid substitution : A viable mutation that changes a protein so that the amino acid that was at some location becomes another

### Worksheet - COMPARATIVE MAPPING 1

Worksheet - COMPARATIVE MAPPING 1 The arrangement of genes and other DNA markers is compared between species in Comparative genome mapping. As early as 1915, the geneticist J.B.S Haldane reported that

### Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence

### BIOINFORMATICS TUTORIAL

Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.

### of interrelatedness and decent with modification. However, technological advances are opening another powerful avenue of comparison: DNA sequences.

SCI115SC Module 3 AVP Transcript Title: Genetic Similarity Title Slide Narrator: Welcome to this presentation on genetic similarity. Slide 2 Title: Underlying the Outwardly Visible Similarities, We Find

### Schools of Systematics. Schools of Systematics. What is Evolutionary Systematics? What is Evolutionary Systematics? What is Evolutionary Systematics?

Topic 3: Systematics II What are the schools of thought of systematics? How does one make a cladogram? What are higher taxa & ranks? How should they be used in this course? Applying phylogenies Evolution

### Gene Trees in Species Trees

Computational Biology Class Presentation Gene Trees in Species Trees Wayne P. Maddison Published in Systematic Biology, Vol. 46, No.3 (Sept 1997), pp. 523-536 Overview Gene Trees and Species Trees Processes

### PHYLOGENETIC ANALYSIS

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);

### How Populations Evolve

How Populations Evolve Darwin and the Origin of the Species Charles Darwin published On the Origin of Species by Means of Natural Selection, November 24, 1859. Darwin presented two main concepts: Life

### Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods

J Mol Evol (1997) 44(Suppl 1):S139 S146 Springer-Verlag New York Inc. 1997 Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods Jianzhi Zhang, Masatoshi

### A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution

A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution Nick Goldman and Simon Whelan Department of Zoology, University of Cambridge, U.K. Current mathematical models of amino acid sequence

### Using MATLAB: Bioinformatics Toolbox for Life Sciences

Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY

### Bioinformatics and handwriting/speech recognition: unconventional applications of similarity search tools

Bioinformatics and handwriting/speech recognition: unconventional applications of similarity search tools Kyle Jensen and Gregory Stephanopoulos Department of Chemical Engineering, Massachusetts Institute

### A branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among

A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites Tal Pupko 1,*, Itsik Pe er 2, Masami Hasegawa 1, Dan Graur 3, and Nir Friedman

### Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

### Gene Mutation. (CHAPTER 16- Brooker Text) October 11, 2007 BIO 184 Dr. Tom Peavy

Gene Mutation (CHAPTER 16- Brooker Text) October 11, 2007 BIO 184 Dr. Tom Peavy The term mutation refers to a heritable change in the genetic material Mutations provide allelic variations On the positive

### MAKING AN EVOLUTIONARY TREE

Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities

### Bayesian coalescent inference of population size history

Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39 BEAST tutorials Population

### Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT

Simon Tavarg Statistics Department Colorado State University Fort Collins, CO 80523 ABSTRACT Some models for the estimation of substitution rates from pairs of functionally homologous DNA sequences are

### CSE8393 Introduction to Bioinformatics Lecture 3: More problems, Global Alignment. DNA sequencing

SE8393 Introduction to Bioinformatics Lecture 3: More problems, Global lignment DN sequencing Recall that in biological experiments only relatively short segments of the DN can be investigated. To investigate

### CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction

### Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of

### Core Bioinformatics. Degree Type Year Semester

Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

### TEACHER S GUIDE. Case Study. Evolution of colour vision in primates. Jarosław Bryk. Max Planck Institute for Evolutionary Biology, Plön. Version 1.

TEACHER S GUIDE Case Study Evolution of colour vision in primates Jarosław Bryk Max Planck Institute for Evolutionary Biology, Plön Version 1.1 Evolution of colour vision in primates In this activity students

### LAB 21 Using Bioinformatics to Investigate Evolutionary Relationships; Have a BLAST!

LAB 21 Using Bioinformatics to Investigate Evolutionary Relationships; Have a BLAST! Introduction: Between 1990-2003, scientists working on an international research project known as the Human Genome Project,

### Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:

### Computational searches of biological sequences

UNAM, México, Enero 78 Computational searches of biological sequences Special thanks to all the scientis that made public available their presentations throughout the web from where many slides were taken

### morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo

morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo ftp://ftp.pasteur.fr/pub/gensoft/projects/morephyml/ http://mobyle.pasteur.fr/cgi-bin/portal.py Please cite this paper if you use this

### DNA Sequence Classification in the Presence of Sequencing Errors Project of CSE847

DNA Sequence Classification in the Presence of Sequencing Errors Project of CSE847 Yuan Zhang zhangy72@msu.edu Cheng Yuan chengy@msu.edu Computer Science and Engineering Department Michigan State University

### NSilico Life Science Introductory Bioinformatics Course

NSilico Life Science Introductory Bioinformatics Course INTRODUCTORY BIOINFORMATICS COURSE A public course delivered over three days on the fundamentals of bioinformatics and illustrated with lectures,

### Multiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker

Multiple Sequence Alignment Hot Topic 5/24/06 Kim Walker Outline Why are Multiple Sequence Alignments useful? What Tools are Available? Brief Introduction to ClustalX Tools to Edit and Add Features to

### Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics

Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics Simon Whelan and Nick Goldman Department of Genetics, University of Cambridge Asymptotic statistical

### Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,

### BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

### Y-DNA FACT SHEET. Bruce A. Crawford

Y-DNA FACT SHEET By Bruce A. Crawford For those not familiar with DNA analysis and particularly Y-DNA, the following explanation may help. DNA is the basic building block of cell information and heredity.

### What next? Computational Biology and Bioinformatics. Finding homologs 2. Finding homologs. 4. Searching for homologs with BLAST

Computational Biology and Bioinformatics 4. Searching for homologs with BLAST What next? Comparing sequences and searching for homologs Sequence alignment and substitution matrices Searching for sequences

### DNA & Protein Sequence Comparison

equence Comparison & rotein equence Comparison harm 207 / Bio 207 ecture 2 utbuddin octor equence is often known early in analysis rotein sequence confers more information. lignment between sequences arious

### Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

### Hierarchical Bayesian Modeling of the HIV Response to Therapy

Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and

### Multiple Sequence Alignment

Multiple Sequence Alignment Definition Given N sequences x 1, x 2,, x N : Insert gaps (-) in each sequence x i, such that All sequences have the same length L Score of the global map is maximum Applications

### Optimization of Gene Prediction via More Accurate Phylogenetic Substitution Models

Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2011-78 2011 Optimization of

### Mathematics and Statistics: Apply probability methods in solving problems (91267)

NCEA Level 2 Mathematics (91267) 2013 page 1 of 5 Assessment Schedule 2013 Mathematics and Statistics: Apply probability methods in solving problems (91267) Evidence Statement with Merit Apply probability

### Protein Sequencing and Identification With Mass Spectrometry

Protein Sequencing and Identification With Mass Spectrometry Tandem Mass Spectrometry De Novo Peptide Sequencing Spectrum Graph Protein Identification via Database Search Identifying Post Translationally

### Guide for Bioinformatics Project Module 3

Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

### Current Motif Discovery Tools and their Limitations

Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.

### 12/22/2014. Read the introduction. How does a cell make proteins with the information from DNA? Protein Synthesis: Transcription and Translation

EQ How does a cell make proteins with the information from DNA? Protein Synthesis: Get Started Get Started Think of a corn cell that is genetically modified to contain the Bt gene and a corn cell that

### Genome Explorer For Comparative Genome Analysis

Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence

### Sequence Alignment. Part 0: Files. Part 1: Local Alignment

Sequence Alignment Note: in the experiments below, you will not necessarily be given an exact recipe for how to proceed. You are encouraged to explore and find out what works and what doesn't. You're also

### MCMC A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C. (Saitou and Nei, 1987) (Swofford and Begle, 1993)

MCMC 1 1 1 DNA 1 2 3 4 5 6 7 A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C ( 2) (Saitou and Nei, 1987) (Swofford and Begle, 1993) 1 A B C D (Vos, 2003) (Zwickl, 2006) (Morrison, 2007)

### Experiments beginning in the middle 1960s found that many enzymes in

Inferring Selection and Mutation from DNA Sequences: The McDonald-Kreitman Test Revisited Stanley A. Sawyer Abstract Aligned DNA sequences from the same species, and from two species that share a sufficiently

### Principles of Evolution - Origin of Species

Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X

### PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference Stephane Guindon, F. Le Thiec, Patrice Duroux, Olivier Gascuel To cite this version: Stephane Guindon, F. Le Thiec, Patrice

### Probability, statistics and football Franka Miriam Bru ckler Paris, 2015.

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups

### Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

### Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1

Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood