Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Size: px
Start display at page:

Download "Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment"

Transcription

1 Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment

2 A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need to know... Substitution matrices Phylogenetic trees Multiple sequence alignment We ll start with substitution matrices.

3 Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of mutation (or log-odds ratio) Derived from multiple sequence alignments Two commonly used matrices: PAM and BLOSUM PAM = percent accepted mutations (Dayhoff) BLOSUM = Blocks substitution matrix (Henikoff)

4 PAM M Dayhoff, 1978 Evolutionary time is measured in Percent Accepted Mutations, or PAMs One PAM of evolution means 1% of the residues/bases have changed, averaged over all 20 amino acids. To get the relative frequency of each type of mutation, we count the times it was observed in a database of multiple sequence alignments. Based on global alignments Assumes a Markov model for evolution. Margaret Oakley Dayhoff

5 BLOSUM Henikoff & Henikoff, 1992 Based on database of ungapped local alignments (BLOCKS) Alignments have lower similarity than PAM alignments. BLOSUM number indicates the percent identity level of sequences in the alignment. For example, for BLOSUM62 sequences with approximately 62% identity were counted. Some BLOCKS represent functional units, providing validation of the alignment. Steven Henikoff

6 Multiple Sequence Alignment A multiple sequence alignment is made using many pairwise sequence alignments

7 Columns in a MSA have a common evolutionary history a phylogenetic tree for one position in the alignment? N S A By aligning the sequences, we are asserting that the aligned residues in each column had a common ancestor.

8 A tree shows the evolutionary history of a single position Ancestral characters can be inferred by parsimony analysis. W N W W worm clam bird goat fish 8

9 Counting mutations without knowing ancestral sequences Naíve way: Assume any of the characters could be the ancestral one. Assume equal distance to the ancestor from each taxon. L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If was the ancestor, then it mutated to a W twice, to N once, and stayed three times.

10 We could have picked W as the ancestor... L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If W was the ancestor, then it mutated to a four times, to N once, and stayed W once. * *FYI: This is how you draw a phylogenetic tree when the branch order is not known.

11 Subsitution matrices are symmetrical Since we don't know which sequence came first, we don't know whether w or...is correct. So we count this as one mutation of each type. P(-->W) and P(W-->) are the same number. (That's why we only show the upper triangle) w

12 Summing the substitution counts We assume the ancestor is one of the observed amino acids, but we don't know which, so we try them all. W W N N N W one column of a MSA W symmetrical matrix

13 Next possible ancestor, again. We already counted this, so ignore it. W W N N N W W

14 W W N N N W 2 1 W 1

15 W W N N N W 2 1 W 0

16 N W W W N N W

17 Next... again W W N Counting as the ancestor many times as it appears recognizes the increased likelihood that (the most frequent aa at this position) is the true ancestor. N W N W 1 0 0

18 W W N (no counts for last seq.) N W N W 0 0 0

19 o to next column. Continue summing. W W N P P I N P P A Continue doing this for every column in every multiple sequence alignment... N W N W TOTAL=21 1

20 Probability ratios are expressed as log odds Substitutions (and many other things in bioinformatics) are expressed as a "likelihood ratio", or "odds ratio" of the observed data over the expected value. Likelihood and odds are synomyms for Probability. So Log Odds is the log (usually base 2) of the odds ratio. log odds ratio = log 2 (observed/expected )

21 How do you calculate log-odds? P() = 4/7 = 0.57 Observed probability of -> q = P(->)=6/21 = 0.29 Expected probability of ->, e = 0.57*0.57 = 0.33 odds ratio = q /e = 0.29/0.33 log odds ratio = log 2 (q /e ) If the lod is < 0., then the mutation is less likely than expected by chance. If it is > 0., it is more likely.

22 Same amino acids, different distribution, different outcome. P()=0.50 e = 0.25 q = 9/42 =0.21 lod = log 2 (0.21/0.25) = 0.2 A W W A N A A s spread over many columns P()=0.50 e = 0.25 q = 21/42 =0.5 lod = log 2 (0.50/0.25) = 1 W A W A W A A s concentrated

23 Different observations, same expectation P()=0.50, P(W)=0.14 e W = 0.07 q W = 7/42 =0.17 lod = log 2 (0.17/0.07) = 1.3 A W A W N A A and W seen together more often than expected. P()=0.50, P(W)=0.14 e W = 0.07 q = 3/42 =0.07 lod = log 2 (0.07/0.07) = 0 W A W A W A A s and W s not seen together.

24 In class exercise: et the substitution value for P->Q PQPP QQQP QQPP QPPP QQQP P Q P Q P(P)=, P(Q)= e PQ = q PQ = / = lod = log 2 (q PQ /e PQ ) = sequence alignment database. substitution counts expected (e), versus observed (q) for P->Q

25 PAM assumes Markovian evolution A Markov process is one where the likelihood of the next "state" depends only on the current state. The inference that evolution is Markovian assumes that base changes (or amino acid changes) occur at a constant rate and depend only on the identity of the current base (or amino acid). transition likelihood / MY current aa -> ->A ->V V->V V-> A V V millions of years (MY)

26 Markovian evolution is an extrapolation Start with one sequence. One position. Say ly. Wait 1 million years. What amino acids are now found at that position? PAM1 = Wait another million years. PAM1 = But,... PAM1 is just PAM1 PAM1

27 250 million years? PAM1 PAM1 PAM1 = 250 PAM250 = PAM1 The number after PAM denotes the power to which PAM1 was taken.

28 NOTE OF CLARIFICATION: PAM does not stand for Plus A Million years (or anything like that). It stands for Percent Accepted Mutations. One PAM1 unit does not correspond to 1 million years of evolution. There is no timescale associated with PAM. PAM1 corresponds to 1% mutations. (or 99% identity). The timescale depends on the species. 28

29 Differences between PAM and BLOSUM PAM PAM matrices are based on global alignments of closely related proteins. The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1 using an assumed Markov chain. BLOSUM BLOSUM matrices are based on local alignments. BLOSUM 62 is a matrix calculated from comparisons of sequences with approx 62% identity. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins. BLOSUM 62 is the default matrix in BLAST (the database search program). It is tailored for comparisons of moderately distant proteins. Alignment of distant relatives may be more accurate with a different matrix.

30 PAM250

31 BLOSUM62

32 In class exercise: Which substitution matrix favors... PAM250 BLOSUM62 conservation of polar residues conservation of non-polar residues conservation of C, Y, or W polar-to-nonpolar mutations polar-to-polar mutations

33 Protein versus DNA alignments Are protein alignment better? Protein alphabet = 20, DNA alphabet = 4. Protein alignment is more informative Less chance of homoplasy with proteins. Homology detectable at greater edit distance Protein alignment more informative Better old Standard alignments are available for proteins. Better statistics from.s. alignments. On the other hand, DNA alignments are more sensitive to short evolutionary distances. 33

34 DNA evolutionary models: P-distance What is the relationship between time and the %identity? p = D L evolutionary time 0 p 1 p is a good measure of time only when p is small. 34

35 DNA evolutionary models: Poisson correction Corrects for multiple mutations at the same site. Unobserved mutations. p = D L 1-p = e -2rt d P = 2rt dp = -ln(1-p) The fraction unchanged decays according to the Poisson function. In the time t since the common ancestor, 2rt mutations have occurred, where r is the mutation rate (r = genetic drift * selection pressure) evolutionary time 0 p 1 Poisson correction assumes p goes to 1 at t=. Where should it really go? 35

36 DNA evolutionary models: Jukes-Cantor What is the relationship between true evolutionary distance and p-distance? Prob(mutation in one unit of time) = α Prob(no mutation) = 1-3α A C α << 1. T A 1-3α α α α At time t, fraction identical is q(t). Fraction non-identical is p(t). p(t) + q(t) = 1 In time t+1, each of q(t) positions stays same with prob = 1-3α. C T α α α 1-3α α α α 1-3α α α α 1-3α Prob that both sequences do not mutate = (1-3α) 2 =(1-6α+9α 2 ) (1-6α). ( Since α<<1, we can safely neglect α 2. ) Prob that a mismatch mutates back to an identity = 2αp(t) q(t+1) = q(t)(1-6α) + 2α(1-q(t)) d q(t)/dt q(t+1) - q(t) = 2α - 8α q(t) Integrating: q(t) = (1/4)(1 + 3exp(-8αt)) Solving for d JC = 6αt = -(3/4)ln(1 - (4/3)p), where p is the p-distance. 36

37 DNA evolutionary models evolutionary time 0 p 1 p-distance Poisson correction Jukes-Cantor In Jukes-Cantor, p limits to p=0.75 at infinite evolutionary distance. 37

38 Transitions/transversions A C In DNA replication, errors can be transitions (purine for purine, pyrimidine for pyrimidine) or transversions (purine for pyrimidine & vice versa) R = transitions/transversions. R would be 1/2 if all mutations were equally likely. In DNA alignments, R is observed to be about 4. Transitions are greatly favored over transversions. T 38

39 Jukes-Cantor with correction for transitions/ transversions (Kimura 2-parameter model, dk2p) A C A C T 1-2β-α β β α β 1-2β-α α β β α β β α 1-2β-α β 1-2β-α Split changes (D) into the two types, transition (P) and transversion (Q) p-distance = D/L = P + Q P = transitions/l, Q=transversions/L The the corrected evolutionary distance is... d K2P =-(1/2)ln(1-2P-Q)-(1/4)ln(1-2Q) 39

40 Further corrections are possible, but... Do we need a nucleotide substitution matrix? A C T A C Additional corrections possible for: Sequence position (gamma) Isochores (C-rich, AT-rich regions) Methylated, not methylated.

41 Review questions What are we asserting about ancestry by aligning the sequences? What kind of data is used to generate a substitution matrix? What makes the PAM matrix Markovian? When should you align amino acid sequence, when DNA? When is the P-distance an accurate measure of time? Why is time versus p-distance Poisson? Why are transitions more common the transversions? What does Kimura do that Jukes-Cantor does not? 41

Amino Acids and Their Properties

Amino Acids and Their Properties Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

Introduction to Phylogenetic Analysis

Introduction to Phylogenetic Analysis Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.

More information

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues

More information

Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

More information

How to Build a Phylogenetic Tree

How to Build a Phylogenetic Tree How to Build a Phylogenetic Tree Phylogenetics tree is a structure in which species are arranged on branches that link them according to their relationship and/or evolutionary descent. A typical rooted

More information

Pairwise sequence alignments

Pairwise sequence alignments Pairwise sequence alignments Volker Flegel Vassilios Ioannidis VI - 2004 Page 1 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

More information

FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES

FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES Robert Hirt Department of Zoology, The Natural History Museum, London Agenda Remind you that molecular phylogenetics is complex the more you know about the

More information

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

Clone Manager. Getting Started

Clone Manager. Getting Started Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software

More information

Overview. bayesian coalescent analysis (of viruses) The coalescent. Coalescent inference

Overview. bayesian coalescent analysis (of viruses) The coalescent. Coalescent inference Overview bayesian coalescent analysis (of viruses) Introduction to the Coalescent Phylodynamics and the Bayesian skyline plot Molecular population genetics for viruses Testing model assumptions Alexei

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms  Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

More information

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

BLAST. Anders Gorm Pedersen & Rasmus Wernersson BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise

More information

Algorithms for Sequence Alignment. Dynamic programming

Algorithms for Sequence Alignment. Dynamic programming Algorithms for Sequence Alignment Previous lectures Global alignment (Needleman-Wunsch algorithm) Local alignment (Smith-Waterman algorithm) Heuristic method BLAST Statistics of BLAST scores Dynamic programming

More information

Sequence Database Searching (Basic Tools and Advanced Methods)

Sequence Database Searching (Basic Tools and Advanced Methods) I519 Introduction to Bioinformatics Sequence Database Searching (Basic Tools and Advanced Methods) Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Basics of DB search BLAST Table of

More information

Bioinformatics Resources at a Glance

Bioinformatics Resources at a Glance Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences

More information

Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene

Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes

More information

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

BIOL 3306 Evolutionary Biology

BIOL 3306 Evolutionary Biology BIOL 3306 Evolutionary Biology Maximum score: 22.5 Time Allowed: 90 min Total Problems: 17 1. The figure below shows two different hypotheses for the phylogenetic relationships among several major groups

More information

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

More information

Phylogenetic Trees Made Easy

Phylogenetic Trees Made Easy Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

More information

The Central Dogma of Molecular Biology

The Central Dogma of Molecular Biology Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines

More information

Bayesian Phylogeny and Measures of Branch Support

Bayesian Phylogeny and Measures of Branch Support Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

Constructing Phylogenetic Trees Using Maximum Likelihood

Constructing Phylogenetic Trees Using Maximum Likelihood Claremont Colleges Scholarship @ Claremont Scripps Senior Theses Scripps Student Scholarship 2012 Constructing Phylogenetic Trees Using Maximum Likelihood Anna Cho Scripps College Recommended Citation

More information

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML 9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

More information

Database searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999

Database searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999 Dr Clare Sansom works part time at Birkbeck College, London, and part time as a freelance computer consultant and science writer At Birkbeck she coordinates an innovative graduate-level Advanced Certificate

More information

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

A guided tutorial and Jalview clinic

A guided tutorial and Jalview clinic A guided tutorial and Jalview clinic Jim Procter Barton Group, College of Life Sciences University of Dundee j.procter@dundee.ac.uk FASTA GFF Bioinformatics data is not fun to read.. PDB Newick CSV Alignment

More information

Student Guide for Mesquite

Student Guide for Mesquite MESQUITE Student User Guide 1 Student Guide for Mesquite This guide describes how to 1. create a project file, 2. construct phylogenetic trees, and 3. map trait evolution on branches (e.g. morphological

More information

BLAST: Basic Local Alignment Search Tool. Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology

BLAST: Basic Local Alignment Search Tool. Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology BLAST: Basic Local Alignment Search Tool Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology 1 Topics to be covered: " BLAST as a Sequence Alignment Tool " Uses of BLAST " Types of

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

Protein Structure II. (end I3+I4-I6)

Protein Structure II. (end I3+I4-I6) Protein Structure II (end I3+I4-I6) Amino Acid Substitution Let us define amino acid substitution : A viable mutation that changes a protein so that the amino acid that was at some location becomes another

More information

Worksheet - COMPARATIVE MAPPING 1

Worksheet - COMPARATIVE MAPPING 1 Worksheet - COMPARATIVE MAPPING 1 The arrangement of genes and other DNA markers is compared between species in Comparative genome mapping. As early as 1915, the geneticist J.B.S Haldane reported that

More information

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004 Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence

More information

BIOINFORMATICS TUTORIAL

BIOINFORMATICS TUTORIAL Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.

More information

of interrelatedness and decent with modification. However, technological advances are opening another powerful avenue of comparison: DNA sequences.

of interrelatedness and decent with modification. However, technological advances are opening another powerful avenue of comparison: DNA sequences. SCI115SC Module 3 AVP Transcript Title: Genetic Similarity Title Slide Narrator: Welcome to this presentation on genetic similarity. Slide 2 Title: Underlying the Outwardly Visible Similarities, We Find

More information

Schools of Systematics. Schools of Systematics. What is Evolutionary Systematics? What is Evolutionary Systematics? What is Evolutionary Systematics?

Schools of Systematics. Schools of Systematics. What is Evolutionary Systematics? What is Evolutionary Systematics? What is Evolutionary Systematics? Topic 3: Systematics II What are the schools of thought of systematics? How does one make a cladogram? What are higher taxa & ranks? How should they be used in this course? Applying phylogenies Evolution

More information

Table of Contents. Chapter 1 Read Me First! 1. Chapter 2 Tutorial: Estimate a Tree 11

Table of Contents. Chapter 1 Read Me First! 1. Chapter 2 Tutorial: Estimate a Tree 11 Table of Contents Chapter 1 Read Me First! 1 New and Improved Software 2 Just What Is a Phylogenetic Tree? 3 Estimating Phylogenetic Trees: The Basics 4 Beyond the Basics 5 Learn More about the Principles

More information

Gene Trees in Species Trees

Gene Trees in Species Trees Computational Biology Class Presentation Gene Trees in Species Trees Wayne P. Maddison Published in Systematic Biology, Vol. 46, No.3 (Sept 1997), pp. 523-536 Overview Gene Trees and Species Trees Processes

More information

PHYLOGENETIC ANALYSIS

PHYLOGENETIC ANALYSIS Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);

More information

How Populations Evolve

How Populations Evolve How Populations Evolve Darwin and the Origin of the Species Charles Darwin published On the Origin of Species by Means of Natural Selection, November 24, 1859. Darwin presented two main concepts: Life

More information

Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods

Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods J Mol Evol (1997) 44(Suppl 1):S139 S146 Springer-Verlag New York Inc. 1997 Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods Jianzhi Zhang, Masatoshi

More information

A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution

A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution Nick Goldman and Simon Whelan Department of Zoology, University of Cambridge, U.K. Current mathematical models of amino acid sequence

More information

Using MATLAB: Bioinformatics Toolbox for Life Sciences

Using MATLAB: Bioinformatics Toolbox for Life Sciences Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY

More information

Bioinformatics and handwriting/speech recognition: unconventional applications of similarity search tools

Bioinformatics and handwriting/speech recognition: unconventional applications of similarity search tools Bioinformatics and handwriting/speech recognition: unconventional applications of similarity search tools Kyle Jensen and Gregory Stephanopoulos Department of Chemical Engineering, Massachusetts Institute

More information

A branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among

A branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites Tal Pupko 1,*, Itsik Pe er 2, Masami Hasegawa 1, Dan Graur 3, and Nir Friedman

More information

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

More information

Gene Mutation. (CHAPTER 16- Brooker Text) October 11, 2007 BIO 184 Dr. Tom Peavy

Gene Mutation. (CHAPTER 16- Brooker Text) October 11, 2007 BIO 184 Dr. Tom Peavy Gene Mutation (CHAPTER 16- Brooker Text) October 11, 2007 BIO 184 Dr. Tom Peavy The term mutation refers to a heritable change in the genetic material Mutations provide allelic variations On the positive

More information

MAKING AN EVOLUTIONARY TREE

MAKING AN EVOLUTIONARY TREE Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities

More information

Bayesian coalescent inference of population size history

Bayesian coalescent inference of population size history Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39 BEAST tutorials Population

More information

Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT

Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT Simon Tavarg Statistics Department Colorado State University Fort Collins, CO 80523 ABSTRACT Some models for the estimation of substitution rates from pairs of functionally homologous DNA sequences are

More information

CSE8393 Introduction to Bioinformatics Lecture 3: More problems, Global Alignment. DNA sequencing

CSE8393 Introduction to Bioinformatics Lecture 3: More problems, Global Alignment. DNA sequencing SE8393 Introduction to Bioinformatics Lecture 3: More problems, Global lignment DN sequencing Recall that in biological experiments only relatively short segments of the DN can be investigated. To investigate

More information

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/

CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/ CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction

More information

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of

More information

Core Bioinformatics. Degree Type Year Semester

Core Bioinformatics. Degree Type Year Semester Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

TEACHER S GUIDE. Case Study. Evolution of colour vision in primates. Jarosław Bryk. Max Planck Institute for Evolutionary Biology, Plön. Version 1.

TEACHER S GUIDE. Case Study. Evolution of colour vision in primates. Jarosław Bryk. Max Planck Institute for Evolutionary Biology, Plön. Version 1. TEACHER S GUIDE Case Study Evolution of colour vision in primates Jarosław Bryk Max Planck Institute for Evolutionary Biology, Plön Version 1.1 Evolution of colour vision in primates In this activity students

More information

LAB 21 Using Bioinformatics to Investigate Evolutionary Relationships; Have a BLAST!

LAB 21 Using Bioinformatics to Investigate Evolutionary Relationships; Have a BLAST! LAB 21 Using Bioinformatics to Investigate Evolutionary Relationships; Have a BLAST! Introduction: Between 1990-2003, scientists working on an international research project known as the Human Genome Project,

More information

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:

More information

Computational searches of biological sequences

Computational searches of biological sequences UNAM, México, Enero 78 Computational searches of biological sequences Special thanks to all the scientis that made public available their presentations throughout the web from where many slides were taken

More information

morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo

morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo ftp://ftp.pasteur.fr/pub/gensoft/projects/morephyml/ http://mobyle.pasteur.fr/cgi-bin/portal.py Please cite this paper if you use this

More information

DNA Sequence Classification in the Presence of Sequencing Errors Project of CSE847

DNA Sequence Classification in the Presence of Sequencing Errors Project of CSE847 DNA Sequence Classification in the Presence of Sequencing Errors Project of CSE847 Yuan Zhang zhangy72@msu.edu Cheng Yuan chengy@msu.edu Computer Science and Engineering Department Michigan State University

More information

NSilico Life Science Introductory Bioinformatics Course

NSilico Life Science Introductory Bioinformatics Course NSilico Life Science Introductory Bioinformatics Course INTRODUCTORY BIOINFORMATICS COURSE A public course delivered over three days on the fundamentals of bioinformatics and illustrated with lectures,

More information

Multiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker

Multiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker Multiple Sequence Alignment Hot Topic 5/24/06 Kim Walker Outline Why are Multiple Sequence Alignments useful? What Tools are Available? Brief Introduction to ClustalX Tools to Edit and Add Features to

More information

Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics

Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics Simon Whelan and Nick Goldman Department of Genetics, University of Cambridge Asymptotic statistical

More information

Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,

More information

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:

More information

Y-DNA FACT SHEET. Bruce A. Crawford

Y-DNA FACT SHEET. Bruce A. Crawford Y-DNA FACT SHEET By Bruce A. Crawford For those not familiar with DNA analysis and particularly Y-DNA, the following explanation may help. DNA is the basic building block of cell information and heredity.

More information

What next? Computational Biology and Bioinformatics. Finding homologs 2. Finding homologs. 4. Searching for homologs with BLAST

What next? Computational Biology and Bioinformatics. Finding homologs 2. Finding homologs. 4. Searching for homologs with BLAST Computational Biology and Bioinformatics 4. Searching for homologs with BLAST What next? Comparing sequences and searching for homologs Sequence alignment and substitution matrices Searching for sequences

More information

DNA & Protein Sequence Comparison

DNA & Protein Sequence Comparison equence Comparison & rotein equence Comparison harm 207 / Bio 207 ecture 2 utbuddin octor equence is often known early in analysis rotein sequence confers more information. lignment between sequences arious

More information

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006 Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

More information

Hierarchical Bayesian Modeling of the HIV Response to Therapy

Hierarchical Bayesian Modeling of the HIV Response to Therapy Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple Sequence Alignment Definition Given N sequences x 1, x 2,, x N : Insert gaps (-) in each sequence x i, such that All sequences have the same length L Score of the global map is maximum Applications

More information

Optimization of Gene Prediction via More Accurate Phylogenetic Substitution Models

Optimization of Gene Prediction via More Accurate Phylogenetic Substitution Models Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2011-78 2011 Optimization of

More information

Mathematics and Statistics: Apply probability methods in solving problems (91267)

Mathematics and Statistics: Apply probability methods in solving problems (91267) NCEA Level 2 Mathematics (91267) 2013 page 1 of 5 Assessment Schedule 2013 Mathematics and Statistics: Apply probability methods in solving problems (91267) Evidence Statement with Merit Apply probability

More information

Protein Sequencing and Identification With Mass Spectrometry

Protein Sequencing and Identification With Mass Spectrometry Protein Sequencing and Identification With Mass Spectrometry Tandem Mass Spectrometry De Novo Peptide Sequencing Spectrum Graph Protein Identification via Database Search Identifying Post Translationally

More information

Guide for Bioinformatics Project Module 3

Guide for Bioinformatics Project Module 3 Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first

More information

Current Motif Discovery Tools and their Limitations

Current Motif Discovery Tools and their Limitations Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.

More information

12/22/2014. Read the introduction. How does a cell make proteins with the information from DNA? Protein Synthesis: Transcription and Translation

12/22/2014. Read the introduction. How does a cell make proteins with the information from DNA? Protein Synthesis: Transcription and Translation EQ How does a cell make proteins with the information from DNA? Protein Synthesis: Get Started Get Started Think of a corn cell that is genetically modified to contain the Bt gene and a corn cell that

More information

Genome Explorer For Comparative Genome Analysis

Genome Explorer For Comparative Genome Analysis Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence

More information

Sequence Alignment. Part 0: Files. Part 1: Local Alignment

Sequence Alignment. Part 0: Files. Part 1: Local Alignment Sequence Alignment Note: in the experiments below, you will not necessarily be given an exact recipe for how to proceed. You are encouraged to explore and find out what works and what doesn't. You're also

More information

MCMC A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C. (Saitou and Nei, 1987) (Swofford and Begle, 1993)

MCMC A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C. (Saitou and Nei, 1987) (Swofford and Begle, 1993) MCMC 1 1 1 DNA 1 2 3 4 5 6 7 A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C ( 2) (Saitou and Nei, 1987) (Swofford and Begle, 1993) 1 A B C D (Vos, 2003) (Zwickl, 2006) (Morrison, 2007)

More information

Experiments beginning in the middle 1960s found that many enzymes in

Experiments beginning in the middle 1960s found that many enzymes in Inferring Selection and Mutation from DNA Sequences: The McDonald-Kreitman Test Revisited Stanley A. Sawyer Abstract Aligned DNA sequences from the same species, and from two species that share a sufficiently

More information

Principles of Evolution - Origin of Species

Principles of Evolution - Origin of Species Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X

More information

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference Stephane Guindon, F. Le Thiec, Patrice Duroux, Olivier Gascuel To cite this version: Stephane Guindon, F. Le Thiec, Patrice

More information

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015.

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015. Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups

More information

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011 Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear

More information

Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1

Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood

More information

C1. A gene pool is all of the genes present in a particular population. Each type of gene within a gene pool may exist in one or more alleles.

C1. A gene pool is all of the genes present in a particular population. Each type of gene within a gene pool may exist in one or more alleles. C1. A gene pool is all of the genes present in a particular population. Each type of gene within a gene pool may exist in one or more alleles. The prevalence of an allele within the gene pool is described

More information

Enhanced Domain Recognition Using Phylogeny

Enhanced Domain Recognition Using Phylogeny Chapter 3 Enhanced Domain Recognition Using Phylogeny There have been several suggestions in the literature for combining sequence based hidden Markov models (HMMs) with models of evolution [Yan95, MD95,

More information