Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment
|
|
- Randolph Cannon
- 8 years ago
- Views:
Transcription
1 Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment
2 A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need to know... Substitution matrices Phylogenetic trees Multiple sequence alignment We ll start with substitution matrices.
3 Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of mutation (or log-odds ratio) Derived from multiple sequence alignments Two commonly used matrices: PAM and BLOSUM PAM = percent accepted mutations (Dayhoff) BLOSUM = Blocks substitution matrix (Henikoff)
4 PAM M Dayhoff, 1978 Evolutionary time is measured in Percent Accepted Mutations, or PAMs One PAM of evolution means 1% of the residues/bases have changed, averaged over all 20 amino acids. To get the relative frequency of each type of mutation, we count the times it was observed in a database of multiple sequence alignments. Based on global alignments Assumes a Markov model for evolution. Margaret Oakley Dayhoff
5 BLOSUM Henikoff & Henikoff, 1992 Based on database of ungapped local alignments (BLOCKS) Alignments have lower similarity than PAM alignments. BLOSUM number indicates the percent identity level of sequences in the alignment. For example, for BLOSUM62 sequences with approximately 62% identity were counted. Some BLOCKS represent functional units, providing validation of the alignment. Steven Henikoff
6 Multiple Sequence Alignment A multiple sequence alignment is made using many pairwise sequence alignments
7 Columns in a MSA have a common evolutionary history a phylogenetic tree for one position in the alignment? N S A By aligning the sequences, we are asserting that the aligned residues in each column had a common ancestor.
8 A tree shows the evolutionary history of a single position Ancestral characters can be inferred by parsimony analysis. W N W W worm clam bird goat fish 8
9 Counting mutations without knowing ancestral sequences Naíve way: Assume any of the characters could be the ancestral one. Assume equal distance to the ancestor from each taxon. L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If was the ancestor, then it mutated to a W twice, to N once, and stayed three times.
10 We could have picked W as the ancestor... L K F L K F L K F L K F L K F L K F L K F W W N R L S K K P R L S K K P R L T K K P R L S K K P R L S R K P R L T R K P R L ~ K K P W W N If W was the ancestor, then it mutated to a four times, to N once, and stayed W once. * *FYI: This is how you draw a phylogenetic tree when the branch order is not known.
11 Subsitution matrices are symmetrical Since we don't know which sequence came first, we don't know whether w or...is correct. So we count this as one mutation of each type. P(-->W) and P(W-->) are the same number. (That's why we only show the upper triangle) w
12 Summing the substitution counts We assume the ancestor is one of the observed amino acids, but we don't know which, so we try them all. W W N N N W one column of a MSA W symmetrical matrix
13 Next possible ancestor, again. We already counted this, so ignore it. W W N N N W W
14 W W N N N W 2 1 W 1
15 W W N N N W 2 1 W 0
16 N W W W N N W
17 Next... again W W N Counting as the ancestor many times as it appears recognizes the increased likelihood that (the most frequent aa at this position) is the true ancestor. N W N W 1 0 0
18 W W N (no counts for last seq.) N W N W 0 0 0
19 o to next column. Continue summing. W W N P P I N P P A Continue doing this for every column in every multiple sequence alignment... N W N W TOTAL=21 1
20 Probability ratios are expressed as log odds Substitutions (and many other things in bioinformatics) are expressed as a "likelihood ratio", or "odds ratio" of the observed data over the expected value. Likelihood and odds are synomyms for Probability. So Log Odds is the log (usually base 2) of the odds ratio. log odds ratio = log 2 (observed/expected )
21 How do you calculate log-odds? P() = 4/7 = 0.57 Observed probability of -> q = P(->)=6/21 = 0.29 Expected probability of ->, e = 0.57*0.57 = 0.33 odds ratio = q /e = 0.29/0.33 log odds ratio = log 2 (q /e ) If the lod is < 0., then the mutation is less likely than expected by chance. If it is > 0., it is more likely.
22 Same amino acids, different distribution, different outcome. P()=0.50 e = 0.25 q = 9/42 =0.21 lod = log 2 (0.21/0.25) = 0.2 A W W A N A A s spread over many columns P()=0.50 e = 0.25 q = 21/42 =0.5 lod = log 2 (0.50/0.25) = 1 W A W A W A A s concentrated
23 Different observations, same expectation P()=0.50, P(W)=0.14 e W = 0.07 q W = 7/42 =0.17 lod = log 2 (0.17/0.07) = 1.3 A W A W N A A and W seen together more often than expected. P()=0.50, P(W)=0.14 e W = 0.07 q = 3/42 =0.07 lod = log 2 (0.07/0.07) = 0 W A W A W A A s and W s not seen together.
24 In class exercise: et the substitution value for P->Q PQPP QQQP QQPP QPPP QQQP P Q P Q P(P)=, P(Q)= e PQ = q PQ = / = lod = log 2 (q PQ /e PQ ) = sequence alignment database. substitution counts expected (e), versus observed (q) for P->Q
25 PAM assumes Markovian evolution A Markov process is one where the likelihood of the next "state" depends only on the current state. The inference that evolution is Markovian assumes that base changes (or amino acid changes) occur at a constant rate and depend only on the identity of the current base (or amino acid). transition likelihood / MY current aa -> ->A ->V V->V V-> A V V millions of years (MY)
26 Markovian evolution is an extrapolation Start with one sequence. One position. Say ly. Wait 1 million years. What amino acids are now found at that position? PAM1 = Wait another million years. PAM1 = But,... PAM1 is just PAM1 PAM1
27 250 million years? PAM1 PAM1 PAM1 = 250 PAM250 = PAM1 The number after PAM denotes the power to which PAM1 was taken.
28 NOTE OF CLARIFICATION: PAM does not stand for Plus A Million years (or anything like that). It stands for Percent Accepted Mutations. One PAM1 unit does not correspond to 1 million years of evolution. There is no timescale associated with PAM. PAM1 corresponds to 1% mutations. (or 99% identity). The timescale depends on the species. 28
29 Differences between PAM and BLOSUM PAM PAM matrices are based on global alignments of closely related proteins. The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1 using an assumed Markov chain. BLOSUM BLOSUM matrices are based on local alignments. BLOSUM 62 is a matrix calculated from comparisons of sequences with approx 62% identity. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins. BLOSUM 62 is the default matrix in BLAST (the database search program). It is tailored for comparisons of moderately distant proteins. Alignment of distant relatives may be more accurate with a different matrix.
30 PAM250
31 BLOSUM62
32 In class exercise: Which substitution matrix favors... PAM250 BLOSUM62 conservation of polar residues conservation of non-polar residues conservation of C, Y, or W polar-to-nonpolar mutations polar-to-polar mutations
33 Protein versus DNA alignments Are protein alignment better? Protein alphabet = 20, DNA alphabet = 4. Protein alignment is more informative Less chance of homoplasy with proteins. Homology detectable at greater edit distance Protein alignment more informative Better old Standard alignments are available for proteins. Better statistics from.s. alignments. On the other hand, DNA alignments are more sensitive to short evolutionary distances. 33
34 DNA evolutionary models: P-distance What is the relationship between time and the %identity? p = D L evolutionary time 0 p 1 p is a good measure of time only when p is small. 34
35 DNA evolutionary models: Poisson correction Corrects for multiple mutations at the same site. Unobserved mutations. p = D L 1-p = e -2rt d P = 2rt dp = -ln(1-p) The fraction unchanged decays according to the Poisson function. In the time t since the common ancestor, 2rt mutations have occurred, where r is the mutation rate (r = genetic drift * selection pressure) evolutionary time 0 p 1 Poisson correction assumes p goes to 1 at t=. Where should it really go? 35
36 DNA evolutionary models: Jukes-Cantor What is the relationship between true evolutionary distance and p-distance? Prob(mutation in one unit of time) = α Prob(no mutation) = 1-3α A C α << 1. T A 1-3α α α α At time t, fraction identical is q(t). Fraction non-identical is p(t). p(t) + q(t) = 1 In time t+1, each of q(t) positions stays same with prob = 1-3α. C T α α α 1-3α α α α 1-3α α α α 1-3α Prob that both sequences do not mutate = (1-3α) 2 =(1-6α+9α 2 ) (1-6α). ( Since α<<1, we can safely neglect α 2. ) Prob that a mismatch mutates back to an identity = 2αp(t) q(t+1) = q(t)(1-6α) + 2α(1-q(t)) d q(t)/dt q(t+1) - q(t) = 2α - 8α q(t) Integrating: q(t) = (1/4)(1 + 3exp(-8αt)) Solving for d JC = 6αt = -(3/4)ln(1 - (4/3)p), where p is the p-distance. 36
37 DNA evolutionary models evolutionary time 0 p 1 p-distance Poisson correction Jukes-Cantor In Jukes-Cantor, p limits to p=0.75 at infinite evolutionary distance. 37
38 Transitions/transversions A C In DNA replication, errors can be transitions (purine for purine, pyrimidine for pyrimidine) or transversions (purine for pyrimidine & vice versa) R = transitions/transversions. R would be 1/2 if all mutations were equally likely. In DNA alignments, R is observed to be about 4. Transitions are greatly favored over transversions. T 38
39 Jukes-Cantor with correction for transitions/ transversions (Kimura 2-parameter model, dk2p) A C A C T 1-2β-α β β α β 1-2β-α α β β α β β α 1-2β-α β 1-2β-α Split changes (D) into the two types, transition (P) and transversion (Q) p-distance = D/L = P + Q P = transitions/l, Q=transversions/L The the corrected evolutionary distance is... d K2P =-(1/2)ln(1-2P-Q)-(1/4)ln(1-2Q) 39
40 Further corrections are possible, but... Do we need a nucleotide substitution matrix? A C T A C Additional corrections possible for: Sequence position (gamma) Isochores (C-rich, AT-rich regions) Methylated, not methylated.
41 Review questions What are we asserting about ancestry by aligning the sequences? What kind of data is used to generate a substitution matrix? What makes the PAM matrix Markovian? When should you align amino acid sequence, when DNA? When is the P-distance an accurate measure of time? Why is time versus p-distance Poisson? Why are transitions more common the transversions? What does Kimura do that Jukes-Cantor does not? 41
Amino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that
More informationBio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
More informationIntroduction to Phylogenetic Analysis
Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.
More informationIntroduction to Bioinformatics AS 250.265 Laboratory Assignment 6
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues
More informationNetwork Protocol Analysis using Bioinformatics Algorithms
Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol
More informationPairwise Sequence Alignment
Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
More informationRapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST
Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some
More informationProtein Sequence Analysis - Overview -
Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein
More informationSimilarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
More informationClone Manager. Getting Started
Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software
More informationRETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
More informationCore Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat
More informationPROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
More informationBLAST. Anders Gorm Pedersen & Rasmus Wernersson
BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise
More informationPhylogenetic Trees Made Easy
Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts
More informationTHREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
More informationDNA Insertions and Deletions in the Human Genome. Philipp W. Messer
DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.
More informationBayesian Phylogeny and Measures of Branch Support
Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The
More informationLab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS
Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary
More informationThe Central Dogma of Molecular Biology
Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines
More informationIntroduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov
More informationBioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
More informationDNA Sequence Alignment Analysis
Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes
More informationDatabase searching with DNA and protein sequences: An introduction Clare Sansom Date received (in revised form): 12th November 1999
Dr Clare Sansom works part time at Birkbeck College, London, and part time as a freelance computer consultant and science writer At Birkbeck she coordinates an innovative graduate-level Advanced Certificate
More informationBioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
More informationA Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML
9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze
More informationBIOINFORMATICS TUTORIAL
Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.
More informationBASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s
More informationPHYLOGENETIC ANALYSIS
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
More informationWorksheet - COMPARATIVE MAPPING 1
Worksheet - COMPARATIVE MAPPING 1 The arrangement of genes and other DNA markers is compared between species in Comparative genome mapping. As early as 1915, the geneticist J.B.S Haldane reported that
More informationUsing MATLAB: Bioinformatics Toolbox for Life Sciences
Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY
More informationProtein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004
Protein & DNA Sequence Analysis Bobbie-Jo Webb-Robertson May 3, 2004 Sequence Analysis Anything connected to identifying higher biological meaning out of raw sequence data. 2 Genomic & Proteomic Data Sequence
More informationAlgorithms in Computational Biology (236522) spring 2007 Lecture #1
Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office
More informationData for phylogenetic analysis
Data for phylogenetic analysis The data that are used to estimate the phylogeny of a set of tips are the characteristics of those tips. Therefore the success of phylogenetic inference depends in large
More informationComputational searches of biological sequences
UNAM, México, Enero 78 Computational searches of biological sequences Special thanks to all the scientis that made public available their presentations throughout the web from where many slides were taken
More informationA branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among
A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites Tal Pupko 1,*, Itsik Pe er 2, Masami Hasegawa 1, Dan Graur 3, and Nir Friedman
More informationCore Bioinformatics. Degree Type Year Semester
Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationProbability, statistics and football Franka Miriam Bru ckler Paris, 2015.
Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups
More informationActivity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations
Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of
More informationBIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
More informationCurrent Motif Discovery Tools and their Limitations
Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.
More informationBayesian coalescent inference of population size history
Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39 BEAST tutorials Population
More informationPHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference
PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference Stephane Guindon, F. Le Thiec, Patrice Duroux, Olivier Gascuel To cite this version: Stephane Guindon, F. Le Thiec, Patrice
More informationPrinciples of Evolution - Origin of Species
Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X
More informationMultiple Sequence Alignment. Hot Topic 5/24/06 Kim Walker
Multiple Sequence Alignment Hot Topic 5/24/06 Kim Walker Outline Why are Multiple Sequence Alignments useful? What Tools are Available? Brief Introduction to ClustalX Tools to Edit and Add Features to
More informationName: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,
More informationHeuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations
Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations AlCoB 2014 First International Conference on Algorithms for Computational Biology Thiago da Silva Arruda Institute
More informationCore Bioinformatics. Titulació Tipus Curs Semestre. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Codi: 42397 Crèdits: 12 Titulació Tipus Curs Semestre 4313473 Bioinformàtica/Bioinformatics OB 0 1 Professor de contacte Nom: Sònia Casillas Viladerrams Correu electrònic:
More informationmorephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo
morephyml User Guide [Version 1.14] August 2011 by Alexis Criscuolo ftp://ftp.pasteur.fr/pub/gensoft/projects/morephyml/ http://mobyle.pasteur.fr/cgi-bin/portal.py Please cite this paper if you use this
More informationGenome Explorer For Comparative Genome Analysis
Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence
More informationHierarchical Bayesian Modeling of the HIV Response to Therapy
Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and
More informationMAKING AN EVOLUTIONARY TREE
Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities
More informationHidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
More informationMaximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1
Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood
More informationMEGA. Molecular Evolutionary Genetics Analysis VERSION 4. Koichiro Tamura, Joel Dudley Masatoshi Nei, Sudhir Kumar
MEGA Molecular Evolutionary Genetics Analysis VERSION 4 Koichiro Tamura, Joel Dudley Masatoshi Nei, Sudhir Kumar Center of Evolutionary Functional Genomics Biodesign Institute Arizona State University
More informationMORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.
MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using
More informationA comparison of methods for estimating the transition:transversion ratio from DNA sequences
Molecular Phylogenetics and Evolution 32 (2004) 495 503 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev A comparison of methods for estimating the transition:transversion ratio from
More informationSequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011
Sequence Formats and Sequence Database Searches Gloria Rendon SC11 Education June, 2011 Sequence A is the primary structure of a biological molecule. It is a chain of residues that form a precise linear
More informationMATCH Commun. Math. Comput. Chem. 61 (2009) 781-788
MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,
More informationCD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu liwz@sdsc.edu 1. Introduction
More informationModule 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
More informationGuide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
More informationMolecular Clocks and Tree Dating with r8s and BEAST
Integrative Biology 200B University of California, Berkeley Principals of Phylogenetics: Ecology and Evolution Spring 2011 Updated by Nick Matzke Molecular Clocks and Tree Dating with r8s and BEAST Today
More informationMathematics and Statistics: Apply probability methods in solving problems (91267)
NCEA Level 2 Mathematics (91267) 2013 page 1 of 5 Assessment Schedule 2013 Mathematics and Statistics: Apply probability methods in solving problems (91267) Evidence Statement with Merit Apply probability
More informationMASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
More informationBiological Sequence Data Formats
Biological Sequence Data Formats Here we present three standard formats in which biological sequence data (DNA, RNA and protein) can be stored and presented. Raw Sequence: Data without description. FASTA
More informationSystematics - BIO 615
Outline - and introduction to phylogenetic inference 1. Pre Lamarck, Pre Darwin Classification without phylogeny 2. Lamarck & Darwin to Hennig (et al.) Classification with phylogeny but without a reproducible
More informationDNA Printer - A Brief Course in sequence Analysis
Last modified August 19, 2015 Brian Golding, Dick Morton and Wilfried Haerty Department of Biology McMaster University Hamilton, Ontario L8S 4K1 ii These notes are in Adobe Acrobat format (they are available
More informationSome Essential Statistics The Lure of Statistics
Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
More informationα α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
More informationTo discuss this topic fully, let us define some terms used in this and the following sets of supplemental notes.
INFINITE SERIES SERIES AND PARTIAL SUMS What if we wanted to sum up the terms of this sequence, how many terms would I have to use? 1, 2, 3,... 10,...? Well, we could start creating sums of a finite number
More informationFlexible Information Visualization of Multivariate Data from Biological Sequence Similarity Searches
Flexible Information Visualization of Multivariate Data from Biological Sequence Similarity Searches Ed Huai-hsin Chi y, John Riedl y, Elizabeth Shoop y, John V. Carlis y, Ernest Retzel z, Phillip Barry
More informationAn Extension Model of Financially-balanced Bonus-Malus System
An Extension Model of Financially-balanced Bonus-Malus System Other : Ratemaking, Experience Rating XIAO, Yugu Center for Applied Statistics, Renmin University of China, Beijing, 00872, P.R. China Phone:
More informationPeople have thought about, and defined, probability in different ways. important to note the consequences of the definition:
PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models
More informationT cell Epitope Prediction
Institute for Immunology and Informatics T cell Epitope Prediction EpiMatrix Eric Gustafson January 6, 2011 Overview Gathering raw data Popular sources Data Management Conservation Analysis Multiple Alignments
More information2.3 Identify rrna sequences in DNA
2.3 Identify rrna sequences in DNA For identifying rrna sequences in DNA we will use rnammer, a program that implements an algorithm designed to find rrna sequences in DNA [5]. The program was made by
More informationMethods for big data in medical genomics
Methods for big data in medical genomics Parallel Hidden Markov Models in Population Genetics Chris Holmes, (Peter Kecskemethy & Chris Gamble) Department of Statistics and, Nuffield Department of Medicine
More informationA Rough Guide to BEAST 1.4
A Rough Guide to BEAST 1.4 Alexei J. Drummond 1, Simon Y.W. Ho, Nic Rawlence and Andrew Rambaut 2 1 Department of Computer Science The University of Auckland, Private Bag 92019 Auckland, New Zealand alexei@cs.auckland.ac.nz
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationHidden Markov Models
8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies
More informationPhyML Manual. Version 3.0 September 17, 2008. http://www.atgc-montpellier.fr/phyml
PhyML Manual Version 3.0 September 17, 2008 http://www.atgc-montpellier.fr/phyml Contents 1 Citation 3 2 Authors 3 3 Overview 4 4 Installing PhyML 4 4.1 Sources and compilation.............................
More informationArbres formels et Arbre(s) de la Vie
Arbres formels et Arbre(s) de la Vie A bit of history and biology Definitions Numbers Topological distances Consensus Random models Algorithms to build trees Basic principles DATA sequence alignment distance
More informationAlgorithms in Bioinformatics I, WS06/07, C.Dieterich 47. This lecture is based on the following, which are all recommended reading:
Algorithms in Bioinformatics I, WS06/07, C.Dieterich 47 5 BLAST and FASTA This lecture is based on the following, which are all recommended reading: D.J. Lipman and W.R. Pearson, Rapid and Sensitive Protein
More informationSummary. 16 1 Genes and Variation. 16 2 Evolution as Genetic Change. Name Class Date
Chapter 16 Summary Evolution of Populations 16 1 Genes and Variation Darwin s original ideas can now be understood in genetic terms. Beginning with variation, we now know that traits are controlled by
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationUmm AL Qura University MUTATIONS. Dr Neda M Bogari
Umm AL Qura University MUTATIONS Dr Neda M Bogari CONTACTS www.bogari.net http://web.me.com/bogari/bogari.net/ From DNA to Mutations MUTATION Definition: Permanent change in nucleotide sequence. It can
More informationFinding Clusters in Phylogenetic Trees: A Special Type of Cluster Analysis
Finding lusters in Phylogenetic Trees: Special Type of luster nalysis Why try to identify clusters in phylogenetic trees? xample: origin of HIV. NUMR: Why are there so many distinct clusters? LUR04-7 SYNHRONY:
More informationREWARD System For Even Money Bet in Roulette By Izak Matatya
REWARD System For Even Money Bet in Roulette By Izak Matatya By even money betting we mean betting on Red or Black, High or Low, Even or Odd, because they pay 1 to 1. With the exception of the green zeros,
More informationProduction Functions and Cost of Production
1 Returns to Scale 1 14.01 Principles of Microeconomics, Fall 2007 Chia-Hui Chen October, 2007 Lecture 12 Production Functions and Cost of Production Outline 1. Chap 6: Returns to Scale 2. Chap 6: Production
More informationMultiple Sequence Alignment and Analysis: Part I An Introduction to the Theory and Application of Multiple Sequence Analysis.
Steven M. Thompson Manuscript for Multiple Sequence Alignment and Analysis Page 1 3/31/04 Multiple Sequence Alignment and Analysis: Part I An Introduction to the Theory and Application of Multiple Sequence
More informationSICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE
AP Biology Date SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE LEARNING OBJECTIVES Students will gain an appreciation of the physical effects of sickle cell anemia, its prevalence in the population,
More informationYew May Martin Maureen Maclachlan Tom Karmel Higher Education Division, Department of Education, Training and Youth Affairs.
How is Australia s Higher Education Performing? An analysis of completion rates of a cohort of Australian Post Graduate Research Students in the 1990s. Yew May Martin Maureen Maclachlan Tom Karmel Higher
More informationStochastic Gene Expression in Prokaryotes: A Point Process Approach
Stochastic Gene Expression in Prokaryotes: A Point Process Approach Emanuele LEONCINI INRIA Rocquencourt - INRA Jouy-en-Josas Mathematical Modeling in Cell Biology March 27 th 2013 Emanuele LEONCINI (INRIA)
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationA linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form
Section 1.3 Matrix Products A linear combination is a sum of scalars times quantities. Such expressions arise quite frequently and have the form (scalar #1)(quantity #1) + (scalar #2)(quantity #2) +...
More informationGaussian Conjugate Prior Cheat Sheet
Gaussian Conjugate Prior Cheat Sheet Tom SF Haines 1 Purpose This document contains notes on how to handle the multivariate Gaussian 1 in a Bayesian setting. It focuses on the conjugate prior, its Bayesian
More informationDetection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
More informationNO CALCULATORS OR CELL PHONES ALLOWED
Biol 205 Exam 1 TEST FORM A Spring 2008 NAME Fill out both sides of the Scantron Sheet. On Side 2 be sure to indicate that you have TEST FORM A The answers to Part I should be placed on the SCANTRON SHEET.
More information