Scoring Matrices. Shifra Ben-Dor Irit Orr
|
|
|
- Jessie Cannon
- 9 years ago
- Views:
Transcription
1 Scoring Matrices Shifra Ben-Dor Irit Orr
2 Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison rely on some scoring scheme for that. Scoring matrices are used to assign a score to each comparison of a pair of characters.
3 Scoring matrices The scores in the matrix are integer values. In most cases a positive score is given to identical or similar character pairs, and a negative or zero score to dissimilar character pairs.
4 Different types of matrices Identity scoring - the simplest scoring scheme, where characters are classified as: identical (scores 1), or non-identical (scores 0). This scoring scheme is not much used. DNA scoring - consider changes as transitions and transversions. This matrix scores identical bp 3, transitions 2, and transversions 0.
5 Different types of matrices Chemical similarity scoring (for proteins) - this matrix gives greater weight to amino acids with similar chemical properties (e.g size, shape or charge of the aa). Observed matrices for proteins - most commonly used by all programs. These matrices are constructed by analyzing the substitution frequencies seen in the alignments of known families of proteins.
6 Observed Scoring Matrices Every possible identity and substitution is assigned a score. This score is based on the observed frequencies of such occurrences in alignments of evolutionary related proteins. This score will also reflect the frequency that a particular amino acid occurs in nature, as some amino acids are more abundant than others.
7 Observed Scoring Matrices Identities are assigned the most positive scores Frequently observed substitutions also receive positive scores Mismatches, or matches that are unlikely to have been a result of evolution, are given negative scores.
8 Each matrix entry gives the ratio of the observed frequency of substitution between each possible pair of amino acids in related proteins to that expected by chance, given the frequencies of amino acids in proteins. These ratios are called odds scores. These ratios are transformed to logarithms of odds scores called log odds scores. Odds scores and log odds scores are used to score protein alignments
9 Different types of matrices Observed Scoring Matrices are superior to simple identity scores, or scores based solely on chemical propensities of the amino The most frequently used observed log odds matrices used are the PAM and BLOSUM matrices.
10 PAM Matrices Developed by Margaret Dayhoff and co-workers. Derived from global alignments of very similar sequences (at least 85% identity), so that there would be little likelihood of an observed change being the result of several successive mutations, but it should reflect one mutation only. PAM - Point Accepted Mutations.
11 An accepted point mutation in a protein is a replacement of one amino acid by another, accepted by natural selection. It is the result of two distinct processes: the first is the occurrence of a mutation in the portion of the gene template producing one amino acid of a protein the second is the acceptance of the mutation by the species as the new predominant form. To be accepted, the new amino acid usually must function in a way similar to the old one: chemical and physical similarities are found between the amino acids that are observed to interchange frequently.
12 Dayhoff estimated mutation rates from substitutions observed in closely related proteins and extrapolated those rates to model distant relationships. PAM gives the probability that a given amino acid will be replaced by any other particular amino acid after a given evolutionary interval, in this case 1 accepted point mutation per 100 amino acids.
13 When used for protein comparison, the mutation probability (odds) matrix is normalized and the logarithm is taken. (this lets us add the scores along a protein instead of multiplying the probabilities) The resulting matrix is the log-odds matrix, known as the PAM matrix.
14 PAM#=Point Accepted Mutations / 100 bases The number with the matrix (PAM120, PAM90), refers to the evolutionary distance. Greater numbers are greater distances. To derive PAM250 you multiply PAM1 250 times itself PAM250 is the matrix derived of sequences with 250 PAMs.
15 PAM250 At this evolutionary distance, only one amino acid in five remains unchanged. However, the amino acids vary greatly in their mutability; 55% of the tryptophans. 52% of the cysteines and 27% of the glycines would still be unchanged, but only 6% of the highly mutable asparagines would remain. Several other amino acids, particularly alanine, aspartic acid, glutamic acid, glycine, lysine, and serine are more likely to occur in place of an original asparagine than asparagine itself at this evolutionary distance!
16 PAM250 MATRIX # This matrix was produced by "pam" Version [28-Jul-93] # # PAM 250 substitution matrix, scale = ln(2)/3 = # # Expected score = , Entropy = bits # # Lowest score = -8, Highest score = 17 # A R N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T
17 Pet91 - an updated Dayhoff matrix Since the family of PAM matrices were derived from a comparatively small number of families, many of the possible mutations were not observed. Jones et al. have derived an updated matrix by examining a very large number of families, and created the PET91 scoring matrix.
18 Gonnet Matrices Another improvement on the PAM matrices Done when the database was larger All-against-all pairwise comparison to see all possible substitutions Optimized to find sequences that are further apart (as opposed to PAM, that started with very similar sequences)
19 BLOSUM Matrices Created by Henikoff & Henikoff, based on local multiple alignments of more distantly related sequences. First, multiple alignments of short regions (without gaps) of related sequences were gathered. In each alignment the sequences similar at some threshold value of percent identity were clustered into groups and averaged.
20 BLOSUM Matrices Substitution frequencies for all pairs of amino acids were calculated between the groups, this was used to create the log-odds BLOSUM ( Block Substitution Matrix ).
21 BLOSUM# - where # is the threshold identity percentage of the sequences clustered in those blocks. Thus, BLOSUM62 means that the sequences clustered in this block are at least 62% identical. This allows detection of more distantly related sequences, as it downplays the role of the more related sequences in the block when building the matrix.
22 BLOSUM62 MATRIX # Matrix made by matblas from blosum62.iij # * column uses minimum score # BLOSUM Clustered Scoring Matrix in 1/2 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 62 # Entropy = , Expected = A R N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y
23 So which observed matrix to use??? For global alignments use PAM matrices. Lower PAM matrices tend to find short alignments of highly similar regions. Higher PAM matrices will find weaker, longer alignments. For local alignments use BLOSUM matrices. BLOSUM matrices with HIGH number, are better for similar sequences. BLOSUM matrices with LOW number, are better for distant sequences.
24 Tips... When doing global alignment (and database scanning) of related (similar) sequences use PAM200 or PAM250. If you don t know what to expect (e.g. for database scanning) use PAM120. For local database scanning (e.g blast), or for ungapped, local alignments, use BLOSUM62 (recommended for proteins).
25 Tips. In all cases it is recommended to use more than one matrix for any database scanning.
Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need
Amino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that
Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST
Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some
Network Protocol Analysis using Bioinformatics Algorithms
Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe [email protected] ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
Bio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
Pairwise Sequence Alignment
Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
BLAST. Anders Gorm Pedersen & Rasmus Wernersson
BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise
Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?
Pipe Cleaner Proteins GPS: SB1 Students will analyze the nature of the relationships between structures and functions in living cells. Essential question: How does the structure of proteins relate to their
Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
Concluding lesson. Student manual. What kind of protein are you? (Basic)
Concluding lesson Student manual What kind of protein are you? (Basic) Part 1 The hereditary material of an organism is stored in a coded way on the DNA. This code consists of four different nucleotides:
H H N - C - C 2 R. Three possible forms (not counting R group) depending on ph
Amino acids - 0 common amino acids there are others found naturally but much less frequently - Common structure for amino acid - C, -N, and functional groups all attached to the alpha carbon N - C - C
PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: [email protected]
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
Clone Manager. Getting Started
Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software
Module 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Introduction to Phylogenetic Analysis
Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.
Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations
Heuristics for the Sorting by Length-Weighted Inversions Problem on Signed Permutations AlCoB 2014 First International Conference on Algorithms for Computational Biology Thiago da Silva Arruda Institute
Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,
Protein Sequence Analysis - Overview -
Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein
Solving Systems of Linear Equations Using Matrices
Solving Systems of Linear Equations Using Matrices What is a Matrix? A matrix is a compact grid or array of numbers. It can be created from a system of equations and used to solve the system of equations.
BOC334 (Proteomics) Practical 1. Calculating the charge of proteins
BC334 (Proteomics) Practical 1 Calculating the charge of proteins Aliphatic amino acids (VAGLIP) N H 2 H Glycine, Gly, G no charge Hydrophobicity = 0.67 MW 57Da pk a CH = 2.35 pk a NH 2 = 9.6 pi=5.97 CH
The Central Dogma of Molecular Biology
Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines
BIOINFORMATICS TUTORIAL
Bio 242 BIOINFORMATICS TUTORIAL Bio 242 α Amylase Lab Sequence Sequence Searches: BLAST Sequence Alignment: Clustal Omega 3d Structure & 3d Alignments DO NOT REMOVE FROM LAB. DO NOT WRITE IN THIS DOCUMENT.
Introduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
MAKING AN EVOLUTIONARY TREE
Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities
Phylogenetic Trees Made Easy
Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts
IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon. V. Polypeptides and Proteins
IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon A. Acid/Base properties 1. carboxyl group is proton donor! weak acid 2. amino group is proton acceptor! weak base 3. At physiological ph: H
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues
PRACTICE TEST QUESTIONS
PART A: MULTIPLE CHOICE QUESTIONS PRACTICE TEST QUESTIONS DNA & PROTEIN SYNTHESIS B 1. One of the functions of DNA is to A. secrete vacuoles. B. make copies of itself. C. join amino acids to each other.
A successful market segmentation initiative answers the following critical business questions: * How can we a. Customer Status.
MARKET SEGMENTATION The simplest and most effective way to operate an organization is to deliver one product or service that meets the needs of one type of customer. However, to the delight of many organizations
Separation of Amino Acids by Paper Chromatography
Separation of Amino Acids by Paper Chromatography Chromatography is a common technique for separating chemical substances. The prefix chroma, which suggests color, comes from the fact that some of the
Bioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
Amino Acids, Peptides, Proteins
Amino Acids, Peptides, Proteins Functions of proteins: Enzymes Transport and Storage Motion, muscle contraction Hormones Mechanical support Immune protection (Antibodies) Generate and transmit nerve impulses
Amino Acids and Proteins
Amino Acids and Proteins Proteins are composed of amino acids. There are 20 amino acids commonly found in proteins. All have: N2 C α R COO Amino acids at neutral p are dipolar ions (zwitterions) because
AMINO ACID ANALYSIS By High Performance Capillary Electrophoresis
AMINO ACID ANALYSIS By High Performance Capillary Electrophoresis Analysis of Amino Acid Standards Label free analysis using the HPCE-512 ABSTRACT Capillary electrophoresis using indirect UV detection
Paper: 6 Chemistry 2.130 University I Chemistry: Models Page: 2 of 7. 4. Which of the following weak acids would make the best buffer at ph = 5.0?
Paper: 6 Chemistry 2.130 University I Chemistry: Models Page: 2 of 7 4. Which of the following weak acids would make the best buffer at ph = 5.0? A) Acetic acid (Ka = 1.74 x 10-5 ) B) H 2 PO - 4 (Ka =
CSC 2427: Algorithms for Molecular Biology Spring 2006. Lecture 16 March 10
CSC 2427: Algorithms for Molecular Biology Spring 2006 Lecture 16 March 10 Lecturer: Michael Brudno Scribe: Jim Huang 16.1 Overview of proteins Proteins are long chains of amino acids (AA) which are produced
Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK
Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK Dai Lu, Ph.D. [email protected] Tel: 361-221-0745 Office: RCOP, Room 307 Drug Discovery and Development Drug Molecules Medicinal
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture-11 Enzyme Mechanisms II
Biochemistry - I Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture-11 Enzyme Mechanisms II In the last class we studied the enzyme mechanisms of ribonuclease A
From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains
Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes
A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML
9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze
Row Echelon Form and Reduced Row Echelon Form
These notes closely follow the presentation of the material given in David C Lay s textbook Linear Algebra and its Applications (3rd edition) These notes are intended primarily for in-class presentation
LabGenius. Technical design notes. The world s most advanced synthetic DNA libraries. [email protected] V1.5 NOV 15
LabGenius The world s most advanced synthetic DNA libraries Technical design notes [email protected] V1.5 NOV 15 Introduction OUR APPROACH LabGenius is a gene synthesis company focussed on the design and manufacture
Current Motif Discovery Tools and their Limitations
Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.
MASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
RNA and Protein Synthesis
Name lass Date RN and Protein Synthesis Information and Heredity Q: How does information fl ow from DN to RN to direct the synthesis of proteins? 13.1 What is RN? WHT I KNOW SMPLE NSWER: RN is a nucleic
Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/2004 Hierarchical
Human Tubal Fluid (HTF) Media & Modifi ed Human Tubal Fluid (mhtf) Medium with Gentamicin
Human Tubal Fluid (HTF) Media & Modifi ed Human Tubal Fluid (mhtf) Medium with Gentamicin HTF Media are intended for use in assisted reproductive procedures which include gamete and embryo manipulation
Chapter 6 DNA Replication
Chapter 6 DNA Replication Each strand of the DNA double helix contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand. Each strand can therefore
2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system
1. Systems of linear equations We are interested in the solutions to systems of linear equations. A linear equation is of the form 3x 5y + 2z + w = 3. The key thing is that we don t multiply the variables
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison
Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Amino Acids, Proteins, and Enzymes. Primary and Secondary Structure Tertiary and Quaternary Structure Protein Hydrolysis and Denaturation
Amino Acids, Proteins, and Enzymes Primary and Secondary Structure Tertiary and Quaternary Structure Protein Hydrolysis and Denaturation 1 Primary Structure of Proteins H 3 N The particular sequence of
Okami Study Guide: Chapter 3 1
Okami Study Guide: Chapter 3 1 Chapter in Review 1. Heredity is the tendency of offspring to resemble their parents in various ways. Genes are units of heredity. They are functional strands of DNA grouped
2007 7.013 Problem Set 1 KEY
2007 7.013 Problem Set 1 KEY Due before 5 PM on FRIDAY, February 16, 2007. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Where in a eukaryotic cell do you
MORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.
MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using
Acids and Bases. but we will use the term Lewis acid to denote only those acids to which a bond can be made without breaking another bond
Acids and Bases. Brønsted acids are proton donors, and Brønsted bases are proton acceptors. Examples of Brønsted acids: HCl, HBr, H 2 SO 4, HOH, H 3 O +, + NH 4, NH 3, CH 3 CO 2 H, H CH 2 COCH 3, H C CH,
An introduction to Value-at-Risk Learning Curve September 2003
An introduction to Value-at-Risk Learning Curve September 2003 Value-at-Risk The introduction of Value-at-Risk (VaR) as an accepted methodology for quantifying market risk is part of the evolution of risk
a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.
Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Ionization of amino acids
Amino Acids 20 common amino acids there are others found naturally but much less frequently Common structure for amino acid COOH, -NH 2, H and R functional groups all attached to the a carbon Ionization
Shu-Ping Lin, Ph.D. E-mail: [email protected]
Amino Acids & Proteins Shu-Ping Lin, Ph.D. Institute te of Biomedical Engineering ing E-mail: [email protected] Website: http://web.nchu.edu.tw/pweb/users/splin/ edu tw/pweb/users/splin/ Date: 10.13.2010
Module 1. Sequence Formats and Retrieval. Charles Steward
The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.
Here are some examples of combining elements and the operations used:
MATRIX OPERATIONS Summary of article: What is an operation? Addition of two matrices. Multiplication of a Matrix by a scalar. Subtraction of two matrices: two ways to do it. Combinations of Addition, Subtraction,
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
Amazing DNA facts. Hands-on DNA: A Question of Taste Amazing facts and quiz questions
Amazing DNA facts These facts can form the basis of a quiz (for example, how many base pairs are there in the human genome?). Students should be familiar with most of this material, so the quiz could be
Lecture 3: Models of Solutions
Materials Science & Metallurgy Master of Philosophy, Materials Modelling, Course MP4, Thermodynamics and Phase Diagrams, H. K. D. H. Bhadeshia Lecture 3: Models of Solutions List of Symbols Symbol G M
Learning from Diversity
Learning from Diversity Epitope Prediction with Sequence and Structure Features using an Ensemble of Support Vector Machines Rob Patro and Carl Kingsford Center for Bioinformatics and Computational Biology
Benford s Law and Digital Frequency Analysis
Get M.A.D. with the Numbers! Moving Benford s Law from Art to Science BY DAVID G. BANKS, CFE, CIA September/October 2000 Until recently, using Benford s Law was as much of an art as a science. Fraud examiners
ENZYME SCIENCE AND ENGINEERING PROF. SUBHASH CHAND DEPARTMENT OF BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY IIT DELHI LECTURE 4 ENZYMATIC CATALYSIS
ENZYME SCIENCE AND ENGINEERING PROF. SUBHASH CHAND DEPARTMENT OF BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY IIT DELHI LECTURE 4 ENZYMATIC CATALYSIS We will continue today our discussion on enzymatic catalysis
Built from 20 kinds of amino acids
Built from 20 kinds of amino acids Each Protein has a three dimensional structure. Majority of proteins are compact. Highly convoluted molecules. Proteins are folded polypeptides. There are four levels
Protein Prospector and Ways of Calculating Expectation Values
Protein Prospector and Ways of Calculating Expectation Values 1/16 Aenoch J. Lynn; Robert J. Chalkley; Peter R. Baker; Mark R. Segal; and Alma L. Burlingame University of California, San Francisco, San
Data for phylogenetic analysis
Data for phylogenetic analysis The data that are used to estimate the phylogeny of a set of tips are the characteristics of those tips. Therefore the success of phylogenetic inference depends in large
RNA Structure and folding
RNA Structure and folding Overview: The main functional biomolecules in cells are polymers DNA, RNA and proteins For RNA and Proteins, the specific sequence of the polymer dictates its final structure
Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.
Provincial Exam Questions Unit: Cell Biology: Protein Synthesis (B7 & B8) 2010 Jan 3. Describe the process of translation. (4 marks) 2009 Sample 8. What is the role of ribosomes in protein synthesis? A.
PHYLOGENETIC ANALYSIS
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
Peptide Bond Amino acids are linked together by peptide bonds to form polypepetide chain.
Peptide Bond Peptide Bond Amino acids are linked together by peptide bonds to form polypepetide chain. + H 2 O 2 Peptide bonds are strong and not broken by conditions that denature proteins, such as heating.
LECTURE 6 Gene Mutation (Chapter 16.1-16.2)
LECTURE 6 Gene Mutation (Chapter 16.1-16.2) 1 Mutation: A permanent change in the genetic material that can be passed from parent to offspring. Mutant (genotype): An organism whose DNA differs from the
Guide for Bioinformatics Project Module 3
Structure- Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
2.6 Exponents and Order of Operations
2.6 Exponents and Order of Operations We begin this section with exponents applied to negative numbers. The idea of applying an exponent to a negative number is identical to that of a positive number (repeated
Mass Spectra Alignments and their Significance
Mass Spectra Alignments and their Significance Sebastian Böcker 1, Hans-Michael altenbach 2 1 Technische Fakultät, Universität Bielefeld 2 NRW Int l Graduate School in Bioinformatics and Genome Research,
MULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching
COSC 348: Computing for Bioinformatics Definitions A pattern (keyword) is an ordered sequence of symbols. Lecture 4: Exact string searching algorithms Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/
agucacaaacgcu agugcuaguuua uaugcagucuua
RNA Secondary Structure Prediction: The Co-transcriptional effect on RNA folding agucacaaacgcu agugcuaguuua uaugcagucuua By Conrad Godfrey Abstract RNA secondary structure prediction is an area of bioinformatics
