Mass Spectra Alignments and their Significance



Similar documents
Tutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF

Workshop IIc. Manual interpretation of MS/MS spectra. Ebbing de Jong. Center for Mass Spectrometry and Proteomics Phone (612) (612)

Laboration 1. Identifiering av proteiner med Mass Spektrometri. Klinisk Kemisk Diagnostik

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam

Protein Prospector and Ways of Calculating Expectation Values

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data

Aiping Lu. Key Laboratory of System Biology Chinese Academic Society

Isotope distributions

MS Amanda Standalone User Manual

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Introduction to mass spectrometry (MS) based proteomics and metabolomics

Error Tolerant Searching of Uninterpreted MS/MS Data

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

PeptidomicsDB: a new platform for sharing MS/MS data.

m/z

Pesticide Analysis by Mass Spectrometry

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics

Interpretation of MS-Based Proteomics Data

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests

ProteinScape. Innovation with Integrity. Proteomics Data Analysis & Management. Mass Spectrometry

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays

In-Depth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates

Quantitative proteomics background

Global and Discovery Proteomics Lecture Agenda

13C NMR Spectroscopy

Mass Spectrometry Based Proteomics

Introduction to Proteomics 1.0

MASCOT Search Results Interpretation

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Already said. Already said. Outlook. Look at LC-MS data. A look at data for quantitative analysis using MSight and Phenyx. What data for quantitation?

Using MATLAB: Bioinformatics Toolbox for Life Sciences

泛 用 蛋 白 質 體 學 之 質 譜 儀 資 料 分 析 平 台 的 建 立 與 應 用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics

Title: Protein Isolation and Identification in Genetically Modified Food

Mass Spectrometry Signal Calibration for Protein Quantitation

Proteomics in Practice

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Shotgun Proteomic Analysis. Department of Cell Biology The Scripps Research Institute

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

RNA Structure and folding

Learning Objectives:

Searching Nucleotide Databases

Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench

using ms based proteomics

Lecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching

Effects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests

Data, Measurements, Features

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005

Organic Spectroscopy. UV - Ultraviolet-Visible Spectroscopy. !! nm. Methods for structure determination of organic compounds:

Tutorial 9: SWATH data analysis in Skyline

ProteinPilot Report for ProteinPilot Software

Bioinformatics Resources at a Glance

Un (bref) aperçu des méthodes et outils de fouilles et de visualisation de données «omics»

Quan%ta%ve proteomics. Maarten Altelaar, 2014

Application Note # LCMS-66 Straightforward N-glycopeptide analysis combining fast ion trap data acquisition with new ProteinScape functionalities

Regular Languages and Finite State Machines

ST 371 (IV): Discrete Random Variables

Statistical Analysis Strategies for Shotgun Proteomics Data

MarkerView Software for Metabolomic and Biomarker Profiling Analysis

MassMatrix Web Server User Manual

Functional Data Analysis of MALDI TOF Protein Spectra

ProSightPC 3.0 Quick Start Guide

Application Note # LCMS-81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method

SimGlycan Software*: A New Predictive Carbohydrate Analysis Tool for MS/MS Data

For example: (Example is from page 50 of the Thinkbook)

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping

La Protéomique : Etat de l art et perspectives

Mascot Search Results FAQ

OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis

Nucleic Acid Purity Assessment using A 260 /A 280 Ratios

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Mass Frontier 7.0 Quick Start Guide

Nuclear Magnetic Resonance notes

Activity 7.21 Transcription factors

MaxQuant User s Guide Version

Methods for Protein Analysis

Module 10: Bioinformatics

Electrospray Ion Trap Mass Spectrometry. Introduction

F321 THE STRUCTURE OF ATOMS. ATOMS Atoms consist of a number of fundamental particles, the most important are... in the nucleus of an atom

Introduction to Database Searching using MASCOT

Infrared Spectroscopy 紅 外 線 光 譜 儀

How Mascot Integra helps run a Core Lab

FACULTY OF MEDICAL SCIENCE

A Navigation through the Tracefinder Software Structure and Workflow Options. Frans Schoutsen Pesticide Symposium Prague 27 April 2015

Fast and Automatic Mapping of Disulfide Bonds in a Monoclonal Antibody using SYNAPT G2 HDMS and BiopharmaLynx 1.3

Absolute quantification of low abundance proteins by shotgun proteomics

High resolution mass spectrometry (HRMS*) in Graz

for mass spectrometry calibration tools Thermo Scientific Pierce Controls and Standards for Mass Spectrometry

Nuclear Structure. particle relative charge relative mass proton +1 1 atomic mass unit neutron 0 1 atomic mass unit electron -1 negligible mass

4. It is possible to excite, or flip the nuclear magnetic vector from the α-state to the β-state by bridging the energy gap between the two. This is a

CHAPTER 6. Shannon entropy

Transcription:

Mass Spectra Alignments and their Significance Sebastian Böcker 1, Hans-Michael altenbach 2 1 Technische Fakultät, Universität Bielefeld 2 NRW Int l Graduate School in Bioinformatics and Genome Research, Universität Bielefeld

Overview Mass Spectrometry in Proteomics Protein Identification via MS Alignment of Spectra Score Significance Conclusion

Overview Mass Spectrometry in Proteomics Protein Identification via MS Alignment of Spectra Score Significance Conclusion

Overview Mass Spectrometry in Proteomics Protein Identification via MS Alignment of Spectra Score Significance Conclusion

Overview Mass Spectrometry in Proteomics Protein Identification via MS Alignment of Spectra Score Significance Conclusion

Overview Mass Spectrometry in Proteomics Protein Identification via MS Alignment of Spectra Score Significance Conclusion

Proteins Biology Proteins are directed polymers of 20 different amino acids. G T D S T N D M A T I Q A T S A Mathematics Proteins are strings over an alphabet Σ.

Mass Spectrometry Mass Spectrometry in Bioscience Mass spectrometry measures the masses and quantity of molecules in a probe. It is widely used in biosciences to identify proteins and other biomolecules.

Fragmentation of peptides Problem Solely measuring the mass of a protein is not sufficient for identification. G T D S T N D M T I Q A A A T S abundance mass Idea Break up the protein into smaller pieces in a deterministic way. The spectrum of these pieces is called a fingerprint of the protein.

Fragmentation of peptides Problem Solely measuring the mass of a protein is not sufficient for identification. G T D S T N D M T I Q A A A T S abundance mass Idea Break up the protein into smaller pieces in a deterministic way. The spectrum of these pieces is called a fingerprint of the protein. D M A A T S T I Q A G T D S T N abundance mass

Peptide Mass Fingerprints Enzymatic cleavage example An enzyme cuts amino acid sequence after each letter. G T D S T N D M T I Q A T A S A

Peptide Mass Fingerprints Enzymatic cleavage example An enzyme cuts amino acid sequence after each letter. G T D S T N D M T I Q A A T S A

Peptide Mass Fingerprints Artificial Spectrum of GTDSTNDMASTAAQIT Rel. Abundance 0.70 0.75 0.80 0.85 0.90 0.95 1.00 A / 199.3618 QIT / 343.3801 DM / 374.4614 ASTA / 458.5151 GTDSTN / 703.7071 200 300 400 500 600 700 Mass

Real Mass Spectrum (PMF peaks annotated)

Processing the spectrum Peak extraction Spectra are summarized into peak lists, but extracting peaks is inherently difficult. Problem: Peak lists are never correct Inaccurate calibration Probe contamination Peak detection...

Identification Protein Identification w/ PMF Isolate many copies of ONE protein Digest it into specific smaller fragments (Mass Fingerprint) Make a mass spectrum of these fragments Compare spectrum to all predicted mass spectra from DB Mass Fingerprint via Mass Spectrometry Peaklist Peaklist Mass Fingerprint via in-silico fragmentation Comparison Score + Significance AVPPTVHIIT... VVGTASILLYV... VVNMTREEEASD... QEVFGGTELLPP... PLMRPHGTFD...... LMMMTGERDFG... HILMLVFDSAQ...

Identification Protein Identification w/ PMF Isolate many copies of ONE protein Digest it into specific smaller fragments (Mass Fingerprint) Make a mass spectrum of these fragments Compare spectrum to all predicted mass spectra from DB Mass Fingerprint via Mass Spectrometry Peaklist Peaklist Mass Fingerprint via in-silico fragmentation Comparison Score + Significance AVPPTVHIIT... VVGTASILLYV... VVNMTREEEASD... QEVFGGTELLPP... PLMRPHGTFD...... LMMMTGERDFG... HILMLVFDSAQ...

Comparing Two Peak Lists Peaklists and Empty Peaks Let S m, S p be an extracted and a predicted peaklist. Let ε denote a special gap peak. Scoring Scheme Each assignment between the two peak lists can be scored: score(s p, S m ) = score(i, j) matched peaks matched i,j + missing + additional score(i, ε) score(ε, j) missing peaks additional peaks

Matching peaklists Matching One-to-one peak matching Peak matchings should not cross Any peak must be matched either to a peak or to the gap peak Matching score mainly based on mass difference but can include other features Best matching Using such scoring schemes, the best peaklist matching can be computed using standard global alignment.

Scoring scheme example: Peak counting Peak counting score score(i, j) = { 1 mass(i) mass(j) δ 0 else score(i, ε) = score(ε, j) = 0 δ = 10, S m = {1000, 1230, 1500} and S p = {1000, 1235, 1700} Alignment S p 1000 1235 ε 1700 S m 1000 1230 1500 ε score(s m, S p ) = (1 + 1) + 0 + 0 = 2.

Estimating the score distribution Problem The score distribution depends on Measured spectrum Sequence length Mass and probability of characters Estimation techniques Different null-models: Sampling against spectra or sampling against sequences Sampling against sequences Random or DB sequences both take long time Estimation of moments Works with certain classes of distributions

Score distribution Claim In most useful cases, the score distribution for fixed string length can be well approximated by a normal distribution and is then determined by expectation and variance. Missing and additional scores are usually very small compared to matches.

Computing moments Main Idea Probability of a peak corresponds to probability of a fragment of same mass in peptide. Discretize masses by scaling and rounding Compute probability of fragment of length l with mass m Compute probability of string of length L to have no fragment of peak mass m Can all be done in preprocessing Estimate moments Compute p-value

Computing moments Main Idea Probability of a peak corresponds to probability of a fragment of same mass in peptide. Discretize masses by scaling and rounding Compute probability of fragment of length l with mass m Compute probability of string of length L to have no fragment of peak mass m Can all be done in preprocessing Estimate moments Compute p-value

Fragment probability Weighted Alphabet We call the tuple (Σ, µ) with mass function µ : Σ N an (integer) weighted alphabet. Define µ(s) := s k=1 µ(s k). Fragments Let x be the cleavage character and Σ x = Σ\{x}. The number of fragments of length l with mass m is then given by c[l, m] = c[l 1, m µ(σ)] σ Σ x,µ(σ) m and for uniform character distribution we get the probability r[l, m] = 1 c[l, m] Σ x l

Probability in Strings Main idea We compute prob. of string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate. p[l,m] G T D S T N D M A S T A A Q I T

Probability in Strings Main idea We compute prob. of string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate. p[l,m] G T D S T N D M A S T A A Q I T 1st cleavage site at position l

Probability in Strings Main idea We compute prob. of string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate. p[l,m] G T D S T N D M A S T A A Q I T r[l-1,m-µ()] 1st cleavage site at position l

Probability in Strings Main idea We compute prob. of string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate. p[l,m] G T D S T N D M A S T A A Q I T r[l-1,m-µ()] 1st cleavage site at position l p[l-l,m]

Probability in Strings Main idea We compute prob. of string having NO fragment of mass m. Then the very first fragment must not have mass m and the following string must have no fragment of mass m. Iterate. The prob. of s Σ L to have NO fragment of mass m is given by p[l, m] = r[l, m] P (no cleavage at all) L + r[l 1, m µ(x)] P (first cleavage at l) p[l l, m] } {{ } } {{ } l=1 first frag. suffix left

Expected match score of a peak Score Score distribution threshold M1 U1 U2 M2 U3 M3 m/z [Da] Extracted Peaks The expected value of extracted peak j with support U j is E(matchscore(j)) = m U j p[l, m] score(mass(j), m)

Conclusion Main features Scoring schemes allow very flexible identification routines Computation of significance is database independent Extension to other cleavage schemes possible Extension to nonuniform alphabets and to isotope masses straightforward