Interpretation of MS-Based Proteomics Data

Similar documents

泛用蛋白質體學之質譜儀資料分析平台的建立與應用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics

Tutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF

Global and Discovery Proteomics Lecture Agenda

Aiping Lu. Key Laboratory of System Biology Chinese Academic Society

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Protein Prospector and Ways of Calculating Expectation Values

Error Tolerant Searching of Uninterpreted MS/MS Data

MASCOT Search Results Interpretation

Quantitative proteomics background

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Session 1. Course Presentation: Mass spectrometry-based proteomics for molecular and cellular biologists

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

Mass Spectrometry Based Proteomics

Introduction to Database Searching using MASCOT

Searching Nucleotide Databases

ProteinPilot Report for ProteinPilot Software

ProSightPC 3.0 Quick Start Guide

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam

ProteinScape. Innovation with Integrity. Proteomics Data Analysis & Management. Mass Spectrometry

Effects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests

MRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics

Workshop IIc. Manual interpretation of MS/MS spectra. Ebbing de Jong. Center for Mass Spectrometry and Proteomics Phone (612) (612)

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests

Proteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap

Application Note # LCMS-81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Introduction to Proteomics 1.0

Laboration 1. Identifiering av proteiner med Mass Spektrometri. Klinisk Kemisk Diagnostik

Mass Spectra Alignments and their Significance

In-Depth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates

Pinpointing phosphorylation sites using Selected Reaction Monitoring and Skyline

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005

Application Note # MT-90 MALDI-TDS: A Coherent MALDI Top-Down-Sequencing Approach Applied to the ABRF-Protein Research Group Study 2008

Shotgun Proteomic Analysis. Department of Cell Biology The Scripps Research Institute

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

AN ITERATIVE ALGORITHM TO QUANTIFY THE FACTORS INFLUENCING PEPTIDE FRAGMENTATION FOR MS/MS SPECTRUM

Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS

AB SCIEX TOF/TOF 4800 PLUS SYSTEM. Cost effective flexibility for your core needs

Quan%ta%ve proteomics. Maarten Altelaar, 2014

Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays

# LCMS-35 esquire series. Application of LC/APCI Ion Trap Tandem Mass Spectrometry for the Multiresidue Analysis of Pesticides in Water

Mascot Search Results FAQ

Copyright 2007 Casa Software Ltd. ToF Mass Calibration

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics

Chapter 14. Modeling Experimental Design for Proteomics. Jan Eriksson and David Fenyö. Abstract. 1. Introduction

Mass Spectrometry Signal Calibration for Protein Quantitation

Tutorial 9: SWATH data analysis in Skyline

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping

Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets

Development of computational methods for analysing proteomic data for genome annotation

Pesticide Analysis by Mass Spectrometry

PeptidomicsDB: a new platform for sharing MS/MS data.

Overview. Introduction. AB SCIEX MPX -2 High Throughput TripleTOF 4600 LC/MS/MS System

Statistical Analysis Strategies for Shotgun Proteomics Data

SimGlycan Software*: A New Predictive Carbohydrate Analysis Tool for MS/MS Data

PROTEIN SEQUENCING. First Sequence

Mass Spectrometry. Overview

Mass Frontier Version 7.0

Thermo Scientific GC-MS Data Acquisition Instructions for Cerno Bioscience MassWorks Software

Introduction to Proteomics

DRUG METABOLISM. Drug discovery & development solutions FOR DRUG METABOLISM

Research-grade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE

13C NMR Spectroscopy

Proteomics in Practice

TECHNICAL BULLETIN. ProteoMass Peptide & Protein MALDI-MS Calibration Kit. Catalog Number MSCAL1 Store at Room Temperature

Introduction to mass spectrometry (MS) based proteomics and metabolomics

Introduction to Proteomics

Principles and Applications of Proteomics

using ms based proteomics

Increasing Quality While Maintaining Efficiency in Drug Chemistry with DART-TOF MS Screening

PosterREPRINT AN LC/MS ORTHOGONAL TOF (TIME OF FLIGHT) MASS SPECTROMETER WITH INCREASED TRANSMISSION, RESOLUTION, AND DYNAMIC RANGE OVERVIEW

1 Genzyme Corp., Framingham, MA, 2 Positive Probability Ltd, Isleham, U.K.

Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench

Rapid and Reproducible Amino Acid Analysis of Physiological Fluids for Clinical Research Using LC/MS/MS with the atraq Kit

Simultaneous qualitative and quantitative analysis using the Agilent 6540 Accurate-Mass Q-TOF

Sub menu of functions to give the user overall information about the data in the file

A Streamlined Workflow for Untargeted Metabolomics

Introduction. What Can an Offline Desktop Processing Tool Provide for a Chemist?

Database Searching Tutorial/Exercises Jimmy Eng

Metabolomics Software Tools. Xiuxia Du, Paul Benton, Stephen Barnes

Q-TOF User s Booklet. Enter your username and password to login. After you login, the data acquisition program will automatically start.

Mass Frontier 7.0 Quick Start Guide

Practical Analysis of Proteome Data Using Bioinformatics and Statistics

Improving the Metabolite Identification Process with Efficiency and Speed: LightSight Software for Metabolite Identification

Correlation of the Mass Spectrometric Analysis of Heat-Treated Glutaraldehyde Preparations to Their 235nm / 280 nm UV Absorbance Ratio

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data

MS Amanda Standalone User Manual

1 st day Basic Training Course

Analysis of Mass Spectrometry. Data for Protein Identification. in Complex Biological Mixtures

Transcription:

Interpretation of MS-Based Proteomics Data Yet-Ran Chen, 陳逸然 Agricultural Biotechnology Research Center Academia Sinica Brief Overview of Protein Identification Workflow Protein Sample Specific Protein Fragmentation Chemical Enzymatic High Energy CID Peptides MS Analysis MS Analysis MS/MS Analysis of Peptide Fragment MS Spectrum MS/MS Spectrum Calculation of Peptide Mass Calculation of Peptide Fragment Mass Protein Identification MS Spectrum? Mass 1

Basics for the Mass Spectrometric Data Basics for the Mass Spectrometric Data Typical MS Spectrum RKKR 9/4/2005 10:55:04 PM C24 H48 N12 O4 Centroid 568.39 Simulation RKKR Centroid 15 Resolution: Daltons 0.25 at 5% height Charges 1 10 Chrg dist 0 Ions 5334 569.39 Min Ion Ab. 1e-020 5 Min Ions 5000 Max Ions. 20000 570.40 568.01 568.76 569.77 571.02 571.40 571.77 572.41 572.79 573.40 573.78 0 567 568 569 570 571 572 573 m/z Abundance RKKR C24 H48 N12 O4 Abundance Profile 0.7 0.6 0.5 0.4 0.3 0.2 568.39 9/4/2005 10:54:35 PM 569.39 0.1 570.39 571.39 572.40 573.40 0.0 567.0 567.5 568.0 568.5 569.0 569.5 570.0 570.5 571.0 571.5 572.0 572.5 573.0 573.5 m/z Simulation RKKR Profile Resolution: Daltons 0.25 at 5% height Charges 1 Chrg dist 0 Ions 5334 M in Ion Ab. 1e-020 M in Ions 5000 M ax Ions. 20000 2

Basics for the Mass Spectrometric Data Three Major Definitions of Mass Monoisotopic mass: mass of an ion for a given empirical formula calculated using the exact mass of the most abundant isotope of each element Average mass: mass of an ion for a given empirical formula calculated using the average atomic weight averaged over all the isotopes for each element Nominal mass: mass of an ion with a given empirical formula calculated using the integer mass of the most abundant isotope of each element Basics for the Mass Spectrometric Data The Isotopic Pattern RKKR 9/4/200510:33:55 PM C24 H49 N12 O4 569.39997 0.70 0.65 0.60 0.55 0.50 Simulation RKKR +H Profile Resolution: Daltons 0.5 at half height Charges 1 Chrg dist 0 Ions 5348 M in Ion Ab. 1e-020 M in Ions 5000 Max Ions. 20000 m/z Relative Abundance % 569.3997 100 570.3997 27 571.2997 6 0.45 Abundance 0.40 0.35 0.30 0.25 0.20 0.15 0.10 570.39997 Nominal: 569 Monoisotopic: 569.39997 Average : 569.72386 Most abundant: 569.39997 0.05 571.39997 572.39997 573.39997 0.00 568 569 570 571 572 573 m/z 3

Basics for the Mass Spectrometric Data Definition of MS Resolution R m m m FWHM FWHM (Full Width at Half Maximum) Basics for the Mass Spectrometric Data Molecular Weight Determination Low Resolution vs. High Resolution Monoisotopic Mass Averaged Mass 4

Basics for the Mass Spectrometric Data Molecular Weight Determination Single and Multiple Charged MS Spectrum P1 =(Monoisotopic Mass+H)/1 (m/z) = 1 Single Charged P2 =(Monoisotopic Mass+1+H)/1 P1 =(Monoisotopic Mass+2H)/2 Double Charged (m/z) = 0.5 P2 =(Monoisotopic Mass+1+2H)/2 Charge State Z = 1 / ( p2-p1) then Molecular Weight MW = p1 * Z Z = (p1-1)*z Basics for the Mass Spectrometric Data Molecular Weight Determination Unresolved Isotopic Peaks 5

Basics for the Mass Spectrometric Data Choosing Valid Spectra Peaks for the Protein ID Resolving Power Ionization Method Protein Identification by Database Searching 6

Protein Identification by Database Searching Peptide Mass Fingerprinting (PMF) Protein Sample Specific Protein Fragmentation Chemical Enzymatic High Energy CID Peptides MS Analysis MS Analysis MS/MS Analysis of Peptide Fragments MS Spectrum MS/MS Spectrum Calculation of Peptide Mass Calculation of Peptide Fragment Mass MS/MS Ion Search How to Produce Confidence Experimental Result Use common search algorithm(s) and database to search Employ method (s) for evaluation of the FP rate Plan to integrate data for searching to find weak associations not evident in single datasets Employ common/consistent annotation of results Store data in original instrument vendor format in as minimally processed form as possible (exchange formats in flux) Files contain all the interesting info in unprocessed form - parent peak intensities for quantitation - resolution, peak spacing (charge states) - acquisition parameters Etc.. Guideline Draft for Proteomics Data Publication Molecular Cellular Proteomics, July 14, 2005 7

The Peptide Mass Fingerprinting (PMF) The Peptide Mass Fingerprinting Fenyo D Curr Opin Biotechnol 11:391 (2000) 8

Data Flow of Peptide Mass Fingerprinting Digesting the complete database with the enzyme in question. Calculating the peptide masses. Finding the location with most fitting masses. Problems Some masses are quite abundant Not annotated genomes could pose problem Mass Distribution of Tryptic Peptide The genomic databases were translated in all six reading frames on the fly and then digested with Trypsin (R K). 9

The MOWSE Algorithm D.N. Perkins et al., Electrophoresis 1999, 20, 3551-3567 The MOWSE Algorithm D.N. Perkins et al., Electrophoresis 1999, 20, 3551-3567 10

The MOWSE Scoring Scheme I ProteinDB BIN1 BIN2 BIN10 BIN10 10kDa ~ Protein MW ~ 100kDa In Silico Digest Peptides Generated from 90-100kDa of ProteinDB bin1 bin2 bin10 Peptide # bin10 Peptide MW 3 kda 2.9kDa Peptide MW 3kDa The MOWSE Scoring Scheme II Cell Freq. = # of peptides in specific MW / # of total peptides F-max F-max= most freq. peptide # m i,j f i,j f i,j max in column j Cell Freq. 2.9kDa 1 Peptide MW 3kDa Score 50,000 m M i, j P n Cell Value 2.9kDa F - Cell Frequency m - Cell Value Mp - 'hit' protein molecular weight Average Protein MW ~ 50000 Da Peptide MW 3kDa 11

MS-Fit Search Result Best Match (Significant Hit?) Random Match? Evaluation of the Search Result Quality Fenyo D Curr Opin Biotechnol 11:391 (2000) 12

Testing the Significant in Protein Assignment Lukas Mueller (2003) Implementation of a MS/MS identification algorithm by spectral alignment and optimization of the scoring function by genetic programming Distribution of Identification Score Distribution Curve Fitting (Gaussian or.) Significant Threshold 1 p = 0.05 Significant Threshold 2 p << 0.05 Random Hit Significant Hit? 13

The MASCOT Search Engine Probability-Based implementation of the MOWSE algorithm The total score is the absolute probability (P) that the observed score is a random evant The Mascot score is given as -10xlog(P) MASCOT PMF Search Result 14

MASCOT PMF Search Result Probability-Based Significant Threshold The Mascot score is given as -10xlog(P) P: probability that the observed score is a random event Interpretation PMF Search Result 15

MASCOT PMF Search Result Significant Threshold What p-value p is significant? The most common thresholds are 0.01 ~ 0.05. A threshold of 0.05 means you are 95% sure that the result is significant. Is 95% enough? It depends upon the cost associated with making a mistake. Examples of costs: Doing expensive wet lab validation. Making clinical treatment decisions. Misleading the scientific community. 16

The Peptide Mass Fingerprinting Fast, simple analysis High sensitivity Protein sequence database required Not suitable for mixtures or low MW proteins Peptide Identification from the MS/MS Spectra 17

Issues to be Considered * Q1 200 400 600 80010001200 m/z Theoretical Q2 Collision Cell Selection of search algorithm (s) Search Param. Selection of interrogate protein/genomic DB Correlative sequence database searching Peptide ID Q3 200 400 600 80010001200 m/z Acquired Verification of Peptide ID 200 40060080010001200 m/z Tandem mass spectrum Peak Extraction Protein ID Verification of Protein ID Apportionment of Proteins from ID peptides Evaluation of false positive rate of ID protein Degenerate peptide Degenerate peptides are more prevalent with databases of higher eukaryotes due to the presence of: related protein family members alternative splice forms partial sequences DB redundancy Single hit Peak Extraction RKKR C24 H48 N12 O4 Abundance Profile 0.7 0.6 0.5 0.4 0.3 0.2 568.39 9/4/2005 10:54:35 PM 569.39 Simulation RKKR Profile Resolution: Daltons 0.25 at 5% height Charges 1 Chrg dist 0 Ions 5334 M in Io n A b. 1e-020 M in Ions 5000 M ax Ions. 20000 0.1 570.39 571.39 572.40 573.40 0.0 567.0 567.5 568.0 568.5 569.0 569.5 570.0 570.5 571.0 571.5 572.0 572.5 573.0 573.5 m/z RKKR 9/4/2005 10:55:04 PM C24 H48 N12 O4 Centroid 568.39 Simulation RKKR Centroid 15 Resolution: Daltons 0.25 at 5% height Charges 1 10 Chrg dist 0 Ions 5334 569.39 Min Ion Ab. 1e-020 5 Min Ions 5000 Max Ions. 20000 570.40 568.01 568.76 569.77 571.02 571.40 571.77 572.41 572.79 573.40 573.78 0 567 568 569 570 571 572 573 m/z Abundance 18

How we extract the m/z info from raw spectra?? 280 260 240 100 % 90 % 369.1306 220 200 180 160 140 50 % 120 100 80 60 40 20 10 % +TO F P roduct (600.4): Experim ent 2, 10.388 m in from BS A_050726_04.wiff a=3.56206655064564900e-004, t0=5.30412452082455270e+001 M ax. 161.0 counts. 0 369.06 369.08 369.10 369.12 369.14 369.16 369.18 369.20 369.22 369.24 m/z, amu 70 65 60 55 50 45 40 35 535.1662 650.2076 30 25 276.0676 20 15 10 5 225.0400 662.6147 312.0596 761.2545 234.0831 375.1473 590.8756 632.1403 778.2572 0 200 400 600 800 1000 1200 1400 1600 1800 2000 m /z, a m u Matching Accuracy Using Different Centroid Settings 15 % Height 50 % Height 95 % Height Calibration without centroid (Calibrate by the Peak Apex) Calibration with centroid (Centroid Peak Height: 50%) (Can we use raw data??) 19

Selection of Search Algorithm MASCOT Probability Based MOWSE Scoring ASMS Workshop and User Meeting 2005 Selection of Search Algorithm SEQUEST generate a predicted spectrum for each potential peptide using a simple fragmentation model (all b and y ions have the same intensity; possible losses from b and y have a lower intensity) compute a "cross-correlation" correlation" score and find the best-matching peptide since this operation is very time-consuming, a simpler preliminary score is used to find the 500 peptides in the database that are most likely to be the correct identification 20

Selection of Search Algorithm SEQUEST Step 1: Mass spectrometry data reduction Fragment ion m/z are converted to nearest integer. (nominal mass) 10-u window around precursor ion removed All but the 200 most abundant ions are removed and remaining ions are renormalized to 100. Step 2: Peptide Mass Matching To match a pair of spectra, protein sequences are retrieved from the database which have masses (within a certain mass tolerance) matching the peptide of interest. m/z values for the predicted fragment ions of each sequence are calculated Step 3: Preliminary Scoring S p =(ΣI K )m(1+ )(1+ )/L m: number of matches (within ±1u) : 'reward' consecutive matched ion series (for example, 0.075) 'reward' of immonium ion (for example, 0.15) L: is the number of all theoretical ions of an amino acid sequence. Step 4: Cross-Correlation Analysis (Xcorr) R (E,T)=ΣX i Y i+ R Xcorr Xcorr of a candidate Value of R when =0 minus the mean of R (over the range -75< <75 and divided by 10 4 ) Selection of Search Algorithm SEQUEST - The R R (E,T)=ΣX Y b and y ions : 50 b and y ions ± 1 u (isotopic): 25 Ions with neutral loss of H 2 O, NH 3, CO and ± 1 u: 10 21

Selection of Search Algorithm Xcorr of a candidate Value of R when =0 minus the mean of R (over the range -75< <75 and divided by 10 ) R (E,T)=ΣX Y Time Consuming For a P4 3.2GHz PC, 1 spectrum takes 1-2 sec, 10,000 spectra (2.7-5.4 hr) Mascot v.s. SEQUEST Proteomics 2004, 4, 619 628 22

Selection of Search Algorithm Each search engine identifies about the same number of spectra, SEQUEST 9% 22% 4% 34% But the overlap is surprisingly small. Different search engines match different spectra. X!Tandem 19% 7% 5% Mascot Brian C. Searle, Proteome Software Inc Selection of Search Algorithm The reason that they identify different spectra is because each program has different strengths. 22% SEQUEST 9% 34% 4% considers intensities X!Tandem semi-tryptic, no neutral loss fragments 19% 7% 5% Mascot probability based scoring Brian C. Searle, Proteome Software Inc 23

Search Parameters Selection of interrogate protein/genomic DB Database EST MSDB NCBInr OWL Random SwissProt Comment EST divisions of Genbank, (currently EST_human, EST_mouse, EST_others) Comprehensive, non-identical protein database Comprehensive, non-identical protein database Non-identical protein database (obsolete) Random sequences for verifying scoring statistics High quality, curated protein database DB Cross-reference DB Redundancy DB Accuracy DB Size 24

Instrument Type Fragmentation Pattern http://www.matrixscience.com/help/search_field_help.html#instrument Interpretation of Search Result SEQUEST: Xcorr > 2.0 C n > 0.1 sort by search score MASCOT: Score > 45 Threshold? Use of conservative scoring and filtering thresholds reduces number of missassigned peptides and proteins, but does not eliminate false positives 25

Probability Accuracy with Different Algorithms 1 Mascot Calculated Probability 0.8 0.6 0.4 0.2 X!Tandem SEQUEST Combination Model Combination Model X!Tandem Model SEQUEST Model Mascot Model Ideal 0 0 0.2 0.4 0.6 0.8 1 Actual Probability Brian C. Searle, Proteome Software Inc Amplification of False Positive Error Rate from Peptide to Protein Level 5 correct (+) + + + + + Peptide 1 Peptide 2 Peptide 3 Peptide 4 Peptide 5 Peptide 6 Peptide 7 Peptide 8 Peptide 9 Peptide10 Prot Prot A Prot B Prot Prot Prot Prot in the sample (enriched for multi-hit proteins) not in the sample (mostly enriched for single hits ) Peptide Level: 50% False Positives Protein Level: 71% False Positives Alexey Nesvizhskii, Institute of Systems Biology 26

Score Distribution Small and Large Scale 1~10 proteins >100 proteins ASMS Workshop and User Meeting 2005 Large Scale Proteomics Data Analysis incorrect hit --- Normal Distribution/Possein Fitting Mascot X!Tandem correct hit--- 27

Difficulty Interpreting Protein Identification Different search score thresholds used to filter data Unknown and variable false positive error rates No reliable measures of confidence Testing the Significance in Protein Assignment Lukas Mueller (2003) Implementation of a MS/MS identification algorithm by spectral alignment and optimization of the scoring function by genetic programming 28

Search Against Normal and Decoy DB Normal Swissprot DB Decoy Swissprot DB X The Decoy DB Decoy DB (Randomized or Reversed) Randomized PMF Applicable Peptide number changed after randomized (alter the trial number) Degree of Random Reversed Not suitable for PMF Real KQTALVELLKH K.QTALVELLK.H MW=1013 Reversed HKLLEVLATQK K.LLEVLATQK.- MW=1013 Not suitable for no enzyme restricted =-18 29

Search Against Normal + Decoy DB Threshold X Large Scale Proteomics Data Analysis incorrect hit --- Normal Distribution/Poisson Fitting Mascot X!Tandem correct hit--- 30

ID Score Distribution of Identified Peptides entire dataset: Spectrum Peptide Score Spectrum 1 LGEYGH 4.5 Spectrum 2 FQSEEQ 3.4 Spectrum 3 FLYQE 1.3 Spectrum N EIQKKF 2.2 MS/MS spectrum best match database search score Keller at al. Anal. Chem. 2003 Statistical Model of Peptide Prophet entire dataset: Spectrum Peptide Score Spectrum 1 LGEYGH 4.5 1.0 Spectrum 2 FQSEEQ 3.4 0.97 Spectrum 3 FLYQE 1.3 0.01 Spectrum N EIQKKF 2.2 0.3 probability incorrect incorrect --- p=0.5 correct correct --- unsupervised learning EM mixture model algorithm learns the most likely distributions among correct and incorrect peptide assignments given the observed data Keller at al. Anal. Chem. 2003 31

Discriminating Power of Peptide Prophet SEQUEST thresholds (from literature) probability model Sensitivity: fraction of all correct results passing filter Error Rate: fraction of all results passing filter that are incorrect Ideal Spot Improved discrimination: more identifications (for the same error rate) Keller at al. Anal. Chem. 2003 MS-Based Proteomics: 2D v.s. shotgun Alexey Nesvizhskii, Institute of Systems Biology 32

Analysis of Complex Protein Mixture Protein Assignment from Identified Peptide(s) Alexey Nesvizhskii, Institute of Systems Biology 33

Protein Assignment from Identified Peptide(s) X X X Anal. Chem.2003, 75,4646-4658 Protein Assignment from Identified Peptide(s) Alexey Nesvizhskii, Institute of Systems Biology 34

Evaluation of Protein Probability Anal. Chem.2003, 75,4646-4658 Apportionment of Degenerate Peptides using Peptide Prophet Alexey Nesvizhskii, Institute of Systems Biology 35

Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology 36

Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology 37

Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology Example for the Protein Apportionment Alexey Nesvizhskii, Institute of Systems Biology 38

Output of ProteinProphet Protein Apportionment with MASCOT 39

Protein Apportionment with MASCOT Bold Red X Light Red Black Removing Degenerated Peptides from MASCOT Search Result 40

MS/MS Database Search Easily automated for high throughput More Accurate then PMF Can get matches from marginal data Time consuming No enzyme restriction Many variable modifications Large database Large dataset MS/MS is peptide identification Proteins by inference Protein sequence database required Protein ID by Sequence Query 41

Protein ID by Sequence Query Mass values combined with amino acid sequence or composition data Picking out a good tag From: John Cottrell, Matrix Science 42

Performing a Sequence Query Peptide Mass Partial sequence Sequence Query Result 43

Sequence Query Result Sequence Query with Different Possibilities 1890.2 tag(1004.1, Seq, 1548.5) Seq =LSADTG or LSAMG or LSAFG =LSA[DT M F]G 44

Error Tolerant Sequence Tag A sequence tag can match to a peptide despite there being an unsuspected modification or point mutation by allowing the mass values to 'float'. For example, take the peptide GVQVETISPGDGR, MH+ = 1314.7 and the (b ion) sequence tag: 1314.7 tag(614.3,tisp,911.5) If there was an unsuspected modification on the N-terminal side of the tag, which increased the mass by 100, this would affect both the fragment ion mass values in tandem. The tag interpreted from the spectrum would become: 1414.7 tag(714.3,tisp,1011.5) On the other hand, if the unsuspected modification was on the C-terminal side of the tag, or if the fragment ions were y series ions, the fragment ion mass values would be unchanged, and the interpreted tag would be: 1414.7 tag(614.3,tisp,911.5) Error Tolerant Sequence Query 45

Sequence Tag Rapid search times Error tolerant Match peptide with unknown modification or SNP Requires interpretation of spectrum Usually manual, hence not high throughput Tag has to be called correctly Although ambiguity is OK 2060.78 tag(977.4,[q K][Q K][Q K]EE,1619.7) From: John Cottrell, Matrix Science Sequence Tag Programs on the Web Sequence Tag / Sequence Homology MultiTag Sunyaev, S., et. al., MultiTag: Multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry, Anal. Chem. 75 1307-1315 (2003). GutenTag Tabb, D. L., et. al., GutenTag: High-throughput sequence tagging via an empirically derived fragmentation model, Anal. Chem. 75 6415-6421 (2003). MS-Blast Shevchenko, A., et al., Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time of flight mass spectrometry and BLAST homology searching, Analytical Chemistry 73 1917-1926 (2001) FASTS, FASTF Mackey, A. J., et al., Getting More from Less - Algorithms for rapid protein identification with multiple short peptide sequences, Molecular & Cellular Proteomics 1 139-47 (2002) OpenSea Searle, B. C., et al., High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results, Anal. Chem. 76 2220-30 (2004) CIDentify Taylor, J. A. and Johnson, R. S., Sequence database searches via de novo peptide sequencing by tandem mass spectrometry, Rapid Commun. Mass Spectrom. 11 1067-75 (1997) Sequence queries can be extremely powerful. These references are a good starting point if you are interested in learning more about the potential of combining mass and sequence information. From: John Cottrell, Matrix Science 46

De Novo Sequencing De Novo Sequencing Current Opinion in Structural Biology 2003, 13:595 601 47

David W. Speicher, The Wistar Institute Dipeptide MH+ ion Table (by Karl Clauser) Gly + Gly = 114.043 u and Asn = 114.043 u Ala + Gly = 128.059 u and Gln = 128.059 u and Lys = 128.095 u Gly + Val = 156.090 u and Arg = 156.101 u Ala + Asp = Glu + Gly = 186.064 and Trp = 186.079 u Ser + Val = 186.100 u and Trp = 186.079 u 48

De Novo Sequencing (M+2H + ) 2+ is most commonly observed when using tryptic peptide Doubly charged spectra give easiest spectra to interpret manually David W. Speicher, The Wistar Institute De Novo Sequencing - Application Analyze high quality spectra left after database matching Obtain sequence for proteins not in database Obtain experience with MS/MS data for manual validation of database matching David W. Speicher, The Wistar Institute 49

De Novo Sequencing - Factors 1. Neutral losses H 2 O (-18) y and b-ions from S,T,D,E NH 3 (-17) y and b-ions from R,K, (N,Q) C=O (-28) a-ions from small b-ions 2. Low mass cut-off on ion traps = 28%of precursor ion 3. Bond on N-terminal size of P, very labile Large y-ion ending in P Internal fragments starting at P + several a.a. Minor in ion trap, substantial on triple quads and QTof s 4.Immoniumions low mass ions indicative of presence of a certain type of a.a., BUT lack of observed immoniumion does not prove absence of that a.a. not useful for ion traps due to low mass limit 5.Isobaric amino acids & amino acid combinations L, I Q, K = 0.03 Da David W. Speicher, The Wistar Institute Residue Massed of Amino Acids (M+2H + ) 2+ = (ΣResidue Mass + 18 +2)/2 50

Common Immonium Ions Residue Massed of Amino Acids (N-term)KVPQVSTPTLVEVSR (15 Residures, MW=1638.93) b-ion series b1 m/z=?, b14 m/z=? y-ion series y1 m/z=?, y14 m/z=? b-ion series b1=128(k)+1(h) y-ion series y1=156(r)+17(ho)+2(h) R NH CH CO b 1 y 1 R NH 2 CH C O + R O NH 3 + CH C OH 51