Error Tolerant Searching of Uninterpreted MS/MS Data



Similar documents
Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Searching Nucleotide Databases

Tutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF

When you install Mascot, it includes a copy of the Swiss-Prot protein database. However, it is almost certain that you and your colleagues will want

Effects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests

MASCOT Search Results Interpretation

Aiping Lu. Key Laboratory of System Biology Chinese Academic Society

Global and Discovery Proteomics Lecture Agenda

ProteinScape. Innovation with Integrity. Proteomics Data Analysis & Management. Mass Spectrometry

Application Note # LCMS-81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests

Introduction to Database Searching using MASCOT

In-Depth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates

Mascot Search Results FAQ

ProSightPC 3.0 Quick Start Guide

AB SCIEX TOF/TOF 4800 PLUS SYSTEM. Cost effective flexibility for your core needs

泛 用 蛋 白 質 體 學 之 質 譜 儀 資 料 分 析 平 台 的 建 立 與 應 用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics

High Throughput Proteomics

Quantitative proteomics background

Laboration 1. Identifiering av proteiner med Mass Spektrometri. Klinisk Kemisk Diagnostik

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem

MS Amanda Standalone User Manual

Introduction to Proteomics 1.0

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics

Session 1. Course Presentation: Mass spectrometry-based proteomics for molecular and cellular biologists

Pinpointing phosphorylation sites using Selected Reaction Monitoring and Skyline

ProteinPilot Report for ProteinPilot Software

Tackling the data analysis challenge for characterisation of biotherapeutics

Tutorial 9: SWATH data analysis in Skyline

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping

SimGlycan Software*: A New Predictive Carbohydrate Analysis Tool for MS/MS Data

Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets

Mass Spectra Alignments and their Significance

Shotgun Proteomic Analysis. Department of Cell Biology The Scripps Research Institute

Protein Prospector and Ways of Calculating Expectation Values

MRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics

Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays

Interpretation of MS-Based Proteomics Data

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

Workshop IIc. Manual interpretation of MS/MS spectra. Ebbing de Jong. Center for Mass Spectrometry and Proteomics Phone (612) (612)

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam

Introduction to Proteomics

Learning Objectives:

Bioinformatics Resources at a Glance

Mass Spectrometry Based Proteomics

MaxQuant User s Guide Version

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring

Application Note # LCMS-66 Straightforward N-glycopeptide analysis combining fast ion trap data acquisition with new ProteinScape functionalities

Methods for Protein Analysis

M.Sc. in Nano Technology with specialisation in Nano Biotechnology

Scaling from 1 PC to a super computer using Mascot

SimGlycan Software*: A New Predictive Carbohydrate Analysis Tool for MS/MS Data

Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench

DNA, RNA, Protein synthesis, and Mutations. Chapters

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture-11 Enzyme Mechanisms II

Development of computational methods for analysing proteomic data for genome annotation

PeptidomicsDB: a new platform for sharing MS/MS data.

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

Application Note # MT-90 MALDI-TDS: A Coherent MALDI Top-Down-Sequencing Approach Applied to the ABRF-Protein Research Group Study 2008

Industry Perspective: Advantages of Open Access and Walkup LC/ MS Supporting Protein Drug Discovery and Development

NUVISAN Pharma Services

USP's Therapeutic Peptides Expert Panel discusses manufacturing processes and impurity control for synthetic peptide APIs.

MUTATION, DNA REPAIR AND CANCER

Database Searching Tutorial/Exercises Jimmy Eng

Bob Jesberg. Boston, MA April 3, 2014

A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions

Activity 7.21 Transcription factors

Name Class Date. Figure Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Mass Frontier 7.0 Quick Start Guide

Overview. Triple quadrupole (MS/MS) systems provide in comparison to single quadrupole (MS) systems: Introduction

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data

Statistical Analysis Strategies for Shotgun Proteomics Data

La Protéomique : Etat de l art et perspectives

Control of Gene Expression

Quantitative mass spectrometry in proteomics: a critical review

PROTEIN SEQUENCING. First Sequence

Quantification of Multiple Therapeutic mabs in Serum Using microlc-esi-q-tof Mass Spectrometry

AP BIOLOGY 2009 SCORING GUIDELINES

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS

AP BIOLOGY 2010 SCORING GUIDELINES (Form B)

ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes

Building innovative drug discovery alliances. Evotec Munich. Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs

Proteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure enzymes control cell chemistry ( metabolism )

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Already said. Already said. Outlook. Look at LC-MS data. A look at data for quantitative analysis using MSight and Phenyx. What data for quantitation?

Mascot Integra: Data management for Proteomics ASMS 2004

Transcription and Translation of DNA

Lecture 8. Protein Trafficking/Targeting. Protein targeting is necessary for proteins that are destined to work outside the cytoplasm.

a. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled

Transcription:

Error Tolerant Searching of Uninterpreted MS/MS Data 1

In any search of a large LC-MS/MS dataset 2

There are always a number of spectra which get poor scores, or even no match at all. 3

Sometimes, this is because the spectra are empty, or contain little more than noise. However, some of the spectra may contain clear sequence ion ladders at good signal to noise 4

Why do we fail to get a match? Things that shouldn t happen: Incorrect determination of precursor charge Underestimated mass measurement error Things that often happen: Enzyme non-specificity Unsuspected chemical & post-translational modifications Peptide sequence not in the database. Why do the good quality spectra fail to match? Some problems are easily remedied. For example, the precursor charge may have been called incorrectly, or the mass tolerance estimate may be over-optimistic. Enzyme non-specificity is very common, and can be addressed with a noenzyme search, but this is very time consuming. The two other causes of failure to match are more difficult to deal with: Unsuspected modifications and variations in the primary sequence. 5

913.2 1278.3 T A G Mann, M., Wilm, M., Anal Chem 1994, 66, 4390-9 The best tool available for finding matches when there are unsuspected modifications or variations in the primary sequence is the error tolerant sequence tag, developed at EMBL. The standard tag combines an interpreted tag with the flanking fragment ion masses, the peptide mass, and the enzyme specificity. By relaxing one or more of these constraints, the tag can accommodate enzyme non-specificity and / or unexpected mass differences to one side or the other of the tag. 6

Error Tolerant Sequence Tag Rapid search times Requires good quality data Only a filter: no score Results need interpretation. The error tolerant sequence tag is a very fast search, but the data quality has to be high enough to interpret a reliable tag. Also, the error tolerant sequence tag is less specific, and it becomes more likely that multiple hits will be found. Since there is no score to indicate which hit is the best match, and whether the best match is significant, the user has to reconcile the spectrum to each of the candidate sequences, and decide which match to accept. To address these limitations, we decided to investigate an error tolerant approach using Mascot to search uninterpreted data 7

Error Tolerant Search of Uninterpreted MS/MS Data Enzyme non-specificity no-enzyme search In Silico digestion of NCBI nr (850,000 entries) 6000 tryptic limit peptides 1500 Da ± 0.5 2,400,000 non-specific peptides 1500 Da ± 0.5 Searching a complete sequence database with no enzyme specificity takes much longer than the same search with (say) tryptic specificity. This is because there are between 100 and 1000 times as many peptides to be tested. 8

Error Tolerant Search of Uninterpreted MS/MS Data Unsuspected chemical & P-T modifications Iterate through comprehensive list. For unsuspected modifications, it would be nice to try all possible mass values at all residue locations. Unfortunately, this doesn't work because all specificity is lost. We have chosen to iterate through a very comprehensive list of modifications 9

Chemical & P-T Modifications The longer the list, the better. Unfortunately, there is no standard database of modifications that is suitable for MS use, so we had to compile this list ourselves 10

Error Tolerant Search of Uninterpreted MS/MS Data Peptide sequence not in the database single base substitutions - Yes single base insertions & deletions - No (frame shift) multiple base substitutions, insertions & deletions - No (de novo). Virtually all of the protein database entries come from translations of DNA or RNA. Hence, for variations in the primary sequence, it is essential to consider changes at the NA level. We choose to allow all possible single NA substitutions, but not insertions or deletions, because these give rise to frame shifts. We also draw the line at multiple variations in a single peptide. These must be handled by de novo sequencing 11

Error Tolerant Search of Uninterpreted MS/MS Data Single base substitutions: Back translate AA to codons Perform all possible single base substitutions Translate resulting codons to AA. The procedure to generate a substitution matrix for all possible single base substitutions 12

Primary Sequence Variations A R N D C Q E G H I L K M F P S T W Y V A 1 1 1 1 1 1 1 R 1 1 1 1 1 1 1 1 1 1 1 1 N 1 1 1 1 1 1 1 D 1 1 1 1 1 1 1 C 1 1 1 1 1 1 Q 1 1 1 1 1 1 E 1 1 1 1 1 1 G 1 1 1 1 1 1 1 1 H 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 L 1 1 1 1 1 1 1 1 1 1 K 1 1 1 1 1 1 1 M 1 1 1 1 1 1 F 1 1 1 1 1 1 P 1 1 1 1 1 1 1 S 1 1 1 1 1 1 1 1 1 1 1 1 T 1 1 1 1 1 1 1 1 W 1 1 1 1 1 Y 1 1 1 1 1 1 V 1 1 1 1 1 1 1 1 This matrix is considerably sparser than all possible residue substitutions 13

Error Tolerant Search of Uninterpreted MS/MS Data Procedure Perform standard search Select one or more protein hits With no enzyme specificity, iterate through extensive list of modifications + substitution matrix. The error tolerant search is a second pass search 14

Here we see the matches to trypsin autolysis products from a standard search of an LC-MS/MS dataset from the analysis of a human cell lysate. 15

And here are the matches from the second pass, error tolerant search. The number of matches has increased from 10 to 17. Only those matches that were as good as or better than the original search are reported. Query 74 is a simple, non-specific cleavage product Query 107 shows acetylation at the N-terminus plus a mass increase of 16 at one of the N-term serines. It is not easy to find a reasonable mechanism to account for either an increase of 16 or an increase of 58 (42+16) at the N-term of this peptide 16

Query 270 is assigned to NINVVEGNEQFISASK with a biotinylated N- term. However, this sample was not biotinylated, so this assignment cannot be correct. In fact, we get exactly the same mass increment by moving the cleavage point two residues further towards the N-terminus and adding a pyro-glu modification. This is a good illustration of how difficult it is for the report to always give the correct assignment 17

There are several matches to the same peptide LGEDNINVVEGNEQFISASK with a variety of modifications. The first, Q->K, is identical to no modification. Although the match to Query 296 is shown as G->V, we would prefer to assign the mass change of 42 Da to N-term acetylation. 18

Looking at the details for this match 19

We see that the fragment ions have a quantitative neutral loss of water. Is this behaviour well known for acetylated Leu? 20

This suggests that a possible assignment of +24 Da for query 293 is prompt loss of water: 42-18=24??? 21

Finally, a slightly artificial illustration of finding a point mutation in the primary sequence. We can lose the match to query 1 by changing the taxonomy to 'green plants', which excludes both entries containing the correct peptide. If we now do an error tolerant search, the correct match is recovered via the substitution A->T 22

Error Tolerant Search of Uninterpreted MS/MS Data Successfully locate mass differences Difficult to report correct assignment Limited to proteins which have at least one good peptide match not very useful for (say) MHC peptides Next step: hybrid of Sequence Tag & MS/MS Ions Search? To summarise, the error tolerant search of uninterpreted MS/MS data is a powerful way of finding additional peptide matches. However, it is very difficult to provide a chemically credible assignment for many of the observed mass differences. Some expertise in mass spectrometry and protein chemistry is required to review and correct the reported assignments. 23

24