Proteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap The Minnesota http://www.mass.msi.umn.edu/
Proteomics workflow Trypsin Protein Peptides RP HPLC MS/MS Ion Chromatogram Mass Spectrum And Tandem Mass Spectrum Search Against Database Protein Identification
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Orbitrap and precision Proteomics. Data formats. data processing and effects Proteomics workflow - search algorithms - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
Precision proteomics Precision proteomics: The case for high resolution and high mass accuracy ; PNAS (2008) Mann and Kelleher. Need for newer approaches to assign peptides by taking advantage of increase in mass resolution and accuracy of new mass spectrometers.
Accuracy vs Precision
Accuracy and Precision
Importance of high mass accuracy measurements Low Mass Accuracy: 1500 ppm (LTQ) 631 ± 1 630.0 630.1 630.2 630.3... Upon searching the Mus database: 197 proteins have a peptide in the range of 630-632 High Mass Accuracy: 10 ppm (Orbitrap) 631.345 ± 0.006 631.339 630.341 630.343 3 proteins have a peptide in the range of 631.339-631.351
Orbital Trap Mass Analyzer Axial Frequency is Independent of Initial Energy, Angle, And Position of Ions Hu et al. (2005) J Mass Spectrom 40: 430-43 Frequency is Similar to a Simple Harmonic Oscillator ω (m/z)1/2
University of Minnesota Center for Proteomics and Mass Spectrometry currently has: 1 LTQ-OrbitrapXL (ThermoScientific) It will be upgraded to a Velos-Orbitrap (ThermoScientific) within a few months
Orbitrap LTQ-Orbitrap has proven to be a tremendous advance for shotgun proteomics, combining high resolving power, mass accuracy, and reliability in a relatively compact form. The high-resolution part of the instrument consists of an Orbitrap analyzer, which is a new type of mass spectrometer developed by Alexander Makarov. The LTQ offers high sequencing speed, high sensitivity, and very robust performance but low resolving power. The use of LTQ-Orbitrap has lead to higher quality datasets and reduced false positive peptide identifications.
Orbitrap Peptide fragmentation is mostly performed in low-resolution but very sensitive and fast linear ion traps. The Orbitrap is capable of very high mass accuracy because of the axial motion and oscillation of ions along the central spindle. From http://cbsu.tc.cornell.edu
Orbitrap velos Olsen J V et al. Mol Cell Proteomics 2009;8:2759-2769 Schematic of the LTQ Orbitrap Velos MS instrument. A, the stacked ring ion guide (S-Lens) increases the ion flux from the electrospray ion source into the instrument by a factor 5 10. B, the dual linear ion trap design enables efficient trapping and activation in the high-pressure cell (left) and fast scanning and detection in the low pressure cell (right). C, the combo C-trap and HCD collision cell with an applied axial field with improved fragment ion extraction and trapping capabilities. 2009 by American Society for Biochemistry and Molecular Biology 2009 Regents of the University of Minnesota. All rights reserved.
Conclusions 1. High Mass Accuracy Measurements Facilitate Database Matching and Increase Peptide Identification Confidence 2. Orbitrap Mass Spectrometers Have Capability of Performing Robust and Accurate Quantitative Proteomics Experiments
Proteomics workflow Orbitrap Mass spectral data. (.raw) Processing Search Algorithm Sta9s9cal valida9on of Protein Iden9fica9on Visualiza9on Descrip9ve Sta9s9cs Pathway Analysis
Mass Spectrometers & data formats Thermofinnigan AB Sciex Waters Bruker Xcalibur /.raw Analyst /.wiff ;.t2d Masslynx /.raw.baf X! tandem OMSSA Mascot Sequest ProteinPilot
Orbitrap : Data formats mzxml : XML (extensible Markup Language) based common file format for proteomics mass spectrometric data. mzml : Joinf format developed from mzdata and mzxml formats. Sequest (DTA) Files : Convert.RAW to.dta file using extract_msn.exe utility. Name : Myoglobin_digest.0012.0015.2.dta File starts with : 1999 2 Mascot Generic Format (MGF) : ASCII format used as an input for Mascot searches (also OMSSA, X!tandem, Phenyx, ProteinPilot, Myrimatch, TagIdent, etc.). The precursor peptide mass is an observed m/z value, from which relative molecular mass is calculated using the prevailing charge state. PEPMASS=1000 CHARGE=2+ Relative molecular mass M r is 1998.
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Data formats. data processing and effects - search algorithms - de novo analysis : Peaks - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
raw data and peaklist enovatia.com Raw Data Peaks with wider distribution (Area under curve) Additional information (Noise Peaks) Peak detection Noise removal Baseline correction Monoisotoping Charge state PeakList (Processed) Peaks (centroid) with intensity values. Removal of noise peaks.
Orbitrap :Processing and effects Average ppm and Standard deviation improves when MaxQuant processed files are used.
Orbitrap :Processing and effects Peak detection Noise removal Baseline correction Monoisotoping Charge state
raw data conversion tools XRawfile library from ThermoFinnigan Xcalibur software. mzxml.raw ReAdW MSConvert ProteoWizard mzxml Raw2MSM MSM http://tools.proteomecenter.org/wiki/index.php?title=software:readw http://proteowizard.sourceforge.net/ http://tinyurl.com/raw2msm
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Data formats. data processing and effects - search algorithms - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
Proteomics workflow Orbitrap Mass spectral data. (.raw) Processing Search Algorithm
search algorithm Nature Methods - 4, 787-797 (2007)
search algorithm Mass spectral Mass data. Spectrometry Search algorithm Sequest X!tandem OMSSA MaxQuant RDC : sdvlapp32, PC2 and PC3 in SDVL. ProteinPilot CGL 138. Mascot https://sequest7.msi.umn.edu/mascot
Protip / TINT John Chilton Mark Nelson https://tropix.msi.umn.edu/
LTQ-iQuant (itraq Quantication software) in Protip. Getiria Onsongo
Protip Raw Data from Orbitrap msconvert mzxml format Mgf format dta format 14 node SEQUEST cluster X!TANDEM search OMSSA search SEQUEST search MASCOT search Scaffold Analysis Scaffold Viewer Powered by
Protip https://tropix.msi.umn.edu/
MAXQUANT ORBITRAP DATASET Raw Data MAXQUANT QUANT module MSM Mascot.dat file Protein Summary with informa=on that can be used for Pathway Analysis, Func=onal Enrichment, Localiza=on predic=on etc. IDENTIFY module Protein Identification List. Peptide or MS/MS Summary
Improving Mass Accuracy
References / Weblinks MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Cox J, Mann M. Nat Biotechnol. 2008; 26(12):1367-1372. http://www.ncbi.nlm.nih.gov/pubmed/19029910 A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Cox J, Matic I, Hilger M, Nagaraj N, Selbach M, Olsen JV, Mann M. Nat Protoc. 2009;4 (5):698-705. http://www.ncbi.nlm.nih.gov/pubmed/19373234 Computational principles of determining and improving mass precision and accuracy for proteome measurements in an Orbitrap. Cox J, Mann M. J Am Soc Mass Spectrom. 2009;20(8):1477-1485. http://www.ncbi.nlm.nih.gov/pubmed/19553133 Precision proteomics: the case for high resolution and high mass accuracy. Mann M, Kelleher NL. Proc Natl Acad Sci U S A. 2008 Nov 25;105(47):18132-8. http://www.ncbi.nlm.nih.gov/pubmed/18818311 http://www.maxquant.org http://groups.google.com/group/maxquant-list http://tinyurl.com/maxquant-intro http://tinyurl.com/maxquant-tutorial
MaxQuant : Current availability Currently MaxQuant v1.0.13.13 is available on PC1 and PC3 You can login into these machines using your MSI password. Please store your raw files in U: drive Transfer your.par files to the C: drive after Quant analysis.
Search algorithm PROTEINPILOT https://netfiles.umn.edu/users/pjagtap/proteomics_webinars_at_msi./proteinpilot_web_modules.htm
PROTEINPILOT MGF peaklist generated from an Orbitrap dataset can be used as an input for ProteinPilot searches. Performs normalization and quantification of datasets acquired in HCD mode. Multiple features such as modifications, semitryptic and non-enzymatic digestion are detected simultaneously. Robust FDR estimation. https://netfiles.umn.edu/users/pjagtap/proteomics_webinars_at_msi./proteinpilot_web_modules.htm
Introduction to ProteinPilot ( LeeAnn Higgins and Pratik Jagtap). https://netfiles.umn.edu/users/pjagtap/proteomics_webinars_at_msi./proteinpilot_web_modules.htm Module One : Introduction to ProteinPilot (LeeAnn Higgins) http://mediamill.cla.umn.edu/mediamill/embed/66243 Time : 23 minutes Module Two : ProteinPilot Outputs http://mediamill.cla.umn.edu/mediamill/embed/68879 Time : 27 minutes Module Three : ProteinPilot Peptide Summary (LeeAnn Higgins) http://mediamill.cla.umn.edu/mediamill/embed/68701 Time : 21 minutes Module Four : ProteinPilot Protein Summary and Quantitation. http://mediamill.cla.umn.edu/mediamill/embed/68840 Time : 15 minutes Module Five : False Discovery Rate Analysis and ProteinPilot FDR output. http://mediamill.cla.umn.edu/mediamill/embed/68865 Time : 25 minutes Module Six : Beyond ProteinPilot. (LeeAnn Higgins) http://mediamill.cla.umn.edu/mediamill/embed/68866 Time : 15 minutes
acquisition mode and data analysis. Data Acquisition Conversion Tool Optimal use Software Centroid Mode Any Identification Any Profile Mode Raw2MSM yields better results MS-based quantification MaxQuant CID Any Identification Any ProteinPilot and HCD Any itraq Quantification LTQ-iQuant PQD Any itraq Quantification LTQ-iQuant ETD Any Modifications and c and z ions OMSSA and COMPASS
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Orbitrap and precision Proteomics. Data formats. data processing and effects Proteomics workflow - search algorithms - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
Statistical Validation of Peptide and Protein Identification Orbitrap Mass spectral data. (.raw) Processing Search Algorithm Sta9s9cal valida9on of Protein Iden9fica9on
False Discovery Rate Analysis
# of Matches FDR Global Threshold 1 : 99 % correct; 1% incorrect. Threshold 2 : 95 % correct; 5% incorrect. Threshold 3 : 90 % correct; 10% incorrect. Local Threshold 1 : 95 % correct; 5% incorrect. Threshold 2 : 60 % correct; 40% incorrect. Threshold 3 : 20 % correct; 80% incorrect. Discriminant Score Slide Courtesy : Brian Searle (Proteome Software) Choi H, Ghosh D, Nesvizhskii A. J. Proteome Res. 2008 :7(1):286
Combining results from multiple search algorithms increases the confidence and number of peptide and protein identifications. HUMAN DATASET 8400 7200 8162 6554 6962 7443 6000 5522 5137 5486 4800 3600 2400 1200 401 370 411 491 441 441 462 0 Sequest X! tandem Mascot All Together Sequest + Mascot Sequest + X! tandem X! tandem + Mascot
Scaffold : Current availability Currently Scaffold (v 3.03) is available on app1.msi.umn.edu You can Remote login into app1.msi.umn.edu using your MSI password. Please store your files on U: drive Scaffold v 3.03 is also available on ProTIP.
ProteoIQ Sta9s9cal valida9on of pep9de and protein iden9fica9ons.
ProteoIQ: Current availability Currently ProteoIQ (v 2.1.03) is available on app1.msi.umn.edu You can Remote login into app1.msi.umn.edu using your MSI password. Please store your files on U: drive
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Orbitrap and precision Proteomics. Data formats. data processing and effects Proteomics workflow - search algorithms - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
Visualization, Descriptive Statistics and Pathway analysis Mass spectral Mass data. Spectrometry Search algorithm De novo Tools. Sta9s9cal valida9on of pep9de and protein iden9fica9ons. Visualiza9on Descrip9ve Sta9s9cs Pathway analysis
Spectral Visualization Scaffold Viewer MaxQuant Viewer ProteoIQ ProteinPilot SeeMS Spectra Viewer
The ProteinPilot Descriptive Statistics Template (PDST) Tool Description: A tool for users of ProteinPilot Software that provides a large amount of additional analyses not provided directly by the software. Features: Describe and assure the quality of identification and quantitation results. Optimize acquisition effectiveness with detailed metrics on acquisition redundancy, chromatography, mass accuracy, etc. Characterize digestion, quantitation labeling efficiency, minor artifact modifications, etc. Detect when important features are missing from search space. Provide basic estimation of quantitation false discovery rates for workflows using itraq reagents. Slide content provided by Sean L. Seymour, AB Sciex
PDST : Descriptive Statistics Number of proteins searched. FDR single summary Peptide Summary Protein Summary Number of spectra ProteinPilot Outputs PDST Excel Template Quality Control of Identification and Quantitation. Detailed metrics on acquisition redundancy, chromatography, mass accuracy, etc. Characterize digestion, labeling efficiency, minor artifact modifications, etc. Provide basic estimation of quantitation false discovery rates for workflows using itraq reagents.
PDST : Descriptive Statistics Slide content provided by Sean L. Seymour, AB Sciex
PDST : Descriptive Statistics Slide content provided by Sean L. Seymour, AB Sciex
Pathway Analysis Software : Ingenuity Pathway Analysis
Pathway Analysis Database for Annotation, Visualization and Integrated Discovery (DAVID ) http://david.abcc.ncifcrf.gov/ http://www.nature.com/nprot/journal/v4/n1/abs/nprot.2008.211.html
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. Orbitrap and precision Proteomics. Data formats. data processing and effects Proteomics workflow - search algorithms - Statistical validation of protein identification - Visualization - Descriptive Statistics : PDST (ProteinPilot outputs) - Pathway analysis Applications of workflows : Horses for Courses.
Choosing the workflow outputs to maximise identifications : Horses for courses
Choosing the workflow outputs to maximise identifications : Horses for courses
Orbitrap :Processing and effects
Protein identification : Orbitrap datasets
Conclusions Multiple search algorithms and statistical validation tools are available for analysis. (including Quantitative Analysis Tools) Various visualization, descriptive statistics and pathway tools are available. Outputs from various workflows can be creatively merged to harness the best features of each workflow.
Sequest X! tandem http://sitemaker.umich.edu/iwsmoi http://www.mass.msi.umn.edu/ Questions? pratik@msi.umn.edu