Proteomics software available in the public domain. Pratik Jagtap Minnesota Supercomputing institute




Two-Dimensional gel electrophoresis (pI, Mw). Proteins are resolved first by their isoelectric point (pI, using isoelectric focusing) and then by molecular weight (Mw, using SDS-PAGE). Gels are compared, and differentially expressed proteins are excised and identified.

Proteomics Fifteen Years Ago

Proteomics Fifteen Years Ago: Mass Spectrometry Data Extraction. Search algorithm. Analysis software that correlates the protein ID to the excised gel spot.

Two-Dimensional gel electrophoresis (pI, Mw). 2DGE limitations: high molecular weight proteins, low molecular weight proteins, proteins with extreme isoelectric points, and membrane proteins were underrepresented in the analysis.

Multi-Dimensional Protein Identification Technology

Proteomics workflow: Protein -> Peptide -> Fragmentation -> Mass spectrum -> Search against database.

mass spectrometry

Mass Spectrometers & data formats
Vendor software / raw formats: Thermo Finnigan Xcalibur / .raw; Life Technologies Analyst / .wiff, .t2d; Waters MassLynx / .raw; Bruker .baf
Open formats: mzXML, pepXML, mzML, mzData, protXML
Search program formats: Sequest .dta, .out; ProteinPilot .t2d, .group; X! Tandem .xml; OMSSA .xml, .omx; Mascot .mgf, .dat

Proteo-Informatics: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Statistical validation of peptide and protein identifications. Quantitative Tools. Targeted Proteomics. Spectral Matching. Data Dissemination.

Mass Spectrometry Data Extraction.
ReAdW (http://www.ionsource.com/functional_reviews/readw/t2x_update_readw.htm): Converts Xcalibur .raw files to the universal mzXML format.
T2D Extractor (https://www.prime-sdms.org/primeinstallationsite/msviewer/t2dextractor.zip): A tool that can access the Applied Biosystems MALDI-TOF/TOF 4700 and 4800 database and extract T2D files as well as peak lists. It can be used to extract individual spectra, runs, or entire spotsets. MS/MS peak lists are provided in .mgf format. Runs on the Java 1.5 platform.
LCMS Peaklist Extractor: Batch-mode tool for extracting concatenated .mgf peak list files.
Quantitation Extractor: Batch-mode tool for extracting areas for peaks in MS/MS spectra.

Mass Spectrometers & data formats: Data Conversion.
Vendor software / raw formats: Thermo Finnigan Xcalibur / .raw; Life Technologies Analyst / .wiff, .t2d; Waters MassLynx / .raw; Bruker .baf
Open formats: mzXML, pepXML, mzML, mzData, protXML
Search program formats: Mascot .mgf, .dat; Sequest .dta, .out; X! Tandem .xml; OMSSA .xml, .omx

Mass Spectrometry Data Conversion.
mzxml2other (http://www.proteomecommons.org/current/522/): Converter from mzXML to Sequest .dta, Mascot generic (.mgf) and Micromass .pkl formats.
Peak List Conversion Utility, Java Web Start (https://proteomecommons.org/tool.jsp?i=1012): The ProteomeCommons.org IO Framework's tool for converting peak list and spectrum files between different formats. The tool can also merge multiple peak lists into a single concatenated peak list. It uses Java Web Start and runs locally on your computer.
MassMatrix File Conversion Tools (http://searcher.rrc.uic.edu/mm-docs/downloads/MM_File_Conversion_1p0.exe): These tools convert between common input formats: .raw, .mzXML, .mgf.
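As an illustration of what a peak list conversion involves, here is a minimal sketch that turns the text of a Sequest .dta file into a Mascot generic (.mgf) block and concatenates several blocks. It is not the code of any tool listed above; the function names and the example peak values are hypothetical. It assumes the usual .dta layout (first line: [M+H]+ mass and charge; remaining lines: m/z and intensity).

```python
PROTON = 1.007276  # proton mass in Da


def dta_to_mgf(dta_text, title):
    """Convert the text of one .dta file into one MGF ion block (hypothetical helper)."""
    rows = [ln.split() for ln in dta_text.strip().splitlines() if ln.strip()]
    mh_plus, charge = float(rows[0][0]), int(rows[0][1])
    # .dta stores the singly protonated mass [M+H]+; MGF PEPMASS expects precursor m/z.
    precursor_mz = (mh_plus + (charge - 1) * PROTON) / charge
    block = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz:.4f}",
        f"CHARGE={charge}+",
    ]
    block += [f"{float(mz):.4f} {float(inten):.1f}" for mz, inten in rows[1:]]
    block.append("END IONS")
    return "\n".join(block)


def merge_mgf(blocks):
    """Concatenate MGF blocks into one peak list file body."""
    return "\n\n".join(blocks) + "\n"


# Made-up example spectrum, for illustration only.
example_dta = """1001.5000 2
175.1190 120.0
300.1550 85.5
"""
mgf = merge_mgf([dta_to_mgf(example_dta, "scan_1")])
print(mgf)
```

The only non-trivial step is the precursor: the .dta header carries a mass, while the .mgf PEPMASS field carries an m/z value, so the charge has to be applied during conversion.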

Search algorithm: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm.

SEARCH ALGORITHM

Search algorithm: X! Tandem & the GPM
http://www.thegpm.org/tandem/index.html
X! Tandem can be used as a web-based application or deployed locally using precompiled binaries and FASTA-formatted files. X! Tandem takes its input in .xml format and writes its output in .xml format. The data analysis components consist of the input file; the FASTA and taxonomy files; the parameters; and the output.
Central axiom: for each identifiable protein, there is at least one detectable tryptic peptide. The extensive search for modified and non-enzymatic peptides is performed only on identified proteins.
Scoring: how far is the top-scoring match from the rest of the pack? Uses an E-value. Much faster than Sequest's Xcorr.
The Global Proteome Machine Organization: X! Hunter, X! P3, Common

Search algorithm: OMSSA
http://pubchem.ncbi.nlm.nih.gov/omssa/
OMSSA takes experimental MS/MS spectra, filters noise peaks, extracts m/z values, and then compares these m/z values to calculated m/z values derived from peptides produced by an in silico digestion of a protein sequence library. It calculates an E-value as a discriminant score. The E-value for a hit is the expected number of random hits from a search library to a given spectrum that score as well as or better than the hit. It uses classical hypothesis testing based on the type of statistical model that is used in BLAST. Faster; runs on all platforms.
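The E-value definition above can be made concrete with a small sketch. The slide does not give the statistical model in detail, so this example assumes, purely for illustration, that the number of fragment peaks a random peptide matches follows a Poisson distribution; the function names and the numbers in the example call are hypothetical.

```python
import math


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu): probability of k or more random peak matches."""
    below_k = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def e_value(matched_peaks, expected_random_matches, n_candidates):
    """Expected number of random candidates scoring as well as or better than the hit.

    Assumption: each of the n_candidates peptides compared to the spectrum
    matches peaks at random with the same Poisson mean.
    """
    p_value = poisson_tail(matched_peaks, expected_random_matches)
    return n_candidates * p_value


# Hypothetical search: 8 matched peaks, 2 expected by chance, 1000 candidate peptides.
hit_e_value = e_value(matched_peaks=8, expected_random_matches=2.0, n_candidates=1000)
print(hit_e_value)
```

A smaller E-value means fewer random hits are expected to score that well, so the hit is more likely to be a real identification. Note that the E-value grows with the size of the search library, which is why the same spectrum can score worse against a larger database.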

Search algorithm: MaxQuant
http://www.maxquant.org/
MaxQuant is an integrated suite of algorithms specifically developed for high-resolution, quantitative MS data. MaxQuant detects peaks, isotope clusters and stable amino acid isotope-labeled (SILAC) peptide pairs as three-dimensional objects in m/z, elution time and signal intensity space. By integrating multiple mass measurements, mass accuracy in the p.p.b. range is achieved. MaxQuant quantifies several hundred thousand peptides per SILAC-proteome experiment.

De novo tools: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools.

De novo Tools: de novo analysis.
Database search: Protein -> Peptide -> Fragmentation -> Mass spectrum -> Search against database.
De novo analysis: generate a sequence from the spectrum and match it against a database using BLAST.

De novo Tools: PepNovo
http://peptide.ucsd.edu/pepnovo.html
PepNovo is software for de novo sequencing of peptides from mass spectra. PepNovo uses a probabilistic network to model the peptide fragmentation events in a mass spectrometer. In addition, it uses a likelihood ratio hypothesis test to determine whether the peaks observed in the mass spectrum are more likely to have been produced under the fragmentation model than under a probabilistic model that treats the appearance of peaks as random events.

De novo Tools: Lutefisk
http://sourceforge.net/projects/lutefiskxp
Lutefisk uses a graph theory approach for de novo peptide sequence determination from low-energy collision-induced dissociation (CID) data of tryptic peptides. Lutefisk converts all of the ions into their corresponding b-ion masses by making N- and C-terminal evidence lists that contain evidence for cleavage at every possible b-ion mass. Once the sequence spectrum has been established, the program proceeds by tracing sequences starting at the N-terminus. The highest-ranked sequences are subjected to a cross-correlation analysis, and the scores are combined and normalized to produce a final score and ranking.
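The conversion of ions into b-ion masses rests on a simple relation for singly charged fragments: a b ion and its complementary y ion add up to the [M+H]+ precursor mass plus one proton. The sketch below shows only that step; it is not Lutefisk's code, the function names are hypothetical, and the real program also weighs and scores the evidence.

```python
PROTON = 1.007276  # proton mass in Da


def y_to_b(y_mz, precursor_mh):
    """b-ion m/z complementary to a singly charged y ion.

    Uses b + y = [M+H]+ + proton for singly charged fragments.
    """
    return precursor_mh + PROTON - y_mz


def b_ion_evidence(peak_mzs, precursor_mh):
    """Return (N-terminal, C-terminal) candidate b-ion masses.

    N-terminal evidence reads each peak as a b ion as-is;
    C-terminal evidence reads each peak as a y ion and converts it.
    """
    n_terminal = sorted(peak_mzs)
    c_terminal = sorted(y_to_b(mz, precursor_mh) for mz in peak_mzs)
    return n_terminal, c_terminal


# Dipeptide GA: [M+H]+ = 147.076411, y1 (Ala) = 90.054951, so b1 (Gly) follows.
b1 = y_to_b(90.054951, 147.076411)
print(round(b1, 4))
```

Putting both readings of every peak on the same b-ion mass axis is what lets a sequence be traced from the N-terminus in a single pass.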

Spectral matching: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Spectral Matching.

Spectral Matching: X! Hunter
http://www.thegpm.org
X! Hunter is a search engine that compares experimentally observed spectra directly with a library of spectra that have been confidently assigned to a particular peptide sequence (an Annotated Spectrum Library, or ASL). It can identify proteins using information from the large number of spectra in the GPMDB database.
Creation of ASLs: 1) Confident assignments for human and yeast peptides were extracted from GPMDB. 2) Replicate observations of the same peptide were averaged together and a final list of averaged peptide spectra was produced.
Because the sequence modifications and cleavage sites for the peptides in the sequence library are already known, it is not necessary to specify as many parameters for this type of search as in more conventional search engines. This type of pattern matching tool is ideal for applications such as biomarker discovery.
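A minimal sketch of the library idea described above: replicate spectra of a peptide are averaged into one library entry, and a query spectrum is assigned to the most similar entry. The slide does not state which similarity measure X! Hunter uses, so cosine similarity on 1 Da bins is an assumption made for illustration, and the peptide labels and peak values are made up.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def average_spectra(replicates, bin_width=1.0):
    """Average binned replicate spectra of one peptide into a library entry."""
    total = defaultdict(float)
    for peaks in replicates:
        for key, value in bin_spectrum(peaks, bin_width).items():
            total[key] += value
    return {key: value / len(replicates) for key, value in total.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (0 when no bins overlap)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(query_peaks, library):
    """Return the library peptide whose spectrum is most similar to the query."""
    query = bin_spectrum(query_peaks)
    return max(library, key=lambda peptide: cosine_similarity(query, library[peptide]))


# Hypothetical library with two entries; the first is built from two replicates.
library = {
    "PEPTIDEK": average_spectra([[(175.1, 100.0), (300.2, 50.0)],
                                 [(175.2, 80.0), (300.1, 70.0)]]),
    "SAMPLER": bin_spectrum([(147.1, 90.0), (500.3, 40.0)]),
}
query = [(175.1, 95.0), (300.2, 55.0)]
print(best_match(query, library))
```

No enzyme, modification or mass tolerance settings appear anywhere in this search, which mirrors the point made above: those choices were already fixed when the library spectra were assigned.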

Spectral Matching: MS-Clustering
http://proteomics.bioprojects.org/massspec
MS-Clustering of MS/MS spectra takes advantage of dataset redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only the representative spectra significantly speeds up MS/MS database searches. Large MS/MS data sets (over 10 million spectra) were reduced to smaller data sets and yielded a higher number of peptide identifications than regular non-clustered searches.

Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Spectral Matching. Statistical validation of peptide and protein identifications.

Statistical validation of peptide and protein identifications.

Statistical validation of peptide and protein identifications: Trans-Proteomic Pipeline
The Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types, with peptides assigned by a wide variety of database search engines.

Statistical validation of peptide and protein identifications: PepArML
http://mac.softpedia.com/get/math-scientific/peparml.shtml
A model-free, result-combining peptide identification arbiter via machine learning.
[Diagram: X! Tandem, Mascot, OMSSA, Other -> Feature extraction -> PepArML]

Quantitative tools: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Statistical validation of peptide and protein identifications. Quantitative Tools. Spectral Matching.

iTRAQ: Isobaric Tags for Relative and Absolute Quantification.
Isobaric tag (total mass = 145): reporter group (charged) + balance group (neutral loss) + peptide reactive group (PRG).
Reporter / balance masses: 114 / 31, 115 / 30, 116 / 29, 117 / 28.
Workflow: trypsin digest -> label with tags -> mix -> MS ([Reporter-Balance-Peptide]) -> MS/MS (reporter ions at m/z 114, 115, 116, 117).
[Figure: MS/MS spectrum of the peptide QGQPIGLGEASNDTWITTK, % intensity versus mass (m/z), with the 114-117 reporter ion region shown separately.]

Proteomics Quantitation

Quantitative Tools: i-Tracker
http://www.dasi.org.uk/download/itracker.htm
i-Tracker is an open-source peptide quantitation algorithm that allows the user to extract reporter ion peak ratios from non-centroided peak lists. The algorithm uses .dta and .mgf files as inputs. The reporter ion areas are calculated and corrected for their purity. The .csv output of i-Tracker allows for the relative comparison of the iTRAQ-labeled peptides.
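The core of reporter ion quantitation can be sketched in a few lines: sum the signal around each reporter mass in the MS/MS peak list and express it relative to one channel. This is not i-Tracker's implementation. The reporter m/z values (rounded to two decimals), the 0.2 Da window and the function names are assumptions, the example peaks are made up, and the purity correction mentioned above is left out.

```python
# Approximate m/z of the four iTRAQ reporter ions (assumed, rounded values).
REPORTER_MZ = {114: 114.11, 115: 115.11, 116: 116.11, 117: 117.11}


def reporter_intensities(peaks, tol=0.2):
    """Sum the intensity of all (m/z, intensity) points within +/- tol of each reporter.

    Summing over a window stands in for a peak area on non-centroided data.
    """
    sums = {label: 0.0 for label in REPORTER_MZ}
    for mz, intensity in peaks:
        for label, center in REPORTER_MZ.items():
            if abs(mz - center) <= tol:
                sums[label] += intensity
    return sums


def reporter_ratios(sums, reference=114):
    """Express every reporter channel relative to the reference channel."""
    ref = sums[reference]
    return {label: (value / ref if ref else float("nan")) for label, value in sums.items()}


# Made-up MS/MS peak list: a profile-mode 114 peak, three other reporters, one fragment.
peaks = [(114.05, 10.0), (114.11, 100.0), (114.17, 10.0),
         (115.11, 60.0), (116.11, 240.0), (117.11, 120.0), (509.8, 500.0)]
sums = reporter_intensities(peaks)
ratios = reporter_ratios(sums)
print(ratios)
```

Because the four tags are isobaric, the labeled peptide appears as one precursor in MS, and only these low-mass reporter ions in MS/MS carry the relative abundance of each sample.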

Quantitative Tools: TPP quantitative tools (ASAPRatio, XPRESS and Libra)
ASAPRatio (http://tools.proteomecenter.org/asapratio.php): Automated Statistical Analysis on Protein Ratio (ASAPRatio) accurately calculates the relative abundances of proteins and the corresponding confidence intervals from ICAT-type ESI-LC/MS data.
XPRESS (http://tools.proteomecenter.org/xpress.php): The XPRESS software calculates the relative abundance of proteins, such as those obtained from an ICAT-reagent labeled experiment, by reconstructing the light and heavy elution profiles of the precursor ions and determining the elution areas of each peak.
LIBRA (http://tools.proteomecenter.org/wiki/index.php?title=software:libra): Libra is a module within the Trans-Proteomic Pipeline that performs quantification on MS/MS spectra from iTRAQ-labeled samples.
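The elution-area approach described for XPRESS can be illustrated as follows: integrate the light and heavy precursor intensity over retention time and take the ratio of the two areas. This is a sketch of the idea only, not XPRESS itself; trapezoidal integration, the function names and the example profiles are assumptions.

```python
def peak_area(times, intensities):
    """Trapezoidal area under an elution profile sampled at the given times."""
    return sum((t1 - t0) * (i0 + i1) / 2.0
               for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:]))


def light_heavy_ratio(times, light, heavy):
    """Relative abundance as the ratio of light to heavy elution areas."""
    return peak_area(times, light) / peak_area(times, heavy)


# Made-up co-eluting profiles: the light form is twice as abundant as the heavy form.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
light = [0.0, 10.0, 20.0, 10.0, 0.0]
heavy = [0.0, 5.0, 10.0, 5.0, 0.0]
ratio = light_heavy_ratio(times, light, heavy)
print(ratio)
```

Using the whole elution area rather than a single scan makes the ratio less sensitive to noise in any one measurement.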

Quantitative Tools: APEX
http://pfgrc.jcvi.org/index.php/bioinformatics/
The APEX Quantitative Proteomics Tool is a free and open source Java implementation of the APEX technique for the absolute quantitation of proteins based on standard LC-MS/MS proteomics data. It uses machine learning techniques to improve quantitation accuracy for label-free techniques. The APEX Tool provides an intuitive user interface, an integrated help system, and rich documentation. A tutorial and a sample data set are included to help first-time users become acquainted with the system.

Quantitative Tools: MaxQuant
http://www.maxquant.org/
MaxQuant quantifies several hundred thousand peptides per SILAC-proteome experiment.

Targeted Proteomics: Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Statistical validation of peptide and protein identifications. Quantitative Tools. Targeted Proteomics. Spectral Matching.

Targeted Proteomics. Biochemistry vs proteomics; targeted proteomics vs shotgun proteomics.

Targeted Proteomics: MRM. Selectivity, sensitivity and dynamic range. Quantitative proteomics results. Prediction. Choose and optimize transitions.

Targeted Proteomics: TIQAM
http://tools.proteomecenter.org/tiqam/tiqam.html
TIQAM generates MRM transition lists and identifies the best-performing transitions from MRM pre-experiments. In addition, TIQAM provides a viewer to validate transitions by MRM-triggered MS/MS experiments. All the peptide and transition information is stored in a database to enable smart retrieval of the validated transitions for quantitative analysis.
Commercial software: MRMPilot (Applied Biosystems), SRM Workflow Software (Thermo Scientific), VerifyE (Waters) and Optimizer (Agilent Technologies).
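To show what an MRM transition list contains, the sketch below computes candidate transitions for one peptide: the precursor m/z (Q1) paired with singly charged y-ion m/z values (Q3). It is not TIQAM's algorithm. The choice of y ions only, the doubly charged precursor, the minimum fragment length, the function names and the example peptide are all assumptions made for illustration.

```python
PROTON = 1.007276   # proton mass in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def precursor_mz(peptide, charge=2):
    """m/z of the unmodified peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y ion made of the last n residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def transitions(peptide, charge=2, min_len=3):
    """Candidate (Q1 m/z, Q3 m/z, label) transitions from y ions of length >= min_len."""
    q1 = round(precursor_mz(peptide, charge), 4)
    return [(q1, round(y_ion_mz(peptide, n), 4), f"y{n}")
            for n in range(min_len, len(peptide))]


# Hypothetical tryptic peptide, used only to show the output shape.
for q1, q3, label in transitions("SAMPLER"):
    print(q1, q3, label)
```

A list like this is only the starting point: as noted above, the best-performing transitions still have to be selected from pre-experiments on the instrument.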

Targeted Proteomics: X! P3
http://www.thegpm.org
ftp://ftp.thegpm.org/proteotypic_peptide_profiles
Uses identification of proteotypic peptides for identification of a protein. Because there will only be a few proteotypic peptides for a protein, it improves both the speed and accuracy of the resultant protein identifications.
The X! P3 (Proteotypic Peptide Profiler) project uses the following steps:
1. In the first round, the spectrum data set is examined for the presence of proteotypic peptides. This is done by querying GPMDB to find the best peptides representative of a particular protein.
2. The full protein sequences of the proteins identified in the first round are then pulled from a sequence library.
3. Using this small set of full sequences, multiple rounds of refinement are performed to extract all of the non-proteotypic peptides from the full spectrum data set.
An X! P3 server has been established for two model organisms, namely Homo sapiens and Saccharomyces cerevisiae, as well as several commonly observed experimental artifacts, such as BSA and trypsin.
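The three steps above can be sketched as a toy program. It is a strong simplification and not the X! P3 code: already-identified peptide strings stand in for spectra, a plain dictionary stands in for the GPMDB query, refinement is reduced to one pass over an in silico tryptic digest, and every name and sequence is hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """In silico trypsin digest: cleave after K or R, but not before P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def p3_search(observed_peptides, proteotypic, sequence_library):
    """Toy version of the three X! P3 steps.

    observed_peptides: peptide strings standing in for the spectrum data set
    proteotypic: proteotypic peptide -> protein (stand-in for a GPMDB query)
    sequence_library: protein -> full sequence
    """
    # Step 1: find proteins through their proteotypic peptides.
    proteins = {proteotypic[pep] for pep in observed_peptides if pep in proteotypic}
    assignments = {}
    for protein in proteins:
        # Step 2: pull the full sequence of each protein found in step 1.
        digest = set(tryptic_peptides(sequence_library[protein]))
        # Step 3: assign the remaining observed peptides against this small sequence set.
        assignments[protein] = sorted(pep for pep in observed_peptides if pep in digest)
    return assignments


# Made-up data: only protA has a proteotypic peptide among the observations.
sequence_library = {"protA": "MKTAYIAKQRQISFVK", "protB": "GASPRLLEK"}
proteotypic = {"TAYIAK": "protA"}
observed = {"TAYIAK", "QISFVK", "LLEK"}
result = p3_search(observed, proteotypic, sequence_library)
print(result)
```

The speed gain comes from steps 2 and 3 working on a handful of sequences instead of the whole sequence library.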

Mass Spectrometry Data Extraction. Data Conversion. Search algorithm. De novo Tools. Spectral Matching. Statistical validation of peptide and protein identifications. Quantitative Tools. Data Dissemination. Targeted Proteomics.
Your answer is going to be determined by the question asked.

Data Dissemination: Prestomic
http://code.google.com/p/prestomic
An open-source suite of tools for storing data and for presenting the data in a user-friendly format via a browser. The program was developed using mostly Perl.

Data Dissemination: Tranche
https://proteomecommons.org/tranche/
Tranche is a free and open source file sharing tool that enables the storage of large amounts of data. Designed and built with scientists and researchers in mind, Tranche can handle very large data sets, is secure, is scalable, and all data sets are citable in scientific journals.

Proteomic pipelines that use open-source software.
CPAS (http://proteomics.fhrc.org/cpas): Open source toolkit that integrates open source proteomics tools along with existing commercial software.
CORRA (http://tools.proteomecenter.org/corra/corra.html): Statistical analysis tools for quantitative proteomics.
SysPIMP (http://pimp.starflr.info): Identifies mutated proteins from mass spectrometry results.
SwissPIT (http://swisspit.cscs.ch): Multitool platform that promotes the use of multiple search algorithms.
mMass data miner (http://mmass.biographics.cz)
The OpenMS Proteomics Pipeline (http://www.openms.de)

Protip workflow: Raw data from Orbitrap -> mzXML format, MGF format, dta format -> X! Tandem search, OMSSA search, SEQUEST search, MASCOT search -> Scaffold Analysis -> Scaffold Viewer.

Performing multiple searches through Protip: human dataset.
[Figure: bar chart of # of peptides and # of proteins identified with Sequest, X! Tandem, Mascot, All Together, Sequest + Mascot, Sequest + X! Tandem, and X! Tandem + Mascot, shown alongside the Protip workflow. Peptide bar labels: 8162, 6554, 6962, 7443, 5522, 5137, 5486. Protein bar labels: 401, 370, 411, 491, 441, 441, 462.]

LAST WORD Questions? Pratik Jagtap pratik@msi.umn.edu