Shotgun Proteomic Analysis. Department of Cell Biology The Scripps Research Institute



Similar documents
Quantitative proteomics background

Aiping Lu. Key Laboratory of System Biology Chinese Academic Society

Mass Spectrometry Based Proteomics

Application Note # LCMS-81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method

Tutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF

Introduction to Proteomics 1.0

In-Depth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring

Global and Discovery Proteomics Lecture Agenda

泛 用 蛋 白 質 體 學 之 質 譜 儀 資 料 分 析 平 台 的 建 立 與 應 用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests

ProteinPilot Report for ProteinPilot Software

Error Tolerant Searching of Uninterpreted MS/MS Data

Session 1. Course Presentation: Mass spectrometry-based proteomics for molecular and cellular biologists

Protein Prospector and Ways of Calculating Expectation Values

Introduction to Proteomics

Research-grade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays

ProSightPC 3.0 Quick Start Guide

ProteinScape. Innovation with Integrity. Proteomics Data Analysis & Management. Mass Spectrometry

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

MRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics

AB SCIEX TOF/TOF 4800 PLUS SYSTEM. Cost effective flexibility for your core needs

Introduction to mass spectrometry (MS) based proteomics and metabolomics

La Protéomique : Etat de l art et perspectives

MaxQuant User s Guide Version

Alignment and Preprocessing for Data Analysis

Systems Biology through Data Analysis and Simulation

Effects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests

Statistical Analysis Strategies for Shotgun Proteomics Data

Introduction to Proteomics

PeptidomicsDB: a new platform for sharing MS/MS data.

Searching Nucleotide Databases

Mass Spectrometry Signal Calibration for Protein Quantitation

Pesticide Analysis by Mass Spectrometry

Pinpointing phosphorylation sites using Selected Reaction Monitoring and Skyline

Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem

Mass Spectra Alignments and their Significance

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005

Laboration 1. Identifiering av proteiner med Mass Spektrometri. Klinisk Kemisk Diagnostik

Proteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap

OpenMS A Framework for Quantitative HPLC/MS-Based Proteomics

Chapter 14. Modeling Experimental Design for Proteomics. Jan Eriksson and David Fenyö. Abstract. 1. Introduction

Analysis of Mass Spectrometry. Data for Protein Identification. in Complex Biological Mixtures

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data

Investigating Biological Variation of Liver Enzymes in Human Hepatocytes

Learning Objectives:

Absolute quantification of low abundance proteins by shotgun proteomics

Interpretation of MS-Based Proteomics Data

MASCOT Search Results Interpretation

Functional Data Analysis of MALDI TOF Protein Spectra

Core Facility Genomics

NUVISAN Pharma Services

Quantitative mass spectrometry in proteomics: a critical review

Tutorial 9: SWATH data analysis in Skyline

Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS

Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS

Mascot Search Results FAQ

Definition of the Measurand: CRP

Comparison of Spectra in Unsequenced Species

SpikeTides TM Peptides for relative and absolute quantification in SRM and MRM Assays

for mass spectrometry calibration tools Thermo Scientific Pierce Controls and Standards for Mass Spectrometry

Building innovative drug discovery alliances. Evotec Munich. Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs

Application Note # MT-90 MALDI-TDS: A Coherent MALDI Top-Down-Sequencing Approach Applied to the ABRF-Protein Research Group Study 2008

Simultaneous qualitative and quantitative analysis using the Agilent 6540 Accurate-Mass Q-TOF

Protein Protein Interaction Networks

Quan%ta%ve proteomics. Maarten Altelaar, 2014

Industry Perspective: Advantages of Open Access and Walkup LC/ MS Supporting Protein Drug Discovery and Development

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis

Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets

MarkerView Software for Metabolomic and Biomarker Profiling Analysis

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

The Open2Dprot Proteomics Project for n-dimensional Protein Expression Data Analysis

Introduction to Database Searching using MASCOT

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

using ms based proteomics

Design considerations for proteomic reference materials

Database Searching Tutorial/Exercises Jimmy Eng

Using NIST Search with Agilent MassHunter Qualitative Analysis Software James Little, Eastman Chemical Company Sept 20, 2012.

Thermo Scientific SIEVE Software for Differential Expression Analysis

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam

Sub menu of functions to give the user overall information about the data in the file

Quantitative mass spec based proteomics

Workshop IIc. Manual interpretation of MS/MS spectra. Ebbing de Jong. Center for Mass Spectrometry and Proteomics Phone (612) (612)

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Mass Frontier Version 7.0

1 Genzyme Corp., Framingham, MA, 2 Positive Probability Ltd, Isleham, U.K.

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium

CCR Biology - Chapter 9 Practice Test - Summer 2012

MassMatrix Web Server User Manual

A Streamlined Workflow for Untargeted Metabolomics

MiSeq: Imaging and Base Calling

ALGORITHMS FOR SHOTGUN PROTEOMICS SPECTRAL IDENTIFICATION AND QUALITY ASSESSMENT. Ze-Qiang Ma

Correlation of the Mass Spectrometric Analysis of Heat-Treated Glutaraldehyde Preparations to Their 235nm / 280 nm UV Absorbance Ratio

VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY Q2(R1)

Transcription:

Shotgun Proteomic Analysis Department of Cell Biology The Scripps Research Institute

Biological/Functional Resolution of Experiments Organelle Multiprotein Complex Cells/Tissues Function Expression Analysis Information Content Interaction Analysis

Shotgun Proteomics Protein Mixture µlc Abundance Abundance proteolysis MS 3 Time Time Peptide Mixture Output Filtering and Re-Assembly 80 node Beowulf Computer Cluster DTASelect SEQUEST Abundance Abundance Abundance Abundance 2 m/z m/z m/z m/z 2 3 Abundance Abundance 1 MS/MS m/z m/z Abundance Abundance m/z m/z 1

Data Processing Issues with Shotgun Proteomics 1:1 Mixture of Unlabeled/ 15 N-Labeled Yeast Soluble Proteins Analyzed Using a Single 12h Analysis MS/MS Spectra Protein ID s (*1 peptide confirmed w/ RelEx) Protein ID s (*2 peptides confirmed w/ RelEx) LCQ 18,970 559 157 LTQ 86,950 (4.5 x) 891 (1.6x) 304 (1.9x) *RelEx was used to evaluate the presence of labeled isotopomer

Processing Tandem Mass Spectra Spectral Quality Filter Subtractive Analysis LibQUEST DTASelect/Contrast Database Analysis SEQUEST PepProb De Novo: Unusual, Unanticipated Features GutenTag Quantification RelX Filter and Assembly DTASelect ProtProb

Data Issues Data quality How is a match determined? Protease issues? Validation issues? Posttranslational Modifications? Quantification? Sampling Issues?

Spectral Filtering with Hand Crafted Features Called Good Called Bad %Correct +1 GOOD 671 75 89.9% +1 BAD 5585 11475 67.3% +2/+3 GOOD 3166 348 90.1% +2/+3 BAD 11611 26684 69.7% All GOOD 3837 423 90.1% All BAD 17196 38159 68.9% Bern, Goldberg, MacDonald, Yates Bioinformatics (in press)

Multi-Enzyme Digestion Procedures Identification of PTMs Sample is split into 3 aliquots Digest using 3 different proteases Mix and analyze by LC/LC- MS/MS trypsin elastase subtilisin MudPIT Interpret spectra using SEQUEST SEQUEST

High ph/proteinase K Method (hppk Method) High ph Proteinase K Wu et al., Nat. Biotech. 21:532-538 (2003) Howell and Palade, J. Cell Biol. 92:822-832 (1982)

Overlapping Peptide Coverage (TM7) gi 13794265 ref NP_056312.1 DKFZP564G2022 protein MAAAAWLQVL PVILLLLGAH PSPLSFFSAG PATVAAADRS KWHIPIPSGK NYFSFGKILF RNTTIFLKFD GEPCDLSLNI TWYLKSADCY NEIYNFKAEE VELYLEKLKE KRGLSGNIQT SSKLFQNCSE LFKTQTFSGD FMHRLPLLGE KQEAKENGTN LTFIGDKTAM HEPLQTWQDA PYIFIVHIGI SSSKESSKEN SLSNLFTMTV EVKGPYEYLT LEDYPLMIFF MVMCIVYVLF GVLWLAWSAC YWRDLLRIQF WIGAVIFLGM LEKAVFYAEF QNIRYKGESV QGALILAELL SAVKRSLART LVSIVSLGYG IVKPRLGVTL HKVVVAGALY LLFSGMEGVL RVTGAQTDLA SLAFIPLAFL DTALCWWIFI SLTQTMKLLK LRRNIVKLSL YRHFTNTLIL AVAASIVFII WTTMKFRIVT CQSDWRELWV DDAIWRLLFS MILFVIMVLW RPSANNQRFA FSPLSEEEEE DEQKEPMLKE SFEGMKMRST KQEPNGNSKV NKAQEDDLKW VEENVPSSVT DVALPALLDS DEERMITHFE RSKME WVEENVPSSVTDVALPALLDS*DEER VEENVPSSVTDVALPALLDS*DEER EENVPSSVTDVALPALLDS*DEER ENVPSSVTDVALPALLDS*DEER VPSSVTDVALPALLDS*DEER PSSVTDVALPALLDS*DEER LPALLDS*DEER PALLDS*DEER (TM6) gi 14249524 ref NP_116213.1 hypothetical protein FLJ14681 MVAACRSVAG LLPRRRRCFP ARAPLLRVAL CLLCWTPAAV RAVPELGLWL ETVNDKSGPL IFRKTMFNST DIKLSVKSFH CSGPVKFTIV WHLKYHTCHN EHSNLEELFQ KHKLSVDEDF CHYLKNDNCW TTKNENLDCN SDSQVFPSLN NKELINIRNV SNQERSMDVV ARTQKDGFHI FIVSIKTENT DASWNLNVSL SMIGPHGYIS ASDWPLMIFY MVMCIVYILY GILWLTWSAC YWKDILRIQF WIAAVIFLGM LEKAVFYSEY QNISNTGLST QGLLIFAELI SAIKRTLARL LVIIVSLGYG IVKPRLGTVM HRVIGLGLLY LIFAAVEGVM RVIGGSNHLA VVLDDIILAV IDSIFVWFIF ISLAQTMKTL RLRKNTVKFS LYRHFKNTLI FAVLASIVFM GWTTKTFRIA KCQSDWMERW VDDAFWSFLF SLILIVIMFL WRPSANNQRY AFMPLIDDSD DEIEEFMVTS ENLTEGIKLR ASKSVSNGTA KPATSENFDE DLKWVEENIP SSFTDVALPV LVDSDEEIMT RSEMAEKMFS SEKIM WVEENIPSSFTDVALPVLVDS*DEEIMTR IPSSFTDVALPVLVDS*DEEIMTR PSSFTDVALPVLVDS*DEEIMTR SFTDVALPVLVDS*DEEIMTR TDVALPVLVDS*DEEIMTR TDVALPVLVDS*DEEIMTRS DVALPVLVDS*DEEIMTR VALPVLVDS*DEEIMTR ALPVLVDS*DEEIMTR ALPVLVDS*DEEIMTRS LPVLVDS*DEEIMTR PVLVDS*DEEIMTR PVLVDS*DEEIMTRS

gi 27229118 ref NP_082129 RIKEN cdna 0610006F02; S-adenosylmethionine-dependent methyltransferase activity [Mus musculus] MDALVLFLQL LVLLLTLPLH LLALLGCWQP ICKTYFPYFM AMLTARSYKK MESKKRELFS QIKDLKGTSG NVALLELGCG TGANFQFYPQ GCKVTCVDPN PNFEKFLTKS MAENRHLQYE RFIVAYGENM KQLADSSMDV VVCTLVLCSV QSPRKVLQEV CVDPN PNFEKF VTCVDPN PNFEK VTCVDPN PNFEKFLTK QRVLRPGGLL FFWEHVAEPQ GSRAFLWQRV LEPTWKHIGD GCHLTRETWK DIERAQFSEV QLEWQPPPFR WLPVGPHIMG QFSEV QLEWQPPPFR WLPVGPHIM EV QLEWQPPPFR WLPVGPH EV QLEWQPPPFR WLPVGPH EV QLEWQPPPFR WLPVGPHIM EV QLEWQPPPFR WLPVGPHIM **LEWQPPPFR WLPVGPH LEWQPPPFR WLPVGPH LEWQPPPFR WLPVGPHIM LEWQPPPFR WLPVGPHIM LEWQPPPFR WLPVGPHIMG EWQPPPFR WLPVGPHIM WQPPPFR WLPVGPH KAVK

Dimethyl Arginine Containing Peptide Golgi Peptide Synthetic Peptide 100 y12 ++ 100 y12 ++ 80 y12 80 Relative Abundance 60 40 20 0 y5 b3 b4 y6 y3 y7 y14 ++ y8 y11 b9 y10 b10 b11 b13 b14 400 600 800 1000 1200 1400 1600 1800 m/z Relative Abundance 60 40 20 0 y5 y14 ++ y3 y6 y7 b4 y8 y12 b11 y11 b9y10 b10 b12 b13 b14 400 600 800 1000 1200 1400 1600 1800 m/z

Shotgun Proteomic Experiments and Sampling Issues Based on prior studies in yeast, we know not every protein present is id d Reproducibility is good for high abundance proteins 70-80% Reproducibility is not as good for low abundance proteins. 20-30% Is this predictable?

Random Sampling Model for Data Dependent Acquisition n L = # of protein species at particular level K = n L *(1 (1 L / N) S ) L = abundance level N = total number of proteins S = experiments Number of Identified Proteins 4000 3500 3000 2500 2000 Experimental Semi-empirical Model Random Model 1500 1000 2 4 6 8 10 12 14 16 Number of MudPIT Runs

Distribution of Protein Identifications After Repeating Analysis 9 times 700 # of proteins identified in all 9 experiments Number of Proteins 600 500 400 300 # of proteins identified in 1 experiment 200 100 2 4 6 8 10 Number of MudPIT's

RelX software 1) Predict m/z Range 100 2) Sum Signal in Range 100 DTASelect Output Peptide Sequence LVNHFIQEFK Relative Abundance 80 60 40 20 Relative Abundance 80 60 40 20 3) Store Mass Chromatograms 100 0 1270 1275 1280 1285 1290 1295 100 m/z 4) Peak Detection 0 1270 1275 1280 1285 1290 1295 m/z 5) Correlation Relative Abundance 80 60 40 20 Relative Abundance 80 60 40 20 Sample Intensity 0 32 33 34 35 36 37 Time (min) 0 32 33 34 35 36 37 Time (min) Reference Intensity

Systematic Errors are Present in Samples Average Peptide Ratio 7 6 5 4 3 2 1 1 2 3 4 5 Protein Mixture Ratio TSA1 = 1.335±0.015 SSA1 = 1.004±0.018 ADH1 = 0.661±0.045

Spectral Sampling for Relative Quantification Combined Data for 6 proteins added to Yeast Soluble Cell Lysate at 4 different levels % Spectrum Observed 30 25 20 15 10 5 0 y = 1.0613x + 0.2608 R 2 = 0.9997 Linear dynamic range 2-orders Measuring small changes is not as reliable 0 5 10 15 20 25 30 % Protein Markers Added

Synthesis of Ribosomal Proteins in Plasmodium falciparum Spectral Count 4000 3500 3000 2500 2000 1500 1000 Ribosomal Protein Abundance Striking trend: almost all ribosomal proteins increase over ring to troph transition 500 0 Ring T roph Sc hiz Mero Stage

Standards 1. Data formats: McDonald et al MS1, MS2, and SQT - three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications, RCM 2004, 18, 2162-8. Database search standard based on: Washburn et al, Large Scale analysis of the yeast proteome via multidimensional protein identification technology Nature Biotechnology 19, 242-247 (2001) MacCoss et al, Probability Based Validation of Protein Identifications Using a Modified SEQUEST Algorithm, Analytical Chemistry 74, 5593-5599 (2002). Normalized Scores

Standards 4. Standards should not prevent innovation Data formats should be practical e.g. storage space 7. Data processing tools should be transparent and validated, e.g. published 8. Data for publication: information to support biological conclusions- sequences of peptides id d. 9. Archiving data: biological conclusions should be the most important part of the experiment

Probability Distributions Number of Spectra 12000 10000 8000 6000 4000 2000 Charge State +1 Charge State +2 Charge State +3 0 0 5 10 15 20 Probability Scores

Single Spectral Matches are Problematic: How to tell if they are correct Searches determine closeness of fit based on some measure: Compare matches with different programs Probability scoring: : P = random match based on frequency of fragment ions in database SEQUEST: : XCorr measures how close the spectrum fits to ideal spectrum Manual validation, experimental validation de novo interpretation

Multi-Enzyme Digest Sample is split into 3 aliquots Digest using 3 different proteases trypsin elastase subtilisin

Multi-Enzyme Digest Sample is split into 3 aliquots Digest using 3 different proteases Mix and analyze by LC/LC- MS/MS Interpret spectra using SEQUEST trypsin elastase subtilisin MudPIT SEQUEST

Properties of Data Dependent Data Acquisition Most invariant property is spectral copy number 100 90 80 70 60 P1 P2 P3 P4 50 40 30 20 10 0 % of Proteins Identified % of Unique Peptides % of Spectrum Copies

GutenTag: : Partial de novo Sequencing of Tandem Mass Spectra Database searching assumes minimal errors in the database and sequence variations between strains, individuals and species Modifications need to be specified in database searches, so unanticipated modifications will be missed. Partial De novo analysis of tandem mass spectra in large-scale can identify peptides containing sequence variations and unanticipated modifications

GutenTag 6170 tandem mass spectra: LC/LC/MS/MS analysis of simple digested protein mixture 1987 spectra matched by SEQUEST 1328 spectra matched by GutenTag 766 partial matches suggesting modifications and sequence variations Total matching spectra by GutenTag: : 2,094 Partial de novo will extend identifications Software is Automated and Large-Scale

Improved Spectral Quality Effects Peptide Identification Infusion of 1 pmol/µl Angiotensin I LCQ-Classic 761 of 1000 MS/MS Spectra Matched the Correct Sequence LTQ 970 of 1000 MS/MS Spectra Matched the Correct Sequence Frequency 200 175 150 125 100 75 50 25 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 XCorr Frequency 300 250 200 150 100 50 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 XCorr

Database Searching with Tandem Mass Spectra The goal is to identify peptides using MS/MS spectra and amino acid sequence databases. Develop a probabilistic model that establishes a relationship between the database sequences and the spectrum to complement quantitative measures of closeness-of of-fit fit Develop non-empirical probabilistic measures using cross-correlation correlation measurements

Probability Model Null hypothesis: All fragment matches to MS/MS spectrum are by random. N, K number of all fragments and all fragment matches, respectively. N 1 is the number of fragments of a particular peptide which has K 1 matches. P K N1 K K N K K, N ( K1, N1) = N1 CN We seek an amino acid sequence that has the smallest probability of being a random match. C 1 * C 1

Model Hypergeometric Distributions Probability 0.25 0.20 0.15 0.10 0.05 K=100 K=200 K=300 N=1000 N 1 =30 N, K number of all fragments and all fragment matches, respectively. N1 is the number of fragments of a particular peptide which has K1 matches. 0.00 0 5 10 15 20 25 30 Number of Successes

Significance of Peptide Identification Not all identifications are significant: poor quality spectra of peptides, incomplete peptide fragmentation, inaccuracies in database, posttranslational modifications, MS/MS of chemical noise and non-peptide molecules. Significance of a match (P_value) is also obtained from the hypergeometric distribution.

Significance of a Match Probability 0.16 0.12 0.08 0.04 H 0 N=1000 K=300 N 1 =30 Significance of K 1 =14 0.00 0 5 10 15 20 25 30 Number of Successes

Yeast Database Probability,Frequency 0.25 0.20 0.15 0.10 0.05 Predicted Distribution Observed Frequency N=5284150 K=569160 N 1 =40 0.00 0 10 20 30 40 Number of Fragment Matches

Cumulative Distribution 1.0 Cumulative Distribution Function 0.8 0.6 0.4 0.2 0.0 0 10 20 Number of Fragment Matches

Mass/Charge State Dependence Scores that use closeness of fit measures can artificially inflate with weight/mass. This complicates use of uniform criteria for identification. Probabilities generated by hypergeometric distribution are charge/weight independent.

Number of Spectra Cross-Correlation Score Distribution 3500 3000 2500 2000 1500 1000 500 XCorr+1 XCorr+2 XCorr+3 0 0 1 2 3 4 5 6 7 8 Cross-Correlation Scores

Database Dependence The probability inferred from the hypergeometric distribution is in principle database dependent. However, the dependency is very weak.

NRP and Yeast Databases 0.25 0.20 NRP Database Yeast Database Probability 0.15 0.10 0.05 0.00 0 10 20 30 Number of Fragment Ion Matches

Pep_Probe Summary Implements 4 scoring schemes: hypergeometric, poisson, maximum likelihood and cross-correlation. correlation. Sorts results either by hypergeometric or cross-correlation correlation scores. No enzyme specificity is assumed. Reports significance of each match. Can search for posttranslational modifications to three different amino acids. Has been implemented to run on a standalone or compute clusters. Runs on heterogeneous cluster of computers, in WINDOWS and LINUX platforms.

Processing Tandem Mass Spectra Spectral Quality Eliminate Poor Quality Spectra, Score Quality of Other spectra Database Analysis Multiple Methods With Different Selectivity's Analyze Unusual or Unanticipated Features De novo or partial De novo analysis Assemble and Annotate Data Biological Significance

Increased Data Production Requires Automated Data Analysis Relative Abundance m/z Spectral Quality Bioinformatics (in press) Yates et al., JASMS 5, 976 (1994) Yates et al. Anal. Chem 67, 1426 (1995) Yates et al. Anal. Chem.. 67, 3205 (1995) MacCoss et al. Anal. Chem (2002) Sadygov and Yates, Anal. Chem (in press) LIBQUEST MS/MS Comparison SEQUEST & Pep_Prob Database Search Search Library Searching Comparative Analysis Subtractive Analysis Yates et al. Anal.Chem. 70, 3557 (1998) Existing Sequence PTM Probability DTASelect for Post Protein Analysis Review Identification Software SEQUEST-SNP SNP Peptide Id Probability Variant P-Val, Search S.C. Related Sequence SNP Analysis Mutation Analysis Link et al. Nature Biotech. 17,, 676-682 682 (1999) Tabb et. al. J. Proteome Res. 1, 26, (2002) GutenTag De Novo Sequencing Gatlin et al. Anal.Chem. 72, 757 (2000). Alternate Splicing Unantic.. Mod. Unknown ORFs Tabb et al Anal. Chem

Data Considerations Different types of experiments Different types of data analysis

Integrated Multi-Dimensional Liquid Chromatography RP SCX RP 100 micron FSC 5 µm 100-300 nl/min Waste hv

Comprehensive Analysis of Complex Protein Mixtures Cells/Tissues Multiprotein Complex/Organelle Total Protein Characterization Protein Identification: What s there Post Translational Modifications: Regulation Quantification: Dynamics Proteomic Data to Knowledge: Genetics, RNAi, sirna Translation of technology development into biological discovery