Global and Discovery Proteomics
Christine A. Jelinek, Ph.D.
Johns Hopkins University School of Medicine
Department of Pharmacology and Molecular Sciences
Middle Atlantic Mass Spectrometry Laboratory

Lecture Agenda
Genomics vs. Proteomics
Discovery Proteomics: Basic Mass Spectrometry Techniques
Discovery Proteomics: Basic Bioinformatic Techniques
Database Searching: Mascot, Sequest
Combining Algorithms
Cloud Computing
Human Genome Project
Sequencing the human genome has transformed current biomedical research.
Human Genome Project: http://www.ncbi.nlm.nih.gov/genome
Initiated: October 1990
Working draft: 2000
Complete: 2003
Ensembl current totals: coding genes 20,476; non-coding genes 22,170; pseudogenes 13,322

Inferences about biological systems
Completion of genome sequencing inspired a corresponding approach to identify and characterize the proteins comprising the human proteome.
Protein-Coding Genes (Gregory, TR. Nature Reviews Genetics. 6, 699-708. doi:10.1038/nrg1674)
Proteomics
A proteome consists of all proteins present in a sample (cell, tissue, body fluid, etc.) at a defined point in time and under defined conditions.
Proteomics is the large-scale study of the expression, localization, function, and interaction of proteins expressed by an organism's genome.

Proteomics: Using Mass Spectrometry for Proteomics Experiments
Surinova S. et al. J. Prot. Res. 2011 10:5-16
Over the last two decades, mass spectrometry-based technologies have undergone rapid advances and a high degree of innovation to fulfill the expectations of the proteomics and life science communities.
Proteomics: Using Mass Spectrometry for Proteomics Experiments
Nagaraj N et al. Mol. Syst. Biol. 2011 7:548
Schwanhäusser B et al. Nature 2011 473:337-342
Beck M et al. Mol. Syst. Biol. 2011 7:549
Leading mass spectrometry-based proteomics laboratories have demonstrated that protein products of up to ~10,000 of the ~20,000 protein-coding human genes can be identified and quantified in a single experimental system.

The -omics Iceberg
http://www.proteome.ru/en/avogadro
Human Plasma Proteome: A case of the -omics Iceberg
Anderson N L, Anderson N G. Mol Cell Proteomics 2002;1:845-867

Challenges in Proteomics
Aebersold R. Nature Methods. 6: 411-412. doi:10.1038/nmeth.f.255
Shotgun Proteomics

Bottom-Up Proteomics: General procedure using LC-MS/MS for proteomic profiling
Aebersold R and Mann M. Nature 2003 422:198-207
Workflow diagram labels: Sample Preparation, Proteins, Peptides, HPLC-MS/MS Analysis, Statistical Analysis, Bioinformatics, MS Data, MS/MS Data
Bottom-Up Proteomics: Common Sample Preparative Steps
Commonly Used Enzymes: Bottom-Up Mass Spectrometry
Bottom-Up Proteomics: Data-Dependent Tandem Mass Spectrometry
Figure labels: Tandem MS/MS Scans; Fourier transformed MS Scan; Fragmentation; panels A, B, C
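The data-dependent step can be illustrated with a small sketch. It assumes the common "top N" scheme, in which the most intense peaks of the survey MS scan are chosen as precursors for MS/MS and recently fragmented m/z values are skipped; the slides do not state the selection rule, so the function name, parameters, and peak values below are hypothetical.

```python
def select_precursors(survey_peaks, top_n=10, excluded=(), tol=0.01):
    """Pick the top_n most intense survey-scan peaks as MS/MS precursors.

    survey_peaks: list of (mz, intensity) tuples from one MS scan.
    excluded:     m/z values fragmented recently (a simple dynamic exclusion list).
    tol:          m/z window, in Th, used when comparing against the exclusion list.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_peaks
        if not any(abs(mz - ex) <= tol for ex in excluded)
    ]
    # Most intense peaks are fragmented first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan: (m/z, intensity)
peaks = [(457.27, 1.9e3), (500.30, 5.0e2), (612.80, 8.0e3), (733.41, 1.0e2)]
print(select_precursors(peaks, top_n=2))
print(select_precursors(peaks, top_n=2, excluded=[612.80]))
```

Because selection follows intensity, low-abundance peptides may never be chosen for fragmentation.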
Bottom-Up Proteomics: Peptide Fragmentation
Biomed. Mass Spectrom. 11 (11): 601. doi:10.1002/bms.1200111109
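A minimal sketch of how theoretical fragment masses can be computed, assuming singly charged b- and y-type ions of an unmodified peptide and standard monoisotopic residue masses. The function name is hypothetical, and the peptide ILGFR is borrowed from the tryptic peptide example later in the lecture.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier


def fragment_ions(peptide):
    """Return ({'b1': m/z, ...}, {'y1': m/z, ...}) for singly charged ions."""
    residues = [MONO[aa] for aa in peptide]
    n = len(residues)
    # b ions keep the N-terminal i residues; y ions keep the C-terminal i residues plus water.
    b = {f"b{i}": sum(residues[:i]) + PROTON for i in range(1, n)}
    y = {f"y{i}": sum(residues[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


b_ions, y_ions = fragment_ions("ILGFR")
for label, mz in list(b_ions.items()) + list(y_ions.items()):
    print(f"{label}\t{mz:.4f}")
```

Each b ion and its complementary y ion sum to the neutral peptide mass plus two protons, which is a quick sanity check on any fragment table.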
Bottom-Up Proteomics: Identifying Post-Translational Modifications

Technical Limitations: LOD (Data Dependent Mass Spectrometry)
http://www.proteome.ru/en/avogadro
Technical Limitations: LOD (Data Dependent Mass Spectrometry)
Ghaemmaghami S. et al. Nature. 2003. 425: 737-741.

Technical Limitations: Dynamic Range (Data Dependent Mass Spectrometry)
Smith R. et al. Advances in Protein Chemistry. 2003. 65: 85-131.
Technical Limitations: Sampling (Data Dependent Mass Spectrometry)
Michalski A., Cox J., and Mann M. J. Proteome Res. 10, 1785-1793.
Bottom-Up Proteomics: Protein and Peptide Identification
Workflow: Protein, Tryptic Peptides, Experimental Mass Spectrum
[Figure: experimental MS/MS spectrum, Relative Abundance (0-100) vs. m/z (200-900). Scan header: BSA_500fmol_02_120601 #1854 RT: 21.52 AV: 1 NL: 1.91E3 T: ITMS + c NSI d Full ms2 457.27@cid35.00 [115.00-925.00]. Labeled peaks (m/z): 132.93, 213.13, 312.35, 391.80, 400.63, 440.22, 448.20, 515.36, 585.04, 602.48, 683.26, 701.38, 782.43, 800.43, 814.37]
11/9/2012
Protein Sequence: SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV
Theoretical Tryptic Peptides: SEMHIKHYTTK, ILGFR, EEGDSCPLK, QWDDSK, ILVAVADK, LLEYEEK, ILLFNSAK, YLLDESSTYK, LMHDDSV
Theoretical Mass Spectrum

Bioinformatics Resources: Protein and Peptide Identification
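The step from protein sequence to theoretical tryptic peptides can be sketched in a few lines. This assumes the usual trypsin rule (cleave C-terminal to K or R unless the next residue is P), which the slide does not spell out; the function name and the missed_cleavages parameter are hypothetical. With strict cleavage the first peptide on the slide splits into SEMHIK and HYTTK; the joined form SEMHIKHYTTK appears when one missed cleavage is allowed.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """In-silico trypsin digest: cut after K/R, but not when followed by P."""
    seq = "".join(sequence.split()).upper()
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal peptide without K/R

    # Join up to `missed_cleavages` neighbouring fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


protein = ("SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADK"
           "LLEYEEKILLFNSAKYLLDESSTYKLMHDDSV")
print(tryptic_digest(protein))
print("SEMHIKHYTTK" in tryptic_digest(protein, missed_cleavages=1))
```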
Database Searching

Database Search Algorithms: Protein and Peptide Identification
Comparing raw MS/MS data with molecular sequence databases to identify constituent proteins
Inputs: peptide molecular masses; fragment ion mass and intensity values; protein/DNA sequence databases
Search engine output: protein identification and characterization
Public Proteomic Databases
MSDB: Comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London
NCBInr: Comprehensive, non-identical protein database maintained by NCBI. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB
SwissProt: High-quality, curated protein database
dbEST: Division of GenBank containing "single-pass" cDNA sequences, or Expressed Sequence Tags
ThermoFisher

Public Proteomic Databases: UniProt FASTA file
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
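A minimal sketch of reading FASTA records like the one above: a line starting with ">" opens a record, and all following lines up to the next ">" are concatenated into the sequence. The function name is hypothetical, and the pipe-delimited header layout is an assumption based on the conventional NCBI style; only the first two sequence segments of the slide's entry are used.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYV
LPWGQMSFWGATVITNLFSAIPYIGTNLV
"""
records = list(read_fasta(example.splitlines()))
for header, sequence in records:
    accession = header.split("|")[3]  # assumes gi|number|db|accession| layout
    print(accession, len(sequence))
```

For a file on disk, passing the open file object works the same way, since the function only iterates over lines.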
Mascot Search Algorithm

MASCOT Software: Protein and Peptide Identification
Mascot combines 3 types of searches: Peptide Mass Fingerprinting, MS/MS Ions, Sequence Query
Searches against any FASTA database
Unique, true probability-based scoring
Accepts mass spectrometry data from all leading instrument manufacturers
High-throughput format for single- and multi-processor systems and clusters
Automates search submission without custom programming
Summary of search results in web browser format
Licensed by more than a thousand academic and commercial laboratories
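To make the probability-based scoring concrete: Mascot reports scores on a logarithmic scale, score = -10 * log10(P), where P is the probability that the observed match is a random event. The sketch below only shows that conversion; how Mascot computes P is not covered here, and the function names are hypothetical.

```python
import math


def score_from_probability(p_random):
    """Convert the probability of a random match into a -10*log10(P) score."""
    return -10.0 * math.log10(p_random)


def probability_from_score(score):
    """Inverse conversion: score back to the probability of a random match."""
    return 10.0 ** (-score / 10.0)


for p in (0.05, 0.01, 0.001):
    print(f"P = {p:<6} score = {score_from_probability(p):.1f}")
```

On this scale a higher score means a less likely random match, and every 10 points corresponds to a tenfold drop in P.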
Peptide Mass Fingerprinting (PMF): Peptide Identification using MASCOT
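The idea behind peptide mass fingerprinting can be sketched as follows: measured peptide masses are compared with the theoretical tryptic peptide masses of each database protein, and proteins are ranked by how many masses agree within a tolerance. This is a simplified counting scheme for illustration only, not Mascot's probability-based scoring; the function names, the ppm tolerance, and the second candidate protein are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def pmf_rank(observed_masses, candidates, tol_ppm=50.0):
    """Rank candidate proteins by the number of observed masses they explain.

    candidates: dict mapping protein name -> list of its tryptic peptides.
    """
    scores = {}
    for name, peptides in candidates.items():
        theoretical = [peptide_mass(p) for p in peptides]
        scores[name] = sum(
            1 for m in observed_masses
            if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


candidates = {
    "lecture_example": ["ILGFR", "QWDDSK", "LLEYEEK", "ILLFNSAK"],
    "unrelated_protein": ["GGGGK", "AAAAR"],  # hypothetical entry
}
# Simulated measurements: two true masses, one with a 10 ppm error.
observed = [peptide_mass("ILGFR") * (1 + 10e-6), peptide_mass("QWDDSK")]
print(pmf_rank(observed, candidates))
```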
MS/MS Ion Searching: Peptide Identification using MASCOT
Protein Identification using MASCOT: Using MS and MS/MS spectra
Protein identified using both MS and MS/MS spectra

Protein Identification using MASCOT: Combining MS and MS/MS spectra
Protein Identification using MASCOT: Interpreting Results (ThermoFisher)

Protein Identification using MASCOT: Common Settings for Search (ThermoFisher)

Sequest Search Algorithm
Protein Identification using Sequest: MS/MS spectra-based Identification
Data Dependent Mass Spectral Scans: MS/MS depends on the MS
Correlation Analysis
FASTA Protein Database:
>gi 84670 pir B27257 coagulogen II precursor - horseshoe crab (Tachypleus tridentatus) gi 10809 (X04192) coagulogen type II [Tachypleus tridentatus] gi 217395 gnl PID d1000491 (D00077) coagulogen type 2 [Tachypleus tridentatus] gi 356167 prf 1208319A coagulogen [Tachypleus sp.] [MASS=21826]
MEKKLFGIALLLTTVASVLAADTNAPICLCDEPGVLGRTQIVTTEIKDKIEKAVEAVAQESGVSGRGFSIFSHHPVFREC
GKYECRTVRPEHSRCYNFPPFIHFKSECPVSTRDCEPVFGYTVAGEFRVIVQAPRAGFRQCVWQHKCRFGSNSCGYNGRC
TQQRSVVRLVTYNLEKDGFLCESFRTCCGCPCRSF
>gi 585398 sp P28175 LFC_TACTR LIMULUS
Protein / Peptide / Modification Identification

Protein Identification using Sequest: Sequest Search Workflow
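The correlation analysis step can be illustrated with a simplified cross-correlation score: the experimental and theoretical spectra are binned, their dot product is taken at zero offset, and the average dot product over shifted offsets is subtracted as background. This is a sketch of the general idea only. Sequest's actual preprocessing and scoring are more involved, and the function names, bin width, shift range, and peak lists below are assumptions.

```python
def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Turn (mz, intensity) peaks into a fixed-length intensity vector."""
    n_bins = int(max_mz / bin_width) + 1
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:
            vector[index] = max(vector[index], intensity)
    return vector


def xcorr(experimental, theoretical, max_shift=75):
    """Dot product at zero shift minus the mean dot product over other shifts."""
    def dot_at(shift):
        total = 0.0
        for i, t in enumerate(theoretical):
            j = i + shift
            if t and 0 <= j < len(experimental):
                total += t * experimental[j]
        return total

    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    background = sum(dot_at(s) for s in shifts) / len(shifts)
    return dot_at(0) - background


# Hypothetical peak lists: (m/z, intensity)
measured = bin_spectrum([(175.1, 1.0), (288.2, 1.0), (401.3, 1.0)])
good_candidate = bin_spectrum([(175.1, 1.0), (288.2, 1.0), (401.3, 1.0)])
poor_candidate = bin_spectrum([(150.1, 1.0), (260.2, 1.0)])
print(xcorr(measured, good_candidate), xcorr(measured, poor_candidate))
```

Subtracting the shifted background penalizes candidates whose peaks line up with the measured spectrum only by chance at some arbitrary offset.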
Protein Identification using Sequest: Sequest Search Parameters
Extraction Parameters; Search Parameters; Modifications

Comparing Algorithms: Mascot vs. Sequest
Comparing Algorithms: Mascot vs. Sequest
Data file format (ThermoFisher)

Proteome Discoverer Software
Combining Algorithms: Mascot and Sequest using Proteome Discoverer (ThermoFisher)
Proteome Discoverer: Interpreting Results (ThermoFisher)
Cloud Computing Strategies

Cloud Computing
OLD: 40:00:00
NEW: 01:00:00
Combining Bioinformatic Tools
Integrated Analysis Inc. Pass Software
Tools: Convert MS (.mzXML, .mgf, .mzML); FASTA Merger; Create Decoy FASTA; Peptide Prophet; Isobaric Labeled Quantitation; Refine & Merge Results; OMSSA; X! Tandem; Custom; Protein Prophet; Custom Algorithms

Combining Bioinformatic Tools: Customizing Bioinformatic Workflows
Workflow steps: High Resolution MS; OMSSA and X!Tandem; Merge FASTA; Create Reverse Sequence; Convert MS
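The "Merge FASTA" and "Create Reverse Sequence" steps can be sketched as below, assuming decoys are made by reversing each target sequence and tagging its header with a prefix. The function names, the REV_ prefix, and the example header are hypothetical and do not describe the software named on the slide.

```python
def merge_fasta(*record_lists):
    """Concatenate (header, sequence) lists, keeping the first copy of each header."""
    seen, merged = set(), []
    for records in record_lists:
        for header, sequence in records:
            if header not in seen:
                seen.add(header)
                merged.append((header, sequence))
    return merged


def add_reverse_decoys(records, prefix="REV_"):
    """Follow every target record with a decoy holding the reversed sequence."""
    combined = []
    for header, sequence in records:
        combined.append((header, sequence))
        combined.append((prefix + header, sequence[::-1]))
    return combined


def format_fasta(records, width=60):
    """Render records as FASTA text with wrapped sequence lines."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


targets = merge_fasta([("example_protein", "ILGFR")], [("example_protein", "ILGFR")])
database = add_reverse_decoys(targets)
print(format_fasta(database))
```

Searching the combined target and decoy database lets the number of decoy hits serve as an estimate of how many target hits are false.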