Introduction to Proteomics Åsa Wheelock, Ph.D. Division of Respiratory Medicine & Karolinska Biomics Center asa.wheelock@ki.se In: Systems Biology and the Omics Cascade, Karolinska Institutet, June 9-13, 2008 1
Focus of course: Tools for data analysis Your analysis is no better than data you have collected... The goals of this proteomics overview: Understand possibilities & limitations Pros and cons of different method Sources of variance in proteomics Take advantage of proteomics core facilities Perform proteomics collaborations Write a short research proposal in proteomics 2
Proteomics publications in Pubmed 3000 2500 2000 1500 1000 500 0 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 First proteomics publication 3
Why Proteins?!? Business end of the cell Detailed information with limited efforts As compared to metabolomics Relatively robust methods available TRANSCRIPTOMICS Limited info from mrna PROTEOME Detailed info Robust technology METABOLOME Techincally challenging 4
Proteomics Methodology No protein PCR 4 nucleotides vs 20+ amino acids Post-translational modifications (PTM) 3 MAIN PROTEOMICS PLATFORMS Gel based methods Shotgun methods (mass spec-based) chromatography-based, gel-free Array based (antibody based) 5
Gel based: 2-Dimensional Electrophoresis SEPARATION VISUALISATION Net charge +2 +1 0-1 -2 Isoelectric point (pi) 3 4 5 6 7 8 9 10 11 ph QUANTIFICATION Stoichiometric protein stain => 3rd dimension Molecular weight log Mw mobility Klose, J. 1975. Humangenetic 26, 231-43 O Farrel, P. 1975. J. Biol.Chem, 250, 4007-21 6
Image acquisition Image acquisition using fluorescent scanner 7
Quantitative image analysis Pixel intensity => 3rd dimension Spot volume = protein quantity 1. Detect spots 2. Match spots across gels 3. Quantify spot volumes 8
Protein identification Trypsin digestion Mass spectrometry (MALDI-TOF/TOF) DATABASE SEARCH => IDENTIFICATION (Swiss-prot, EnSemble) (statistical probability) 9
Protein Identification MS analysis Protein Trypsine digestion Peptides EXPERIMENTAL Peptide masses Protein database DTHKSEIAHRFK DLGEEHFKGLVL IAFSQYLQQCPF DEHVKLVNELTE FAKTCVADESHA GCEKSLHTLFGD ELCKVASLRET Virtual digest EXPECTED Peptide masses Statistical matching 10
Peptide mass mapping MS analysis=> peptide masses Statistical matching DLGEEHFK LHTLFGDR MS/MS analysis=> sequence information database search DTHKSEIAHRFK DLGEEHFKGLVL IAFSQYLQQCPF DEHVKLVNELTE FAKTCVADESHA GEKSLHTLFGRE LCKVASLRET Homology search Validate statistical hit 11
Shotgun vs. Gel-based proteomics Separate Quantify (2DE) Digest MS/MS extract ID Digest Separate (LC) MS/MS Quantify Adapted from Patterson and Aebersold, Nature Genetics 2003, 33:311-23. Fig. 3 12
Semi-quantitative proteomics Both 2DE and MS-based methods NOT quantitative by nature Co-separation: 2 samples => ratios Tags => Semi-quantitative proteomics 13
Shotgun isotope-tagging : Isotope coded affinity tag (ICAT) 14
Multiplexing in 2DE: DIGE - Differential Gel Electrophoresis Treated sample: Covalently labeled (Cy3) Control sample: Covalently labeled (Cy5) Co-separation by 2DE Protein visualization (super-imposable images) λ ex 532nm Cy3 Spot quantification λ ex 633nm Cy5 Direct ratiometric normalization Protein abundance = 532/633 signal ratio Statistical analysis 15
Semi-quantitative proteomics Both 2DE and MS-based methods NOT quantitative by nature Co-separation: 2 samples => ratios Tags => Semi-quantitative proteomics Pooled internal standard + 2-3 samples => Relative quantification 16
Internal Standards in 2DE: DIGE Treated Sample: Covalently labeled (Cy3) Pooled Internal standard (Cy2) Co-separation by 2DE Control Sample: Covalently labeled (Cy5) λ ex 488nm Cy2 λ ex 532nm Cy3 Protein visualization (super-imposable images) Spot quantification λ ex 633nm Cy5 Direct ratiometric normalization Protein abundance = relative to IS Statistical analysis 17
Proteomics in Pubmed 3000 2500 proteomics 2DE 2000 1500 1000 500 0 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 First proteomics publication First DIGE-method published 18
Multiplexing in MS: itraq - isobaric Tag for Relative and Absolute Quantitation Sample A Sample B Sample C Sample D 114 31 PRG Trypsin digestion (peptides) a b c d 115 30 PRG 116 29 PRG 117 28 PRG itraq labelling ID MS/MS m/z Quantification 19
Differential labelling opens up new possibilities Cysteine oxidative states Identify peptides on plasma membrane surface Cellular re-localization 20
2D or not 2D? Gel-based methods: 2-D electrophoresis + soluble proteins + post-translational modifications 21
Post-translational modifications Spot trains Intact proteins 22
2D or not 2D? Gel-based methods: 2-D electrophoresis + soluble proteins + post-translational modifications - technical variance, time consuming MS-based (Gel-free) methods: ICAT, itraq + membrane proteins + low abundance proteins 23
Extremes of physiochemical properties: Peptides Charge - pi range from 3-12 Size - Mw range of 5 500,000 kda Hydrophobicity - membrane proteins 24
2D or not 2D? Gel-based methods: 2-D electrophoresis + soluble proteins + post-translational modifications - technical variance, time consuming MS-based (Gel-free) methods: ICAT, itraq + membrane proteins + low abundance proteins - expensive, data intense 25
Shotgun approcahes and gelbased approaches complementary GEL-BASED SHOTGUN No true proteomics technique yet 26
DYNAMIC RANGE 100 101 102 103 104 105 106 107 108 109 1010 1011 1012 COPIES of each PROTEIN 1012 日本 Japan 104 103 Kobe 神戸 Kyoto 京都 Osaka 大阪 27
Post-translational Modifications (PTMs) - 400 reported PTMs Phosphorylation sites 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 Protein copy numbers in blood (log10) 28
Variance in 2DE BIOLOGICAL VARIANCE Experimental variance Pre-fractionation, isolation & labelling of proteins Protein staining Technical variance Gel-to-gel variation in 2DE Image acquisition (scanner) Post-experimental variance Software-induced variance User dependant variance 29
Variance in 2DE BIOLOGICAL VARIANCE Experimental variance Pre-fractionation, isolation & labelling of proteins Protein staining Technical variance Gel-to-gel variation in 2DE Image acquisition (scanner) Post-experimental variance Software-induced variance User dependant variance 30
Remember that variance adds up: Multiple-step method is not your friend... 10% 40% 31
Variance in 2DE BIOLOGICAL VARIANCE Experimental variance Pre-fractionation, isolation & labelling of proteins Protein staining Technical variance Gel-to-gel variation in 2DE Image acquisition (scanner) Post-experimental variance Software-induced variance User dependant variance 32
Technical variance in 2DE Biological variance Recovery? -SH modifications? Solubilization Rehydration Isoelectric focusing Inhomogenous electric field? Gel-to-gel variations? Staining artifacts? Load IPG strip Recovery? SDS-PAGE Background fluorescence? 33 Protein visualization
Tools to reduce variance Technical variance Internal standard: DIGE Software algorithms: Background subtraction Normalization 34
Dynamic range of scanner 16 bit pixel resolution (2 16 65,000 ~ 10 5 ) Make sure you are using the entire range! 35
Variance in 2DE BIOLOGICAL VARIANCE Experimental variance Pre-fractionation, isolation & labelling of proteins Protein staining Technical variance Gel-to-gel variation in 2DE Image acquisition (scanner) Post-experimental variance Software-induced variance User dependant variance 36
2DE analysis software Main purpose: match and quantify spots Normalization: reduce gel-to-gel variation Background subtraction: Reduce background noise Increase signal/noise ratio Increase sensitivity 37
38
39
Global Background Subtraction PDQuest: : Floating/Rolling Ball 40
Software induced variance PDQuest PG200 100 100 PD PD Quest; Quest; Power Power mean mean 100 100 Expression; Expression; No No background background subtraction subtraction CV CV (%) (%) n=5 n=5 75 75 50 50 25 25 CV CV (%) (%) n=5 n=5 75 75 50 50 25 25 0 0 1 1 51 51 101 101 151 151 201 201 251 251 301 301 351 351 Average CV=10% 0 0 1 51 101 151 201 251 301 351 401 451 1 51 101 151 201 251 301 351 401 451 Average CV=4.6% Software variance up to 30% of technical variance 41
Applications of proteomics BIOMARKER DISCOVERY Biomarker of disease & susceptibility CLINICAL APPLICATIONS Pharmaceutical target identification Improved diagnostics MECHANISTIC STUDIES Protein-protein interactions Protein adduction /Altered protein expression Hypothesis generation: avoid local maxima Systems Biology 42
Proteomics in the future Improved sensitivity Currently: scratching the surface laser capture microdissection Protein microarrays Antibody arrays (e.g. for cytokines) Tissue microarrays (Peter Nilsson, Friday) In vivo subcellular localization assays Protein amplicifation method? i.e. protein-pcr 43
Proteomics in the NEAR future... Focus on INTERPRETING data, not on ACQUIRING data. Pathway Analysis Integrate data from omics cascade Integrate heatmap with biological pathways 44
Preview of coming attractions KEGG & KegArray 45
Take home messages......keep your variance down and your dynamic range up!...keep your false positives down, and your power up! 46