A look at data for quantitative analysis using MSight and Phenyx

Pierre-Alain Binz
Institut Suisse de Bioinformatique (Swiss Institute of Bioinformatics), GeneBio SA
Atelier Protéomique Quantitative (Quantitative Proteomics Workshop), 25-27 June 2007, La Grande Motte

Already said
- Importance of the biological question, sample choice and experimental strategy
- Sample complexity is a challenge for MS
  - Peak capacity, concentration range, chemical properties, ...
- Many methods, each with strengths and weaknesses
  - iTRAQ, SILAC, ICAT, MRM, label-free, ...
- Many instrumental settings: heterogeneity of data type, amount and resolution
- Many bioinformatics tools
  - Identification, signal detection, quantitation
- Validation methods

Outlook
- Visualise LC-MS data
- Detect signal
- Align LC-MS runs
- Match images (differential analysis)
- Add identification results
- Quantitation with search engine

What data for quantitation?
- MS data: dimensions
  - m/z, intensity, Rt, pI, scan number
- Secondary data
  - Sample (one, more than one)
  - Molecular interpretation (peptide, protein)
  - Quantitation method (label description, comparison method, thresholds, corrections)

Look at LC-MS data
- Raw MS traces or peaklists (spectrum view or gel view)
- Chromatographic profiles (TIC, XIC)
- 2D images (LC-MS)
- Annotated spectra
- Overlapped spectra, head-to-head view
- Overlapped images
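The two chromatographic profiles named above (TIC and XIC) can be sketched in a few lines. This is a minimal illustration, not MSight code: the run layout (a list of `(rt, peaks)` tuples), the function names and all numeric values are assumptions made up for the example.

```python
# An LC-MS run as a list of spectra: (retention time in min, [(m/z, intensity), ...]).
# All values are invented for illustration.
run = [
    (10.0, [(500.2, 120.0), (658.3, 40.0), (900.7, 15.0)]),
    (10.1, [(500.2, 100.0), (658.3, 210.0), (900.7, 10.0)]),
    (10.2, [(500.2, 90.0), (658.3, 95.0)]),
]


def tic(run):
    """Total ion chromatogram: summed intensity of each spectrum against Rt."""
    return [(rt, sum(intensity for _, intensity in peaks)) for rt, peaks in run]


def xic(run, mz_target, tol):
    """Extracted ion chromatogram: intensity within mz_target +/- tol against Rt."""
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - mz_target) <= tol))
        for rt, peaks in run
    ]


tic_trace = tic(run)
xic_trace = xic(run, mz_target=658.3, tol=0.05)
```

The TIC collapses the m/z axis entirely, while the XIC keeps only a narrow m/z band, which is why the XIC is the trace normally used to follow one peptide ion.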
Visualise LC-MS data: spectrum view, gel view, chromatograms

2D representation
[Figure: a 38x26 pixel excerpt of an LC-MS image shown as a grid of grey-level intensity values; axis labels Rt and I]

Example: LC-ESI-Q-TOF, 42-59 kDa extract of human BJAB B-cell line

Data display principle

                 mass               time
interval         400-1200 Da        0-45 min
sampling rate    0.025 Da           3 s
size             32 000 measures    900 spectra

- Full image (MS data): 32000x900, i.e. 28 800 000 measures (55 MB)
- Screen size: 800x600
- Image part to display: 6000x400, projected onto the screen
[Figure: full image, time 0-45 min against 400-1200 Da; scale bars 20 min, 200 Da]

Zoom 256x
- Less than 0.001 % of the data is displayed
- Isotope spacing shows the charge state: 0.5 for 2+, 0.33 for 3+
[Figure: zoomed region around 32.5-33 min and 657.5-660.5 Da; scale bars 30 s, 0.5 Da]
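The figures on the data display slide follow from simple arithmetic, reproduced here as a sketch. Two points are assumptions rather than statements from the slides: that each measure is stored on 2 bytes (which is what makes 28.8 million measures come out near 55 MB), and that the zoomed window covers 30 s by 0.5 Da as suggested by the scale bars.

```python
MZ_MIN, MZ_MAX, MZ_STEP = 400.0, 1200.0, 0.025  # Da
RUN_MIN, SCAN_PERIOD_S = 45.0, 3.0              # run length (min), one spectrum every 3 s
BYTES_PER_MEASURE = 2                           # assumption: 16-bit intensities

points_per_spectrum = round((MZ_MAX - MZ_MIN) / MZ_STEP)
n_spectra = round(RUN_MIN * 60 / SCAN_PERIOD_S)
total_measures = points_per_spectrum * n_spectra
size_mb = total_measures * BYTES_PER_MEASURE / 2**20

# Assumed zoom window: 30 s x 0.5 Da
window_measures = round(30 / SCAN_PERIOD_S) * round(0.5 / MZ_STEP)
displayed_percent = 100 * window_measures / total_measures


def isotope_spacing(charge):
    """m/z distance between isotope peaks: the 13C-12C mass difference over z."""
    return 1.00335 / charge


print(points_per_spectrum, n_spectra, total_measures)
print(f"{size_mb:.1f} MB, {displayed_percent:.5f} % displayed")
print(round(isotope_spacing(2), 2), round(isotope_spacing(3), 2))
```

The same spacing rule is what lets a viewer read the charge state directly from a zoomed image: peaks 0.5 apart belong to a 2+ ion, peaks 0.33 apart to a 3+ ion.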
MSight: LC-MS data analysis tool
- Developed by the Proteome Informatics Group of the Swiss Institute of Bioinformatics
- Based on the Melanie 2D gel analysis software (it looks a bit like Melanie)
- http://www.expasy.org

Why MSight?
- Generate and evaluate LC-MS images
  - Import LC-MS and MS/MS runs from various MS instruments and formats
  - Workspace to manage experiments and data
  - Rich visualisation and annotation
  - Visualise the complexity of an LC-MS run
  - Detect contaminants and run aberrations
  - Perform peak detection from raw LC-MS data
  - Improve Rt and m/z accuracy using 2D
- Quantitation and comparison
  - Alignment and matching of LC-MS images
  - Quantitation reports for differential expression analysis
  - Label-free quantitation, generation of inclusion/exclusion lists
- Integrate with identification tools (Phenyx)
  - Annotate MS peaks with peptide identity labels
  - Use the annotations to validate matching peaks across LC-MS experiments

Import
- Raw LC-MS and MS/MS data formats
  - Native formats (yep, baf, fid, T2D, dat)
  - mzXML, mzData
  - ASCII exports
- Handle big original files (100 MB-1 GB)
- Include profile LC-MS trace and MS/MS spectra

Visualisation
- Open multiple images
- Zoom in/out
- Chromatographic profile ("XIC")
- Spectrum view
- Editable and searchable annotations: landmarks, Rt, m/z, peptide sequence, hyperlinks, others
- Synchronisation between views
- Superpose images in transparency mode and complementary colors
- 3D view

Artefacts
[Figure: artefacts in LC-MS images; scale bars 1 min, 100 Da]
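Mass calibration
[Figure: mass calibration example; scale-bar labels 30 s, 2 Da]

Contaminants
- Polymer (PEG): peaks spaced by 44 Da
[Figure: polymer contamination; scale-bar label 500 Da]

Contaminants (2)
- SDS-MALDI-TOF data set
  - Mass interval 560-3000, sampling rate 0.05: 48 800 measures per spectrum
  - 90 spectra (the slide gives 0.15 as the sampling rate of the second axis, without a unit)
  - 4 392 000 measures in total
[Figure: scale-bar labels 5 min, 100 Da, 2 Da]

Redundancy: Peptide modifications
- Example: spot from a 2DE gel
- Oxidation: shift of 5.33 for a 3+ ion
- Charge states visible: 2+, 3+, 4+, 5+
[Figure: LC-MS images of the same sample at several zoom levels; scale bars 10 min, 5 min, 2 min, 100 Da, 5 Da]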
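Both observations on these slides, the 44 Da polymer spacing and the 5.33 shift of an oxidised 3+ ion, are mass differences read off the image. The sketch below shows one way to check them numerically. The function names, the tolerance and the peak list are hypothetical; the ladder search assumes singly charged contaminant ions.

```python
OXIDATION = 15.9949  # Da, monoisotopic mass of one added oxygen
PEG_UNIT = 44.0262   # Da, monoisotopic mass of the C2H4O repeat unit


def mz_shift(delta_mass, charge):
    """Shift on the m/z axis caused by a mass change on an ion of given charge."""
    return delta_mass / charge


def find_ladders(mzs, spacing, tol=0.02, min_len=3):
    """Return runs of peaks separated by `spacing` (singly charged ions assumed)."""
    mzs = sorted(mzs)
    ladders, used = [], set()
    for start in mzs:
        if start in used:
            continue
        ladder = [start]
        while True:
            # Look for a peak one repeat unit above the current end of the ladder
            nxt = [m for m in mzs if abs(m - ladder[-1] - spacing) <= tol]
            if not nxt:
                break
            ladder.append(nxt[0])
        if len(ladder) >= min_len:
            ladders.append(ladder)
            used.update(ladder)
    return ladders


# Invented peak list: four polymer peaks and one unrelated peak
peaks = [371.10, 415.13, 459.15, 503.18, 620.30]
peg_ladders = find_ladders(peaks, PEG_UNIT)
oxidation_shift_3plus = round(mz_shift(OXIDATION, 3), 2)
```

A ladder of three or more peaks at this spacing is a reasonable hint of polymer contamination, and the oxidation shift scales as 1/z, so the same modification appears at a different distance for each charge state.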
Peak detection
- Detect and quantify MS peaks in a 2D image
- Interactive use
- Manual validation via visualisation
- Export in centroid mode

Peak detection variability
- High vs low resolution on the m/z axis
  - Isotopic profile vs bump
- Sampling resolution (Rt and m/z)
  - LC-MALDI < ESI-MS with MS/MS < ESI-MS (QTOF < LTQ)
- Noise (chemical, electronic)
- Shape (rectangle, circle, other)
- Intensity (max, sum, fit max, integrate)
- And for quantitation: detect peaks in each individual sample and then compare, or align first and use one single shape per aligned feature
[Figure: peak shapes at different resolutions; scale bars 5 min, 5 Da, 15 s]

Locating the source of noise
[Figure: a streak at 37.15 min with peaks labelled a to n, including (5+) and (2+) ions; m/z axis 807-810; scale bars 28 min, 10 min, 2 min, 1 Da]
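The choices listed under peak detection variability (noise level, shape, intensity measure) can be made concrete with a small sketch. This is not the MSight algorithm, which the slides do not describe; it is a simple local-maximum search with a rectangular shape and two of the intensity measures (max and sum), on an invented image.

```python
def detect_peaks(image, noise, half_rt=1, half_mz=1):
    """Find local maxima above `noise` in a 2D image (rows: Rt, columns: m/z).

    Each peak is described by a rectangle of (2*half_rt+1) x (2*half_mz+1)
    cells, clipped at the image border. Cells of equal height inside one
    rectangle are each reported, so plateaus yield duplicates.
    """
    n_rt, n_mz = len(image), len(image[0])
    peaks = []
    for r in range(n_rt):
        for c in range(n_mz):
            apex = image[r][c]
            if apex <= noise:
                continue
            r0, r1 = max(0, r - half_rt), min(n_rt, r + half_rt + 1)
            c0, c1 = max(0, c - half_mz), min(n_mz, c + half_mz + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            if apex < max(window):
                continue  # a higher neighbour exists, so this is not an apex
            peaks.append(
                {"rt_index": r, "mz_index": c, "max": apex, "sum": sum(window)}
            )
    return peaks


# Invented 5x5 image with two features
image = [
    [2, 3, 2, 1, 1],
    [3, 9, 4, 1, 2],
    [2, 4, 3, 2, 6],
    [1, 1, 2, 5, 14],
    [1, 2, 1, 4, 7],
]
found = detect_peaks(image, noise=5)
```

Note how the second feature sits on the image border: its rectangle is clipped, so its summed intensity covers fewer cells than that of the first feature. This is one small instance of the variability the slide refers to.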
Peptide deconvolution
[Figure: peptide deconvolution at time 31.9 min; charge-state labels 2+ and 4+; scale bars 1 min, 1 Da]

Alignment and comparison
- Align images via landmarks (corrections for local deviations)
- Match images (pair peaks together)
- Report relative quantification information

Alignment transformation
[Figure: alignment transformation; m/z axis 620-632; scale bar 4 min]

Migration variability
[Figure: images A and B and their difference A - B; scale bars 1 min, 2 Da]
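The two steps named on the alignment slide, aligning via landmarks and then pairing peaks, can be sketched as follows. The slides do not say which transformation MSight uses; piecewise-linear interpolation between landmarks is an assumption chosen here because it corrects local deviations. All names, tolerances and values are hypothetical.

```python
from bisect import bisect_right


def make_rt_warp(landmarks):
    """Build a function mapping run B retention times onto run A's time scale.

    landmarks: (rt_in_B, rt_in_A) pairs with distinct rt_in_B values.
    Interpolation is piecewise linear; outside the landmark range the
    nearest segment is extrapolated.
    """
    if len(landmarks) < 2:
        raise ValueError("at least two landmarks are needed")
    pts = sorted(landmarks)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def warp(rt):
        k = min(max(bisect_right(xs, rt) - 1, 0), len(xs) - 2)
        slope = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])
        return ys[k] + slope * (rt - xs[k])

    return warp


def match_peaks(peaks_a, peaks_b, warp, rt_tol=0.2, mz_tol=0.05):
    """Pair each peak of A with the closest unused peak of B within tolerance.

    Peaks are (rt, mz, intensity) tuples; returns (peak_a, peak_b) pairs.
    """
    pairs, used = [], set()
    for peak_a in peaks_a:
        best = None
        for idx, peak_b in enumerate(peaks_b):
            if idx in used or abs(peak_a[1] - peak_b[1]) > mz_tol:
                continue
            dist = abs(peak_a[0] - warp(peak_b[0]))
            if dist <= rt_tol and (best is None or dist < best[0]):
                best = (dist, idx)
        if best is not None:
            used.add(best[1])
            pairs.append((peak_a, peaks_b[best[1]]))
    return pairs


warp = make_rt_warp([(10.0, 10.5), (20.0, 21.0), (30.0, 30.5)])
peaks_a = [(15.7, 624.30, 1000.0), (26.0, 630.10, 500.0)]
peaks_b = [(15.0, 624.31, 2000.0), (25.0, 628.00, 800.0)]
pairs = match_peaks(peaks_a, peaks_b, warp)
ratios = [b[2] / a[2] for a, b in pairs]
```

Each matched pair directly yields a relative quantification value (here the intensity ratio B/A); unmatched peaks are candidates for features present in only one run.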
Quantitation
- Protein mixture: 32-45 kDa fraction of lysate from a culture of a B-cell line, ~1 pmol
- Up to 180 proteins are detectable in this sample when analysed extensively by LC-MS/MS
[Figure: LC-MS images labelled +26 fmol, +83 fmol and +520 fmol (label: BSA); highlighted signal 740.35 (2+), LGEYGFQNAL; a 3+ ion is also marked; scale bars 5 min, 2 min, 10 Da, 2 Da]

Quantitation: differential (low resolution)
[Figure: images labelled +26 fmol, +83 fmol and +520 fmol; scale bars 5 min, 20 Da]

Differential analysis
[Figure: BSA vs BSA+Lyz; images A and B and their difference A-B; scale bars 2 Da, 100 Da]
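The A-B images shown in the differential analysis slides amount to a cell-by-cell subtraction of two aligned images. A minimal sketch, assuming both images already share the same Rt x m/z grid; the function names, the threshold and the values are invented for the example.

```python
def differential(image_a, image_b):
    """Signed difference A - B, cell by cell, for two images on the same grid."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must be aligned on the same grid")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def split_by_sign(diff, threshold):
    """List cells clearly higher in A (positive) and clearly higher in B (negative)."""
    higher_in_a, higher_in_b = [], []
    for r, row in enumerate(diff):
        for c, value in enumerate(row):
            if value >= threshold:
                higher_in_a.append((r, c, value))
            elif value <= -threshold:
                higher_in_b.append((r, c, value))
    return higher_in_a, higher_in_b


image_a = [[10, 50], [30, 5]]
image_b = [[12, 10], [30, 40]]
diff = differential(image_a, image_b)
higher_in_a, higher_in_b = split_by_sign(diff, threshold=20)
```

In a viewer the two sign classes would typically be drawn in complementary colors, so that signals specific to one sample stand out while common signals cancel.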
Coupling with identification
- So far, quantitation has not considered the molecular interpretation of the signals
- To quantitate a protein, signals must be selected and coupled with peptide identification

Phenyx
- A software platform dedicated to the identification and characterization of proteins and peptides from mass spectrometry data
- Developed by GeneBio in collaboration with the Swiss Institute of Bioinformatics (SIB)
- Launched in September 2004 (version 1.8); version 2.3 in April 2007
- Rapid development and recognized tool
  - Integrated in a number of third-party software packages (Scaffold, TPP, MSight, ProteinScape, Proteus LIMS, ...)
  - Adopted by a number of large, renowned proteomics centres
- http://www.phenyx-ms.com
- http://phenyx.vital-it.ch/pwi

Some features
- Core calculation
  - Robust and flexible scoring including log-likelihood measures
  - Conflict resolution algorithm
  - Use of annotations in databases (PTMs, variants, AA modifications, ...)
- Flexible and interactive interface: the Phenyx Web Interface
  - User and job properties (user privileges, job sharing)
  - Manual validation functionality
  - Import of third-party jobs (Mascot, Sequest, X!Tandem, Popitam, ...)
  - Many exports (native Phenyx, Excel, XML, text, ...)
  - Results comparison functionality
- Integration of Phenyx into workflows: a job follows a suite of configurable events (pre-processing, processing and post-processing)

The Phenyx Web Interface
[Figure: screenshots of the Phenyx interface: submission, results views, results comparison, management console; Excel, XML and text exports]

Integrate MSight and Phenyx Desktop
- Example: annotate LC-MS images with peptide identifications
[Figure: workflow from raw LC-MS data and peaklists through exported peptide identifications to annotated images]
Phenyx results are stored as annotations in the images

LC-MS and MS/MS: undersampling
- LC-MS and LC-MS/MS on a QStar of 49-62 kDa SDS-separated, trypsin-digested proteins from a human B-cell line
- Focus on a small time x m/z region (about 1/250 of the full run)
- 7/40 peptides analysed
- 3/7 identified
- < 10% positively identified using stringent criteria
[Figure: LC-MS image, time 21.15-34.85 min, m/z 621-655, annotated with FFADLLDYIK, LALDLEIATYR and SLDLDSIIAEVK]

Quantitation with search engine

Quantitation: needed information
- Use of MS/MS data
  - Reporter ions: isobaric labeling (iTRAQ, TMT)
  - emPAI (~ratio of observed to predicted peptides)
  - Multiplex (SILAC, 18O)
- Use of MS raw traces
  - Stable isotope labeling (ICAT, SILAC, AQUA, 18O, ICPL, ...)
  - Label-free
- Need identified peptides
- Need access to intensities (MS/MS and MS)
- Need quantitation method
  - Labeling method (fixed, variable mode)
  - Definition of pairs
  - Intensity correction factors
  - Thresholds for which peptides to consider (confidence levels, scores, # peptides per protein)
- Create report, calculate ratios, evaluate outliers
- Include in search engine GUI
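Two of the MS/MS-level methods listed above can be sketched from a peaklist alone. The code below is an illustration, not the Phenyx implementation: the reporter m/z values are approximate iTRAQ 4-plex values, the tolerance and the peaklist are invented, no isotope correction factors are applied, and the emPAI formula is the published definition (10 to the power of observed over observable peptides, minus 1).

```python
# Approximate m/z of the iTRAQ 4-plex reporter ions
REPORTERS = {"114": 114.111, "115": 115.108, "116": 116.112, "117": 117.115}


def reporter_intensities(peaklist, tol=0.05):
    """Sum the MS/MS intensity found within `tol` of each reporter m/z."""
    return {
        label: sum(i for mz, i in peaklist if abs(mz - mz_ref) <= tol)
        for label, mz_ref in REPORTERS.items()
    }


def reporter_ratios(intensities, reference="114"):
    """Ratios of every channel to the reference channel (empty if it is missing)."""
    ref = intensities[reference]
    if ref <= 0:
        return {}
    return {
        label: value / ref
        for label, value in intensities.items()
        if label != reference
    }


def empai(n_observed, n_observable):
    """emPAI = 10 ** (observed peptides / observable peptides) - 1."""
    return 10 ** (n_observed / n_observable) - 1


# Invented MS/MS peaklist: four reporter peaks plus two fragment ions
peaklist = [
    (114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0), (117.12, 1000.0),
    (175.12, 300.0), (400.20, 50.0),
]
intensities = reporter_intensities(peaklist)
ratios = reporter_ratios(intensities)
```

Both quantities need nothing beyond the identification results and the MS/MS peaklist, which is why they are the methods available to a search engine working without raw MS traces.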
A quantitation module for Phenyx
- Generic
- Components shown in the diagram:
  - Quantitation methods
  - (Phenyx) result file
  - API: InSilicoSpectro + PhenyxPerl
  - Labeling config file (xml)
  - InSilicoDef definition file (xml)
  - Quantitation module
    - Prediction of co-peptides
    - Extraction of intensities: MS level
    - Extraction of intensities: MS/MS level
    - Calculation of ratios; export
  - Quantitation result file (text)
  - External statistics (R)

One possible integration with MSight (label-free)
- Phenyx: generate reports from identification results
- Perl scripts to generate many kinds of exports
[Figure: workflow: raw LC-MS data, peaklists and exported peptide identifications give annotated images; after aligning and comparing, annotated peptide ratios]

Example for iTRAQ

Examples of filters and search parameters that alter quantitation results
- Minimal number of peptides per protein
- Minimal number of proteotypic peptides
- Minimal score for each peptide
- Filter on redundancy
  - same sequence (same or different charge states)
  - same exact primary structure
  - embedded sequences (missed cleavages, etc.)
- Remove outliers (quant values > threshold CV)
- Number of missed cleavages allowed
- Semi-tryptic peptides and fully unspecific cleavages
- Number of queried modifications
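A few of the filters listed above can be chained into a small protein-level report. This is a sketch under stated assumptions, not the Phenyx quantitation module: the record layout, the default thresholds, the use of the median as protein ratio and the choice to apply the CV filter per protein are all illustrative decisions.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def quantify(peptides, min_score=6.0, min_intensity=10_000,
             min_peptides=3, max_cv=0.20):
    """Filter peptide ratios and summarise them per protein.

    peptides: dicts with keys protein, score, intensity, ratio.
    A protein is reported only if enough peptides pass the score and
    intensity filters and their ratios agree (CV <= max_cv).
    """
    by_protein = defaultdict(list)
    for pep in peptides:
        if pep["score"] >= min_score and pep["intensity"] > min_intensity:
            by_protein[pep["protein"]].append(pep["ratio"])

    report = {}
    for protein, ratios in by_protein.items():
        if len(ratios) < min_peptides:
            continue
        # Coefficient of variation; defined as 0 for a single value
        cv = stdev(ratios) / mean(ratios) if len(ratios) > 1 else 0.0
        if cv > max_cv:
            continue
        report[protein] = {
            "n_peptides": len(ratios),
            "ratio": median(ratios),
            "cv": cv,
        }
    return report


def pep(protein, score, intensity, ratio):
    return {"protein": protein, "score": score,
            "intensity": intensity, "ratio": ratio}


# Invented peptide-level results
peptides = [
    pep("P1", 8.0, 20_000, 1.0), pep("P1", 8.5, 25_000, 1.1),
    pep("P1", 9.0, 30_000, 0.9), pep("P1", 3.0, 40_000, 5.0),  # low score
    pep("P2", 8.0, 20_000, 1.0), pep("P2", 8.0, 20_000, 2.0),
    pep("P2", 8.0, 20_000, 3.0),                               # ratios disagree
    pep("P3", 8.0, 20_000, 1.5), pep("P3", 8.0, 20_000, 1.6),  # too few peptides
]
report = quantify(peptides)
```

Changing any one threshold changes which proteins are reported, which is the point of the slide: the filters are part of the quantitation result and need to be stated with it.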
Effect of filters

Filter (cumulative)                  # peptides   # proteins
Z-score (only valid peptides)            22            6
+ min. 3 valid peptides                  19            4
+ intensities > 10 000                   15            4
+ CV < 20%                                7            2

False discovery rate export
FDR = (# peptides in decoy database) / (# peptides in forward database) = f(z-score and p-value)

[Figure: number of valid hits as a function of z-score, showing true hits and hits in reverse; x axis z-score 4.0-14.0, y axis # hits 0-10000]
[Figure: FDR (hits in rev / hits in fwd) as a function of z-score; x axis z-score 5.0-10.0, y axis 0%-20%]
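The FDR definition above (hits in the decoy database over hits in the forward database, as a function of the score threshold) can be tabulated directly from two score lists. A minimal sketch with invented scores; the function names are hypothetical and no p-value is involved here.

```python
def fdr_curve(forward_scores, decoy_scores, thresholds):
    """For each z-score threshold, count hits and compute decoy/forward FDR."""
    rows = []
    for t in thresholds:
        n_fwd = sum(1 for s in forward_scores if s >= t)
        n_dec = sum(1 for s in decoy_scores if s >= t)
        fdr = n_dec / n_fwd if n_fwd else 0.0
        rows.append((t, n_fwd, n_dec, fdr))
    return rows


def lowest_threshold_for(rows, max_fdr):
    """Smallest tabulated threshold whose FDR does not exceed max_fdr."""
    for t, n_fwd, _n_dec, fdr in sorted(rows):
        if n_fwd and fdr <= max_fdr:
            return t
    return None


# Invented z-scores of peptide hits in the forward and decoy searches
forward = [4.5, 5.2, 6.1, 6.8, 7.5, 8.2, 9.0, 10.4, 11.0, 12.3]
decoy = [4.2, 4.8, 5.5, 6.3]

rows = fdr_curve(forward, decoy, thresholds=[4.0, 6.0, 8.0])
cutoff = lowest_threshold_for(rows, max_fdr=0.15)
```

Raising the threshold lowers the FDR but also removes forward hits, which is the trade-off the two plots on the slide display side by side.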
Calibration status of instrument (3 datasets)
[Figure: z-score (3-19) against delta (-0.6 to 0.4)]

Effect of the search parameters

Search                                      Valid   Coverage
Round 1, only 3 fixed mods                   131      75%
Round 2, add variable mods                   205      84%
Round 2, with all mods and half cleaved      348      90%

Import jobs into Phenyx
- Jobs from Sequest, X!Tandem, Mascot and Phenyx
- Manual validation, then quantitation as if it were a Phenyx job
- Concatenate results from different runs or search engines, then go to quantitation

Results comparison tool
- What protein in what job?
- What peptide in what protein/job?

Summary
- LC-MS data and 2D image analysis (MSight)
  - Rich source of information
  - Detect strange behaviors (discontinuity, contaminations, QC issues)
  - Using 2 dimensions is efficient for signal detection
  - Alignment of multiple MS runs: consider local aberrations
  - Quantitation possible for pairs and for groups (statistics)
- Quantitation with protein identification tool only (Phenyx)
  - Quantitation methods limited to information in peaklists (isobaric labeling, emPAI, Multiplex)
- Quantitation with MSight and Phenyx
  - Get access to raw data information
  - Full panel of quantitation methods
  - Need tight integration (annotation, statistics, filters)
  - Thanks to import functionality, access to other search engines

Take-home messages
- Error to appreciate
  - Biological variability
  - Experimental variability
  - Quantitation method tolerance
- Many tools are available; make your choice according to:
  - biological question
  - capacity to analyse data from the chosen quantitation method
  - capacity to analyse data from your instruments
  - possibility to validate generated data (interactivity)
- Understand, evaluate
Acknowledgements
- Phenyx devel team: Alexandre Masselot, Nicolas Budin, Anne Niknejad, Olivier Evalet
- PIG group: Ron Appel, Daniel Walther, Gerard Bouchet, Sébastien Catherinet, Stéphane Pelhâtre, Patricia Palagi
- BPRG: Ali Vaezzadeh
- PAF: Manfredo Quadroni
- University Bern: Manfred Heller
- IPBS: David Bouyssié

MSight: http://www.expasy.org
Phenyx: http://phenyx.vital-it.ch/pwi

Thank you for your attention!