Isobaric Tag based MS Quantification Algorithms Analysis and Implementation

Master's degree in Proteomics and Bioinformatics
Written by Sankar Martial
Supervisors: Nicolas Budin (1), Pierre-Alain Binz (1)
Academic year 2007/2008
This thesis was submitted as part of the requirements for the Master's degree in Proteomics and Bioinformatics from the University of Geneva.
(1) Geneva Bioinformatics (GeneBio) SA, 25 avenue de Champel, 1206 Geneva, Switzerland

Contents
Abstract
Acknowledgements
Introduction
  GeneBio SA
  Organisation
  Biological Context
Quantitative Proteomics
  Global View
  Isobaric Tagging
  Experimental Application
  Experimental Design: Principle of Replicate Analysis
Methods: Study Quantification Workflow
  Introduction
  Experimental samples
  Tested Software Presentation
  Software comparison
  Discussion
Establish Quantification WF
  Introduction
  Description
  Validation of the algorithms
  Discussion
Application of the Quantification Workflows
  Alireza collaboration: Peptides Ratios-based Quantification Approach applied to Characterize Daptomycin Resistance in Staphylococcus aureus
  Loic Dayon Collaboration
    CSF Analysis by TMT 6-plex
    CSF micro-dialysis
Conclusion
Reference

Abstract
One single gene gives rise to several proteins. This well-known statement illustrates the complexity of the proteome compared with the genome and the transcriptome. As in genomics, proteomics offers a large toolbox of experimental methods for quantifying proteins. In contrast, the analytical means to obtain reliable and trustworthy values of relative protein abundance are less developed. Although more and more tools are released to assess protein ratios, a quick review of the proteomics literature reveals that quantitative data analysis is still performed manually. To address this, three algorithms for isobaric tag-based quantitative analysis were implemented. They were successfully applied to experimental datasets provided by the Biomedical Proteomics Research Group (BPRG) and the Clinical Proteomics Research Group (CPRG) of the University of Geneva. Finally, these algorithms were integrated into the quantification module of the Phenyx software.

Acknowledgements
It was an immense privilege to be guided by and to be under the tutelage of my supervisors Nicolas Budin and Pierre-Alain Binz. I am very grateful to Alexandre Masselot for giving me the opportunity to do this internship at GeneBio, and for his availability and advice. I would like to thank Nasri Nahas for allowing me to carry out this internship under optimal conditions. Many thanks are due to Olivier Evalet, Yann Mauron, Roman Mylonas and Ivan Topolsky for their support and good mood. I am very thankful to Alireza Vaezzadeh and Loic Dayon for their collaboration, and I warmly thank David Bouyssié, who did essential previous work on reporter ion peak extraction. Finally, I would like to thank all the members of GeneBio for welcoming me for one year.

1. Introduction
1.1. GeneBio SA
Geneva Bioinformatics (GeneBio) SA is a bioinformatics company founded in November [...]. It was created almost simultaneously with the Swiss Institute of Bioinformatics (SIB). One of GeneBio's main activities is to act as the privileged commercial arm of the SIB, bringing to market developments made at the SIB in order to generate revenues that help fund further developments. Its first product line started in 1998 with the Swiss-Prot database. Swiss-2DPAGE (a 2D gel database), Prosite (a database of protein domains, families and functional sites) and Melanie (2D gel analysis software) soon followed. GeneBio also develops and commercialises its own specialised and innovative databases and software on biological molecules. These include Phenyx, a renowned software platform for the identification and characterisation of proteins and peptides from mass spectrometry data. Another example is SmileMS, the latest GeneBio software, also developed in collaboration with the SIB; it is a unique platform for the identification and analysis of small molecules by mass spectrometry. Located in Geneva, a centre of excellence in the field of proteomics, GeneBio now has between 15 and 20 employees, a majority of them biologists and computer scientists.
Organisation
My internship was carried out within the Phenyx development team. It was a bioinformatics internship under joint supervision. Dr Pierre-Alain Binz took responsibility for the scientific aspects of my project, giving me advice and orientation in proteomics and data analysis. Dr Nicolas Budin directed the informatics part of the project. He managed the whole development side, introducing me to the R and Java languages and to reliable software development methods. Since the field of mass-spectrometry-based quantification is wide, it was decided from the beginning that my work would cover only iTRAQ quantification.
To deepen what I had learnt in the Master's courses on mass-spectrometry-based quantification in proteomics, I started the internship by reading papers on iTRAQ reagents and the principles of quantification. Then, to become familiar with MS-based identification and quantification, I focused on testing some of the available quantification tools. These steps allowed me to handle biological data, to understand the basics of large-scale data analysis and to appreciate the range of perspectives that mass-spectrometry-based quantitative analysis offers. Subsequently, I developed my own tools: using the R language, I replicated the workflow of some of the studied software. Finally, the time came to offer quantification features to Phenyx users. The implementation was done in Java and includes calls to a few crucial statistical steps.

To obtain real user feedback, to collect users' quantification needs, to see how data were analysed, to obtain quantification material and to get an idea of how the developed tools should behave with real data, collaborations were carried out with the Biomedical Proteomics Research Group and the Clinical Proteomics Research Group of the University of Geneva.
Biological Context
Proteomics offers a large toolbox of analytical methods, instruments and algorithms to identify and characterise proteins. The majority of published proteomics studies are limited to identifying the proteins expressed in a biological system. However, this is not sufficient to answer most biological questions: Does a protein behave significantly differently between two samples? Does a protein exhibit time-dependent changes? Which proteins behave similarly in the experiment? Quantitative answers are therefore increasingly required and appear in a growing number of publications. Initially, quantitative and comparative proteome analysis was performed with 2D-PAGE. Owing to its limitations (low dynamic range, bias against membrane and soluble proteins), gel-free methods have complemented and are gradually supplanting gel-based quantitative proteomics. One gel-free solution relies on stable isotopes. Isotope labels can be incorporated in three ways: metabolically (during cell growth), enzymatically, or chemically. Chemical incorporation of stable isotopes has produced most of the quantitative proteome data, mainly because of its chemical versatility and because, unlike metabolic labelling, it allows the analysis of any biological sample. Given the large amount of data, manual analysis is laborious, yet it remains common in the scientific community. Many software tools have therefore been made available to support data analysis. Several are open-source, each with its own assets and caveats.
Most of them can handle identification results from one or more search engines (Mascot, Phenyx, SEQUEST, X!Tandem...). Recently, Mascot (a major player in the identification market) released its own quantification module. To keep up with its competitors, Phenyx needed to meet the demand for high-throughput quantitative MS data analysis by offering its own quantification module. Following the six months' work of Master's student David Bouyssié in 2006 on the quantitative data extraction module (mainly for SILAC and iTRAQ), my main objective was to add the missing downstream analysis pieces of the puzzle and to provide a complete quantification pipeline for the iTRAQ methodology.

2. Quantitative Proteomics
2.1. Global View
Several methods exist to assess protein abundance in a sample. Classical methods such as western blotting, fluorophores and radioactivity are widely used owing to their sensitivity and dynamic range. However, because of various constraints, these methods are generally not suited to large-scale screening, particularly biomarker discovery (non-targeted experiments). Wu et al. compared MS-based stable isotope labelling methods (iTRAQ, cICAT) with DIGE (Difference Gel Electrophoresis) and showed that these problems can be overcome by MS-based approaches [32]. MS-based strategies coupled with separation methods (2DE or LC) are currently the most efficient means of identifying the proteins of a complex mixture. However, because proteolytic peptides exhibit a wide range of physico-chemical properties (size, charge, hydrophobicity...), the relationship between the amount of a protein and the signal intensities is complex. Mass spectrometry is therefore not inherently quantitative as a tool for absolute quantitation, and relative quantitation is preferred, in which peptide signals are compared between experimental data points. This can be achieved in a number of ways. High-throughput assessment of changes in protein expression is usually performed by stable isotope labelling of peptides and proteins, either metabolically, enzymatically (¹⁶O/¹⁸O incorporation during proteolysis) or chemically using external reagents (Figure 1).

Figure 1. Overview of quantification in proteomics [41].
Metabolic labelling involves in vivo incorporation of the stable isotope during cell growth and division. One of the most widely used approaches is SILAC (Stable Isotope Labelling by Amino acids in Cell culture), introduced by Mann and co-workers in 2002 [33]. In this method, the heavy amino acid is incorporated during protein synthesis. The main advantage is that no multi-step labelling protocol is needed and that experimental errors do not affect the ratio [34], as shown in [...]. However, this approach is almost exclusively applicable to cell cultures.
Another approach is chemical labelling. For example, several papers have studied biological fluids using isobaric tags, iTRAQ [4] or TMT [7]. The principle of chemical tagging rests on the reactivity of the peptide N-terminus and of the side chains of lysine (iTRAQ, TMT) or cysteine (ICAT). ICAT (Isotope-Coded Affinity Tag) is the best known of the cysteine tagging strategies. These tags were designed to allow both the enrichment of a subfraction (cysteine-containing peptides) from a complex protein mixture and the quantification of the selected peptides at the precursor ion level. Gygi et al. [35] developed this approach, in which cysteine residues are specifically derivatised with a reagent containing either zero or eight deuterium atoms, as well as a biotin group for affinity purification of the cysteine-derivatised peptides before MS analysis. Modified versions of ICAT have emerged to solve problems of elution (cICAT) and fragment loss (VICAT...). Wu et al. highlighted the drawbacks and weaknesses of ICAT methods compared with other chemical tagging approaches [32]. Although ICAT analysis yields good results on simple or moderately complex samples, its cysteine specificity leads to a loss of sensitivity when analysing complex protein mixtures.
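Because the light and heavy ICAT reagents described above differ by eight deuterium atoms, each labelled cysteine shifts the heavy form by roughly 8 Da, and the spacing observed on the m/z axis shrinks with the charge state. The short Python sketch below works through that arithmetic. The monoisotopic masses of ¹H and ²H are standard values; the function name and its parameters are purely illustrative and are not taken from any ICAT software.

```python
H_MASS = 1.00782503207  # monoisotopic mass of 1H, in Da
D_MASS = 2.01410177812  # monoisotopic mass of 2H (deuterium), in Da


def icat_pair_spacing(n_cys: int = 1, charge: int = 1, n_deuterium: int = 8) -> float:
    """m/z distance between the light (d0) and heavy (d8) forms of an ICAT pair.

    Each labelled cysteine carries one tag, so the mass shift grows with the
    number of cysteines, while the m/z spacing is divided by the charge state.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass_shift = n_cys * n_deuterium * (D_MASS - H_MASS)
    return mass_shift / charge


for z in (1, 2, 3):
    print(f"1 Cys, charge {z}+: {icat_pair_spacing(charge=z):.3f} m/z")
```

For instance, a doubly charged peptide with a single cysteine should show its heavy partner about 4.03 m/z units above the light form; this is the pair whose MS signals are integrated for precursor-level quantification.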

Figure 2. Common quantitative mass spectrometry workflows. Boxes in blue and yellow represent two experimental conditions. Horizontal lines indicate when samples are combined. Dashed lines indicate points at which experimental variation, and thus quantification errors, can occur [30].
Labelling reagents that target the peptide N-terminus and the ε-amino group of lysine residues are the most sensitive. Most of the time, this labelling relies on the highly specific N-hydroxysuccinimide (NHS) ester chemistry, or on other active esters and acid anhydrides, as in the isotope-coded protein label (ICPL), isobaric tags for relative and absolute quantitation (iTRAQ) [19] and tandem mass tags (TMT) [8]. In ICPL, ICAT and most of the aforementioned chemical modification techniques, relative quantification is achieved by integrating the MS signal of the heavy- and light-labelled forms. TMT and iTRAQ introduce the concept of the isobaric tag. Peptides labelled with isobaric tags co-migrate in liquid chromatography separations, and the different tags can be distinguished by the mass spectrometer only upon peptide fragmentation. This allows the simultaneous determination of both the identity and the relative abundance of peptides from tandem mass spectra. iTRAQ and TMT are described in more detail below, and some examples of applications are summarised in Figure 5.

2.2. Isobaric Tagging
iTRAQ
iTRAQ reagents are amine-specific stable isotope labels. Up to eight biological samples can be labelled simultaneously. The structure of the reagent is shown in [...]. It consists of three groups: the reporter group and the balance group, which together form the isobaric tag (145 Da), and the peptide reactive group (PRG), which reacts with the primary amine groups of peptides. The reporter group carries the charge and gives strong signature ions in MS/MS. The mass of the balance group changes with that of the reporter group, so that the total tag mass is the same for every reagent; the balance group undergoes a neutral loss during MS/MS. The basic iTRAQ experimental workflow is displayed in Figure 4. First, the proteins of each sample are digested with trypsin. The N-termini of the resulting peptides are then derivatised with the sample-specific reagent via an acylation reaction. Because the reagents are isobaric, each labelled peptide gives a single peak in MS, which greatly simplifies the MS spectrum. After CID, the balance group is lost, which yields one peak for each reporter ion (Figure 4) in the low-mass region of the MS/MS spectrum (Figure 5).
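The mass balancing described above can be checked with simple arithmetic. The sketch below uses the nominal reporter (114 to 117 Da) and balance (31 to 28 Da) group masses commonly quoted for the iTRAQ 4-plex kit; the dictionary layout and the function names are illustrative only.

```python
# Nominal masses (Da) of the reporter and balance groups of the iTRAQ 4-plex kit.
ITRAQ_4PLEX = {
    "114": (114, 31),
    "115": (115, 30),
    "116": (116, 29),
    "117": (117, 28),
}


def tag_masses(reagents):
    """Combined reporter + balance mass for each channel (the isobaric tag)."""
    return {channel: reporter + balance for channel, (reporter, balance) in reagents.items()}


def reporter_masses(reagents):
    """Masses released in the low-mass region of the MS/MS spectrum after CID."""
    return {channel: reporter for channel, (reporter, _balance) in reagents.items()}


def is_isobaric(reagents):
    """True when every channel has the same total tag mass."""
    return len(set(tag_masses(reagents).values())) == 1


print(tag_masses(ITRAQ_4PLEX))       # every channel sums to 145 Da
print(reporter_masses(ITRAQ_4PLEX))  # but each releases a distinct reporter
print(is_isobaric(ITRAQ_4PLEX))
```

The same total mass is what makes the labelled forms of a peptide indistinguishable in MS, while the distinct reporter masses are what separate the samples in MS/MS.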

Figure 3. iTRAQ reagent structure.
Figure 4. Schematic iTRAQ workflow; each sample is labelled with one of the eight iTRAQ reagents and then pooled prior to MS analysis.

iTRAQ reagents allow multiplexed quantification of up to eight samples (cells, tissues, serum) and also permit PTM analysis. Aggarwal et al. showed that the reagent does not interfere negatively with fragmentation: peptide lengths and amino acid contents are similar to those obtained with other MS approaches [36]. Furthermore, iTRAQ is a highly sensitive approach. Wu et al. demonstrated that iTRAQ covers a large part of the E. coli proteome: it helps to identify proteins with extreme pI and MW, it detects a larger number of peptides per protein, and low-abundance proteins are more often detected. This high sensitivity can be explained by two factors. First, iTRAQ tags all primary amines, whereas ICAT labels only cysteines. Second, it is related to the reactivity of lysine, which leads to a stronger signal in MALDI-MS. Quantification relies on daughter ions generated during CID. However, a potential pitfall inherent to the limited resolution of the Timed Ion Selector (TIS) of the MALDI-TOF/TOF may affect quantification accuracy: the TIS may let neighbouring precursor ions and their fragments through the gate, so that they reach the detector and contribute to the quantification ratio [32]. A second limitation is that experimental variations can occur during tryptic digestion. In the iTRAQ workflow (Figure 4), digestion precedes labelling and mixing, whereas in ICAT labelling precedes mixing and digestion. This introduces a potential source of error, especially in sample handling and in variable degrees of tryptic digestion between two or more samples [32].
Table 1. Advantages and disadvantages of iTRAQ reagents
Advantages: parallel proteomics (8-plexing); analysis of proteins from cells, tissues or serum; PTM analysis; no negative interference with fragmentation; high sensitivity.
Disadvantages: mass spectrometer interference could hinder iTRAQ reliability; labelling takes place after tryptic digestion.
TMT
The Tandem Mass Tag (TMT) uses the same approach as iTRAQ. TMT reporter ions are, however, heavier (TMT 6-plex: m/z 126 to 131).

2.3. Experimental Application
Isobaric labelling approaches have been successfully applied to a variety of experiments and samples (prokaryotic and eukaryotic samples, including Escherichia coli, yeast, human saliva, human fibroblasts and mammary epithelial cells...) [34,36]. The iTRAQ approach can serve various purposes. For example, when characterising proteins from a specific signalling pathway, multiplexed approaches such as iTRAQ 4-plex or TMT 6-plex are commonly used to obtain time-course profiles. Schmelzle et al. used iTRAQ to label four samples of adipocytes stimulated with insulin for different times. Time-course profiles were then plotted, and proteins displaying similar behaviour were clustered. An unknown functionality of the protein Glu-4 was discovered in this manner [20]. Zhang et al. used iTRAQ for the same purpose: profiles were built and clustered using Spotfire™ and self-organising maps (SOM) [27]. Biomarkers can also be discovered using isobaric tags. Dayon et al. and Choe et al. analysed CSF samples using TMT 6-plex [5] and iTRAQ 4-plex [7], respectively. Moreover, Desouza et al. identified five potential markers for endometrial cancer with iTRAQ reagents, and a set of four proteins using ICAT [9]. Cong et al. compared the proteomes of human fibroblasts in four different biological states (replicatively senescent, i.e. under permanent growth arrest; stress-induced prematurely senescent; quiescent; and young replicating) to identify the signature proteins of each state [6]. Figure 5 summarises the uses of quantitative proteomics with isobaric tags.
Figure 5. Common applications of isobaric tags in proteomics and related analysis workflows.

2.4. Experimental Design: Principle of Replicate Analysis
All these experiments can be carried out in replicates [37]. Indeed, random variation when working with isobaric tags has many origins: time, manpower, instrument, subject, subject condition, preparation process, etc. By definition, variation is a measure of the spread around the expected value. Three types of variation can be distinguished: experimental, technical and biological. Experimental replicates are typically the actual iTRAQ replicates, in which two or more iTRAQ sets are used to label the same samples. Technical replicates assess the consistency of a measurement over repeated tests of samples from the same biological source under identical conditions. They exclude errors from sample preparation and are very important to establish the significance of changes in protein expression (ANOVA, t-test, LPE test). Biological replicates estimate the random biological variability associated with the test subject, by repeating the creation of the test subject under the same conditions (Figure 6).
Figure 6. Schematic view of the relationship between technical, experimental and biological replicates in iTRAQ experiments. A1 and A2 are two different samples under the same conditions [37].
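Since variation is defined above as the spread around the expected value, a simple way to summarise it for one protein is the sample standard deviation and the coefficient of variation (CV) over its replicate measurements. The sketch below assumes the replicate values are protein ratios; the function name and the numbers are hypothetical and are not taken from the experiments described here. Computing the same summary separately for technical, experimental and biological replicates helps compare the sources of variation.

```python
import statistics


def replicate_spread(values):
    """Mean, sample standard deviation and coefficient of variation of replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate spread")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    cv = sd / mean if mean != 0 else float("nan")
    return mean, sd, cv


# Hypothetical 115/114 ratios of one protein measured in three technical replicates.
technical = [1.0, 1.2, 0.8]
mean, sd, cv = replicate_spread(technical)
print(f"mean={mean:.2f} sd={sd:.2f} cv={cv:.0%}")
```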

3. Methods: Study Quantification Workflow
3.1. Introduction
An important part of my internship was exploratory work. How is quantification performed in proteomics? What are the tools? Which are the best existing tools? How do they differ? Manually analysing a large dataset of MS identification results is time-consuming and imprecise (some methods, such as outlier detection or quantile-quantile plots, are difficult or impossible to perform manually). Nevertheless, manual analysis is still widely used, although more precise results can be obtained by computer-based data processing. Several quantification tools (Q-tools) therefore exist, each with its own properties (Table 2). As shown in Figure 7, the Applied Biosystems software ProQuant™ remains the most used Q-tool in the scientific community. In this part, I tested three tools to familiarise myself with tandem MS quantification: the Mascot™ Q-tool, the iTRAQ-specific Q-tool of the Trans-Proteomic Pipeline (TPP), and i-Tracker™, developed at Cranfield University by Shadforth et al. [21]. Data from isobaric tag-labelled samples were difficult to find. To overcome this problem, I used the online database PeptideAtlas, which is closely linked to the TPP [23], and a collaboration with the BPRG provided additional data (described in Section 3.2). I present the tested tools, highlight their advantages and limitations, assess the quality of the quantification results by comparing protein ratios and finally determine a reliable quantification workflow.
Figure 7. Pie chart of the number of publications by quantification tool. Fourteen publications were read. Scientists still prefer to quantify manually or to use the official iTRAQ software ProQuant provided by Applied Biosystems.

Table 2. Summary of software available for quantification with isobaric-tag-based reagents.
Tool name | Source | Academic / commercial | Environment | Comment | Ref. | Link
ProteinPilot | Applied Biosystems | commercial | Windows | Easily distinguishes protein isoforms and protein subsets, suppresses false positives and visualises peptide-protein associations | [23] |
ProQuant | Applied Biosystems | commercial | Windows | Simultaneously quantitates and identifies iTRAQ reagent-labelled peptides from MS/MS spectra | [9] |
Pride Wizard | Manchester Centre for Integrative Systems Biology | academic | Windows | Submission of mass spectrometry data and Mascot identifications to generate a valid PRIDE XML file; also includes the facility to add iTRAQ labels, allowing quantitation data to be added to the PRIDE XML | | …mcisb.org/resources/pridewizard/index.html
Multi-Q | Institute of Information Science and Institute of Chemistry, Academia Sinica, Taiwan | academic | web server | iTRAQ quantitation performed in a Mascot-like way | [5] | …/Multi-Q-Web/
MFPaQ | Institut de Pharmacologie et de Biologie Structurale, Toulouse, France | academic | Windows | Takes Mascot (DAT) result files as input for parsing and Analyst WIFF files for quantification | |
Quant | University of Würzburg, Germany | academic | Linux/Windows | Offers data and results visualisation (boxplot, error plot) and error estimation | [21] |
Libra | ISB | academic | Linux/Windows (Cygwin) | Quantification module of the TPP (see below for more information) | [11] | …vc/sashimi/trunk/trans_proteomic_pipeline/src/quantitation/libra/docs/libra_info.html
i-Tracker | Cranfield University, UK | academic | Linux/Windows | See below for more information | [21] |
Mascot-Quanti | Matrix Science | commercial | Linux/Windows | See below for more information | [22, 28] [20] | www.matrixscience.com/
3.2. Experimental samples
ABRF data. A sample from the ABRF 2006 study was used. It contains eight proteins spiked at ratios ranging from 1:1 to 1:76 and labelled with iTRAQ 4-plex reagents (114.1 to 117.1).
The theoretical ratios can be found in Appendix 1. Peptides were identified by tandem mass spectrometry on a MALDI TOF-TOF instrument (ABI 4700).
Whitehead data. Proteomic analysis was conducted at three time points (30, 40 and 60 min) for both control and γ-irradiated cultures of Halobacterium salinarium strain NRC-1. Relative quantification was achieved using shotgun isobaric tagging with iTRAQ reagents (Applied Biosystems, Foster City, CA). Quantification is performed in tandem MS, which fragments the iTRAQ tags to release reporter ions of differing mass (m/z 114, 115, 116 and 117). For direct comparison across multiple runs, a common reference sample derivatised with the 114 mass tag was included in each four-plex experiment (more information is provided by Whitehead et al. [26]). These data were obtained from PeptideAtlas.
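The common reference channel is what makes runs comparable: each sample is first expressed relative to the 114 reference of its own run, and two samples from different runs are then compared through those reference-normalised ratios. A minimal sketch, assuming protein-level reporter intensities are already available; the function name, the run contents and the values are hypothetical.

```python
import math


def log2_ratios_to_reference(intensities, reference="114"):
    """log2(channel / reference) for every non-reference channel of one protein in one run."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel must have a positive intensity")
    return {
        channel: math.log2(value / ref)
        for channel, value in intensities.items()
        if channel != reference and value > 0  # empty channels are skipped
    }


# Hypothetical reporter intensities for the same protein in two separate 4-plex runs,
# both containing the common reference sample in channel 114.
run_a = {"114": 100.0, "115": 200.0}  # e.g. control, 30 min
run_b = {"114": 50.0, "116": 25.0}    # e.g. irradiated, 60 min

a = log2_ratios_to_reference(run_a)["115"]
b = log2_ratios_to_reference(run_b)["116"]
# Because both runs share the same reference, their log2 ratios can be subtracted.
print(f"log2 fold change across runs: {a - b:.1f}")  # 1.0 - (-1.0) = 2.0
```

Working with log2 ratios turns the ratio of ratios into a simple difference, here a 4-fold change between the two samples.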

3.3. Tested Software Presentation
Three tools were compared: i-Tracker, developed by Shadforth et al. [21]; Libra, the iTRAQ quantification tool packaged with the TPP [22]; and the quantification module of Mascot version 2.2.
i-Tracker
The main goal of the i-Tracker software is to calculate ratios from non-centroided MS/MS peak lists, in a format linked to the results of protein identification tools such as Mascot and SEQUEST. The i-Tracker process is detailed in Figure 8. The user can define an arbitrary intensity threshold as the only filter; such a threshold can lead to the loss of quantifiable peptides. The purity correction coefficients can also be entered (iTRAQ reagents are not isotopically pure, so the manufacturer provides correction factors to account for overlapping reporter peaks; see Appendix 2, Table 2). Finally, the user can enter an ion tolerance used to collect the reporter peak areas. In its results, in addition to peptide ratios, i-Tracker displays a table of quantisation errors for each ratio. This error gives a useful indication of the confidence that can be placed in a ratio, especially one calculated from very low-abundance ions, and it is the main advantage of i-Tracker. Its advantages and limitations are summarised in Table 3. i-Tracker is limited to quantification at the peptide level, and only samples labelled with iTRAQ 4-plex reagents can be analysed. It outputs CSV-formatted result files, so Excel macros or parsing functions are easily applicable.
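The purity correction mentioned above is commonly expressed as a small linear system: each observed reporter intensity is a mixture of the true signal of its own reagent and of the isotopic impurities of its neighbours. The Python sketch below builds that mixing matrix and solves it with NumPy. It is a generic formulation, not necessarily the exact procedure used by i-Tracker, and the impurity percentages are illustrative placeholders rather than values from a real reagent lot.

```python
import numpy as np

CHANNELS = (114, 115, 116, 117)
SHIFTS = (-2, -1, 1, 2)  # isotopic shifts (Da) listed in a reagent certificate

# Illustrative impurity percentages per reagent, in the order of SHIFTS.
# Real values are lot-specific and must be taken from the manufacturer's sheet.
IMPURITIES = {
    114: (0.0, 1.0, 5.9, 0.2),
    115: (0.0, 2.0, 5.6, 0.1),
    116: (0.0, 3.0, 4.5, 0.1),
    117: (0.1, 4.0, 3.5, 0.1),
}


def correction_matrix(channels=CHANNELS, impurities=IMPURITIES):
    """M[i, j] = fraction of reagent j's signal that is observed in channel i."""
    index = {mass: i for i, mass in enumerate(channels)}
    m = np.zeros((len(channels), len(channels)))
    for j, mass in enumerate(channels):
        pct = impurities[mass]
        m[j, j] = 1.0 - sum(pct) / 100.0
        for shift, p in zip(SHIFTS, pct):
            i = index.get(mass + shift)
            if i is not None:  # signal shifted outside the reporter window is simply lost
                m[i, j] += p / 100.0
    return m


def purity_correct(observed, channels=CHANNELS, impurities=IMPURITIES):
    """Solve observed = M @ corrected for the corrected reporter intensities."""
    m = correction_matrix(channels, impurities)
    corrected = np.linalg.solve(m, np.asarray(observed, dtype=float))
    return np.clip(corrected, 0.0, None)  # negative intensities are not meaningful


observed = [1000.0, 1200.0, 900.0, 400.0]
print(purity_correct(observed).round(1))
```

Solving the full system, rather than subtracting each neighbour's contribution once, accounts for impurities flowing in both directions between adjacent channels.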
Figure 8. Scheme of the i-Tracker process.
Input: non-centroided MS spectra (.mgf, .dta); ion intensity threshold; purity correction; reporter peak range.
Algorithm: reporter ion peak collection; reporter ion area calculation; purity correction; peak normalisation (sum of all reporter intensities); under-threshold checking; ratio calculation; quantisation error calculation.
Output: relative errors of each reporter ion; indicative errors; ratio for each reporter ion.
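The collection, normalisation, threshold and ratio steps of the i-Tracker process can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each reporter "area" is approximated by summing profile intensities within a fixed m/z tolerance, the function names are hypothetical, and the tolerance, threshold and peak values are invented for the example. Purity correction and i-Tracker's own area calculation and quantisation error are not reproduced here.

```python
REPORTER_MZ = (114.1, 115.1, 116.1, 117.1)  # iTRAQ 4-plex reporter m/z as quoted in the text


def collect_reporter_areas(peaks, reporters=REPORTER_MZ, tol=0.1):
    """Approximate each reporter area by summing the intensities of all profile points
    whose m/z lies within +/- tol of the reporter m/z."""
    return [sum((i for mz, i in peaks if abs(mz - target) <= tol), 0.0) for target in reporters]


def normalise_to_sum(areas):
    """Divide each area by the sum of all reporter areas."""
    total = sum(areas)
    return [a / total for a in areas] if total > 0 else [0.0] * len(areas)


def under_threshold(areas, threshold):
    """Flag reporters whose area falls below the user-defined intensity threshold."""
    return [a < threshold for a in areas]


def ratios_to_channel(areas, reference=0):
    """Ratio of every reporter to the chosen reference reporter (NaN if the reference is empty)."""
    ref = areas[reference]
    return [a / ref if ref > 0 else float("nan") for a in areas]


# Hypothetical (m/z, intensity) profile points from the low-mass region of one MS/MS spectrum.
peaks = [(114.09, 100.0), (114.12, 50.0), (115.11, 300.0), (116.10, 150.0), (118.0, 999.0)]
areas = collect_reporter_areas(peaks)
print(areas)                                   # [150.0, 300.0, 150.0, 0.0]
print(normalise_to_sum(areas))                 # [0.25, 0.5, 0.25, 0.0]
print(under_threshold(areas, threshold=10.0))  # [False, False, False, True]
print(ratios_to_channel(areas))                # [1.0, 2.0, 1.0, 0.0]
```

The 117 channel is empty here, so it is flagged as under threshold; in i-Tracker such flags appear in the UT? column of the output.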

Table 3. Summary of the advantages and limitations of i-Tracker.
Advantages: algorithm and source code freely available; relative error indicating the confidence to give to a ratio, especially for low-intensity ions; linked to other protein identification software.
Limitations: specific to iTRAQ 4-plex, other isobaric tag methods (iTRAQ 8-plex, TMT...) are not handled; no protein-level quantification.

Figure 9. Screenshots of i-Tracker output. A. i-Tracker outputs a .csv file containing the normalised reporter areas (Norm) and a UT? column, which is flagged UT when the reporter peak area is below the user-entered threshold. B. i-Tracker outputs a .csv file containing the peptide ratios, with a matrix of quantisation errors.

TPP - Libra
Libra is the iTRAQ quantification module of the Trans-Proteomic Pipeline. The TPP is a collection of tools (Figure 10) for MS-based proteomics developed at the Seattle Proteome Center (SPC). It contains software for quantification (Libra for iTRAQ, XPRESS for ICAT...), converters (from search engine formats to pepXML...), and validation and probability assignment (PeptideProphet, ProteinProphet). Because Libra is part of the TPP, it relies on the pipeline for all quantification preprocessing steps (Figure 11). The Libra input data consist of (1) the sequence database related to the analysed data, placed in /dbase/, (2) a pepXML file, (3) an mzXML file and (4) a condition file. The most suitable database was found at [...]. pepXML is the standard format for representing identification results: it stores information on PeptideProphet validation and quantification, and it references the mzXML file. The latter is the bedrock of the TPP. It is an open data format for the storage and exchange of mass spectrometry data, developed at the SPC/Institute for Systems Biology, and it provides a standard container for MS and MS/MS proteomics data. Several converters are available to convert raw files (proprietary formats from most vendors) to mzXML. For example, T2DExtractor™, developed at the University of Michigan, converts raw files from ABI instruments, and Wolf™ converts MassLynx native acquisition files. Most of the time, converters must be run on the computer where the acquisition software of the instrument is installed.
The last required file is an XML configuration file supplied by the condition.xml generator [...]. It contains all the parameters required to parse the mzXML file in order to extract reporter peak intensities: the reagent m/z values (for iTRAQ 4-plex: 114.1 to 117.1), the mass tolerance, the isotopic correction coefficients (provided by Applied Biosystems), a centroiding method, a normalisation method (normalisation against the sum of intensities, against the most intense peak...) and a minimum intensity threshold.
Figure 10. Scheme of the software involved in the Trans-Proteomic Pipeline. Identification results from SEQUEST, Phenyx, Mascot and X!Tandem can be imported.

Figure 11. Scheme of quantification within the TPP. XPRESS, ASAP and Libra are the three quantification modules of the TPP.
Libra performs protein quantification in a simple way. Usually, protein quantification is derived from the group of peptides associated with the protein. As summarised in Figure 12, Libra first applies a normalisation based on the sum of the reporter intensities. The normalised intensities are then averaged over all peptides of a protein. Normalised intensities differing by more than two standard deviations (2 sigma) from the mean are considered outliers and removed. The average reporter intensities of the protein are then recalculated, and 1-sigma standard errors are computed from the standard deviation. The software outputs the average value of each reporter, or the ratio if a reporter has been set as the denominator (Figure 13). In summary, Libra uses a simple but accurate method to compute protein ratios. Thanks to the converters packaged within the TPP (e.g. out2xml), Libra handles identification results from many search engines (Mascot, SEQUEST...). Moreover, it can pass the quantification results (in protXML format) to various post-quantification tools (Figure 11), such as Cytoscape and SBEAMS. The algorithm can also quantify samples labelled with the iTRAQ 8-plex reagents, but other isobaric tags are not supported for the moment. Finally, this tool requires the installation of the whole TPP, and it is not very user-friendly, since the TPP tools only have a command-line interface (Table 4). A detailed Libra tutorial can be found in the supplementary data.
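The Libra procedure just described (sum normalisation, averaging over peptides, 2-sigma outlier removal, recomputed mean and 1-sigma standard error) can be sketched as follows. This is an approximation written from the description above, not Libra's source code. Applying the outlier removal channel by channel and computing the standard error as the standard deviation divided by the square root of the number of retained peptides are assumptions of this sketch, and the function names are hypothetical.

```python
import math
import statistics


def protein_reporter_means(peptide_intensities, n_sigma=2.0):
    """Libra-style protein summary (per-channel outlier removal is an assumption here).

    peptide_intensities: one list of reporter intensities per peptide of the protein.
    Returns (mean, standard error) of the sum-normalised intensity for each channel.
    """
    # 1. Normalise each peptide to the sum of its reporter intensities.
    rows = []
    for row in peptide_intensities:
        total = sum(row)
        if total > 0:
            rows.append([x / total for x in row])
    if not rows:
        raise ValueError("no peptide with a positive reporter signal")
    summary = []
    for values in zip(*rows):  # iterate channel by channel
        # 2. Average over all peptides and compute the spread.
        mu = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        # 3. Drop values more than n_sigma standard deviations from the mean.
        kept = [v for v in values if abs(v - mu) <= n_sigma * sd] if sd > 0 else list(values)
        # 4. Recompute the mean and the 1-sigma standard error on the remaining values.
        mu = statistics.mean(kept)
        se = statistics.stdev(kept) / math.sqrt(len(kept)) if len(kept) > 1 else 0.0
        summary.append((mu, se))
    return summary


def protein_ratios(summary, reference=0):
    """Express each channel mean relative to the reference channel (the denominator)."""
    ref = summary[reference][0]
    return [mu / ref for mu, _ in summary]


# Nine consistent peptides and one peptide with an aberrant 117 channel.
peptides = [[100, 100, 100, 100]] * 9 + [[100, 100, 100, 700]]
summary = protein_reporter_means(peptides)
print(protein_ratios(summary))  # the outlying peptide is discarded -> [1.0, 1.0, 1.0, 1.0]
```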

Figure 12. Scheme of the Libra process.
Input: pepXML; mzXML; m/z tolerance*; ion intensity threshold*; purity correction*; centroiding method*; normalisation method*.
Algorithm: reporter ion peak collection; purity correction; peak normalisation; mean of each reporter channel; outlier removal; ratio calculation.
Output: protein ratio; standard error.
(*) Contained in a configuration.xml file.
Table 4. Summary of the advantages and limitations of Libra.
Advantages: simple but precise quantification; quantifies proteins identified by various search engines; the TPP allows an easy link with post-quantification tools (Cytoscape, SBEAMS...); can be extended to iTRAQ 8-plex.
Limitations: possibly too many options displayed in the interface; command-line interface only (converters, Libra...); specific to iTRAQ, no other isobaric tag can be quantified.

Figure 13. Screenshot of the protXML viewer showing a protXML file that contains the quantification results. The mean and SD of each reporter are displayed.

Mascot quantification module
Mascot 2.2 includes a quantification module that computes ratios of identified proteins. This module covers most quantification methods, i.e. reporter-based (iTRAQ, TMT), precursor-based (ICAT, SILAC, absolute quantification...) and label-free. These methods are organised into protocols. The reporter protocol handles samples labelled with most isobaric tags (iTRAQ, TMT, ExacTag) except the AMT. All the information required for isobaric quantification is contained in the peak list, which is the required input. Reagents and the MS/MS tolerance can be set in the interface; all other parameters are set in the XML configuration file. The configuration.xml file encapsulates all user parameters. There are many parameters, split into groups. For example, the Methods group contains the methods used to calculate the ratio and the significance level used for the statistical test (default 0.05). The Component group contains information about each reporter (average and monoisotopic masses, impurity correction values...). In Ratio, users can define each ratio they want to display (numerator, denominator). The Quality group contains the filter parameters: thresholds on intensity, score and expect value. Outlier and Normalization contain specific methods: Mascot implements the Grubbs, Rosner and Dixon detection methods, and three types of normalisation are available: summing intensities, median and geometric mean. Finally, Mascot performs the quantification after protein identification. It displays a summary box containing all protein ratios (Figure 15A). A detailed view is also provided for each protein (Figure 15B): each peptide ratio is given, and the protein ratio is displayed in a box together with a measure of spread, generally a geometric standard deviation. Figure 14 summarises the Mascot process.
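The geometric mean and geometric standard deviation mentioned above are computed on the log scale and exponentiated back. The sketch below uses these conventional definitions with natural logarithms and an n − 1 denominator; that Mascot uses exactly these formulas is an assumption, and the function name is hypothetical.

```python
import math


def geometric_summary(ratios):
    """Geometric mean and geometric SD of positive peptide ratios."""
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    mu = sum(logs) / n
    if n < 2:
        return math.exp(mu), float("nan")  # spread undefined for one peptide
    var = sum((x - mu) ** 2 for x in logs) / (n - 1)
    return math.exp(mu), math.exp(math.sqrt(var))


gm, gsd = geometric_summary([0.5, 1.0, 2.0])
print(gm, gsd)  # gm = 1.0, gsd is approximately 2.0
```

Because the geometric SD is multiplicative, the interval from gm / gsd to gm × gsd plays the role of "mean ± 1 SD" on the ratio scale, which keeps ratios of 0.5 and 2 symmetric around 1.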
Table 6 compares the theoretical ratios of the proteins of the ABRF dataset with the ratios found by Mascot. Parameters are summarised in Figure 1 of Appendix 3. Mascot fails to quantify three proteins: carbonic anhydrase was not identified, whereas beta-casein and ribonuclease A were identified but not quantified. Their quantification fails because of the outlier removal option; when this option is set to none, these two proteins become quantifiable. Given its many parameters and its very user-friendly interface, Mascot seems to be a very complete quantification tool (Table 5). However, importing identification results from other search engines is impossible.

Figure 14. Scheme of the Mascot process. Input: peak lists, m/z tolerance*, significance threshold, impurity correction, reporter masses, threshold on peptide score, threshold on peptide maximum expect value, ion intensity threshold, integration method, normalisation method, outlier removal method, numerator and denominator. Algorithm: reporter ion peak collection, filtering, peak normalisation, peptide ratio calculation, outlier removal, ratio calculation, significance of changes. Output: quantification summary box; for individual proteins, the peptide ratios, a summary box and an indication of the protein change. (*) Settable in the interface; all other parameters are contained in the configuration.xml file.

Table 5. Summary of the advantages and limitations of Mascot.
Advantages:
- Correction of sample variability
- Strong outlier detection tests
- Significance of changes
- Covers almost all isobaric tag-based quantification
- Clear interface
- Coupled to Mascot identification
- Displays only the essential parameters
Limitations:
- Bias in the ratio due to the impurity correction
- Only compatible with Mascot identification results

Figure 15. Screenshots of the Mascot result page. A/ Summary box of the quantified proteins. B/ Protein summary box containing the method used to average the peptide ratios, the number of peptides and the geometric SD; below this box, the ratios are detailed at the peptide level.

Table 6. Comparison of the theoretical protein ratios of the ABRF sample with the Mascot ones. Columns: protein name, AC, theoretical ratio, Mascot ratio.
Beta casein P :4 NQ*
Catalase bovine liver C 1345 P :5 0.4
Glycogen phosphorylase rabbit P :
Carbonic anhydrase I / 3:1 NI*
Peroxidase horseradish P 6782 P :
Ribonuclease A bovine R5500 P :1 NQ*
Bovine serum albumin P02769_CHAIN0 1:
Lactoperoxidase P :1 1.1
(*) NQ means that the protein is identified but not quantifiable. NI means that the protein is not identified.

3.4. Software comparison
The tools can be compared at two levels: the quantification results and the algorithm.

Comparison of the Quantification Results
First, the quantification results were compared. ABRF data were used to compare the peptide ratios found by i-Tracker and Mascot. Because i-Tracker handles non-centroided data, the peak list file was modified to add some peaks for each iTRAQ reporter (within the reporter m/z tolerance interval). Three peptide matches assigned to protein hits were used as input to i-Tracker and then to Mascot. The parameters used in i-Tracker and Mascot are shown in Figures 1 and 2 of Appendix 3. The results in Table 7 show that the two tools give different ratios for two of the peptides. This variation can be explained by the two integration methods: area calculation versus sum of the reporter peak intensities. Libra and Mascot results were compared using the Whitehead data [26]. Since no protein was spiked in this experiment, we cannot conclude which tool gives the best result. However, the protein ratios seem to follow the same profile for both tools (Table 8).
Table 7. Comparison of results for three peptide matches. Columns: peptide m/z, i-Tracker ratio 117/114, Mascot ratio 117/114.
Table 8. Comparison of results, Libra vs Mascot. Columns: protein name, Libra, Mascot.
Diphosphomevalonate decarboxylase 0.74+/ / /
Adenylosuccinate synthetase 0.69+/ / /
Proteasome-activating nucleotidase 2 Fumarate hydratase 0.64+/ / / / / /
Heme biosynthesis protein 1.64+/-99.99* 2.18+/-99.99* 2.07+/-99.99*
Vng6208c 1.10+/ / /
(*) Only one peptide is used to calculate the ratio, so the standard deviation cannot be estimated (reported as +/-99.99). In Mascot, bold numbers indicate that the ratio is significantly different from 1.

A quantitative analysis can be subdivided into five key steps. First, a pre-processing step extracts the reporter signals using an integration method (summing the intensities of the profile or calculating the area under the profile curve). This is followed by a filtering step. Various filters are possible; the most common are thresholds on intensity, score and p-value (expect value). The normalisation step is generally applied to correct systematic biases or to avoid giving too much weight to one reporter. It is followed by the outlier removal step. Finally, the protein ratio is estimated. The quantification workflow of each tool is compared for each of these steps. The three tested tools each have their own way of achieving quantification. i-Tracker performs the quantification at the peptide level only, so the user must calculate protein ratios manually.
INPUT/OUTPUT
First, we compare the INPUT file formats. Mascot quantification is tightly coupled to Mascot identification, to the extent that proteins identified with other identification software cannot be quantified. In contrast, i-Tracker can import SEQUEST and Mascot results, and Libra can handle results from many identification tools (Figure 10) thanks to converters such as Out2XML and Mascot2XML, which convert, respectively, the .out file format from SEQUEST and the .dat file format from Mascot into pepXML format. In addition to the pepXML file, Libra takes as input RAW files converted to mzXML format (for the RAW-to-mzXML conversion, the converters need access to a computer on which the instrument-specific data acquisition software is installed). As OUTPUT, Mascot produces a .dat file. Libra encapsulates its results in a protXML file, which is read with the protXML viewer and can be exported to post-quantification tools (Cytoscape, Spotfire, etc.). i-Tracker produces two types of .csv file.
Output style 1 is designed to be human-readable when imported as a comma-separated values file into programs such as MS Excel. It is strictly ordered, so automated parsing is also straightforward. Output style 2 is designed to allow very easy basic analysis within programs such as MS Excel. All information is output on a one-row-per-spectrum basis, so human readability is lost, but functions and macros are easier to run.
Reporter Peaks Collection
There are two ways to collect peaks, and Mascot allows the choice between both via the configuration editor. Libra uses the sum of the intensities of the peak profiles, whereas i-Tracker uses the trapezoid approximation to calculate the area under the curve.
Filters
After reporter ion peak collection comes the filtering process. Many filters can be applied. All of them may involve data loss, especially a threshold on intensity, since peptides with weak intensities are removed.
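The two peak-collection methods described above turn the profile points of one reporter into a single signal in different ways. The sketch below contrasts them on a single reporter window; the data layout (parallel m/z and intensity lists) and the function names are assumptions, and neither function reproduces Libra's or i-Tracker's code.

```python
def sum_of_intensities(mz, intensity, centre, tol):
    """Sum the intensities of all points within centre +/- tol."""
    return sum(i for m, i in zip(mz, intensity) if abs(m - centre) <= tol)


def trapezoid_area(mz, intensity, centre, tol):
    """Area under the profile within centre +/- tol (trapezoid rule)."""
    pts = sorted((m, i) for m, i in zip(mz, intensity) if abs(m - centre) <= tol)
    return sum((m2 - m1) * (i1 + i2) / 2
               for (m1, i1), (m2, i2) in zip(pts, pts[1:]))


profile_mz = [114.0, 114.1, 114.2]
profile_int = [10.0, 20.0, 10.0]
print(sum_of_intensities(profile_mz, profile_int, 114.1, 0.15))  # 40.0
print(trapezoid_area(profile_mz, profile_int, 114.1, 0.15))      # about 3.0
# A centroided reporter (a single point) still has a sum, but zero area:
print(sum_of_intensities([114.1], [20.0], 114.1, 0.15))          # 20.0
print(trapezoid_area([114.1], [20.0], 114.1, 0.15))              # 0
```

The single-point case illustrates why an area-based method needs profile (non-centroided) data: the trapezoid rule needs at least two points to produce a non-zero area.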

Type of Quantification Workflow
We can now discuss the quantification workflow of each tool. Libra and Mascot compute the protein ratio in two different ways: Mascot computes an average of peptide ratios, whereas Libra computes a ratio of averaged peptide reporter intensities.
Outliers Removal
Outlier removal (like normalisation) is a crucial step that is always present in a quantification workflow, although its implementation varies among the quantification tools. Mascot implements three methods for outlier removal. Dixon's r11 test, also referred to as N9, detects and removes a single outlier at a time from either the upper or the lower extreme of the range [27, 28]. It is applicable to samples of 4 to 100 values; for larger samples, Rosner's test is applied [24]. Grubbs' test is applicable to samples of 3 to 100 values [26, 25]. Libra considers an intensity value an outlier if it lies outside the range:
$$\left]\mu - 2\sigma,\; \mu + 2\sigma\right[ \qquad (1)$$
where µ is the mean and σ is the standard deviation. Care must be taken when outliers are removed blindly. A rigorous analysis would compare the data with and without outliers to see to what extent the conclusions differ qualitatively.
Normalisation
A second important aspect of working with large biological datasets is normalisation, which always takes place at the peptide level. To remove sample variability introduced during the experimental procedure, and assuming that differentially expressed proteins are a minority in a biological sample, Mascot proposes two methods that normalise the data so that the average of all ratios across the entire dataset equals one: the median or the geometric mean. A third method is the sum: the total intensities of each reporter across the entire dataset are made equal. In contrast, Libra and i-Tracker implement no correction of sample variability.
However, to avoid giving too much weight to one reporter, both tools normalise on the sum of all reporter intensities.
Determination of protein abundance and measure of spread
Libra computes a simple ratio of the reporter means and displays a standard error calculated from the numerator standard deviation. Because Mascot works with peptide ratios, it offers several methods for averaging them: median, geometric mean and weighted mean, associated with a geometric standard deviation. i-Tracker does not provide ratios at the protein level.
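The normalisation options and the two ways of building a protein ratio described above can be compared in a few lines of Python. The function names are hypothetical; the median and geometric-mean normalisations follow the description above (rescale so that the central ratio of the whole dataset is 1), the "sum" option equalises channel totals, and using the median as the protein-level average is only one of the options listed for Mascot.

```python
from statistics import geometric_mean, mean, median


def normalise_ratios(ratios, method="median"):
    """Rescale peptide ratios so that their median (or geometric mean) is 1."""
    centre = median(ratios) if method == "median" else geometric_mean(ratios)
    return [r / centre for r in ratios]


def equalise_reporter_totals(peptides, channels):
    """'Sum' normalisation: make each channel's total intensity equal."""
    totals = {ch: sum(p[ch] for p in peptides) for ch in channels}
    target = sum(totals.values()) / len(channels)
    return [{ch: p[ch] * target / totals[ch] for ch in channels} for p in peptides]


def ratio_of_means(num, den):
    """Libra-style: ratio of the averaged reporter intensities."""
    return mean(num) / mean(den)


def median_of_ratios(num, den):
    """Mascot-style (median option): average of the per-peptide ratios."""
    return median([n / d for n, d in zip(num, den)])


num = [100.0, 10.0, 1000.0]
den = [50.0, 10.0, 250.0]
print(median_of_ratios(num, den))         # 2.0
print(ratio_of_means(num, den))           # about 3.58, dominated by the intense peptide
print(normalise_ratios([1.0, 2.0, 4.0]))  # [0.5, 1.0, 2.0]
```

The example shows the practical consequence of the design choice: a ratio of means weights peptides by intensity, whereas an average of peptide ratios gives each peptide the same weight.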

Bonus
Several functionalities are software-specific. i-Tracker, for example, is the only tool that displays a quantisation error. This value can serve as a warning against placing too much confidence in reported ratios that are based on peaks with low ion counts [21]:
$$\mathrm{Err}(1,2) = 100 \left( \frac{0.5}{\mathrm{Peak1Max}} + \frac{0.5}{\mathrm{Peak2Max}} \right) \qquad (2)$$
Mascot provides an interesting and robust indication of the relative protein fold change using a one-sample t-test. The null hypothesis H0 is that the protein ratio x is equal to one; if H0 is rejected, x is significantly different from 1 and the protein ratio is reported in bold.
3.5. Discussion
The software comparison reveals that some steps are always present: filters, outlier removal and estimation of the ratio are essential in a quantitative analysis. Each tool has its own way of estimating the ratio. Although they provide a good estimate of relative protein abundance, some of them are not user-friendly (i-Tracker, Libra). Moreover, they do not take into account the experimental design (e.g. replicate analysis, time course...). Finally, none of them provides a means of visualising protein expression. A complete summary of the tools that I tested can be found in Table 1 of Appendix 4.
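Equation (2) and the one-sample t-test described above can be written out directly. In this sketch, running the t-test on natural-log peptide ratios against 0 (equivalent to testing the ratio against 1) is an assumption, since the source does not state on which scale Mascot performs the test; the function names are hypothetical. A p-value would additionally need the Student t distribution, for example via `scipy.stats.ttest_1samp`.

```python
import math
from statistics import mean, stdev


def quantisation_error(peak1_max, peak2_max):
    """i-Tracker-style quantisation error in percent, equation (2)."""
    return 100 * (0.5 / peak1_max + 0.5 / peak2_max)


def log_ratio_t_statistic(ratios):
    """One-sample t statistic for H0: mean log ratio = 0 (ratio = 1)."""
    logs = [math.log(r) for r in ratios]
    return mean(logs) / (stdev(logs) / math.sqrt(len(logs)))


print(quantisation_error(50, 100))             # about 1.5 (%); low counts give larger errors
print(log_ratio_t_statistic([0.5, 1.0, 2.0]))  # 0.0: ratios symmetric around 1
```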

4. Establish Quantification WF
4.1. Introduction
Based on the previous study, three quantification methods were implemented in the R language. R is well suited to analysing large quantitative datasets because it offers an environment for statistical programming: descriptive statistics and inference tests are available as soon as R is installed, and many additional packages are easy to use and well documented. Moreover, R can produce graphs and charts and therefore provides an attractive way to represent quantitative results. Librus, an intensity-based method similar to Libra, and Mascat, a peptide-ratio-based method resembling Mascot, were implemented first. In addition, a novel quantification algorithm named QI (Quantification Isobaric) was developed. QI is a least-squares regression-based workflow. According to Bantscheff et al. [30], linear regression could be a good alternative to a filter on intensity and the inherent data loss that such a filter involves. Indeed, distinguishing (1) a weak intensity from a low-abundance peptide from (2) a weak intensity from background noise is difficult. A least-squares regression line is the straight line through the data that minimises the sum of the squared vertical distances between the data points and the line. This has two advantages: first, we avoid needlessly losing peptide matches through an arbitrary intensity threshold, and second, we obtain an easy way to visualise the protein quantification. The idea of the method is to fit a linear model whose regression line is constrained to pass through the origin. The slope of the line is an estimate of the protein ratio, and the R-squared gives an indication of the data spread (Figure 20). Then, dedicated methods for detecting outliers and influential points are applied.
These methods are part of regression diagnostics. As highlighted in part 3.5, several statistical steps are common to all quantification approaches. These steps are shown in dark blue in Figure 16 and detailed in the next part. The algorithms were validated using two sets of spiked proteins. First, the ratios of the proteins contained in the ABRF sample (described in 3.2) were compared between the three methods. Then, a dataset (provided by Loïc Dayon) obtained from samples containing 4 spiked proteins [7] was used to measure the root mean square deviation (RMSD) between the expected theoretical ratios and the ratios obtained with each quantification algorithm.
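A least-squares line constrained through the origin has a closed-form slope, b = Σxy / Σx², where x are the denominator and y the numerator reporter intensities of a protein's peptides. The sketch below computes the slope and the R-squared. Following the convention of R's `summary.lm` for models without an intercept, R-squared is computed against the uncentred total sum of squares (Σy²); whether QI reports exactly this quantity is an assumption, and the function name is hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y = b*x (no intercept) and its R-squared."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * c for a, c in zip(x, y))
    slope = sxy / sxx
    rss = sum((c - slope * a) ** 2 for a, c in zip(x, y))
    r_squared = 1 - rss / sum(c * c for c in y)  # uncentred, as for no-intercept fits
    return slope, r_squared


den = [1000.0, 2000.0, 3000.0]  # denominator reporter intensities (x)
num = [2000.0, 4000.0, 7000.0]  # numerator reporter intensities (y)
slope, r2 = slope_through_origin(den, num)
print(round(slope, 3), round(r2, 3))  # 2.214 0.995
```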

Figure 16. Steps of the three implemented approaches; steps common to Librus, Mascat and QI are shown in dark blue.
4.2. Description
Signal Extraction
The data were pre-processed using the LabelMS2Extractor (LMS2E), a tool designed to extract the intensities of peptides labelled with isobaric tags. The INPUT data consist of (1) the Phenyx identification result (pidres.xml) and (2) the path of the configuration file, which contains the reagent masses, the impurity correction values and the mass tolerance. Reporters are integrated by summing the intensities within the range defined by the reagent mass and the user-entered mass tolerance. Missing reporter intensities are replaced by 0. Another way to compute reporter signals is to integrate the area under the reagent profile curve. The sum-of-intensities method is, however, more accurate and can handle cases where only one peak is detected in the reporter mass range [2]. The OUTPUT is a csv file (Appendix 2) that contains all quantifiable peptide matches, i.e. all peptides that have been labelled.
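The sum-of-intensities extraction described above can be sketched as follows: every peak within ± tolerance of a reporter mass contributes to that reporter's signal, and a reporter with no peak gets 0. This is not LMS2E's code; the peak-list layout, the function name and the approximate iTRAQ 4-plex reporter m/z values are assumptions, and impurity correction is omitted.

```python
# Approximate iTRAQ 4-plex reporter m/z values (assumed for illustration).
ITRAQ4_REPORTERS = {"114": 114.111, "115": 115.108, "116": 116.112, "117": 117.115}


def extract_reporters(peaks, reporters=ITRAQ4_REPORTERS, tolerance=0.1):
    """Sum peak intensities within +/- tolerance of each reporter m/z.

    peaks: iterable of (mz, intensity) pairs from one MS/MS spectrum.
    Reporters without any peak in their window get 0 (missing value -> 0).
    """
    peaks = list(peaks)
    return {
        name: sum(i for mz, i in peaks if abs(mz - mass) <= tolerance)
        for name, mass in reporters.items()
    }


spectrum = [(114.10, 50.0), (114.13, 30.0), (117.11, 20.0), (120.00, 99.0)]
print(extract_reporters(spectrum))
# {'114': 80.0, '115': 0, '116': 0, '117': 20.0}
```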

Filters
In all implemented quantification methods, the first step consists of filtering the LMS2E results table. Filters can act at the peptide level or at the protein level. At the protein level, a threshold on the number of peptide matches is implemented: the more peptides per protein, the more reliable the ratio generally is. At the peptide level, several filters are implemented: thresholds on the minimum intensity, the minimum score and the maximum p-value, and a filter on proteotypic peptides, i.e. a peptide that matches several proteins is removed. Thus, only unique peptides are taken into account for protein quantification.
Normalisation/Correction step
The second crucial step is the normalisation of the dataset. Librus normalises the peptide intensities, as Libra does. Mascat implements the three normalisation methods of the Mascot quantification module. The Librus normalisation prevents any single reporter from receiving too much weight, whereas the Mascat normalisation corrects the systematic bias that occurs during sample preparation.
Outliers removal
In Librus, if a normalised intensity value is more than 2σ from the mean of the signals of one reporter, it is considered an outlier and removed. Mascat uses the specific outlier detection tests of Mascot (Grubbs, Rosner, Dixon; for more details see part 3.4). QI bases its workflow on regression diagnostic methods. DFFITS and studentized residuals are important techniques for detecting outliers and influential points in a regression analysis. The DFFITS of an observation measures the influence of this observation on its own predicted value. Studentized residuals are residuals divided by an estimate of their standard deviation. Table 9 shows the number of peptides remaining after outlier removal.
Because it tolerates a large range of sample sizes, Grubbs' test was chosen for Mascat. For Librus, the method described above is used. For QI, if a DFFITS value lies more than m standard deviations σ from the mean DFFITS, the peptide is considered an outlier and removed. We used m equal to 2, 2.5 and 3. The authors recommend 2 [31], but then the number of lost peptides becomes large (Table 9). As a reliable alternative, I propose m = 2.5: more peptides per protein are retained, and the resulting ratios are closer to the theoretical ratios (Table 10). Figure 17 shows the effect of this method; regression lines are plotted before and after outlier removal (m = 2).

Table 9. Number of peptides remaining after each outlier removal method. Columns: AC; # peptides initially; # peptides after Grubbs; # peptides after ±2σ; # peptides after DFFITS (m = 2); # peptides after DFFITS (m = 2.5); # peptides after DFFITS (m = 3). Rows: "P00432", "P00433_CHAIN0", "P00489", "P02769_WOSIG0", "P80025_CHAIN0".
Table 10. Comparison of the quantification results of each workflow. Columns: AC; theoretical ratio; Mascat ratio; Librus ratio; QI ratio (m = 2); QI ratio (m = 2.5); QI ratio (m = 3).
"P00432" 1:5 0.5* 0.47* * 0.36
"P00433_CHAIN0" 1:
"P00489" 76:1 6.51* 5.27* * 3.86
P02769_WOSIG0 1: * 1.1
"P80025_CHAIN0" 1:1 1, *

Figure 17. Linear regression for four proteins, with the equations of the regression lines and the regression coefficients: A/ before outlier removal, B/ after outlier removal. The outlier removal step improves the regression coefficient by reducing the spread.

Protein ratio calculation and measure of spread
Librus computes a ratio of averaged intensities and gives a standard error. Mascat averages the peptide ratios using descriptive statistics (median and MAD, geometric mean and geometric SD, weighted mean and weighted SD). Being based on a regression model, QI gives the ratio by least-squares estimation.
OUTPUT and Visualization
The three implemented approaches provide csv sheets and graphs to visualise the quantification. Each workflow has its own presentation of the results and specific graphs. Mascat displays a summary table containing the ratios and an indication of the protein fold change (Figure 18A), and a protein box, provided by the R package, containing all the quantitative information for one protein (Figure 18B). In addition to histograms and density distributions of the peptide ratios, boxplots are used to display the peptide ratio distribution for each quantified protein (Figure 18C). A boxplot is a graphical statistical summary in which outliers are plotted individually. The central box spans the first and third quartiles, and the median is displayed by the central bold line. Observations lying more than 1.5 × IQR beyond the box are flagged as possible outliers, and the whiskers extend to the largest and smallest peptide ratios that are not considered outliers. Librus provides a matrix of results (Figure 19A) and a protein summary box containing all the information on one protein (Figure 19B).
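The median/MAD summary and the boxplot outlier rule described above can be sketched with Python's `statistics` module. Two assumptions apply: the MAD here is the raw median absolute deviation (R's `mad()` multiplies it by 1.4826 by default, and the source does not say which form Mascat reports), and the quartiles use `method="inclusive"`, which matches R's default `quantile()` type 7 but can differ slightly from the Tukey hinges that R's `boxplot()` draws. The function names are hypothetical.

```python
from statistics import median, quantiles


def median_mad(ratios):
    """Median of peptide ratios and raw median absolute deviation (no 1.4826 factor)."""
    med = median(ratios)
    return med, median([abs(r - med) for r in ratios])


def boxplot_outliers(ratios):
    """Ratios beyond 1.5 * IQR from the quartiles (Tukey's boxplot rule)."""
    q1, _, q3 = quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in ratios if r < low or r > high]


ratios = [1.0, 1.1, 1.2, 1.3, 1.4, 5.0]
print(median_mad(ratios))        # median 1.25, MAD about 0.15
print(boxplot_outliers(ratios))  # [5.0]
```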

Figure 18. Mascat output, csv files and boxplots. A. Summary box containing the proteins of the whole run. B. Protein box displaying information for one protein; ratios are calculated with the median and the MAD (median absolute deviation). C. Standard boxplot of peptide ratios; the bold horizontal line is the median, and outliers appear as isolated points.

Figure 19. Librus output. A. All ratios for one protein displayed in a matrix of ratios. B. Individual information summarised in the protein box (mean of normalised reporter intensities, SD, SE and number of peptide matches).

The QI workflow provides individual protein boxes containing the ratio values calculated from the slopes of the regression lines (Figure 20), and quantitative scatter plots of peptide intensities, numerator vs denominator, with the fitted line (Figure 17A, 17B). A standard scatter plot displays the relationship between two quantitative variables; here, the x and y axes represent the denominator and numerator intensity values, respectively. A scatter plot provides information about the form, direction and strength of a relationship. In quantification by regression analysis, some of this information is known in advance. The relationship is expected to be linear, so peptide intensity values should show a straight-line pattern. The direction of the relationship is unambiguous: the slope of the fitted line (which corresponds to the ratio) must be positive, i.e. the association is positive. The strength of the relationship between the two variables is given by the correlation coefficient. The reported standard error estimates the standard deviation of the slope; the larger it is, the less reliable the computed ratio. Finally, the R-squared is provided. It conveys the strength of the association between the points and the regression line better than the correlation coefficient, and can be interpreted as a measure of spread: the closer it is to one, the better the regression line describes the data, and the better the slope estimates the ratio.
Figure 20. QI output, individual summary box for one protein. It contains the slope, which corresponds to an estimate of the protein ratio, the standard error of this coefficient, the regression coefficient and the number of peptide matches. A large number of peptides does not guarantee a precise estimate of the ratio.
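For the no-intercept model, the standard error of the slope follows from the residual variance with n − 1 degrees of freedom (one fitted parameter): SE(b) = √(σ̂² / Σx²). This is the standard formula that R's `lm` reports for a model such as `lm(y ~ 0 + x)`; that the QI summary box shows exactly this value is an assumption, and the function name is hypothetical.

```python
import math


def slope_standard_error(x, y):
    """Slope of y = b*x (no intercept) and its standard error."""
    sxx = sum(a * a for a in x)
    slope = sum(a * c for a, c in zip(x, y)) / sxx
    rss = sum((c - slope * a) ** 2 for a, c in zip(x, y))
    sigma2 = rss / (len(x) - 1)  # one estimated parameter
    return slope, math.sqrt(sigma2 / sxx)


b, se = slope_standard_error([1.0, 2.0, 3.0], [2.0, 4.0, 7.0])
print(round(b, 4), round(se, 4))  # 2.2143 0.1129
```

Because Σx² sits in the denominator, intense peptides constrain the slope more than weak ones, which is consistent with the motivation for using regression instead of a hard intensity threshold.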
Quality of the data set
It is generally accepted that high-throughput shotgun proteomics data are log-normally distributed [2, 29]. In peptide quantification, the distribution of ratios can be skewed: their log values can be lower or greater than zero but very seldom have large absolute values, and they usually vary around zero. A non-normal distribution of the log ratios may therefore indicate that the values are meaningless or that an experimental error occurred. However, it may also indicate that the sample contains strongly differential proteins. A log-normal distribution is required by the outlier detection test in the Mascat workflow and is therefore also checked before the data analysis starts. Normality is assessed with the Shapiro-Wilk test, whose null hypothesis is that the sample is drawn from a normal distribution. The hypothesis is rejected if the p-value associated with the test statistic W is below the chosen significance level. This test loses its reliability for sample sizes greater than 2000 values.

This size threshold is almost always exceeded with high-throughput proteomics data. To overcome this issue, we advise always examining the peptide ratio distribution in Mascat (Figure 21A) and the reporter intensity distribution in Librus (Figure 21B), or the quantile-quantile plot of the data (Figure 21C).
Figure 21. Data quality assessment. A/ Mascat workflow, distribution of the peptide ratios; dotted red curves are the density distributions after normalisation (Ali data). B/ Librus workflow, superposition of the reporter intensity density distributions (Loic data, ante-mortem labelled reporters). C/ Mascat workflow, normal quantile-quantile plot of the peptide ratios 117/114 (Ali data).
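When the sample is too large for the Shapiro-Wilk test, a quantile-quantile plot as suggested above is the practical check. The sketch below computes the coordinates of a normal Q-Q plot of log2 peptide ratios, using the plotting positions of R's `ppoints()`, which `qqnorm()` relies on. The log base and the function name are assumptions. Points lying close to a straight line indicate approximately log-normal ratios.

```python
import math
from statistics import NormalDist


def normal_qq_points(ratios):
    """(theoretical normal quantile, observed log2 ratio) pairs for a Q-Q plot."""
    observed = sorted(math.log2(r) for r in ratios)
    n = len(observed)
    a = 3 / 8 if n <= 10 else 0.5  # plotting positions as in R's ppoints()
    probs = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, observed))


for q, obs in normal_qq_points([2.0, 0.5, 1.0]):
    print(round(q, 3), obs)  # the middle point is (0.0, 0.0)
```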

Implementation
The implementation was done in R. The program files, documentation and example script are contained in supplementary data 1. Reporter peaks were collected using LMS2E, which performed the integration by sum of intensities and corrected the reporter impurities. The Librus workflow can be run with the script librus.r. It executes the wrapper function librus, which performs the quantification: it calls the normalisation function firstnorm and the outlier removal function outlier, and getratio calculates the ratio and the standard error. The Mascat workflow is called by the wrapper mascat, found in the module Mascatfunc.R. This workflow is divided into four modules. The wrapper calls the script Mascat_norm.R, which contains the normalisation functions; the script Mascat_outliers.R, which contains the outlier detection tests (this script requires the installation of an additional library, outliers, which gathers a collection of tests commonly used for identifying outliers); and Mascat_changes.R, which contains the statistical test for significant changes. The script Mascat_displays.R exports the quantification results and charts. The last workflow, QI, is designed as a single module, Qifunc.R. It implements a wrapper qi, which calls sloperatio, a subroutine that creates the linear regression model and displays the regression summary. The summary contains the regression statistics of the model: the slope, the standard error and the regression coefficient. Moreover, the wrapper calls the regression diagnostic methods through the function diagnrm. No additional libraries are needed for DFFITS and studentized residuals, since the default R distribution contains all the regression diagnostic functions.

Table 11. Summary of the methods used in the three implemented workflows.

4.3. Validation of the algorithms
The ratios of the five spiked proteins of the ABRF data are compared in Table 10. The ratios from Mascat, Librus and QI are close to the expected theoretical values, which validates the three quantification algorithms. A more precise study was performed to determine which quantification workflow provides the most accurate ratios. For this purpose, we used a dataset supplied by Loïc Dayon that contains a mixture of bovine serum albumin (ALBU), horse heart myoglobin (MYO), bovine milk β-lactoglobulin (LACB) and hen egg lysozyme (LYS) in equal weights. This dataset was originally used to determine the impurity correction coefficients of the TMT reagents (see Dayon et al. [7] for more information). The root mean square deviation (RMSD) was chosen to measure the deviation from the expected theoretical ratios of 1:2:3:3:5:10. As summarised in Figure 22, the RMSD values for each expected ratio are combined for all experiments. The profiles clearly show that the higher a ratio, the greater its error. To obtain a better indication of the relative accuracy of each quantitative approach, the low (2:1, 3:1) and high (5:1, 10:1) theoretical ratios were examined separately (Figure 23). For relatively weak ratios, the approaches seem to behave similarly (Figure 23A). However, for ratios higher than 3:1, the peptide-ratio-based workflow, Mascat, presents the largest error, to the extent that all its RMSD values computed over the jobs exceed 2.5 (Figure 23B). In contrast, only one job presents a large deviation for the intensity-based workflow, Librus; apart from this job, the Librus error profiles are closely clustered whatever the expected ratio. The regression-based approach generally gives low errors, but its job profiles are more dispersed, almost parallel. More information can be found in supplementary data 7. Detailed profiles per job can be seen in Appendix 5.
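The RMSD used above measures how far the estimated ratios fall from a known theoretical ratio. In the sketch below, grouping the observed ratios by theoretical ratio (one call per expected ratio) is an assumption about how the values were combined, and the example ratios are made up for illustration; they are not taken from the study.

```python
import math


def rmsd(observed, expected):
    """Root mean square deviation of observed ratios from one expected ratio."""
    return math.sqrt(sum((o - expected) ** 2 for o in observed) / len(observed))


# Hypothetical ratios measured for a 5:1 spike in three jobs:
print(rmsd([4.5, 5.2, 5.9], 5.0))  # about 0.61
```

Because deviations are squared before averaging, a single badly estimated ratio weighs heavily in the RMSD, so one aberrant job can dominate a profile.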

Figure 22. Superposition of the RMSD profiles of all jobs.

Figure 23. Root mean square deviation for Librus, Mascat and QI, one series per job (x-axis: theoretical ratio; y-axis: RMSD), shown separately for the high (5:1, 10:1) and low (2:1, 3:1) theoretical ratios. For weak ratios, the error seems to be roughly equal across workflows; however, when a protein is strongly differential, Mascat presents the ratios with the highest deviations.

4.4. Discussion
Three algorithms were developed and validated in the R language, each with its own statistical procedure (Appendix 6). By measuring the RMSD between the expected theoretical ratios and the ratios estimated by the different algorithms, we conclude that Librus and, to a lesser extent, QI give the most accurate ratios. This corroborates the observations of Carillo et al. [39]. However, care must be taken not to place too much confidence in this conclusion. Indeed, the two analysed datasets come from samples containing only spiked proteins, so the normality assumption is not satisfied. This assumption is especially important in Mascat, where the outlier detection test requires the peptide ratios of the whole dataset to be normally distributed, which could explain why Mascat shows the largest deviations from the expected ratios. Consequently, to establish which of these algorithms gives the most accurate ratios, a study should be done on a dataset from real biological samples into which some spiked proteins are incorporated. The next studies show the application of the algorithms in real biological contexts.

5. Application of the Quantification Workflows

5.1. Alireza collaboration: Peptides Ratios-based Quantification Approach applied to Characterize Daptomycin Resistance in Staphylococcus aureus

A two-month collaboration with Alireza Vaezzadeh, a PhD student in the Biomedical Proteomics Research Group (BPRG), was carried out. My part of the work consisted of analysing MS/MS data obtained from quantitative MS-based proteomic experiments on Staphylococcus aureus.

Introduction

S. aureus, also known as golden staph, is the most common cause of staph infections. It is a spherical bacterium that frequently lives on the skin or in the nose, and it appears in grape-like clusters when viewed through a microscope.

Figure 24. Microscopic image of Staphylococcus aureus (ATCC 25923). Gram staining, magnification 1,000×.

It infects tissues, causing furuncles and severe diseases such as staphylococcal scalded skin syndrome (SSSS) in infants. To counter the rising incidence of this infection, Daptomycin was approved by the FDA (Food and Drug Administration) in 2003 for the treatment of complicated skin and soft tissue infections caused by susceptible strains of S. aureus, including methicillin-resistant S. aureus (MRSA) strains, and by other Gram-positive bacteria. Despite significant efforts over the past 20 years, the mode of action of Daptomycin remains unclear. Furthermore, the bacterium seems to develop resistance to this antistaphylococcal agent, by a mechanism that is not clearly known. To gain insight into this question, proteomic and transcriptomic approaches were applied. Transcriptional profiling was performed by Dr. Patrice François at the Geneva University Hospitals, using a customized and extensively validated oligoarray. MS-based protein quantification was performed using iTRAQ on membrane-enriched extracts, and a list of differentially expressed proteins was obtained with the Mascat workflow.

Materials

Experimental Workflow. Three strains were analysed in this study: 616 (the initial patient isolate), 629 (the first isolate breaking through Daptomycin therapy, with decreased Daptomycin susceptibility but still within the susceptible range, termed the "transitional strain") and 701 (isolated subsequently during Daptomycin therapy and non-susceptible to Daptomycin). Quantitative MS-based proteomic experiments were performed in three practical replicates. Samples were prepared according to the manufacturer's protocol (Applied Biosystems, Framingham, MA). More details on the experiments can be found in Supplementary Data 2. For the first proteomic experiment (PR1), strain 616 was labelled with iTRAQ 114, strain 629 with iTRAQ 116 and strain 701 with iTRAQ 117. For the second practical replicate (PR2), the tags were crossed: strain 616 was labelled with iTRAQ 117, strain 629 with iTRAQ 114 and strain 701 with iTRAQ 116. Finally, for the third practical replicate (PR3), strain 616 was labelled with iTRAQ 116, strain 629 with iTRAQ 117 and strain 701 with iTRAQ 114. This experimental design is shown in Table 12.

Table 12. Experimental design for the differential comparison of S. aureus strains with dissimilar Daptomycin susceptibility. The iTRAQ tags are crossed in each practical replicate (PR).
Replicate | Daptomycin susceptible (616) | Transitional strain (629) | Daptomycin non-susceptible (701)
PR 1 | iTRAQ 114 | iTRAQ 116 | iTRAQ 117
PR 2 | iTRAQ 117 | iTRAQ 114 | iTRAQ 116
PR 3 | iTRAQ 116 | iTRAQ 117 | iTRAQ 114

Methods

iTRAQ quantification. Although several quantification software packages exist, none of them easily handles inter-run replicates while also importing data processed with Phenyx. The quantification values were extracted from the reporter peak intensities with the LabelMS2Extractor, using a mass range of ±0.1 Da. The Mascat method (steps detailed in Appendix 6) was then applied with the parameters listed in Table 13.
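The reporter extraction step can be illustrated with a small function. The internals of LabelMS2Extractor are not described in this document, so the sketch below is hypothetical: for each reporter it keeps the most intense peak within ±0.1 Da of the reporter m/z and records 0 when nothing is found (a missing value). The reporter m/z values are approximate, and taking the maximum rather than the sum of matching peaks is an assumption.

```python
# Approximate iTRAQ 4-plex reporter ion m/z values (assumed for illustration).
ITRAQ_REPORTERS = {114: 114.111, 115: 115.108, 116: 116.112, 117: 117.115}


def extract_reporters(peaks, reporters=ITRAQ_REPORTERS, tolerance=0.1):
    """Return one intensity per reporter from an MS/MS peak list.

    peaks: iterable of (mz, intensity) tuples.
    For each reporter, keep the most intense peak within +/- tolerance Da;
    a reporter without a matching peak gets 0.0 (a missing value).
    """
    peaks = list(peaks)
    intensities = {}
    for label, target_mz in reporters.items():
        matches = [inten for mz, inten in peaks if abs(mz - target_mz) <= tolerance]
        intensities[label] = max(matches) if matches else 0.0
    return intensities


spectrum = [(114.10, 500.0), (114.13, 800.0), (115.30, 50.0),
            (116.11, 1200.0), (117.05, 300.0)]
print(extract_reporters(spectrum))  # → {114: 800.0, 115: 0.0, 116: 1200.0, 117: 300.0}
```

With reporters spaced about 1 Da apart, a ±0.1 Da window cannot assign one peak to two reporters; the peak at 115.30 is simply too far from the 115 reporter and yields a missing value.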
Data were analysed in R (version …). A series of filters was applied to the data: thresholds on the p-value (>1e-7), the Z-score (<6) and the intensities (<2000). In addition, peptides present in more than one protein were removed, and proteins with fewer than two peptides were excluded. To reduce artifactual variation and focus on biological variation, a correction factor based on the sum of the intensities of each reporter was applied to the peptide ratios. Outliers were removed from each protein with Grubbs' test; each time a value was removed, the test was repeated. The median of the remaining ratios was then taken as the protein ratio. To assess the significance of the relative fold change, a one-sample Student's t-test was used at a significance level of …; if the normality assumption (Shapiro-Wilk test, n > 3) was not met, a non-parametric Wilcoxon test was performed instead. Finally, only proteins whose ratios were significantly different from one in all three replicates and whose coefficient of variation was below 40% were reported as differentially expressed. The R script used for the ratio calculations, ali_analysis.r, can be found in the supplementary data.
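The per-protein steps described above (sum-of-intensities correction, iterative Grubbs outlier removal, median ratio, then a one-sample test of the ratios against 1) can be sketched in Python with SciPy. This is a hypothetical illustration, not the ali_analysis.r script: it works on log2 ratios so that "ratio = 1" becomes "location = 0", and alpha = 0.05 is an assumed value because the significance level is not given in the text.

```python
import math
import statistics

from scipy import stats


def sum_normalisation_factor(channel_a, channel_b):
    """Correction factor for ratios a/b so that both channels have equal total intensity."""
    return sum(channel_b) / sum(channel_a)


def grubbs_critical(n, alpha):
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_filter(values, alpha=0.05):
    """Iteratively remove the most extreme value while Grubbs' test rejects it."""
    values = list(values)
    while len(values) > 2:
        mean, sd = statistics.mean(values), statistics.stdev(values)
        if sd == 0:
            break
        idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
        if abs(values[idx] - mean) / sd <= grubbs_critical(len(values), alpha):
            break
        del values[idx]  # outlier found: drop it and repeat the test
    return values


def protein_ratio(peptide_ratios, factor=1.0, alpha=0.05):
    """Median protein ratio after normalisation and outlier removal, plus a p-value.

    Works on log2 ratios, so testing 'ratio = 1' becomes testing a location of 0.
    """
    logs = grubbs_filter([math.log2(r * factor) for r in peptide_ratios], alpha)
    if len(logs) > 3 and stats.shapiro(logs).pvalue < alpha:
        p_value = stats.wilcoxon(logs).pvalue  # normality rejected: non-parametric test
    else:
        p_value = stats.ttest_1samp(logs, 0.0).pvalue  # one-sample Student's t-test
    return 2 ** statistics.median(logs), p_value


# Toy peptide ratios for one protein (not thesis data).
ratio, p = protein_ratio([2.0, 2.1, 1.9, 2.05, 1.95])
print(ratio, p < 0.05)
```

Taking the median after outlier removal makes the protein ratio doubly protected against aberrant peptides; the test only decides whether the change is significant, not how large it is.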

Table 13. Quantification parameters
Reporter extraction m/z tolerance: 0.1 Da
Impurities correction: No
Quantification workflow: Mascat
Filters: intensities < 50; z-score < 6; p-value > 10^-7
Normalisation: sum of intensities
Outliers: Grubbs
Ratio calculation: median

Results

In the first practical replicate (PR1), 2,803 unique peptides corresponding to 565 proteins were identified in the bacterial membrane fraction. In the second experiment (PR2), peptides corresponding to 511 proteins were identified, and in the last replicate (PR3), unique peptides corresponding to 495 proteins were identified. Combining all replicates, a total of 728 proteins (3,248 peptides) were identified (Figure 25). Almost all peptides (95%) produced intense signals for the reporter fragment ions of the three tags used (114, 116 and 117). However, relative quantification was performed only on the 347 proteins identified in all three replicates. Pair-wise comparisons of the strains in the three practical replicates are shown as 3D scatter plots in Figure 26A and as 2D distribution plots in Figure 26B.

Figure 25. Proteins identified in the three practical replicates (PR). In total, 728 proteins corresponding to 3,248 peptides were identified.

Figure 26. A. 3D scatter plots of pair-wise comparisons of S. aureus strains with different susceptibilities towards Daptomycin, in the three practical replicates. B. 2D distribution plots of the logarithmic expression values in the three practical replicates (PR). Only proteins with similar expression in all three replicates were considered significant.

In the comparison of the Daptomycin-susceptible strain 616 with the transitional strain 629, only 68 proteins were considered significantly differentially expressed in all three replicates. None of these proteins fell outside the window of -0.5 to 0.5 (log10 of the protein expression ratio). However, 46 proteins were down-regulated, with a log10 ratio below -0.3 and a coefficient of variation (CV) below 21%. The comparison of the transitional strain 629 with the non-susceptible strain 701 identified 14 proteins, but none of them had a log10 ratio outside the window of -0.3 to 0.3. The comparison of strains 616 and 701 resulted in a list of 60 proteins, 31 of which had log10 ratios below -0.3 and CVs below 37% (see Supplementary Data 3). Similar results were obtained with the transcriptomic approach.

Discussion

A number of interesting observations emerged from this study. From a biological point of view, most of the proteins found under-expressed between the susceptible and transitional strains were also down-regulated between the susceptible and non-susceptible strains. The majority of these proteins were involved in metabolism or were ribosomal proteins or transcriptional factors. Two proteins involved in ion transport (Q99UV7, Q7A869) were also identified and showed reduced expression. Cui et al. observed a correlation between membrane thickness and Daptomycin susceptibility. The identification and quantification of proteins that contribute to the synthesis or metabolism of the cell membrane, such as Q99V41, Q7A7A5, Q7A5K8, Q7A619, Q7A5D5, Q7A6K0 and P64003, tend to support this observation. Moreover, this study highlights several critical points in quantification analysis. First, automated, computer-based quantification of large numbers of proteins is more reliable than manual analysis: another analysis, carried out manually, did not agree with the transcriptomic results. This reliability comes from the filtering and outlier removal steps and from the correction of experimental variation during normalisation. Second, Mascat provides a fold-change value that directly indicates the change in protein expression (see Supplementary Data 4). Third, this collaboration drew our attention to complex experimental designs involving inter-run (experimental) replicates, which led to their implementation and support in the quantification workflows. Combined proteomic and transcriptomic analyses provided a global view of complex processes involving differentially regulated factors that contribute to antibiotic resistance. This combined information is essential for the global integration of the data.
Several proteins potentially implicated in Daptomycin resistance were identified. However, their role has to be confirmed by targeted investigations with conventional molecular biology techniques. In view of the literature and of the additional information obtained in this study, our data appear particularly relevant and indicate that Staphylococcus aureus mobilizes multiple mechanisms to produce resistance towards antibiotics.

5.2. Loic Dayon Collaboration: CSF Analysis by TMT 6-plex

Introduction

A second collaboration was carried out on a TMT 6-plex dataset generously provided by Loic Dayon. This study consisted of the quantitative analysis of proteins from cerebrospinal fluid (CSF) samples, obtained by a shotgun proteomic approach using TMT. CSF is a clear fluid found in the brain chambers (ventricles), the spinal canal and the spinal cord. It is secreted by the choroid plexus, a vascular structure in the ventricles of the brain, and acts as a shock absorber that protects the brain against injury. It contains electrolytes, glucose and a low concentration of proteins. Aggarwal et al. showed that isobaric tags are highly sensitive and can identify low-abundance proteins. In the original study, relative quantification of ante-mortem and post-mortem CSF was performed [7]. Because the blood-brain barrier (BBB) is disrupted a few hours after death, post-mortem and ante-mortem CSF differ greatly in composition. As a first step, I performed a quantification analysis using Librus (a quantification workflow based on TPP-Libra) and compared the results with the published ratio values [7]. However, some of the protein ratios differed. To address this issue, two possible sources of bias in the protein ratios were investigated: missing intensity values and impurity correction. Missing values are generally due to problems in detecting weak signals from low-abundance peptides. Instruments sometimes fail to detect the signal, and even when detection succeeds, the peak intensity may be too low to be distinguished from the background noise [38]. Inspection of the Phenyx job shows that many peptides have missing events for one specific reporter. I therefore assessed the influence of these missing events on the quantification results by running our three quantification approaches on the data.
The results showed that an approach based on linear regression is a relatively good way to overcome this issue. A final study addressed impurity correction. TMT and iTRAQ reagents differ in the isotopic composition of nitrogen, carbon and oxygen but have identical masses. Because of isotopic contamination of the tags, peaks overlap, i.e. the peak area of each reporter ion includes some contribution from other reporter ions. To correct this bias, the manufacturer (ABI, for iTRAQ) provides a datasheet indicating, for each reporter ion reagent, the percentages that differ by -2, -1, +1 and +2 Da from the quoted mass. i-Tracker, ProQuant and most quantification software packages implement this correction. However, purity correction may lead to a biased estimate of the protein fold change, especially when low-abundance peptides are detected.
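Impurity correction amounts to solving a small linear system: each observed reporter intensity is a mixture of the true intensities of neighbouring reagents. The Python sketch below uses made-up impurity fractions (2% of each reagent's signal at -1 Da, 6% at +1 Da, ±2 Da ignored), not datasheet values, and shows how a weak channel can become negative after correction.

```python
# Hypothetical impurity fractions for every reagent (NOT datasheet values):
# share of a reagent's reporter signal seen 1 Da below, at, and 1 Da above its mass.
MINUS_1, ON_MASS, PLUS_1 = 0.02, 0.92, 0.06


def impurity_matrix(n_channels):
    """m[j][i] = fraction of reagent i's signal observed in channel j."""
    m = [[0.0] * n_channels for _ in range(n_channels)]
    for i in range(n_channels):
        m[i][i] = ON_MASS
        if i > 0:
            m[i - 1][i] = MINUS_1
        if i < n_channels - 1:
            m[i + 1][i] = PLUS_1
    return m


def solve(a, b):
    """Solve a . x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def correct_impurities(observed):
    """Estimate the true reporter intensities from the observed ones."""
    return solve(impurity_matrix(len(observed)), observed)


# iTRAQ 114, 115, 116, 117; the 116 signal is barely above noise.
corrected = correct_impurities([1000.0, 2000.0, 60.0, 1500.0])
print([round(v, 1) for v in corrected])  # the third (116) value is negative
```

The spill-over from the strong 115 and 117 channels into 116 is larger than the weak observed 116 signal, so subtracting it drives the corrected value below zero, which is exactly the situation where a ratio can no longer be computed.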

Materials

CSF Collection. Post-mortem (PM) CSF samples from different patients (n = 4) were collected by ventricular puncture at autopsy, on average 6 h after death. Control ante-mortem (AM) CSF samples were collected by routine diagnostic lumbar puncture from living healthy patients (n = 4). Clinical data of the deceased and living patients have been reported previously. Each patient or the patient's relatives gave informed consent prior to enrolment, and the local institutional ethics committee approved the clinical protocol.

Experimental Workflow. After immunoaffinity depletion, triplicates of the pooled AM and PM CSF samples were reduced, alkylated, digested with trypsin and labelled with the six isobaric variants of the TMT (with reporter ions from m/z 126.1 to 131.1 Th). The samples were pooled and fractionated by SCX chromatography. After RP-LC separation, peptides were identified and quantified by MS/MS analysis with MALDI TOF/TOF and ESI-Q-TOF.

Spiking of Protein. β-Lactoglobulin (LACB) from bovine milk (90%) was purchased from Sigma (St. Louis, MO). See Dayon et al. [7] for more details on the experimental procedure.

Data analysis

Protein Quantification. Reporter peaks are extracted using the LabelMS/MSExtractor with an m/z tolerance of 0.15 Da. No isotopic correction is applied because of the bias it introduces for low peak intensities. Protein quantification was performed using the Librus method (Appendix 6). Peptides are filtered according to the signal intensity (< 50), the sum of intensities (< 300) and the quality of the labelling, and proteins with fewer than two peptides are removed. The parameters are summarized in Table 14. The R script CSF6plex_analysis.R can be found in Supplementary Data 6.

Table 14. Quantification parameters
Reporter extraction m/z tolerance: 0.15 Da
Impurities correction: No
Quantification workflow: Librus
Filters: intensities < 50; quality of the labelling; sum of reporter intensities per peptide < 500; minimum number of peptides: 2
Normalisation: sum of peptide reporter intensities
Outliers: µ ± 2σ
Ratio calculation: Σ PM CSF reporter mean / Σ AM CSF reporter mean

Error Estimation. To allow a more reliable comparison with Loic Dayon's published results, no intensity filter is applied; the only filter is the one that assesses the quality of the labelling, and proteins with fewer than two peptides are removed. The Librus quantification method was used, and the ratio of the mean intensity of the 131 m/z label to that of the 126 m/z label is reported. A confidence interval on the estimated ratio was then obtained. First, an average noise value was estimated: based on Loic Dayon's opinion and a brief inspection of the peak list, the background noise was estimated at 50 counts. This value was then used to determine the maximum and minimum values of each reporter intensity (Table 15). Finally, these values were crossed to compute the interval [Label6MAX/Label1MIN, Label6MIN/Label1MAX] (Figure 27).

Figure 27. Error model. The total error was obtained by crossing the max and min values of the two labels.

Table 15. Quantification parameters
Reporter extraction m/z tolerance: 0.15 Da
Impurities correction: No
Quantification workflow: Librus
Filters: quality of the labelling
Normalisation: sum of peptide reporter intensities
Outliers: µ ± 2σ
Ratio calculation: reporter 131 mean / reporter 126 mean
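The error model can be sketched as follows, assuming that every reporter intensity is known only to within ± the 50-count noise estimate and that the ratio is the mean 131 intensity over the mean 126 intensity (Label6 = 131, Label1 = 126). This hypothetical Python function illustrates the crossing of maximum and minimum values; it is not the CSF6plex_analysis.R code.

```python
import math

NOISE = 50.0  # background noise estimate (counts) used in the text


def ratio_interval(i131, i126, noise=NOISE):
    """Mean-based 131/126 ratio with an interval from crossed max/min intensities.

    Lower bound: smallest numerator over largest denominator (Label6MIN/Label1MAX).
    Upper bound: largest numerator over smallest denominator (Label6MAX/Label1MIN).
    Shifting every intensity by +/- noise shifts each mean by the same amount.
    """
    m131 = sum(i131) / len(i131)
    m126 = sum(i126) / len(i126)
    ratio = m131 / m126
    low = max(m131 - noise, 0.0) / (m126 + noise)
    high = (m131 + noise) / (m126 - noise) if m126 > noise else math.inf
    return ratio, low, high


ratio, low, high = ratio_interval([1000.0, 1200.0, 800.0], [200.0, 250.0, 150.0])
print(ratio, low, high)  # → 5.0 3.8 7.0
```

The interval is asymmetric and widens quickly as the 126 mean approaches the noise level, which matches the observation that the most differential proteins carry the largest errors.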

Missing Events or Missing Values of Intensities (MVI)

No intensity filter is applied for Librus, Mascat or QI; however, the minimum number of peptides is set to two. In Mascat, a normalisation on the sum of intensities is performed to avoid systematic bias due to experimental variation. An outlier removal step is then applied (Mascat: Grubbs detection test; QI: DFFITS approach, which removes peptides outside the range [mean(DFFITS) ± 2.5σ]). Protein ratios are calculated as the median in Mascat and given by the slope of the regression line in QI (Table 16). The R script CSF6plex_MVI_analysis.R can be found in Supplementary Data 6.

Table 16. Quantification parameters
Reporter extraction m/z tolerance: 0.15 Da
Impurities correction: No
Quantification workflow: Librus | Mascat | QI
Filters: quality of the labelling
Normalisation: sum of peptide reporter intensities | sum of intensities | no
Outliers: µ ± 2σ | Grubbs detection | DFFITS
Ratio calculation: reporter 131 mean / reporter 126 mean | median of peptide ratios 131/126 | slope of regression line

Results and Discussion

Protein Quantification. Before analysing biological data, it is important to know the goal of the experiment and where the data come from. This study was designed for biomarker discovery. CSF was collected from four living healthy patients and from four deceased patients, and pooled into two samples, AM and PM, respectively. Note that PM CSF was collected about 6 hours after death; consequently, many plasma-specific proteins (albumin, immunoglobulins, etc.) and cytoplasmic proteins are released into the CSF because of the disruption of the blood-brain barrier. A pooled AM CSF sample and a pooled PM CSF sample were spiked with the same amount of LACB and each divided into three samples. The six resulting samples were depleted of albumin, transferrin, IgG, IgA, antitrypsin and haptoglobin.
They were reduced, alkylated, digested with trypsin and labelled with TMT. The three AM CSF samples were labelled, respectively, with the TMT reagents with reporters at m/z 126.1, 128.1 and … The Phenyx software identified 1,246 peptides corresponding to 220 proteins. After the filtering step described in the Data analysis section, 89 proteins identified by a total of 722 peptides were quantified. Histograms and normal quantile-quantile plots of shotgun proteomics data are a common means of assessing the quality of an experiment. It is commonly accepted that biological data mostly follow a log-normal distribution. In this study, the discontinuity in the negative region of the density curves and the bump in the point distribution of the normal quantile-quantile plot can be explained by the large number of high values that remain despite the filtering step (Figure 28B). However, as shown by the red dotted line, the median of the peptide quantification values is close to 0 (Figure 28A).

Figure 28. A. Density distribution of log-transformed peptide ratios; the density curves are shown in red and the dotted red line is the median. B. Quantile-quantile plot of the AM/PM peptide ratios. The y-axis shows the sorted log-transformed quantiles and the x-axis the normal quantiles.

Taking into account the original authors' assumptions (filters, normalisation, etc.) and the experimental design (replicates), the data were analysed with a modified version of the Librus package that works in a way close to the TPP-Libra methods. Protein ratios and protein statistics are provided by the default Librus export (see Supplementary Data 8 and 9). The resulting protein list (see Supplementary Data 10) was compared with the published one. As shown in Table 17, the differences are large for several proteins. Indeed, the filter on reporter peak intensity removed many peptides. Because many low-abundance proteins are present, the probability of encountering MVIs is high. In the original authors' analysis, MVIs were replaced by 0 and included in the downstream calculation. In my analysis, MVIs were replaced by 0 and then removed, because the intensity threshold was set to 50. Such a threshold leads to a considerable loss of quantification material and explains most of the ratio differences. The ratio differences can also be explained by impurity correction.
Although the systematic bias introduced by impurity correction has little influence on datasets containing few weak signals, it can be amplified in this dataset, where weak reporter peak signals are abundant. The next study proposes an alternative that avoids this bias.

Table 17. Comparison of ratios for six proteins

[Table 17: columns AC, Librus and Publi (published); rows A1A, A2A, A4D, A6H8M, A6ND, A6NDP, A6NI, A6NMS, O14656_CHAIN_]

Error Estimation

The correction of impurities can introduce a bias in the downstream quantification steps, especially when a ratio is calculated from low peak intensities (Figure 29). After correction, values can become negative and distort the ratio, since a meaningful ratio cannot be computed from a negative value. The filter on reporter intensity tends to limit this problem. However, quantification tools and researchers generally replace these values with 0 or with an estimate of the background noise. In this study, we compute an error indication and a ratio interval instead of a biased estimate of the protein fold change.

Figure 29. Impurity correction model. The green vertical bars indicate the true peptide intensities for four samples labelled with iTRAQ reagents; the blue bars are the intensities after correction. The low intensity is close to the background noise (red line), so the reporter 116 peak gives a negative value.

Note that the majority of the proteins display a differential ratio, as shown in Supplementary Data 5 and by the bars in Figure 30. Furthermore, the more extreme the protein ratio, the larger the error on the ratio, as noted in Section 4.3. The percentage error tends to increase linearly for highly differential proteins: few proteins display a ratio higher than 10, but the error on these estimates exceeds 15%.

Figure 30. Bar plot of the number of proteins per ratio category. The bars give the number of proteins in each ratio category, and the black line shows how the error evolves with the ratio category.

A comparison of the Librus results with the published ones is summarized in Table 18 and shown in Figure 31. Three proteins have ratios that fall outside the interval [Label6MAX/Label1MIN, Label6MIN/Label1MAX]; however, the interval appears to frame the majority of the ratios correctly. Isotopic impurities are a source of systematic error that can be corrected efficiently, and the impact of this error on the computed ratios is high when the dataset contains a large percentage of low-intensity peptides. However, this correction can lead to null or negative intensities in a low-intensity dataset and may therefore lead to wrong ratios. We therefore replaced the isotopic correction with a ratio confidence interval.


ProSightPC 3.0 Quick Start Guide

ProSightPC 3.0 Quick Start Guide ProSightPC 3.0 Quick Start Guide The Thermo ProSightPC 3.0 application is the only proteomics software suite that effectively supports high-mass-accuracy MS/MS experiments performed on LTQ FT and LTQ Orbitrap

More information

HRMS in Clinical Research: from Targeted Quantification to Metabolomics

HRMS in Clinical Research: from Targeted Quantification to Metabolomics A sponsored whitepaper. HRMS in Clinical Research: from Targeted Quantification to Metabolomics By: Bertrand Rochat Ph. D., Research Project Leader, Faculté de Biologie et de Médecine of the Centre Hospitalier

More information

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam

Master course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam Exam Master course KEMM03 Principles of Mass Spectrometric Protein Characterization 2010-10-29 kl 08.15-13.00 Use a new paper for answering each question! Write your name on each paper! Aids: Mini calculator,

More information

SpikeTides TM Peptides for relative and absolute quantification in SRM and MRM Assays

SpikeTides TM Peptides for relative and absolute quantification in SRM and MRM Assays Protocol SpikeTides TM Peptides for relative and absolute quantification in SRM and MRM Assays Contact us: InfoLine: +49-30-6392-7878 Order per fax: +49-30-6392-7888 or e-mail: www: peptide@jpt.com www.jpt.com

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005

Proteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 4-5, 2005 Proteomic Analysis using Accurate Mass Tags Gordon Anderson PNNL January 4-5, 2005 Outline Accurate Mass and Time Tag (AMT) based proteomics Instrumentation Data analysis Data management Challenges 2 Approach

More information

MarkerView Software 1.2.1 for Metabolomic and Biomarker Profiling Analysis

MarkerView Software 1.2.1 for Metabolomic and Biomarker Profiling Analysis MarkerView Software 1.2.1 for Metabolomic and Biomarker Profiling Analysis Overview MarkerView software is a novel program designed for metabolomics applications and biomarker profiling workflows 1. Using

More information

Definition of the Measurand: CRP

Definition of the Measurand: CRP A Reference Measurement System for C-reactive Protein David M. Bunk, Ph.D. Chemical Science and Technology Laboratory National Institute of Standards and Technology Definition of the Measurand: Human C-reactive

More information

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests

Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests Application Note: 386 Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests Rosa Viner, Terry Zhang, Scott Peterman, and Vlad Zabrouskov, Thermo Fisher Scientific, San Jose, CA,

More information

PeptidomicsDB: a new platform for sharing MS/MS data.

PeptidomicsDB: a new platform for sharing MS/MS data. PeptidomicsDB: a new platform for sharing MS/MS data. Federica Viti, Ivan Merelli, Dario Di Silvestre, Pietro Brunetti, Luciano Milanesi, Pierluigi Mauri NETTAB2010 Napoli, 01/12/2010 Mass Spectrometry

More information

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data

Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data Preprocessing, Management, and Analysis of Mass Spectrometry Proteomics Data M. Cannataro, P. H. Guzzi, T. Mazza, and P. Veltri Università Magna Græcia di Catanzaro, Italy 1 Introduction Mass Spectrometry

More information

Proteomics in Practice

Proteomics in Practice Reiner Westermeier, Torn Naven Hans-Rudolf Höpker Proteomics in Practice A Guide to Successful Experimental Design 2008 Wiley-VCH Verlag- Weinheim 978-3-527-31941-1 Preface Foreword XI XIII Abbreviations,

More information

CPAS Overview. Josh Eckels LabKey Software jeckels@labkey.com

CPAS Overview. Josh Eckels LabKey Software jeckels@labkey.com CPAS Overview Josh Eckels LabKey Software jeckels@labkey.com CPAS Web-based system for processing, storing, and analyzing results of MS/MS experiments Key goals: Provide a great analysis front-end for

More information

Quantitative mass spec based proteomics

Quantitative mass spec based proteomics Quantitative mass spec based proteomics Tuula Nyman Institute of Biotechnology tuula.nyman@helsinki.fi Proteomics is the large-scale study of proteins Proteomics provides information on: -protein expression

More information

MTH 140 Statistics Videos

MTH 140 Statistics Videos MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

More information

VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY Q2(R1)

VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY Q2(R1) INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY

More information

Introduction to mass spectrometry (MS) based proteomics and metabolomics

Introduction to mass spectrometry (MS) based proteomics and metabolomics Introduction to mass spectrometry (MS) based proteomics and metabolomics Tianwei Yu Department of Biostatistics and Bioinformatics Rollins School of Public Health Emory University September 10, 2015 Background

More information

Mascot Integra: Data management for Proteomics ASMS 2004

Mascot Integra: Data management for Proteomics ASMS 2004 Mascot Integra: Data management for Proteomics 1 Mascot Integra: Data management for proteomics What is Mascot Integra? What Mascot Integra isn t Instrument integration in Mascot Integra Designing and

More information

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics

Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics Ma B. Challenges in computational analysis of mass spectrometry data for proteomics. SCIENCE AND TECHNOLOGY 25(1): 1 Jan. 2010 JOURNAL OF COMPUTER Challenges in Computational Analysis of Mass Spectrometry

More information

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge

More information

Proteomics software available in the public domain. Pratik Jagtap Minnesota Supercomputing institute

Proteomics software available in the public domain. Pratik Jagtap Minnesota Supercomputing institute Proteomics software available in the public domain. Pratik Jagtap Minnesota Supercomputing institute Two-Dimensional gel electrophoresis pi Mw Proteins are resolved based on their isolelectric point (using

More information

Mascot Search Results FAQ

Mascot Search Results FAQ Mascot Search Results FAQ 1 We had a presentation with this same title at our 2005 user meeting. So much has changed in the last 6 years that it seemed like a good idea to re-visit the topic. Just about

More information

Functional Data Analysis of MALDI TOF Protein Spectra

Functional Data Analysis of MALDI TOF Protein Spectra Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF

More information

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments

Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100

More information

OpenMS A Framework for Quantitative HPLC/MS-Based Proteomics

OpenMS A Framework for Quantitative HPLC/MS-Based Proteomics OpenMS A Framework for Quantitative HPLC/MS-Based Proteomics Knut Reinert 1, Oliver Kohlbacher 2,Clemens Gröpl 1, Eva Lange 1, Ole Schulz-Trieglaff 1,Marc Sturm 2 and Nico Pfeifer 2 1 Algorithmische Bioinformatik,

More information

Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem

Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem Kevin Van Cott, Associate Professor Dept. of Chemical and Biomolecular Engineering Nebraska

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Database Searching Tutorial/Exercises Jimmy Eng

Database Searching Tutorial/Exercises Jimmy Eng Database Searching Tutorial/Exercises Jimmy Eng Use the PETUNIA interface to run a search and generate a pepxml file that is analyzed through the PepXML Viewer. This tutorial will walk you through the

More information

Guidance for Industry

Guidance for Industry Guidance for Industry Q2B Validation of Analytical Procedures: Methodology November 1996 ICH Guidance for Industry Q2B Validation of Analytical Procedures: Methodology Additional copies are available from:

More information

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics

Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics Pep-Miner: A Novel Technology for Mass Spectrometry-Based Proteomics Ilan Beer Haifa Research Lab Dec 10, 2002 Pep-Miner s Location in the Life Sciences World The post-genome era - the age of proteome

More information

Error Tolerant Searching of Uninterpreted MS/MS Data

Error Tolerant Searching of Uninterpreted MS/MS Data Error Tolerant Searching of Uninterpreted MS/MS Data 1 In any search of a large LC-MS/MS dataset 2 There are always a number of spectra which get poor scores, or even no match at all. 3 Sometimes, this

More information

NATIONAL GENETICS REFERENCE LABORATORY (Manchester)

NATIONAL GENETICS REFERENCE LABORATORY (Manchester) NATIONAL GENETICS REFERENCE LABORATORY (Manchester) MLPA analysis spreadsheets User Guide (updated October 2006) INTRODUCTION These spreadsheets are designed to assist with MLPA analysis using the kits

More information

itraq Tips and Tricks

itraq Tips and Tricks itraq Tips and Tricks Darryl Pappin Cold Spring Harbor Laboratory Patrick Emery Matrix Science Ltd. 1 Good morning. Unfortunately Darryl Pappin is unable to attend the Matrix Science workshop and the ASMS

More information

DeCyder Extended Data Analysis module Version 1.0

DeCyder Extended Data Analysis module Version 1.0 GE Healthcare DeCyder Extended Data Analysis module Version 1.0 Module for DeCyder 2D version 6.5 User Manual Contents 1 Introduction 1.1 Introduction... 7 1.2 The DeCyder EDA User Manual... 9 1.3 Getting

More information

Methods for Protein Analysis

Methods for Protein Analysis Methods for Protein Analysis 1. Protein Separation Methods The following is a quick review of some common methods used for protein separation: SDS-PAGE (SDS-polyacrylamide gel electrophoresis) separates

More information

Building innovative drug discovery alliances. Evotec Munich. Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs

Building innovative drug discovery alliances. Evotec Munich. Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs Building innovative drug discovery alliances Evotec Munich Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs Evotec AG, Evotec Munich, June 2013 About Evotec Munich A leader

More information

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances TripleTOF 5600 + LC/MS/MS System with MasterView Software Adrian M. Taylor AB Sciex Concord, Ontario (Canada) Overview

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Quan%ta%ve proteomics. Maarten Altelaar, 2014

Quan%ta%ve proteomics. Maarten Altelaar, 2014 Quan%ta%ve proteomics Maarten Altelaar, 2014 Proteomics Altelaar et al. Nat Rev Gen 14, 2013, 35-48 Quan%ta%ve proteomics Quan%ta%ve proteomics Control Diseased, s%mulated, Knock down, etc. How quan%ta%ve

More information

A Streamlined Workflow for Untargeted Metabolomics

A Streamlined Workflow for Untargeted Metabolomics A Streamlined Workflow for Untargeted Metabolomics Employing XCMS plus, a Simultaneous Data Processing and Metabolite Identification Software Package for Rapid Untargeted Metabolite Screening Baljit K.

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Pesticide Analysis by Mass Spectrometry

Pesticide Analysis by Mass Spectrometry Pesticide Analysis by Mass Spectrometry Purpose: The purpose of this assignment is to introduce concepts of mass spectrometry (MS) as they pertain to the qualitative and quantitative analysis of organochlorine

More information

Rapid and Reproducible Amino Acid Analysis of Physiological Fluids for Clinical Research Using LC/MS/MS with the atraq Kit

Rapid and Reproducible Amino Acid Analysis of Physiological Fluids for Clinical Research Using LC/MS/MS with the atraq Kit Rapid and Reproducible Amino Acid Analysis of Physiological Fluids for Clinical Research Using LC/MS/MS with the atraq Kit Fast, simple and cost effective analysis Many areas of biochemical research and

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Overview. Triple quadrupole (MS/MS) systems provide in comparison to single quadrupole (MS) systems: Introduction

Overview. Triple quadrupole (MS/MS) systems provide in comparison to single quadrupole (MS) systems: Introduction Advantages of Using Triple Quadrupole over Single Quadrupole Mass Spectrometry to Quantify and Identify the Presence of Pesticides in Water and Soil Samples André Schreiber AB SCIEX Concord, Ontario (Canada)

More information

Protein Prospector and Ways of Calculating Expectation Values

Protein Prospector and Ways of Calculating Expectation Values Protein Prospector and Ways of Calculating Expectation Values 1/16 Aenoch J. Lynn; Robert J. Chalkley; Peter R. Baker; Mark R. Segal; and Alma L. Burlingame University of California, San Francisco, San

More information

Alignment and Preprocessing for Data Analysis

Alignment and Preprocessing for Data Analysis Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise

More information

Validation and Calibration. Definitions and Terminology

Validation and Calibration. Definitions and Terminology Validation and Calibration Definitions and Terminology ACCEPTANCE CRITERIA: The specifications and acceptance/rejection criteria, such as acceptable quality level and unacceptable quality level, with an

More information

MiSeq: Imaging and Base Calling

MiSeq: Imaging and Base Calling MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please

More information

Thermo Scientific SIEVE Software for Differential Expression Analysis

Thermo Scientific SIEVE Software for Differential Expression Analysis m a s s s p e c t r o m e t r y Thermo Scientific SIEVE Software for Differential Expression Analysis Automated, label-free, semi-quantitative analysis of proteins, peptides, and metabolites based on comparisons

More information

Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry

Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry BY GHULAM A. SHABIR Introduction Methods Validation: Establishing documented evidence that provides a high

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La

SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La References Alejandro Cruz-Marcelo, Rudy Guerra, Marina Vannucci, Yiting Li, Ching C. Lau, and Tsz-Kwong Man. Comparison of algorithms for pre-processing

More information

Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench

Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench Agilent G2721AA/G2733AA Spectrum Mill MS Proteomics Workbench Application Guide Agilent Technologies Notices Agilent Technologies, Inc. 2012 No part of this manual may be reproduced in any form or by any

More information

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping

Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping Thermo Scientific PepFinder Software A New Paradigm for Peptide Mapping For Conclusive Characterization of Biologics Deep Protein Characterization Is Crucial Pharmaceuticals have historically been small

More information

Absolute quantification of low abundance proteins by shotgun proteomics

Absolute quantification of low abundance proteins by shotgun proteomics Absolute quantification of low abundance proteins by shotgun proteomics Dr. Stefanie Wienkoop www.proteomefactory.com In cooperation with: Max-Planck-Institut für Molekulare Pflanzenphysiologie Stable

More information

Sample Analysis Design Step 2 Calibration/Standard Preparation Choice of calibration method dependent upon several factors:

Sample Analysis Design Step 2 Calibration/Standard Preparation Choice of calibration method dependent upon several factors: Step 2 Calibration/Standard Preparation Choice of calibration method dependent upon several factors: 1. potential matrix effects 2. number of samples 3. consistency of matrix across samples Step 2 Calibration/Standard

More information

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis The First International Symposium on Optimization and Systems Biology (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 45 51 Integrated Data Mining Strategy for Effective Metabolomic

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

Statistical Analysis Strategies for Shotgun Proteomics Data

Statistical Analysis Strategies for Shotgun Proteomics Data Statistical Analysis Strategies for Shotgun Proteomics Data Ming Li, Ph.D. Cancer Biostatistics Center Vanderbilt University Medical Center Ayers Institute Biomarker Pipeline normal shotgun proteome analysis

More information

Standard Mixture. TOF/TOF Calibration Mixture. Calibration Mixture 1 (Cal Mix 1, 1:10) Calibration Mixture 1 (Cal Mix 1, 1:100)

Standard Mixture. TOF/TOF Calibration Mixture. Calibration Mixture 1 (Cal Mix 1, 1:10) Calibration Mixture 1 (Cal Mix 1, 1:100) Mass Standards Kit for Calibration of AB SCIEX TOF/TOF Instruments Protocol 1 Product Description The Mass Standards Kit includes reagents needed to test instrument function, optimize instrument parameters,

More information

Mass Spectra Alignments and their Significance

Mass Spectra Alignments and their Significance Mass Spectra Alignments and their Significance Sebastian Böcker 1, Hans-Michael altenbach 2 1 Technische Fakultät, Universität Bielefeld 2 NRW Int l Graduate School in Bioinformatics and Genome Research,

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Analysis of the Vitamin B Complex in Infant Formula Samples by LC-MS/MS

Analysis of the Vitamin B Complex in Infant Formula Samples by LC-MS/MS Analysis of the Vitamin B Complex in Infant Formula Samples by LC-MS/MS Stephen Lock 1 and Matthew Noestheden 2 1 AB SCIEX Warrington, Cheshire (UK), 2 AB SCIEX Concord, Ontario (Canada) Overview A rapid,

More information

ALLEN Mouse Brain Atlas

ALLEN Mouse Brain Atlas TECHNICAL WHITE PAPER: QUALITY CONTROL STANDARDS FOR HIGH-THROUGHPUT RNA IN SITU HYBRIDIZATION DATA GENERATION Consistent data quality and internal reproducibility are critical concerns for high-throughput

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis

OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis OplAnalyzer: A Toolbox for MALDI-TOF Mass Spectrometry Data Analysis Thang V. Pham and Connie R. Jimenez OncoProteomics Laboratory, Cancer Center Amsterdam, VU University Medical Center De Boelelaan 1117,

More information

Applying Statistics Recommended by Regulatory Documents

Applying Statistics Recommended by Regulatory Documents Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301-325 325-31293129 About the Speaker Mr. Steven

More information

Research-grade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE

Research-grade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE Research-grade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE Jacob D. Jaffe Skyline Webinar July 2015 Proteomics and

More information

Your partner in immunology

Your partner in immunology Your partner in immunology Expertise Expertise Reactivity Reactivity Quality Quality Advice Advice Who are we? Specialist of antibody engineering Covalab is a French biotechnology company, specialised

More information