Statistical Analysis Strategies for Shotgun Proteomics Data


 Julius Hoover
 1 years ago
 Views:
Transcription
1 Statistical Analysis Strategies for Shotgun Proteomics Data Ming Li, Ph.D. Cancer Biostatistics Center Vanderbilt University Medical Center
2 Ayers Institute Biomarker Pipeline normal shotgun proteome analysis compare inventories, identify differences cancer no vs. targeted quantitative analysis yes Labelfree MRM in tissue Is it really there? Is there a difference? LCMRMMS method or Y Y ELISA vs. evaluate in clinical context sensitivity specificity performance metrics Discovery few(<20) samples or sample pools low throughput identify up to ~5,000 proteins inventory differences ~150 proteins Verification ~ samples higher throughput screen candidates identify high priority candidates Validation hundreds+ of samples highest throughput validate 110 candidates clinical validation trial
3 The Big Picture Proteomics Tools Mass Spectrometry Database Bioinformatics Proteinseparation Techniques Research Goal Biomarker Discovery High Throughput Data (MALDI MS, Shotgun, etc) Challenge on Data Analysis
4 Outline Background Shotgun Proteomics Techniques and Data The Challenges for Statistical Analysis Statistical Analysis Strategies Methods and Models Case Study and Preliminary Results Discussion and Future Work
5 Background on Shotgun Proteomics What is Shotgun Proteomics? A method of identifying proteins in complex mixtures using a certain separation/digestion method combined with mass spectrometry. More specifically, the proteins in the mixture are digested and the resulting peptides are separated (by HPLC, SCX, IEF, etc), tandem mass spectrometry (MS/MS) is then used to identify the peptides (LC MS/MS) and matched back to proteins.
6 General Flow in Proteomic Analysis Proteins SDSPAGE or Prep IEF or HPLC Digest Reverse phase LC Tandem LC Protein Fraction 1 Peptide Fraction 1 data Protein Fraction 2 Peptide Fraction 2 Protein Fraction 3 Peptide Fraction 3 MS/MS Analysis Database search algorithms Protein Fraction n Peptide Fraction n Identification From Liebler, Introduction to Proteomics, 2002.
7 Shotgun 101 Design Analysis Integration Nesvizhskii et al.
8 LC separation MS MSMS fragmentation m/z 2 4 m/z m/z 3 1 m/z m/z
9 dm_200392_tpepc_std #587 RT: AV: 1 NL: 1.95E7 T: + c d Full ms2 [ ] b 2 + H 3 N b 1 b 2 b 3 b 4 b 5 b 6 b 7 b A V A G C A G A R O HN O NH O NH O HS NH O NH O NH O H N + H 3 N NH O N H HN CO 2 H y y 6 Relative Abundance y 8 y 7 y 6 y 5 y 4 y 3 y 2 y 1 y b b 3 b b 6 y y b 7 b m/z y 4 m/z
10 dm_200392_tpepc_std #587 RT: AV: 1 NL: 1.95E7 T: + c d Full ms2 [ ] Relative Abundance m/z Actual MSMS scan Precursor peptide [M+H] + = Get database sequences that match precursor peptide mass AVAGCAGAR CVAAGAAGR VGGACAAAR etc Generate virtual MSMS spectra AVAGCAGAR CVAAGAAGR VGGACAAAR b2 y2b3 y3 b4 y4 b5 y5 b6 y6 b2 y2b3y3 b4 y4 b5 y5 b6 y6 b2 y2 b3 y3 b4 y4 b5 y5 b6 y6 Compare virtual spectra to real spectrum dm_200392_tpepc_std #587 RT: AV: 1 NL: 1.95E7 T: + c d Full ms2 [ ] Scoring Peptide score Relative Abundance m/z  Detect matches between theoretical b and yions and actual spectrum ions  Compute correlation scores  Rank hits AVAGCAGAR 2.56 CVAAGAAGR 0.57 VGGACAAAR 0.32
11 From Shotgun Proteomics to Biomarker Discovery A Protein/Peptide FrequencyBased Analysis Approach Compare spectral identifications between groups; The unit of measurement is the number of times a peptide observed in a single LCMS/MS round of analysis that are matched to the peptide sequences; The counts reflect the abundance of the protein from which these peptides are derived.
12 Shotgun Data and Statistical Challenge Normal Cancer Protein Sample 1 Sample m Sample 1 Sample m # # # # : : : : : : : : #N : : : : : :
13 Statistical Analysis Goal Provide investigators a winner list of proteins for further study; The winners are the potential biomarkers of diagnostics, prognostics or therapeutic; The winners are selected by appropriate statistical models and procedures.
14 Statistical Analysis Strategy Model the Count Data Poisson Regression Model Quasilikelihood Poisson Model (GEE method) Rate Model (Poisson Model with Offset) Handle Small Sample Size Provide Appropriate Test Statistics Permutation Test Deal with Multiple Comparison Issues Frequentist Approach (FDR) Empirical Bayes Approach (LOCFDR)
15 Poisson Model for Count Data Poisson Regression Model (Poisson GLM) Y is Poisson random variable with mean u, then P( Y = y) = E( Y ) = Var( Y ) = u Log link function: log (u)=( )=η= x T β (For our application: log (u)=( )=η= β 0 +(Group) β 1 ) By NewtonRaphson algorithm, get MLE of β and the 95%CI Gstatistic and Pearson s χ 2 statistics e u y u! y μ
16 Graphical Presentation of Poisson Distribution May not be Flexible for Empirical Fitting Purpose. Poisson Distributions: Poisson( λ ) Probability λ=5 λ=10 λ=15 λ= N o te : P o is s o n( λ ) has m ean λ and SD x λ
17 Extend Poisson Model Generalize Poisson Model by Allowing Dispersion Var (Y) = φ E(Y) = φu φ =1 (no dispersion), regular Poisson GLM is appropriate φ >1 (overdispersion) or φ <1 (underdispersion), dispersion), the standard error estimation is not reliable, adjust standard errors by φ. Quasilikelihood Poisson Model A more flexible modeling technique when over/under dispersion occurs in count data No strong idea about the appropriate distributional form of the outcome variable while can specify the link and variance function for the model
18 Under/OverDispersion for Shotgun Data Esitmate Dispersion Density Estimated Dispersion: Dp Ideal Dispersion Distribution: N(1, σ 2 )
19 QuasiLikelihood Poisson Model More details about quasilikelihood, define a score, U i : Then U i = Y i u ϕv( u ) i i EU ( ) = 0; VU ( ) = i i 1 ϕv( u ) ' i ϕ ( i) ( i i) ϕ ( i) 1 E 2 i ϕ i ϕ i U V u Y u V u E = = u [ V( u )] V( u ) i
20 Derive Quasilikelihood These properties are shared by the derivatives of the loglikelihood, likelihood, l, which suggest we can use U in place of l, define: yi t Q = i dt ϕv () t Then define the log quasilikelihood for all n observations as: Q n = i = 1 Q i
21 Generalized Estimation Equation (GEE) Method Quasilikelihood approach can be adapted for repeated measures and/or longitudinal experiment designs for shotgun proteomics research Generalized Estimation Equation (GEE) methods can be used for estimations Mixed effect models for nonnormal normal responses A multivariate analogue of the equations for the quasi likelihood models i T ui 1 ( ) Var( Yi; βα, ) ( Yi ui) = 0 β
22 Rate Model for Normalizing Shotgun Data Why do We Need Rate Model? The number of events observed may depend on a size variable that determines the number of opportunities for the events to occur For example: in some run of LCMS/MS, the sample density might be different; we need normalization before comparison Can be done within Poisson/QuasiPoisson GLM by modeling the effect of the size variable
23 Rate Model for Normalizing Shotgun Data Rate Model Log (u/w)= η = x β Log (u) = log(w)+ x β In this manner, we are modeling the rate of spectral observing while still maintaining the count response for the Poisson/QuasiPoisson model. We fix the coefficient as one by using an offset.
24 Handle Small Sample Size Issues with Small Sample Size: Asymptotic Distribution May not be Hold The χ 2 distribution is only an approximation that becomes more accurate as the sample size increases Not possible to say exactly how large we need, but N 5 N 5 is often suggested Lack of Power if the Effect is not Strong In the real world, due to financial/time constraints, the number of runs for shotgun data is usually small.
25 Sample Size for Poisson Power
26 Handle Small Sample Size Some Solutions Appropriate Test Statistics: D s D L ~ χ 2 ls Model S: Log(u)= )=log( log(w)+ β 0 Model L: log(u)=log( )=log(w)+)+ β 0 +(Group Group) β 1 Permutation Test: Reference distribution is obtained from the data P = {# T _perm > T _obs } / {# total permutation}
27 Statistical Analysis Strategy Model the Count Data Poisson Regression Model Quasilikelihood Poisson Model (GEE method) Rate Model (Poisson Model with Offset) Handle Small Sample Size Provide Appropriate Test Statistics Permutation Test Deal with Multiple Comparison Issues Frequentist Approach (FDR) Empirical Bayes Approach (LOCFDR)
28 Deal with Multiple Comparisons Why Worry About It? Selection Bias: : researchers tend to select significant ones to support their conclusions Inflate the False Positive Rate: : unadjusted Pvalues from the singleinference inference procedure result in increased type I error.
29 Multiple Tests Increase Chance of False Positive Probability of Having At Least One False Positive Probability o + * α = 0.1 ooooooooooooooooooooooooooooooooooooooo α = α = 0.01 o o ooo + o oo o o o + ++ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Number of Tests
30 Deal with Multiple Comparisons Family wise error rate (FWER) methods Too conservative and less application values Not suitable for largescale simultaneous hypothesis testing problems arouse from high through technologies A desirable error rate to control might be the expected proportion of errors among the rejected hypotheses.
31 Approach 1: False Discovery Rate Define the FDR to be the expectation of Q,, and control it under a value q * V V FDR = E( Q) = E( ) = E( ) V + S R True Null hypothesis Nontrue Null hypothesis Declared Declared Nonsignificant Significant U V T S mr R R Total m 0 m m 0 m From Benjamin & Hochberg, Controlling the False Discovery Rate, 1995
32 False Discovery Rate Controlling Procedure Sorted Pvalues q c = 0.1 q c = 0.2 q c =
33 Approach 2: Local False Discovery Rate Define Local FDR Null hypothesis: H 1, H 2,,, H N Test statistic: z 1, z 2,, z N p 0 =Pr{Null} f 0 (z)) density if null p 1 =Pr{nonnull} null} f 1 (z)) density if nonnull null Mixture density: f (z)) = p 0 f 0 (z)+ p 1 f 1 (z) Local FDR: fdr(z)= )=Pr{null z}= p 0 f 0 (z)) /f/ (z) From Efron, Local False Discovery Rates, 2005
34 How to Apply Local False Discovery Rate Histogram of Summary Statistics with Fitted M ixture Density Frequency MLE: delta: sigma: p0: 0.928
35 FDR vs. LOCFDR Frequentist Approach FDR LOCFDR Empirical Bayes method Works on Pvalues P values (null hypothesis tail area) Works on the test statistics Efron: In practice, FDR and LOCFDR can be combined, using BenjaminiHochberg algorithm to identify nonnull null cases, say with q=0.1, but also providing individual LOCFDR values for those cases.
36 Statistical Analysis Strategy Model the Count Data Poisson Regression Model Quasilikelihood Poisson Model Rate Model (Poisson Model with Offset) Handle Small Sample Size Provide Appropriate Test Statistics Permutation Test Deal with Multiple Comparison Issues Frequentist Approach (FDR) Empirical Bayes Approach (LOCFDR)
37 Case Study A Global Shotgun Proteomic Analysis of Colorectal Carcinoma Biological Materials RKO Cell Lines Rectal Adenocarcinoma Specimen Mass Spectrometry Method Peptides Separated by Strong Cation Exchange (SCX) and Isoelectric Focusing (IEF) LTQQrbitrap Qrbitrap Mass Spectrometer (Thermo Electron, San Joase,, CA) Database Searching and Filtering MyriMatch Search Algorithm; IPI Human Database 3.31; IDPicker 2.0
38 Case Study : Data Analysis Preliminary Results Protein.order Protein CAR_Counts RKO_Counts Poisson_Pvalue Quasi_Pvalue Pois_Pvalue_FD R 1513 IPI E E E E IPI E E IPI E E IPI E E E E IPI E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E IPI E E E E07 Quasi_Pvalue_FDR
39 Case Study Data Analysis Preliminary Results
40 Discussion and Future Study A framework Flexible to be Extend Repeat measurement; trend test, and etc. Quality Control of the Data Involve in the early step of data generation process Simulation Study Evaluating the Methods Add the Current Work to the pipeline of the Proteomic Research
41 Acknowledgement Rob Slebos Dan Liebler Pierre Massion Takefumi Kikuchi David Carbone Will Gray Yu Shyr Cancer Biostatistics Center Ayer Institute GI SPORE Lung SPORE
42 Thank You!
Cancer Biostatistics Workshop Science of Doing Science  Biostatistics
Cancer Biostatistics Workshop Science of Doing Science  Biostatistics Yu Shyr, PhD Jan. 18, 2008 Cancer Biostatistics Center VanderbiltIngram Cancer Center Yu.Shyr@vanderbilt.edu Aims Cancer Biostatistics
More informationGene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies Highthroughput technologies to measure the expression levels of thousands
More informationAiping Lu. Key Laboratory of System Biology Chinese Academic Society APLV@sibs.ac.cn
Aiping Lu Key Laboratory of System Biology Chinese Academic Society APLV@sibs.ac.cn Proteome and Proteomics PROTEin complement expressed by genome Marc Wilkins Electrophoresis. 1995. 16(7):10904. proteomics
More informationApplication Note # LCMS81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method
Application Note # LCMS81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method Introduction During the last decade, the complexity of samples
More informationQuantitative proteomics background
Proteomics data analysis seminar Quantitative proteomics and transcriptomics of anaerobic and aerobic yeast cultures reveals post transcriptional regulation of key cellular processes de Groot, M., Daran
More informationGlobal and Discovery Proteomics Lecture Agenda
Global and Discovery Proteomics Christine A. Jelinek, Ph.D. Johns Hopkins University School of Medicine Department of Pharmacology and Molecular Sciences Middle Atlantic Mass Spectrometry Laboratory Global
More informationAB SCIEX TOF/TOF 4800 PLUS SYSTEM. Cost effective flexibility for your core needs
AB SCIEX TOF/TOF 4800 PLUS SYSTEM Cost effective flexibility for your core needs AB SCIEX TOF/TOF 4800 PLUS SYSTEM It s just what you expect from the industry leader. The AB SCIEX 4800 Plus MALDI TOF/TOF
More informationProtein Prospector and Ways of Calculating Expectation Values
Protein Prospector and Ways of Calculating Expectation Values 1/16 Aenoch J. Lynn; Robert J. Chalkley; Peter R. Baker; Mark R. Segal; and Alma L. Burlingame University of California, San Francisco, San
More informationInDepth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates
InDepth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates Using the Explore Workflow on the AB SCIEX TripleTOF 5600 System A major challenge in proteomics
More informationTutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
More informationProteinScape. Innovation with Integrity. Proteomics Data Analysis & Management. Mass Spectrometry
ProteinScape Proteomics Data Analysis & Management Innovation with Integrity Mass Spectrometry ProteinScape a Virtual Environment for Successful Proteomics To overcome the growing complexity of proteomics
More informationEffects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests
Effects of Intelligent Data Acquisition and Fast Laser Speed on Analysis of Complex Protein Digests AB SCIEX TOF/TOF 5800 System with DynamicExit Algorithm and ProteinPilot Software for Robust Protein
More informationIntroduction to mass spectrometry (MS) based proteomics and metabolomics
Introduction to mass spectrometry (MS) based proteomics and metabolomics Tianwei Yu Department of Biostatistics and Bioinformatics Rollins School of Public Health Emory University September 10, 2015 Background
More informationMRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics
MRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics With Unique QTRAP and TripleTOF 5600 System Technology Targeted peptide quantification is a rapidly growing application
More informationMultiQuant Software 2.0 for Targeted Protein / Peptide Quantification
MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification Gold Standard for Quantitative Data Processing Because of the sensitivity, selectivity, speed and throughput at which MRM assays can
More informationThe Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring
The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring Delivering up to 2500 MRM Transitions per LC Run Christie Hunter 1, Brigitte Simons 2 1 AB SCIEX,
More informationFunctional Data Analysis of MALDI TOF Protein Spectra
Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF
More informationPepMiner: A Novel Technology for Mass SpectrometryBased Proteomics
PepMiner: A Novel Technology for Mass SpectrometryBased Proteomics Ilan Beer Haifa Research Lab Dec 10, 2002 PepMiner s Location in the Life Sciences World The postgenome era  the age of proteome
More informationFrom Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNAseq data
From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNAseq data experimental design data collection modeling statistical testing biological heterogeneity
More informationStatistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
More informationProteinPilot Report for ProteinPilot Software
ProteinPilot Report for ProteinPilot Software Detailed Analysis of Protein Identification / Quantitation Results Automatically Sean L Seymour, Christie Hunter SCIEX, USA Pow erful mass spectrometers like
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationIncreasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays
Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays Scheduled MRM HR Workflow on the TripleTOF Systems Jenny Albanese, Christie Hunter AB SCIEX, USA Targeted quantitative
More informationFalse Discovery Rates
False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving
More informationTutorial for Proteomics Data Submission. Katalin F. Medzihradszky Robert J. Chalkley UCSF
Tutorial for Proteomics Data Submission Katalin F. Medzihradszky Robert J. Chalkley UCSF Why Have Guidelines? Largescale proteomics studies create huge amounts of data. It is impossible/impractical to
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationMASCOT Search Results Interpretation
The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually
More informationPeptidomicsDB: a new platform for sharing MS/MS data.
PeptidomicsDB: a new platform for sharing MS/MS data. Federica Viti, Ivan Merelli, Dario Di Silvestre, Pietro Brunetti, Luciano Milanesi, Pierluigi Mauri NETTAB2010 Napoli, 01/12/2010 Mass Spectrometry
More informationSession 1. Course Presentation: Mass spectrometrybased proteomics for molecular and cellular biologists
Program Overview Session 1. Course Presentation: Mass spectrometrybased proteomics for molecular and cellular biologists Session 2. Principles of Mass Spectrometry Session 3. Mass spectrometry based proteomics
More informationStatistical Analysis. NBAFB Metabolomics Masterclass. Mark Viant
Statistical Analysis NBAFB Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting
More informationWorkshop IIc. Manual interpretation of MS/MS spectra. Ebbing de Jong. Center for Mass Spectrometry and Proteomics Phone (612)6252280 (612)6252279
Workshop IIc Manual interpretation of MS/MS spectra Ebbing de Jong Why MS/MS spectra? The information contained in an MS spectrum (m/z, isotope spacing and therefore z ) is not enough to tell us the amino
More informationModels for Count Data With Overdispersion
Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extrapoisson variation and the negative binomial model, with brief appearances
More informationAdvantages of the LTQ Orbitrap for Protein Identification in Complex Digests
Application Note: 386 Advantages of the LTQ Orbitrap for Protein Identification in Complex Digests Rosa Viner, Terry Zhang, Scott Peterman, and Vlad Zabrouskov, Thermo Fisher Scientific, San Jose, CA,
More informationBootstrapping pvalue estimations
Bootstrapping pvalue estimations In microarray studies it is common that the the sample size is small and that the distribution of expression values differs from normality. In this situations, permutation
More informationChoices, choices, choices... Which sequence database? Which modifications? What mass tolerance?
Optimization 1 Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance? Where to begin? 2 Sequence Databases Swissprot MSDB, NCBI nr dbest Species specific ORFS
More informationAutomated Biosurveillance Data from England and Wales, 1991 2011
Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical
More informationResearchgrade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE
Researchgrade Targeted Proteomics Assay Development: PRMs for PTM Studies with Skyline or, How I learned to ditch the triple quad and love the QE Jacob D. Jaffe Skyline Webinar July 2015 Proteomics and
More information泛 用 蛋 白 質 體 學 之 質 譜 儀 資 料 分 析 平 台 的 建 立 與 應 用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics
泛 用 蛋 白 質 體 學 之 質 譜 儀 資 料 分 析 平 台 的 建 立 與 應 用 Universal Mass Spectrometry Data Analysis Platform for Quantitative and Qualitative Proteomics 2014 Training Course WeiHung Chang ( 張 瑋 宏 ) ABRC, Academia
More informationA Streamlined Workflow for Untargeted Metabolomics
A Streamlined Workflow for Untargeted Metabolomics Employing XCMS plus, a Simultaneous Data Processing and Metabolite Identification Software Package for Rapid Untargeted Metabolite Screening Baljit K.
More informationChapter 14. Modeling Experimental Design for Proteomics. Jan Eriksson and David Fenyö. Abstract. 1. Introduction
Chapter Modeling Experimental Design for Proteomics Jan Eriksson and David Fenyö Abstract The complexity of proteomes makes good experimental design essential for their successful investigation. Here,
More informationShotgun Proteomic Analysis. Department of Cell Biology The Scripps Research Institute
Shotgun Proteomic Analysis Department of Cell Biology The Scripps Research Institute Biological/Functional Resolution of Experiments Organelle Multiprotein Complex Cells/Tissues Function Expression Analysis
More informationLa Protéomique : Etat de l art et perspectives
La Protéomique : Etat de l art et perspectives Odile Schiltz Institut de Pharmacologie et de Biologie Structurale CNRS, Université de Toulouse, Odile.Schiltz@ipbs.fr Protéomique et Spectrométrie de Masse
More informationLearning Objectives:
Proteomics Methodology for LCMS/MS Data Analysis Methodology for LCMS/MS Data Analysis Peptide mass spectrum data of individual protein obtained from LCMS/MS has to be analyzed for identification of
More informationOplAnalyzer: A Toolbox for MALDITOF Mass Spectrometry Data Analysis
OplAnalyzer: A Toolbox for MALDITOF Mass Spectrometry Data Analysis Thang V. Pham and Connie R. Jimenez OncoProteomics Laboratory, Cancer Center Amsterdam, VU University Medical Center De Boelelaan 1117,
More informationError Tolerant Searching of Uninterpreted MS/MS Data
Error Tolerant Searching of Uninterpreted MS/MS Data 1 In any search of a large LCMS/MS dataset 2 There are always a number of spectra which get poor scores, or even no match at all. 3 Sometimes, this
More informationUn (bref) aperçu des méthodes et outils de fouilles et de visualisation de données «omics»
Un (bref) aperçu des méthodes et outils de fouilles et de visualisation de données «omics» Workshop «Protéomique & Maladies rares» 25 th September 2012, Paris yves.vandenbrouck@cea.fr CEA Grenoble irtsv
More informationPinpointing phosphorylation sites using Selected Reaction Monitoring and Skyline
Pinpointing phosphorylation sites using Selected Reaction Monitoring and Skyline Christina Ludwig group of Ruedi Aebersold, ETH Zürich The challenge of phosphosite assignment Peptides Phosphopeptides MS/MS
More informationMass Spectrometry Based Proteomics
Mass Spectrometry Based Proteomics Proteomics Shared Research Oregon Health & Science University Portland, Oregon This document is designed to give a brief overview of Mass Spectrometry Based Proteomics
More informationProteomic Analysis using Accurate Mass Tags. Gordon Anderson PNNL January 45, 2005
Proteomic Analysis using Accurate Mass Tags Gordon Anderson PNNL January 45, 2005 Outline Accurate Mass and Time Tag (AMT) based proteomics Instrumentation Data analysis Data management Challenges 2 Approach
More informationCliquid ChemoView 3.0 Software Simple automated analysis, from sample to report
PRODUCT BULLETIN Cliquid ChemoView 3.0 Software for Routine Screening and Quantitation Cliquid ChemoView 3.0 Software Simple automated analysis, from sample to report KEY FEATURES Secure user login that
More informationLikelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth GarrettMayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationAPPLIED MISSING DATA ANALYSIS
APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview
More informationMascot Search Results FAQ
Mascot Search Results FAQ 1 We had a presentation with this same title at our 2005 user meeting. So much has changed in the last 6 years that it seemed like a good idea to revisit the topic. Just about
More informationAbsolute quantification of low abundance proteins by shotgun proteomics
Absolute quantification of low abundance proteins by shotgun proteomics Dr. Stefanie Wienkoop www.proteomefactory.com In cooperation with: MaxPlanckInstitut für Molekulare Pflanzenphysiologie Stable
More informationRetrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem
Retrospective Analysis of a Host Cell Protein Perfect Storm: Identifying Immunogenic Proteins and Fixing the Problem Kevin Van Cott, Associate Professor Dept. of Chemical and Biomolecular Engineering Nebraska
More informationIntroduction to Proteomics 1.0
Introduction to Proteomics 1.0 CMSP Workshop Tim Griffin Associate Professor, BMBB Faculty Director, CMSP Objectives Why are we here? For participants: Learn basics of MSbased proteomics Learn what s
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationIntroduction to Proteomics
Introduction to Proteomics Åsa Wheelock, Ph.D. Division of Respiratory Medicine & Karolinska Biomics Center asa.wheelock@ki.se In: Systems Biology and the Omics Cascade, Karolinska Institutet, June 913,
More informationMass Spectra Alignments and their Significance
Mass Spectra Alignments and their Significance Sebastian Böcker 1, HansMichael altenbach 2 1 Technische Fakultät, Universität Bielefeld 2 NRW Int l Graduate School in Bioinformatics and Genome Research,
More informationProteomics in Practice
Reiner Westermeier, Torn Naven HansRudolf Höpker Proteomics in Practice A Guide to Successful Experimental Design 2008 WileyVCH Verlag Weinheim 9783527319411 Preface Foreword XI XIII Abbreviations,
More informationChallenges in Computational Analysis of Mass Spectrometry Data for Proteomics
Ma B. Challenges in computational analysis of mass spectrometry data for proteomics. SCIENCE AND TECHNOLOGY 25(1): 1 Jan. 2010 JOURNAL OF COMPUTER Challenges in Computational Analysis of Mass Spectrometry
More informationIntroduction to Proteomics
Introduction to Proteomics Why Proteomics? Same Genome Different Proteome Black Swallowtail  larvae and butterfly Biological Complexity Yeast  a simple proteome 6,113 proteins = 344,855 tryptic peptides
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationTechnology StepbyStep Using StatCrunch
Technology StepbyStep Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate
More informationOpenMS A Framework for Quantitative HPLC/MSBased Proteomics
OpenMS A Framework for Quantitative HPLC/MSBased Proteomics Knut Reinert 1, Oliver Kohlbacher 2,Clemens Gröpl 1, Eva Lange 1, Ole SchulzTrieglaff 1,Marc Sturm 2 and Nico Pfeifer 2 1 Algorithmische Bioinformatik,
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks YoungRae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationStatistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013
Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.11.6) Objectives
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationStudy Design and Statistical Analysis
Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing
More informationDRUG METABOLISM. Drug discovery & development solutions FOR DRUG METABOLISM
DRUG METABLISM Drug discovery & development solutions FR DRUG METABLISM Fast and efficient metabolite identification is critical in today s drug discovery pipeline. The goal is to achieve rapid structural
More informationMarkerView Software 1.2.1 for Metabolomic and Biomarker Profiling Analysis
MarkerView Software 1.2.1 for Metabolomic and Biomarker Profiling Analysis Overview MarkerView software is a novel program designed for metabolomics applications and biomarker profiling workflows 1. Using
More informationMaster course KEMM03 Principles of Mass Spectrometric Protein Characterization. Exam
Exam Master course KEMM03 Principles of Mass Spectrometric Protein Characterization 20101029 kl 08.1513.00 Use a new paper for answering each question! Write your name on each paper! Aids: Mini calculator,
More informationProteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap
Proteomic data analysis for Orbitrap datasets using Resources available at MSI. September 28 th 2011 Pratik Jagtap The Minnesota http://www.mass.msi.umn.edu/ Proteomics workflow Trypsin Protein Peptides
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationIntroduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.
Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.
More informationOnline 12  Sections 9.1 and 9.2Doug Ensley
Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics  Ensley Assignment: Online 12  Sections 9.1 and 9.2 1. Does a Pvalue of 0.001 give strong evidence or not especially strong
More informationThe Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities
The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth GarrettMayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1
More informationCorrelational Research
Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.
More informationProSightPC 3.0 Quick Start Guide
ProSightPC 3.0 Quick Start Guide The Thermo ProSightPC 3.0 application is the only proteomics software suite that effectively supports highmassaccuracy MS/MS experiments performed on LTQ FT and LTQ Orbitrap
More informationSystems Biology through Data Analysis and Simulation
Biomolecular Networks Initiative Systems Biology through Data Analysis and Simulation William Cannon Computational Biosciences 5/30/03 Cellular Dynamics Microbial Cell Dynamics Data Mining Nitrate NARX
More informationComparing Functional Data Analysis Approach and Nonparametric MixedEffects Modeling Approach for Longitudinal Data Analysis
Comparing Functional Data Analysis Approach and Nonparametric MixedEffects Modeling Approach for Longitudinal Data Analysis Hulin Wu, PhD, Professor (with Dr. Shuang Wu) Department of Biostatistics &
More informationAccurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances
Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances TripleTOF 5600 + LC/MS/MS System with MasterView Software Adrian M. Taylor AB Sciex Concord, Ontario (Canada) Overview
More informationBuilding innovative drug discovery alliances. Evotec Munich. Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs
Building innovative drug discovery alliances Evotec Munich Quantitative Proteomics to Support the Discovery & Development of Targeted Drugs Evotec AG, Evotec Munich, June 2013 About Evotec Munich A leader
More informationUsing Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments
Using Ontologies in Proteus for Modeling Data Mining Analysis of Proteomics Experiments Mario Cannataro, Pietro Hiram Guzzi, Tommaso Mazza, and Pierangelo Veltri University Magna Græcia of Catanzaro, 88100
More informationInvestigating Biological Variation of Liver Enzymes in Human Hepatocytes
Investigating Biological Variation of Liver Enzymes in Human Hepatocytes MS/MS ALL with SWATH Acquisition on the TripleTOF Systems Xu Wang 1, Hui Zhang 2, Christie Hunter 1 1 AB SCIEX, USA, 2 Pfizer, USA
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationSOME STATISTICAL ISSUES IN THE DESIGN AND ANALYSIS OF VACCINE CLINICAL TRIALS IN CANCER PATIENTS
SOME STATISTICAL ISSUES IN THE DESIGN AND ANALYSIS OF VACCINE CLINICAL TRIALS IN CANCER PATIENTS isbtc Workshop; 10/28/09 DOUGLAS M. POTTER Biostatistics Department, Graduate School of Public Health, University
More informationWISE Power Tutorial All Exercises
ame Date Class WISE Power Tutorial All Exercises Power: The B.E.A.. Mnemonic Four interrelated features of power can be summarized using BEA B Beta Error (Power = 1 Beta Error): Beta error (or Type II
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationEBM Cheat Sheet Measurements Card
EBM Cheat Sheet Measurements Card Basic terms: Prevalence = Number of existing cases of disease at a point in time / Total population. Notes: Numerator includes old and new cases Prevalence is crosssectional
More informationBasic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012. Abstract. Review session.
June 23, 2012 1 review session Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, 2012 Review session. Abstract Quantitative methods in business Accounting
More informationBiopharmaceutical Glycosylation Analysis
Biopharmaceutical Glycosylation Analysis Glycosylation Analysis: Product Offering Molecular model of erythropoietin with complex Nlinked glycans. Courtesy of M.R Wormald and R.A Dwek, Oxford Glycobioloy
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationPITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU
PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard
More informationStatistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
More information0BComparativeMarkerSelection Documentation
0BComparativeMarkerSelection Documentation Description: Author: Computes significance values for features using several metrics, including FDR(BH), Q Value, FWER, FeatureSpecific PValue, and Bonferroni.
More informationThe Bonferonni and Šidák Corrections for Multiple Comparisons
The Bonferonni and Šidák Corrections for Multiple Comparisons Hervé Abdi 1 1 Overview The more tests we perform on a set of data, the more likely we are to reject the null hypothesis when it is true (i.e.,
More informationAutomated reporting from gelbased proteomics experiments using the open source Proteios database application
668 DOI 10.1002/pmic.200600814 Proteomics 2007, 7, 668 674 RESEARCH ARTICLE Automated reporting from gelbased proteomics experiments using the open source Proteios database application Fredrik Levander
More information