Functional Data Analysis of MALDI TOF Protein Spectra




Functional Data Analysis of MALDI TOF Protein Spectra Dean Billheimer dean.billheimer@vanderbilt.edu. Department of Biostatistics Vanderbilt University Vanderbilt Ingram Cancer Center FDA for MALDI TOF MS p.1/43

Outline
- Overview of MALDI TOF Mass Spectrometry
- Characteristics of Spectral Signals
- Standard Analysis and Some Problems
- Analysis of Spectra as Functions
- Analysis of Glioma Proteins
- Extending FDA for Mass Spectra (coming attractions)
- Summary

MALDI TOF Mass Spectrometry
- Emerging as a key technology in proteomics (Nobel prize 2002).
- Proposed for cancer screening, diagnosis, treatment.
- Tremendous promise for protein profiling.

Matrix Assisted Laser Desorption Ionization: a method of generating ions from large biomolecules (proteins!)
- A chemical matrix is added to the sample to enhance ion formation.
- Pulsed laser light vaporizes/ionizes biomolecules from the sample.
- An electric field accelerates the ions and directs them into the mass analyzer.

Time Of Flight: separates ions based on size (mass/charge, m/z).
- Small molecules are fast: short travel time.
- Large molecules are slow: long travel time.
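The time-of-flight principle can be illustrated with a short calculation. This is a minimal sketch of an idealized linear TOF analyzer, assuming an ion gains kinetic energy z·e·V in the accelerating field and then drifts a fixed length at constant velocity. The accelerating voltage (20 kV) and drift length (1 m) are hypothetical values chosen for illustration; they are not taken from the slides.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
DALTON = 1.66053906660e-27    # one dalton in kilograms

def flight_time(mass_da, charge=1, voltage=20_000.0, drift_length=1.0):
    """Idealized flight time in seconds for an ion of mass_da daltons.

    Assumes z*e*V = (1/2)*m*v**2, then a field-free drift of drift_length
    metres. voltage and drift_length are hypothetical defaults.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2.0 * charge * E_CHARGE * voltage / mass_kg)
    return drift_length / velocity

# Flight time grows with the square root of m/z: heavier ions arrive later.
for mass in (6_000, 66_000):
    print(f"{mass} Da: {flight_time(mass) * 1e6:.1f} microseconds")
```

Under these assumptions the flight time scales as the square root of m/z, so an 11-fold increase in mass lengthens the flight by a factor of about 3.3.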

[Figure: MALDI TOF MS schematic. Labeled components: laser, sample and matrix, ion beam, time-of-flight analyzer, detector.]

[Figure: MALDI-TOF spectrum, normal white matter. Intensity vs. mass/charge.]

[Figure: MALDI-TOF spectra, normal white matter, two samples (Normal 1, Normal 2). Intensity vs. mass/charge.]

Pros/Cons of MALDI TOF MS
Advantages
- Can be used for tissue, serum, or other biological samples.
- Measures proteins directly.
- Proteins remain intact (vs. other methods).
- Allows measurement of many proteins simultaneously.
Disadvantages
- Signal can be complicated.
- Molecules are identified only by mass/charge.
- Ion detection is mass dependent: 10-fold more efficient at 6 kDa than at 66 kDa.
- Resolution is mass dependent.

Characteristics of Spectral Signals
Fundamental premise: at a given m/z, the mean intensity is proportional to the relative amount of protein at that m/z. (see graph)
This may be difficult to detect in individual spectra because of nuisance variation:
- sample matrix heterogeneity (intensity)
- chemical noise, protein fragments, salts, fats (baseline)
- detector output characteristics and sensitivity
- other sources of error (noise)
Need good signal normalization! (see graph)

Statistical Issues of MALDI TOF Spectra
- Highly multivariate!
- Structured signal: intensity is a function of mass/charge.
- Variance (and higher moments) related to intensity (and m/z).
- Nuisance variation (for each spectrum)
  - baseline adjustment
  - intensity scaling
- Model identification issues.
- Incidental parameter problem (Neyman and Scott, 1948)

Survey of Standard Analysis of MALDI Spectra
Within each spectrum
- Smoothing ("de-noising") and baseline correction
- Mass assignment (registration, calibration)
- Intensity normalization (nonlinear transformation)
- Peak detection from the smoothed spectrum to create a peak list
Across multiple spectra
- Peak binning: identify homologous peaks (nearby m/z values).
- Use binned peak list intensities in a classification/clustering algorithm to segregate (known) biological samples.
- Test the classifier on independent data to assess predictive performance.
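The peak-list part of the standard pipeline can be sketched in a few lines. This is an illustrative toy, not the software shipped with any spectrometer: the moving-average smoother, the local-maximum peak rule, the height threshold, and the m/z tolerance used for binning are all assumptions, and the two spectra are synthetic.

```python
import math

def gaussian_spectrum(mz, peaks, width=3.0):
    """Synthetic spectrum: a sum of Gaussian peaks given as (center, height)."""
    return [sum(h * math.exp(-0.5 * ((m - c) / width) ** 2) for c, h in peaks)
            for m in mz]

def moving_average(y, window=5):
    """Smooth with a centered moving average (window truncated at the ends)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def detect_peaks(mz, y, min_height):
    """Return (m/z, intensity) for local maxima at or above min_height."""
    return [(mz[i], y[i]) for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height]

def bin_peaks(peak_lists, tol):
    """Group peaks across spectra; a peak joins the current bin when its m/z
    is within tol of the previous peak, otherwise it starts a new bin."""
    tagged = sorted((m, h, s) for s, peaks in enumerate(peak_lists)
                    for m, h in peaks)
    bins = []
    for m, h, s in tagged:
        if bins and m - bins[-1][-1][0] <= tol:
            bins[-1].append((m, h, s))
        else:
            bins.append([(m, h, s)])
    return bins

mz = list(range(2520, 2601))
spectra = [gaussian_spectrum(mz, [(2550, 100.0)]),
           gaussian_spectrum(mz, [(2552, 80.0), (2580, 40.0)])]
peak_lists = [detect_peaks(mz, moving_average(y), min_height=10.0)
              for y in spectra]
bins = bin_peaks(peak_lists, tol=3)
for b in bins:
    print([(m, round(h, 1), s) for m, h, s in b])
```

Every later step sees only the binned peak list, so the outcome depends on the threshold and tolerance chosen here.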

Concerns with Standard Analysis
Within a spectrum
- Mass registration is subject to error (magnitude increases with distance from control points).
- Smoothing: goals and criteria are unclear (usually done by the software shipped with the spectrometer).
- What is baseline? (How is it defined?)
- Peak detection: how is a peak defined? Often based on S/N (but both of these change with m/z).
More fundamental concern
- Assumes all relevant information is captured by peak location and intensity: huge data reduction, loss of information. (see graph)

More Concerns...
Combining information across multiple spectra
- Errors in peak detection and/or mass assignment lead to binning problems. (see graph)
- Tends to omit small peaks that are consistently expressed.
Classification algorithm
- Ignores the ordering inherent in the data (m/z scale).
- Ignores all inference goals except classification/clustering.
Each step proceeds conditionally on all preceding steps (no acknowledgement of uncertainty).

Brief Introduction to Functional Data Analysis (Ramsay and Silverman, 1997)
Functional data: the fundamental unit of observation is a curve (function)
- patient's hormone profile (through time)
- electrical potential of a neuron measured through time
- spectra (mass, Raman, fluorescence, and otherwise)
IDEA: We are measuring a function (often at discrete sample points), and would like to treat the function as the observation.
ADVANTAGE: We are incorporating into the analysis methods structural constraints (e.g., continuity, smoothness) that are present in the data.

Steps in FDA
- Data representation: convert sample points to functional form
  - select a functional basis (e.g., B-spline, Fourier, wavelet)
  - project sample points onto the basis space
  - ensuing calculations involve the basis coefficients
  - same methods as smoothing (but smoothing is not the goal)
- Data registration or feature alignment
- Data display
- Calculation of summary statistics
- Statistical modeling
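The data representation step can be sketched as a least-squares projection onto a B-spline basis. This is a minimal illustration, not Ramsay's FDA code: the basis is built with the Cox-de Boor recursion on equally spaced knots, and the grid, the number of basis functions, and the synthetic noisy peak are assumptions made for the example.

```python
import numpy as np

def bspline_basis(x, n_basis, degree=3):
    """Evaluate n_basis clamped B-spline basis functions at the points x.

    Returns an array of shape (len(x), n_basis). Knots are equally spaced
    between min(x) and max(x); requires n_basis >= degree + 1.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    interior = np.linspace(lo, hi, n_basis - degree + 1)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    # Degree 0: indicator of each half-open knot span.
    B = np.zeros((len(x), len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    B[x == hi, n_basis - 1] = 1.0  # close the last non-empty span on the right
    # Cox-de Boor recursion, skipping terms with a zero-width denominator.
    for d in range(1, degree + 1):
        nxt = np.zeros((len(x), len(knots) - 1 - d))
        for j in range(len(knots) - 1 - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                nxt[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                nxt[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = nxt
    return B

# Synthetic spectrum segment: one peak plus noise on a hypothetical m/z grid.
rng = np.random.default_rng(0)
mz = np.linspace(7600.0, 8000.0, 401)
y = 10.0 * np.exp(-0.5 * ((mz - 7800.0) / 15.0) ** 2) + rng.normal(0, 0.3, mz.size)

basis = bspline_basis(mz, n_basis=40)
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)  # projection onto the basis
fitted = basis @ coef                             # functional form on the grid
print(coef.shape, float(np.abs(fitted - y).mean()))
```

After this step each spectrum is carried by its coefficient vector, and later calculations operate on those coefficients rather than on the raw sample points.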

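Descriptive Statistics
Let $y_i(t)$, $i = 1, \ldots, N$, be the observed functions.

The estimated mean function:
$$\bar{y}(t) = \frac{1}{N} \sum_{i=1}^{N} y_i(t)$$

The estimated variance function:
$$\mathrm{var}_Y(t) = \frac{1}{N-1} \sum_{i=1}^{N} \left[ y_i(t) - \bar{y}(t) \right]^2$$

Covariance and correlation functions:
$$\mathrm{cov}_Y(s, t) = \frac{1}{N-1} \sum_{i=1}^{N} \left[ y_i(s) - \bar{y}(s) \right] \left[ y_i(t) - \bar{y}(t) \right]$$
$$\mathrm{corr}_Y(s, t) = \frac{\mathrm{cov}_Y(s, t)}{\sqrt{\mathrm{var}_Y(s)\,\mathrm{var}_Y(t)}}$$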
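On a common evaluation grid, the functional descriptive statistics reduce to column-wise operations on a curves-by-grid matrix. The sketch below assumes every curve has already been evaluated at the same T grid points and uses the N - 1 divisor for variance and covariance; the function name and the random example curves are illustrative only.

```python
import numpy as np

def functional_summaries(Y):
    """Mean, variance, covariance and correlation functions on a grid.

    Y has shape (N, T): N observed curves evaluated at T common grid points.
    Returns (mean[T], var[T], cov[T, T], corr[T, T]).
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    resid = Y - mean
    cov = resid.T @ resid / (Y.shape[0] - 1)   # cov(s, t) for all grid pairs
    var = np.diag(cov).copy()                  # var(t) = cov(t, t)
    sd = np.sqrt(var)
    corr = cov / np.outer(sd, sd)              # undefined where var(t) == 0
    return mean, var, cov, corr

rng = np.random.default_rng(0)
curves = rng.normal(size=(35, 50)).cumsum(axis=1)  # 35 synthetic curves
mean, var, cov, corr = functional_summaries(curves)
print(mean.shape, var.shape, cov.shape, corr.shape)
```

A heat map of the correlation matrix over pairs of grid points is one way to display the correlation function.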

A Functional Linear Model
Usual linear model:
$$y = Z\beta + \epsilon$$
where $Z$ is an $N \times p$ design matrix and $\beta$ is a $p$-vector of unknown coefficients. The usual parameter estimator is
$$\hat{\beta} = (Z'Z)^{-1} Z' y$$
In a functional model (FANOVA):
$$y(t) = Z\beta(t) + \epsilon(t)$$
where $y$, $\beta$, and $\epsilon$ are functions, but $Z$ is the same as before.

Basis Function Representation
Represent the observations via a basis function expansion:
$$y_i(t) = \sum_{k=1}^{K} c_{ik}\,\phi_k(t)$$
where $\phi_k$ are basis functions covering the range of $t$, and $c_{ik}$ are coefficients. More compactly,
$$y = C\phi$$
where $C$ is the matrix of basis function coefficients. Now the FANOVA estimator is
$$\hat{\beta} = (Z'Z)^{-1} Z' C\phi$$
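The FANOVA estimator is ordinary least squares applied to the basis coefficients. The sketch below assumes a design matrix Z (N by p), a coefficient matrix C (N by K) from a prior basis projection, and the basis functions evaluated on a grid as Phi (K by T); these names and the toy inputs are assumptions made for the example, not the notation of any particular package.

```python
import numpy as np

def fanova_estimate(Z, C, Phi):
    """Evaluate beta_hat = (Z'Z)^{-1} Z' C Phi on the grid.

    Z:   (N, p) design matrix
    C:   (N, K) basis coefficients, one row per observed function
    Phi: (K, T) basis functions evaluated at T grid points
    Returns an array of shape (p, T): one estimated function per effect.
    """
    # Least squares on the coefficients, equivalent to (Z'Z)^{-1} Z' C
    # when Z has full column rank.
    coef_effects, *_ = np.linalg.lstsq(Z, C, rcond=None)
    return coef_effects @ Phi

# Toy example: two groups coded with a cell-means design.
groups = np.array([0, 0, 0, 1, 1])
Z = np.eye(2)[groups]
rng = np.random.default_rng(1)
C = rng.normal(size=(5, 4))
Phi = rng.normal(size=(4, 50))
beta_hat = fanova_estimate(Z, C, Phi)
print(beta_hat.shape)
```

With a cell-means design each estimated function is the group mean curve, which gives a quick sanity check on the implementation.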

Other (*easy*) Operations in FDA
- Functional principal components analysis
- Functional linear modeling
  - Functional ANOVA: observations and parameters are functions (standard design matrix)
  - Scalar response variable and functional independent variable
  - All model terms are functional
- Functional canonical correlation
- Differential operators and analysis
Thanks to Jim Ramsay for making code for FDA available.

Glioma Protein Analysis
- Glioma is a type of tumor found in the brain's white matter (infiltrating tumor cells).
- Four stages defined by tissue pathology.
- Stage progression not well understood.
- Compare resected tumor tissue with normal white matter from lobectomy patients.
- Interest in identifying protein markers of stage.

Analysis of Brain Tissue Mass Spectra
- Data from normal and tumor tissue specimens.
- Tissue cross section mounted to MALDI plate (IMS prep).
- Mass (per charge) range from 2000 to 50000 Da/z.
- Focus on limited mass range: 7600 to 8000 Da/z.
- 35 patients (7 normal, 8 grade II, 9 grade III, 11 grade IV).
- Use B-spline basis with 120 basis functions.
Thanks to Sarah Schwarz in Vanderbilt MSRC for providing data.

Spectrum Normalization
- Piecewise linear baseline correction
- Scaling by regression against a standard spectrum
- Global Box-Cox transformation based on sampling replicate spectra
The normalized intensity combines three quantities: a baseline correction, a scaling coefficient, and a Box-Cox parameter (held fixed in the following analysis).
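The three normalization ingredients can be chained as in the sketch below. The exact formula on the slide is not recoverable, so every detail here is an assumption: the baseline is interpolated through window minima, the scaling coefficient is the slope of a regression through the origin against a reference spectrum, a constant offset keeps values positive, and the Box-Cox parameter lam is supplied by the caller.

```python
import numpy as np

def piecewise_linear_baseline(x, y, n_segments=8):
    """Piecewise linear baseline through the minimum of y in each window.

    The window-minimum rule is an assumed, simple choice of knots.
    """
    edges = np.linspace(0, len(x), n_segments + 1).astype(int)
    idx = np.array([lo + np.argmin(y[lo:hi])
                    for lo, hi in zip(edges[:-1], edges[1:])])
    return np.interp(x, x[idx], y[idx])

def scale_to_reference(y, ref):
    """Slope a minimizing ||ref - a*y||^2 (regression through the origin)."""
    return float(ref @ y / (y @ y))

def box_cox(y, lam):
    """Box-Cox power transform of strictly positive values."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def normalize(x, y, ref, lam, offset=1.0):
    """Baseline-correct, scale to the reference, then Box-Cox transform.

    offset is a hypothetical shift that keeps the argument positive.
    """
    corrected = np.clip(y - piecewise_linear_baseline(x, y), 0.0, None)
    a = scale_to_reference(corrected, ref)
    return box_cox(a * corrected + offset, lam)

# Synthetic example: a reference peak, observed at half scale on a flat baseline.
x = np.arange(100.0)
ref = 50.0 * np.exp(-0.5 * ((x - 50.0) / 2.0) ** 2)
y = 5.0 + 0.5 * ref
normalized = normalize(x, y, ref, lam=0.5)
print(float(normalized.max()))
```

Any real use would need the baseline rule, the reference spectrum, and the transform parameter chosen from replicate data rather than fixed by hand.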

[Figure: autocorrelation of spectra over mass/charge 7600 to 8000.]

[Figure: functional analysis of variance. F statistic (3, 31 degrees of freedom) vs. mass/charge, 7600 to 8000, with levels 0.001, 0.01, and 0.05 marked.]
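An F statistic curve of this kind can be computed point by point along the m/z grid. The sketch below is a plain one-way ANOVA applied at each grid point, assuming the curves are already normalized and evaluated on a common grid. The group sizes match the 35 patients described earlier (7, 8, 9, 11), which gives the (3, 31) degrees of freedom; the curves themselves are synthetic.

```python
import numpy as np

def pointwise_f(Y, groups):
    """One-way ANOVA F statistic at every grid point.

    Y: (N, T) curves on a common grid; groups: length-N group labels.
    Returns (F[T], (df_between, df_within)).
    """
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, g = Y.shape[0], len(labels)
    grand = Y.mean(axis=0)
    ss_between = np.zeros(Y.shape[1])
    ss_within = np.zeros(Y.shape[1])
    for lab in labels:
        Yg = Y[groups == lab]
        mean_g = Yg.mean(axis=0)
        ss_between += len(Yg) * (mean_g - grand) ** 2
        ss_within += ((Yg - mean_g) ** 2).sum(axis=0)
    df = (g - 1, n - g)
    return (ss_between / df[0]) / (ss_within / df[1]), df

# Synthetic curves: only the last group has an extra bump near grid point 60.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2, 3], [7, 8, 9, 11])
grid = np.arange(100.0)
bump = np.exp(-0.5 * ((grid - 60.0) / 4.0) ** 2)
Y = rng.normal(0.0, 1.0, size=(35, 100)) + np.outer(labels == 3, 3.0 * bump)
f_curve, df = pointwise_f(Y, labels)
print(df, int(f_curve.argmax()))
```

Reading the curve against F critical values is a pointwise comparison, so it does not by itself control error across the whole m/z range.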

[Figure: group means. Normalized intensity vs. mass/charge, 7600 to 8000, for Normal, Grade 2, Grade 3, and Grade 4, with levels 0.001 and 0.01 marked.]

Key Points from Glioma Protein Spectra Analysis
- Identify regions exhibiting differential protein expression.
- Some of these regions would be difficult to find via peak selection.
- Autocorrelation plot suggests a method for identifying different forms of a single protein.

Next New Thing
Currently the following steps are performed sequentially:
1. smooth (or de-noise) spectrum
2. estimate and remove baseline
3. normalize
4. peak selection
5. do actual analysis
Each step depends on all preceding steps:
- any error is propagated forward
- any uncertainty is ignored
Instead, try simultaneous modeling of the (believed) components of spectra.

[Figure: spectrum decomposition into baseline, group specific signal, and spectrum specific components.]

Spectrum Decomposition via Bayesian Inference
Baseline
- nuisance background (in each spectrum)
- smooth
- monotone non-increasing
- non-negative
Group Specific Signal
- peaks common to a group of interest
- combine information across multiple spectra
- non-negative: represents peaks when present, zero otherwise
Spectrum Specific Signal
- subject- or spectrum-specific unexplained variation
- no substantial prior information
- to aid identification, may prefer mean zero for each spectrum
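The three components and their constraints can be made concrete with a small forward simulation. This sketch only generates a spectrum from assumed components; it does not perform the Bayesian inference or MCMC described in the talk. The exponential baseline, Gaussian peak shape, noise level, and peak positions are all hypothetical.

```python
import numpy as np

def simulate_spectrum(x, peak_centers, peak_heights, rng, noise_sd=0.2):
    """Build a spectrum as baseline + group signal + spectrum-specific term.

    Returns (baseline, group, specific, observed), each the same shape as x.
    """
    # Baseline: smooth, monotone non-increasing, non-negative.
    baseline = 4.0 * np.exp(-x / 60.0)
    # Group-specific signal: non-negative peaks, zero away from them.
    group = np.zeros_like(x)
    for center, height in zip(peak_centers, peak_heights):
        group += height * np.exp(-0.5 * ((x - center) / 2.0) ** 2)
    # Spectrum-specific term: unexplained variation, centered to mean zero.
    specific = rng.normal(0.0, noise_sd, size=x.shape)
    specific -= specific.mean()
    return baseline, group, specific, baseline + group + specific

x = np.arange(0.0, 151.0)
rng = np.random.default_rng(0)
baseline, group, specific, observed = simulate_spectrum(x, [40.0, 90.0], [6.0, 3.0], rng)
print(float(observed.max()), float(specific.mean()))
```

Simulated spectra like this are useful for checking whether a fitting procedure can recover components it was told to look for.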

[Figure: MCMC baseline estimate of Mass Frauda. y vs. x.]

[Figure: peaks and spectrum effects. Baseline corrected signal estimate for MS Frauda. y vs. x.]

[Figure: corrected signal with peaks. y vs. x.]

Parallel Approaches to Inference
- VAMPIRE: cluster of 110 Linux-based processors (Beowulf)
- Currently: embarrassingly parallel problems
  - Code: combination of C, R, and job scheduling languages
  - Point-wise mixed-model analysis (Bayesian inference, using MCMC)
- Next steps:
  - combine FDA with component-wise Bayesian model
  - implement ScaLAPACK behind the language

Summary
- Protein analysis by MS has tremendous potential for cancer screening, diagnosis, and treatment.
- Functional data approach is a natural fit to MS data.
  - identified expression differences that would be difficult to find with peak detection approaches
  - inference limitations
  - computational challenges
- Good normalization is key to quantitative analysis.
  - Theory of Normalization (w/ B. LaFleur)
- Proteomics = Proteo-metrics
  - All problems reduce to quantitation
  - Adherence to statistical principles is important!
dean.billheimer@vanderbilt.edu

[Figure: Quantitation of MALDI spectra. MALDI TOF MS calibration experiment (Bucknall et al., 2002). Peak intensity ratio vs. concentration of rat met GH (nmol); fitted line y = 1.17x - 0.14, r = 0.998.]

[Figure: Unnormalized MALDI spectra. MALDI TOF MS calibration experiment, no normalization (Bucknall et al., 2002). Peak intensity vs. concentration of rat met GH (nmol); fitted line y = 65.77x + 755.07, r = 0.83.]
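The contrast between the two calibration plots can be mimicked with a toy calculation. The sketch assumes that the "peak intensity ratio" is the analyte peak divided by a reference peak measured in the same spectrum, and that each spectrum carries its own multiplicative gain. All numbers below are synthetic and are not the Bucknall et al. data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic calibration series with a different (hypothetical) gain per spectrum.
concentration = [25.0, 50.0, 100.0, 150.0, 200.0]
gain = [1.4, 0.7, 1.2, 0.6, 1.0]

analyte = [g * c * 10.0 for g, c in zip(gain, concentration)]  # raw peak intensity
reference = [g * 1000.0 for g in gain]                         # reference peak, same gain
ratio = [a / s for a, s in zip(analyte, reference)]            # gain cancels in the ratio

r_raw = pearson_r(concentration, analyte)
r_ratio = pearson_r(concentration, ratio)
print(f"raw intensity r = {r_raw:.3f}, intensity ratio r = {r_ratio:.3f}")
```

In this toy setting the ratio removes the spectrum-to-spectrum gain entirely, which is the idealized version of what normalization is meant to achieve.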

[Figure: Spectrum 1. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 1 with peak detection. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 1, peaks only. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 1. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 1 with peak detection. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 2. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: Spectrum 2 with peak detection. Intensity vs. mass/charge, 2520 to 2600.]

[Figure: peaks from Spectra 1 and 2. Intensity vs. mass/charge, 2520 to 2600.]