SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La




References
Alejandro Cruz-Marcelo, Rudy Guerra, Marina Vannucci, Yiting Li, Ching C. Lau, and Tsz-Kwong Man. Comparison of algorithms for pre-processing of SELDI-TOF mass spectrometry data. Bioinformatics, 24(19):2129-2136, 2008.
Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit. Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health). Springer Science and Business Media, Inc, New York, first edition, 2005.
Haleem J. Issaq, Timothy D. Veenstra, Thomas P. Conrads, and Donna Felschow. The SELDI-TOF MS approach to proteomics: Protein profiling and biomarker identification. Biochemical and Biophysical Research Communications, 292:587-592, 2002.

SELDI-TOF-MS
Surface Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry
Used to profile protein markers from tissue or bodily fluids and thus identify biomarkers that can aid in diagnosis, prognosis, or treatment.
Applications: psychiatric disease, renal function, cancer (pancreatic, prostate, ovarian, and breast)

SELDI-TOF-MS Components
ProteinChip array: retains specific proteins from the sample
Reader: measures the molecular weights of the retained proteins and generates a trace showing their relative abundance vs. molecular weight
Software: identifies differences in protein abundance between two samples

Source: http://www.rci.rutgers.edu/~layla/AnalMedChem511/pdf_files/RB_pdf/403feature_issaq.pdf

Preparation
Biological samples are processed via fractionation.
Fractionation: the process of splitting the original sample into subsamples that contain more homogeneous proteins

EAM: Energy Absorbing Molecule
Source: http://urology.jhu.edu/research/img1/proteomics13.jpg

Preprocessing of MS data
Alignment of the spectra
Filtering (denoising)
Baseline subtraction
Normalization
Peak detection
Clustering of peaks
Peak quantification

SELDI-TOF-MS software
ProteinChip Software 3.1
SpecAlign
Cromwell
PROcess
MassSpecWavelet

PROcess package
Process a single spectrum
Process a set of spectra

Process a single spectrum
Baseline subtraction
Peak detection

Baseline subtraction
Purpose: to level off the elevated, non-constant baseline caused by chemical noise in the EAM and by ion overload, and thus make different spectra comparable.
Solution: use local regression to estimate the bottom of a spectrum, then subtract that estimate from the spectrum.
Two approaches, fitting the local regression to:
  The points below a certain quantile
  The local minima (yields a better baseline estimate)

Baseline Subtraction

Baseline subtraction: algorithm
For each spectrum, find local minima by segmenting the m/z range.
Fit a local regression to the local minima of each spectrum.
Subtract the estimated baseline from each spectrum.
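The three steps above can be sketched in base R. This is only an illustration on a synthetic spectrum, not the PROcess implementation: the helper name estimate_baseline, the number of segments, and the loess span are all assumptions chosen for the example.

```r
# Hypothetical helper (not part of PROcess): baseline from segment-wise minima.
estimate_baseline <- function(mz, intensity, n_segments = 50, span = 0.3) {
  idx <- seq_along(mz)
  # Step 1: split the m/z axis into equal-width index segments
  # and locate the minimum intensity inside each segment.
  seg <- cut(idx, breaks = n_segments, labels = FALSE)
  min_idx <- as.integer(tapply(idx, seg, function(i) i[which.min(intensity[i])]))
  minima <- data.frame(mz = mz[min_idx], intensity = intensity[min_idx])
  # Step 2: local regression through the minima
  # (surface = "direct" allows prediction at the spectrum edges).
  fit <- loess(intensity ~ mz, data = minima, span = span,
               control = loess.control(surface = "direct"))
  as.numeric(predict(fit, newdata = data.frame(mz = mz)))
}

# Synthetic spectrum: decaying baseline + two peaks + positive noise.
set.seed(1)
mz <- seq(1000, 10000, length.out = 2000)
intensity <- 20 * exp(-mz / 3000) +
  40 * exp(-((mz - 3500) / 30)^2) +
  25 * exp(-((mz - 7000) / 50)^2) +
  abs(rnorm(length(mz), sd = 0.3))

baseline_hat <- estimate_baseline(mz, intensity)
# Step 3: subtract the estimated baseline.
corrected <- intensity - baseline_hat
```

Fitting to minima rather than to all points keeps the peaks from pulling the baseline upward, which is the reason the slides prefer the local-minima approach.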

### Load libraries
library(survival)
library(Icens)
library(PROcess)
### Read in the raw spectrum
fdat <- system.file("Test", package="PROcess")
fs <- list.files(fdat, pattern = "\\.*csv\\.*", full.names=TRUE)
f1 <- read.files(fs[1])
### Plot the raw spectrum
jpeg("f1.jpeg", width=480, height=480)
plot(f1, type="l", xlab="m/z")
title(basename(fs[1]))
dev.off()
### Remove the baseline
jpeg("f2.jpeg", width=480, height=480)
bseoff <- bslnoff(f1, method="loess", bw=0.1, xlab="m/z", plot=TRUE)
title(basename(fs[1]))
dev.off()

Peak detection
Purpose: to detect the peaks that represent the proteins that are differentially expressed between samples.

Peak detection: algorithm
Smooth the spectrum using moving averages of the k_s nearest neighbors.
Compute local variability as the median of the absolute deviations of the k_v nearest neighbors.
Identify local maxima of the smoothed spectrum using three thresholds:
  The signal-to-noise ratio: local smooth / local variability
  The detection threshold for the whole spectrum
  The shape ratio: the area under the curve within a small distance of a peak candidate / the maximum of all such peak areas in the spectrum
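A minimal base R sketch of the procedure above, run on a synthetic baseline-subtracted spectrum. It is not the PROcess code: the function name detect_peaks, its argument names, and every default threshold are assumptions made for illustration.

```r
# Hypothetical helper (not part of PROcess): threshold-based peak detection.
detect_peaks <- function(intensity, k_smooth = 11, k_var = 81,
                         snr_min = 3, height_min = 2,
                         area_halfwidth = 5, ratio_min = 0.2) {
  n <- length(intensity)
  # Moving average over k_smooth neighbours (NA at the edges).
  smooth <- as.numeric(stats::filter(intensity, rep(1 / k_smooth, k_smooth),
                                     sides = 2))
  # Local variability: MAD over a window of k_var neighbours.
  half <- k_var %/% 2
  local_mad <- vapply(seq_len(n), function(i) {
    mad(intensity[max(1, i - half):min(n, i + half)])
  }, numeric(1))
  # Local maxima of the smoothed spectrum: rising then falling.
  is_max <- c(FALSE, diff(sign(diff(smooth))) == -2, FALSE)
  is_max[is.na(is_max)] <- FALSE
  # Thresholds 1 and 2: signal-to-noise ratio and overall detection height.
  snr <- smooth / local_mad
  cand <- which(is_max & !is.na(snr) & snr > snr_min & smooth > height_min)
  if (length(cand) == 0) return(integer(0))
  # Threshold 3: shape ratio = area near the candidate / largest such area.
  area <- vapply(cand, function(i) {
    sum(smooth[max(1, i - area_halfwidth):min(n, i + area_halfwidth)],
        na.rm = TRUE)
  }, numeric(1))
  cand[area / max(area) > ratio_min]
}

# Synthetic spectrum with peaks centred at index 300 and 700.
set.seed(2)
x <- 1:1000
spectrum <- 20 * exp(-((x - 300) / 8)^2) +
  12 * exp(-((x - 700) / 10)^2) +
  rnorm(length(x), sd = 0.3)
peak_idx <- detect_peaks(spectrum)
```

The returned values are indices into the spectrum; mapping them back to m/z is a lookup in the m/z vector.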

### Peak detection
jpeg("f3.jpeg", width=480, height=480)
pkgobj <- isPeak(bseoff, span=81, sm.span=11, plot=TRUE, zerothrsh=2, area.w=0.003, ratio=0.2)
dev.off()
### Inspect peaks in a particular range of m/z values
jpeg("f4.jpeg", width=480, height=480)
specZoom(pkgobj, xlim=c(5000,10000))
dev.off()

Peak detection

Processing a set of calibration spectra
Apply baseline subtraction
Normalize spectra
Cutoff selection
Identify peaks
Quality assessment
Get proto-biomarkers

Example Data Set
A set of 8 spectra from a calibration data set
The same 5 proteins are present in each sample: 1084, 1638, 3496, 5807, 7034 amu

### Known masses (amu) of the 5 calibration proteins
amu.cali <- c(1084, 1638, 3496, 5807, 7034)
### Plot the 8 spectra and mark the protein positions with vertical lines
jpeg("f5.jpeg", width=1080, height=560)
par(mfrow=c(2,4))
### Read one spectrum file and plot it
plotcali <- function(f, main, lab.cali) {
  x <- read.files(f)
  plot(x, main=main, ylim=c(0, max(x[,2])), type="n")
  abline(h=0, col="gray")
  abline(v=amu.cali, col="salmon")
  if (lab.cali) axis(3, at=amu.cali, labels=amu.cali, las=3, tick=FALSE, col="salmon", cex.axis=0.94)
  lines(x)
  return(invisible(x))
}
dir.cali <- system.file("calibration", package="PROcess")
files <- dir(dir.cali, full.names=TRUE)
i <- seq(along=files)
mapply(plotcali, files, LETTERS[i], i <= 2)
dev.off()

Baseline subtraction
Similar to baseline subtraction for a single spectrum
R code:
Mcal <- rmBaseline(dir.cali, plot=TRUE)
head(Mcal)
         060503peptidecalib_1_128.csv  060503peptidecalib_1_16.csv
3.6385                      0.7253853                    0.7485778
3.6458                      0.6859291                    0.6960419
3.65287                     0.6856960                    0.7088729
3.65972                     0.6985420                    0.7249795
3.66635                     0.6885195                    0.6953421
3.67276                     0.6752363                    0.6885879

Normalize Spectra
Purpose: reduce variation due to experimental noise
Total ion normalization:
  Calculate each spectrum's area under the curve (AUC) for m/z values greater than the selected cutoff
  Scale all spectra to the median AUC
Assumptions:
  The number of over-expressed proteins is approximately equal to the number of under-expressed proteins.
  The number of proteins whose expression levels change is small relative to the total number of proteins bound to the protein array surface.
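Total ion normalization can be sketched in a few lines of base R. The function name normalize_tic is hypothetical, the AUC is approximated by the sum of intensities above the cutoff, and dropping the rows below the cutoff is an assumption of this example.

```r
# Hypothetical helper: scale every spectrum (column) to the median AUC.
normalize_tic <- function(mz, M, cutoff) {
  keep <- mz > cutoff
  sub <- M[keep, , drop = FALSE]
  auc <- colSums(sub)                      # AUC approximated by summed intensity
  sweep(sub, 2, median(auc) / auc, `*`)    # per-column scale factor
}

# Three synthetic spectra that differ only by an overall scale factor.
mz <- seq(100, 10000, length.out = 500)
base <- 1 + 10 * exp(-((mz - 5000) / 200)^2)
M <- cbind(base, 2 * base, 0.5 * base)

M_norm <- normalize_tic(mz, M, cutoff = 400)
```

After normalization every column has the same summed intensity, so the three spectra in this example become identical.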

Cutoff selection
Choose a cutoff point such that the magnitude of the noise is relatively stable above that point.
Algorithm for a single cutoff point:
  Baseline-subtracted spectra within the group are normalized to the median of the sums of their intensities.
  The standard deviation of the intensities at each m/z value is calculated.
  The mean of those standard deviations is computed.
Repeat for different cutoff points and plot the average standard deviations vs. the cutoff points.
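The loop over candidate cutoffs can be illustrated in base R as follows. This is a sketch under assumptions, not the PROcess avesd function: the helper name average_sd and the synthetic noise model (noisier at low m/z) are invented for the example.

```r
# Hypothetical helper: average per-m/z standard deviation above a cutoff.
average_sd <- function(mz, M, cutoff) {
  sub <- M[mz > cutoff, , drop = FALSE]
  sums <- colSums(sub)
  # Normalize to the median of the summed intensities.
  normed <- sweep(sub, 2, median(sums) / sums, `*`)
  # Standard deviation across spectra at each m/z, then the mean of those.
  mean(apply(normed, 1, sd))
}

# Eight synthetic spectra whose noise is largest at low m/z.
set.seed(3)
mz <- seq(50, 20000, length.out = 2000)
signal <- 1 + 10 * exp(-((mz - 5000) / 100)^2)
noise_sd <- 5 * exp(-mz / 500) + 0.2
M <- sapply(1:8, function(j) signal + rnorm(length(mz), sd = noise_sd))

cutoffs <- round(10^seq(2, 4, length.out = 14))
sds <- vapply(cutoffs, function(ct) average_sd(mz, M, ct), numeric(1))
# plot(cutoffs, sds, log = "x", xlab = "cutpoint", ylab = "average sd")
```

A reasonable cutoff is the point where the plotted curve flattens, since the noise level no longer depends on where the spectrum is truncated.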

### Cutoff selection
cts <- round(10^(seq(2,4,length=14)))
sdsfirst <- sapply(cts, avesd, Ma=Mcal)
jpeg("f6.jpeg", width=480, height=480)
par(mfrow=c(1,1))
plot(cts, sdsfirst, xlab="cutpoint", pch=21, bg="red", log="x", ylab="average sd")
dev.off()
### Normalize spectra, cutoff point m/z=400
M.r <- renorm(Mcal, cutoff=400)

Identify Peaks
Similar to peak detection for a single baseline-adjusted spectrum
R code:
### Identify peaks
peakfile <- "calipeak.csv"
getPeaks(M.r, peakfile, ratio=0.1)

Quality Assessment
Purpose: identify and eliminate spectra of poor quality
Based on 3 parameters:
  Quality: a measure of the separation of signal from noise
  Retain: the number of high peaks in a single spectrum
  Peak: the number of peaks in a spectrum relative to the average number of peaks in the whole set of spectra being considered
Poor quality spectra: Quality < 0.4, Retain < 0.1, Peak < 0.5.

Quality assessment: algorithm
Estimate the noise by subtracting from each spectrum its moving average with a window size of 5 points.
Calculate the noise envelope as 3 times the standard deviation of the noise in a 250-point window.
Calculate the area under each spectrum, A0.
Calculate the area after subtracting the noise envelope from the spectrum, A1.
Obtain Quality, Retain, and Peak.

Quality assessment: algorithm (continued)
Quality: A1 / A0
Retain: the number of points with height greater than 5 times the noise envelope / the total number of points in the spectrum
Peak: the number of peaks detected in each spectrum / the average number of peaks for all spectra in a run
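The three quality parameters can be sketched in base R as below. This is not the PROcess quality function: the helper name quality_metrics is hypothetical, and two details are assumptions of the example, namely that the noise is set to zero at the four edge points where the 5-point moving average is undefined, and that negative values are clipped to zero after subtracting the envelope.

```r
# Hypothetical helper: Quality, Retain and Peak for one spectrum.
quality_metrics <- function(intensity, n_peaks, mean_n_peaks,
                            ma_win = 5, env_win = 250) {
  n <- length(intensity)
  # Noise = spectrum minus its moving average (window of 5 points).
  ma <- as.numeric(stats::filter(intensity, rep(1 / ma_win, ma_win), sides = 2))
  ma[is.na(ma)] <- intensity[is.na(ma)]   # assumption: zero noise at the edges
  noise <- intensity - ma
  # Noise envelope = 3 x rolling standard deviation over 250 points.
  half <- env_win %/% 2
  envelope <- 3 * vapply(seq_len(n), function(i) {
    sd(noise[max(1, i - half):min(n, i + half)])
  }, numeric(1))
  a0 <- sum(intensity)                       # area under the spectrum
  a1 <- sum(pmax(intensity - envelope, 0))   # area above the envelope (clipped)
  c(Quality = a1 / a0,
    Retain = mean(intensity > 5 * envelope),
    Peak = n_peaks / mean_n_peaks)
}

# Same two peaks, once with low noise and once with high noise.
set.seed(4)
x <- 1:2000
signal <- 30 * exp(-((x - 600) / 40)^2) + 30 * exp(-((x - 1400) / 40)^2)
clean <- signal + abs(rnorm(length(x), sd = 0.1))
noisy <- signal + abs(rnorm(length(x), sd = 1))

q_clean <- quality_metrics(clean, n_peaks = 2, mean_n_peaks = 2)
q_noisy <- quality_metrics(noisy, n_peaks = 1, mean_n_peaks = 2)
```

A spectrum would then be flagged as poor when its values fall below the thresholds listed on the Quality Assessment slide.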

qualRes <- quality(M.r, peakfile, cutoff=400)
qualRes
                               Quality    Retain      peak
060503peptidecalib_1_128.csv 0.4144087 0.1710994 0.9696970
060503peptidecalib_1_16.csv  0.4558286 0.1406047 0.9696970
060503peptidecalib_1_2.csv   0.4971926 0.1178203 0.9696970
060503peptidecalib_1_256.csv 0.4095177 0.1778567 0.7272727
060503peptidecalib_1_32.csv  0.3556932 0.1297756 0.9696970
060503peptidecalib_1_4.csv   0.5220848 0.1432037 1.2121212
060503peptidecalib_1_64.csv  0.4790304 0.1430304 1.2121212
060503peptidecalib_1_8.csv   0.4174718 0.1201594 0.9696970

Get Proto-biomarkers
Peak alignment: group the peaks across spectra that are likely to represent the same protein.
Proto-biomarkers: peaks aligned across spectra
To obtain a proto-biomarker:
  Generate an interval around each peak, centered at the peak's m/z value (0.3%)
  Determine which actual peaks are represented by a proto-biomarker
  Use the maximum value as the height of that proto-biomarker
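One simple way to realize this alignment is a greedy grouping of the pooled peak list, sketched below in base R. It is not the pk2bmkr algorithm: the function name align_peaks, the greedy seeding rule, and the min_frac filter (a group must appear in at least that fraction of spectra) are assumptions, and the peak table is made up.

```r
# Hypothetical helper: group peaks within a relative m/z window.
align_peaks <- function(peaks, tol = 0.003, min_frac = 0.5) {
  # peaks: data.frame with columns spectrum, mz, intensity
  peaks <- peaks[order(peaks$mz), ]
  group <- integer(nrow(peaks))
  g <- 0L
  centre <- NA_real_
  for (i in seq_len(nrow(peaks))) {
    # Start a new group when the peak leaves the window of the current seed.
    if (is.na(centre) || abs(peaks$mz[i] - centre) > tol * centre) {
      g <- g + 1L
      centre <- peaks$mz[i]
    }
    group[i] <- g
  }
  peaks$group <- group
  # Keep groups seen in at least min_frac of the spectra.
  n_spectra <- length(unique(peaks$spectrum))
  frac <- tapply(peaks$spectrum, peaks$group,
                 function(s) length(unique(s))) / n_spectra
  kept <- peaks[peaks$group %in% as.integer(names(frac)[frac >= min_frac]), ]
  # Height of a proto-biomarker in a spectrum = maximum peak in the group.
  heights <- tapply(kept$intensity, list(kept$spectrum, kept$group), max)
  colnames(heights) <- paste0("M", round(tapply(kept$mz, kept$group, mean)))
  heights
}

# Made-up peak list: two shared peaks plus one stray peak in spectrum D.
peaks <- data.frame(
  spectrum  = c("A", "A", "B", "B", "C", "C", "D", "D", "D"),
  mz        = c(3497, 5810, 3499, 5813, 3498, 5812, 3498, 5814, 9000),
  intensity = c(10, 20, 11, 19, 9, 21, 12, 18, 5)
)
bmk <- align_peaks(peaks)
```

The result is a spectra-by-biomarker matrix of heights; the stray peak at 9000 is dropped because it occurs in only one of the four spectra.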

### Get proto-biomarkers
bmkfile <- "calibmk.csv"
bmk1 <- pk2bmkr(peakfile, M.r, bmkfile, p.fltr=0.5)
mk1 <- round(as.numeric(gsub("M", "", names(bmk1))))
mk1
### [1] 2906 3498 5812 7036
jpeg("f7.jpeg", width=1080, height=560)
par(mfrow=c(2,4))
### Overlay each spectrum with its m/z doubled (blue) to reveal doubly charged peaks
plotcali2 <- function(...) {
  x <- plotcali(...)
  lines(x[,1]*2, x[,2]+25, col="blue")
}
mapply(plotcali2, files, LETTERS[i], i <= 2)
dev.off()

Analyze the result
5 known proteins: 1084, 1638, 3496, 5807, 7034
4 proto-biomarkers obtained: 2906, 3498, 5812, and 7036
Within 0.3% of the m/z values of known proteins: 3498, 5812, and 7036
Attributed to larger proteins carrying two charges: 2 x 2906 (5807) and 2 x 3496 (7034)
Not detected: peaks at m/z = 1084 and 1638
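The 0.3% comparison can be checked directly from the numbers on this slide. The snippet below is a small illustration; the helper rel_err and the use of the same 0.3% tolerance for the doubly charged check (known mass divided by 2) are assumptions of the example.

```r
known <- c(1084, 1638, 3496, 5807, 7034)   # known protein masses (amu)
found <- c(2906, 3498, 5812, 7036)         # proto-biomarkers from pk2bmkr
tol <- 0.003                               # 0.3% relative window

# Hypothetical helper: relative deviation from an expected m/z.
rel_err <- function(observed, expected) abs(observed - expected) / expected

# Does a proto-biomarker match a known mass (single charge)?
single <- sapply(found, function(m) any(rel_err(m, known) <= tol))
# Does it match half of a known mass (double charge)?
double <- sapply(found, function(m) any(rel_err(m, known / 2) <= tol))
# Known masses with no singly charged proto-biomarker nearby.
missed <- known[!sapply(known, function(k) any(rel_err(found, k) <= tol))]

data.frame(found, single, double)
```

Under these assumptions 3498, 5812, and 7036 match known masses directly, 2906 matches 5807 / 2, and 1084 and 1638 have no match.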

Summary
PROcess package: processes SELDI-TOF-MS data
Advantage: produces more reproducible results for peak quantification
Limitation: the results were not homogeneous across laser intensities