Alignment and Preprocessing for Data Analysis

Similar documents

Enhancing GCMS analysis of trace compounds using a new dynamic baseline compensation algorithm to reduce background interference

Setting up a Quantitative Analysis MS ChemStation

The Scheduled MRM Algorithm Enables Intelligent Use of Retention Time During Multiple Reaction Monitoring

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis

Analysis of Liquid Samples on the Agilent GC-MS

MarkerView Software for Metabolomic and Biomarker Profiling Analysis

Pesticide Analysis by Mass Spectrometry

Quick Start Guide Chromeleon 7.2

Learning Objectives:

Chemical Analysis of Chardonnay Wines: Identification and Analysis of Important Aroma Compounds

Background Information

Mass Frontier 7.0 Quick Start Guide

MassHunter for Agilent GC/MS & GC/MS/MS

Technical Report. Automatic Identification and Semi-quantitative Analysis of Psychotropic Drugs in Serum Using GC/MS Forensic Toxicological Database

Accurate Mass Screening Workflows for the Analysis of Novel Psychoactive Substances

Strategies for Developing Optimal Synchronous SIM-Scan Acquisition Methods AutoSIM/Scan Setup and Rapid SIM. Technical Overview.

Functional Data Analysis of MALDI TOF Protein Spectra

# LCMS-35 esquire series. Application of LC/APCI Ion Trap Tandem Mass Spectrometry for the Multiresidue Analysis of Pesticides in Water

1.1 This test method covers the qualitative and quantitative determination of the content of benzene and toluene in hydrocarbon wax.

Chemistry 321, Experiment 8: Quantitation of caffeine from a beverage using gas chromatography

Guidance for Industry

SELDI-TOF Mass Spectrometry Protein Data By Huong Thi Dieu La

Increasing the Multiplexing of High Resolution Targeted Peptide Quantification Assays

GC METHODS FOR QUANTITATIVE DETERMINATION OF BENZENE IN GASOLINE

AppNote 6/2003. Analysis of Flavors using a Mass Spectral Based Chemical Sensor KEYWORDS ABSTRACT

NIRCal Software data sheet

Overview. Introduction. AB SCIEX MPX -2 High Throughput TripleTOF 4600 LC/MS/MS System

Tutorial 9: SWATH data analysis in Skyline

Thermo Scientific SIEVE Software for Differential Expression Analysis

Signal, Noise, and Detection Limits in Mass Spectrometry

A Navigation through the Tracefinder Software Structure and Workflow Options. Frans Schoutsen Pesticide Symposium Prague 27 April 2015

Expectations for GC-MS Lab

VALIDATION OF ANALYTICAL PROCEDURES: TEXT AND METHODOLOGY Q2(R1)

1 st day Basic Training Course

Daniel M. Mueller, Katharina M. Rentsch Institut für Klinische Chemie, Universitätsspital Zürich, CH-8091 Zürich, Schweiz

A Streamlined Workflow for Untargeted Metabolomics

The First Quantitative Analysis of Alkylated PAH and PASH by GCxGC/MS and its Implications on Weathering Studies

Application of TargetView software within the food industry - the identi cation of pyrazine compounds in potato crisps

STANDARD OPERATING PROCEDURES

Performance and advantages of qnmr measurements

ANALYTICAL METHODS INTERNATIONAL QUALITY SYSTEMS

Step-by-Step Analytical Methods Validation and Protocol in the Quality System Compliance Industry

Mass Frontier Version 7.0

Analyst 1.6 Software. Software Reference Guide

Analysis of Phthalate Esters in Children's Toys Using GC-MS

Overview. Triple quadrupole (MS/MS) systems provide in comparison to single quadrupole (MS) systems: Introduction

Noise reduction of fast, repetitive GC/MS measurements using principal component analysis (PCA)

Using NIST Search with Agilent MassHunter Qualitative Analysis Software James Little, Eastman Chemical Company Sept 20, 2012.

ProteinPilot Report for ProteinPilot Software

In-Depth Qualitative Analysis of Complex Proteomic Samples Using High Quality MS/MS at Fast Acquisition Rates

Metabolomics Software Tools. Xiuxia Du, Paul Benton, Stephen Barnes

MultiQuant Software Version 3.0 for Accurate Quantification of Clinical Research and Forensic Samples

Chemical analysis service, Turner s Green Technology Group

Analyst Software

RAPID MARKER IDENTIFICATION AND CHARACTERISATION OF ESSENTIAL OILS USING A CHEMOMETRIC APROACH

Quantitative proteomics background

A4, Empower3 Processing Tips and Tricks

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

AMD Analysis & Technology AG

Analysis of the Vitamin B Complex in Infant Formula Samples by LC-MS/MS

Sample Analysis Design Isotope Dilution

Indoor Air Comfort Test Report

Analysis of Organophosphorus Pesticides in Milk Using SPME and GC-MS/MS. No. GCMS No. SSI-GCMS Shilpi Chopra, Ph.D.

1 Genzyme Corp., Framingham, MA, 2 Positive Probability Ltd, Isleham, U.K.

A commitment to quality and continuous improvement

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Analysing chromatographic data using data mining to monitor petroleum content in water

Simultaneous qualitative and quantitative analysis using the Agilent 6540 Accurate-Mass Q-TOF

WHITE PAPER CONTINUOUS AIR MONITORING USING GRIFFIN 460 MOBILE GC/MS. HEADINGS: ALL CAPS, 12 PT TREBUCHET FONT, ICx BLUE

Application of TargetView software in the fragrance industry: Accurate identi cation of allergens with fast GC

ICH Topic Q 2 (R1) Validation of Analytical Procedures: Text and Methodology. Step 5

MHI3000 Big Data Analytics for Health Care Final Project Report

Analysis of Polyphenols in Fruit Juices Using ACQUITY UPLC H-Class with UV and MS Detection

Samples RMN Analytical Report ( ) GC/FID Analysis (RM01) Nicotine Purity

STANFORD UNIVERSITY MASS SPECTROMETRY 333 CAMPUS DR., MUDD 175 STANFORD, CA

How To Use An Ionsonic Microscope

MS Data Analysis I: Importing Data into Genespring and Initial Quality Control

UHPLC/MS: An Efficient Tool for Determination of Illicit Drugs

Method development for analysis of formaldehyde in foodsimulant. melamine-ware by GC-MS and LC-MS/MS. Internal Technical Report

Cluster Analysis. Isabel M. Rodrigues. Lisboa, Instituto Superior Técnico

Micromass LCT User s Guide

HPLC Analysis of Acetaminophen Tablets with Waters Alliance and Agilent Supplies

Unique Software Tools to Enable Quick Screening and Identification of Residues and Contaminants in Food Samples using Accurate Mass LC-MS/MS

Analysis of Free Bromate Ions in Tap Water using an ACQUITY UPLC BEH Amide Column

Assay Development and Method Validation Essentials

Accelerate your Quantitative LC/MS Workflows with a Fully Integrated UHPLC-QQQ Platform

Tutorial for proteome data analysis using the Perseus software platform

LC-MS/MS Method for the Determination of Docetaxel in Human Serum for Clinical Research

Evaluating System Suitability CE, GC, LC and A/D ChemStation Revisions: A.03.0x- A.08.0x

Q-TOF User s Booklet. Enter your username and password to login. After you login, the data acquisition program will automatically start.

AB SCIEX TOF/TOF 4800 PLUS SYSTEM. Cost effective flexibility for your core needs

using ms based proteomics

SPE, LC-MS/MS Method for the Determination of Ethinyl Estradiol from Human Plasma

GUIDELINES FOR THE VALIDATION OF ANALYTICAL METHODS FOR ACTIVE CONSTITUENT, AGRICULTURAL AND VETERINARY CHEMICAL PRODUCTS.

Simultaneous Metabolite Identification and Quantitation with UV Data Integration Using LightSight Software Version 2.2

Application of Automated Data Collection to Surface-Enhanced Raman Scattering (SERS)

Gas Chromatography. Let s begin with an example problem: SPME head space analysis of pesticides in tea and follow-up analysis by high speed GC.

Application Note # LCMS-81 Introducing New Proteomics Acquisiton Strategies with the compact Towards the Universal Proteomics Acquisition Method

Transcription:

Alignment and Preprocessing for Data Analysis Preprocessing tools for chromatography Basics of alignment GC FID (D) data and issues PCA F Ratios GC MS (D) data and issues PCA F Ratios PARAFAC Piecewise Alignment GUI (available online) synoveclab.chem.washington.edu/downloads.htm Email for username/password

Tools for Analysis: Classification Principal Component Analysis (PCA) Signal Retention Time PCA Loadings` Score PC X Scores + Error DCS Retention Time Score PC Degree of Class Separation (DCS) Score PC DCS DCS D s A AB, s B D, ( X X ) ( Y Y ) AB A B A B Score PC

Why Align? Reduction in classification PCA Increase in uncertainty for quantification PARAFAC Misalignment occurs frequently Daily instrument variation causes misalignment Correction is necessary to apply these methods

Retention Time Precision & PCA Replicates with limited shifting Loadings Scores Signal PCA _ Score PC DCS=3 Retention Time Retention Time Score PC Replicates with large shifting Loadings Scores Signal PCA _ Score PC DCS=3 Retention Time Retention Time Score PC

Basics of Alignment Types of alignment algorithms Cross correlation coefficient Correlation Optimized Warping (COW) *Piecewise alignment Alignment Parameters Window Size Shift Target Selection PCA Correlation Coefficient Windowed Target

Alignment Algorithms Cross correlation coefficient Move the entire chromatogram to maximize correlation Correlation Optimized Warping (COW) Separate the chromatogram into windows Warp and move the windows to optimize the correlation Find the best alignment path to correct the data *Piecewise alignment Separate the chromatogram into windows Shift the windows to optimize the correlation Find the best alignment path to correct the data

Alignment Parameters Parameter Description Effect Too Small Effect Too Large Determining correct values Window Size, W Window Size ~ pk width Relative movement high, difficult to determine quality of alignment Insufficient flexibility to correct peak to peak shifting Alignment Metric Shift, L <shift<=maximum shift L t R Insufficient movement of segments Increases time Alignment Metric

Target Selection *Global Approach All chromatograms are initially collected *User chosen target PCA optimized target Scores are produced for every sample The sample with the minimum distance from the center is the target *Maximum correlation target Calculate the product of each chromatograms correlation to the others The maximum correlation is the target Online Approach An initial target is set Chromatograms are aligned as they are collected The target changes as new chromatograms are collected

Target Selection (Maximum Correlation).6.55 Sample # 9 is the target.5 6 5 9 8 7 6 5 4 3 Type Type Type 3 Product Correlation.45.4.35.3.5..5 Signal 4 3 8.5 8. 8.5 8.. 5 5 8 Sample # Target Type Type Type 3 Signal 6 4 4 6 8 Retention Time, min 8.5 8. 8.5 8. Retention Time, min

Alignment of GC FID (D) Data and Issues Removal of artifacts and solvent peaks Baseline correction and normalization Alignment Improving PCA F Ratio

Preprocessing Tools for Chromatography Noise filtering Median filter Baseline correction Normalization Alignment

Experimental GC FID Separation 4 diesel sample types 9 minute separation 7 replicate injections over 5 days

Noise Filtering (Solvent, Spikes, and Outliers) 4 GC-FID Separation.8.6 Noise Spike Signal.4. Signal 8 6 4 Noise Spike.8.6.4..8.6 7 7. 7.4 7.6 7.8 Median Filtered Data (minimize spike) 4 6 8 Retention Time, min Signal.4..8.6.4 7 7. 7.4 7.6 7.8 Retention Time, min

Noise Filtering (Solvent, Spikes, and Outliers) GC-FID Separation Solvent.5 x 8 Scores Plot Solvent Dominated Info Signal 8 6 PC.5 -.5 4 - -.5 Outlier (low signal) - 4 6 8 Retention Time, min -.5.8..4.6.8. x 9 PC

Noise Filtering (Solvent, Spikes, and Outliers) 4 GC-FID Separation 5 x 6 Scores Plot 4 3 Signal 8 6 PC Outlier (Low Signal) 4 - Outlier Low Signal 4 6 8 Retention Time, min - -3.5.5 3 3.5 4 4.5 5 x 7 PC

Noise Filtering (Solvent, Spikes, and Outliers) 4 GC-FID Separation 5 x 6 Scores Plot 4 3 Signal 8 6 PC No Outlier (Baseline / Normalization on PC) 4 - - No Outlier 4 6 8 Retention Time, min -3 3.4 3.6 3.8 4 4. 4.4 4.6 x 7 PC

Noise Filtering (Solvent, Spikes, and Outliers).5 x -9 Baseline Corrected and Normalized Data 8 x -4 6 Scores Plot Normalized Signal.5.5 PC 4 - Tight Clustering in PCA Subtract offset line from data Then divide by total signal -.5 4 6 8 Retention Time, min -4 6. 6.3 6.4 6.5 6.6 6.7 6.8 x -3 PC

Experimental GC Separation 3 gasoline sample types 5 minute separation 6 replicate injections over days Misalignment is due to day to day instrument variation

Alignment GUI Initially blank to guide the user through alignment

Alignment GUI Load from file Load from workspace

Load & View Data Sizes Alignment GUI Zoom & Pan Align Data Choose or Estimate Parameters Choose or Estimate Target

Alignment GUI Data before alignment Estimated target and parameters Now save data to workspace or.mat file Data after alignment

PCA Classification of Gasoline x -8 3 3 x -3 Scores Plot Signal.5.5 PC - - We want better separation of groups -3-4.5 8.5 8. 8.5 8. 8.5 8.3 8.35 Retention Time, min -5.38.385.39.395.4.45.4 PC

Fisher Ratio Method (F Ratio) For each mass channel calculate Fisher Ratio at each point in D space, Sample Fisher Ratio = btwclass w ithinclass Fisher Ratios GC MS Separations Sample Col Sample 3 Col Works well for samples that have large amounts of within class variance Works best when comparing a small number of sample classes

Fisher Ratio Method (F Ratio) x -8 Selected Chromatographic Region 8 6 No alignment Signal 3.5.5 F-Ratio 8 4 8 6 4 5 5 Retention Time, min 7 6 After alignment.5 8.5 8. 8.5 8. 8.5 8.3 8.35 F-Ratio 5 4 3 Select Top Ten Retention Time, min 5 5 Retention Time, min

Fisher Ratio Method (F Ratio) 7 x -8 Regions of interest 6.5 x -3 Scores Plot 5 Signal 4 3 PC.5 Tight clustering of sample types.5 -.5-5 5 Retention Time, min - 5 6 7 8 9 x -3 PC

Summary Removal of solvents or artifacts is essential Baseline correction is an important step Alignment is essential for improving classification The F Ratio algorithm can further improve classification

Alignment of GC MS (D) Data and Issues Removal of artifacts and solvent peaks Baseline correction and normalization Alignment Improving PCA F Ratio PARAFAC

Experimental GC MS Separation 3 gasoline sample types 5 minute separation 6 replicate injections over days Misalignment is due to day to day instrument variation

Alignment GUI Initially blank to guide the user through alignment

Alignment GUI Load from file Load from workspace

Load & View Data Sizes Alignment GUI Zoom & Pan Align Data Choose or Estimate Parameters Choose or Estimate Target

Alignment GUI Data before alignment Estimated target and parameters Now save data to workspace or.mat file Data after alignment

Alignment Sample Target TIC Total Ion Current Shift Function (TIC SF) 5 Alignment Shift 5 4 Retention Time Retention Time

Alignment Total Ion Current Shift Function (TIC SF) Shift GC MS Data 8 6 4 Sample Target Retention Time GC MS Data Apply alignment 8 6 4 4 Retention Time 4 Retention Time

Alignment x -8.5 TIC Signal 7 x -8 6 5 4 3 TIC Signal.5.5 x -8.5 7. 7.5 7. 7.5 7.3 Align TIC Signal.5-5 5 Retention Time, min.5 7. 7.5 7. 7.5 7.3 Retention Time, min

Classification of Full MS Data.5 x -8 3.5 x -3 Scores Plot TIC Signal.5 PC.5 -.5 - -.5 PCA w/ full MS.5 -.3.35.4.45.5 PC 7.5 7. 7.5 7. 7.5 Retention Time, min

Fisher Ratio Method For each mass channel calculate Fisher Ratio at each point in D space, Fisher Ratio = btwclass w ithinclass GC MS Separations Sample Col Sample Col m/z m/z Fisher Ratios for Samples m/z Sum Fisher Ratios along mass channel dimension Sample 3 m/z Col Col Col Works well for samples that have large amounts of within class variance Works best when comparing a small number of sample classes

Simulated Example Using F Ratios for PCA TIC W = 4 L = Align TIC Scores GC-MS Data D F Ratios PCA F-Ratios m/z Shows differences between all samples Retention Time

F Ratios 4 GC MS Fisher Ratios 7 6 7 x -8 GC MS Separation 5 4 6 8 3 5 6 TIC Signal 4 3 4 5 5 7 GC MS Fisher Ratios 6 9 8 5 4 7 3-5 5 6 Retention Time, min 5 4 Top Fisher Ratios 4.8 5 5. 5.4 5.6

Procedure Select masses using mass spectral information Select time regions using F Ratios Combine to reduce the data set Improve results of PCA

Classification of Full MS Data 7 x -8 6 5 4 Regions of interest (with MS values) Scores Plot of Nearest Samples.5 x -4.5 Signal 3 PC -.5 - -.5 Clear Separation on st PC - - 5 5 -.5.8.9...3.4 x -3 PC Retention Time, min

Quantification of GC MS Data Use aligned, baseline corrected and normalized data Use PARAFAC of small regions for analysis Match values to mass spectra Peak Sums Peak Profiles

Quantification Target analyte Parallel Factor Analysis (PARAFAC) isolates the pure component peak and mass spectral information from overlapping peaks and background for both identification and quantification. Data Matrix = Analyte + Analyte +... Noise x x R = z z + + + y y E PARAFAC Results Compound : malate, TMS # of factors: 3 Chosen comp.: #3 Match value : 948 Peak sum : 44388 Peak height : 37688 Intensity Col. Time Intensity Col. Time

PARAFAC for GC MS Data Sample Peak Profile 8 6 8 4 6 4 4 4 Sample Stack Samples Run PARAFAC...4.6 Mass Spectrum 8 6 4 Retention Time Sample Analyte Concentration Sample 4 Sample

Experimental GC MS Separation 3 gasoline sample types 5 minute separation 6 replicate injections over days Misalignment is due to day to day instrument variation Looking for isobutyl benzene in the gasoline

Isobutyl Benzene Spectrum 9 8 7 Signal 6 5 4 3 4 6 8 4 6 Mass

PARAFAC Results for GC MS Data 6 x -9 5 8 Selected Analysis Region for PARAFAC 7 Profile 6 Qualitative Info 9 x -3 5 4 3 Peak 3 5 5 5 3 35 3.5 x -3 TIC Signal 4 3 Quantitative Info 3.5.5.5 Peak Sum 3 5 5 7 x -5 7.4 7.6 7.8 7. 7. 7.4 7.6 7.8 Retention Time, min Qualitative Info 6 5 4 3 Mass Spectrum 4 6 8 4 6

PARAFAC Results for GC MS Data Best match to isobutyl benzene Component Component Component 3.9.9.9.8.7.8.8 MV IBB =43 MV.7 IBB =788.7 MV IBB =9.6.5.4.3.. 4 6 8 4 6.6.5.4.3.. 4 6 8 4 6.6.5.4.3.. 4 6 8 4 6

Summary Mass spectral chromatographic data should be aligned An available alignment algorithm is able to align D and D data F Ratio methods can be used to improve classification PARAFAC can be effectively used to separate overlapped analytes