O2PLS for improved analysis and visualization of complex data



Similar documents
Multivariate Chemometric and Statistic Software Role in Process Analytical Technology

Chemometric Analysis for Spectroscopy

NIRCal Software data sheet

Multi- and Megavariate Data Analysis

Partial Least Squares (PLS) Regression.

Multivariate Tools for Modern Pharmaceutical Control FDA Perspective

SIMCA 14 MASTER YOUR DATA SIMCA THE STANDARD IN MULTIVARIATE DATA ANALYSIS

An Introduction to Partial Least Squares Regression

Asian Journal of Food and Agro-Industry ISSN Available online at

Fundamentals of modern UV-visible spectroscopy. Presentation Materials

RAPID MARKER IDENTIFICATION AND CHARACTERISATION OF ESSENTIAL OILS USING A CHEMOMETRIC APROACH

Austin Peay State University Department of Chemistry Chem The Use of the Spectrophotometer and Beer's Law

Applications of Near Infrared Spectroscopic Analysis in the Food Industry and. Research

Near Infrared Transmission Spectroscopy in the Food Industry

Application of Automated Data Collection to Surface-Enhanced Raman Scattering (SERS)

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Application Note. The Optimization of Injection Molding Processes Using Design of Experiments

Gas emission measurements with a FTIR gas analyzer - verification of the analysis method Kari Pieniniemi 1 * and Ulla Lassi 1, 2

Environmental Remote Sensing GEOG 2021

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

A Streamlined Workflow for Untargeted Metabolomics

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Dimensionality Reduction: Principal Components Analysis

Qualitative NIR Analysis for Ingredients in the Baking Industry

Monograph. NIR Spectroscopy. A guide to near-infrared spectroscopic analysis of industrial manufacturing processes.

Calculation of Minimum Distances. Minimum Distance to Means. Σi i = 1

Validation and Calibration. Definitions and Terminology

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Reversed Phase High Presssure Liquid Chromatograhphic Technique for Determination of Sodium Alginate from Oral Suspension

Industrial Process Monitoring Requires Rugged AOTF Tools

Back to Basics Fundamentals of Polymer Analysis

Factor Analysis. Sample StatFolio: factor analysis.sgp

Chemistry 111 Lab: Intro to Spectrophotometry Page E-1

Data Mining: Algorithms and Applications Matrix Math Review

Digital Remote Sensing Data Processing Digital Remote Sensing Data Processing and Analysis: An Introduction and Analysis: An Introduction

NIR Chemical Imaging as a Process Analytical Tool

MarkerView Software for Metabolomic and Biomarker Profiling Analysis

Principal Component Analysis

Analysing Questionnaires using Minitab (for SPSS queries contact -)

5.33 Lecture Notes: Introduction to Spectroscopy

A Beer s Law Experiment

STABILITY TESTING: PHOTOSTABILITY TESTING OF NEW DRUG SUBSTANCES AND PRODUCTS

FT-NIR for Online Analysis in Polyol Production

Monitoring chemical processes for early fault detection using multivariate data analysis methods

Partial Least Square Regression PLS-Regression

Introduction to Principal Components and FactorAnalysis

Copyright PEOPLECERT Int. Ltd and IASSC

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Design of Experiments for Analytical Method Development and Validation

Analyze IQ Spectra Manager Version 1.2

1 st day Basic Training Course

ANEXO VI. Near infrared transflectance spectroscopy. Determination of dexketoprofen in a hydrogel. M. Blanco y M. A. Romero

Spectrophotometry and the Beer-Lambert Law: An Important Analytical Technique in Chemistry

Simple Linear Regression Inference

Electromagnetic Radiation (EMR) and Remote Sensing

Scatter Plots with Error Bars

In this column installment, we present results of tablet. Raman Microscopy for Detecting Counterfeit Drugs A Study of the Tablets Versus the Packaging

EDXRF of Used Automotive Catalytic Converters

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables.

Analytical Chemistry Lab Reports

Data representation and analysis in Excel

How to report the percentage of explained common variance in exploratory factor analysis

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

OLIVÉR BÁNHIDI 1. Introduction

Reflectance Measurements of Materials Used in the Solar Industry. Selecting the Appropriate Accessories for UV/Vis/NIR Measurements.

2 Absorbing Solar Energy

A Comparison of Variable Selection Techniques for Credit Scoring

Chemistry 111 Laboratory Experiment 7: Determination of Reaction Stoichiometry and Chemical Equilibrium

Introduction to X-Ray Powder Diffraction Data Analysis

Validation of measurement procedures

8.1 Summary and conclusions 8.2 Implications

Chapter 5 -- The Spectrophotometric Determination of the ph of a Buffer. NAME: Lab Section: Date: Sign-Off:

LED Lighting - Error Consideration for Illuminance Measurement

Calibration of the MASS time constant by simulation

Department of Engineering Enzo Ferrari University of Modena and Reggio Emilia

HPLC Analysis of Acetaminophen Tablets with Waters Alliance and Agilent Supplies

EXPERIMENT 11 UV/VIS Spectroscopy and Spectrophotometry: Spectrophotometric Analysis of Potassium Permanganate Solutions.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Ultra-low Sulfur Diesel Classification with Near-Infrared Spectroscopy and Partial Least Squares

Lab #11: Determination of a Chemical Equilibrium Constant

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

ON-STREAM XRF ANALYSIS OF HEAVY METALS AT PPM CONCENTRATIONS

Simple linear regression

Review Jeopardy. Blue vs. Orange. Review Jeopardy

INFRARED SPECTROSCOPY (IR)

STA 4273H: Statistical Machine Learning

Upon completion of this lab, the student will be able to:

Process Analytical Technology (PAT) Capabilities and Implementations under QbD Principles QbD and PAT Department

QUANTITATIVE INFRARED SPECTROSCOPY. Willard et. al. Instrumental Methods of Analysis, 7th edition, Wadsworth Publishing Co., Belmont, CA 1988, Ch 11.

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

Alignment and Preprocessing for Data Analysis

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Functional Data Analysis of MALDI TOF Protein Spectra

An Innovative Method for Dead Time Correction in Nuclear Spectroscopy

AP CHEMISTRY 2006 SCORING GUIDELINES (Form B)

Transcription:

O2PLS for improved analysis and visualization of complex data Lennart Eriksson 1, Svante Wold 2 and Johan Trygg 3 1 Umetrics AB, POB 7960, SE-907 19 Umeå, Sweden, lennart.eriksson@umetrics.com 2 Umetrics Inc., 17 Kiel Ave., Kinnelon, NJ 07405, USA, svante.wold@umetrics.com 3 Institute of Chemistry, Umeå University, SE-901 87 Umeå, Sweden, johan.trygg@chem.umu.se Keywords: O2PLS, predictive components, Y-orthogonal variability, X-orthogonal variability. 1 Introduction O2PLS is a generalization of PLS and OPLS [1-5]. In contrast to PLS and OPLS, it is bidirectional, i.e. X Y; therefore X can be used to predict Y, and Y can be used to predict X. O2PLS allows the partitioning of the systematic variability in X and Y into three parts: the X/Y joint predictive variation; the Y-orthogonal variation in X; and the X- unrelated variation in Y (Figure 1). Figure 1. Overview of the O2PLS model relating two data tables to each other. 2 Theory The O2PLS model can be written as: Model of X: X = T P P P + T O P O + E (1) Model of Y: Y = U P Q P + U O Q O+ F (2) where a linear relationship exists between T P and U P and the score vectors in T P and T O are mutually orthogonal. The number of components in the respective set of components is determined using cross-validation. 3 Material and methods The application data set contains quantitative data for five different types of carrageenans derived from their NIR, IR and Raman spectra. Carrageenans are polysaccharides that are extracted from seaweed and used as gelling and thickening agents in a wide range of industries, including food, pharmaceuticals and cosmetics. Many different types of carrageenans exist, each having different gelling and thickening properties. The raw material (seaweed) contains a mixture of carrageenan types and hence the final commercial product is also a mixture of types. It is imperative that the industry know the composition of specific products in order to target appropriate applications areas or, if necessary, perform chemical modification prior to release. The proportions of five different types of carrageenans (Lambda, Kappa, Iota, Mu and Nu carrageenans) were varied in this work using a five-component mixture design in six levels [6]. This produced a data set containing 128 samples [6],

sampled over five days. For each sample, NIR (1100 2500 nm; 699 variables), IR (550 4000 cm-1; 662 variables) and Raman (3600 200 cm -1 ; 3401 variables) spectra were acquired. These spectra are treated as three separate blocks of data. The relationship between these blocks and the proportions of the five carrageenan types in each mixture sample, is reported elsewhere [7]. The objective of this work is to investigate the information overlap between the three blocks of spectral data using O2PLS. Additionally, these analyses will reveal spectral variability unique to each method. 4 Results and discussion The blocks NIR, IR and Raman data were analyzed in a pair-wise fashion using O2PLS as implemented in SIMCA-P + version 12. NIR data (used as the X-block) and the IR data (used as the Y-block) were contrasted, and the results are given below. The O2PLS model obtained was a 6 + 1 + 3 model (Figure 2). The notation should be viewed as 6 predictive components taking care of the joint NIR/IR variation, 1 Y-orthogonal component expressing the variability in the NIR data that is not present in the IR data, and 3 X-unrelated components representing the variability in the IR data that is not available in the NIR data. Figure 2. Model summary of the O2PLS model relating the NIR data to the IR data. The results in Figure 2 show a high degree of information overlap between the NIR and IR data. Only 1.4% of the variability in the NIR data is orthogonal (unrelated) to the IR data while the analogous fraction in the IR data that is unrelated to the NIR data is larger, i.e. 17.8%. Interpretation of the joint X/Y co-variation (the information overlap between the NIR and IR data): The score plot (Figure 3) shows the data structure captured by the first two predictive components. The triangular distribution of the 128 samples is due to the underlying mixture design. This distribution indicates that the information overlap between the NIR and the IR data is affected by the systematically changing nature of the samples.

Figure 3. Scatter plot of the scores of the first two predictive components of the O2PLS model. Each point is one sample. The samples are colored according to the content of the Iota-type carrageenan constituent. Interpretation of the Y-orthogonal variation in X (variation unique to the NIR data): The variability in the NIR data that is orthogonal to the IR data amounts to 1.4%. An examination of the single score vector belonging to this compartment of the O2PLS model shows no apparent trend or grouping in the data. This suggests that the Y-orthogonal variation is spread in a similar fashion over the five sampling days. Figure 4 shows the loading spectrum of this component. The non-correlating variability mainly resides in the wavelength regions 1700-1860nm, 1950nm, and 2220-2350nm, whereas the region between 1100-1700nm contains comparatively little variability unique to the NIR data. Figure 4. Loading spectrum of the single Y-orthogonal component. This plot highlights which spectral regions in the NIR data contain variability that is not present in the IR data. Interpretation of the X-unrelated variability in Y (variation unique to the IR data): The fraction of variability unique to the IR data is 17.8%. It is expressed by the three X-unrelated O2PLS components. A plot of the scores of the first of these components (Figure 5) shows a systematic shift among the data measured early in the sampling. This variability corresponds to 9.4% of the total variance explained and hence cannot be neglected. Such systematic differences are not seen in the NIR data for the same samples. The corresponding loading spectrum points to the wavelnumber regions that capture this structure (Figure 6).

Figure 5. Score plot of the first X-unrelated O2PLS component. The horizontal lines indicate the shift in sample properties that takes place when going from day 1 to day 2, and from day 2 to 3. Figure 6. Loading spectrum of the first X-unrelated component. This plot highlights which spectral regions in the IR data contain variability that is not present in the NIR-data, i.e., where the deviating properties of the day 2 samples are seen. 5 Conclusion The main advantage of the O2PLS method is that it simplifies the analysis, visualization and interpretation of complex, multi-block data sets by producing more informative plots than the conventional PLS method. This makes O2PLS highly useful for the treatment of analytical and bioanalytical chemical data, e.g. for comparing and contrasting blocks of data compiled using different spectral methods or different omics platforms (microarray data, electrophoresis data, etc.). O2PLS is especially useful in calibration transfer applications where the method helps to expose systematic differences between analytical instruments. O2PLS is also useful in differentiating subject responses before and after a given treatment.

6 References [1] Trygg, J., and Wold, S., Orthogonal Projections to Latent Structures (OPLS), Journal of Chemometrics, 16, 119-128, 2002. [2] Trygg, J., Prediction and Spectral Profile Estimation in Multivariate Calibration, Journal of Chemometrics, 18, 166-172, 2004. [3] Eriksson, L., Johansson, E., Kettaneh-Wold, N., Trygg, J., Wikström, M., and Wold, S., Multi- and Megavariate Data Analysis, Part II, Method Extensions and Advanced Applications, Chapter 23, Umetrics Academy, 2005. [4] Trygg, J., O2-PLS for Qualitative and Quantitative Analysis in Multivariate Calibration, Journal of Chemometrics, 16, 283-293, 2002. [5] Trygg, J., and Wold, S., O2-PLS, a Two-Block (X-Y) Latent Variable Regression (LVR) Method With an Integral OSC Filter, Journal of Chemometrics, 17, 53-64, 2003. [6] Dyrby, M., Petersen, R.V., Larsen, J., Rudolf, B., Nørgaard, L., Engelsen, S.B., Towards on-line monitoring of the composition of commercial Carrageenan powders, Carbohydrate Polymers, 57, 337-348, 2004. [7] Eriksson, L., Dyrby, M., Trygg, J., and Wold, S., Separating Y-predictive and Y-orthogonal variation in multi-block spectral data, Journal of Chemometrics, 20, 352-361, 2006.