Calibration and Linear Regression Analysis: A Self-Guided Tutorial




Part 2: The Calibration Curve, Correlation Coefficient and Confidence Limits
CHM314 Instrumental Analysis, Department of Chemistry, University of Toronto
Dr. D. Stone (prepared by J. Ellis)

1 The Calibration Curve and Correlation Coefficient

Every instrument used in chemical analysis can be characterised by a specific response function, that is, an equation relating the instrument output signal (S) to the analyte concentration (C). This response function may be linear, logarithmic, exponential, or any other appropriate mathematical form, depending on the nature of the behaviour of the system being measured, and the measurement process itself. While this may be known theoretically, various factors (such as the specific analyte being measured, interference effects caused by other components of the sample matrix, or random experimental errors) require that we calibrate each instrument for the specific analyte and measurement conditions to be used in a particular experiment.

A calibration curve is an empirical equation that relates the response of a specific instrument to the concentration of a specific analyte in a specific sample matrix (the chemical background of the sample). As with the instrument response function, the calibration curve can have a number of mathematical forms, depending on the type of measurement being performed. Some common examples are listed below:

Type                          Equation
Linear (zero intercept)       S = bC
Linear (non-zero intercept)   S = bC + a
Logarithmic                   S = a + b ln C  or  S = a + 2.303 b log C

The calibration curve is obtained by fitting an appropriate equation to a set of experimental data (calibration data) consisting of the measured responses to known concentrations of analyte. For example, in molecular absorption spectroscopy, we expect the instrument response to follow the Beer-Lambert equation, A = εbc, and so we would fit a linear equation with zero intercept to the data. On the other hand, if we were measuring electrochemical cell potentials (i.e. potentiometry) we would expect the response to be given by the Nernst equation, which is logarithmic in form. We would therefore either fit a logarithmic equation to the calibration data, or linearise the data by calculating the signal response S as $10^{E}$ (where E is the cell potential).

The most common response function encountered in instrumental analytical chemistry is linear, so we require some means of determining and qualifying the best-fit straight line through our calibration data. Before discussing this in detail, however, a word of caution: even when we expect a linear instrument response function, we should not assume that the calibration data must always be linear. In fact, a moment of reflection reveals that we already know that this cannot be true. For example, stray light and polychromatic radiation cause non-linear deviations from Beer's law at higher concentrations; quenching and self-absorption can cause fluorescence intensities to start decreasing with increasing concentration; and column or detector overload can cause non-linearities in chromatography.
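As a rough illustration of the logarithmic entry in the table of calibration forms, the following Python sketch fits a straight line to the signal against ln C and against log C for a hypothetical, noise-free response. The parameter values, the concentration list and the helper name fit_line are assumptions made for this example and do not come from the tutorial. The two fitted slopes should differ by the factor ln 10 ≈ 2.303 that appears in the table.

```python
import math

def fit_line(x, y):
    """Least-squares slope b and intercept a for y = b*x + a."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return b, a

# Hypothetical, noise-free logarithmic response S = a + b ln C (assumed values).
a_true, b_true = 0.50, 0.025
conc = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
signal = [a_true + b_true * math.log(c) for c in conc]

# Linearise by transforming the concentration axis, then fit a straight line.
b_ln, a_ln = fit_line([math.log(c) for c in conc], signal)
b_log10, a_log10 = fit_line([math.log10(c) for c in conc], signal)

print(f"slope vs ln C    = {b_ln:.5f}")
print(f"slope vs log10 C = {b_log10:.5f}")
print(f"ratio of slopes  = {b_log10 / b_ln:.4f}")
```

Transforming the x-axis keeps the fit an ordinary straight-line regression, so everything that follows about R, slopes and intercepts applies unchanged to the transformed data.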

1.1 The Correlation Coefficient

In Part 1 of the tutorial, we saw how to use the trendline feature in Excel to fit a straight line through calibration data and obtain both the equation of the best-fit straight line and the correlation coefficient, R (sometimes displayed as R²). There are in fact various correlation coefficients, but the one we are interested in here is the Pearson or product-moment correlation coefficient (often simply referred to as the correlation coefficient). The Pearson R value provides a measure of the degree to which the values of x and y are linearly correlated. We can assess this visually using a scatter plot (Figure 1), in which we also mark the centroid of the data, $\{\bar{x}, \bar{y}\}$.

[Figure 1: XY scatter plot showing the centroid of the data, $\{\bar{x}, \bar{y}\}$.]

If x and y were linearly correlated, we would expect all the points to fall on a straight line passing through the centroid. As a result, we would expect all x values to be uniformly distributed either side of $\bar{x}$; similarly, all the y values should be uniformly distributed about $\bar{y}$. The Pearson R is calculated using the formula

$$ R = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$

It follows that if x and y are perfectly correlated in a linear fashion, we would expect the value of R to be either +1 or -1, depending on whether y increases (positive slope) or decreases (negative slope) with x. To demonstrate how to calculate this formula in Excel, we return to our previous example of fluorescence intensity data from Part 1. Then,

1. Set up a spreadsheet with the x and y values in columns.

2. In the adjacent cells, set up expressions for $(x - \bar{x})$, $(y - \bar{y})$, their squares, and their product. For instance, the formula for $(x - \bar{x})$ may look like =B3-AVERAGE(B$2:B$8), depending on the location of your cells in the spreadsheet.
3. Determine the sums of squares $\sum (x - \bar{x})^2$ and $\sum (y - \bar{y})^2$, and the sum of products $\sum [(x - \bar{x})(y - \bar{y})]$ in Excel and insert these values in the formula for R.
4. To calculate the square root in the denominator, use the SQRT function.

The easiest way to calculate R in Excel is by setting up a table to calculate the required values.

[Table: Excel worksheet with columns for x, y, the deviations from the means, their squares and their product.]

As you can see, this yields R² = 0.9978, so the data are well correlated and the best-fit line describes the data. A few points to mention regarding the correlation coefficient:

o It is essential to retain a large number of significant figures in the numerator and denominator during the calculation, otherwise a misleading value of R may be obtained.
o Even a high R value of, say, 0.9991 does not necessarily indicate that the data fit a straight line. The trendline should always be plotted and inspected visually. R² is more discriminating in this respect, although it no longer indicates the sign of the slope of the regression line. This, however, is evident by inspection.
o Any curvature in the data will result in erroneous conclusions about the correlation. R values are only applicable to linear correlations. Nonlinear correlations are possible, but involve a different measure than R, and R values will not necessarily be close to 1.
o The statistical significance of R depends on the number of samples in the data set, n.

1.2 The Regression Line

Calculation of the regression line is straightforward. The equation will have the form y = bx + a, where b is the slope of the line and a is the y-intercept. The slope is given by the formula

$$ b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} $$

and the intercept is

$$ a = \bar{y} - b\bar{x}, $$

both of which can be easily calculated in Excel with the table of data used in the previous section. The method is similar to that in the previous section. The AVERAGE function can be used to calculate $\bar{x}$ and $\bar{y}$. Using the fluorescence data, the equation of the line is y = 1.930x + 1.518.

Figure 2 shows an example of a regression line with the calibration data, centroid and y-residuals displayed. Note that, as is commonly the case, it is assumed that any error in the data lies solely in the y-values. Technically, the best-fit straight line shown is termed the line of regression of y on x. This method for linear regression assumes that the errors are normally distributed. Other methods exist that do not make this type of assumption.

[Figure 2: XY scatter plot showing the centroid (red circle), regression line, and y-residuals.]

Finally, it should be noted that errors in y values for large x values tend to distort or skew the best-fit line. This can be taken into account using either a weighted or robust regression technique. However, this is beyond the scope of the present tutorial.

2 Errors and Confidence Limits

In any area of measurement science, there is always some error in any signal. The error can arise from many sources, and can normally be accounted for using statistical techniques. However, because there is always some randomness associated with measurement error, it contributes some degree of uncertainty to the measurement, which corresponds to a certain confidence limit, within which we can be certain about the accuracy of our measurement. This leads to the way in which results are normally reported, where a measurement is reported with the error, such as C = 51. ±0.05 µg/mL. The ±0.05 is the standard error. When preparing a calibration curve, there is always some degree of uncertainty in the calibration equation.
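Before estimating these uncertainties, it may help to check the Section 1 quantities outside Excel. The Python sketch below computes the Pearson R, the slope b and the intercept a from the sums of squares and products, following the same spreadsheet steps. The x and y values are an assumption: the tutorial's data table is not reproduced in this text, so illustrative values were chosen that give a line close to y = 1.930x + 1.518.

```python
import math

# Assumed illustrative data (not taken from the tutorial's own table):
# x = concentration in pg/mL, y = fluorescence intensity (arbitrary units).
x = [0, 2, 4, 6, 8, 10, 12]
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]

n = len(x)
x_mean = sum(x) / n          # centroid, x coordinate
y_mean = sum(y) / n          # centroid, y coordinate

# Sums of squares and sum of products of the deviations from the centroid.
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
b = sxy / sxx                    # slope
a = y_mean - b * x_mean          # intercept

print(f"R = {r:.4f}, R^2 = {r * r:.4f}")
print(f"y = {b:.3f}x + {a:.3f}")
```

Keeping the three sums as separate variables mirrors the spreadsheet columns, which makes it easy to compare intermediate values with Excel if the two results disagree.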
To calculate the standard errors of the slope and the y-intercept, we require the residuals. The residual is the difference between the measured y-value and the y-value calculated from the calibration curve,

for a given observation. The calculated y-value is easily determined from the calibration equation and denoted $\hat{y}$, so the residual would be $(y - \hat{y})$.

Once the residuals are known, we can calculate the standard deviation in the y-direction, which estimates the random errors in the y-direction:

$$ s_{y/x} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} $$

This standard deviation can be used to calculate the standard deviations of the slope and the y-intercept using the formulas

$$ s_b = \frac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}} \qquad s_a = s_{y/x} \sqrt{\frac{\sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}} $$

where $s_b$ is the standard deviation of the slope and $s_a$ is the standard deviation of the y-intercept. The confidence limits can then be calculated from the t-statistic for n-2 degrees of freedom. Tables of t-statistics are available in any undergraduate statistics textbook, and are also included in the lab manual. Note that some tables give values of t for different values of n, while others give them for values of ν = n-1. Check carefully so that you use the appropriate value. The confidence limits for the slope are then $b \pm t_{n-2} s_b$ and for the y-intercept $a \pm t_{n-2} s_a$. For a large number of samples with a 99% confidence interval, we can use $t_{n-2}$ = 2.58. For the fluorescence data, the standard deviation of the slope is $s_b$ = 0.0409, so the slope with confidence interval is b = 1.93 ± (2.58 × 0.0409) = 1.93 ± 0.11. The y-intercept with confidence interval is a = 1.52 ± 0.76. (With only a few calibration points the large-sample value is not strictly appropriate; the tabulated 95% value for 5 degrees of freedom, t = 2.57, happens to give almost the same limits.)

2.1 Random Error and Calculation of Concentration from the Calibration Curve: No Replication, Interpolated Value

Once we know the equation of the regression line, we can easily calculate the concentration $x_0$ from a given signal $y_0$. However, because we are now going from a y-value to an x-value (instead of the other way around), we need to find the error in x. This can be done with the standard deviation in $x_0$,

$$ s_{x_0} = \frac{s_{y/x}}{b} \sqrt{1 + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum_i (x_i - \bar{x})^2}} $$

Here, $y_0$ is the experimental signal from the instrument for which $x_0$ is to be determined, and n is the number of calibration points.
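The standard deviations defined in this section can be sketched in Python as follows. The calibration data are assumed illustrative values (the tutorial's own table is not reproduced in this text), and the function name s_x0 with its optional argument m (the number of replicate readings, which enters the replication formula in the later section on replication) is a choice made for this example. The t value is simply the one quoted in the text; look up the value that matches your own degrees of freedom.

```python
import math

# Assumed illustrative calibration data (not from the tutorial's own table).
x = [0, 2, 4, 6, 8, 10, 12]                  # concentration, pg/mL
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]  # fluorescence intensity

n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_mean - b * x_mean

# Residual standard deviation s_y/x with n - 2 degrees of freedom.
residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard deviations of the slope and the intercept.
s_b = s_yx / math.sqrt(sxx)
s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))

def s_x0(y0, m=1):
    """Standard deviation of the concentration interpolated from signal y0.

    m is the number of replicate readings averaged to give y0 (m=1: none).
    """
    spread = (y0 - y_mean) ** 2 / (b ** 2 * sxx)
    return (s_yx / b) * math.sqrt(1 / m + 1 / n + spread)

t = 2.58  # value quoted in the text (assumed here, not computed)
print(f"s_y/x = {s_yx:.4f}")
print(f"b = {b:.2f} +/- {t * s_b:.2f}")
print(f"a = {a:.2f} +/- {t * s_a:.2f}")

y0 = 2.9                      # sample signal used in the worked example
x0 = (y0 - a) / b             # interpolated concentration
print(f"x0 = {x0:.2f}, s_x0 = {s_x0(y0):.2f}")
```

Because the last term grows with the distance of $y_0$ from $\bar{y}$, the uncertainty is smallest for signals near the centroid of the calibration data.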
The expression for $s_{x_0}$ given above applies only if there is no replication of each measurement. To calculate the concentration of a sample where the fluorescence intensity is 2.9,

1. Use the calibration equation determined previously, y = 1.930x + 1.518, with $y_0$ = 2.9, giving $x_0$ = 0.72 pg mL⁻¹.
2. Calculate the standard deviation $s_{x_0}$ using the equation above. For n = 7, $s_{y/x}$ = 0.4329, and b = 1.93, we obtain $s_{x_0}$ = 0.26, where the uncertainty is expressed as $s_{x_0}$.

3. Obtain a 95% confidence interval in the interpolated concentration by determining the two-tailed t-statistic for n-2 degrees of freedom. It is important to note that a two-tailed test is required for the interpolated results (n-2 d.o.f.), compared to the one-tailed test for the mean. From a table of t-values, for ν = n-2 = 5, $t_5$ = 2.57. The interpolated concentration with 95% confidence interval is then reported as $C = x_0 \pm t_\nu s_{x_0}$ = 0.72 ± (2.57 × 0.26) = 0.72 ± 0.67 pg mL⁻¹.

2.2 Random Error and Calculation of Concentration from the Calibration Curve: With Replication, Interpolated Value

When you perform a sample measurement, you would normally perform more than one measurement of each sample, which is called replication. Replication is important in the statistical determination of your answer, in order to reduce the uncertainty and improve the precision of your measurement. Random fluctuations, which occur in any system, can lead to small errors in each measurement. By performing replications at each measurement, some or most of the error due to random fluctuations can be averaged out. If replications are performed, the formula in the previous section must be modified to account for the extra degrees of freedom, as a result of the extra measurements. The formula for the standard deviation in $x_0$ with m replications is

$$ s_{x_0,r} = \frac{s_{y/x}}{b} \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum_i (x_i - \bar{x})^2}} $$

where the variables are the same as before and $y_0$ is now the mean of the m replicate readings. When working with a calibration curve with n measurements and a sample measurement $y_0$, the concentration with error as read from the calibration curve is $x_0 \pm s_{x_0,r}$.

2.3 Random Error and Calculation of Concentration from the Calibration Curve: No Replication, Extrapolated Value

In some cases, the measurement value for the sample will be outside the measured range of your calibration curve.
While this situation is not desirable, due to the possibility of nonlinear effects outside the measurement range, it is sometimes unavoidable, and the results can still be used. All this requires is knowledge of a different way to calculate the standard deviation for extrapolation,

$$ s_{x_E} = \frac{s_{y/x}}{b} \sqrt{\frac{1}{n} + \frac{\bar{y}^2}{b^2 \sum_i (x_i - \bar{x})^2}} $$

where n is the number of calibration values. The differences between this equation and the previous ones are that replications are not taken into account, and that $y_0$ = 0, which is why only $\bar{y}^2$ appears in the numerator inside the square root. $y_0$ is shifted to the x-axis, and all calibration values are calculated from there. The reported sample concentration is then $x_E \pm s_{x_E}$.

2.4 Limits of Detection

As mentioned above, there is always some error associated with any instrumental measurement. This also applies to the baseline (or background or blank) measurement, i.e. the signal obtained when no analyte is present. One very important determination that must therefore be made is how large a signal needs to be before it can be distinguished from the background noise associated with the instrumental measurement. Various criteria have been applied to this determination; however, the generally accepted rule in analytical chemistry is that the signal must be at least three times greater than the background noise.

Formally, then, the limit of detection (lod) is defined as the concentration of analyte required to give a signal equal to the background (blank) plus three times the standard deviation of the blank. That is, we first calculate the signal at the limit of detection from the instrument response obtained with no analyte:

$$ y_{lod} = y_{blank} + 3 s_{blank} $$

and convert that value into the limit of detection by interpolation using the calibration equation. Where no blank has been measured, we can use the calibration data and regression statistics instead. In this case, we would use the y-intercept and standard deviation of the regression:

$$ y_{lod} = a + 3 s_{y/x} $$

Again, the actual limit of detection is the concentration of analyte giving rise to this value. We can therefore obtain the confidence interval for the limit of detection in the same way as for any interpolated value as shown above. When performing a calibration, you should always determine and report the lod from your calibration data, in addition to the regression statistics outlined above.

The lod represents the level below which we cannot be confident whether or not the analyte is actually present. It follows from this that no analytical method can ever conclusively prove that a particular chemical substance is not present in a sample, only that it cannot be detected. In other words, there is no such thing as a zero concentration!
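The two routes to the limit of detection can be sketched in Python as shown here. This is only a minimal illustration: the function names and the calibration data are assumptions made for the example (the tutorial's own data table is not reproduced in this text), and the signal at the lod is converted to a concentration with the fitted line y = bx + a.

```python
import math

def regression_stats(x, y):
    """Return slope b, intercept a and residual standard deviation s_y/x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return b, a, math.sqrt(ss_res / (n - 2))

def lod_from_blank(y_blank, s_blank, a, b):
    """lod when a blank was measured: y_lod = y_blank + 3 s_blank."""
    y_lod = y_blank + 3 * s_blank
    return (y_lod - a) / b

def lod_from_regression(a, b, s_yx):
    """lod from regression statistics only: y_lod = a + 3 s_y/x."""
    y_lod = a + 3 * s_yx
    return (y_lod - a) / b

# Assumed illustrative calibration data (not from the tutorial's own table).
x = [0, 2, 4, 6, 8, 10, 12]                  # concentration, pg/mL
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]  # fluorescence intensity

b, a, s_yx = regression_stats(x, y)
lod = lod_from_regression(a, b, s_yx)
print(f"y_lod = {a + 3 * s_yx:.2f}, lod = {lod:.2f} pg/mL")
```

When the regression route is used, the intercept cancels and the lod reduces to 3 s_y/x / b, so a steeper calibration slope or a smaller residual scatter both lower the detection limit.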