Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Part 2: The Calibration Curve, Correlation Coefficient and Confidence Limits
CHM314 Instrumental Analysis
Department of Chemistry, University of Toronto
Dr. D. Stone (prepared by J. Ellis)

1 The Calibration Curve and Correlation Coefficient

Every instrument used in chemical analysis can be characterised by a specific response function, that is, an equation relating the instrument output signal (S) to the analyte concentration (C). This response function may be linear, logarithmic, exponential, or any other appropriate mathematical form, depending on the nature of the behaviour of the system being measured, and the measurement process itself. While this may be known theoretically, various factors (such as the specific analyte being measured, interference effects caused by other components of the sample matrix, or random experimental errors) require that we calibrate each instrument for the specific analyte and measurement conditions to be used in a particular experiment.

A calibration curve is an empirical equation that relates the response of a specific instrument to the concentration of a specific analyte in a specific sample matrix (the chemical background of the sample). As with the instrument response function, the calibration curve can have a number of mathematical forms, depending on the type of measurement being performed. Some common examples are listed below:

Type                          Equation
Linear (zero intercept)       S = bC
Linear (non-zero intercept)   S = bC + a
Logarithmic                   S = a + b ln C  or  S = a + 2.303b log C

The calibration curve is obtained by fitting an appropriate equation to a set of experimental data (calibration data) consisting of the measured responses to known concentrations of analyte. For example, in molecular absorption spectroscopy, we expect the instrument response to follow the Beer-Lambert equation, A = εbc, and so we would fit a linear equation with zero intercept to the data. On the other hand, if we were measuring electrochemical cell potentials (i.e. potentiometry) we would expect the response to be given by the Nernst equation, which is logarithmic in form. We would therefore either fit a logarithmic equation to the calibration data, or linearise the data by calculating the signal response S as 10^E (where E is the cell potential).

The most common response function encountered in instrumental analytical chemistry is linear, so we require some means of determining and qualifying the best-fit straight line through our calibration data. Before discussing this in detail, however, a word of caution: even when we expect a linear instrument response function, we should not assume that the calibration data must always be linear. In fact, a moment of reflection reveals that we already know that this cannot be true. For example, stray light and polychromatic radiation cause non-linear deviations from Beer's law at higher concentrations; quenching and self-absorption can cause fluorescence intensities to start decreasing with increasing concentration; and column- or detector-overload can cause non-linearities in chromatography.
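The calibration forms listed earlier in this section can each be fitted with a few lines of code. The following is a minimal Python sketch, not part of the original tutorial: the function names (`fit_through_origin`, `fit_line`, `fit_logarithmic`) and the data are hypothetical, and ordinary least squares is assumed throughout. The logarithmic form is handled by regressing S on ln C, which turns it into a straight-line problem.

```python
import math

def fit_through_origin(conc, signal):
    """Least-squares slope b for the zero-intercept form S = b*C."""
    return sum(c * s for c, s in zip(conc, signal)) / sum(c * c for c in conc)

def fit_line(x, y):
    """Least-squares slope b and intercept a for y = b*x + a."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    return b, y_mean - b * x_mean

def fit_logarithmic(conc, signal):
    """Fit S = a + b*ln(C) by regressing S on ln(C); returns (b, a)."""
    return fit_line([math.log(c) for c in conc], signal)

# Hypothetical, noise-free data generated from S = 0.5 + 2.0*ln(C).
conc = [1.0, 2.0, 5.0, 10.0, 20.0]
signal = [0.5 + 2.0 * math.log(c) for c in conc]

b, a = fit_logarithmic(conc, signal)
# ln C = 2.303 log C, so the same fit can be quoted in base-10 form.
print(f"S = {a:.3f} + {b:.3f} ln C = {a:.3f} + {2.303 * b:.3f} log C")
```

Because the example data are noise-free, the fit simply recovers the coefficients used to generate them; with real calibration data the residuals should still be plotted and inspected for curvature.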

1.1 The Correlation Coefficient

In Part 1 of the tutorial, we saw how to use the trendline feature in Excel to fit a straight line through calibration data and obtain both the equation of the best-fit straight line and the correlation coefficient, R (sometimes displayed as R²). There are in fact various correlation coefficients, but the one we are interested in here is the Pearson or product-moment correlation coefficient (often simply referred to as "the correlation coefficient"). The Pearson R value provides a measure of the degree to which the values of x and y are linearly correlated. We can assess this visually using a scatter plot (Figure 1), in which we also mark the centroid of the data, $(\bar{x}, \bar{y})$.

[Figure 1: XY scatter plot showing the centroid of the data.]

If x and y were linearly correlated, we would expect all the points to fall on a straight line passing through the centroid. As a result, we would expect all x values to be uniformly distributed either side of $\bar{x}$; similarly, all the y values should be uniformly distributed about $\bar{y}$. The Pearson R is calculated using the formula

$$R = \frac{\sum \left[ (x - \bar{x})(y - \bar{y}) \right]}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

It follows that if x and y are perfectly correlated in a linear fashion, we would expect the value of R to be either +1 or -1, depending on whether y increases (positive slope) or decreases (negative slope) with x. To demonstrate how to calculate this formula in Excel, we return to our previous example of fluorescence intensity data from Part 1. Then,

1. Set up a spreadsheet with the x and y values in columns.

2. In the adjacent cells, set up expressions for $(x - \bar{x})$, $(y - \bar{y})$, their squares, and their product. For instance, the formula for $(x - \bar{x})$ may look like =B3-AVERAGE(B$2:B$8), depending on the location of your cells in the spreadsheet.
3. Determine the sums of squares $\sum (x - \bar{x})^2$ and $\sum (y - \bar{y})^2$, and the sum of products $\sum [(x - \bar{x})(y - \bar{y})]$ in Excel and insert these values in the formula for R.
4. To calculate the square root in the denominator, use the SQRT function.

The easiest way to calculate R in Excel is by setting up a table to calculate the required values, as shown below. As you can see, this yields a correlation coefficient R = 0.9978, so the data are well-correlated and the best-fit line describes the data.

A few points to mention regarding the correlation coefficient:

- It is essential to retain a large number of significant figures in the numerator and denominator during the calculation, otherwise a misleading value of R may be obtained.
- Even a high R value of, say, 0.9991 does not necessarily indicate that the data fit a straight line. The trendline should always be plotted and inspected visually. R² is more discriminating in this respect, although it no longer indicates the sign of the slope of the regression line. This, however, is evident by inspection.
- Any curvature in the data will result in erroneous conclusions about the correlation. R values are only applicable to linear correlations. Nonlinear correlations are possible, but involve a different measure than R, and R values will not necessarily be close to 1.
- The statistical significance of R depends on n, the number of samples in the data set.

1.2 The Regression Line

Calculation of the regression line is straightforward. The equation will have the form y = bx + a, where b is the slope of the line and a is the y-intercept. The slope is given by the formula

$$b = \frac{\sum \left[ (x - \bar{x})(y - \bar{y}) \right]}{\sum (x - \bar{x})^2}$$

and the intercept is

$$a = \bar{y} - b\bar{x},$$

both of which can be easily calculated in Excel with the table of data used in the previous section. The method is similar to that in the previous section. The AVERAGE function can be used to calculate $\bar{x}$ and $\bar{y}$. Using the fluorescence data, the equation of the line is y = 1.930x + 1.518.

Figure 2 shows an example of a regression line with the calibration data, centroid and y-residuals displayed. Note that, as is commonly the case, it is assumed that any error in the data lies solely in the y-values. Technically, the best-fit straight line shown is termed the line of regression of y on x. This method for linear regression assumes that the errors are normally distributed. Other methods exist that do not make this type of assumption.

[Figure 2: XY scatter plot showing the centroid (red circle), regression line, and y-residuals.]

Finally, it should be noted that errors in y values for large x values tend to distort or skew the best-fit line. This can be taken into account using either a weighted or robust regression technique. However, this is beyond the scope of the present tutorial.

2 Errors and Confidence Limits

In any area of measurement science, there is always some error in any signal. The error can arise from many sources, and can normally be accounted for using statistical techniques. However, because there is always some randomness associated with measurement error, it introduces some degree of uncertainty into the measurement, which corresponds to a certain confidence limit, within which we can be confident about the accuracy of our measurement. This leads to the way in which results are normally reported, where a measurement is reported with the error, such as C = 51. ± 0.05 µg/ml. The ±0.05 is the standard error. When preparing a calibration curve, there is always some degree of uncertainty in the calibration equation.
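Before turning to that uncertainty, the spreadsheet calculations of Sections 1.1 and 1.2 can be cross-checked outside Excel. The sketch below is an illustration only: the function name `correlation_and_line` and the data are hypothetical (they are not the fluorescence data), and the code simply mirrors the table of deviations, squares and products described for the spreadsheet.

```python
import math

def correlation_and_line(x, y):
    """Return Pearson R, slope b, intercept a and the centroid of the data."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Deviations from the centroid, as in the adjacent spreadsheet columns.
    dx = [xi - x_mean for xi in x]
    dy = [yi - y_mean for yi in y]
    sxx = sum(d * d for d in dx)                 # sum of (x - x_mean)^2
    syy = sum(d * d for d in dy)                 # sum of (y - y_mean)^2
    sxy = sum(p * q for p, q in zip(dx, dy))     # sum of products
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return {"r": r, "slope": b, "intercept": a, "centroid": (x_mean, y_mean)}

# Hypothetical calibration data (concentration, signal).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]

result = correlation_and_line(x, y)
print(f"R = {result['r']:.4f}, R^2 = {result['r'] ** 2:.4f}")
print(f"y = {result['slope']:.3f}x + {result['intercept']:.3f}")
```

Keeping full floating-point precision until the final print follows the advice in Section 1.1 about retaining significant figures in the numerator and denominator of R.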
To calculate the standard errors of the slope and the y-intercept, we require the residuals. The residual is the difference between the measured y-value and the y-value calculated from the calibration curve, for a given observation. The calculated y-value is easily determined from the calibration equation and denoted $\hat{y}$, so the residual is $(y - \hat{y})$. Once the residuals are known, we can calculate the standard deviation in the y-direction, which estimates the random errors in the y-direction:

$$s_{y/x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

This standard deviation can be used to calculate the standard deviations of the slope and the y-intercept using the formulas

$$s_b = \frac{s_{y/x}}{\sqrt{\sum (x - \bar{x})^2}} \qquad s_a = s_{y/x} \sqrt{\frac{\sum x^2}{n \sum (x - \bar{x})^2}}$$

where $s_b$ is the standard deviation of the slope and $s_a$ is the standard deviation of the y-intercept. The confidence limits can then be calculated from the t-statistic for n - 2 degrees of freedom. Tables of t-statistics are available in any undergraduate statistics textbook, and are also included in the lab manual. Note that some tables give values of t for different values of n, while others give them for values of ν = n - 1. Check carefully so that you use the appropriate value. The confidence limits for the slope are then $b \pm t_{n-2} s_b$ and for the y-intercept $a \pm t_{n-2} s_a$. For a large number of samples with a 99% confidence interval, we can use $t_{n-2} = 2.58$. For the fluorescence data, the standard deviation of the slope is $s_b$ = 0.0409, so the slope with confidence interval is b = 1.93 ± (2.58 × 0.0409) = 1.93 ± 0.11. The y-intercept with confidence interval is a = 1.52 ± 0.76.

2.1 Random Error and Calculation of Concentration from the Calibration Curve: No Replication, Interpolated Value

Once we know the equation of the regression line, we can easily calculate the concentration $x_0$ from a given signal $y_0$. However, because we are now going from a y-value to an x-value (instead of the other way around), we need to find the error in x. This can be done with the standard deviation in $x_0$:

$$s_{x_0} = \frac{s_{y/x}}{b} \sqrt{1 + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum (x - \bar{x})^2}}$$

Here, $y_0$ is the experimental signal from the instrument for which $x_0$ is to be determined, and n is the number of samples.
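The formulas for $s_{y/x}$, $s_b$, $s_a$ and $s_{x_0}$ chain together naturally, so a short Python sketch may help show the order of the calculation. It is an illustration under stated assumptions rather than the tutorial's own method: the names `Fit`, `fit_with_errors` and `interpolate` are hypothetical, the seven data points are invented, and the t-value is typed in from a table rather than computed.

```python
import math
from dataclasses import dataclass

@dataclass
class Fit:
    n: int
    slope: float        # b
    intercept: float    # a
    y_mean: float
    sxx: float          # sum of (x - x_mean)^2
    s_yx: float         # standard deviation of the y-residuals
    s_slope: float      # s_b
    s_intercept: float  # s_a

def fit_with_errors(x, y):
    """Least-squares line of y on x with the standard deviations defined above."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residuals: measured y minus the y predicted by the calibration line.
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))
    s_b = s_yx / math.sqrt(sxx)
    s_a = s_yx * math.sqrt(sum(xi * xi for xi in x) / (n * sxx))
    return Fit(n, b, a, y_mean, sxx, s_yx, s_b, s_a)

def interpolate(fit, y0):
    """Concentration x0 and its standard deviation for one unreplicated reading y0."""
    x0 = (y0 - fit.intercept) / fit.slope
    s_x0 = (fit.s_yx / fit.slope) * math.sqrt(
        1 + 1 / fit.n + (y0 - fit.y_mean) ** 2 / (fit.slope ** 2 * fit.sxx)
    )
    return x0, s_x0

# Hypothetical calibration data with n = 7 points.
x = [0, 2, 4, 6, 8, 10, 12]
y = [1.0, 3.2, 4.9, 7.1, 9.0, 10.8, 13.0]
t_95 = 2.57  # two-tailed 95% t-value for n - 2 = 5 degrees of freedom, from a table

fit = fit_with_errors(x, y)
print(f"b = {fit.slope:.3f} ± {t_95 * fit.s_slope:.3f}")
print(f"a = {fit.intercept:.3f} ± {t_95 * fit.s_intercept:.3f}")

x0, s_x0 = interpolate(fit, 6.0)  # hypothetical sample signal
print(f"x0 = {x0:.2f} ± {t_95 * s_x0:.2f} (95% confidence)")
```

The t-value has to match both the confidence level and the degrees of freedom, which is why it is passed in by hand here instead of being hard-wired into the functions.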
The formula for $s_{x_0}$ above applies only if there is no replication of each measurement. To calculate the concentration of a sample where the fluorescence intensity is 2.9,

1. Use the calibration equation determined previously, y = 1.930x + 1.518, with $y_0$ = 2.9, giving $x_0$ = 0.72 pg ml⁻¹.
2. Calculate the standard deviation $s_{x_0}$ using the equation above. For n = 7, $s_{y/x}$ = 0.4329, and b = 1.93, we obtain $s_{x_0}$ = 0.26, where the uncertainty is expressed as $s_{x_0}$.
3. Obtain a 95% confidence interval in the interpolated concentration by determining the two-tailed t-statistic for n - 2 degrees of freedom. It is important to note that a two-tailed test is required for the interpolated results (n - 2 d.o.f.), compared to the one-tailed test for the mean. From a table of t-values, for ν = n - 2 = 5, $t_5$ = 2.57. The interpolated concentration with 95% confidence interval is then reported as C = $x_0 \pm t_\nu s_{x_0}$ = 0.72 ± 0.6 pg ml⁻¹.

2.2 Random Error and Calculation of Concentration from the Calibration Curve: With Replication, Interpolated Value

When you perform a sample measurement, you would normally perform more than one measurement of each sample, which is called replication. Replication is important in the statistical determination of your answer, in order to reduce the uncertainty and improve the accuracy of your measurement. Random fluctuations, which occur in any system, can lead to small errors in each measurement. By performing replications at each measurement, some or most of the error due to random fluctuations can be averaged out. If replications are performed, the formula in the previous section must be modified to account for the extra degrees of freedom that result from the extra measurements. The formula for the standard deviation in $x_0$ with m replications is

$$s_{x_0,r} = \frac{s_{y/x}}{b} \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum (x - \bar{x})^2}}$$

where the variables are the same as before. When working with a calibration curve with n measurements and a sample measurement $y_0$, the concentration with error as read from the calibration curve is $x_0 \pm s_{x_0,r}$.

2.3 Random Error and Calculation of Concentration from the Calibration Curve: No Replication, Extrapolated Value

In some cases, the measurement value for the sample will be outside the measured range of your calibration curve.
While this situation is not desirable, due to the possibility of nonlinear effects outside the measurement range, it is sometimes unavoidable, and the results can still be used! All this requires is knowledge of a different way to calculate the standard deviation for extrapolation,

$$s_{x_E} = \frac{s_{y/x}}{b} \sqrt{\frac{1}{n} + \frac{\bar{y}^2}{b^2 \sum (x - \bar{x})^2}}$$

where n is the number of calibration values. The differences between this equation and the previous ones are that replications are not taken into account, and that $y_0$ = 0, so the numerator inside the square root, $(y_0 - \bar{y})^2$, reduces to $\bar{y}^2$. $y_0$ is shifted to the x-axis, and all calibration values are calculated from there. The reported sample concentration is then $x_E \pm s_{x_E}$.

2.4 Limits of Detection

As mentioned above, there is always some error associated with any instrumental measurement. This also applies to the baseline (or background or blank) measurement, i.e. the signal obtained when no analyte is present. One very important determination that must therefore be made is how large a signal needs to be before it can be distinguished from the background noise associated with the instrumental measurement. Various criteria have been applied to this determination; however, the generally accepted rule in analytical chemistry is that the signal must be at least three times greater than the background noise.

Formally, then, the limit-of-detection (lod) is defined as the concentration of analyte required to give a signal equal to the background (blank) plus three times the standard deviation of the blank. That is, we first calculate the instrument response at the limit-of-detection from the blank measurements:

$$y_{lod} = y_{blank} + 3 s_{blank}$$

and convert that value into the limit-of-detection by interpolation using the calibration equation. Where no blank has been measured, we can use the calibration data and regression statistics instead. In this case, we would use the y-intercept and standard deviation of the regression:

$$y_{lod} = a + 3 s_{y/x}$$

Again, the actual limit-of-detection is the concentration of analyte giving rise to this value. We can therefore obtain the confidence interval for the limit-of-detection in the same way as for any interpolated value as shown above.

When performing a calibration, you should always determine and report the lod from your calibration data, in addition to the regression statistics outlined above. The lod represents the level below which we cannot be confident whether or not the analyte is actually present. It follows from this that no analytical method can ever conclusively prove that a particular chemical substance is not present in a sample, only that it cannot be detected. In other words, there is no such thing as a zero concentration!
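The two routes to the limit-of-detection described above differ only in where the signal threshold comes from, as the following minimal Python sketch illustrates. The function names are hypothetical. The slope and intercept are those of the fluorescence calibration line quoted in the tutorial (y = 1.930x + 1.518), while the blank readings and the value of $s_{y/x}$ are invented purely for illustration.

```python
def lod_from_blank(y_blank, s_blank, slope, intercept):
    """LOD from blank readings: threshold signal is y_blank + 3*s_blank."""
    y_lod = y_blank + 3.0 * s_blank
    # Convert the threshold signal to a concentration with the calibration line.
    return (y_lod - intercept) / slope

def lod_from_regression(s_yx, slope):
    """LOD without a blank: y_lod = a + 3*s_yx, so x_lod = (y_lod - a) / b = 3*s_yx / b."""
    return 3.0 * s_yx / slope

slope, intercept = 1.930, 1.518   # calibration line quoted in the tutorial
y_blank, s_blank = 1.6, 0.4       # hypothetical blank mean and standard deviation
s_yx = 0.4                        # hypothetical standard deviation of the regression

print(f"lod (blank route):      {lod_from_blank(y_blank, s_blank, slope, intercept):.2f}")
print(f"lod (regression route): {lod_from_regression(s_yx, slope):.2f}")
```

In the regression route the intercept cancels, so the lod depends only on the scatter about the line and the sensitivity (slope) of the method.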