Evaluation of Quantitative Data (errors/statistical analysis/propagation of error)

1. INTRODUCTION

Laboratory work in chemistry can be divided into the general categories of qualitative studies and quantitative studies. Qualitative studies include such areas as chemical synthesis, chemical identification, mixture separation and structure elucidation. Quantitative studies include such areas as quantitative chemical analysis, physical property determinations and reaction rate studies.

Qualitative work is seldom completely devoid of quantitative data but usually the required level of precision and accuracy is much different in qualitative studies than in quantitative studies. For example, there is a significant difference in the precision and accuracy requirements in using an approximate melting point to identify a substance and precisely measuring the melting point of a substance for inclusion in a reference book. In the first-semester physical chemistry laboratory, we will attempt to make precision measurements like those suitable for publication.

In any quantitative experiment measurement errors or uncertainties exist which can be reduced by improvement in techniques and methods but can never be completely eliminated. The magnitude of these errors must be estimated to establish the reliability (accuracy) of the results. The assessment of the reliability of results is a very important part of any quantitative experiment. This assessment involves an understanding of the sources of error, an estimation of the magnitude of each error and the application of statistical and/or propagation of error methods to help establish the reliability of the final result.

2. TYPES of ERRORS

The term "error" is defined as the difference between a measured, calculated or observed value and the "true" or accepted value. Often the "true" or accepted value is not known and we must use the data to determine how much confidence we can place in the experimental results.
Errors can generally be classified as (a) random or indeterminate, (b) systematic or determinate, or (c) gross or illegitimate.

Gross or illegitimate errors are the result of a mistake. The mistake may be in the recording of a measurement or instrument reading, it may be a miscalculation, or it may lie in the model, or choice of model, that you are using to guide your experiment. Recording errors and calculation errors are usually easy to spot because they produce a data point which is obviously inconsistent with other data from the same experiment (see Figure 1). Model errors are much harder to detect. For example, if the data is fitted well by a linear equation with a positive slope but the theory predicts a negative slope, there is a dilemma: is the data poor or is the model wrong?
Gross errors can always be eliminated from the experiment by greater care and attention to proper experimental procedures. Miscalculation errors can be corrected and the data salvaged. Data from recording errors usually cannot be corrected; generally the data should be rejected and the run discarded. Figure 1 illustrates a data point which contains a recording or calculation error. Sometimes the data point is so obviously bad that it can be discarded without further justification, provided the rejection is documented. In other cases the point appears bad but the case for discarding it is less clear. In those cases it is best to use statistical methods to help in the decision to discard the data point.

[Figure 1 - Gross error. Results plotted against run number; a horizontal line marks the true value.]

Systematic or determinate errors are not so easy to detect. These errors tend to affect all results from the experiment by the same amount and in the same direction. Because of this, all data is affected and none of the results will stand out as obviously bad. Systematic errors introduced by the experimenter are also called operator bias. Examples of systematic errors include an incorrectly standardized or malfunctioning instrument which always reads too high or too low. A systematic error could be introduced by a person who does not understand parallax when reading a buret or pipet. Systematic errors also include what is usually referred to as number bias: many people have a bias toward always ending a reading with an even digit or with a zero or a five. Other sources of this type of error are using the wrong method to make the measurement and not waiting long enough for the system to reach equilibrium before taking the measurement. Because these errors are systematic, they can be eliminated from the experiment once they are detected. Sometimes the affected data can be corrected and used to give reliable results. Figure 2 illustrates data which contains systematic errors.
[Figure 2 - Systematic error. Results plotted against run number; a horizontal line marks the true value.]

Random or indeterminate errors are errors that cause results to fluctuate in a random fashion about the "true" or accepted value. With experience and careful attention to good experimental techniques these errors can be minimized but they can never be completely eliminated from quantitative experiments. Figure 3 illustrates data which contains only random errors.

[Figure 3 - Random error. Results plotted against run number; a horizontal line marks the true value.]

For the rest of our discussion, we will assume that gross and systematic errors have been eliminated from the experiment and the data contains only random errors. It is a waste of time to attempt to establish the validity of a result which contains either a gross or systematic error. A professional chemist has the responsibility to ensure that results are free of gross errors and that systematic errors have been reduced to an insignificant level before validating those results.
3. PRECISION and ACCURACY

Precision and accuracy are two terms that usually enter into any discussion of the reliability of an experimental result. Precision describes the reproducibility of the result. It is a statement of the numerical agreement between two or more measurements made on identical systems in exactly the same manner. Accuracy describes how close the measurement or result comes to the "true" or accepted value. Accuracy is usually expressed as either the absolute difference between the measured and "true" values (eq 1) or as the relative difference between the "true" and experimental values (eq 2).

$S = X - \mu$    (1)

Where:
S = absolute error
X = experimental measurement or result
μ = "true" or accepted value

$R = \frac{X - \mu}{\mu}$    (2)

Where:
R = relative error in the measurement or result

Relative errors are often multiplied by 100 and expressed as parts per hundred (%). Other common multipliers are 1000 to give parts per thousand (ppt) or 1,000,000 to give parts per million (ppm).

Example: A standard sample has an accepted absorbance value of 0.516. The measured absorbance of this sample is 0.509. Then:

S = 0.509 - 0.516 = -0.007

and:

R = (0.509 - 0.516)/0.516 = -0.014 = -1.4% = -14 ppt = -14000 ppm

The negative sign shows that the measured value lies below the accepted value.
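The arithmetic of equations 1 and 2 can be checked with a few lines of Python. This is a minimal sketch using the absorbance example from this section; the function names `absolute_error` and `relative_error` are illustrative choices and do not come from the handout. A negative result means the measured value is below the accepted value.

```python
def absolute_error(x, mu):
    """Equation 1: S = X - mu."""
    return x - mu


def relative_error(x, mu):
    """Equation 2: R = (X - mu) / mu."""
    return (x - mu) / mu


# Absorbance example from the text: accepted value 0.516, measured value 0.509
S = absolute_error(0.509, 0.516)
R = relative_error(0.509, 0.516)

print(f"S = {S:.3f}")
print(f"R = {R:.3f} = {R * 100:.1f}% = {R * 1e3:.0f} ppt = {R * 1e6:.0f} ppm")
```

Because the code carries the unrounded value of R, the ppm figure it prints is slightly smaller in magnitude than 14000; the value in the text comes from rounding R to two significant figures before converting.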
4. STATISTICAL ANALYSIS

If a measurement of some property of a system is repeated several times (i.e. replicate data) and if the measurement contains only random errors, a statistical treatment of the data can provide an indication of the reliability of the measuring process.

Strictly speaking, the laws of statistics apply only to very large pools of data. When applying these laws to a typical collection of laboratory data it is necessary to assume that the laboratory data is representative of the larger pool of data that would have been collected if sufficient time were available to perform many runs rather than the usual three or four runs. Because there is no assurance that this assumption is correct, the conclusions of a statistical analysis of typical laboratory data are usually stated in terms of probabilities.

The definitions and uses of the most common statistical terms are given below.

(a) N = the number of results in your data set.

(b) $X_i$ = the value of the i-th result. A set with N = 4 would have results $X_1$, $X_2$, $X_3$ and $X_4$.

(c) $\bar{X}$ = the average or mean of the results. Assuming that the $X_i$ contain only random errors, the mean value is usually a better indicator of the true value than any one of the individual values.

(d) $d_i$ = deviation of the i-th value from the average: $d_i = X_i - \bar{X}$

(e) s = standard deviation: $s = \sqrt{\frac{\sum_i d_i^2}{N - 1}}$

(f) $s_m$ = standard error: $s_m = \frac{s}{\sqrt{N}}$
(g) t = Student's t-factor, usually taken from a table such as Table 1.

N-1    t(90%)   t(95%)   t(99%)
1      6.31     12.7     63.7
2      2.92     4.30     9.92
3      2.35     3.18     5.84
4      2.13     2.78     4.60
5      2.01     2.57     4.03
6      1.94     2.45     3.71
8      1.86     2.31     3.36
10     1.81     2.23     3.17

Table 1 - Partial Table of Student's t

Notice that the Student's t-factors depend on the size of your data set and the confidence level you want to attach to your data.

(h) λ = confidence limits: $\lambda = t \, s_m$, where $s_m$ is the standard error defined in (f).

The confidence limit is the range at the selected confidence level on either side of the average value between which the true value should lie. In this course we will report average values with their 95% confidence limits. When 95% confidence limits are reported with an average value, the statement being made is that the next measurement taken will lie within the range expressed by the confidence limits 19 out of 20 times.

$X = \bar{X} \pm \lambda$ (95%)
5. REJECTION OF A RESULT

Consider the following data collected during an experiment to measure the molecular weight (MW) of a gas.

Run #        1       2       3       4
MW (g/mol)   58.16   60.25   61.31   52.54

Table 2 - Experimental results

A closer look at these results shows that run 4 is significantly out of line with the other three results. This result appears to be "bad" but it cannot be rejected simply because it doesn't appear to fit. If there are no observations that would indicate any problems with this particular run, then the run can be rejected only by using a statistically valid method. The simplest statistical test for rejection is known as the Q test.

$Q = R_1 / R_2$

Where:
$R_1$ = questioned value - nearest neighbor
$R_2$ = highest X - lowest X (including questioned value)

The R's are known as ranges. In the calculation of Q, the absolute values of the ranges are used. If Q is greater than the value listed in Table 3, the questioned result can be rejected. More complete Q test tables can be found in most textbooks on statistics.

N        3      4      5      6
Q(90%)   0.94   0.76   0.64   0.51

Table 3 - Critical Q Test values for rejection of an extreme result

In this course we will reject a data point at the 90% confidence level.
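The Q test described above is easy to script. The following is a minimal sketch, assuming the questioned value is whichever extreme (lowest or highest) lies farther from its nearest neighbor. The names `q_test` and `Q_CRIT_90` are hypothetical, and the critical values are copied from Table 3 for N = 3 to 5 only.

```python
# Critical Q values at 90% confidence, copied from Table 3 (N = 3 to 5)
Q_CRIT_90 = {3: 0.94, 4: 0.76, 5: 0.64}


def q_test(values):
    """Return (suspect, Q, reject) for the most extreme value in `values`."""
    data = sorted(values)
    n = len(data)
    if n not in Q_CRIT_90:
        raise ValueError("no critical Q tabulated here for this N")
    spread = data[-1] - data[0]        # R2: highest X - lowest X
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = data[1] - data[0]        # R1 if the lowest value is questioned
    gap_high = data[-1] - data[-2]     # R1 if the highest value is questioned
    if gap_low >= gap_high:
        suspect, gap = data[0], gap_low
    else:
        suspect, gap = data[-1], gap_high
    q = gap / spread
    return suspect, q, q > Q_CRIT_90[n]


# Molecular weight data from Table 2
mw = [58.16, 60.25, 61.31, 52.54]
suspect, q, reject = q_test(mw)
print(f"suspect = {suspect}, Q = {q:.2f}, reject at 90% = {reject}")
```

Sorting first means both ranges are already positive, so no explicit absolute values are needed.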
As a general rule, only one out of every five data points should be considered for rejection. Thus, in the case of N = 4, only one point can be considered for rejection.

For the data in Table 2, Q = 0.64 for run #4. The critical Q for N = 4 from Table 3 is 0.76. Since 0.64 is less than 0.76, $X_4$ cannot be rejected with 90% confidence and all four runs should be included in the statistical evaluation of this data set.

Statistical analysis of the data in Table 2 then gives the following results.

standard deviation: s = 3.91
standard error: $s_m$ = 1.96
λ(95%) = 6.23

Therefore, at 95% confidence limits

MW = 58.06 ± 6.23 g/mol (95%)

This means we are 95% certain that the true result is between 51.83 g/mol and 64.29 g/mol. More correctly the result should be written MW = 58 ± 6 g/mol, or at the most MW = 58.1 ± 6.2 g/mol.

The reason for reporting fewer digits is that each additional figure in a standard deviation carries less useful information. In the example above, the standard deviation is initially calculated as 3.91 g/mol. There is no meaningful difference in the result if the standard deviation is 3.92 g/mol or 3.90 g/mol, so there is no reason to report the last digit. Thus standard deviations should be reported with no more than two significant figures (and often only one significant figure).
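The statistical results quoted above can be reproduced with Python's standard library. This is a minimal sketch, assuming the 95% t-factors from Table 1; `T_95` and `confidence_limit` are hypothetical names introduced for illustration.

```python
import math
import statistics

# Student's t at 95% confidence, keyed by N - 1 (copied from Table 1)
T_95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
        6: 2.45, 8: 2.31, 10: 2.23}


def confidence_limit(values):
    """Return (mean, s, standard error, lambda) at 95% confidence."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)       # sample standard deviation (N - 1 denominator)
    s_m = s / math.sqrt(n)             # standard error
    lam = T_95[n - 1] * s_m            # confidence limit, lambda = t * s_m
    return mean, s, s_m, lam


mw = [58.16, 60.25, 61.31, 52.54]      # Table 2
mean, s, s_m, lam = confidence_limit(mw)
print(f"mean = {mean:.3f}, s = {s:.2f}, s_m = {s_m:.2f}, lambda = {lam:.2f}")
print(f"MW = {mean:.0f} ± {lam:.0f} g/mol (95%)")
```

Carrying unrounded intermediate values gives a standard error near 1.95 and a confidence limit near 6.22. The slightly larger figures in the text come from rounding s to 3.91 before dividing. The reported result, 58 ± 6 g/mol, is the same either way.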
6. PROPAGATION OF ERRORS

Propagation of errors is the mathematical process of estimating the expected error in a calculated result caused by the errors associated with the measured variables used in the calculation. This method can be applied to both replicate and derivative data. It is most commonly used with derivative data but it does provide some information about replicate data that cannot be obtained by statistical analysis. The method of propagation of error is generally harder to apply than statistical analysis but it is the only method in common use for evaluating derivative data.

Suppose you wish to use the ideal gas law, PV = nRT, to calculate the number of moles (n) of gas in a system. This requires that the variables P, V and T be measured. As with all measured quantities, there will be some random measurement error associated with each variable. The propagation of error method requires that the experimenter provide reasonable estimates of the error associated with each measured variable. The method then allows the experimenter to calculate the combined effect of these estimated errors on the result. In this example, it allows the estimation of the error in the calculated value of n caused by the errors associated with the measured values of P, T and V.

When done correctly, the propagation of error method can accomplish two objectives. The first objective is to establish the reliability of a result (calculation of the expected error) without having to resort to multiple determinations and statistical analysis. The second objective is to determine which measurements contribute significantly to the total error and which do not. This analysis allows the experimenter to understand which measurements may need improvement and which do not.

There are several methods currently used to do propagation of errors calculations.
Some methods calculate the maximum error possible and some methods calculate the most probable error or expected error. The difference between the two methods is mostly philosophical. The maximum error methods assume that all errors occur in the same direction and that no errors cancel. The other methods assume that some errors will cancel, thus reducing the total error in the final result. We will follow this second philosophy and calculate the most probable error or expected error rather than the maximum error. In either method, the calculations require that the dependent variable and all independent variables be connected by one or more mathematical equations.

The following symbols are typically used in propagation of errors calculations.

Let u = f(x, y, z), where u = dependent variable and x, y, z = independent variables.

Then
$S_u$, $S_x$, $S_y$, $S_z$ = absolute error in u, x, y and z
$R_u$, $R_x$, $R_y$, $R_z$ = relative error in u, x, y and z

Remember, $R_x = S_x / x$, etc.
The basic equation for calculating the expected absolute error in a result ($S_u$) is given by equation 3.

$S_u^2 = \left(\frac{\partial u}{\partial x}\right)_{y,z}^2 S_x^2 + \left(\frac{\partial u}{\partial y}\right)_{x,z}^2 S_y^2 + \left(\frac{\partial u}{\partial z}\right)_{x,y}^2 S_z^2 + \cdots$    (3)

Sample Expected Error Calculation

Suppose the ideal gas equation (equation 4) is used to determine the molar mass (MM) of a gas. The experiment requires that the mass (m), the Kelvin temperature (T), the pressure (P), and the volume (V) of the gas sample be measured.

$MM = \frac{mRT}{PV}$    (4)

Table 4 contains a set of typical data with their estimated absolute errors. This information can be used to calculate the expected error in the experiment.

m    1.2117 g ± 0.0002 g
T    298.00 K ± 0.05 K
P    735 torr ± 5 torr
V    500. mL ± 2 mL
R    0.08206 L atm/(mol K)

Table 4 - Gas experiment data and estimated errors

Since MM depends on the independent variables m, T, P and V, we would say that

MM = f(m, T, P, V)    (5)

a) Why isn't the gas constant (R) included in equation 5?
b) Why is no error listed for R?
c) Based on equation 5, give the error equation for this problem.
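Equation 3 can also be evaluated numerically, which gives a way to check hand-derived partial derivatives. The sketch below is one possible approach and not the method prescribed in this handout: it approximates each partial derivative with a central finite difference instead of an analytical expression. The names `expected_error` and `molar_mass` and the step size `rel_step` are assumptions made for illustration. The inputs are the Table 4 values, with pressure converted to atm and volume to L so that the units match R.

```python
import math


def expected_error(f, values, errors, rel_step=1e-6):
    """Evaluate equation 3 numerically.

    f      -- function of the independent variables, called as f(*values)
    values -- measured values of the independent variables
    errors -- estimated absolute errors, in the same order as values
    Returns (S_u, contributions); each contribution is |du/dx_i| * S_x_i.
    """
    contributions = []
    for i, (x, s_x) in enumerate(zip(values, errors)):
        h = abs(x) * rel_step or rel_step        # step scaled to the variable
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        partial = (f(*up) - f(*down)) / (2 * h)  # central difference
        contributions.append(abs(partial) * s_x)
    s_u = math.sqrt(sum(c ** 2 for c in contributions))
    return s_u, contributions


R_GAS = 0.08206  # gas constant, L atm/(mol K), from Table 4


def molar_mass(m, T, P, V):
    """Equation 4: MM = mRT/(PV), with P in atm and V in L."""
    return m * R_GAS * T / (P * V)


# Table 4 values; torr converted to atm (760 torr = 1 atm) and mL to L
values = [1.2117, 298.00, 735 / 760, 0.500]
errors = [0.0002, 0.05, 5 / 760, 0.002]

s_mm, parts = expected_error(molar_mass, values, errors)
for name, part in zip(["m", "T", "P", "V"], parts):
    print(f"{name}: {part:.4f} g/mol")
print(f"MM = {molar_mass(*values):.2f} g/mol, expected error = {s_mm:.2f} g/mol")
```

The printed contributions are the individual terms of equation 3 before squaring, so they can be compared one by one with a hand calculation.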
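In Class Assignment

1. Your error equation should contain the following partial derivatives. If it does not, correct your mistake before continuing.

$\left(\frac{\partial MM}{\partial m}\right)_{T,P,V}$ ; $\left(\frac{\partial MM}{\partial T}\right)_{m,P,V}$ ; $\left(\frac{\partial MM}{\partial P}\right)_{m,T,V}$ ; $\left(\frac{\partial MM}{\partial V}\right)_{m,T,P}$

2. Based on equation 4, generate the four required partial derivatives. For example:

$\left(\frac{\partial MM}{\partial m}\right)_{T,P,V} = \frac{RT}{PV}$

3. Using the data in Table 4, determine the numerical value of each of the four partial derivatives. For example:

$\left(\frac{\partial MM}{\partial m}\right)_{T,P,V} = 50.60\ \mathrm{mol}^{-1}$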
4. Using the numerical values obtained in step 3 and the errors listed in Table 4, calculate the predicted error contributed by each independent variable.

5. Which measurement contributes the greatest amount of error?

6. Calculate the total expected error in MM predicted for this experiment.

Author: Roger Hoburg
Editing: Ed Tisko