Introduction to Error Analysis

Universität Basel, Department of Chemistry
Physikalisch-Chemisches Praktikum (PC Praktikum)
Dr. Nico Bruns, Dr. Katarzyna Kita, Dr. Corinne Vebert, 2012

1. Why is error analysis important?
Every measurement of any quantity (in chemistry, biology, physics, ...) has some error. Therefore, every experimental result should be presented in the form (X ± ΔX) [units], for example (20 ± 2) °C. In our laboratory work we want our errors to be as small as possible, and to analyze our results we always need to estimate those errors. How we can discuss a result depends on its error. For example, if we obtain a value with an error of 120 %, either we have made a serious mistake in the calculations or something went very wrong in the experiment. In general, such a result does not make much sense and is useless in laboratory practice.

2. What types of errors can we have?
There are a few types of errors, and some of them are easier to eliminate than others.

First, there are large errors, which generally depend on the person performing the experiment and/or documenting the data, and which result from human mistakes. For example, we measured 19 °C but wrote down 9 °C, or we wrote 1.9 °C instead of 0.19 °C. We might also measure the wrong object instead of the one we intended, or confuse radius with diameter, and so on. As a rule, such 'strange' results must always be eliminated from the data analysis, and they are not included in the error calculations either. To avoid these errors, we need to work carefully, pay attention to small details, and perform the necessary calibrations and control experiments. It also helps to work in small groups, where one person can check what their partner does and control the data documentation. Additionally, we always want to repeat the experiment to make sure that the result is reproducible.

Next, there are systematic errors, which in general cannot be avoided. They arise because the equipment (instruments) we use is never 100 % precise.
Finally, there are statistical errors, which are also difficult to avoid. They appear because the measured values always differ slightly depending on external conditions (temperature, pressure, humidity). Therefore, measurements taken in summer may differ slightly from the winter results. We can minimize such errors by careful instrument calibration and by avoiding dramatic changes in external conditions.

3. Ways to treat errors
3.1. Errors can be treated in three ways:
- If a variable has been measured many times (at least 10 times), it can be treated according to the laws of statistics (cf. Gaussian error propagation, Section 7) to calculate a mean value and a standard deviation. However, this case is quite rare in the experiments of the PC Praktikum, because the number of individual measurements is usually too small for a statistical treatment.
- If you can measure a variable only a couple of times, e.g. due to time limitations, the error has to be estimated as a maximal error.
- If the results can be described theoretically by a mathematical function, the error can be estimated by analyzing the corresponding graph. This can be done by hand or numerically on a computer.

3.2. Proper estimation of errors
If errors can't be treated statistically, it is your responsibility as a researcher to estimate the maximal error of each single variable. For example, the error could be one unit of the last digit shown by an instrument, e.g. 0.01 mg for a scale whose display shows 126.56 mg. However, many instruments (especially ones with a digital display) show more digits than their actual accuracy allows, so it is wise to think twice about the accuracy of an instrument: the scale might have an accuracy of only 0.02 mg. Instruments are normally calibrated, and the accuracy is stated in the manual or in some kind of certificate. If this information is not available, it is your job to estimate the error, and it is always better to overestimate it; for the scale, for example, 0.05 mg! The error might also be larger than the display suggests because of other experimental factors. E.g. a stopwatch might be accurate to 0.02 s, but if you stop the time of an experiment manually, your reaction time will be much longer than those 0.02 s. In that case, your reaction time has to be taken as the maximal error.

3.3. How to state errors
A single variable x was measured with an error Δx. The experimental result can be written as x ± Δx. There are several ways to state errors:
- Absolute error: $\Delta x$
- Relative error: $\Delta x / x$
- Per cent error: $\Delta x / x \cdot 100\,\%$

Example: x = 39.23 cm; error: Δx = 0.28 cm. This result can be written as:
x = (39.23 ± 0.28) cm; Δx/x = ±0.007, or Δx/x · 100 % = ±0.7 %
x = (39.2 ± 0.3) cm
It is not exact to write x = 39.2 cm ± 0.7 %.
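The three ways of stating an error, together with the rounding rule of Section 3.4 (error rounded up to one or two significant digits, value rounded to the same decimal place), can be sketched in Python. This is a minimal illustration, not part of the original course material: the function names `error_forms` and `format_result` are hypothetical, and the `decimal` module is used only to avoid binary floating-point surprises when rounding.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP


def error_forms(value: float, abs_error: float) -> dict[str, float]:
    """Absolute, relative and per-cent error of a single measured value."""
    relative = abs_error / abs(value)
    return {"absolute": abs_error, "relative": relative, "percent": 100 * relative}


def format_result(value: float, abs_error: float, sig_digits: int = 2) -> str:
    """Round the error UP to `sig_digits` significant digits and round the
    value to the same decimal place (rule of Section 3.4)."""
    if abs_error <= 0:
        raise ValueError("the error must be positive")
    v = Decimal(str(value))          # str() keeps the digits as written
    e = Decimal(str(abs_error))
    # Place of the last kept digit of the error, e.g. 1E-4 for 0.0069
    quantum = Decimal(1).scaleb(e.adjusted() - sig_digits + 1)
    e_rounded = e.quantize(quantum, rounding=ROUND_CEILING)   # error: always round up
    v_rounded = v.quantize(quantum, rounding=ROUND_HALF_UP)   # value: ordinary rounding
    return f"{v_rounded:f} ± {e_rounded:f}"


print(error_forms(39.23, 0.28))        # relative is about 0.007, percent about 0.7
print(format_result(39.23, 0.28, 1))   # 39.2 ± 0.3
```

With `sig_digits=1` the example above becomes `39.2 ± 0.3`; the Section 3.4 example `format_result(4.21596847, 0.00686521)` gives `4.2160 ± 0.0069`.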

3.4. Decimal places
The result of a measurement is x = 4.21596847 ± 0.00686521. Here the third decimal place is already affected by the error, so it makes no sense to state all the following decimal places. The result is always presented up to and including the first uncertain decimal place. (In rare cases the second uncertain decimal place can also be stated.) The error is given with at most two digits different from zero, and it is always rounded up. Applying these rules to the example gives:
x = 4.2160 ± 0.0069 or x = 4.216 ± 0.007

3.5. Error propagation I
In most cases, the result of an experiment depends on more than one measured value. Each value has to be determined individually, and each of these values has its own error. So how do these errors influence the error of the final result? They can accumulate, or they can partly cancel each other out. The latter case will be discussed later (Gaussian error propagation). Here, we assume that all individual errors shift the result away from the true value in the same direction, i.e. we discuss the propagation of the maximal error.

The result of an experiment F is a function of several individual values x, y, z, ...:
$$F = f(x, y, z, \dots)$$
A Taylor expansion gives, in a first approximation, the maximal error of F:
$$\Delta F = \left|\frac{\partial F}{\partial x}\right| \Delta x + \left|\frac{\partial F}{\partial y}\right| \Delta y + \left|\frac{\partial F}{\partial z}\right| \Delta z + \dots \qquad (3.5)$$
Two important cases will be highlighted. For a sum or difference, $F = x \pm y$, this results in an error of
$$\Delta F = \Delta x + \Delta y,$$
i.e. the absolute errors of the individual values are added.

For a product or quotient, $F = x \cdot y$ or $F = x / y$, this results in an error of
$$\frac{\Delta F}{|F|} = \frac{\Delta x}{|x|} + \frac{\Delta y}{|y|},$$
i.e. the relative errors of the individual values are added.
If F is a more complicated function of x, y, z, ... (log, exp, sin, etc.), the error has to be calculated explicitly using equation (3.5), i.e. the function has to be differentiated with respect to each variable. Alternatively, a graphical error estimation can be carried out in such cases.

4. Mean value
The mean value $\bar{a}$ is defined as
$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$$
where n is the number of experiments (repetitions) and $a_i$ are the results of the individual experiments. In general, repeating the experiment n times increases the accuracy of the mean value, because the mean is then more strongly influenced by the most frequently obtained results. In the laboratory, every experiment should be repeated at least three times! A result from a single experiment makes no sense!

An example: we measure the length of a table six times (the accuracy of the ruler is 0.5 cm) and obtain the following results:

n   length [cm]
1   148.5
2   149.0
3   148.0
4   123.0   contains a large error! Not included in the analysis!
5   150.0
6   149.5

Mean value: l̄ = (148.5 + 149.0 + 148.0 + 150.0 + 149.5) / 5 = 149.0 cm

5. Accuracy of the mean value
Now we need to find out how accurate the mean value is. There are two ways of presenting the error of the mean value: as an absolute error or as a relative error. Both are equivalent and have the same meaning; they are just presented differently. Absolute errors give the deviation between the mean value and the measurements in the same units, e.g. (20 ± 2) °C, while relative errors express this deviation as a percentage of the mean value: T = 20 °C, ΔT/T · 100 % = 10 %.
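The mean value of Section 4 can be checked with Python's standard `statistics` module. The sketch below also computes, ahead of the following sections, the maximum error and the sample standard deviation (the 1/(n − 1) form) for the same five valid measurements. It is an illustration of the arithmetic only, not a prescribed procedure; the variable names are our own.

```python
import statistics

# Table-length measurements from Section 4, in cm (runs 1 to 6)
raw_lengths_cm = [148.5, 149.0, 148.0, 123.0, 150.0, 149.5]
# Run 4 contains a large error and is excluded before any analysis
lengths_cm = [length for run, length in enumerate(raw_lengths_cm, start=1) if run != 4]

mean = statistics.mean(lengths_cm)                              # arithmetic mean
max_error = max(abs(length - mean) for length in lengths_cm)    # largest deviation from the mean
relative_percent = 100 * max_error / mean
std_dev = statistics.stdev(lengths_cm)                          # sample standard deviation, 1/(n - 1)

print(f"mean = {mean:.1f} cm")
print(f"maximum error = {max_error:.1f} cm ({relative_percent:.1f} %)")
print(f"standard deviation = {std_dev:.2f} cm")
```

Running it prints a mean of 149.0 cm and a maximum error of 1.0 cm (0.7 %), matching the worked example, and a standard deviation of 0.79 cm, which is smaller than the maximum error.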

Let us now come back to our table-length measurements and estimate the error by comparing the mean value with the individual experimental results:

n   l [cm]   |l − l̄| [cm]
1   148.5    0.5
2   149.0    0.0
3   148.0    1.0
4   150.0    1.0
5   149.5    0.5

The maximum difference is 1.0 cm; it is therefore called the maximum error and is taken as the experimental error (the actual error can of course be smaller, but it definitely cannot be bigger). We can state it as an absolute error, l = (149.0 ± 1.0) cm, or as a relative error (percentage of the mean value): l = 149.0 cm, Δl/l̄ · 100 % = 0.7 %.

0.7 % is a very small error. In many experiments, errors of up to 20 % are tolerated. If the error is higher than 20 %, something is wrong (probably the calculations). If the error exceeds 100 %, something is VERY wrong: check the calculations again and think about what could have produced such an error. Errors always need to be discussed in PC Praktikum reports!

6. Standard deviation
When we have performed at least 5 repetitions of an experiment, we can use the standard deviation to estimate the errors. The standard deviation $S_n$ is a statistical measure of the error and can only be applied to a large series of data (many repetitions). It is generally smaller than the maximum error, and it tells us about the distribution of the data. Definition:
$$S_n = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(a_i - \bar{a}\right)^2}$$

7. Error propagation II: Gaussian error propagation
In physical chemistry, as in many other fields, we often have values that depend on other values; for example, light absorption depends on concentration, and gas pressure depends on temperature. In such a case, the error of the derived value also depends on the errors of the values it is calculated from. For example, if we prepare a sugar solution in water, the mass of this solution (C) is the sum of the mass of water (A) and the mass of sugar (B). Both masses are determined with some error, so the accuracy of the final mass depends on the errors of both. If C = A + B, the maximal error (Section 3.5) is ΔC = ΔA + ΔB, whereas Gaussian error propagation adds the errors in quadrature: ΔC = √(ΔA² + ΔB²).
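To contrast the two propagation rules, here is a minimal Python sketch that evaluates both the maximal error of equation (3.5) and the Gaussian error of Section 7 for any smooth function. The partial derivatives are approximated by central finite differences, which is an assumption of this sketch (the handout differentiates analytically), and the function names and sugar-solution masses are hypothetical illustration values.

```python
import math
from collections.abc import Callable, Sequence


def partial_derivatives(f: Callable[..., float], values: Sequence[float],
                        h: float = 1e-6) -> list[float]:
    """Approximate each partial derivative df/dx_i by a central difference."""
    grads = []
    for i, x in enumerate(values):
        step = h * max(1.0, abs(x))          # step size relative to the value
        up, down = list(values), list(values)
        up[i], down[i] = x + step, x - step
        grads.append((f(*up) - f(*down)) / (2 * step))
    return grads


def maximal_error(f, values, errors):
    """Maximal error: sum of |df/dx_i| * dx_i, as in equation (3.5)."""
    return sum(abs(g) * e for g, e in zip(partial_derivatives(f, values), errors))


def gaussian_error(f, values, errors):
    """Gaussian error: square root of the sum of (df/dx_i * dx_i)^2."""
    return math.sqrt(sum((g * e) ** 2 for g, e in zip(partial_derivatives(f, values), errors)))


def mass_of_solution(a: float, b: float) -> float:
    """C = A + B: mass of water plus mass of sugar."""
    return a + b


# Hypothetical values: A = (100.0 ± 0.3) g water, B = (10.0 ± 0.4) g sugar
print(maximal_error(mass_of_solution, [100.0, 10.0], [0.3, 0.4]))   # about 0.7 g
print(gaussian_error(mass_of_solution, [100.0, 10.0], [0.3, 0.4]))  # about 0.5 g
```

For a sum, the Gaussian error (0.5 g) is smaller than the maximal error (0.7 g), because it allows the individual errors to partly cancel. Any other function, e.g. V = nRT/p, can be passed in the same way.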

In the most general case, if our value X is a function of several variables, X = f(A, B, C, ...), the error of X is determined as follows:
$$\Delta X = \sqrt{\left(\frac{\partial f}{\partial A}\,\Delta A\right)^2 + \left(\frac{\partial f}{\partial B}\,\Delta B\right)^2 + \left(\frac{\partial f}{\partial C}\,\Delta C\right)^2 + \dots}$$

Example: we have the ideal gas equation, pV = nRT. We perform an isothermal experiment (T = const.) and measure the change of volume (V) with pressure (p), so we need to find the error ΔV. Our 'experimental' equation takes the form
$$V = \frac{nRT}{p},$$
where n and R are constants (as a rule, we do not consider errors for physical constants, therefore Δn = 0 and ΔR = 0). We do, however, have errors from measuring the temperature, ΔT, and the pressure, Δp (which we should know from performing multiple measurements and estimating the standard deviations of T and p, see Section 6). Applying the formula above, we see that
$$\Delta V = \sqrt{\left(\frac{\partial V}{\partial T}\,\Delta T\right)^2 + \left(\frac{\partial V}{\partial p}\,\Delta p\right)^2}$$
Now we need the partial derivatives $\partial V / \partial T$ and $\partial V / \partial p$, which are
$$\frac{\partial V}{\partial T} = \frac{nR}{p}, \qquad \frac{\partial V}{\partial p} = -\frac{nRT}{p^2}$$
The last step is to insert the partial derivatives into the general equation:
$$\Delta V = \sqrt{\left(\frac{nR}{p}\,\Delta T\right)^2 + \left(\frac{nRT}{p^2}\,\Delta p\right)^2}$$
After inserting all numbers (and checking the units!), we obtain ΔV.

8. Graphical data presentation
In some cases we need to present our results graphically, and most often there is a mathematical function that theoretically describes what the graph should look like. A few comments on how to prepare a good graph:
- Graphs can be drawn by hand (on millimeter paper only!) or prepared on a computer. In both cases, the axes have to be labeled, including units, e.g. speed of light, km/h.
- The scale on the axes has to have equal increments (0-10-20, or 0-500-1000, whatever fits your experiment); please do not mark the values that correspond to all your experimental data.
- If you repeated the experiment a few times and obtained a mean value, only this value goes on the graph.
- Every experimental point has to have error bars in both the X and Y directions.

An example of a good graph is given below (absorbance versus concentration). Please note that from your experiments you will only have data points, so it does not make much sense to connect them with lines in your graph, especially if you do not know what the line should look like. We will come back to that point later.

[Figure: absorbance (0.0 to 1.0) versus concentration c (0.000 to 0.025 mol/dm³), experimental data points]

9. The least squares method
In many experiments, we measure a physical value as a function of another value, and the relationship between the two is predicted by theory. It could be a linear dependence (y = ax + b), an exponential growth or decay, or one of many other mathematical functions. In general, fitting the data with a theoretical model provides fitting parameters (for example a and b in the linear fit), which have some physico-chemical meaning and could be interesting to us. The question is: how do we find good fitting parameters when we only have experimental points, as in the graph above? We will consider the simplest case of a linear dependence. For such a fit (often called 'linear regression'), we have to make sure that we have enough data points (remember: it is always possible to draw a straight line through two points; the line will be a perfect fit, but may have little to no physical meaning if the points themselves have large errors). In general, 5 points are enough for a reasonable fit. The 'best' line is defined as the one that lies closest to the experimental points. In practice, we look for the smallest sum of squared vertical distances between the points and the line (hence the name 'the least squares method').
When we have a linear function y = ax + b, this procedure gives the slope (a) and the intercept (b) from the following formulas:
$$a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \bar{y} - a\,\bar{x} = \frac{\sum y_i - a \sum x_i}{n}$$
where n is the number of points used for fitting, $x_i$ and $y_i$ are the coordinates of the data points, $\bar{x}$ and $\bar{y}$ are their mean values, and the sums extend from 1 to n. The calculations of a and b can be done automatically by most calculators and computer programs (linear fit option). There are also online tools that calculate function parameters, see e.g. http://www.chemie.unibas.ch/~huber/statistik/linreg/index.html. For example, fitting the points from our example graph with OriginPro gives a slope of 38.8 dm³/mol and an intercept of 0.004, so the function is A = 38.8 c + 0.004.

[Figure: the same absorbance versus concentration data with the fitted line A = 38.8 c + 0.004]

In this particular example, the theoretical dependence of absorbance on concentration is A = εlc, where l is the light path through the solution and ε is the extinction coefficient, a physico-chemical parameter characterizing the ability of a compound to absorb electromagnetic radiation. If we need this parameter, we can easily calculate it from the slope of the fitted line: ε = a / l.
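The slope and intercept formulas above translate directly into a short Python function. As an assumption of this sketch, it also returns R² and the errors of a and b using the standard textbook expressions for linear regression (see Section 10), with the scatter of the points about the line computed with n − 2 degrees of freedom. The function name `linear_fit` and the usage data are hypothetical, since the original measurement points are not reproduced here.

```python
import math
from collections.abc import Sequence


def linear_fit(x: Sequence[float], y: Sequence[float]) -> dict[str, float]:
    """Least-squares fit of y = a*x + b, with R^2 and the errors of a and b."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx ** 2                    # zero if all x values are equal
    if d == 0:
        raise ValueError("x values must not all be equal")
    a = (n * sxy - sx * sy) / d              # slope
    b = (sy - a * sx) / n                    # intercept: mean(y) - a * mean(x)
    # Correlation coefficient R from the same sums (undefined if all y are equal)
    r = (n * sxy - sx * sy) / math.sqrt(d * (n * syy - sy ** 2))
    # Scatter of the points about the fitted line, n - 2 degrees of freedom
    s_y = math.sqrt(sum((yi - a * xi - b) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    return {"a": a, "b": b, "r2": r ** 2,
            "da": s_y * math.sqrt(n / d), "db": s_y * math.sqrt(sxx / d)}


fit = linear_fit([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
print(fit)
```

For these three made-up points the fit gives a = 0.5, b ≈ 0.167 and R² = 0.75. Summing by hand keeps the link to the formulas visible; in practice a calculator, OriginPro or a library routine does the same job.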

10. Regression errors
Since the fitted line does not go through all experimental points but only as close to them as possible, we need to discuss fitting errors. In other words: how good is the fit, or how well does the fitted line follow the experimental data? One common approach is to analyze the so-called coefficient of determination R² (R is the correlation coefficient). R² is the proportion of the variability in a data set that is accounted for by the statistical model. The value of R² lies between 0 and 1, and a good fit has R² well above 0.9 (in our example we had R² = 0.99931, which means a very good fit). A perfect fit has R² = 1. For a linear fit, R is calculated as follows:
$$R = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$
Again, R² is normally computed by calculators and computer programs together with the linear regression parameters. However, R² does not directly tell us the individual errors of the slope and intercept of the fitted line. We can calculate them from the following formulas:
$$\Delta a = s_y \sqrt{\frac{n}{n \sum x_i^2 - \left(\sum x_i\right)^2}}, \qquad \Delta b = s_y \sqrt{\frac{\sum x_i^2}{n \sum x_i^2 - \left(\sum x_i\right)^2}}, \qquad s_y = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \left(y_i - a x_i - b\right)^2}$$

In conclusion: when performing your experiments, try to do as many repetitions as possible, think about whether the measured value depends on other values, and always discuss the experimental errors in your reports!

11. Further Reading
John R. Taylor: An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, 2nd edition, University Science Books, Sausalito, 1997.
Manfred Drosg: Der Umgang mit Unsicherheiten: Ein Leitfaden zur Fehleranalyse, Facultas Universitätsverlag, Wien, 2006.