APPENDIX A Using Microsoft Excel for Error Analysis

This appendix refers to the sample0.xls file, available for download from the class web page. The file illustrates how to use various features of Microsoft Excel for error analysis. Before working through this appendix, it is useful to reread the "Graph and Fit Data with MS Excel" section of the University/General Physics laboratory manual to recall how to enter, edit, and format data in MS Excel.

Suppose that you have performed a series of measurements of two physical quantities, x and y. Quantity x is measured in imaginary units called draps, and quantity y is measured in imaginary units called cooms. We begin with the Main Data worksheet of sample0.xls.

Only one measurement was performed for each value of x. These values are entered in column A. If the device used to measure x has a smallest scale increment of 0.2 draps, then its instrumental uncertainty is at least 0.1 draps (see footnote 1). This minimum instrumental uncertainty is entered in column C of the same worksheet. Depending on the circumstances, the instrumental uncertainty can be greater. For instance, you may not have been able to read the device's scale clearly, or the reading may have been fluctuating. If so, increase the instrumental uncertainty as you judge appropriate. Since every value of x was measured only once, the instrumental uncertainty is the only source of uncertainty for this quantity.

Next, every value of y was measured 9 times. These values are entered in columns D through L of the same worksheet. The device used to measure y has a smallest scale increment of 0.15 cooms, so its instrumental uncertainty is 0.08 cooms (half of the smallest increment, rounded to one significant figure). This uncertainty is entered in column N.
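The "half of the smallest increment, rounded to one significant figure" rule can be checked outside of Excel. The Python sketch below is only an illustration: the function name instrumental_uncertainty is hypothetical, and it assumes that halves are rounded up (so that 0.075 becomes 0.08, as in the text). Increments are passed as strings so that decimal arithmetic avoids binary floating-point surprises.

```python
from decimal import Decimal, ROUND_HALF_UP


def instrumental_uncertainty(smallest_increment: str) -> Decimal:
    """Half of the smallest scale increment, rounded to one significant figure.

    Hypothetical helper; assumes round-half-up, matching 0.075 -> 0.08.
    """
    half = Decimal(smallest_increment) / 2
    # adjusted() is the exponent of the leading digit, e.g. -2 for 0.075,
    # so the quantum below is 0.01 in that case.
    quantum = Decimal(1).scaleb(half.adjusted())
    return half.quantize(quantum, rounding=ROUND_HALF_UP)


print(instrumental_uncertainty("0.2"))   # x device (draps): prints 0.1
print(instrumental_uncertainty("0.15"))  # y device (cooms): prints 0.08
```

This gives only the minimum instrumental uncertainty; as noted above, it may need to be increased by hand if the scale was hard to read or the reading fluctuated.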
Often, you will be required to average a set of data points. In statistics, the average is called the mean. The built-in MS Excel function =AVERAGE(Range) is used to calculate it. These values are calculated in column O.

1 Note that it does not matter whether you have an analog or a digital measuring device. The instrumental uncertainty is at least half of the smallest increment on the scale of an analog device and at least half of the last digit displayed by a digital device.

The mean value of y (equation 0.1) is calculated from all 9 measurements for each x data point.

The standard deviation measures how widely the values in a data set are spread. If the data points are close to the mean, the standard deviation is small. Conversely, if many data points are far from the mean, the standard deviation is large. If all the data values are equal, the standard deviation is zero. The standard deviation (equation 0.4) of the measurements is calculated in column P of the Main Data worksheet with the built-in MS Excel function =STDEV(Range).

To calculate the statistical uncertainty of each y-value from equation (0.11), the appropriate Student's coefficient has to be chosen from the table given in my introductory lecture. For each set of 9 measurements, its value is 2.3. Column Q of the same worksheet contains the function

=IF(COUNT(Range)>15,2,LOOKUP(COUNT(Range),{2,3,4,5,6,7,8,9,10,11,12,13,14,15},{12.7,4.3,3.2,2.8,2.6,2.4,2.4,2.3,2.3,2.2,2.2,2.2,2.1,2}))

which illustrates how complex operations can be completed automatically using built-in MS Excel functions.

If you do not want to copy this rather long statement, you can use the special function =STU(Range) (see column R). This function is not part of the standard MS Excel package; I wrote its code myself. If you know how to program in Visual Basic, you can look at this code by opening the Visual Basic Editor ([Alt]-[F11]) in your MS Excel window. The code is located in the StudentCoefficients module. If you do not know how to program in Visual Basic, you can still use this function. To do so, download the StudentCoefficients.bas file from the class web page and import it into any of your MS Excel files. To import, open the Visual Basic Editor, choose the Import File option ([Ctrl]-[M]), and select the appropriate file.
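To check the values in columns O through Q outside of Excel, the mean, the standard deviation, and the Student's coefficient lookup can be sketched in Python as follows. The coefficient table is copied from the LOOKUP formula above. The function name student_coefficient and the nine sample readings are hypothetical and are not taken from sample0.xls. Like =STDEV(Range), statistics.stdev computes the sample standard deviation (dividing by N - 1).

```python
from statistics import mean, stdev

# Student's coefficients from the LOOKUP formula in column Q,
# keyed by the number of measurements N (2 through 15).
STUDENT_COEFFICIENTS = {
    2: 12.7, 3: 4.3, 4: 3.2, 5: 2.8, 6: 2.6, 7: 2.4, 8: 2.4, 9: 2.3,
    10: 2.3, 11: 2.2, 12: 2.2, 13: 2.2, 14: 2.1, 15: 2.0,
}


def student_coefficient(n: int) -> float:
    """Mirror the column Q formula: 2 for N > 15, table lookup otherwise."""
    if n > 15:
        return 2.0
    if n < 2:
        raise ValueError("at least two measurements are required")
    return STUDENT_COEFFICIENTS[n]


# Hypothetical set of 9 repeated y readings, in cooms (illustration only).
y_readings = [4.1, 4.3, 3.9, 4.0, 4.2, 4.1, 4.4, 3.8, 4.1]

print(mean(y_readings))                      # counterpart of =AVERAGE(Range)
print(stdev(y_readings))                     # counterpart of =STDEV(Range)
print(student_coefficient(len(y_readings)))  # 2.3 for N = 9
```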
Now we can calculate the statistical uncertainty of the y-values according to equations (0.9) and (0.11). This is done in column S using standard MS Excel functions, or in column T using my own =DELTAS(Range) function. You can use this function by downloading and installing the StatisticalUncertainty.bas file, following the same procedure as for the StudentCoefficients.bas file. Note that both modules have to be installed for the second one to work.
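Equations (0.9) and (0.11) are not reproduced in this appendix. The Python sketch below therefore assumes the common form in which the statistical uncertainty is the Student's coefficient multiplied by the standard deviation of the mean, t * s / sqrt(N). If the equations in the lecture differ, the function must be adapted. The name statistical_uncertainty and the sample readings are hypothetical.

```python
from math import sqrt
from statistics import stdev


def statistical_uncertainty(values, coefficient):
    """Assumed form of equations (0.9) and (0.11): t * s / sqrt(N).

    values      -- repeated measurements of the same quantity
    coefficient -- Student's coefficient for len(values) measurements
    """
    n = len(values)
    standard_deviation_of_mean = stdev(values) / sqrt(n)
    return coefficient * standard_deviation_of_mean


# Hypothetical set of 9 repeated y readings, in cooms (illustration only).
y_readings = [4.1, 4.3, 3.9, 4.0, 4.2, 4.1, 4.4, 3.8, 4.1]

# 2.3 is the Student's coefficient for 9 measurements.
print(statistical_uncertainty(y_readings, 2.3))
```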

The total uncertainty of each y-value, which according to equation (0.12) depends on both the instrumental and the statistical uncertainty, is calculated in column U. In column V the same uncertainty is found in one step with my own function =DELTAT(Range, Value), which is also available for download as the Uncertainty.bas file. It has to be installed in the same way as the previous two modules and only works when those two modules are installed.

Now that we have collected all the data and determined their uncertainties, we are ready to graph the dependence of y on x. Refer to the directions in the "Graph and Fit Data with MS Excel" section of the University/General Physics laboratory manual on how to do that. The results are presented in the Chart (Y vs. X) worksheet of the sample0.xls file. The x-values for this graph are taken from column A of the Main Data worksheet, the y-values from column O, the x-error bars from column C, and the y-error bars from column U or V. Note that the built-in options of the MS Excel best-fit Trendline allow you to display the R-squared value for the correlation coefficient from equation (0.14) to see how well your data actually fit the linear pattern.
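For the total uncertainty in columns U and V, equation (0.12) is not reproduced in this appendix. The Python sketch below assumes the usual rule of adding independent uncertainties in quadrature, that is, the square root of the sum of the squares of the instrumental and statistical uncertainties. This is an assumption about equation (0.12), and the name total_uncertainty is hypothetical.

```python
from math import hypot


def total_uncertainty(instrumental, statistical):
    """Assumed form of equation (0.12): sqrt(instrumental**2 + statistical**2)."""
    return hypot(instrumental, statistical)


# 0.08 cooms is the instrumental uncertainty of the y device from the text;
# 0.14 cooms is a hypothetical statistical uncertainty for one data point.
print(total_uncertainty(0.08, 0.14))
```

With this rule the larger of the two contributions dominates the result, so reducing the smaller one has little effect on the total.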
Also note that the built-in MS Excel function LINEST allows you to find the parameters of the best-fit line and their uncertainties:

=INDEX(INDEX(LINEST(Yrange, Xrange, TRUE, TRUE),1),1) gives the slope of the best-fit line (see cell E15 in the Main Data worksheet);
=INDEX(INDEX(LINEST(Yrange, Xrange, TRUE, TRUE),1),2) gives the intercept of the best-fit line (see cell E16 in the Main Data worksheet);
=INDEX(INDEX(LINEST(Yrange, Xrange, TRUE, TRUE),2),1) gives the uncertainty of the slope (see cell G15 in the Main Data worksheet);
=INDEX(INDEX(LINEST(Yrange, Xrange, TRUE, TRUE),2),2) gives the uncertainty of the intercept (see cell G16 in the Main Data worksheet).

This simple, automatic best-fit line is listed in the legend of our graph as Linear (Y vs. X).

Even though these built-in MS Excel functions are very handy for calculating slopes, intercepts, and their uncertainties, they do not reflect the whole picture described by equations (0.16). The standard MS Excel functions do not take into account the fact that different y-values may have different uncertainties, which is the case here, as we can see from columns U or V of our worksheet. So we can only use these functions when the uncertainties of the y-values are unknown, alike, or negligible. In these cases the uncertainty of each y-value should not affect the uncertainties of the slope and intercept of the linear graph, because the latter are calculated only from the spread of the data points around the best-fit line.

In our case, however, the complete analysis according to equations (0.16) is necessary. The Extra Data worksheet of the sample0.xls file illustrates how to do that manually. Columns A, B, C, D, and E of this worksheet contain intermediate results needed for different parts of equations (0.16). The quantity from these equations is calculated in cell B14. The slope of the linear graph is found in cell B15, the intercept in cell B16, the uncertainty of the slope in cell D15, and the uncertainty of the intercept in cell D16. Notice that all these values are slightly different from the ones found using the simplified model.

Of course, all the calculations presented in the Extra Data worksheet are very time-consuming. This is why I have created my own MS Excel functions to speed up the process. The function =SL(Xrange, Yrange, DELTAYrange) calculates the slope in cell E18 of the Main Data worksheet and is available for download and installation as the Slope.bas file from the class web page. The function =INTE(Xrange, Yrange, DELTAYrange) calculates the intercept in cell E19 of the Main Data worksheet and is available as the Intercept.bas file. The function =DSL(Xrange, Yrange, DELTAYrange) calculates the uncertainty of the slope in cell G18 of the Main Data worksheet and is available as the SlopeUncertainty.bas file.
Finally, the function =DINTE(Xrange, Yrange, DELTAYrange) calculates the uncertainty of the intercept in cell G19 of the Main Data worksheet and is also available for download and installation as the InterceptUncertainty.bas file from the class web page. This more accurate best-fit line is listed in the legend of our graph as Linear 2 (Y vs. X) in the Chart (Y vs. X) worksheet.
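The idea behind a fit that accounts for a different uncertainty on each y-value can be sketched in Python. Equations (0.16) are not reproduced in this appendix, so the sketch uses the standard weighted least-squares formulas with weights 1/DELTAY^2. That these are the formulas behind =SL, =INTE, =DSL, and =DINTE is an assumption, and the name weighted_linear_fit and the sample data are hypothetical.

```python
from math import sqrt


def weighted_linear_fit(x, y, delta_y):
    """Weighted least-squares line y = slope * x + intercept.

    Assumes weights w = 1 / delta_y**2 (standard textbook form; an
    assumption about equations (0.16)).
    Returns (slope, intercept, slope_uncertainty, intercept_uncertainty).
    """
    w = [1.0 / dy ** 2 for dy in delta_y]
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, y))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # Common denominator of all four results.
    delta = s_w * s_wxx - s_wx ** 2

    slope = (s_w * s_wxy - s_wx * s_wy) / delta
    intercept = (s_wxx * s_wy - s_wx * s_wxy) / delta
    slope_uncertainty = sqrt(s_w / delta)
    intercept_uncertainty = sqrt(s_wxx / delta)
    return slope, intercept, slope_uncertainty, intercept_uncertainty


# Hypothetical data: x in draps, y and its total uncertainty in cooms.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
y_uncertainties = [0.1, 0.2, 0.1, 0.3, 0.2]

print(weighted_linear_fit(x_values, y_values, y_uncertainties))
```

When all y-uncertainties are equal, the slope and intercept from these formulas coincide with the ordinary least-squares values that LINEST returns. The uncertainties of the slope and intercept still differ in general, because here they are derived from the stated y-uncertainties rather than from the scatter of the points around the line.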

Using Microsoft Excel to Study Distribution of Data

Suppose that you have performed a series of measurements of the same value of a physical quantity C. This quantity is measured in the same imaginary units, cooms. The measurements are entered in column A of the Distribution Data worksheet of the file sample0.xls. In total, 150 measurements of quantity C were performed, as can be seen from cell D5 of the worksheet. There are significant fluctuations between the measurements, as can be seen from the minimum (=MIN(Range)), maximum (=MAX(Range)), and mean (=AVERAGE(Range)) values calculated in cells D2, D3, and D4 for this distribution.

If you are using MS Excel 2003, the sample distribution for this data set can be found using the Data Analysis tool package available under the Tools menu of MS Excel. If you cannot find this tool in the menu, install it by clicking Add-Ins under the Tools menu and choosing Analysis ToolPak. Once the Data Analysis tool is installed, run it and choose Histogram from the list of analysis tools.

If you are using MS Excel 2007, the sample distribution for this data set can be found using the Data Analysis tool package available in the Analysis group of the Data tab. If you cannot find this tool, install it as follows. Click the Microsoft Office Button, and then click Excel Options. Click Add-Ins, and then in the Manage box, select Excel Add-ins. Click Go. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group of the Data tab. Once the Data Analysis tool is installed, run it and choose Histogram from the list of analysis tools.

The Histogram tool requires the input range of data. This range consists of the C-values from column A of the worksheet. It also requires the bin range.
This range has been created in advance in column F of the Distribution Data worksheet. I have chosen a bin size of 10 cooms. So the data points in column F start just below the minimum measured value of quantity C, change by 10 cooms from each cell to the next, and end just above the maximum measured value of quantity C. Column F is used as the bin range for the Histogram tool.

The output results are placed in columns G and H. Column G contains the same bin range. Column H indicates how many of the values from the input range in column A fell into each bin. The data in column H are not normalized. This distribution can be normalized by dividing all the values in column H by the total number of measurements. These normalized frequencies are entered in column I of the worksheet.

The distribution data from columns G and I are then graphed as a histogram in the Histogram chart of the sample0.xls file. This is done following the standard graphing directions in the "Graph and Fit Data with MS Excel" section of the University/General Physics laboratory manual. Columns G and I now represent the sample distribution of our data, which we have just graphed, and it is ready for further analysis to see which parent distribution it fits best.
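The binning and normalization described above can be sketched in Python. The sketch assumes that the Histogram tool counts a value in a bin when the value is less than or equal to that bin limit and greater than the previous one, and that it collects anything above the last limit in an extra "More" entry. The function names and the sample values are hypothetical and are not taken from sample0.xls.

```python
from bisect import bisect_left


def histogram_counts(values, bin_limits):
    """Count values per bin; bin_limits must be sorted in ascending order.

    counts[i] is the number of values v with
    bin_limits[i - 1] < v <= bin_limits[i];
    the final extra entry plays the role of the "More" row.
    """
    counts = [0] * (len(bin_limits) + 1)
    for v in values:
        # Index of the first bin limit that is >= v.
        counts[bisect_left(bin_limits, v)] += 1
    return counts


def normalize(counts):
    """Divide every count by the total number of measurements."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical measurements of C in cooms and a bin range with 10-coom steps
# that starts just below the minimum and ends just above the maximum.
c_values = [12, 25, 30, 31, 47]
bin_limits = [10, 20, 30, 40, 50]

counts = histogram_counts(c_values, bin_limits)
print(counts)             # [0, 1, 2, 1, 1, 0]
print(normalize(counts))  # normalized frequencies, which sum to 1
```

Because the bin range brackets the data, the first entry and the "More" entry are both zero, just as they would be for a bin range chosen as described for column F.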