Some Statistical Procedures and Functions with Excel

Introductory Note: Microsoft's Excel spreadsheet provides both statistical procedures and statistical functions. The procedures are accessed by clicking on Tools in the menu bar at the top of the Excel screen. From the Tools menu, choose Data Analysis, and from the menu presented, choose the appropriate procedure.

NOTE ABOUT THE DATA ANALYSIS TOOLS: Excel comes with the Data Analysis tool pack, but this tool pack is an Add-In. If you have never used the Data Analysis tools, you must first click on Tools, then click on Add-Ins. Click on the boxes for Analysis ToolPak and Analysis ToolPak - VBA so that a check appears in each box. Then click on OK, and you will now be able to bring up the Data Analysis tools.

Excel's statistical functions are built-in formulas that carry out certain calculations. To use them, enter the appropriate formula in a cell and give the formula all the arguments it requires. You may already be familiar with some of these functions from accounting or finance courses, where you may have learned to calculate present values, payments on a debt at a given interest rate, or the sum of a column of figures.

Descriptive Statistics:

Choose Tools, then Data Analysis. From the list which appears, choose Descriptive Statistics and click OK. A dialog box appears. Mark the range which contains the data, which you should have previously entered. If there is a data label in the first row, mark that label and check the box for Labels in first row. Indicate an output range in your worksheet by either entering the address of the first cell or clicking on the cell. Check the box for Summary Statistics and click OK.
Output looks like this:

    Scores
    Mean                       28.2
    Standard Error             7.144228
    Median                     24
    Mode                       #N/A
    Standard Deviation         15.97498
    Sample Variance            255.2
    Kurtosis                   -0.40167
    Skewness                   0.891063
    Range                      38
    Minimum                    14
    Maximum                    52
    Sum                        141
    Count                      5
    Confidence Level(95.0%)    19.8356

Notes:
- Scores is the label from the first line of the column containing data.
- Standard Error is the sample standard deviation divided by the square root of the sample size.
- Standard Deviation is a sample value.
- Confidence Level is the margin of error (half-width) of a confidence interval, calculated using the t distribution; that is, confidence level = $t_{0.95} \cdot s_{\bar{X}}$, where $s_{\bar{X}}$ is the standard error.
- A 95% confidence interval is implicit in this output: it is 28.2 ± 19.84.
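As a cross-check outside Excel, the summary figures above can be reproduced with Python's standard library. This is only a sketch: the handout does not list the raw scores, so the five values below are a hypothetical data set chosen to be consistent with the reported mean, median, minimum, maximum, sum, and variance. The t value 2.776 (95% confidence, 4 degrees of freedom) is taken from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical data: NOT given in the handout, chosen to match its summary output.
scores = [14, 15, 24, 36, 52]

n = len(scores)                      # Count
mean = statistics.mean(scores)       # Mean
median = statistics.median(scores)   # Median
var = statistics.variance(scores)    # Sample Variance (divides by n - 1)
s = statistics.stdev(scores)         # Standard Deviation (sample value)
se = s / math.sqrt(n)                # Standard Error = s / sqrt(n)

# Assumed table value: t for 95% confidence with n - 1 = 4 degrees of freedom.
t_95 = 2.776
half_width = t_95 * se               # corresponds to "Confidence Level(95.0%)"

print("mean:", mean, "median:", median, "variance:", var)
print("standard deviation:", round(s, 5), "standard error:", round(se, 6))
print("95% interval:", round(mean - half_width, 2), "to", round(mean + half_width, 2))
```

Because the table value of t is rounded to three decimals, the half-width agrees with Excel's figure only to about two decimal places.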
To calculate an interval for a different confidence level: in the Descriptive Statistics dialog box there is an entry for Confidence Level for Mean. This is the confidence level of the interval to be calculated.

For a hypothesis test, $t = (\bar{X} - \mu_0)/s_{\bar{X}}$. For the hypothesis test $H_0: \mu \le 25$ vs. $H_1: \mu > 25$, for example, we would have t = (28.2 - 25)/7.144 = 0.4479. We could then use the TDIST function to determine the p-value of the test.

Excel also provides spreadsheet formulas for descriptive statistics. To use these, enter an = sign in a cell, followed by the formula with the appropriate range designation:
- AVERAGE(RANGE): returns the arithmetic mean
- STDEV(RANGE): returns the sample standard deviation
- STDEVP(RANGE): returns the population standard deviation
- VAR(RANGE): returns the sample variance
- VARP(RANGE): returns the population variance
- MEDIAN(RANGE): returns the median
- MODE(RANGE): returns the most frequently occurring value
- COUNT(RANGE): returns the number of cells in the range which contain numeric data. Note that the COUNT function does not count blank cells or cells containing alphabetic information (words).

Probability Functions in Excel

Binomial Probabilities: BINOMDIST(x0, n, π, CUMULATIVE)
- CUMULATIVE takes the values TRUE or FALSE; FALSE returns the probability of the individual number of successes, while TRUE returns the value $P(x \le x_0)$
- BINOMDIST(4, 12, 0.3, FALSE) = 0.23114 is the probability of exactly 4 successes in 12 trials with probability of success = 0.3 for each trial
- BINOMDIST(4, 12, 0.3, TRUE) = 0.723655 is the probability of 4 or fewer successes in 12 trials

To work repeated problems, create a specialized worksheet.
For example, in cell A5 enter Prob x =, in cell B5 enter =BINOMDIST(B2,B3,B4,FALSE), and in cell B6 enter =BINOMDIST(B2,B3,B4,TRUE).
- B2 is the entry cell for the number of successes, B3 for the number of trials, and B4 for π
- enter your own labels for cells A2 to A4 and A6

Poisson Probabilities: POISSON(x, µ, CUMULATIVE)
- CUMULATIVE takes the values TRUE and FALSE, for cumulative or individual values
- remember that Poisson probabilities depend entirely on the expected value µ

Exponential Probabilities: EXPONDIST(t0, r, CUMULATIVE)
- CUMULATIVE will usually take the value TRUE
- r is the rate of occurrence and t0 is the interval until first occurrence; thus this formula returns $P(t \le t_0)$
- EXPONDIST(2, 0.5, TRUE) = 0.632121 is the probability that the first success will occur within 2 minutes if the average rate of occurrence is 0.5 per minute
- to find $P(t > t_0)$, enter 1 - EXPONDIST(t0, r, TRUE)

Normal Probabilities: NORMDIST(x0, µ, σ, CUMULATIVE)
- If CUMULATIVE has the value TRUE, this formula returns $P(x \le x_0)$ for the normal distribution with the given µ and σ
- =NORMDIST(20,25,5,TRUE) = 0.1587 is the probability of values less than or equal to 20 on a normal distribution with µ = 25 and σ = 5

NORMINV(PROBABILITY, µ, σ)
- this formula returns the $x_0$ such that $P(x \le x_0)$ has the probability entered in the formula
- NORMINV(0.975, 200, 20) = 239.2; on a normal distribution with mean 200 and standard deviation 20, 0.975 of the distribution is less than 239.2

NORMSDIST(z0): returns $P(z \le z_0)$

NORMSINV(PROBABILITY): returns $z_0$ such that $P(z \le z_0)$ has the given probability

To work repeated problems, create a specialized spreadsheet: for example, in cell A4 enter Prob (x <= x0); in cell B4, enter =NORMDIST(B6, B7, B8, TRUE). In A5 enter Prob (x > x0) and in B5 enter =1-B4. Then enter an x value in B6, the mean in B7, and the standard deviation in B8. You will of course want to enter labels in A6 to A8.

t Distribution Probabilities: TDIST(t, degrees of freedom, tails)
- t is a calculated value from the formula $t = (\bar{X} - \mu_0)/s_{\bar{X}}$ or other t formulas which we will encounter
- degrees of freedom will depend on the problem; in simple hypothesis tests, we have df = n - 1
- tails takes the value 1 or 2, depending on whether it's a one-tailed or two-tailed test
- the result of TDIST is the probability of a t value as great as that actually obtained; it is the area under the graph of the t distribution beyond the calculated value of t. If we specify 1 for tails, it is the area in one tail beyond the calculated value; if we specify 2 for tails, it is the area in the two tails beyond ±t.
- in hypothesis testing, the result of the TDIST formula is the p-value of the test.
- TDIST(3.15, 9, 1) = 0.0059; TDIST(1.93, 22, 2) = 0.0666

TINV(probability, degrees of freedom)
- returns a t value with the specified probability split between the two tails
- used for finding t values for use with confidence intervals: TINV(0.05, 22) = 2.073875 gives the t value that would be used for calculating a 95% confidence interval with a sample of n = 23
- or for finding critical t values: for a two-tailed test, enter the significance level for probability; for a one-tailed test, enter twice the significance level for probability
- TINV(0.01, 44) = 2.692286 is the critical value for a two-tailed test at 1% significance with 44 degrees of freedom
- TINV(0.1, 26) = 1.705616 is the critical value for an upper one-tailed test at 5% significance with 26 degrees of freedom; -1.705616 is the critical value for a lower one-tailed test with the same conditions

Normal Probabilities and z tests: Sample Problems and Applications

For a compact model of microwave oven, the average power used is 750 watts with standard deviation 10 watts.

What is the probability that a randomly selected oven uses less than 735 watts?
Solution: use NORMDIST(735, 750, 10, TRUE).

What proportion of these ovens draw more than 720 watts?
Solution: use NORMDIST(720, 750, 10, TRUE). The result is the proportion that use less than 720 watts, and the required answer is 1 minus that value, or 1 - NORMDIST(720, 750, 10, TRUE).

How much power do the lowest 25% of these ovens use?
Solution: use NORMINV(0.25,750,10). The result is the number of watts such that 25% use that many or fewer watts.

How much power do the highest 10% of these ovens use?
Solution: since 10% use more, 90% use less. Enter NORMINV(0.9,750,10). The result is a wattage figure such that only 10% of the ovens use that much or more.

If we choose a sample of 25 of these ovens, what is the probability that the mean power usage will be more than 755 watts?
Solution: This question refers to the distribution of sample means; that distribution has $\mu_{\bar{X}} = 750$, and the relevant standard deviation is the standard error of the mean, $\sigma_{\bar{X}}$. Calculate that value: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/5 = 2$. Then use the NORMDIST function: =1-NORMDIST(755,750,2,TRUE)

The thickness of steel plates is normally distributed with σ = 0.05 mm. For a sample of 30 plates, $\bar{X}$ = 22 mm. Calculate a 95% confidence interval for the mean thickness of all plates.
Solution: This problem requires the use of z values which demarcate the middle 95% of a normal distribution, with 2.5% in each tail. To find the appropriate values, enter =NORMSINV(0.025) and/or =NORMSINV(0.975). The numbers you get will have the same absolute value. A general formula to find the z value for a confidence interval, with the confidence level entered as a percentage, would be =NORMSINV(1 - (100 - confidence level)/200)

Probabilities and Hypothesis Tests with the t Distribution:

The amounts customers spend at Ye Olde Antique Barne are skewed upwards. In a sample of 57 customers we find a sample mean of $312 with standard deviation $70. Find a 90% confidence interval for the average spending at YOAB.
Solution: This problem requires the use of t values. To find the correct t value, enter TINV(0.1, degrees of freedom); here the degrees of freedom are 57 - 1 = 56. A general formula would be =TINV((100 - confidence level)/100, df). Notice that the TINV function uses both tails of the distribution.
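The normal-distribution solutions worked above (the ovens and the steel plates) can be cross-checked outside Excel. The sketch below uses `NormalDist` from Python's standard `statistics` module, whose `cdf` and `inv_cdf` methods play the roles of NORMDIST(..., TRUE) and NORMINV. The variable names are illustrative only and do not come from the handout.

```python
import math
from statistics import NormalDist

ovens = NormalDist(mu=750, sigma=10)      # power use in watts

p_below_735 = ovens.cdf(735)              # like NORMDIST(735, 750, 10, TRUE)
p_above_720 = 1 - ovens.cdf(720)          # like 1 - NORMDIST(720, 750, 10, TRUE)
watts_low_25 = ovens.inv_cdf(0.25)        # like NORMINV(0.25, 750, 10)
watts_top_10 = ovens.inv_cdf(0.90)        # like NORMINV(0.9, 750, 10)

# Sample means for n = 25 have standard error 10 / sqrt(25) = 2.
sample_means = NormalDist(mu=750, sigma=10 / math.sqrt(25))
p_mean_above_755 = 1 - sample_means.cdf(755)

# 95% interval for the steel plates: sigma = 0.05 mm, n = 30, sample mean 22 mm.
z = NormalDist().inv_cdf(0.975)           # like NORMSINV(0.975)
half_width = z * 0.05 / math.sqrt(30)

print("P(x < 735):", round(p_below_735, 4))
print("P(x > 720):", round(p_above_720, 5))
print("25th percentile:", round(watts_low_25, 2))
print("90th percentile:", round(watts_top_10, 2))
print("P(mean > 755):", round(p_mean_above_755, 5))
print("plate interval:", round(22 - half_width, 4), "to", round(22 + half_width, 4))
```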
Use the information from the preceding problem (the Ye Olde Antique Barne sample) to test the hypotheses $H_0: \mu \ge 320$ vs. $H_1: \mu < 320$. Use a 5% significance level.

Solution: Calculate the t value $t = (\bar{X} - \mu_0)/s_{\bar{X}}$; in this case $s_{\bar{X}} = 70/\sqrt{57} = 70/7.5498 = 9.2717$, so t = (312 - 320)/9.2717 = -0.8628.

p-value approach: use TDIST to find the p-value: enter TDIST(0.8628,56,1). The result is the probability of a t value as large as or larger than 0.8628, and that is the p-value of the test. (We actually want the probability of values as small as or smaller than -0.8628, but by the symmetry of the t distribution, that is the same.) NOTE: the t value entered must be a positive number. If you are setting up a spreadsheet to do a number of these problems, use the ABS, or absolute value, function. For example, the expression TDIST(ABS(B4),B5-1,1) would give the p-value in a one-tailed test for the t value entered in cell B4 and degrees of freedom equal to the sample size, entered in cell B5, minus 1.

Critical value approach: use TINV to find the critical values. Enter TINV(0.1, 56). The result is the critical value for an upper one-tailed test at 5% significance. Notice that to find critical values for a one-tailed test, we must enter TWO TIMES the significance level.
For a lower one-tailed test, we use the same procedure but append a minus sign to the critical t value. For a two-tailed test, we enter the significance level of the test as the probability.

Copper tubing must have an average diameter of 0.575 in; diameters are known to be normally distributed. In a sample of 20 sections of pipe, the mean diameter is 0.569 in with standard deviation 0.04 in. At the 5% significance level, does the tubing meet the standard?

Solution: this is a hypothesis test of $H_0: \mu = 0.575$ vs. $H_1: \mu \ne 0.575$. The t statistic is $t = (0.569 - 0.575)/(0.04/\sqrt{20}) = -0.6708$.

p-value: use TDIST(0.6708,19,2); this will give the p-value of the test.

critical value: to find the critical values, enter TINV(0.05,19). The result is 2.0930, and the decision rule is: Reject $H_0$ if t > +2.0930 or if t < -2.0930.
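The t-distribution problems above can also be checked without Excel. Python's standard library has no t distribution, so the sketch below approximates TDIST by integrating the t density with Simpson's rule and approximates TINV by bisection. The function names `tdist` and `tinv` are hypothetical stand-ins chosen to mirror the Excel names; this is one possible numerical approach under those assumptions, not how Excel itself computes these values.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def tdist(t, df, tails):
    """Area beyond t in one or two tails, mirroring TDIST(t, df, tails)."""
    if t < 0:
        raise ValueError("t must be non-negative, as with Excel's TDIST")
    n = 2 * max(100, int(t * 200))        # even number of Simpson intervals
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    one_tail = 0.5 - total * h / 3        # 0.5 minus the area between 0 and t
    return tails * one_tail

def tinv(p, df):
    """t value whose two-tailed area equals p, mirroring TINV(p, df)."""
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > p:           # widen until the answer is bracketed
        hi *= 2
    for _ in range(60):                   # bisection on the two-tailed area
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Antique shop sample: n = 57, mean 312, s = 70; H0: mu >= 320 vs. H1: mu < 320.
n, xbar, s = 57, 312.0, 70.0
se = s / math.sqrt(n)                     # about 9.2717
t_stat = (xbar - 320) / se                # about -0.8628
p_lower = tdist(abs(t_stat), n - 1, 1)    # one-tailed p-value
margin_90 = tinv(0.10, n - 1) * se        # half-width of the 90% interval

# Copper tubing: n = 20, mean 0.569, s = 0.04; H0: mu = 0.575 (two-tailed).
t_cu = (0.569 - 0.575) / (0.04 / math.sqrt(20))   # about -0.6708
p_two = tdist(abs(t_cu), 19, 2)           # two-tailed p-value
t_crit = tinv(0.05, 19)                   # about 2.0930

print("antique shop: t =", round(t_stat, 4), "p =", round(p_lower, 4))
print("90% interval:", round(xbar - margin_90, 2), "to", round(xbar + margin_90, 2))
print("copper tubing: t =", round(t_cu, 4), "p =", round(p_two, 4),
      "critical t =", round(t_crit, 4))
```

In both tests the p-value exceeds 0.05 and the t statistic falls short of the critical value, so neither null hypothesis is rejected at the 5% level.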