1 Computing the Standard Deviation of Sample Means

Quality control charts are based on sample means, not on individual values within a sample. A sample is a group of items that are considered all together for our analysis. Items within a sample lose their individual characteristics in the analysis. Rather, a summary statistic, e.g. the sample mean, is used to represent the information in the sample. See the examples of samples below:

1. A section of BA3352 students in the current semester is a sample of students. Then the sample size is the number of students in the section. Different sections constitute different samples. The number of sections offered in the current semester would be the number of samples.
2. Voters surveyed by a given polling agency on a single day are a sample. The sample size is the number of voters surveyed on that particular day. Polls made on different days constitute different samples. The number of polls is the number of samples.
3. Customers buying a particular brand of perfume over a specified month can be considered as a sample. The sample size is the number of customers buying the perfume over the specified month. Another sample can be generated by considering customers buying another brand of perfume. If we consider four brands of perfume, we end up with four samples.

The number of samples and the sample size can potentially be confused. Sample size is the number of items within a group. Number of samples is the number of groups.

Example 1: After a midterm exam for a course that is given in five sections, the average exam grade $\bar{x}_j$ in section $j$ is computed and reported below.

                 Sec 1   Sec 2   Sec 3   Sec 4   Sec 5
Average grade     68      72      74      82      71

Suppose that there are 50 students in each section and use $x_{i,j}$ to denote the $i$th student's grade in Sec $j$. Then the average grades are computed by

$$\bar{x}_j = \frac{1}{50} \sum_{i=1}^{50} x_{i,j} \quad \text{for } j \in \{1, 2, 3, 4, 5\}.$$

Since all 50 grades within a section are reduced to a single summary statistic (the sample mean), all the students within a section are represented merely by the section's summary statistic. Individual student grades are immaterial for an analysis that checks if a certain section is performing better than the others. Clearly, the sample size is 50 and the number of samples is 5.

There are two ways to compute the standard deviation $\sigma_{\bar{x}}$ of sample means. The first way requires knowledge of the standard deviation $\sigma_x$ of the individual values within a sample; the second way does not require $\sigma_x$.

1.1 Computing $\sigma_{\bar{x}}$ with known $\sigma_x$

In order to understand what we have and what we want, first recall that $\mathrm{Var}(X) = \sigma_x^2$ and $\mathrm{Var}(\bar{X}) = \sigma_{\bar{x}}^2$. Note that $\mathrm{Var}(X)$ is known and we want to compute $\mathrm{Var}(\bar{X})$. In order to perform this computation, we need to recall the following proposition from statistics:
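Proposition 1.
i) If $X$ is a random variable and $c$ is a constant, then $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$.
ii) If $X_1$ and $X_2$ are two independent random variables, then $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$.

Proof:
i) First convince yourself that the mean of $cX$ would be $c\bar{x}$, where $\bar{x}$ is the mean of $X$. We start with $\mathrm{Var}(cX)$ and use the definition of variance:

$$\mathrm{Var}(cX) = \frac{1}{n}\sum_{i=1}^{n} (c x_i - c\bar{x})^2 = c^2\,\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = c^2\,\mathrm{Var}(X).$$

ii) Again by using the definition,

$$
\begin{aligned}
\mathrm{Var}(X_1 + X_2) &= \frac{1}{n}\sum_{i=1}^{n} (x_{1,i} + x_{2,i} - \bar{x}_1 - \bar{x}_2)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} \left\{ (x_{1,i} - \bar{x}_1)^2 + (x_{2,i} - \bar{x}_2)^2 + 2 (x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2) \right\} \\
&= \frac{1}{n}\sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)^2 + \frac{1}{n}\sum_{i=1}^{n} (x_{2,i} - \bar{x}_2)^2 + 2\,\frac{1}{n}\sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)(x_{2,i} - \bar{x}_2) \\
&= \frac{1}{n}\sum_{i=1}^{n} (x_{1,i} - \bar{x}_1)^2 + \frac{1}{n}\sum_{i=1}^{n} (x_{2,i} - \bar{x}_2)^2 + 0 \\
&= \mathrm{Var}(X_1) + \mathrm{Var}(X_2).
\end{aligned}
$$

The fourth equality is due to the fact that $X_1$ and $X_2$ are independent, so the sum of the cross products is zero. This sum would be the covariance of $X_1$ and $X_2$ if $X_1$ and $X_2$ were not independent.

Now Proposition 1 can be used to relate the variance of the sample mean to the variance of the observations within the samples. We start with the definition of the sample mean and proceed as follows:

$$
\begin{aligned}
\mathrm{Var}(\bar{X}) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&\overset{\text{Prop. 1.i}}{=} \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&\overset{\text{Prop. 1.ii}}{=} \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) \\
&= \frac{1}{n^2}\, n\,\mathrm{Var}(X) = \frac{1}{n}\,\mathrm{Var}(X)
\end{aligned}
\tag{1}
$$

Applying Prop. 1.ii here requires the observations $X_1, X_2, \dots, X_n$ within a sample to be independent of each other. We also use the fact that each individual observation has the same variance as the other individuals: $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = \dots = \mathrm{Var}(X_i) = \dots = \mathrm{Var}(X_n) = \mathrm{Var}(X)$, where $X$ stands for a generic observation and represents one of $X_1, X_2, \dots, X_n$. This fact is assumed when constructing samples; otherwise, we would be grouping apples with oranges.

Given (1), which relates variances, relating the standard deviations is easy. Just take the square root of both sides of (1) to arrive at

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}. \tag{2}$$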
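As a quick numerical sanity check of (1) and (2), the sketch below draws many samples of size $n$ and compares the standard deviation of their means with $\sigma_x/\sqrt{n}$. The normal distribution, the mean of 70, the seed, and the function names are illustrative assumptions and are not part of the derivation, which only needs independent observations with a common variance.

```python
import math
import random
import statistics


def theoretical_std_of_mean(sigma_x: float, n: int) -> float:
    """Equation (2): sigma_xbar = sigma_x / sqrt(n)."""
    return sigma_x / math.sqrt(n)


def simulated_std_of_mean(mu: float, sigma_x: float, n: int,
                          num_samples: int, seed: int = 0) -> float:
    """Draw num_samples samples of size n and return the population
    standard deviation of the resulting sample means.

    Assumption for illustration only: observations are independent and
    normally distributed with mean mu and standard deviation sigma_x.
    """
    rng = random.Random(seed)
    sample_means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma_x) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.pstdev(sample_means)


if __name__ == "__main__":
    sigma_x, n = 20.0, 50  # hypothetical values chosen for the check
    print("theory    :", theoretical_std_of_mean(sigma_x, n))
    print("simulation:", simulated_std_of_mean(70.0, sigma_x, n, num_samples=4000))
```

The two printed numbers should be close but not identical, since the simulated value is itself an estimate based on a finite number of samples.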
Example 2: Refer to Example 1 and suppose that the individual scores have a standard deviation of 20. Compute the standard deviation of the sample means.

Solution: We are given $\sigma_x = 20$, and the sample size is already known to be $n = 50$. Then by using (2),

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} = \frac{20}{\sqrt{50}} \approx 2.83.$$

1.2 Computing $\sigma_{\bar{x}}$ with unknown $\sigma_x$

This method is rather direct. Without $\sigma_x$, the only information available is the population of the sample means $\{\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m\}$, where the number of samples is denoted by $m$. We can use this population to estimate the standard deviation of the sample means. First let us compute the variance:

$$\mathrm{Var}(\bar{X}) = \frac{1}{m}\sum_{j=1}^{m} (\bar{x}_j - \bar{\bar{x}})^2,$$

where $\bar{\bar{x}}$ is the grand mean, which can be computed by

$$\bar{\bar{x}} = \frac{1}{m}\sum_{j=1}^{m} \bar{x}_j.$$

Finally, the standard deviation of the sample mean is

$$\sigma_{\bar{x}} = \sqrt{\frac{1}{m}\sum_{j=1}^{m} (\bar{x}_j - \bar{\bar{x}})^2}. \tag{3}$$

Example 3: Refer to Example 1 and compute the standard deviation of the sample means from the population {68, 72, 74, 82, 71}.

Solution: First we compute the grand mean

$$\bar{\bar{x}} = \frac{1}{m}\sum_{j=1}^{m} \bar{x}_j = \frac{68 + 72 + 74 + 82 + 71}{5} = 73.4.$$

Then the standard deviation of the sample means by (3) is

$$\sigma_{\bar{x}} = \sqrt{\frac{1}{5}\left\{(68 - 73.4)^2 + (72 - 73.4)^2 + (74 - 73.4)^2 + (82 - 73.4)^2 + (71 - 73.4)^2\right\}} = \sqrt{22.24} \approx 4.72.$$

1.3 Remark

When $\sigma_x$ is unknown, you must use (3) to compute $\sigma_{\bar{x}}$; in this case, you do not have any choice. When $\sigma_x$ is known, you have to choose between equations (2) and (3). Unless otherwise specified, use (2) to find $\sigma_{\bar{x}}$. The rationale is that the computation in (2) is exact, whereas (3) gives you only an estimate. The general principle applies: use the information available to you as much as possible and refrain from estimation unless absolutely necessary.
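The two routes in equations (2) and (3) can be sketched in a few lines of Python using the numbers from Examples 2 and 3. The function names are hypothetical, and the Sec 5 mean of 71 is the value implied by the stated grand mean of 73.4.

```python
import math


def std_of_means_known_sigma(sigma_x: float, n: int) -> float:
    """Equation (2): exact value when sigma_x of the individuals is known."""
    return sigma_x / math.sqrt(n)


def std_of_means_from_means(sample_means: list[float]) -> float:
    """Equation (3): estimate from the population of m sample means,
    dividing by m (not m - 1), as in the notes."""
    m = len(sample_means)
    grand_mean = sum(sample_means) / m
    variance = sum((xbar - grand_mean) ** 2 for xbar in sample_means) / m
    return math.sqrt(variance)


section_means = [68, 72, 74, 82, 71]  # Sec 1 to Sec 5 from Example 1

# Example 2: sigma_x = 20, n = 50
print(round(std_of_means_known_sigma(20, 50), 2))      # 2.83
# Example 3: sigma_x unknown, use the five section means
print(round(std_of_means_from_means(section_means), 2))  # 4.72
```

The two answers differ because (3) estimates $\sigma_{\bar{x}}$ from only five sample means, while (2) uses the given $\sigma_x$ directly.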
2 Exercise Questions

1. Every year about 500 people apply for UTD's full time MBA program. Over the years it has been observed that the GMAT scores of these people are distributed normally with mean 600 and variance 300.
a) If UTD decides to accept all applicants whose GMAT score is above 620, on average how many people will be accepted per year?
b) If UTD decides to accept the 50 students with the highest GMAT scores every year, what should be the cut-off GMAT score (lowest score among the 50 accepted students)?

2. Draw an Ishikawa diagram listing the possible causes of your midterm grade. Include Environment, Materials, Method, Personnel, etc.

3. Read Continuous Improvement on the Free-Throw Line, pp.42-44 of the textbook. In a couple of sentences, explain a process from your own life that you have improved by studying reasons for failure or substandard performance. Example processes are parallel parking, speaking in public, washing dishes, finding the closest parking spot to your office/class, etc.

4. The DFW passenger data below pertains to the first eight months of 200. Suppose that every month has 30 days. The number of passengers flying out of DFW airport per day and the number of passengers who are searched per day are:

                                                      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug
Average # of passengers/day ($\bar{y}_j$)             5000   4000   2600   3300   4700   400    6800   7500
Average # of searched passengers/day ($\bar{z}_j$)    47     53     6      4      42     44     5      43

The average number of passengers per day is computed as follows. Let $y_{i,j}$ be the number of passengers on the $i$th day of month $j$. The average number of passengers per day for month $j$ is $\bar{y}_j$, defined as

$$\bar{y}_j = \frac{1}{30}\sum_{i=1}^{30} y_{i,j} \quad \text{for } j \in \{\text{Jan}, \text{Feb}, \text{Mar}, \text{Apr}, \text{May}, \text{Jun}, \text{Jul}, \text{Aug}\}.$$

The average number of passengers searched per day is computed similarly. Let $z_{i,j}$ be the number of passengers searched on the $i$th day of month $j$. The average number of passengers searched per day for month $j$ is $\bar{z}_j$, defined as

$$\bar{z}_j = \frac{1}{30}\sum_{i=1}^{30} z_{i,j} \quad \text{for } j \in \{\text{Jan}, \text{Feb}, \text{Mar}, \text{Apr}, \text{May}, \text{Jun}, \text{Jul}, \text{Aug}\}.$$
a) What is the sample size for computing the averages in the table?
b) Suppose that the standard deviation of the number of passengers ($y_{i,j}$) flying out of DFW every day is 3000. What is the standard deviation of the average number of passengers ($\bar{y}_j$) flying out of DFW per day?
c) Assuming a Normal distribution for the number of passengers, how many sigmas ($\sigma$) will give you a Type I error of 20% for an $\bar{x}$-chart on the average number of passengers flying out of DFW per day?

5. Refer to question 4.
a) Find the 3-sigma UCL and LCL for an $\bar{x}$-chart on the average number of passengers flying out of DFW per day.
b) Is the process in control during the first eight months? Explain.

6. Refer to question 4.
a) Compute the variance of the average number of passengers searched ($\bar{z}_j$) per day during the first eight months. In other words, find the variance of the population $\{\bar{z}_{Jan}, \bar{z}_{Feb}, \bar{z}_{Mar}, \bar{z}_{Apr}, \bar{z}_{May}, \bar{z}_{Jun}, \bar{z}_{Jul}, \bar{z}_{Aug}\}$ by using the data in the table. Let us call this variance $\sigma_{\bar{z}}^2$.
b) Compute the ratio of $\sigma_{\bar{z}}^2$ to the grand mean of the averages of the passengers searched per day during the first eight months. Looking at this ratio and considering the fact that the number of searches per day is an integer, what distribution would be appropriate to study the number of searches?
c) What are the UCL and LCL for a 2.5-sigma c-control chart for the number of passengers searched per day?

7. Refer to question 4.
a) Obtain the proportion $\bar{r}_j$ of passengers searched per day for each month. In other words, construct the population $\{\bar{r}_{Jan}, \bar{r}_{Feb}, \bar{r}_{Mar}, \bar{r}_{Apr}, \bar{r}_{May}, \bar{r}_{Jun}, \bar{r}_{Jul}, \bar{r}_{Aug}\}$ by using the data in the table.
b) Compute the grand mean and the variance $\sigma_{\bar{r}}^2$ of the population in a).
c) What are the UCL and LCL for a 2.5-sigma p-control chart for the proportion of passengers searched?

8. Refer to questions 4, 6 and 7. Below are the average number of passengers and the average number of passengers searched in September and October 200.

                                               Sep    Oct
Average number of passengers/day               900    6200
Average number of searched passengers/day      57     63

Using the c- and p-control charts obtained in questions 6 and 7 and the recent numbers above, determine if
a) The number of passengers searched per day is in control.
b) The proportion of passengers searched per day is in control.
c) How can you reconcile your answers if you say yes to either a) or b) above, and no to the other?