6 SAMPLING METHODS 6 Using Statistics 6-6 2 Nonprobability Sapling and Bias 6-6 Stratified Rando Sapling 6-2 6 4 Cluster Sapling 6-4 6 5 Systeatic Sapling 6-9 6 6 Nonresponse 6-2 6 7 Suary and Review of Ters 6-24 Case 2 The Boston Redevelopent Authority 6-27 LEARNING OBJECTIVES After studying this chapter, you should be able to: Apply nonprobability sapling ethods. Decide when to use a stratified sapling ethod. Copute estiates fro stratified saple results. Decide when to use a cluster sapling ethod. Copute estiates fro cluster sapling results. Decide when to use a systeatic sapling ethod. Copute estiates fro systeatic sapling results. Avoid nonresponse biases in estiates.
6 Using Statistics Throughout this book, we have always assued that inforation is obtained through rando sapling. The ethod we have used until now is called siple rando sapling. In siple rando sapling, we assue that our saple is randoly chosen fro the entire population of interest, and that every set of n eleents in the population has an equal chance of being selected as our saple. We assue that a randoization device is always available to the experienter. We also assue that the entire population of interest is known, in the sense that a rando saple can be drawn fro the entire population where every eleent has an equal chance of being included in our saple. Randoly choosing our saple fro the entire population in question is our insurance against sapling bias. This was deonstrated in Chapter 5 with the story of the Literary Digest. But do these conditions always hold in real sapling situations? Also, are there any easier ways ore efficient, ore econoical ways of drawing rando saples? Are there any situations where, instead of randoly drawing every single saple point, we ay randoize less frequently and still obtain an adequately rando saple for use in statistical inference? Consider the situation where our population is ade up of several groups and the eleents within each group are siilar to one another, but different fro the eleents in other groups (e.g., sapling an econoic variable in a city where soe people live in rich neighborhoods and thus for one group, while others live in poorer neighborhoods, foring other groups). Is there a way of using the hoogeneity within the groups to ake our sapling ore efficient? The answers to these questions, as well as any other questions related to sapling ethodology, are the subject of this chapter. In the next section, we will thoroughly explore the idea of rando sapling and discuss its practical liitations. We will see that biases ay occur if saples are chosen without randoization. We will also discuss criteria for the prevention of such selection biases. In the following sections, we will present ore efficient and involved ethods of drawing rando saples that are appropriate in different situations. 6 2 Nonprobability Sapling and Bias The advantage of rando sapling is that the probabilities that the saple estiator will be within a given nuber of units fro the population paraeter it estiates are known. Sapling ethods that do not use saples with known probabilities of selection are known as nonprobability sapling ethods. In such sapling ethods, we have no objective way of evaluating how far away fro the population paraeter our estiate ay be. In addition, when we do not select our saple randoly out of the entire population of interest, our sapling results ay be biased. That is, the average value of the estiate in repeated sapling is not equal to the paraeter of interest. Put siply, our saple ay not be a true representative of the population of interest. A arket research fir wants to estiate the proportion of consuers who ight be interested in purchasing the Spanish sherry Jerez if this product were available at liquor stores in this country. How should inforation for this study be obtained? EXAMPLE 6 6-
6-2 Chapter 6 Solution The population relevant to our case is not a clear-cut one. Before ebarking on our study, we need to define our population ore precisely. Do we ean all consuers in the United States? Do we ean all failies of consuers? Do we ean only people of drinking age? Perhaps we should consider our population to be only those people who, at least occasionally, consue siilar drinks. These are iportant questions to answer before we begin the sapling survey. The population ust be defined in accordance with the purpose of the study. In the case of a proposed new product such as Jerez, we are interested in the product s potential arket share. We are interested in estiating the proportion of the arket for alcoholic beverages that will go to Jerez once it is introduced. Therefore, we define our population as all people who, at least occasionally, consue alcoholic beverages. Now we need to know how to obtain a rando saple fro this population. To obtain a rando saple out of the whole population of people who drink alcoholic beverages at least occasionally, we ust have a frae. That is, we need a list of all such people, fro which we can randoly choose as any people as we need for our saple. In reality, of course, no such list is available. Therefore, we ust obtain our saple in soe other way. Market researchers send field workers to places where consuers ay be found, usually shopping alls. There shoppers are randoly selected and prescreened to ascertain that they are in the population of interest in this case, that they are people who at least occasionally consue alcoholic beverages. Then the selected people are given a taste test of the new product and asked to fill out a questionnaire about their response to the product and their future purchase intent. This ethod of obtaining a rando saple works as long as the interviewers do not choose people in a nonrando fashion, for exaple, choosing certain types of people because of their appearance. If this should happen, a bias ay be introduced if the variable favoring selection is soehow related to interest in the product. Another point we ust consider is the requireent that people selected at the shopping all constitute a representative saple fro the entire population in which we are interested. We ust consider the possibility that potential buyers of Jerez ay not be found in shopping alls, and if their proportion outside the alls is different fro what it is in the alls where surveys take place, a bias will be introduced. We ust consider the location of shopping alls and ust ascertain that we are not favoring soe segents of the population over others. Preferably, several shopping alls, located in different areas, should be chosen. We should randoize the selection of people or eleents in our saple. However, if our saple is not chosen in a purely rando way, it ay still suffice for our purposes as long as it behaves as a purely rando saple and no biases are introduced. In designing the study, we should collect a few rando saples at different locations, chosen at different ties and handled by different field workers, to iniize the chances of a bias. The results of different saples validate the assuption that we are indeed getting a representative saple. 6 Stratified Rando Sapling In soe cases, a population ay be viewed as coprising different groups where eleents in each group are siilar to one another in soe way. In such cases, we ay gain sapling precision (i.e., reduce the variance of our estiators) as well as reduce the costs of the survey by treating the different groups separately. If we consider these groups, or strata, as separate subpopulations and draw a separate rando saple fro each stratu and cobine the results, our sapling ethod is called stratified rando sapling. In stratified rando sapling, we assue that the population of N units ay be divided into groups with N i units in group i, i,...,. The
Sapling Methods 6- strata are nonoverlapping and together they ake up the total population: N N 2 N N. We define the true weight of stratu i as W i N i N. That is, the weight of stratu i is equal to the proportion of the size of stratu i in the whole population. Our total saple, of size n, is divided into subsaples fro each of the strata. We saple n i ites in stratu i, and n n 2 n n. We define the sapling fraction in stratu i as f i n i N i. The true ean of the entire population is, and the true ean in stratu i is i. The variance of stratu i is i2, and the variance of the entire population is 2. The saple ean in stratu i is X i, and the cobined estiator, the saple ean in stratified rando sapling, X st is defined as follows: The estiator of the population ean in stratified rando sapling is X st = a W i X i (6 ) In siple rando sapling with no stratification, the stratified estiator in equation 6 is, in general, not equal to the siple estiator of the population ean. The reason is that the estiator in equation 6 uses the true weights of the strata W i. The siple rando sapling estiator of the population ean is X ( all data X ) n ( n i X i ) n. This is equal to X st only if we have n i n N i N for each stratu, that is, if the proportion of the saple taken fro each stratu is equal to the proportion of each stratu in the entire population. Such a stratification is called stratification with proportional allocation. Following are soe iportant properties of the stratified estiator of the population ean:. If the estiator of the ean in each stratu X i is unbiased, then the stratified estiator of the ean X st is an unbiased estiator of the population ean. 2. If the saples in the different strata are drawn independently of one another, then the variance of the stratified estiator of the population ean X st is given by V ( X st ) = a V ( X i ) (6 2) W i 2 where V( X i ) is the variance of the saple ean in stratu i.. If sapling in all strata is rando, then the variance of the estiator, given in equation 6 2, is further equal to V ( st ) = a W i 2 a 2 i X b( - f (6 ) n i ) i When the sapling fractions f i are sall and ay be ignored, we get W 2 2 i i n i V ( X st ) = a (6 4)
6-4 Chapter 6 4. If the saple allocation is proportional [n i n(n i N ) for all i], then V ( st ) = - f X (6 5) n a W i 2 i which reduces to ( n) W i i 2 when the sapling fraction is sall. Note that f is the sapling fraction, that is, the size of the saple divided by the population size. In addition, if the population variances in all the strata are equal, then = 2 V ( X st ) (6 6) n when the sapling fraction is sall. Practical Applications In practice, the true population variances in the different strata are usually not known. When the variances are not known, we estiate the fro our data. An unbiased estiator of i 2, the population variance in stratu i, is given by S 2 (X - X i ) 2 i = a data in stratu i n i - (6 7) The estiator in equation 6 7 is the usual unbiased saple estiator of the population variance in each stratu as a separate population. A particular estiate of the variance in stratu i will be denoted by s i2. If sapling in each stratu is rando, then an unbiased estiator of the variance of the saple estiator of the population ean is S 2 (X st ) = a a W 2 i S 2 i n i b( - f i ) (6 8) Any of the preceding forulas apply in the special situations where they can be used with the estiated variances substituted for the population variances. Confidence Intervals We now give a confidence interval for the population ean obtained fro stratified rando sapling.
Sapling Methods 6-5 A ( ) 00% confidence interval for the population ean using stratified sapling is Xst Z a>2 s ( Xst ) (6 9) where s( Xst ) is the square root of the estiate of the variance of Xst given in equation 6 8. When the saple sizes in soe of the strata are sall and the population variances are unknown, but the populations are at least approxiately noral, we use the t distribution instead of Z. We denote the degrees of freedo of the t distribution by df. The exact value of df is difficult to deterine, but it lies soewhere between the sallest n i (the degrees of freedo associated with the saple fro stratu i) and the su of the degrees of freedo associated with the saples fro all strata i =(n i ). An approxiation for the effective nuber of degrees of freedo is given by Effective df = c a N i (N i - n i )s 2 2 i >n i d a [N i (N i - n i )>n i ] 2 s 4 i >(n i - ) (6 0) We deonstrate the application of the theory of stratified rando sapling presented so far by the following exaple. Once a year, Fortune agazine publishes the Fortune Service 500, a list of the largest service copanies in the United States. The 500 firs belong to six ajor industry groups. The industry groups and the nuber of firs in each group are listed in Table 6. The 500 firs are considered a coplete population: the population of the top 500 service copanies in the United States. An econoist who is interested in this population wants to estiate the ean net incoe of all firs in the index. However, obtaining the data for all 500 firs in the index is difficult, tie-consuing, or costly. Therefore, the econoist wants to gather a rando saple of the firs, copute a quick average of the net incoe for the firs in the saple, and use it to estiate the ean net incoe for the entire population of 500 firs. EXAMPLE 6 2 TABLE 6 The Fortune Service 500 Group Nuber of Firs. Diversified service copanies 00 2. Coercial banking copanies 00. Financial service copanies (including savings and insurance) 50 4. Retailing copanies 50 5. Transportation copanies 50 6. Utilities 50 500
6-6 Chapter 6 Solution The econoist believes that firs in the sae industry group share coon characteristics related to net incoe. Therefore, the six groups are treated as different strata, and a rando saple is drawn fro each stratu. The weights of each of the strata are known exactly as coputed fro the strata sizes in Table 6. Using the definition of the population weights W i N i N, we get the following weights: W N N 00 500 0.2 W 2 N 2 N 00 500 0.2 W N N 50 500 0. W 4 N 4 N 50 500 0. W 5 N 5 N 50 500 0. W 6 N 6 N 50 500 0. The econoist decides to select a rando saple of 00 of the 500 firs listed in Fortune. The econoist chooses to use a proportional allocation of the total saple to the six strata (another ethod of allocation will be presented shortly). With proportional allocation, the total saple of 00 ust be allocated to the different strata in proportion to the coputed strata weights. Thus, for each i, i,..., 6, we copute n i as n i nw i. This gives the following saple sizes: n 20 n 2 20 n 0 n 4 0 n 5 0 n 6 0 We will assue that the net incoe values in the different strata are approxiately norally distributed and that the estiated strata variances (to be estiated fro the data) are the true strata variances i2, so that the noral distribution ay be used. The econoist draws the rando saples and coputes the saple eans and variances. The results, in illions of dollars (for the eans) and in illions of dollars squared (for the variances), are given in Table 6 2, along with the saple sizes in the different strata and the strata weights. Fro the table, and with the aid of equation 6 for the ean and equation 6 5 for the variance of the saple ean (with the estiated saple variances substituted for the population variances in the different strata), we will now copute the stratified saple ean and the estiated variance of the stratified saple ean. 6 x st = a W i x i = (0.2)(52.7) + (0.2)(2.6) + (0.)(85.6) + (0.)(2.6) + (0.)(8.9) + (0.)(52.) = $66.2 illion and s(x st ) = A - f n 6 a W i s 2 i 0.8 [(0.2)(97,650) + (0.2)(64,00) + (0.)(76,990) + (0.)(8,20) A 00 (0.)(9,07) (0.)(8,500)] 2.08
Sapling Methods 6-7 TABLE 6 2 Sapling Results for Exaple 6 2 Stratu Mean Variance n i W i 52.7 97,650 20 0.2 2 2.6 64,00 20 0.2 85.6 76,990 0 0. 4 2.6 8,20 0 0. 5 8.9 9,07 0 0. 6 52. 8,500 0 0. (Our sapling fraction is f 00 500 0.2.) Our unbiased point estiate of the average net incoe of all firs in the Fortune Service 500 is $66.2 illion. Using equation 6 9, we now copute a 95% confidence interval for, the ean net incoe of all firs in the index. We have the following: x st z a> 2 s(x st ) = 66.2 (.96)(2.08) = [20.88,.6] Thus, the econoist ay be 95% confident that the average net incoe for all firs in the Fortune Service 500 is anywhere fro 20.88 to.6 illion dollars. Incidentally, the true population ean net incoe for all 500 firs in the index is $6.496 illion. The Teplate Figure 6 shows the teplate that can be used to estiate population eans by stratified sapling in the exaple. Stratified Sapling for the Population Proportion The theory of stratified sapling extends in a natural way to sapling for the population proportion p. Let the saple proportion in stratu i be P $ i X i n i, where X i is the nuber of successes in a saple of size n i. Then the stratified estiator of the population proportion p is the following. FIGURE 6 The Teplate for Estiating Means by Stratified Sapling [Stratified Sapling.xls; Sheet: Mean] 2 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 2 22 2 24 A B C E G H I J K L M N O P Q Stratified Sapling for Estiating Fortune Service 500 Stratu N n x-bar s 2 00 20 52.7 97650 2 00 20 2.6 6400 50 0 85.6 76990 X-bar V(X-bar) S(X-bar) df 4 50 0 2.6 820 66.2 52.586 2.0777 80 5 50 0 8.9 907 6 50 0 52. 8500 ( ) CI for X-bar 7 95% 66.2 + or - 45.9264 or 20.986 to 2.046 8 9 0 2 4 5 6 7 8 9 20 Total 500 00
6-8 Chapter 6 The stratified estiator of the population proportion p is P $ st = a W i P $ i (6 ) where the weights W i are defined as in the case of sapling for the population ean: W i N i N. The following is an approxiate expression for the variance of the estiator of the population proportion P $ st, for use with large saples. The approxiate variance of Pˆst is where Q $ P $ i i. 2 V( st ) = a W P$ iq $ i P $ i (6 2) n i When finite-population correction factors f i ust be considered, the following expression is appropriate for the variance of : P $ st V (P $ st) = N 2 a P $ iq $ N i 2 i (N i - n i ) (N i - )n i (6 ) When proportional allocation is used, an approxiate expression is V (P $ st) = - f n a W i P $ iq $ i (6 4) Let us now return to Exaple 6, sapling for the proportion of people who ight be interested in purchasing the Spanish sherry Jerez. Suppose that the arket researchers believe that preferences for iported wines differ between consuers in etropolitan areas and those in other areas. The area of interest for the survey covers a few states in the Northeast, where it is known that 65% of the people live in etropolitan areas and 5% live in nonetropolitan areas. A saple of 0 people randoly chosen at shopping alls in etropolitan areas shows that 28 are interested in Jerez, while a rando saple of 70 people selected at alls outside the etropolitan areas shows that 8 are interested in the sherry. Let us use these results in constructing a 90% confidence interval for the proportion of people in the entire population who are interested in the product. Fro equation 6, using two strata with weights 0.65 and 0.5, we get p$ st = a 2 W i p$ i = (0.65) 28 0 + (0.5) 8 70 = 0.2
Sapling Methods 6-9 Our allocation is proportional because n 0, n 2 70, and n 0 70 200, so that n n 0.65 W and n 2 n 0.5 W 2. In addition, the saple sizes of 0 and 70 represent tiny fractions of the two strata; hence, no finite-population correction factor is required. The equation for the estiated variance of the saple estiator of the proportion is therefore equation 6 4 without the finite-population correction: V (P $ st) = n a 2 0.0008825 W i p$ i q$ [(0.65)(0.25)(0.785) + (0.5)(0.257)(0.74)] 200 The standard error of P $ is therefore 2V(P $ st st) 0.0008825 0.0297. Thus, our 90% confidence interval for the population proportion of people interested in Jerez is p$ st z a>2 s(p $ st) = 0.2 (.645)(0.0297) = [0.8, 0.279] The stratified point estiate of the percentage of people in the proposed arket area for Jerez who ay be interested in the product, if it is introduced, is 2%. A 90% confidence interval for the population percentage is 8.% to 27.9%. The Teplate Figure 6 2 shows the teplate that can be used to estiate population proportions by stratified sapling. The data in the figure correspond to the Jerez exaple. What Do We Do When the Population Strata Weights Are Unknown? Here and in the following subsections, we will explore soe optional, advanced aspects of stratified sapling. When the true strata weights W i = N i /N are unknown that is, when we do not know what percentage of the whole population belongs to each FIGURE 6 2 The Teplate for Estiating Proportions by Stratified Sapling [Stratified Sapling.xls; Sheet: Proportion] 2 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 2 22 2 24 A B C E G H I J K L M N O P Q Stratified Sapling for Estiating Proportion Jerez Stratu N n x p-hat 650000 0 28 0.258 2 50000 70 8 0.2574 P-hat V(P-hat) S(P-hat) 4 0.200 0.00088 0.0297 5 6 ( ) CI for P 7 90% 0.200 + or - 0.0489 or 0.8 to 0.2789 8 9 0 2 4 5 6 7 8 9 20 Total 000000 200
6-0 Chapter 6 stratu we ay still use stratified rando sapling. In such cases, we use estiates of the true weights, denoted by w i. The consequence of using estiated weights instead of the true weights is the introduction of a bias into our sapling results. The interesting thing about this kind of bias is that it is not eliinated as the saple size increases. When errors in the stratu weights exist, our results are always biased; the greater the errors, the greater the bias. These errors also cause the standard error of the saple ean s( Xst) to underestiate the true standard deviation. Consequently, confidence intervals for the population paraeter of interest tend to be narrower than they should be. How Many Strata Should We Use? The nuber of strata to be used is an iportant question to consider when you are designing any survey that uses stratified sapling. In any cases, there is a natural breakdown of the population into a given nuber of strata. In other cases, there ay be no clear, unique way of separating the population into groups. For exaple, if age is to be used as a stratifying variable, there are any ways of breaking the variable and foring strata. The two guidance rules for constructing strata are presented below. Rules for Constructing Strata. The nuber of strata should preferably be less than or equal to 6. 2. Choose the strata so that Cu fx2 is approxiately constant for all strata [where Cu fx2 is the cuulative square root of the frequency of X, the variable of interest]. The first rule is clear. The second rule says that in the absence of other guidelines for breaking down a population into strata, we partition the variable used for stratification into categories so that the cuulative square root of the frequency function of the variable is approxiately equal for all strata. We illustrate this rule in the hypothetical case of Table 6. As can be seen in this siplified exaple, the cobined age groups 20 0, 5, and 6 45 all have a su of 2f equal to 5; hence, these groups ake good strata with respect to age as a stratifying variable according to rule 2. Postsapling Stratification At ties, we conduct a survey using siple rando sapling with no stratification, and after obtaining our results, we note that the data ay be broken into categories of siilar eleents. Can we now use the techniques of stratified rando sapling and enjoy its benefits in ters of reduced variances of the estiators? Surprisingly, the answer is yes. In fact, if the subsaples in each of our strata contain at least 20 eleents, and if our estiated weights of the different strata w i (coputed fro the data as n i n, or fro ore accurate inforation) are close to the true population strata weights W i, then our stratified estiator will be alost as good as that of stratified rando sapling with proportional allocation. This procedure is called poststratification. TABLE 6 Constructing Strata by Age Age Frequency f 2f Cu 2f 20 25 26 0 6 4 5 5 25 5 5 6 40 4 2 4 45 9 5
Sapling Methods 6- We close this section with a discussion of an alternative to proportional allocation of the saple in stratified rando sapling, called optiu allocation. Optiu Allocation With optiu allocation, we select the saple sizes to be allocated to each of the strata so as to iniize one of two criteria. Either we iniize the cost of the survey for a given value of the variance of our estiator, or we iniize the variance of our estiator for a given cost of taking the survey. We assue a cost function of the for C = C 0 + a C i n i (6 5) where C is the total cost of the survey, C 0 is the fixed cost of setting up the survey, and C i is the cost per ite sapled in stratu i. Clearly, the total cost of the survey is the su of the fixed cost and the costs of sapling in all the strata (where the cost of sapling in stratu i is equal to the saple size n i ties the cost per ite sapled C i ). Under the assuption of a cost function given in equation 6 5, the optiu allocation that will iniize our total cost for a fixed variance of the estiator, or iniize the variance of the estiator for a fixed total cost, is as follows. Optiu allocation: n i n = W i i > 2C i a W i i > 2C i (6 6) Equation 6 6 has an intuitive appeal. It says that for a given stratu, we should take a larger saple if the stratu is ore variable internally (greater i ), if the relative size of the stratu is larger (greater W i ), or if sapling in the stratu is cheaper (saller C i ). If the cost per unit sapled is the sae in all the strata (i.e., if C i c for all i), then the optiu allocation for a fixed total cost is the sae as the optiu allocation for fixed saple size, and we have what is called the Neyan allocation (after J. Neyan, although this allocation was actually discovered earlier by A. A. Tschuprow in 92). The Neyan allocation: n i n = W i i a W i i (6 7) Suppose that we want to allocate a total saple of size,000 to three strata, where stratu has weight 0.4, standard deviation, and cost per sapled ite of 4 cents; stratu 2 has weight 0.5, standard deviation 2, and cost per ite of 9 cents;
6-2 Chapter 6 and stratu has weight 0., standard deviation, and cost per ite of 6 cents. How should we allocate this saple if optiu allocation is to be used? We have a W i i = (0.4)() 2C i 24 + (0.5)(2) 29 + (0.)() 26 = 0.608 Fro equation 6 6, we get n n = W > 2C a (W i i > 2C i ) n 2 n = W 2 2 > 2C 2 a (W i i > 2C i ) n n = W > 2C a (W i i > 2C i ) = = = (0.4)()> 24 0.608 (0.5)(2)> 29 0.608 (0.)()> 26 0.608 = 0.29 = 0.548 = 0.2 The optiu allocation in this case is 29 ites fro stratu ; 548 ites fro stratu 2; and 2 ites fro stratu (aking a total of,000 saple ites, as specified). Let us now copare this allocation with proportional allocation. With a saple of size,000 and a proportional allocation, we would allocate our saple only by the stratu weights, which are 0.4, 0.5, and 0., respectively. Therefore, our allocation will be 400 fro stratu ; 500 fro stratu 2; and 00 fro stratu. The optiu allocation is different, as it incorporates the cost and variance considerations. Here, the difference between the two sets of saple sizes is not large. Suppose, in this exaple, that the costs of sapling fro the three strata are the sae. In this case, we can use the Neyan allocation and get, fro equation 6 7, n n = W = a W i i n 2 n = W 2 2 = a W i i n n = W = a W i i (0.4)().7 (0.5)(2).7 (0.)().7 = 0.25 = 0.588 = 0.76 Thus, the Neyan allocation gives a saple of size 25 to stratu ; 588 to stratu 2; and 76 to stratu. Note that these subsaples add only to 999, due to rounding error. The last saple point ay be allocated to any of the strata.
Sapling Methods 6- FIGURE 6 The Teplate for Optial Allocation for Stratified Sapling [Stratified Sapling.xls; Sheet: Allocation] 2 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 2 22 2 24 25 26 A B C D E H I J K L M Optiu Allocation Planned Total Saple Size 000 Stratu W C n 0.4 $ 4.00 29 Fixed Cost 2 0.5 2 $ 9.00 548 Variable Cost $ 8,26.00 0. $ 6.00 2 Total Cost $ 8,26.00 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 Total Actual total 000 In general, stratified rando sapling gives ore precise results than those obtained fro siple rando sapling: The standard errors of our estiators fro stratified rando sapling are usually saller than those of siple rando sapling. Furtherore, in stratified rando sapling, an optiu allocation will produce ore precise results than a proportional allocation if soe strata are ore expensive to saple than others or if the variances within strata are different fro one another. The Teplate Figure 6 shows the teplate that can be used to estiate population proportions by stratified sapling. The data in the figure correspond to the exaple we have been discussing. The sae teplate can be used for Neyan allocation. In colun E, enter the sae cost C, say $, for all the strata. PROBLEMS 6. A securities analyst wants to estiate the average percentage of institutional holding of all publicly traded stocks in the United States. The analyst believes that stocks traded on the three ajor exchanges have different characteristics and therefore decides to use stratified rando sapling. The three strata are the New York Stock Exchange (NYSE), the Aerican Exchange (AMEX), and the Over the Counter (OTC) exchange. The weights of the three strata, as easured by the nuber of stocks listed in each exchange, divided by the total nuber of stocks, are NYSE, 0.44; AMEX, 0.5; OTC, 0.4. A total rando saple of 200 stocks is selected, with proportional allocation. The average percentage of institutional holdings of the subsaple selected fro the issues of the NYSE is 46%, and the standard deviation is 8%. The corresponding results for the AMEX are 9% average institutional holdings and a standard deviation of 4%, and the corresponding results for the OTC stocks are 29% average institutional holdings and a standard deviation of 6%. a. Give a stratified estiate of the ean percentage of institutional holdings per stock. b. Give the standard error of the estiate in a.
6-4 Chapter 6 c. Give a 95% confidence interval for the ean percentage of institutional holdings. d. Explain the advantages of using stratified rando sapling in this case. Copare with siple rando sapling. 6 2. A copany has 2,00 eployees belonging to the following groups: production,,200; arketing, 600; anageent, 00; other, 200. The copany president wants to obtain an estiate of the views of all eployees about a certain ipending executive decision. The president knows that the anageent eployees views are ost variable, along with eployees in the other category, while the arketing and production people have rather unifor views within their groups. The production people are the ost costly to saple, because of the tie required to find the at their different jobs, and the anageent people are easiest to saple. a. Suppose that a total saple of 00 eployees is required. What are the saple sizes in the different strata under proportional allocation? b. Discuss how you would design an optiu allocation in this case. 6. Last year, consuers increasingly bought fleece (industry jargon for hotselling jogging suits, which now rival jeans as the unifor for casual attire). A New York designer of jogging suits is interested in the new trend and wants to estiate the aount spent per person on jogging suits during the year. The designer knows that people who belong to health and fitness clubs will have different buying behavior than people who do not. Furtherore, the designer finds that, within the proposed study area, 8% of the population are ebers of health and fitness clubs. A rando saple of 00 people is selected, and the saple is proportionally allocated to the two strata: ebers of health clubs and nonebers of health clubs. It is found that aong ebers, the average aount spent is $52.4 and the standard deviation is $25.77, while aong the nonebers, the average aount spent is $5. and the standard deviation is $5.. a. What is the stratified estiate of the ean? b. What is the standard error of the estiator? c. Give a 90% confidence interval for the population ean. d. Discuss one possible proble with the data. (Hint: Can the data be considered norally distributed? Why?) 6 4. A financial analyst is interested in estiating the average aount of a foreign loan by U.S. banks. The analyst believes that the aount of a loan ay be different depending on the bank, or, ore precisely, on the extent of the bank s involveent in foreign loans. The analyst obtains the following data on the percentage of profits of U.S. banks fro loans to Mexico and proposes to use these data in the construction of stratu weights. The strata are the different banks: First Chicago, %; Manufacturers Hanover, 27%; Bankers Trust, 2%; Cheical Bank, 9%; Wells Fargo Bank, 9%; Citicorp, 6%; Mellon Bank, 6%; Chase Manhattan, 5%; Morgan Guarantee Trust, 9%. a. Construct the stratu weights for proportional allocation. b. Discuss two possible probles with this study. 6 4 Cluster Sapling Let us consider the case where we have no frae (i.e., no list of all the eleents in the population) and the eleents are clustered in larger units. Each unit, or cluster, contains several eleents of the population. In this case, we ay choose to use the ethod of cluster sapling. This ay also be the case when the population is large and spread over a geographic area in which saller subregions are easily sapled and where a siple rando saple or a stratified rando saple ay not be carried out as easily.
Sapling Methods 6-5 Suppose that the population is coposed of M clusters and there is a list of all M clusters fro which a rando saple of clusters is selected. Two possibilities arise. First, we ay saple every eleent in every one of the selected clusters. In this case, our sapling ethod is called single-stage cluster sapling. Second, we ay select a rando saple of clusters and then select a rando saple of n eleents fro each of the selected clusters. In this case, our sapling ethod is called two-stage cluster sapling. The Relation with Stratified Sapling In stratified sapling, we saple eleents fro every one of our strata, and this assures us of full representation of all segents of the population in the saple. In cluster sapling, we saple only soe of the clusters, and although eleents within any cluster ay tend to be hoogeneous, as is the case with strata, not all the clusters are represented in the saple; this leads to lowered precision of the cluster sapling ethod. In stratified rando sapling, we use the fact that the population ay be broken into subgroups. This usually leads to a saller variance of our estiators. In cluster sapling, however, the ethod is used ainly because of ease of ipleentation or reduction in sapling costs, and the estiates do not usually lead to ore precise results. Single-Stage Cluster Sapling for the Population Mean Let n, n 2,..., n be the nuber of eleents in each of the sapled clusters. Let X, X 2,..., X be the eans of the sapled clusters. The cluster sapling unbiased estiator of the population ean is given as follows. The cluster sapling estiator of is X cl = a n i X i a n i (6 8) An estiator of the variance of the estiator of in equation 6 8 is s 2 (X cl ) = M - a ni 2 (X i - X cl ) 2 Mn 2 - (6 9) where n ( n i ) is the average nuber of units in the sapled clusters. Single-Stage Cluster Sapling for the Population Proportion The cluster sapling estiator of the population proportion p is P $ cl = a n i P $ i where the P $ i are the proportions of interest within the sapled clusters. a n i (6 20)
6-6 Chapter 6 The estiated variance of the estiator in equation 6 20 is given by s 2 (P $ cl) = M - a ni 2 (P $ i - P $ cl) 2 Mn 2 - (6 2) We now deonstrate the use of cluster sapling for the population ean with the following exaple. EXAMPLE 6 Solution J. B. Hunt Transport Copany is especially interested in lowering fuel costs in order to survive in the tough world of deregulated trucking. Recently, the copany introduced new easures to reduce fuel costs for all its trucks. Suppose that copany trucks are based in 0 centers throughout the country and that the copany s anageent wants to estiate the average aount of fuel saved per truck for the week following the institution of the new easures. For reasons of lower cost and adinistrative ease, anageent decides to use single-stage cluster sapling, select a rando saple of 20 trucking centers, and easure the weekly fuel saving for each of the trucks in the selected centers (each center is a cluster). The average fuel savings per truck, in gallons, for each of the 20 selected centers are as follows (the nuber of trucks in each center is given in parentheses): 2 (8), 22 (8), (9), 4 (0), 28 (7), 25 (8), 8 (0), 24 (2), 9 (), 20 (6), 0 (8), 26 (9), 2 (9), 7 (8), (0), 29 (8), 24 (8), 26 (0), 8 (0), 22 (). Fro these data, copute an estiate of the average aount of fuel saved per truck for all Hunt s trucks over the week in question. Also give a 95% confidence interval for this paraeter. Fro equation 6 8, we get x cl = a n i x i = a n i [2(8) 22(8) (9) 4(0) 28(7) 25(8) 8(0) 24(2) 9() 20(6) 0(8) 26(9) 2(9) 7(8) (0) 29(8) 24(8) 26(0) 8(0) 22()] (8 8 9 0 7 8 0 2 6 8 9 9 8 0 8 8 0 0 ) 2.8 Fro equation 6 9, we find that the estiated variance of our saple estiator of the ean is s 2 (X cl ) = M - an 2 i (x i - x cl ) 2 Mn 2-0 - 20 = (0)(20)(9) 2[82 (2-2.8) 2 + 8 2 (22-2.8) 2 =.587 20 + 9 2 ( - 2.8) 2 +... + 2 (22-2.8) 2 ]>9 Using the preceding inforation, we construct a 95% confidence interval for as follows: x cl.96s (X cl) = 2.8.962.587 = [9.6, 24.0]
Sapling Methods 6-7 Thus, based on the sapling results, Hunt s anageent ay be 95% confident that average fuel savings per truck for all trucks over the week in question is anywhere fro 9.6 to 24.0 gallons. The Teplates The teplate for estiating a population ean by cluster sapling is shown in Figure 6 4. The data in the figure correspond to Exaple 6. The teplate for estiating a population proportion by cluster sapling is shown in Figure 6 5. FIGURE 6 4 The Teplate for Estiating Means by Cluster Sapling [Cluster Sapling.xls; Sheet: Mean] 2 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 2 22 2 24 A B C D E F G H I J K L M Cluster Sapling J. B. Hunt Transport Co. Cluster n x-bar 8 2 8 9 4 0 5 7 6 8 7 0 8 2 9 0 6 8 2 9 9 4 8 5 0 6 8 7 8 8 0 9 0 20 Total 80 2 22 4 28 25 8 24 9 20 0 26 2 7 29 24 26 8 22 M 0 20 n-bar 9 X-bar V(X-bar) S(X-bar) 2.8.5869.2597 95% ( ) CI for X-bar 2.8 + or - 2.46902 or 9.64 to 24.024 FIGURE 6 5 The Teplate for Estiating Proportions by Cluster Sapling [Cluster Sapling.xls; Sheet: Proportion] 2 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 2 22 2 24 25 A B C D E F G H I J K L M Cluster Sapling Cluster n p 2 8 65 22 0. 0.26 0.28 4 5 6 7 8 9 0 2 4 5 6 7 8 9 20 Total 05 M 2 n-bar 5 P V(P) s(p) 0.2705 8.4E-05 0.0098 95% ( ) CI for P 0.2705 + or - 0.0799 or 0.2505 to 0.28904
6-8 Chapter 6 Two-Stage Cluster Sapling When clusters are very large or when eleents within each cluster tend to be siilar, we ay gain little inforation by sapling every eleent within the selected clusters. In such cases, selecting ore clusters and sapling only soe of the eleents within the chosen clusters ay be ore econoical. The forulas for the estiators and their variances in the case of two-stage cluster sapling are ore coplicated and ay be found in advanced books on sapling ethodology. PROBLEMS 6 5. There are 602 aerobics and fitness centers in Japan (up fro 70 five years ago). Adidas, the European aker of sports shoes and apparel, is very interested in this fastgrowing potential arket for its products. As part of a arketing survey, Adidas wants to estiate the average incoe of all ebers of Japanese fitness centers. (Mebers of one such club pay $62 to join and another $62 per onth. Adidas believes that the average incoe of all fitness club ebers in Japan ay be higher than that of the general population, for which census data exist.) Since travel and adinistrative costs for conducting a siple rando saple of all ebers of fitness clubs throughout Japan would be prohibitive, Adidas decided to conduct a cluster sapling survey. Five clubs were chosen at rando out of the entire collection of 602 clubs, and all ebers of the five clubs were interviewed. The following are the average incoes (in U.S. dollars) for the ebers of each of the five clubs (the nuber of ebers in each club is given in parentheses): $7,27 (560), $4,8 (45), $28,800 (890), $5,498 (7), $47,446 (20). Give the cluster sapling estiate of the population ean incoe for all fitness club ebers in Japan. Also give a 90% confidence interval for the population ean. Are there any liitations to the ethodology in this case? 6 6. Israel s kibbutzi are by now well diversified beyond their agrarian roots, producing everything fro lollipops to plastic pipe. These 282 widely scattered counes of several hundred ebers aintain hundreds of factories and other production facilities. An econoist wants to estiate the average annual revenues of all kibbutz production facilities. Since each kibbutz has several production units, and since travel and other costs are high, the econoist wants to consider a saple of 5 randoly chosen kibbutzi and find the annual revenues of all production units in the selected kibbutzi. Fro these data, the econoist hopes to estiate the average annual revenue per production unit in all 282 kibbutzi. The saple results are as follows: Total Kibbutz Annual Revenues Kibbutz Nuber of Production Units (in illions of dollars) 4 4.5 2 2 2.8 6 8.9 4 2.2 5 5 7.0 6 2.2 7 2 2. 8 0.8 9 8 2.5 0 4 6.2 5.5 2 6.2 2.8 4 5 9.0 5 2.4
Sapling Methods 6-9 Fro these data, copute the cluster sapling estiate of the ean annual revenue of all kibbutzi production units, and give a 95% confidence interval for the ean. 6 7. Under what conditions would you use cluster sapling? Explain the differences aong cluster sapling, siple rando sapling, and stratified rando sapling. Under what conditions would you use two-stage cluster sapling? Explain the difference between single-stage and two-stage cluster sapling. What are the liitations of cluster sapling? 6 8. Recently a survey was conducted to assess the quality of investent brokers. A rando saple of 6 brokerage houses was selected fro a total of 27 brokerage houses. Each of the brokers in the selected brokerage houses was evaluated by an independent panel of industry experts as highly qualified (HQ) or was given an evaluation below this rating. The designers of the survey wanted to estiate the proportion of all brokers in the entire industry who would be considered highly qualified. The survey results are in the following table. Brokerage House Total Nuber of Brokers Nuber of HQ Brokers 20 80 2 50 75 200 00 4 00 65 5 88 45 6 260 200 Use the cluster sapling estiator of the population proportion to estiate the proportion of all highly qualified brokers in the investent industry. Also give a 99% confidence interval for the population proportion you estiated. 6 9. Forty-two cruise ships coe to Alaska s Glacier Bay every year. The state tourist board wants to estiate the average cruise passenger s satisfaction fro this experience, rated on a scale of 0 to 00. Since the ships arrivals are evenly spread throughout the season, siple rando sapling is costly and tie-consuing. Therefore, the agency decides to send its volunteers to board the first five ships of the season, consider the as clusters, and randoly choose 50 passengers in each ship for interviewing. a. Is the ethod eployed single-stage cluster sapling? Explain. b. Is the ethod eployed two-stage cluster sapling? Explain. c. Suppose that each of the ships has exactly 50 passengers. Is the proposed ethod single-stage cluster sapling? d. The 42 ships belong to 2 cruise ship copanies. Each copany has its own characteristics in ters of price, luxury, services, and type of passengers. Suggest an alternative sapling ethod, and explain its benefits. 6 5 Systeatic Sapling Soeties a population is arranged in soe order: files in a cabinet, crops in a field, goods in a warehouse, etc. In such cases drawing our rando saple in a systeatic way ay be easier than generating a siple rando saple that would entail looking for particular ites within the population. To select a systeatic saple of n eleents fro a population of N eleents, we divide the N eleents in the population into n groups of k eleents and then use the following rule: We randoly select the first eleent out of the first k eleents in the population, and then we select every kth unit afterward until we have a saple of n eleents.
6-20 Chapter 6 For exaple, suppose k 20, and we need a saple of n 0 ites. We randoly select the first ite fro the integers to 20. If the rando nuber selected is, then our systeatic saple will contain the eleents, 20, 20 5,..., and so on until we have 0 eleents in our saple. A variant of this rule, which solves the probles that ay be encountered when k is not an integer ultiple of the saple size n (their product being N ), is to let k be the nearest integer to N/n. We now regard the N eleents as being arranged in a circle (with the last eleent preceding the first eleent). We randoly select the first eleent fro all N population ebers and then select every kth ite until we have n ites in our saple. The Advantages of Systeatic Sapling In addition to the ease of drawing saples in a systeatic way for exaple, by siply easuring distances with a ruler in a file cabinet and sapling every fixed nuber of inches the ethod has soe statistical advantages as well. First, when k N/n, the saple estiator of the population ean is unbiased. Second, systeatic sapling is usually ore precise than siple rando sapling because it actually stratifies the population into n strata, each stratu containing k eleents. Therefore, systeatic sapling is approxiately as precise as stratified rando sapling with one unit per stratu. The difference between the two ethods is that the systeatic saple is spread ore evenly over the entire population than a stratified saple, because in stratified sapling the saples in the strata are drawn separately. This adds precision in soe cases. Systeatic sapling is also related to cluster sapling in that it aounts to selecting one cluster out of a population of k clusters. Estiation of the Population Mean in Systeatic Sapling The systeatic sapling estiator of the population ean is X sy = n a X i n (6 22) The estiator is, of course, the sae as the siple rando sapling estiator of the population ean based on a saple of size n. The variance of the estiator in equation 6 22 is difficult to estiate fro the results of a single saple. The estiation requires soe assuptions about the order of the population. The estiated variances of X sy in different situations are given below.. When the population values are assued to be in no particular order with respect to the variable of interest, the estiated variance of the estiator of the ean is the sae as in the case of siple rando sapling s 2 (X sy ) = N - n Nn S 2 (6 2) where S 2 is the usual saple variance, and the first ter accounts for finitepopulation correction as well as division by n.
Sapling Methods 6-2 2. When the ean is constant within each stratu of k eleents but different fro stratu to stratu, the estiated variance of the saple ean is s 2 (X sy ) = N - n Nn n a (X i - X i + k ) 2 2(n - ) (6 24). When the population is assued to be either increasing or decreasing linearly in the variable of interest, and when the saple size is large, the appropriate estiator of the variance of our estiator of the ean is s 2 (X sy ) = N - n Nn n a (X i - 2X i + k + X i + 2k ) 2 6(n - ) (6 25) for i n 2. There are forulas that apply in ore coplicated situations as well. We deonstrate the use of systeatic sapling with the following exaple. An investor obtains a copy of The Wall Street Journal and wants to get a quick estiate of how the New York Stock Exchange has perfored since the previous day. The investor knows that there are about 2,00 stocks listed on the NYSE and wants to look at a quick saple of 00 stocks and deterine the average price change for the saple. The investor thus decides on an every 2st systeatic sapling schee. The investor uses a ruler and finds that this eans that a stock should be selected about every.5 inches along the listings coluns in the Journal. The first stock is randoly selected fro aong the first 2 stocks listed on the NYSE by using a randonuber generator in a calculator. The selected stock is the seventh fro the top, which happens to be ANR. For the day in question, the price change for ANR is 0.25. The next stock to be included in the saple is the one in position 7 2 28th fro the top. The stock is Aflpb, which on this date had a price change of 0 fro the previous day. As entioned, the selection is not done by counting the stocks, but by the faster ethod of successively easuring.5 inches down the colun fro each selected stock. The resulting saple of 00 stocks gives a saple ean of x sy 0.5 and S 2 0.6. Give a 95% confidence interval for the average price change of all stocks listed on the NYSE. We have absolutely no reason to believe that the order in which the NYSE stocks are listed in The Wall Street Journal (i.e., alphabetically) has any relationship to the stocks price changes. Therefore, the appropriate equation for the estiated variance of X sy is equation 6 2. Using this equation, we get EXAMPLE 6 4 Solution s 2 (X sy ) = N - n Nn S 2 = 2,00-00 20,000 0.6 = 0.004
6-22 Chapter 6 A 95% confidence interval for, the average price change on this day for all stocks on the NYSE, is therefore x sy.96s ( X sy ) = 0.5 (.96)(20.004) = [0.86, 0.64] The investor ay be 95% sure that the average stock on the NYSE gained anywhere fro $0.86 to $0.64. When sapling for the population proportion, use the sae equations as the ones used for siple rando sapling if it ay be assued that no inherent order exists in the population. Otherwise use variance estiators given in advanced texts. The Teplate The teplate for estiating a population ean by systeatic sapling is shown in Figure 6 6. The data in the figure correspond to Exaple 6 4. FIGURE 6 6 The Teplate for Estiating Population Means by Systeatic Sapling [Systeatic Sapling.xls; Sheet: Sheet ] 2 4 5 6 7 8 9 0 2 A B C D E F G H I J Systeatic Sapling Average price change N n x - bar s 2 200 00 0.5 0.6 X - bar 0.5 V(X - bar) 0.00429 Assuing that the population is in no particular order. S(X - bar) 0.058554 α (-α) CI for X-bar 95% 0.5 + or - 0.476 or 0.8524 to 0.6476 PROBLEMS 6 0. A tire anufacturer aintains strict quality control of its tire production. This entails frequent sapling of tires fro large stocks shipped to retailers. Saples of tires are selected and run continuously until they are worn out, and the average nuber of iles driven in the laboratory is noted. Suppose a warehouse contains,000 tires arranged in a certain order. The copany wants to select a systeatic saple of 50 tires to be tested in the laboratory. Use randoization to deterine the first ite to be sapled, and give the rule for obtaining the rest of the saple in this case. 6. A large discount store gives its sales personnel bonuses based on their average sale aount. Since each salesperson akes hundreds of sales each onth, the store anageent decided to base average sale aount for each salesperson on a rando saple of the person s sales. Since records of sales are kept in books, the use of systeatic sapling is convenient. Suppose a salesperson has ade 855 sales over the onth, and anageent wants to choose a saple of 0 sales for estiation of the average aount of all sales. Suggest a way of doing this so that no probles would result due to the fact that 855 is not an integer ultiple of 0. Give
Sapling Methods 6-2 the first eleent you choose to select, and explain how the rest of the saple is obtained. 6 2. An accountant always audits the second account and every fourth account thereafter when sapling a client s accounts. a. Does the accountant use systeatic sapling? Explain. b. Explain the probles that ay be encountered when this sapling schee is used. 6. Beer sales in a tavern are cyclical over the week, with large volue during weekend nights, lower volue during the beginning of the week, and soewhat higher volue at idweek. Explain the possible probles that could arise, and the conditions under which they ight arise, if systeatic sapling were used to estiate beer sales volue per night. 6 4. A population is coposed of 00 ites arranged in soe order. Every stratu of 0 ites in the order of arrangeent tends to be siilar in its values. An every 0th systeatic saple is selected. The first ite, randoly chosen, is the 6th ite, and its value is 20. The following ites in the saple are, of course, the 6th, the 26th, etc. The values of all ites in the systeatic saple are as follows: 20, 25, 27, 4, 28, 22, 28, 2, 7,. Give a 90% confidence interval for the population ean. 6 5. Explain the relationship between the ethod eployed in proble 6 4 and the ethod of stratified rando sapling. Explain the differences between the two ethods. 6 6 Nonresponse Nonresponse to saple surveys is one of the ost serious probles that occur in practical applications of sapling ethodology. The proble is one of loss of inforation. For exaple, suppose that a survey questionnaire dealing with soe issue is ailed to a randoly chosen saple of 500 people and that only 00 people respond to the survey. The question is: What can you say about the 200 people who did not respond? This is a very iportant question, with no iediate answer, precisely because the people did not respond; we know nothing about the. Suppose that the questionnaire asks for a yes or no answer to a particular public issue over which people have differing views, and we want to estiate the proportion of people who would respond yes. People ay have such strong views about the issue that those who would respond no ay refuse to respond altogether. In this case, the 200 nonrespondents to our survey will contain a higher proportion of no answers than the 00 responses we have. But, again, we would not know about this. The result will be a bias. How can we copensate for such a possible bias? We ay want to consider the population as ade up of two strata: the respondents stratu and the nonrespondents stratu. In the original survey, we anaged to saple only the respondents stratu, and this caused the bias. What we need to do is to obtain a rando saple fro the nonrespondents stratu. This is easier said than done. Still, there are ways we can at least reduce the bias and get soe idea about the proportion of yes answers in the nonresponse stratu. This entails callbacks: returning to the nonrespondents and asking the again. In soe ail questionnaires, it is coon to send several requests for response, and these reduce the uncertainty. There ay, however, be hard-core refusers who just do not want to answer the questionnaire. These people are likely to have distinct views about the issue in question, and if you leave the out, your conclusions will reflect a significant bias. In such a situation, gathering a sall rando saple of the hard-core refusers and offering the soe onetary reward for their answers ay be useful. In cases where people ay find the question ebarrassing or ay worry about revealing their
6-24 Chapter 6 personal views, a rando-response echanis ay be used; here the respondent randoly answers one of two questions, one is the sensitive question, and the other is an innocuous question of no relevance. The interviewer does not know which question any particular respondent answered but does know the probability of answering the sensitive question. This still allows for coputation of the aggregated response to the sensitive question while protecting any given respondent s privacy. 6 7 Suary and Review of Ters In this chapter, we considered soe advanced sapling ethods that allow for better precision than siple rando sapling, or for lowered costs and easier survey ipleentation. We concentrated on stratified rando sapling, the ost iportant and useful of the advanced ethods and one that offers statistical advantages of iproved precision. We then discussed cluster sapling and systeatic sapling, two ethods that are used priarily for their ease of ipleentation and reduced sapling costs. We entioned a few other advanced ethods, which are described in books devoted to sapling ethodology. We discussed the proble of nonresponse. ADDITIONAL PROBLEMS 6 6. Blooingdale s ain store in New York has the following departents on its ezzanine level: Stendahl, Ralph Lauren, Beauty Spot, and Lauder Prescriptives. The ezzanine level is anaged separately fro the other levels, and during the store s postholiday sale, the level anager wanted to estiate the average sales aount per custoer throughout the sale. The following table gives the relative weights of the different departents (known fro previous operation of the store), as well as the saple eans and variances of the different strata for a total saple of,000 custoers, proportionally allocated to the four strata. Give a 95% confidence interval for the average sale (in dollars) per custoer for the entire level over the period of the postholiday sale. Stratu Weight Saple Mean Saple Variance Stendahl 0.25 65.00 2.00 Ralph Lauren 0.5 87.00 2.80 Beauty Spot 0.5 52.00 88.85 Lauder Prescriptives 0.25 8.50 00.40 Note: We assue that shoppers visit the ezzanine level to purchase fro only one of its departents. Since the brands and designers are copetitors, and since shoppers are known to have a strong brand loyalty in this arket, the assuption sees reasonable. 6 7. A state departent of transportation is interested in sapling couters to deterine certain of their characteristics. The departent arranges for its field workers to board buses at rando as well as stop private vehicles at intersections and ask the couters to fill out a short questionnaire. Is this ethod cluster sapling? Explain. 6 8. Use systeatic sapling to estiate the average perforance of all stocks in one of the listed stock exchanges on a given day. Copare your results with those reported in the edia for the day in question. 6 9. An econoist wants to estiate average annual profits for all businesses in a given counity and proposes to draw a systeatic saple of all businesses listed
Sapling Methods 6-25 in the local Yellow Pages. Coent on the proposed ethodology. What potential proble do you foresee? 6 20. In an every 2rd systeatic sapling schee, the first ite was randoly chosen to be the 7th eleent. Give the nubers of 6 saple ites out of a population of 20. 6 2. A quality control sapling schee was carried out by Sony for estiating the percentage of defective radios in a large shipent of,000 containers with 00 radios in each container. Twelve containers were chosen at rando, and every radio in the was checked. The nubers of defective radios in each of the containers are 8, 0, 4,,, 6, 9, 0, 2, 7, 6, 2. Give a 95% confidence interval for the proportion of defective radios in the entire shipent. 6 22. Suppose that the radios in the,000 containers of proble 6 2 were produced in five different factories, each factory known to have different internal production controls. Each container is arked with a nuber denoting the factory where the radios were ade. Suggest an appropriate sapling ethod in this case, and discuss its advantages. 6 2. The akers of Taster s Choice instant coffee want to estiate the proportion of underfilled jars of a given size. The jars are in 4 warehouses around the country, and each warehouse contains crates of cases of jars of coffee. Suggest a sapling ethod, and discuss it. 6 24. Cadbury, Inc., is interested in estiating people s responses to a new chocolate. The copany believes that people in different age groups differ in their preferences for chocolate. The copany believes that in the region of interest, 25% of the population are children, 55% are young adults, and 20% are older people. A proportional allocation of a total rando saple of size,000 is undertaken, and people s responses on a scale of 0 to 00 are solicited. The results are as follows. For the children, x 90 and s 5; for the young adults, x 82 and s ; and for the older people, x 88 and s 6. Give a 95% confidence interval for the population average rating for the new chocolate. 6 25. For proble 6 24, suppose that it costs twice as uch oney to saple a child as the younger and older adults, where costs are the sae per sapled person. Use the inforation in proble 6 24 (the weights and standard deviations) to deterine an optial allocation of the total saple. 6 26. Refer to the situation in proble 6 24. Suppose that the following relative age frequencies in the population are known: Age Group Frequency Under 0 0.0 0 to 5 0.0 6 to 8 0.05 9 to 22 0.05 2 to 25 0.5 26 to 0 0.5 to 5 0.0 6 to 40 0.0 4 to 45 0.05 46 to 50 0.05 5 to 55 0.05 56 and over 0.05 Define strata to be used in the survey.
6-26 Chapter 6 6 27. Nae two sapling ethods that are useful when there is inforation about a variable related to the variable of interest. 6 28. Suppose that a study was undertaken using a siple rando saple fro a particular population. When the results of the study becae available, they revealed that the population could be viewed as consisting of several strata. What can be done now? 6 29. For proble 6 28, suppose that the population is viewed as coprising 8 strata. Is using this nuber of strata advantageous? Are there any alternative solutions? 6 0. Discuss and copare the three sapling ethods: cluster sapling, stratified sapling, and systeatic sapling. 6. The following table reports return on capital for insurance copanies. Consider the data a population of U.S. insurance copanies, and select a rando saple of firs to estiate ean return on capital. Do the sapling two ways: first, take a systeatic saple considering the entire list as a unifor, ordered population; and second, use stratified rando sapling, the strata being the types of insurance copany. Copare your results. Return on Return on Return on Capital Capital Capital Latest Latest Latest 2 Mos. Copany 2 Mos. Copany 2 Mos. Copany Diversified % Life & Health % Property & Casualty % Marsh & McLennan Cos 25.4 Conseco.7 20th Century Industries 25. Loews.8 First Capital Holding 0.7 Geico 20.4 Aerican Intl Group 4.6 Torchark 8.4 Argonaut Group 7. General Rental 7.4 Capital Holding 9.9 Hartford Stea Boiler 9. Safeco.6 Aerican Faily 8.5 Progressive 0. Leucadia National 27.0 Kentucky Central Life 6. WR Berkley.5 CNA Financial 8.6 Provident Life & Acc. Mercury General 28. Aon 2.8 NWNL 8.4 Selective Insurance.6 Keper.4 UNUM.2 Hanover Insurance 6. Cincinnati Financial 0.7 Liberty Corp 0. St Paul Cos 6.9 Reliance Group 2. Jefferson-Pilot 9.5 Chubb 4. Alexander & Alexander 9.9 USLife 6.7 Ohio Casualty 9. Zenith National Ins 9.8 Aerican Natl Ins 5.2 First Aerican Finl 4.8 Old Republic Intl.4 Monarch Capital 0 Berkshire Hathaway 7. Transaerica 7.5 Washington National.2 ITT 0.2 Uslico 7.7 Broad 8.0 USF&G 6.6 Aetna Life & Cas 8.0 First Executive 0 Xerox 5. Aerican General 8.2 ICH 0 Orion Capital 0.8 Lincoln National 9.2 Freont General 2.6 Sears, Roebuck 7.2 Foreost Corp of Aer 0 Independent Insurance 7.8 Continental Corp 6.6 Cigna 7.0 Alleghany 9.0 Travelers 0 Aerican Bankers 8. Unitrin 6. Fro Annual Report on Aerican Industry, Forbes, January 7, 99. Reprinted by perission of Forbes Magazine. 200 Forbes Inc.CA
Sapling Methods 6-27 CASE 2 The Boston Redevelopent Authority The Boston Redevelopent Authority is andated with the task of iproving and developing urban areas in Boston. One of the Authority s ain concerns is the developent of the counity of Roxbury. This counity has undergone any changes in recent years, and uch interest is given to its future developent. Currently, only 2% of the total land in this counity is used in industry, and 9% is used coercially. As part of its efforts to develop the counity, the EXHIBIT Roxbury Planning Subdistrict COLUMBUS A V HIGHLAND PARK MADISON PARK DUDLEY SO TREMONT ST NEW DUDLEY ST WASHINGTON ST WASHINGTON PARK NORTH LOWER ROXBURY MELNEA CASS BLVD MT PLEASANT SAV MOR MASSACHUSETTS AV Boston Redevelopent Authority is interested in deterining the attitudes of the residents of Roxbury toward the developent of ore business and industry in their region. The authority therefore plans to saple residents of the counity to deterine their views and use the saple or saples to infer about the views of all residents of the counity. Roxbury is divided into planning subdistricts. The population density is believed to be unifor across all subdistricts, and the population of each subdistrict is approxiately proportional to the subdistrict s size. There is no known list of all the people in Roxbury. A ap of the counity is shown in Exhibit. Advise the Boston Redevelopent Authority on designing the survey. HAMPTON GEORGE SHIRLEY- EUSTIS M L KING BLVD WARREN ST SEAVER ST WASHINGTON PARK SOUTH BLUEHILL AV QUINCY GENEVA COLUMBIA RD 0 000 2000 000 FT.5 MILES