PSI RESEARCH TOOLKIT. Sampling Strategies. BUILDING RESEARCH CAPACITY


BUILDING RESEARCH CAPACITY

PSI's Core Values

Bottom Line Health Impact * Private Sector Speed and Efficiency * Decentralization, Innovation, and Entrepreneurship * Long-term Commitment to the People We Serve

Research & Metrics
Population Services International
1120 Nineteenth Street, NW, Suite 600
Washington, DC

PSI Research & Metrics 2007
Population Services International, 2007

Contact Information

Virgile Capo-Chichi, PhD (1) and Steven Chapman, PhD (2)
1. Regional Researcher, PSI/West and Central Africa
2. Vice President and Director, PSI/Research

For more information, please contact:
Virgile Capo-Chichi, PhD
08 BP 0876 Tri Postal
Cotonou, Benin
Telephone: [...]
vcapo-chichi@psibenin.org

SAMPLING STRATEGIES

LEARNING OBJECTIVES

By the end of this chapter, the reader will be able to:

1. Choose an appropriate sampling strategy for a study.
2. Determine the type of sample size calculation needed.
3. Calculate the sample size.

BACKGROUND

Sampling is the last major study design step before sending a research brief to the regional researcher for quality control. Through sampling, a small part of a population is selected to represent the much larger whole. Its purpose is to minimize the cost of data collection while maintaining the validity and representativeness of quantitative studies. Sampling is one part of the research strategy. Several excellent sampling manuals provide additional detail to supplement the information discussed in this chapter (DHS, 1996; FHI, 2001; Krotki, 1998; WHO, 1994).

HOW-TO STEPS

As shown in Figure 1, there are five steps in the sampling process, each of which is explained below in more detail.

FIGURE 1: THE SAMPLING PROCESS

1. Define a sample population
2. Develop a sampling frame
3. Select a sampling method
4. Determine a sample size
5. Select a sample

DEFINE A SAMPLING POPULATION

A sampling population consists of everyone who is of interest to the social marketer, namely the target population. A sampling unit is an individual member of the sampling population. Table 1 offers some examples of sample populations and sample units. In social marketing research, the sample unit is rarely anything other than an individual or a retail outlet.

DEVELOP A SAMPLING FRAME

A sampling frame is a listing of all elements in the sample population. When a sampling frame is assembled correctly, all sample units have an equal chance of being selected to participate in the study (i.e., they have an equal probability of being included in the sample, which is the set of sample units selected into a study).

TABLE 1: EXAMPLES OF SAMPLE POPULATIONS AND SAMPLE UNITS

Research Objective | Sample Population | Sample Unit
Segmentation: Youth sexual behavior | All Russian youth ages [...] | Any Russian youth aged [...]
Monitoring: Condom distribution survey | All retail kiosks, grocery stores, pharmacies, or bars located in Benin | Any retail kiosk, grocery store, pharmacy, or bar located in Benin
Evaluation: Insecticide-treated-net use among reproductive age women in Mali | All women ages [...] living in Mali | Any woman aged [...] living in Mali

Ideally, a sampling frame is a complete list of sample units. Theoretically, for a population survey, the census provides the most complete sampling frame. However, the accuracy and currency of the census vary greatly from country to country. For retail studies, a retail census comprising a complete list of sales points would be needed to have a complete sampling frame; that too is rare. Some national statistical offices keep lists of registered businesses, which may be useful as a starting point (Emanuel et al., 2000). Thus, the starting point for establishing a sampling frame varies from country to country. Generally, the sampling frame is assembled in steps (Figure 2).

FIGURE 2: A SAMPLING FRAME

Total sample population
  Census district 1, census district 2, census district 3, etc.
    Enumeration area 1, enumeration area 2, enumeration area 3, etc.
      Household 1, household 2, household 3, household 4, etc.
        Sample unit 1, sample unit 2, sample unit 3, sample unit 4, etc.

- Districts, as established by the census, serve as the primary sampling frame. Instead of having a list of everyone in the country, the researchers have a list of districts with known population sizes. A sample of these districts is then selected. For a retail survey, population size is often used as a proxy for the number of outlets in a district.
- Enumeration areas are the next smallest census units; they may or may not be identifiable. An enumeration area is a relatively small group of dwellings or outlets within a district. A sample of these enumeration areas is then selected.
- Households are the next smallest census unit for audience research purposes. For distribution surveys and social franchising studies, retail outlets and service delivery points are identified at this stage as the sampling units. In some countries, dwellings and outlets have been numbered and mapped. If not, PSI or a research subcontractor can do so. A sample or, in some cases, all of these dwellings and outlets can then be selected.
- Sample units can now be identified (or have been identified already, in the case of retail outlets). These units are members of the sample population in the household, or customers exiting the retail outlet or service delivery point. A sample (or in some cases all) of these can then be selected. There are a variety of methods for selecting an eligible individual from the household (such as a Kish grid or the last birthday method).

Therefore, a sampling frame could vary in specificity from a complete list of all sample units in a sample population to a partial list of sample units: in some districts, in some enumeration areas therein, and in some households therein.

The two primary objectives of sampling are representativeness and precision. A sample is considered representative of the sample population when the sample characteristics (e.g., age and sex) are similar to those of the population. Representative samples are necessary to draw conclusions about the sample population. Samples that are not representative are biased; those that are representative are unbiased.

Precision (i.e., a measure of how close a sample estimate is to the true value of a population characteristic) can perhaps be best explained using an example. In this example, the sample population is 2,500 Russian youth, ages 15-19, living in one town. Researchers select a sample of 100 of them. One would expect that of the 2,500 youth in the town, 50 percent would be males and 50 percent females. But even if the researchers have an ideal, complete sampling frame to begin with, they would be unlikely to draw a sample of exactly 50 boys and 50 girls. If they drew the sample five times, they would likely get different proportions of boys and girls each time. Each estimate that researchers make from a sample, such as the proportion of females, has variation associated with it. For instance, the first time the researchers measure the proportion of females in the hypothetical sample of Russian youth, they might find that more than 50 percent are female (Graph 1). The second time, the sample might be less than 50 percent female. If they were to take five samples, they might get five different proportions of females.
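The sampling variation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the town of 2,500 youth is modeled as exactly half female, and the function name `sample_female_proportions` and the fixed seed are illustrative choices, not part of the toolkit.

```python
import random


def sample_female_proportions(population_size=2500, sample_size=100,
                              n_samples=5, seed=0):
    """Draw repeated simple random samples; return the share of females in each."""
    rng = random.Random(seed)  # fixed seed only so the illustration is repeatable
    # Hypothetical town: half of the youth are female (1), half male (0).
    n_female = population_size // 2
    population = [1] * n_female + [0] * (population_size - n_female)
    proportions = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # without replacement
        proportions.append(sum(sample) / sample_size)
    return proportions


if __name__ == "__main__":
    for i, p in enumerate(sample_female_proportions(), start=1):
        print(f"sample {i}: {p:.0%} female")
```

Each run of five samples will typically give five somewhat different proportions scattered around 50 percent, which is the pattern Graph 1 depicts.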

GRAPH 1: REPRESENTATION OF SAMPLING VARIATION
[Graph: proportion of females from five different samples (sample 1 to sample 5); sample size = 100, population size = 2,500.]

Fortunately, the size or range of this sampling variation is known and is a function of the sample size. Therefore:

- The larger the sample size, the smaller the sampling variation;
- The smaller the sampling variation, the more precise the estimate; and
- The larger the sample size, the more precise the estimate.

The purpose of sampling is to produce an unbiased or representative sample that is sufficiently precise, as shown in Figure 3. Determining what is sufficiently precise is presented later in this chapter.

FIGURE 3: REPRESENTATIVENESS AND PRECISION

SELECT A SAMPLING METHOD

There are two main sampling methods: those that use probability methods and those that do not (Figure 4).

FIGURE 4: DECISION TREE FOR SELECTING A SAMPLING METHOD
[Decision tree. Do results need to be representative? Yes: probability sample (simple random, systematic, stratified, cluster). No: non-probability sample (convenience, judgment, quota).]

Probability Sampling

Probability sampling means that each sampling unit has a known or calculable and nonzero chance of being selected into the sample. Probability sampling allows researchers to say, for example: "In this country, X percent of women of reproductive age use modern contraceptive methods." Two types of probability sampling are simple random sampling and systematic sampling (Figure 5).

FIGURE 5: PROBABILITY SAMPLING: SIMPLE RANDOM AND SYSTEMATIC SAMPLING
[Decision tree. Questions: Is the population homogeneous? Is it feasible to study all members of the population? If the population is divided into subgroups, are the subgroups representative of the overall population? Is the objective of the study only to measure coverage? Outcomes: simple random sampling, systematic sampling, stratified sampling, cluster sampling, lot quality assurance sampling.]

Simple Random Sampling

Simple random sampling is a method of sampling where sample units are selected from the population entirely by chance. Each sample unit has the same probability of being selected. Two conditions are required to use simple random sampling:

- The sample population must be homogeneous; that is, the population characteristics (e.g., residence in an urban or rural area or sex) are not expected to result in substantially different rates of behavior or health status.
- There is an exhaustive list of all sample units.

Clearly, such homogeneity and an exhaustive list of all sample units are rare; therefore, the use of simple random sampling within social marketing research is rare, although in some instances (such as small-scale studies in schools) it has been used successfully.

Simple random sampling can be achieved through several approaches, two of which are presented here.

1. Lottery Method: the lottery method uses four steps.
   i. Identify each sample unit on a piece of paper.
   ii. Put all the pieces of paper in a container (hat, box, or so forth).
   iii. Mix all the pieces of paper so that no one knows which piece of paper says what.
   iv. Draw one piece of paper at a time.
   The lottery method is practical only with small populations, probably no more than 30 units.
2. Random Number Method: this method has four steps and requires Microsoft Excel to execute.

   i. Assign numbers from 1 to N to your sample units, where N is the sample population size.
   ii. Create the formula =RAND()*(N-1)+1 in Microsoft Excel (with N replaced by the population size) and hit enter, resulting in a number between 1 and N.
   iii. Copy the formula into a different cell and hit enter for a new random number to appear. Do this repeatedly until the desired sample size is reached.
   iv. At the end, round each number to the nearest integer.

Systematic Sampling

Systematic sampling is similar to simple random sampling, except that only one random number is needed throughout the entire sampling process. As with simple random sampling, the requirements for systematic sampling are population homogeneity and the availability of an exhaustive list of sampling units.

Example: To select a sample of 60 from a population of 1,200, these are the steps for systematic sampling.

1. Assign a unique number to each sampling unit from 1 to 1,200.
2. Compute the skip interval by dividing the population size (1,200) by the sample size (60). The skip number k is equal to 20.
3. Generate a random whole number between 1 and 20 (for example, with the Excel formula =INT(RAND()*20)+1). Say it is 7.
4. Apply the skip interval to determine which numbers will be in the sample (7, 27, 47, 67, ..., 1,167, 1,187).

Systematic sampling can also be adapted to situations where there is no exhaustive list of the sampling population.

Example: In a family planning exit interview study, you do not know in advance how many women will come to the family planning clinic on any given day and be available for interview as they exit. You decide to interview every fourth woman who comes to the clinic. The skip interval is therefore 4. If the first woman to be interviewed is number 3, then the others will be 7, 11, 15, 19, and so on.
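The random number method and the systematic method can be sketched in a few lines of Python as an alternative to the spreadsheet steps. The function names and the optional `start` argument are illustrative assumptions. Unlike rounding spreadsheet random numbers, `random.sample` never returns the same unit twice.

```python
import random


def simple_random_sample(population_size, sample_size, rng=None):
    """Select sample_size distinct unit numbers from 1..population_size by chance."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Take one random start within the skip interval, then every k-th unit."""
    rng = rng or random.Random()
    k = population_size // sample_size      # skip interval, e.g. 1200 // 60 = 20
    if start is None:
        start = rng.randint(1, k)           # whole number from 1 to k inclusive
    return [start + i * k for i in range(sample_size)]


if __name__ == "__main__":
    print(simple_random_sample(1200, 60)[:5])
    print(systematic_sample(1200, 60, start=7)[:4])
```

Passing `start=7` reproduces the worked example for a population of 1,200 and a sample of 60; leaving it out draws the start at random.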

FIGURE 6: EXAMPLES OF SIMPLE RANDOM AND SYSTEMATIC SAMPLING
[Figure: examples of simple random sampling and systematic sampling for a sample of size 10 in a population of size 20. Panels: population; simple random sample; systematic sample (skip interval = 2, first selected = 2).]

Stratified Sampling

In stratified sampling, the population is stratified, or divided into subpopulations. Then a sample is drawn (Figure 7). Each subpopulation is called a stratum; the word strata refers to more than one stratum. Think of stratified sampling as creating two or more sample populations, each of which provides a separate sample. In stratified sampling, a sample must be drawn from each stratum using a simple random, systematic, or cluster sampling approach. The cluster approach is discussed later in this chapter.

The goal of stratification is to ensure that each stratum is adequately represented in the final sample. Stratification is also advised when you are planning to conduct separate statistical analyses for each stratum. The key advantage of stratification is that it improves the precision of estimates (i.e., it reduces sample variation in each stratum). To benefit from stratification, the sample has to be divided proportionally to stratum size.

FIGURE 7: PROBABILITY SAMPLING: STRATIFIED SAMPLING
[Decision tree as in Figure 5.]

Example: Proportional allocation of a sample of 2,000 to four regions (total population is 16,000). See Table 2. In this example, the sample is always one-eighth of the population in each region and also of the total population.

First compute the proportionality coefficient, which is simply the sample size divided by the population size. Therefore 2,000/16,000 or 1/8 = 0.125.

Then, for each region, multiply the region's population size by the proportionality coefficient. Therefore:

Region A: 3,000 × 0.125 = 375
Region B: 4,000 × 0.125 = 500
Region C: 4,000 × 0.125 = 500
Region D: 5,000 × 0.125 = 625

TABLE 2: PROPORTIONAL ALLOCATION OF A SAMPLE
Example: Proportional allocation of a sample of 2,000 to four regions (total population is 16,000)

Region       A       B       C       D       Total
Population   3,000   4,000   4,000   5,000   16,000
Sample       375     500     500     625     2,000

Proportionality coefficient: 2,000/16,000 = 1/8
Sample for Region A: 3,000 * 1/8 = 375

Proportional allocation is not always practical; it will sometimes lead to the sample of one stratum being so small that the precision of the estimate is insufficient, as will be discussed further below. When that occurs, the sample size of this stratum can be increased. For example, using the data above, say the sample size of 375 in region A is insufficient to narrow our sampling variation appropriately. Assume that 500 is the minimum sample size required to narrow the sampling variation sufficiently. Then, the sample size in region A can be increased from 375 to 500, without changing the sample size of regions B, C, and D. When you do this, you will have sufficient precision in all regions.

If, however, you want to make an estimate (say, of the proportion of females for the whole country) by combining regions A, B, C, and D, then you have to compute weights. A weight allows you to use all the data from regions A, B, C, and D. If you increased the sample size in region A to 500 and then combined region A with regions B, C, and D to calculate the proportion of females, you would need to reduce the weight given to region A, because it was oversampled by 500 - 375 = 125 units. Why? Simply because the sample units in region A have a 0.33 (125/375 = 0.33) higher probability of being selected to participate in the study than the sample units in regions B, C, and D. If you weight each observation in region A by 375/500 = 0.75, then each sampling unit in region A provides a proportionate contribution to the calculation of the proportion of females.
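The allocation and weighting arithmetic above can be checked with a short sketch. The function names are illustrative. The weight is taken here as the ratio of the proportional sample size to the actual sample size. That definition is an assumption about how the weight is defined, and it gives 375/500 = 0.75 for an oversampled region A.

```python
def proportional_allocation(populations, total_sample):
    """Split total_sample across strata in proportion to stratum population."""
    total_population = sum(populations.values())
    coefficient = total_sample / total_population   # 2,000 / 16,000 = 0.125
    # Rounding can make the parts differ slightly from total_sample in general.
    return {name: round(size * coefficient) for name, size in populations.items()}


def relative_weights(proportional, actual):
    """Weight per stratum = proportional sample size / actual sample size."""
    return {name: proportional[name] / actual[name] for name in proportional}


regions = {"A": 3000, "B": 4000, "C": 4000, "D": 5000}
allocation = proportional_allocation(regions, 2000)

# Region A is oversampled to 500 interviews; the other regions stay proportional.
actual = dict(allocation, A=500)
weights = relative_weights(allocation, actual)

if __name__ == "__main__":
    print(allocation)
    print(weights)
```

Regions that keep their proportional sample size get a weight of 1, so only the oversampled region is scaled down when the regions are combined.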
Cluster Sampling

At the start of this chapter, you saw that a sampling frame (Figure 2) could vary in specificity from a complete list of all sample units in a sample population to a partial list. You could select a sample of census districts, then a sample of enumeration areas, then a sample of dwellings in some areas, and finally a sample of sample units in each dwelling. Cluster sampling is the application of this process down to a predetermined level called a cluster. A cluster could be, for example, a census district, an enumeration area, or a group of dwellings. The critical element is to list all units and their population sizes as you move down through the sampling frame.

On the basis of the unit list with population sizes, draw a number of clusters using either random or systematic sampling with the probability of selection proportional to the population size (PPS).

Example: Assume a sample population resides in 31 census districts with the population sizes shown in Table 3 (Valadez et al., 1995). Create a new column, entitled Cumulative Population. Row 1 of that column is equal to the total population of Pagai. Row 2 equals the cumulative population in row 1 plus the total population of Santai. Row 3 equals the cumulative population of row 2 plus the total population of Serina, and so on. The total population is then divided by the number of clusters you want to select. Say you want to select 19 (more purposeful guidance on selecting the number of clusters is given below). Therefore, 23,489/19 = 1,236 (rounded).

TABLE 3: DATA FOR THE CLUSTER SAMPLING EXAMPLE

So, how would you apply systematic sampling in this example?

1. Select a random number between 1 and the interval (1,236) using Microsoft Excel. Say it is 622. Locate the row in which that number is less than or equal to the cumulative population of that row and greater than the cumulative population of the row above it. Here it is row 2, Santai.
2. Then, add the interval: 622 + 1,236 = 1,858. That number is in row 3, Serina.
3. Then, add the interval again: 1,858 + 1,236 = 3,094. That number is in row 5, Fanta.
4. Continue until you have selected 19 clusters.
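The cumulative-population procedure above can be sketched as follows. The district names and population sizes in the usage example are hypothetical, not the Table 3 data, and `pps_systematic` is an illustrative name. The sketch uses the unrounded interval, whereas the worked example rounds it to 1,236.

```python
import bisect
import itertools


def pps_systematic(populations, n_clusters, start):
    """Systematic selection with probability proportional to size (PPS).

    populations: list of (name, population) pairs in listing order.
    start: random start between 1 and the sampling interval.
    """
    names = [name for name, _ in populations]
    cumulative = list(itertools.accumulate(size for _, size in populations))
    interval = cumulative[-1] / n_clusters
    selected = []
    for i in range(n_clusters):
        point = start + i * interval
        # First row whose cumulative population is >= the selection point.
        row = bisect.bisect_left(cumulative, point)
        row = min(row, len(names) - 1)   # guard against float overshoot at the end
        selected.append(names[row])
    return selected


if __name__ == "__main__":
    # Hypothetical districts and sizes, for illustration only.
    districts = [("District 1", 500), ("District 2", 1000), ("District 3", 300),
                 ("District 4", 1200), ("District 5", 1000)]
    print(pps_systematic(districts, n_clusters=4, start=622))
```

A district whose population is larger than the interval can be selected more than once; how to handle that case is a design decision the sketch leaves open.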

Now you can decide how to proceed. Say in Pagai, there are again 31 subunits (i.e., enumeration areas). You can collect data from all 31 units in each of the selected clusters (the take-all approach). Or, you can list all sampling units in each cluster and then proceed again with random or systematic sampling, just as was done above. This process is called two-stage sampling. The first stage is sampling of census districts, and the second stage is sampling of enumeration areas.

Cluster sampling can involve more than two stages. For example, in a survey, you can start with a random sampling of regions, provinces, states, or census districts. If it is census districts, a second stage could then be to randomly choose within each selected district a number of villages or municipalities. In a village or municipality, enumeration areas can then be selected; in them, dwellings or households; and in them, individuals can be selected. This is called multistage cluster sampling. In cluster sampling, the first level of selection (the census district in this example) is called the primary sampling unit. Note that throughout the cluster sampling process, each sample unit in the population has an equal probability of selection.

For social marketing and distribution surveys, cluster sampling is usually the best sampling method to use. It is also the method used by the Demographic and Health Surveys. It requires only the information listed above. Where the required lists do not exist, the missing lists usually are simply households.

Stratified and Cluster Sampling

Stratified sampling can be an entry point to both random/systematic sampling and cluster sampling. Stratified and cluster sampling have been devised to reduce cost and to improve precision. Both have advantages and disadvantages. (See Box 1.)

BOX 1: CHOOSING BETWEEN STRATIFIED AND CLUSTER SAMPLING
Generally, you will use the cluster sampling strategy. However, first decide whether all your estimates need to be made in terms of residence (urban or rural) or by province, region, or state. If so, then you will need to stratify your sample. As you will see below, cluster sampling reduces cost, albeit at some loss of precision. Stratification in turn can increase the precision, but it does so at increased cost.

Lot Quality Assurance Sampling

If you only have estimates to make (e.g., of levels of opportunity, ability, and motivation) and you do not need to test for associations (whether exposure to your intervention resulted in increased opportunity), then lot quality assurance sampling could be a highly efficient strategy for monitoring (Figure 8). This is because monitoring is the primary application in the social marketing process for which tests of association are not always needed. Lot quality assurance sampling is also the basis of Project MAP (Measuring Access and Performance), which measures coverage and other measures of opportunity at the retail level.

FIGURE 8: PROBABILITY SAMPLING: CLUSTER AND LOT QUALITY ASSURANCE SAMPLING
[Decision tree as in Figure 5.]

Example: The proportion knowing the social marketing brand slogan is 50 percent. The social marketer decides that proportion should be increased to 80 percent in each of the 30 sales zones in the country. After that, the campaign needs to slow its advertising of the slogan and reallocate funds to promote self-efficacy. The social marketer can know inexpensively whether the 80 percent level has been achieved in each sales zone by using lot quality assurance sampling.

Lot quality assurance sampling (LQAS) is a special case of stratified sampling. Assume that the sampling population is all men, ages 15-49, in the nation. LQAS says that 19 men, ages 15-49, can be randomly selected in each of those 30 zones and asked whether they know the social marketing brand slogan. If at least 14 say yes, then the 80 percent level is considered achieved. Why is 14/19 (about 74 percent) considered equal to 80 percent? It has to do with the statistic being used, which is called the binomial. This determination, however, is not important to review in detail at this stage.

What is important is that LQAS has major cost and time-saving advantages. It allows a social marketer to conduct a survey without a researcher. The only analysis required is counting how many of the 19 persons surveyed respond positively. The result is immediately known and immediately actionable.

Lot quality assurance sampling also has a major up-front cost: the establishment of the capacity to randomly select the person.
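For readers curious about the binomial reasoning behind the 14-out-of-19 rule, the following sketch computes how often a zone would pass under two assumed true levels of slogan knowledge. The helper names are illustrative, and the choice of 50 and 80 percent as comparison points follows the example above rather than any official LQAS table.

```python
from math import comb


def prob_at_least(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))


def lqas_zone_passes(yes_count, decision_rule=14):
    """Classify a zone as having reached the target if enough respondents say yes."""
    return yes_count >= decision_rule


# Chance of passing when the zone is truly at the 80 percent target.
high = prob_at_least(14, 19, 0.80)
# Chance of passing when the zone is still at the 50 percent baseline.
low = prob_at_least(14, 19, 0.50)

if __name__ == "__main__":
    print(f"pass probability at 80 percent: {high:.3f}")
    print(f"pass probability at 50 percent: {low:.3f}")
```

Under these assumptions, a zone truly at 80 percent passes in roughly five out of six surveys, while a zone still at 50 percent passes only about 3 percent of the time, which is why a small sample of 19 can separate the two situations.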
PSI is experimenting with ways of establishing this capacity so that it can be repeatedly used at no marginal cost. In principle, PSI would do the following:

- Stratify the country into managerially meaningful zones. This would be a zone that is salient in resource allocation terms, such as a defined media market or sales zone.
- Cluster the sample down to an enumeration area, which would then be listed to household level.

- Each time monitoring is done, randomly select the household and person.
- Keep the location of the cluster unknown to the social marketers so that special or intense interventions are not made in the clusters monitored.

Non-Probability Sampling

In non-probability sampling, sample units are left out deliberately. This means that the researcher knows beforehand that the sample will be biased. Why, then, is non-probability sampling used? Non-probability sampling is sometimes useful for qualitative research (particularly for concept development), for which estimates are not required, although descriptions and a willingness by participants to describe are.

FIGURE 9: NON-PROBABILITY SAMPLING
[Decision tree. Questions: Does the researcher know enough about the population to select the sample based on expert judgment? Looking for participants for a focus group or in-depth interview? Is the study population hard to reach? Is there a subgroup that is likely to be underrepresented? Is there a desired number of participants? Outcomes: snowball sampling, quota sampling, convenience sampling, judgment sampling.]

Non-probability sampling methods include the following:

- Judgment: Sample units are included based on the knowledge of the researcher (derived, e.g., from an expert panel).
- Convenience: Sample units are chosen based on location, time, or some other element of convenience.
- Quota: A predetermined number of sample units must be included from various subgroups.
- Snowball: One sample unit is used to determine the next.

Judgment Sampling

PSI uses judgment sampling primarily for stakeholder surveys in strategic planning.

Convenience Sampling

PSI often uses convenience sampling to select participants for focus groups and for in-depth interviews.

Quota Sampling

In quota sampling, the sample population is stratified (Figure 10). Then the researcher decides that a certain number of units (a quota) must be selected in each subgroup. Judgment and convenience sampling are then used. PSI sometimes uses quota sampling to select participants for focus group discussions and in-depth interviews.

Snowball Sampling

In snowball sampling, the researcher starts with one sampling unit. The researcher uses this first unit to identify new units and so forth (Figure 10). Snowball sampling is used in studies involving respondents who may be hard to find, such as commercial sex workers, intravenous drug users, homeless people, or men who have sex with men. For example, if a researcher locates and interviews one homeless person, that person may be able to identify other homeless people for the researcher to interview.

FIGURE 10: EXAMPLES OF TWO TYPES OF NON-PROBABILITY SAMPLING (QUOTA AND SNOWBALL)
[Figure: quota sampling and snowballing strategies. Quota sampling: male groups and female groups (e.g., focus groups). Snowball sampling: a chain beginning at "Start" (e.g., in-depth interviews).]

DETERMINE A SAMPLE SIZE

The sample size, the number of people you will interview, affects the precision and cost of the study. The larger the sample is, the more precision it has. But it also follows that the more precision your sample has, the more the study costs. How much precision is necessary?

Precision

PSI can easily estimate the minimum sample size required to achieve a given level of precision.

Researchers talk of estimates, such as 50 percent ± 10 percent. Where do those estimates and confidence intervals come from? Graph 2 shows the relationship between sample size, N, and the size of a confidence interval. On the Y or vertical axis, labeled ½ CI Length, the ±10 percent can be seen at 0.100. The X-axis gives the N (i.e., sample size). Where the sloping line crosses 0.100, it is approximately at N = 40. The sloping line says, for all proportions between 0.10 and 0.50 (50%) and for all proportions from 50 percent to 90 percent, a confidence interval of ± 10 percent holds. To reduce that to 5 percent, you need to raise the sample size to 80.

Confidence Intervals

What is a confidence interval? Statisticians have demonstrated that if 100 samples of the same size are drawn from a sampling population, 95 of those will produce an estimate that is within a certain interval. This interval is typically estimated at 95 percent probability. This means that if any one sample is drawn, its estimate has a 95 percent chance of being inside the confidence interval and a 5 percent chance of being outside the confidence limits. Why 95 percent? It is simply convention. It is also possible to calculate 90 percent or 99 percent confidence intervals.

Significance Levels

Confidence intervals and significance levels are complementary terms. Thus, a 95 percent confidence interval is associated with a 5 percent significance level.

Estimating a Population Parameter

The first step in calculating sample size is surprising to most people. You must first guess what the estimate is of the indicator (proportion, percentage, or average) that you want to measure. This process is known as estimating the population parameter. Say that you want to measure condom use in casual partnerships. You might guess, based on other surveys done recently, that it is about 50 percent. If other surveys do not exist, then guessing that the estimate is high (say 75 percent), or low (say 25 percent), or very low (10 percent) is fine. But if you really do not know, guess 50 percent. Doing so yields the largest sample size that you might need.

Here are the steps to follow for estimating the population parameter:

1. Give a rough idea of the proportion or percentage p that you want to estimate. Is it in the range of 15 percent? 30 percent? 65 percent? 80 percent? If you do not know, use 50 percent.
2. Decide on the level of precision (±) you want. 10 percent? 5 percent? 1 percent?
3. Decide on the significance level; 5 percent is a typical value.
4. With this information, the sample size can be calculated using the formula:

\[ N = \frac{Z^2 \, p \, (100 - p)}{e^2} \]

where N is the required sample size.

Z is a value corresponding to your significance level (and is called the standard normal deviate). Z = 1.96 (rounded to 2) for 5 percent significance levels.
p is the rough value you provided for your estimated percentage (proportion).
e is the precision you wish to achieve.

Example: You want to conduct a household survey among youth. Your primary objective is to estimate the proportion of youth who know that condom use protects against AIDS. This is how you would compute your required sample size, assuming simple random sampling:

- You know nothing about the value of p firsthand, so you estimate p at 0.5, or 50 percent.
- Assume that you want an estimate with 10 percent precision.
- Assume that you want a 5 percent significance level.

Then, applying the formula above:

\[ N = \frac{1.96^2 \times 50 \times (100 - 50)}{10^2} \approx 96 \]

What if you tighten the desired precision to 5 percent? Then your sample size would be 384.

GRAPH 2: RELATIONSHIP OF N, P, AND CONFIDENCE INTERVALS
[Graph: confidence interval length for values of N (sample size) and P (proportion). Y-axis: ½ CI length; X-axis: N.]

Taking the Sampling Method or Design into Account

Cluster sampling increases variability and therefore reduces precision from what you would get if simple random sampling were used. To achieve the same precision in cluster sampling as in simple random sampling, statisticians have calculated that the computed sample size must be multiplied by a factor known as deff, the design effect. This factor is usually between 1 and 3. Demographic and health survey reports often contain values of the design effect for most of their indicators in their appendices. The formula above then becomes:

\[ N = \mathit{deff} \times \frac{Z^2 \, p \, (100 - p)}{e^2} \]

If deff is not known, use 2. So, continuing the example of the youth household survey, if you assume a precision of 5 percent, the use of a cluster sampling strategy, and no knowledge of deff, then the sample size calculation is: N = 2 × 384 = 768.

A Multiple Indicator Approach

How do you decide which population parameter to use? Condom use? With a regular partner or casual partner? Social marketing researchers are never interested in one single indicator; they want to know about exposure, opportunity, ability, motivation, and behavior. That means many indicators. Sometimes researchers want to use the same study to estimate indicators in different populations (e.g., mothers and children in a household survey). In this case, it might be desirable to estimate the prevalence of diarrhea among children aged 0-35 months, plus the mothers' use of contraception. The best approach is to compute the required sample size for each of the indicators you are interested in and then take the largest. This process is tedious, but it will prevent disappointing results when tables are being filled in.

Planning for Losses

When computing the sample size for a study, you should plan for people who will refuse to answer your questions, those whom you will not find at home, questionnaires that are badly completed, and other inevitabilities. Such planning means deciding on a percentage of respondents who will be discarded from the final sample for various reasons. If you are not sure what the percent loss will be, assume 10 percent. Most studies have losses less than that. By planning for losses, you will not have to worry that your final sample size is less than what you require for valid conclusions. To do this:

1. Compute the loss factor: 100/(100 - % loss).
2. Multiply your previously calculated sample size by the loss factor: N = 768 × [100/(100 - % loss)].
3. If, for example, your predicted percent loss is 10 percent, then the new sample size would be: N = 768 × [100/(100 - 10)] = 854 (rounded up).

The Implications of Stratification

Say you want to stratify your youth sample by region in a country where you have six regions. If you just divide your sample of 854 proportionally to region size, the level of precision in each region will be much lower than for the sample as a whole. If you are interested in obtaining estimates with the same precision as above for each of the regions, then each region becomes an independent sample population, and you have to compute a sample size for it. In this case, if your regions are the same size, you end up with a sample of: 6 × 854 = 5,124.

Unfortunately, it is rare that your regions are the same size. To obtain a final sample that is proportional to region size, the adequate sample size at country level easily could reach 8,000 to 9,000, which is historically more than any country's program could afford.

The first challenge in stratification is to decide on an acceptable level of precision at the region level. For example, in the youth KAP study, 5 percent precision is acceptable, but 10 percent is too wide. So why not decide on a percentage in between these for the regions, say 7 percent? Then the corresponding minimum sample size per region would be: N = 2 × (100/90) × 1.96² × 50 × (100 − 50)/7² = 436, where 2 is the design effect and 100/90 is the loss factor.

The second challenge in stratification is to arrive at a final sample that is distributed proportionally to stratum (region) size. This is important because otherwise you will need to weight your sample before conducting your analyses. To do this, compute the proportionality factor by dividing the size of the smallest region by the minimum sample size. Then obtain the required sample size for each subsequent region by dividing the size of that region by the proportionality factor. Table 4 provides an example of how to do it, using the youth household study example and the six regions.
TABLE 4: EXAMPLE OF COMPUTATION OF STRATUM-SPECIFIC SAMPLE SIZES

Columns: Region | Size | Proportionality Factor | Sample Size
Region rows (only partly legible in this transcription): 1 850, ,320, ,810,000 1, ,000 1, , ,530,000 1,042
Country | 7,139,000 | 4,857
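The chain of calculations described so far (simple random sampling, design effect, loss factor, and proportional allocation across strata) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the toolkit: the function names are invented here, and the region populations are hypothetical values chosen for the example, not the Table 4 data.

```python
import math


def srs_sample_size(p, e, z=1.96):
    """Simple random sampling: N = Z^2 * p * (100 - p) / e^2, with p and e in percent."""
    return z ** 2 * p * (100 - p) / e ** 2


def required_sample_size(p, e, deff=2.0, loss_pct=10.0, z=1.96):
    """Apply the design effect and the loss factor, then round up."""
    loss_factor = 100 / (100 - loss_pct)
    return math.ceil(deff * loss_factor * srs_sample_size(p, e, z))


def proportional_allocation(region_sizes, min_sample):
    """Give the smallest region min_sample and scale the other regions by size."""
    factor = min(region_sizes.values()) / min_sample  # proportionality factor
    return {name: round(size / factor) for name, size in region_sizes.items()}


# Hypothetical region populations, for illustration only (not the Table 4 data).
regions = {"North": 640_000, "Centre": 1_280_000, "South": 960_000}

per_region = required_sample_size(p=50, e=7)  # 7 percent precision in each region
allocation = proportional_allocation(regions, per_region)
print(per_region, allocation, sum(allocation.values()))
```

With the defaults assumed here (deff of 2 and 10 percent loss), `required_sample_size(50, 5)` gives the 854 of the youth survey example and `required_sample_size(50, 7)` gives the 436 per-region minimum. The allocation rounds to the nearest whole respondent, which is a choice made for this sketch.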

Special Case 1: Determining Sample Size for Dashboard Studies

Calculating the sample size required for producing dashboards is a relatively complex process, both to think through and to implement in practice. This section will help you with both aspects. (Please see Box 2.) To keep things simple, the following explanation is based on the three key population dashboard tables: monitoring, segmentation (step 3), and evaluation. Although the statistical reasoning behind each table is the same (comparisons), the parameters involved take different meanings from one table to the other.

BOX 2: DETERMINING SAMPLING SIZE
As in multiple indicator surveys, the rule of thumb here is as follows: determine the minimum required sample size for each table and take the maximum of the three as the sample size for the study.

Determining the sample size required to estimate change is not the same as estimating a population parameter. Because dashboards are comparison tables, the calculation of the sample size required to produce them requires two types of decision-making elements:

Statistical
- Type I error [level of significance]
- Type II error [power]

Programmatic
- Minimum significant difference to be detected [expected change]

Type I and Type II Errors

There are two crucial issues here. First, because of sampling variation, each estimate has a 5 percent chance of falling outside of its confidence interval. For this reason, we may say that the two estimates are different when in fact they are not. This is known as a type I error: rejecting the null hypothesis (that there is no difference) when it should be accepted. Another possibility is that the two estimates truly are different, but you wrongly assume that they are the same just because their confidence intervals overlap. This is called a type II error: failing to reject the null hypothesis when it is false.
The complement of type II error is the probability of observing a significant difference when it really exists. This is called the power of the study. Type I error and power are two important elements to take into account when planning to evaluate a program using a survey. As a general rule, type I error is set at 5 percent.
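The significance level and the power enter sample size formulas as standard normal deviates. As a small sketch (assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`), the deviates for a 5 percent type I error and 80 percent power can be looked up as follows; the helper name `z_value` is introduced here purely for illustration.

```python
from statistics import NormalDist


def z_value(probability):
    """Standard normal deviate below which `probability` of the distribution lies."""
    return NormalDist().inv_cdf(probability)


alpha = 0.05  # type I error (significance level)
power = 0.80  # 1 - beta, where beta is the type II error

z_one_sided = z_value(1 - alpha)      # change expected in one direction only
z_two_sided = z_value(1 - alpha / 2)  # change could go in either direction
z_power = z_value(power)

print(round(z_one_sided, 3), round(z_two_sided, 3), round(z_power, 3))
# prints: 1.645 1.96 0.842
```

Raising the power to 90 percent (`z_value(0.90)`, about 1.282) increases the deviate and therefore the sample size needed to detect the same difference.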

The power of a study needs to be set in such a way that you can be fairly confident that you can measure a change over time. The power is often set to between 80 percent and 90 percent, depending on the implications of a wrong decision. In most population-based studies, a power of 80 percent is sufficient. But in no case should you accept a power of less than 80 percent.

Precision and Expected Change or Difference

Decide before conducting a study whether it will be used to estimate change. All too often, people realize their baseline sample was insufficient only when it is too late. Once you have set the significance level and study power, you need to estimate how much change you expect. Then you can compute the sample size for estimating change or difference.

How much change or difference do you expect? If you are working in a community where knowledge of condom use among youth is about 40 percent at baseline, how much of a change do you expect to achieve in two years? Do you expect to increase knowledge to 50 percent, 60 percent, or 90 percent? It may seem like there is not much difference between 50 percent and 60 percent, but this difference is crucial when computing required sample size. If you calculate a sample size to detect a 30 percent change and only achieve a 20 percent change, your sample will be too small to detect it.

The key formula that we will be using to calculate sample sizes for monitoring, segmentation, and evaluation is as follows:

$$n = \mathrm{deff} \times \frac{\left[ Z_{1-\alpha}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2}$$

where
$P_1$ is the hypothesized value of the indicator in the first instance (e.g., time 1),
$P_2$ is the expected value of the indicator at the second instance (e.g., time 2),
$P = (P_1 + P_2)/2$,
$Z_{1-\alpha}$ is the standard normal deviate value for an α type I error,
$Z_{1-\beta}$ is the standard normal deviate value for a β type II error (that is, a power of 1 − β), and
deff is the design effect in the case of multistage cluster sample design.
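The formula above translates directly into a short function. The following is a minimal sketch, assuming proportions are entered as fractions rather than percentages; the function name and the `two_sided` switch are introduced here for illustration, with the latter replacing Z for 1 − α by Z for 1 − α/2 when the direction of change is not known in advance.

```python
import math
from statistics import NormalDist


def two_group_sample_size(p1, p2, alpha=0.05, power=0.80, deff=1.0, two_sided=False):
    """Per-group sample size needed to detect a change from p1 to p2 (unrounded)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    bracket = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return deff * bracket ** 2 / (p2 - p1) ** 2


# Illustrative inputs: baseline of 40 percent, design effect of 2, 80 percent power.
n_20_points = two_group_sample_size(0.40, 0.60, deff=2)
n_10_points = two_group_sample_size(0.40, 0.50, deff=2)
print(math.ceil(n_20_points), math.ceil(n_10_points))
# prints: 153 610
```

Halving the difference to be detected roughly quadruples the required sample, which is why the expected change has to be settled before the baseline survey is fielded.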
Note: This formula is conservative because it assumes only two values to be compared (e.g., time 1 and time 2, or exposed and non-exposed). It is common to compare more than two values (e.g., in a monitoring table with three or four yearly data points). In this case, the required sample size is slightly less than what would be obtained using the approach proposed here, but the gain is probably not worth the complications.

The sample size computed using this formula needs to be adjusted for filtering processes such as risk groups and refusals. For example, when the indicator is calculated using a specific group such as men with non-marital sexual partners, a second step needs to be included. The first step is to compute the required

number of men with non-marital sexual partners. Because we do not know whether a man has a non-marital sexual partner before interviewing him, the calculated number needs to be inflated using the estimated proportion of men with non-marital sexual partners in the target population. A similar inflation will be done to account for refusals. For this reason, sample sizes will be calculated in two steps each time.

Monitoring

Monitoring studies compare values of given indicators at different times. For example, logical frameworks are set up to compare values of key indicators at baseline and at the end of the project. Table 5 shows the basic layout of a monitoring table with key parameters.

TABLE 5: SAMPLE MONITORING TABLE

Monitoring: Time 1 vs. Time 2, Population A At Risk
Indicator | Time 1 | Time 2 | Sig
Behavioural Y or Determinant X | P1 % | P2 % | ***
Parameters: % At Risk (Pr); % interview completed (Pi); P1; P2

Step 1: A program desires to compare a given parameter before and after an intervention. The required sample size for each arm of the study will be calculated using the formula

$$n_1 = \mathrm{deff} \times \frac{\left[ Z_{1-\alpha}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2}$$

where
$P_1$ is the hypothesized value of the indicator before intervention,
$P_2$ is the expected value of the indicator after intervention,
$P = (P_1 + P_2)/2$,
$Z_{1-\alpha}$ is the standard normal deviate value for an α type I error,
$Z_{1-\beta}$ is the standard normal deviate value for a β type II error (that is, a power of 1 − β), and
deff is the design effect in the case of multistage cluster sample design.

The same principle would apply if we want to conduct yearly surveys. In this case, the calculated sample size is for each year.

Example: A program desires to detect an increase of 20 percent in the percentage of youth who always use condoms with their casual partners with a power of 80 percent and a 95 percent confidence level. You assume that 40 percent of youth may be using condoms with their casual partners at the time of the first survey and that the design effect is 2.

Step 1:
P1 = 40%
P2 − P1 = 20%
P2 = 60%
P = (40% + 60%)/2 = 50%
Z(1−α) = 1.645
Z(1−β) = 0.842
Deff = 2

$$n_1 = 2 \times \frac{\left[ 1.645\sqrt{2(.5)(1-.5)} + 0.842\sqrt{.4(1-.4) + .6(1-.6)} \right]^2}{(.6 - .4)^2} \approx 153$$

Therefore, your sample size must be 153 youth at risk (i.e., youth with casual partners). If you want to detect a 10 percent difference under the same conditions as above, you would need 610 youths. The smaller the difference that you want to detect, the larger the sample size must be.

Note: This example involves an important assumption, that is, that the percentage of youth who would use condoms would increase. Therefore, your calculations involved desired change only in one direction. If you do not know that the change will occur in only one direction, your calculations must take into account that the desired change might involve either an increase or a decrease. This requires that Z(1−α) be replaced by Z(1−α/2) in the formula. Common values of Z(1−α) and Z(1−β) (and also Z(1−α/2)) are given in Table 6. The most widely used values are those for α = 5 percent and for β = 10 or 20 percent (a power of 90 or 80 percent).

TABLE 6: COMMON VALUES OF Z(1−α), Z(1−β), AND Z(1−α/2)

α | Z(1−α) | Z(1−α/2)
1% | 2.326 | 2.576
5% | 1.645 | 1.960
10% | 1.282 | 1.645

β | Z(1−β)
5% | 1.645
10% | 1.282
20% | 0.842

Step 2: Account for risk and refusals.

Having calculated the number of individuals required for one arm of the study (youth at risk), you now need to determine how many youths should be selected overall so as to have this many at risk. To do this, the computed sample size is inflated by the factor 1/p_r, where p_r is the proportion of individuals at risk. Similarly, we should also account for refusals by inflating the computed value by the factor 1/p_i, where p_i is the proportion of individuals actually interviewed among all those selected. In short, you calculate the final sample size of all individuals to be selected using the following formula:

$$n = \frac{n_1}{p_r \times p_i}$$

where n_1 is the sample size calculated at step 1, p_r is the proportion at risk, and p_i is the proportion finally interviewed among those selected.

For example, in country X, 40 percent of all youths declared having casual partners, and previous studies suggest that 96 percent of all selected youths completed their interviews. Using this information, the final sample size for the study given the first scenario would be n = 153/(.40 × .96) ≈ 398.

Segmentation

Segmentation studies compare values of a given behavioral determinant (bubble) between users and nonusers among populations at risk (typically in a segmentation table like Table 7). Sample size determination therefore will require taking into account the level of use, risk, and refusals.

TABLE 7: SAMPLE SEGMENTATION TABLE

Segmentation: Use vs. Non Use, Population A At Risk
Indicator | Non users | Users | Sig
Behavioural Determinant X | P1 % | P2 % | ***
Parameters: % At Risk (Pr); % Use among at Risk (Pu); % interviews completed (Pi); P1; P2
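The inflation for risk and refusals described above for monitoring, and the similar adjustment for use, risk, and refusals that segmentation calls for, both amount to dividing the step 1 sample size by a product of proportions. A minimal sketch follows, assuming Python 3.8 or later for `math.prod`; the helper name `inflate` is hypothetical.

```python
import math


def inflate(n1, *proportions):
    """Divide n1 by each filtering proportion (e.g., at risk, interviews completed)."""
    for p in proportions:
        if not 0 < p <= 1:
            raise ValueError("each proportion must be greater than 0 and at most 1")
    return n1 / math.prod(proportions)


# Monitoring example: 153 youth at risk needed, 40 percent at risk,
# 96 percent of selected youths completing their interviews.
n_total = inflate(153, 0.40, 0.96)
print(round(n_total))
# prints: 398
```

Because the helper accepts any number of proportions, a third filter such as the proportion of users among those at risk can be passed in the same way.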

Sample size for segmentation studies is computed in two main steps.

Step 1: Determine the required number n_1 of behavers (users) among the population at risk, using an adapted version of the monitoring formula. Unlike the monitoring case, P1 here refers to the percentage of nonusers who are positive for the given behavioral determinant, and P2 refers to the percentage of users who are positive for the same behavioral determinant. All other parameters remain similar. Note that this sample size will be calculated for each behavioral determinant of interest and the largest value retained.

$$n_1 = \mathrm{deff} \times \frac{\left[ Z_{1-\alpha}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2}$$

where
P1 = Level of behavioral determinant among nonusers,
P2 − P1 = Expected difference between users and nonusers,
P2 = Level of behavioral determinant among users,
P = (P1 + P2)/2,
Z(1−α) = Standard normal value for α type I error,
Z(1−β) = Standard normal value for β type II error, and
Deff = Design effect.

For example, one of the behavioral determinants of interest for condom use among youths is risk perception (susceptibility). Assume that risk perception among nonusers of condoms is 15 percent (P1). We want to detect a minimum difference of 10 percent between users and nonusers, so P2 − P1 = 10%. In other words, P2 = 25% and P = 20%. Using a 5 percent type I error, 80 percent power, and a design effect of 2, the required number of users would be:

$$n_1 = 2 \times \frac{\left[ 1.645\sqrt{2(.20)(1-.20)} + 0.842\sqrt{.15(1-.15) + .25(1-.25)} \right]^2}{(.25 - .15)^2} \approx 394$$

This results in the need for 394 users.

Step 2: Compute the final sample of youths required by accounting for use, risk, and refusals. If p_u and p_r are the proportions of use and of risk, respectively, and if p_i is the proportion of interviews completed, then the required sample size can be computed using the formula:

$$n = \frac{n_1}{p_u \times p_r \times p_i}$$

Assume that 60 percent of youths at risk use condoms with casual partners, 40 percent of youths are at risk, and 96 percent complete their interviews. The final sample size, computed from the unrounded value of n_1 and rounded up, would be: n = 393.7/(.60 × .40 × .96) = 1,709. Therefore, the total number of youths to be selected to allow for a segmentation is 1,709.
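The rule that the sample size is calculated for each behavioral determinant and the largest value retained can be sketched as a loop over candidate determinants. In the sketch below, only the risk perception levels (15 and 25 percent) and the use, risk, and completion proportions come from the text; the other two determinants, their levels, and all function and variable names are hypothetical and serve only to illustrate the selection.

```python
import math
from statistics import NormalDist


def users_needed(p_nonusers, p_users, alpha=0.05, power=0.80, deff=2.0):
    """Step 1: unrounded number of users needed to detect the stated difference."""
    z = NormalDist().inv_cdf
    p_bar = (p_nonusers + p_users) / 2
    bracket = (z(1 - alpha) * math.sqrt(2 * p_bar * (1 - p_bar))
               + z(power) * math.sqrt(p_nonusers * (1 - p_nonusers)
                                      + p_users * (1 - p_users)))
    return deff * bracket ** 2 / (p_users - p_nonusers) ** 2


# (level among nonusers, level among users); the last two rows are hypothetical.
determinants = {
    "risk perception": (0.15, 0.25),
    "social support": (0.30, 0.45),
    "self-efficacy": (0.40, 0.60),
}

n1_by_determinant = {name: users_needed(*levels) for name, levels in determinants.items()}
driver = max(n1_by_determinant, key=n1_by_determinant.get)  # most demanding determinant
n1 = n1_by_determinant[driver]

# Step 2: inflate for use among those at risk, risk, and completed interviews.
p_use, p_risk, p_complete = 0.60, 0.40, 0.96
n_total = math.ceil(n1 / (p_use * p_risk * p_complete))
print(driver, math.ceil(n1), n_total)
```

The determinant with the smallest expected difference between users and nonusers usually drives the sample size, so it is worth being realistic about the minimum difference that matters programmatically.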

Evaluation

Evaluation studies compare indicator values (e.g., behavior or behavioral determinants) between people who are exposed to program activities and those who are not exposed among populations at risk. They are therefore similar to the segmentation approach. Table 8 is a sample evaluation table.

TABLE 8: SAMPLE EVALUATION TABLE

Evaluation: Exposed vs. Non Exposed, Population A At Risk
Indicator | Non Exposed | Exposed | Sig
Behavioural Determinant X | P1 % | P2 % | ***
Parameters: % At Risk (Pr); % Exposed among at Risk (Pe); % interviews completed (Pi); P1 = Level of X among non exposed; P2 = Level of X among exposed

Step 1: Compute the required number of subjects exposed to program activities (or not exposed) by adapting the monitoring formula, as was the case with segmentation. P1 here refers to the percentage of non-exposed subjects who are positive for the given behavior (or behavioral determinant). P2 refers to the percentage of exposed subjects who are positive for the same behavior (or behavioral determinant). All other parameters remain similar.

$$n_1 = \mathrm{deff} \times \frac{\left[ Z_{1-\alpha}\sqrt{2P(1-P)} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2}$$

where
P1 = Level of behavior or behavioral determinant among non-exposed,
P2 − P1 = Expected difference between exposed and non-exposed,
P2 = Level of behavior or behavioral determinant among exposed,
P = (P1 + P2)/2,
Z(1−α) = Standard normal value for α type I error,
Z(1−β) = Standard normal value for β type II error, and
Deff = Design effect.

In many settings, the level of behavior (or behavioral determinants) is low among people who are not exposed to social marketing programs. For example, in Rwanda the percentage of youths not exposed to the Centre Dushishoze program who consistently used condoms with casual partners was 35 percent.

If one sets out to detect a 15 percent difference between exposed and non-exposed, then P2 would be 50 percent and the required number of exposed youth can be calculated as:

$$n_1 = 2 \times \frac{\left[ 1.645\sqrt{2 \times .425 \times .575} + 0.842\sqrt{.35 \times .65 + .50 \times .50} \right]^2}{(.50 - .35)^2} \approx 267$$

Step 2: Account for exposure, risk, and interview completion. If p_e is the proportion of youths exposed to the program, p_r the proportion at risk, and p_i the completion rate, then the final required sample size can be computed using the formula:

$$n = \frac{n_1}{p_e \times p_r \times p_i}$$

If 50 percent of youths are exposed to the program, 40 percent are at risk, and 96 percent complete their interviews, the final n, computed from the unrounded value of n_1 and rounded up, would be: n = 266.6/(.50 × .40 × .96) = 1,389.

In many instances, the analysis of evaluation data may require more than two groups (e.g., not exposed, low or medium exposure, and high exposure). Calculations for this case are more complicated, but a safe approach is to hypothesize that the same number of subjects is required for each exposure level.

Special Case 2: Qualitative Studies

Sample size determination is most important for quantitative studies. However, qualitative studies also need to decide on how many individuals to include for data collection. In most instances, this number is obtained through a balance between how much time and money is available and how many individuals need to be contacted to learn something coherent from the population. Also, the method of sampling may influence the sample size. For example, it is recommended that for focus group studies, at least two focus groups be conducted for any one participant category of interest, such as sex, age, and residence (urban or rural). If such is the case, then Figure 11 reveals that 16 focus groups are necessary.
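One reading of the focus group rule that reproduces the figure of 16 is to cross the three participant categories, each with two levels, and to hold two groups per resulting profile. The sketch below enumerates those profiles; the two age bands are hypothetical labels, since the text names the categories but not their levels.

```python
from itertools import product

# Participant categories; the age bands are assumed labels for illustration.
categories = {
    "sex": ["female", "male"],
    "age": ["younger", "older"],
    "residence": ["urban", "rural"],
}
groups_per_profile = 2  # at least two focus groups per participant profile

profiles = list(product(*categories.values()))
total_groups = groups_per_profile * len(profiles)
print(len(profiles), total_groups)
# prints: 8 16
```

Adding a fourth two-level category would double the count to 32 groups, which shows how quickly the number of categories drives the cost of a qualitative study.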

Why Sample? Why not study everyone? Debate about Census vs. sampling

Why Sample? Why not study everyone? Debate about Census vs. sampling Sampling Why Sample? Why not study everyone? Debate about Census vs. sampling Problems in Sampling? What problems do you know about? What issues are you aware of? What questions do you have? Key Sampling

More information

NON-PROBABILITY SAMPLING TECHNIQUES

NON-PROBABILITY SAMPLING TECHNIQUES NON-PROBABILITY SAMPLING TECHNIQUES PRESENTED BY Name: WINNIE MUGERA Reg No: L50/62004/2013 RESEARCH METHODS LDP 603 UNIVERSITY OF NAIROBI Date: APRIL 2013 SAMPLING Sampling is the use of a subset of the

More information

Introduction to Sampling. Dr. Safaa R. Amer. Overview. for Non-Statisticians. Part II. Part I. Sample Size. Introduction.

Introduction to Sampling. Dr. Safaa R. Amer. Overview. for Non-Statisticians. Part II. Part I. Sample Size. Introduction. Introduction to Sampling for Non-Statisticians Dr. Safaa R. Amer Overview Part I Part II Introduction Census or Sample Sampling Frame Probability or non-probability sample Sampling with or without replacement

More information

Chapter 8: Quantitative Sampling

Chapter 8: Quantitative Sampling Chapter 8: Quantitative Sampling I. Introduction to Sampling a. The primary goal of sampling is to get a representative sample, or a small collection of units or cases from a much larger collection or

More information

Chapter 3. Sampling. Sampling Methods

Chapter 3. Sampling. Sampling Methods Oxford University Press Chapter 3 40 Sampling Resources are always limited. It is usually not possible nor necessary for the researcher to study an entire target population of subjects. Most medical research

More information

Survey Research: Choice of Instrument, Sample. Lynda Burton, ScD Johns Hopkins University

Survey Research: Choice of Instrument, Sample. Lynda Burton, ScD Johns Hopkins University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Sampling: What is it? Quantitative Research Methods ENGL 5377 Spring 2007

Sampling: What is it? Quantitative Research Methods ENGL 5377 Spring 2007 Sampling: What is it? Quantitative Research Methods ENGL 5377 Spring 2007 Bobbie Latham March 8, 2007 Introduction In any research conducted, people, places, and things are studied. The opportunity to

More information

Descriptive Methods Ch. 6 and 7

Descriptive Methods Ch. 6 and 7 Descriptive Methods Ch. 6 and 7 Purpose of Descriptive Research Purely descriptive research describes the characteristics or behaviors of a given population in a systematic and accurate fashion. Correlational

More information

Sampling and Sampling Distributions

Sampling and Sampling Distributions Sampling and Sampling Distributions Random Sampling A sample is a group of objects or readings taken from a population for counting or measurement. We shall distinguish between two kinds of populations

More information

Sampling strategies *

Sampling strategies * UNITED NATIONS SECRETARIAT ESA/STAT/AC.93/2 Statistics Division 03 November 2003 Expert Group Meeting to Review the Draft Handbook on Designing of Household Sample Surveys 3-5 December 2003 English only

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Sampling. COUN 695 Experimental Design

Sampling. COUN 695 Experimental Design Sampling COUN 695 Experimental Design Principles of Sampling Procedures are different for quantitative and qualitative research Sampling in quantitative research focuses on representativeness Sampling

More information

Introduction to Quantitative Research Contact: tel 01296 680374

Introduction to Quantitative Research Contact: tel 01296 680374 Introduction to Quantitative Research Quantitative Research Quantification - i.e. numbers e.g 51% of the population is female 74% of households have a washing machine 33% strongly agree with the statement.

More information

Inclusion and Exclusion Criteria

Inclusion and Exclusion Criteria Inclusion and Exclusion Criteria Inclusion criteria = attributes of subjects that are essential for their selection to participate. Inclusion criteria function remove the influence of specific confounding

More information

SAMPLING METHODS IN SOCIAL RESEARCH

SAMPLING METHODS IN SOCIAL RESEARCH SAMPLING METHODS IN SOCIAL RESEARCH Muzammil Haque Ph.D Scholar Visva Bharati, Santiniketan,West Bangal Sampling may be defined as the selection of some part of an aggregate or totality on the basis of

More information

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population.

SAMPLING & INFERENTIAL STATISTICS. Sampling is necessary to make inferences about a population. SAMPLING & INFERENTIAL STATISTICS Sampling is necessary to make inferences about a population. SAMPLING The group that you observe or collect data from is the sample. The group that you make generalizations

More information

Reflections on Probability vs Nonprobability Sampling

Reflections on Probability vs Nonprobability Sampling Official Statistics in Honour of Daniel Thorburn, pp. 29 35 Reflections on Probability vs Nonprobability Sampling Jan Wretman 1 A few fundamental things are briefly discussed. First: What is called probability

More information

Techniques for data collection

Techniques for data collection Techniques for data collection Technical workshop on survey methodology: Enabling environment for sustainable enterprises in Indonesia Hotel Ibis Tamarin, Jakarta 4-6 May 2011 Presentation by Mohammed

More information

Chapter 7 Sampling (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.

Chapter 7 Sampling (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters. Chapter 7 Sampling (Reminder: Don t forget to utilize the concept maps and study questions as you study this and the other chapters.) The purpose of Chapter 7 it to help you to learn about sampling in

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Sampling Techniques Surveys and samples Source: http://www.deakin.edu.au/~agoodman/sci101/chap7.html

Sampling Techniques Surveys and samples Source: http://www.deakin.edu.au/~agoodman/sci101/chap7.html Sampling Techniques Surveys and samples Source: http://www.deakin.edu.au/~agoodman/sci101/chap7.html In this section you'll learn how sample surveys can be organised, and how samples can be chosen in such

More information

Selecting Research Participants

Selecting Research Participants C H A P T E R 6 Selecting Research Participants OBJECTIVES After studying this chapter, students should be able to Define the term sampling frame Describe the difference between random sampling and random

More information

COI Research Management Summary on behalf of the Department of Health

COI Research Management Summary on behalf of the Department of Health COI Research Management Summary on behalf of the Department of Health Title: Worth Talking About Campaign Evaluation 2010 / 2011 Quantitative research conducted by TNS-BMRB COI Reference number: 114770

More information

Non-random/non-probability sampling designs in quantitative research

Non-random/non-probability sampling designs in quantitative research 206 RESEARCH MET HODOLOGY Non-random/non-probability sampling designs in quantitative research N on-probability sampling designs do not follow the theory of probability in the choice of elements from the

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Welcome back to EDFR 6700. I m Jeff Oescher, and I ll be discussing quantitative research design with you for the next several lessons.

Welcome back to EDFR 6700. I m Jeff Oescher, and I ll be discussing quantitative research design with you for the next several lessons. Welcome back to EDFR 6700. I m Jeff Oescher, and I ll be discussing quantitative research design with you for the next several lessons. I ll follow the text somewhat loosely, discussing some chapters out

More information

Clinical Study Design and Methods Terminology

Clinical Study Design and Methods Terminology Home College of Veterinary Medicine Washington State University WSU Faculty &Staff Page Page 1 of 5 John Gay, DVM PhD DACVPM AAHP FDIU VCS Clinical Epidemiology & Evidence-Based Medicine Glossary: Clinical

More information

Sampling Procedures Y520. Strategies for Educational Inquiry. Robert S Michael

Sampling Procedures Y520. Strategies for Educational Inquiry. Robert S Michael Sampling Procedures Y520 Strategies for Educational Inquiry Robert S Michael RSMichael 2-1 Terms Population (or universe) The group to which inferences are made based on a sample drawn from the population.

More information

Notes on using capture-recapture techniques to assess the sensitivity of rapid case-finding methods

Notes on using capture-recapture techniques to assess the sensitivity of rapid case-finding methods Notes on using capture-recapture techniques to assess the sensitivity of rapid case-finding methods VALID International Ltd. Version 0.71 July 2006 Capture-recapture studies Capture-recapture studies are

More information

Tobacco Questions for Surveys A Subset of Key Questions from the Global Adult Tobacco Survey (GATS) 2 nd Edition GTSS

Tobacco Questions for Surveys A Subset of Key Questions from the Global Adult Tobacco Survey (GATS) 2 nd Edition GTSS GTSS GLOBAL TOBACCO SURVEILLANCE SYSTEM Tobacco Questions for Surveys A Subset of Key Questions from the Global Adult Tobacco Survey (GATS) 2 nd Edition GTSS GLOBAL TOBACCO SURVEILLANCE SYSTEM Tobacco

More information

Introduction... 3. Qualitative Data Collection Methods... 7 In depth interviews... 7 Observation methods... 8 Document review... 8 Focus groups...

Introduction... 3. Qualitative Data Collection Methods... 7 In depth interviews... 7 Observation methods... 8 Document review... 8 Focus groups... 1 Table of Contents Introduction... 3 Quantitative Data Collection Methods... 4 Interviews... 4 Telephone interviews... 5 Face to face interviews... 5 Computer Assisted Personal Interviewing (CAPI)...

More information

How to Select a National Student/Parent School Opinion Item and the Accident Rate

How to Select a National Student/Parent School Opinion Item and the Accident Rate GUIDELINES FOR ASKING THE NATIONAL STUDENT AND PARENT SCHOOL OPINION ITEMS Guidelines for sampling are provided to assist schools in surveying students and parents/caregivers, using the national school

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Unit 26 Estimation with Confidence Intervals
