MBA CATÓLICA, JAN/APRIL 2006
Marketing Research
Fernando S. Machado

Week 6
- Sampling: Design and Procedures
- Sampling: Sample Size Determination
- Data Preparation

Sampling: Design and Procedures
- The Sampling Process
- Sampling Techniques
- Application: Shopping Center Sampling

The Sampling Process
- Identifying the Target Population
- Determining the Sampling Frame
- Selecting a Sampling Technique (Probability Sampling or Non-Probability Sampling)
- Determining the Relevant Sample Size
- Execute Sampling
- Data Collection From Respondents
- Information for Decision-Making
Side steps shown in the diagram:
- Reconciling the Population, Sampling Frame Differences
- Handling the Non-Response Problem

Sampling Process
Target Population: the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made.

Determining the Target Population
- Look to the research objectives
- Consider all alternatives
- Know your market
- Consider the appropriate sampling unit
- Consider convenience

Sampling Frame: a representation of the elements of the target population. It consists of a list or set of directions for identifying the target population.

Sampling Process (Contd.)
Selecting a Sampling Procedure
- Decide whether to use probability or non-probability sampling
Probability Sampling
- All population members have a known probability of being in the sample
Non-Probability Sampling
- Costs and trouble of developing a sampling frame are eliminated
- Results can contain hidden biases and uncertainties

Classification of Sampling Techniques
Nonprobability Sampling Techniques
- Convenience Sampling
- Judgmental Sampling
- Quota Sampling
- Snowball Sampling
Probability Sampling Techniques
- Simple Random Sampling
- Systematic Sampling
- Stratified Sampling
- Cluster Sampling
- Other Sampling Techniques

Procedures for Drawing Probability Samples
Simple Random Sampling
Each population member, and each possible sample, has an equal probability of being selected
1. Select a suitable sampling frame
2. Each element is assigned a number from 1 to N (pop. size)
3. Generate n (sample size) different random numbers between 1 and N
4. The numbers generated denote the elements that should be included in the sample

Systematic Sampling
Involves systematically spreading the sample through the list of population members
Commonly used in telephone surveys
1. Select a suitable sampling frame
2. Each element is assigned a number from 1 to N (pop. size)
3. Determine the sampling interval i: i = N/n. If i is a fraction, round to the nearest integer
4. Select a random number, r, between 1 and i, as explained in simple random sampling
5. The elements with the following numbers will comprise the systematic random sample: r, r+i, r+2i, r+3i, r+4i, ..., r+(n-1)i
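The two procedures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the population size N=1000, the sample size n=10 and the seed are all assumptions chosen for the example.

```python
import random


def simple_random_sample(N, n, rng):
    """Draw n different element numbers between 1 and N (steps 3-4 of SRS)."""
    return sorted(rng.sample(range(1, N + 1), n))


def systematic_sample(N, n, rng):
    """Random start r in 1..i, then every i-th element, with i = N/n rounded."""
    i = max(1, round(N / n))      # sampling interval
    r = rng.randint(1, i)         # random start between 1 and i (inclusive)
    # r, r+i, ..., r+(n-1)i; numbers past N (possible after rounding i) are dropped
    return [r + k * i for k in range(n) if r + k * i <= N]


rng = random.Random(2006)         # fixed seed only to make the sketch reproducible
srs = simple_random_sample(N=1000, n=10, rng=rng)
sys_sample = systematic_sample(N=1000, n=10, rng=rng)
print(srs)
print(sys_sample)
```

Note that in the systematic sample only the start r is random; every later element is determined by r and i.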

Stratified Sampling
The chosen sample is forced to contain units from each of the segments or strata of the population
1. Select a suitable frame
2. Select the stratification variable(s) and the number of strata, H
3. Divide the entire population into H strata. Based on the classification variable, each element of the population is assigned to one of the H strata
4. In each stratum, number the elements from 1 to $N_h$ (the pop. size of stratum h)
5. Determine the sample size of each stratum, $n_h$, based on proportionate or disproportionate stratified sampling, where $\sum_{h=1}^{H} n_h = n$
6. In each stratum select a simple random sample of size $n_h$

Sampling Techniques (Contd.)
Types of Stratified Sampling
Proportionate Stratified Sampling
- Number of objects/sampling units chosen from each group is proportional to the number in the population
- Can be classified as directly proportional or indirectly proportional stratified sampling
Disproportionate Stratified Sampling
- Sample size in each group is not proportional to the respective group sizes
- Used when multiple groups are compared and the respective group sizes are small
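As an illustration of steps 4 to 6 under proportionate allocation, the sketch below splits n across strata in proportion to $N_h$ and then draws a simple random sample inside each stratum. The stratum names and sizes are hypothetical, and the largest-remainder rounding is one possible way (an assumption, not from the slides) to make the $n_h$ add up to n.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Allocate n across strata in proportion to N_h so that sum(n_h) == n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}          # round down first
    leftover = n - sum(alloc.values())
    # hand the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(stratum_sizes, n, rng):
    """Simple random sample of size n_h within each stratum (elements 1..N_h)."""
    alloc = proportionate_allocation(stratum_sizes, n)
    return {h: sorted(rng.sample(range(1, stratum_sizes[h] + 1), alloc[h]))
            for h in alloc}


strata = {"north": 5000, "center": 3000, "south": 2000}    # hypothetical N_h
allocation = proportionate_allocation(strata, n=100)
sample = stratified_sample(strata, n=100, rng=random.Random(6))
print(allocation)
```

For disproportionate sampling the allocation step would be replaced by researcher-chosen $n_h$ values; the within-stratum draw stays the same.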

Sampling Techniques (Contd.)
Cluster Sampling
- Involves dividing the population into subgroups
- A random sample of subgroups/clusters is selected
- For each selected cluster, either all the elements are included in the sample or a sample of elements is drawn probabilistically
Cluster Sampling is:
- Very cost effective
- Useful when subgroups can be identified that are representative of the entire population

An example of area sampling
How to select probabilistically a sample of 3 freguesias from a group of 10?
[Table: Freguesia nº, Population, Cumulative; the values are missing in the source]
Solution: Randomly generate 3 numbers on the interval [bounds missing in the source].
Ex: [numbers missing in the source], select freguesias 1, 4 and [missing in the source]
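The area sampling example appears to rely on the cumulative population column: a random number between 1 and the total population selects the freguesia whose cumulative range contains it, so larger freguesias are more likely to be picked. The sketch below illustrates that idea under stated assumptions: the ten population figures and the three drawn numbers are invented (the slide's values did not survive), and redrawing duplicates is one possible way to end up with 3 different freguesias.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_clusters(populations, draws):
    """Map each drawn number to the 1-based cluster whose cumulative range holds it."""
    cumulative = list(accumulate(populations))
    # cluster k covers the numbers cumulative[k-2] + 1 .. cumulative[k-1]
    return [bisect_left(cumulative, d) + 1 for d in draws]


def draw_distinct_clusters(populations, k, rng):
    """Keep drawing numbers in 1..total until k different clusters are selected."""
    total = sum(populations)
    chosen = []
    while len(chosen) < k:
        cluster = select_clusters(populations, [rng.randint(1, total)])[0]
        if cluster not in chosen:
            chosen.append(cluster)
    return sorted(chosen)


# hypothetical populations of freguesias 1..10 (cumulative total: 10000)
populations = [1200, 800, 500, 2500, 900, 600, 1500, 700, 400, 900]
picked = select_clusters(populations, [1500, 5600, 8300])
print(picked)
print(draw_distinct_clusters(populations, 3, random.Random(1)))
```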

A Comparison of Stratified and Cluster Sampling Processes
Stratified Sampling
- Homogeneity within groups
- Heterogeneity between groups
- All groups included
- Sampling efficiency improved by increasing accuracy at a faster rate than cost
Cluster Sampling
- Homogeneity between groups
- Heterogeneity within groups
- Random selection of groups
- Sampling efficiency improved by decreasing cost at a faster rate than accuracy

Sampling Techniques (Contd.)
Types of Non-Probability Sampling
Judgmental sampling
- "Expert" uses judgement to identify representative samples
Snowball sampling
- Form of judgmental sampling
- Appropriate when reaching small, specialized populations
- Each respondent, after being interviewed, is asked to identify one or more others in the appropriate group
Convenience sampling
- Used to obtain information quickly and inexpensively
Quota sampling
- Judgmental sampling where the sample includes a minimum number from each specified subgroup in the population
- Often based on demographic data

Strengths and Weaknesses of Basic Sampling Techniques

Nonprobability Sampling
Convenience sampling
- Strengths: least expensive, least time-consuming, most convenient
- Weaknesses: selection bias, sample not representative, not recommended for descriptive or causal research
Judgmental sampling
- Strengths: low cost, convenient, not time-consuming
- Weaknesses: does not allow generalization, subjective
Quota sampling
- Strengths: sample can be controlled for certain characteristics
- Weaknesses: selection bias, no assurance of representativeness
Snowball sampling
- Strengths: can estimate rare characteristics
- Weaknesses: time-consuming

Probability Sampling
Simple random sampling (SRS)
- Strengths: easily understood, results projectable
- Weaknesses: difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
Systematic sampling
- Strengths: can increase representativeness, easier to implement than SRS, sampling frame not necessary
- Weaknesses: can decrease representativeness
Stratified sampling
- Strengths: includes all important subpopulations, precision
- Weaknesses: difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive
Cluster sampling
- Strengths: easy to implement, cost effective
- Weaknesses: imprecise, difficult to compute and interpret results

Application: Shopping Center Sampling
- 20% of all questionnaires completed or interviews granted are store-intercept interviews
- Bias is introduced by the methods used to select the sample
Sources of Bias:
- Selection of shopping center
- Point of shopping center from which respondents are drawn
- Time of day
- More frequent shoppers will be more likely to be selected

Shopping Center Sampling (Contd.)
Solutions to Bias:
Shopping Center Bias
- Use several shopping centers in different neighborhoods
- Use several diverse cities
Sample Locations Within a Center
- Stratify by entrance location
- Take a separate sample from each entrance
- To obtain the overall average, strata averages should be combined by weighting them to reflect the traffic associated with each entrance

Shopping Center Sampling (Contd.)
Time Sampling
- Stratify by time segments
- Interview during each segment
- Final counts should be weighted according to traffic counts
Sampling people versus shopping visits
- If the goal is to develop a sample that represents the total population, then we should attach a lower weight to more frequent shoppers.
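A small sketch of the traffic weighting described above: each entrance (or time segment) contributes its stratum average in proportion to its traffic count. The traffic counts and average amounts are hypothetical, and the function name is an assumption for this illustration.

```python
def traffic_weighted_mean(strata):
    """strata: list of (traffic_count, stratum_mean) pairs, one per entrance."""
    total_traffic = sum(traffic for traffic, _ in strata)
    return sum(traffic * mean for traffic, mean in strata) / total_traffic


# hypothetical entrances: (shoppers counted, average spend of those interviewed)
entrances = [(600, 40.0), (300, 55.0), (100, 70.0)]

overall = traffic_weighted_mean(entrances)
unweighted = sum(mean for _, mean in entrances) / len(entrances)
print(overall, unweighted)
```

In this made-up case the unweighted average overstates the result because the busiest entrance has the lowest stratum average. The same function could be applied to time segments.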

Sampling: Sample Size Determination
- Statistical vs. Ad-hoc Methods of Sample Size Determination
- Sample Reliability, Sampling Distributions and Sample Size
- Some Practical Rules for Determining Sample Size
- Non-response Problems

Sample Size and Statistical Theory
Determining the Sample Size
- Use of statistical techniques or ad hoc methods
- Ad hoc methods are used when the researcher knows from experience what sample size to adopt or when budgetary constraints dictate the size of the sample

A Sampling Problem
The management of a local restaurant wants to determine the average monthly amount spent by households at fancy restaurants. Some households do not spend anything at all, whereas other households spend as much as 300 per month. Management wants to be 95% confident of the findings and does not want the error to exceed plus or minus 5. What sample size should be used to determine the average monthly household expenditure?

Population Characteristics/Parameters
Population Mean ($\mu$)
- Normally unknown
- Determine its value as closely as possible by taking a sample from the population
Population Variance ($\sigma^2$)
- Measure of population dispersion
- Based on the degree to which a response differs from the population average response

Sample Characteristics/Statistics
Sample Mean ($\bar{X}$)
- Is used to estimate the unknown population mean
Sample Variance ($S^2$)
- Is used to estimate the unknown population variance

Effect of Sample Size on Distribution of Sample Mean
[Figure: distributions of sample means for n=100 and n=10]

Effect of Population Variance on Distribution of Sample Mean
[Figure: distributions of sample means for n=10 with sigma=20 and with sigma=50]

Sample Reliability
- $\bar{X}$ will vary from sample to sample
- As the sample size (n) increases, variation in $\bar{X}$ will decrease: the standard error depends on sample size
- As the population variation increases, variation in $\bar{X}$ will increase: the standard error depends on population variance
- Assume that the variation of $\bar{X}$ follows a normal distribution, which is reasonable if the population is normal or if the sample size is large
- Sampling distribution: indicates the probability of getting a particular sample mean

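Sampling Distributions
$\bar{X} \sim N(\mu, \sigma_{\bar{X}})$ and $p \sim N(\pi, \sigma_p)$
Standard error of sample mean: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, estimated by $S_{\bar{X}} = s/\sqrt{n}$
Standard error of sample proportion: $\sigma_p = \sqrt{\pi(1-\pi)/n}$, estimated by $S_p = \sqrt{p(1-p)/n}$

Symbols for Population and Sample Variables
Variable: Population, Sample
- Mean: $\mu$, $\bar{X}$
- Proportion: $\pi$, $p$
- Variance: $\sigma^2$, $S^2$
- Standard Deviation: $\sigma$, $S$
- Size: $N$, $n$
- Standard error of the mean: $\sigma_{\bar{X}}$, $S_{\bar{X}}$
- Standard error of proportion: $\sigma_p$, $S_p$
- Standardized variate: $(X-\mu)/\sigma$, $(X-\bar{X})/S$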
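The two standard error formulas are easy to check numerically. The sketch below is only an illustration with made-up inputs (s = 50, p = 0.5); it shows how the standard error shrinks as n grows, which is the effect pictured in the earlier slides.

```python
import math


def se_mean(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_proportion(p, n):
    """Estimated standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# hypothetical standard deviation of 50 and proportion of 0.5
for n in (10, 100, 1000):
    print(n, round(se_mean(50, n), 2), round(se_proportion(0.5, n), 4))
```

Multiplying the sample size by 100 divides the standard error by 10, because n enters through its square root.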

Finding Probabilities Corresponding to Known Values
[Figure: normal curve with the points µ-3σ, µ-2σ, µ-1σ, µ, µ+1σ, µ+2σ, µ+3σ marked on the X scale (µ=50, σ=5) and on the Z scale; the areas between µ and µ+1σ, µ+2σ and µ+3σ are labelled, but the values are missing in the source]

Finding Values Corresponding to Known Probabilities: Confidence Interval
[Figure: normal curve centred at 50 on the X scale and at 0 on the Z scale, with the interval limits at -Z and +Z; the area labels are missing in the source]
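Both lookups pictured on these slides can be reproduced with Python's standard library instead of a printed normal table. This is a sketch, not the slides' own method: `statistics.NormalDist` (Python 3.8+) provides `cdf` for "value to probability" and `inv_cdf` for "probability to value". The helper names are assumptions; µ=50 and σ=5 follow the slide's X scale.

```python
from statistics import NormalDist


def area_between_mean_and(k, mu=50.0, sigma=5.0):
    """Probability that X falls between mu and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu)


def z_for_confidence(level):
    """Z such that the central area between -Z and +Z equals `level`."""
    return NormalDist().inv_cdf(0.5 + level / 2)


areas = {k: area_between_mean_and(k) for k in (1, 2, 3)}
z95 = z_for_confidence(0.95)
print({k: round(a, 4) for k, a in areas.items()})
print(round(z95, 2))
```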

95% Confidence Interval for the Population Mean
Sampling error $= 2\sigma/\sqrt{n}$
$\bar{X}_L = \bar{X} - 2\sigma_{\bar{X}} = \bar{X} - 2\sigma/\sqrt{n}$
$\bar{X}_U = \bar{X} + 2\sigma_{\bar{X}} = \bar{X} + 2\sigma/\sqrt{n}$
For a given n we can be 95% confident that $\mu$ lies in the interval $[\bar{X}_L, \bar{X}_U]$

Size of Interval Estimate and Confidence Level
To determine sample size we need to specify:
- Precision level: when estimating a population parameter by using a sample statistic, the precision level is the desired size of the interval (maximum permissible difference between the sample statistic and the population parameter)
- Confidence level: probability that a confidence interval will include the population parameter
Sample size
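A minimal sketch of the interval computation follows. The sample mean, standard deviation and sample size are hypothetical, and the function name is an assumption. The slide rounds the 95% multiplier to 2; the default below uses 1.96, and the example passes z=2.0 to match the slide.

```python
import math


def confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (lower, upper) = sample_mean -/+ z * sd / sqrt(n)."""
    sampling_error = z * sd / math.sqrt(n)
    return sample_mean - sampling_error, sample_mean + sampling_error


# hypothetical survey: mean 120, standard deviation 50, n = 400
lower, upper = confidence_interval(120.0, 50.0, 400, z=2.0)
print(lower, upper)
```

With these inputs the standard error is 50/20 = 2.5, so the sampling error is 5 and the interval runs from 115 to 125.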

Sample Size Question
- Size of the sampling error that is desired (D)
- Confidence level: Z
- Use an estimate (s) of the unknown population st. dev. ($\sigma$)
Since $D = Z\sigma/\sqrt{n}$, the sample size is $n = Z^2 s^2/D^2$

Determining the Population Standard Deviation
- Use a sample standard deviation obtained from a previous comparable survey or from a pilot survey
- Estimate the population standard deviation subjectively

Sample Size When Proportions Are Used
Sampling error $= D = Z\sqrt{\pi(1-\pi)/n}$
$n = Z^2 p(1-p)/D^2$
For a 95% confidence level (Z = 1.96), the sampling error is maximised when $\pi = 0.5$, where $\pi(1-\pi) = 0.25$. Using the first expression we obtain:
n: error
- 100: 9.8%
- 250: 6.2%
- 500: 4.4%
- 750: 3.6%
- [n missing in the source]: [digit missing].1%
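The error table for proportions can be regenerated from the first expression, $D = Z\sqrt{\pi(1-\pi)/n}$, at the worst case $\pi = 0.5$. The sketch below only recomputes the rows whose n survives in the slide; the function name is an assumption.

```python
import math


def max_sampling_error(n, z=1.96, pi=0.5):
    """Sampling error D = z * sqrt(pi(1-pi)/n); largest when pi = 0.5."""
    return z * math.sqrt(pi * (1 - pi) / n)


table = {n: round(100 * max_sampling_error(n), 1) for n in (100, 250, 500, 750)}
for n, err in table.items():
    print(f"n = {n}: {err}%")
```

The recomputed values agree with the four complete rows of the slide (9.8%, 6.2%, 4.4%, 3.6%).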

Sample Size Determination for Means and Proportions
Steps: Means; Proportions
1. Specify the level of precision: $D = \pm$\$5.00; $D = p - \pi = \pm$ [value missing in the source]
2. Specify the confidence level (CL): CL = 95%; CL = 95%
3. Determine the z value associated with CL: z value is 1.96; z value is 1.96
4. Determine the standard deviation of the population: estimate $\sigma$: $\sigma = 55$; estimate $\pi$: $\pi = 0.64$
5. Determine the sample size using the formula for the standard error: $n = \sigma^2 z^2/D^2 = 465$; $n = \pi(1-\pi) z^2/D^2 = 355$
7. If necessary, reestimate the confidence interval by employing s to estimate $\sigma$: $\bar{X} \pm z S_{\bar{X}}$; $p \pm z S_p$
8. If precision is specified in relative rather than absolute terms, determine the sample size by substituting for D: $D = R\mu$, $n = C^2 z^2/R^2$ (where $C = \sigma/\mu$); $D = R\pi$, $n = z^2(1-\pi)/(R^2\pi)$

A Sampling Problem
The management of a local restaurant wants to determine the average monthly amount spent by households at fancy restaurants. Some households do not spend anything at all, whereas other households spend as much as 300 per month. Management wants to be 95% confident of the findings and does not want the error to exceed plus or minus 5.
i) What sample size should be used to determine the average monthly household expenditure?
ii) After the survey was conducted, the average expenditure was found to be [value missing in the source] and the standard deviation was 45. Construct a 95% confidence interval. What can be said about the level of precision?
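The two sample sizes in the table can be checked by hand. The first line below uses only values stated in the table. The second assumes a precision of $D = 0.05$ for the proportion, which is not legible in the source but is consistent with the stated result of 355. The last two lines sketch one possible approach to the restaurant problem, assuming (as a rule of thumb that the slides do not state) that $\sigma$ is roughly one sixth of the range 0 to 300.

```latex
% Means: values taken from the table
n = \frac{\sigma^2 z^2}{D^2} = \frac{55^2 \times 1.96^2}{5^2}
  = \frac{3025 \times 3.8416}{25} \approx 464.8 \;\Rightarrow\; n = 465

% Proportions: D = 0.05 is an assumption consistent with n = 355
n = \frac{\pi(1-\pi)\, z^2}{D^2} = \frac{0.64 \times 0.36 \times 3.8416}{0.05^2}
  \approx 354.0 \;\Rightarrow\; n = 355

% Restaurant problem (i): assumes sigma ~ range/6 = 300/6 = 50
n = \frac{50^2 \times 1.96^2}{5^2} = \frac{2500 \times 3.8416}{25}
  \approx 384.2 \;\Rightarrow\; n = 385

% Restaurant problem (ii): achieved precision with s = 45 and the assumed n = 385
D = \frac{z\, s}{\sqrt{n}} = \frac{1.96 \times 45}{\sqrt{385}} \approx 4.5
```

Under these assumptions the achieved precision of about 4.5 is tighter than the 5 requested, because the observed standard deviation (45) turned out smaller than the planning value (50). The sample size is always rounded up so that the precision target is met.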

Sample Size For Estimating Multiple Parameters
Variable: mean household monthly expense on department store shopping, clothes, gifts
- Department store shopping: confidence level 95%, precision level (D) \$5, standard deviation of the population ($\sigma$) \$55
- Clothes: confidence level 95%, precision level (D) \$5, standard deviation of the population ($\sigma$) \$40
- Gifts: confidence level 95%, precision level (D) \$4, standard deviation of the population ($\sigma$) \$30
[The z values and the required sample sizes (n) are missing in the source]

Rules of Thumb
- The sample should be large enough so that, when it is divided into groups, each group will have a minimum sample size of 100 or more
- If the analysis involves comparison between subgroups, the sample size in each subgroup should be 20 to 50
- Use disproportionate sampling if one of the groups of the population is relatively small
- The researcher must decide whether the sample size dictated by budget constraints allows a worthwhile study to be conducted
- Find similar studies and use their sample sizes as a guide
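The required sample sizes that are missing from the table can be recomputed from $n = \sigma^2 z^2/D^2$ with the D and $\sigma$ values that did survive. The sketch below assumes z = 1.96 for the 95% confidence level (the value used elsewhere in the slides). Taking the largest n so that every parameter meets its precision target is a common conservative choice, stated here as an assumption rather than as the slides' rule.

```python
import math


def sample_size_for_mean(sigma, D, z=1.96):
    """n = sigma^2 * z^2 / D^2, rounded up to the next whole respondent."""
    return math.ceil(sigma ** 2 * z ** 2 / D ** 2)


# (precision level D, population standard deviation sigma) from the table
variables = {
    "department store shopping": (5, 55),
    "clothes": (5, 40),
    "gifts": (4, 30),
}

required = {name: sample_size_for_mean(sigma, D)
            for name, (D, sigma) in variables.items()}
overall_n = max(required.values())   # conservative: satisfies every variable
print(required, overall_n)
```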

Sample Sizes Used in Marketing Research Studies
Type of Study: Minimum Size, Typical Range
[Most values of this table are missing in the source]
- Problem identification research (e.g. market potential)
- Problem-solving research (e.g. pricing)
- Product tests
- Test marketing studies
- TV, radio, or print advertising (per commercial or ad tested)
- Focus groups: 2 groups, 4-12 groups
[Surviving fragment, row unclear: 500, 1,000-2,…]

Non-Response Problems
- Sample size has to be large enough to allow for non-response
- Those who respond may differ from non-respondents in a meaningful way, creating biases
- Seriousness of non-response bias depends on the extent of non-response

Solutions to Nonresponse Problem - Improving Response Rates
Methods of Improving Response Rates
Reducing Refusals
- Prior Notification
- Motivating Respondents
- Incentives
- Questionnaire Design and Administration
- Follow-Up
- Other Facilitators
Reducing Not-at-Homes
- Callbacks

Solutions to Nonresponse Problem (Contd.)
Attempt to estimate the nonresponse bias (assess whether there are significant differences between respondents and non-respondents)
Adjust for non-response bias
- Sub-sampling of nonrespondents (contact a subsample of non-respondents in a mail survey by telephone)
- Replacement (contact nonrespondents from an earlier, similar survey)
- Substitution (divide the sample into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates)
- Weighting (Politz approach: attach higher weight to respondents who are less often at home)

Data Preparation
- Editing the Data
- Coding the Data
- Transcribing the Data
- Cleaning the Data
- Statistically Adjusting the Data

Data Preparation Process
1. Prepare Preliminary Plan of Data Analysis
2. Edit
3. Code
4. Transcribe
5. Clean Data
6. Statistically Adjust the Data

Preparing the Data for Analysis (Contd.)
Data Editing
A review of the questionnaires with the objective of increasing accuracy and precision.
Problems Identified With Data Editing
- Interviewer Error (incorrect instructions from interviewers)
- Omissions (respondents may fail to answer a question or a whole section of the questionnaire)
- Ambiguity (responses may not be legible or may be unclear)
- Inconsistencies (preliminary check of obvious inconsistencies)
- Lack of Cooperation (e.g. a respondent who always checks the same response category in every item of a Likert scale)
- Ineligible Respondent

Preparing the Data for Analysis (Contd.)
Treatment of unsatisfactory responses
- Return the questionnaire to the field in order to be completed
- Assign missing values
- Discard unsatisfactory respondents

Preparing the Data for Analysis (Contd.)
Coding
- Coding closed-ended questions involves specifying how the responses are to be entered
- Open-ended questions are difficult to code: a lengthy list of possible responses is generated
Coding and Transcribing Different Types of Variables
- Categorical data on sex, age, income, etc.
- Categorical data with the possibility to choose multiple categories
- Rank-ordered data

Preparing the Data for Analysis (Contd.)
Data cleaning
Consistency checks
- Out-of-range values
- Extreme values
- Logical inconsistencies
Treatment of missing responses
- Casewise deletion
- Pairwise deletion (different sample for each calculation)
- Substitute a neutral value
- Substitute an imputed response (the researcher attempts to infer from the available data the responses the individuals would have given if they had answered)
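The first three treatments of missing responses can be contrasted on a tiny data set. Everything below is hypothetical: the three respondents, the question names q1 to q3, the 1-5 scale and the choice of 3 as the neutral value are assumptions made for the illustration.

```python
# None marks a missing response on a hypothetical 1-5 scale
rows = [
    {"q1": 4, "q2": 5, "q3": None},
    {"q1": 2, "q2": None, "q3": 3},
    {"q1": 5, "q2": 4, "q3": 4},
]


def casewise_deletion(rows):
    """Keep only respondents with no missing answer at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_cases(rows, a, b):
    """Keep respondents who answered both a and b (sample differs per calculation)."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]


def substitute_neutral(rows, neutral=3):
    """Replace each missing answer with a neutral value, e.g. the scale midpoint."""
    return [{k: (neutral if v is None else v) for k, v in r.items()} for r in rows]


print(len(casewise_deletion(rows)))          # respondents left after casewise deletion
print(pairwise_cases(rows, "q1", "q2"))      # cases usable for a q1-q2 calculation
print(pairwise_cases(rows, "q1", "q3"))      # a different subset for q1-q3
print(substitute_neutral(rows)[0])
```

Casewise deletion keeps the fewest respondents, while pairwise deletion keeps more data but bases each calculation on a different subset.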

Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Weighting
- Each response is assigned a number according to a prespecified rule
- Makes sample data more representative of the target population on specific characteristics
- Modifies the number of cases in the sample that possess certain characteristics
- Adjusts the sample so that greater importance is attached to respondents with certain characteristics

Weighting
An example: Population (N=10 000), Sample (n=100)
Education: weight in population, weight in sample, weight
- Primary: 50%, 40%, 1.25
- Secondary: 30%, 20%, 1.5
- Higher: 20%, 40%, [weight missing in the source]
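The weights in this example follow from dividing each group's share in the population by its share in the sample (50/40 = 1.25, 30/20 = 1.5). The sketch below applies the same rule to all three groups, using the percentages from the slide; the variable names are assumptions.

```python
# shares in percent, taken from the weighting example
population_share = {"Primary": 50, "Secondary": 30, "Higher": 20}
sample_share = {"Primary": 40, "Secondary": 20, "Higher": 40}

# weight = share in population / share in sample
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# with n = 100 the sample shares are also the respondent counts per group
weighted_counts = {g: weights[g] * sample_share[g] for g in weights}
print(weights)
print(weighted_counts)
```

After weighting, the 100 sample cases are distributed 50/30/20, matching the population shares. Over-represented groups receive a weight below 1.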

Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Variable Respecification
- Existing data is modified to create new variables that are consistent with study objectives
- Recoding of a variable
- Large number of variables collapsed into fewer variables
- One categorical variable with d categories transformed into d-1 dummy variables

Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Scale Transformation
- Scale values are manipulated to ensure comparability with other scales
- Standardization allows the researcher to compare variables that have been measured using different types of scales
- Variables are forced to have a mean of zero and a standard deviation of one
- Can be done only on interval or ratio scaled data
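Both adjustments can be sketched briefly. The category labels, the example scores and the choice of the last category as the reference are assumptions for this illustration; `standardize` uses the sample standard deviation, which is one of two common conventions.

```python
from statistics import mean, stdev


def dummy_code(values, categories):
    """d categories -> d-1 dummy variables; the last category is the reference."""
    kept = categories[:-1]
    return [[1 if v == c else 0 for c in kept] for v in values]


def standardize(xs):
    """Rescale to mean 0 and standard deviation 1: z = (x - mean) / sd."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


education = ["Primary", "Higher", "Secondary"]             # hypothetical answers
dummies = dummy_code(education, ["Primary", "Secondary", "Higher"])

scores = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]                    # hypothetical interval data
z_scores = standardize(scores)
print(dummies)
print([round(z, 2) for z in z_scores])
```

A respondent in the reference category ("Higher" here) has zeros on both dummies, which is why d-1 variables are enough to identify d categories.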
