Does Sample Size Still Matter?


David Bakken and Megan Bond, KJT Group

Introduction

The survey has been an important tool in academic, governmental, and commercial research since the 1930s. Because in most cases the intent of a survey is to measure or estimate the value that a variable takes on in some population of interest, the development of sampling science has been integral to the advancement of survey research. While it may be possible to conduct a census of a small, easily accessed population, in most cases observing or measuring a sample of the population's members is necessary for reasons of cost, timing, and practicality.

Most of our understanding of sampling theory and method is based on probability sampling. A probability sample is one in which all members of the population of interest have a known probability of being included in the sample. The most basic form of probability sampling is the simple random sample (SRS) without replacement, in which each population member or unit has an equal probability of being selected for the sample (with that probability being equal to 1/N, where N is the size of the population).

The importance of probability sampling becomes apparent when we want to make statements about the degree of difference between the value of a parameter (such as a mean, a proportion, or a regression coefficient) observed in the sample and the true population value of that parameter. Probability sampling allows us to estimate the error attributable to looking at a sample rather than the entire population. The math of probability sampling (based on the number of possible permutations, such as the number of ways you can roll a seven with a pair of dice) is such that if we took an infinitely large number of samples of a given size and measured a parameter for each sample, such as the mean of a variable, the distribution of these sample means (a.k.a. the sampling distribution of the mean) would be normal, and the mean of this distribution would equal the population mean. Furthermore, we can calculate a margin of error around our sample mean based on this sampling distribution of means.

It turns out that the margin of error for a sample estimate is related to the size of the sample: larger probability samples, all other things being equal, have smaller sampling errors. If we were to compare the sampling distributions of means based on SRS samples of 1,000 and of 100, we would expect to find greater variability in the means based on samples of size 100. In other words, larger samples lead to more precise estimates of the parameter under study. This property has guided the design of survey samples. Most market researchers understand the relationship between population size, sample size, and precision (or margin of error), and they may apply relatively simple formulas to determine the sample size needed to achieve a specific level of precision.
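The relationship between sample size and the spread of the sampling distribution can be illustrated with a small simulation. This is a sketch, not part of the original study; the population values are invented for illustration:

```python
import random
import statistics

random.seed(42)

# Invented "population": 100,000 values with mean 50 and SD 10.
population = [random.gauss(mu=50, sigma=10) for _ in range(100_000)]

def sd_of_sample_means(n, replicates=2_000):
    """Draw many simple random samples of size n and return the
    standard deviation of the resulting sample means (an empirical
    estimate of the standard error)."""
    means = [statistics.mean(random.sample(population, n))
             for _ in range(replicates)]
    return statistics.stdev(means)

# Larger samples yield a tighter sampling distribution: the standard
# error shrinks roughly as 1/sqrt(n).
print(sd_of_sample_means(100))    # around 1.0
print(sd_of_sample_means(1_000))  # around 0.32
```

Note that the empirical standard errors track the theoretical value sigma/sqrt(n), which is why a ten-fold increase in sample size cuts the sampling error only by a factor of about three.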

Different areas of research practice have different standards or expectations for survey sampling error. Opinion polls conducted to forecast the outcome of an election may be designed for a margin of error of around 3% at a stated likelihood (usually 95%), by which we mean that if we repeated the poll 100 times with the same size probability sample, we would expect the estimated vote share to fall within three percentage points on either side of the sample estimate in 95 of those samples. For commercial purposes, the desired precision or margin of error is more likely to be a function of the cost of making a bad bet on some future outcome (known as the "loss function") and the magnitude of a meaningful difference in the real world. For example, a small difference in market share may represent a significant increase in revenue for one company but mere accounting noise for another, and each company will have different requirements for precision in order to make the right bet on a particular action.

Precision comes with a cost, however, and as Figure 1 illustrates, the relationship between precision and sample size is non-linear. Reducing the margin of error at 95% confidence from 3% to 2% requires more than doubling the sample size; reducing it from 3% to 1% requires roughly a nine-fold increase. For that reason, researchers must find the appropriate trade-off between cost and precision for a particular survey problem.

We should mention two other considerations with respect to precision. When estimating proportions, the formula for the margin of error at a given sample size is:

ME = z × √( p(1 − p) / n )

where p is the expected proportion, n is the sample size, and z is the value corresponding to the desired confidence level (1.96 for 95%). The margin of error for a given sample size is greatest when the proportion is exactly 50%, so if we have a prior belief that the population proportion of interest differs from 50%, we may be able to achieve a specified level of precision with a smaller sample. However, in the absence of such a prior belief, 50% is the most conservative estimate, and many people use it as a default. Similarly, the degree of variability in the population affects precision; if we have prior beliefs about the degree of homogeneity or heterogeneity in the population, we may be able to achieve the precision required for our decision-making needs with a smaller sample.

Despite the well-known math of probability sampling, market researchers often fail to conduct studies with samples large enough (based on sampling theory) to support their conclusions. Many researchers develop heuristics to simplify decisions about sample size. For example, psychology graduate students of a certain era were taught that a "small" sample (in particular for a randomized control-group experiment) was 30, because that was the point at which one could switch from Student's t to a z-test to compare means. Market researchers have similar rules of thumb for determining the minimum number of elements from a population subgroup or segment to include in a sample. These rules of thumb are often intuitive rather than empirically based.
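The sample sizes implied by the margin-of-error formula can be checked directly. This is a sketch using the conventional values z = 1.96 (95% confidence) and the conservative default p = 0.5:

```python
import math

def margin_of_error(p, n, z=1.96):
    """ME = z * sqrt(p * (1 - p) / n) for a proportion p and sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(me, p=0.5, z=1.96):
    """Smallest sample size that achieves margin of error `me` (rounded up)."""
    return math.ceil(z**2 * p * (1 - p) / me**2)

for me in (0.03, 0.02, 0.01):
    print(f"ME = {me:.0%}: n = {required_n(me)}")
# ME = 3%: n = 1068
# ME = 2%: n = 2401
# ME = 1%: n = 9604
```

Because n grows with 1/ME², halving the margin of error quadruples the required sample, which is the non-linearity Figure 1 depicts.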

The Shrinking Market Research Survey Sample

Market researchers face a number of challenges in designing and implementing sampling schemes for survey research. Unlike public opinion polling, where the target population may be more or less the same from one poll to another, market research surveys serve a wide variety of information objectives, and last week's survey may have targeted a completely different population from this week's. The advent of online research, and in particular online panels, promised to make very large samples affordable. Alas, while online panels have driven down the cost per interview (CPI), small samples (with perhaps fewer than 100 respondents) have become commonplace. Reasons include the targeting of niche and otherwise low-incidence segments and declining response rates.

Faced with the need to help marketers make reasonable business decisions using survey data obtained from relatively small samples, we set out to investigate the relationship between sample size, the variability of parameter estimates based on those sample sizes, and the implications for managerial decision-making. We could, of course, calculate sampling errors for our different sample sizes and let it go at that. In fact, the frequentist approach, based on the long-term frequency with which a parameter estimate occurs (such as the sampling distribution of the mean), stops at that point. However, this approach assumes that we are completely ignorant about the true population parameter value (even if we have measured it previously).

Our research was inspired in part by the story of Jean Baptiste Eugène Estienne, a French Army general who devised a method using Bayes' theorem that enabled assessment of the overall quality of a batch of 20,000 artillery shells through destructive testing of no more than 20 shells. At the outset of World War I, Germany seized much of France's manufacturing capability, making the existing ammunition stores that much more precious. Applying the standard frequentist approach (calculating a sample size based on an acceptable margin of error around some criterion, such as 10% of all shells) would have required the destruction of a few hundred shells. Estienne's method relied on updating the probability that the batch overall was defective (i.e., contained 10% or more bad shells) with each successive detonation.

Thomas Bayes was an 18th-century English clergyman and amateur mathematician who proposed a rule for accounting for uncertainty. Bayes' theorem, as it is known, was described in a paper published posthumously by the Royal Society in 1763, and it is the foundation of Bayesian statistical inference. In Bayesian statistics, probabilities reflect a belief about the sample of data under study rather than the frequency of events across hypothetical samples. In effect, the Bayesian statistician asks, "Given the data I have in hand, what is the probability of any specific hypothesis about the population parameter value?" The frequentist, in contrast, asks, "How probable are the data, given my hypothesis?" In effect, the frequentist approach decides whether to accept the data as real.

With respect to small samples, we speculated that a Bayesian approach to inference would provide a means of accounting for uncertainty that gives managers a better understanding of the probability of the sample data with respect to a specific decision. In this approach, we take the data as given and then calculate the probability of different possible true values. This requires a shift in thinking about the marketer's decision problem. Suppose that a company is planning to launch a new product and wants to determine the potential adoption rate at a few different price points. Imagine that the company conducts a survey employing a simple direct elicitation of willingness to pay, such as the Gabor-Granger method. Further imagine that the results indicate that 15% of the target market say they will definitely purchase the product at a price of $15 or less. The company has determined that it needs to achieve at least 20% market adoption at a price of $15 in order to move ahead with the launch. The standard frequentist approach is not much help in this case: if the survey sample is relatively small, the 20% threshold is likely to fall within the margin of error; if the sample is large, the resulting increase in precision will shrink the confidence interval around the 15% estimate such that the 20% threshold looks extremely unlikely.

We can use Bayes' theorem to reduce the uncertainty. Bayes' theorem exploits the fact that the joint probability of two events, A and B, can be written as the product of the probability of one event and the conditional probability of the second event given the first. While there are different ways to express the theorem, here is a simple representation:

P(H | data) = xy / (xy + z(1 − x))

We wish to estimate the probability of our hypothesis H (for example, that the adoption rate will be 20%). The value x reflects our best guess about the likelihood of the hypothesis in the absence of any data (our prior probability belief), y is the probability of observing the data if the hypothesis is true, and z is the probability of observing the data if the hypothesis is not true.

Overview of Our Study

The overall objective of this study, as noted previously, was to assess the variability in parameter estimates for samples of different sizes. We followed the classic paradigm for evaluating parameter estimates under varying treatments or methods: we started with a population where the parameter values were known. In many such studies the population is synthetic; observations are generated by specifying the parameter values and then using Monte Carlo simulation to create one or more synthetic populations with those values. In our case, we started with a reasonably large sample of actual survey responses and, treating that sample as the population, drew multiple simple random samples of varying sizes (as described below). Using responses to a choice-based conjoint exercise embedded in an online survey of approximately 897 individuals, we created a series of samples of different sizes, applying different restrictions to reflect the ways in which both probability and convenience samples might be generated. The choice-based conjoint was a simple brand-and-price exercise that included four brands of LCD television and four price levels. We conducted two separate experiments, as described below.
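The treat-the-sample-as-a-population resampling scheme can be sketched as follows. This is hypothetical code, not the study's actual pipeline; `respondents` stands in for the 897 survey responses, and the model-estimation step is omitted:

```python
import random

random.seed(7)

# Stand-in for the 897 actual survey respondents (here just IDs).
respondents = list(range(897))

SAMPLE_SIZES = (25, 50, 75, 100, 150, 225, 450)
REPLICATES = 10

samples = {}
for n in SAMPLE_SIZES:
    # Ten simple random samples (without replacement) at each size.
    samples[n] = [random.sample(respondents, n) for _ in range(REPLICATES)]

total = sum(len(reps) for reps in samples.values())
print(total)  # 70 samples, as in Experiment 1
```

Each of the 70 sample replicates would then be fed to the model-estimation step, so that the spread of parameter estimates across replicates of the same size reveals the sampling variability at that size.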

Experiment 1: We drew ten random samples at each of the sizes 25, 50, 75, 100, 150, 225, and 450 from our population of 897 respondents, resulting in 70 individual samples. We estimated HB models for each sample (using Sawtooth Software's CBC/HB program).

Experiment 2: We repeated the method of Experiment 1 but altered the sampling strategy so that samples were more homogeneous. We used two different sets of restrictions to achieve this, one based on demographics and one based on an attitudinal measure in the original survey. We applied the same overall design, with ten samples at each of the sizes 25, 50, 75, and 100, resulting in a total of 40 samples based on the demographic restriction and 40 based on the attitudinal restriction.

Results

When using results from choice-based conjoint analysis for research-on-research, we usually employ the choice shares predicted by a market simulator (applying a logit transformation to generate purchase probabilities). This method is preferable to comparing samples on model-based parameters (e.g., regression coefficients) because, in the multinomial logit model that captures the likelihood of choosing an alternative given that alternative's attributes, each sample has a unique scaling parameter; transforming the model coefficients into predicted choice shares removes this difference between samples. In addition to comparing samples of different sizes with respect to the variance in predicted choice shares and the deviation from the true population value, we also looked at aggregate and individual (i.e., "hit rate") validation using holdout choice tasks.

Experiment 1

Figure 2 shows the average prediction variance across the 10 replicates at each sample size. There are two interesting patterns here. First, some brands have smaller prediction variance; these happen to be somewhat larger brands than the other two. Second, prediction variance shrinks as sample size increases, dropping roughly in half for samples of at least 100 compared to samples of 25.

Insert Figure 2 here.

Figure 3 compares aggregate holdout prediction errors for each of the sample replicates. Aggregate holdout prediction error is the difference between the shares predicted for each brand at the prices set in a holdout task (one not included in the modeling) and the actual choices that respondents made in that task. Larger errors reflect more noise in the parameters, and we see that these errors are both larger on average and more variable when the sample is small than when it is large.

Insert Figure 3 here.
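The logit transformation used to turn estimated utilities into predicted choice shares can be sketched as follows. This is a minimal illustration with made-up utilities, not the study's estimated values:

```python
import math

def choice_shares(utilities):
    """Multinomial-logit share of preference: exp(u_i) / sum_j exp(u_j)."""
    exps = [math.exp(u) for u in utilities]
    denom = sum(exps)
    return [e / denom for e in exps]

# Made-up total utilities for four brand/price alternatives.
utils = [1.2, 0.4, -0.3, 0.0]
shares = choice_shares(utils)
print([round(s, 3) for s in shares])
# Shares are positive and sum to 1, so they are directly comparable
# across samples in a way that raw coefficients are not.
```

Adding a constant to every utility leaves the shares unchanged, which is one reason shares are a more stable basis for comparison than the coefficients themselves.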

Figure 4 compares individual hit rates for each of the sample replicates. The hit rate is the proportion of times the predicted choice for a given respondent matches the actual choice that respondent made in the holdout task. With one notable exception (samples of 100), the average hit rates and the variability in hit rates are similar across sample sizes. This is probably a consequence of the HB method used to estimate the individual-level utilities, which "borrows" data from other respondents to derive an individual model for each respondent. It is possible that the hit rates for smaller samples reflect over-fitting, since there are fewer cases to borrow data from (which pulls the individual models toward the overall average), while with larger samples the individual parameter space is better represented, so the borrowed data are more probable.

Insert Figure 4 here.

The final indication of the potential error associated with sample size is the difference between the predicted choice shares based on each sample replicate and the overall population value (the modeled choice shares using the entire sample). Figure 5 shows these errors in predicted choice shares for just one of the brands. As with the other measures, individual sample prediction errors are larger for smaller samples, but when the predictions are averaged within each sample size, they come quite close to the actual population value.

Insert Figure 5 here.

Experiment 2

As we noted in the description of our second experiment, market research samples are often restricted in ways that might affect the variability, or heterogeneity, within the sample. All other things being equal, samples from more homogeneous populations should produce more consistent parameter estimates (as long as the population variability is related to the parameter of interest). We devised two constrained sampling approaches to yield samples that would be either demographically more similar (using age) or attitudinally more similar. Overall, as Figures 6 and 7 indicate, the patterns of variability in predicted choice shares in these constrained samples are similar to those in the unconstrained samples. Since our sample restrictions were arbitrary, and only two of many possible restrictions, it is possible that any resulting increase in homogeneity was either small or not relevant to the parameters of interest. It is also possible that the HB method attenuates the impact of increased homogeneity on the individual-level choice models.

Insert Figures 6 and 7 about here.
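The individual hit-rate validation described earlier amounts to a simple proportion of correct holdout predictions per respondent. The choices below are invented for illustration:

```python
def hit_rate(predicted, actual):
    """Proportion of holdout tasks where the predicted choice matches
    the respondent's actual choice."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and choice lists must align")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Invented holdout results for one respondent across four tasks;
# each entry is the index of the chosen alternative.
predicted_choices = [0, 2, 1, 3]
actual_choices = [0, 2, 2, 3]
print(hit_rate(predicted_choices, actual_choices))  # 0.75
```

Averaging this quantity over respondents within a sample replicate gives the per-replicate hit rates compared in Figure 4.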

Accounting for Uncertainty

Looking across these sample replicates, we want to know, for a given sample size, how likely we are to make a seriously wrong decision. We applied Bayes' theorem to estimate the uncertainty associated with samples of different sizes. Knowing that the population choice share for Toshiba at a particular price is roughly 19%, and that if the price is lowered by $100 the choice share doubles, we can calculate the uncertainty for each of the samples. Figure 8 compares the results of this calculation for samples of 25 and 100. We can see that we should have greater confidence in any one sample of 100 than in any one sample of 25.

Insert Figure 8 about here.

Conclusions

For us, our experiments indicate that sample size does still matter. Moreover, we now have greater confidence in drawing the line for minimum sample size at about 100 respondents, at least for studies involving relatively simple choice-based conjoint models estimated with a hierarchical Bayesian method. Regardless of the sample size, Bayes' theorem offers a way to quantify the uncertainty around population parameters. It requires that we alter our way of thinking about the data: rather than basing our inferences on long-term frequencies from hypothetical sample replicates, Bayes' theorem allows us to ground our estimates in the data at hand. We do not view Bayesian inference as a total replacement for frequentist methods of estimating sampling error. Instead, we see Bayes' theorem as an additional tool that can help managers make the best possible decisions, or bets, based on all the information available.
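As a footnote to the "Accounting for Uncertainty" calculation, the simple Bayes-rule representation given earlier, P(H | data) = xy / (xy + z(1 − x)), can be sketched directly. The inputs below are invented for illustration and are not the study's actual values:

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """P(H | data) = x*y / (x*y + z*(1 - x)), where x is the prior
    probability of H, y = P(data | H), and z = P(data | not H)."""
    x, y, z = prior, p_data_given_h, p_data_given_not_h
    return (x * y) / (x * y + z * (1 - x))

# Hypothetical inputs: a 50/50 prior that the true choice share meets
# the decision threshold, with the observed data four times as likely
# under H as under not-H.
print(posterior(0.5, 0.8, 0.2))  # 0.8
```

Repeating this calculation for each sample replicate, with y and z derived from the sampling distribution at each sample size, is what produces comparisons like Figure 8.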

Figures

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.