Does Sample Size Still Matter?

David Bakken and Megan Bond, KJT Group

Introduction

The survey has been an important tool in academic, governmental, and commercial research since the 1930s. Because in most cases the intent of a survey is to measure or estimate the value that a variable takes on in some population of interest, the development of sampling science has been integral to the advancement of survey research. While it may be possible to conduct a census of a small, easily accessed population, in most cases observing or measuring a sample of population members is necessary for reasons of cost, timing, and practicality.

Most of our understanding of sampling theory and method is based on probability sampling. A probability sample is one in which all members of the population of interest have a known probability of being included in the sample. The most basic form of probability sampling is the simple random sample (SRS) without replacement, in which each population member or unit has an equal probability of being selected for the sample (that probability being 1/N, where N is the size of the population).

The importance of probability sampling becomes apparent when we want to make statements about the degree of difference between the value of a parameter (such as a mean, a proportion, or a regression coefficient) observed in the sample and the true population value of that parameter. Probability sampling allows us to estimate the error attributable to looking at a sample rather than the entire population. The math of probability sampling (based on counting possible outcomes, such as the number of ways you can roll a seven with a pair of dice) is such that if we took an infinitely large number of samples of a given size and computed a parameter for each sample, such as the mean of a variable, the distribution of those sample means (the sampling distribution of the mean) would be approximately normal and its mean would equal the population mean. Furthermore, we can calculate a margin of error around our sample mean based on this sampling distribution.

The margin of error for a sample estimate is related to the size of the sample: larger probability samples, all other things being equal, have smaller sampling errors. If we compared the sampling distributions of means based on SRS samples of 1,000 and of 100, we would expect to find greater variability among the means based on samples of size 100. In other words, larger samples lead to more precise estimates of the parameter under study. This property has guided the design of survey samples. Most market researchers understand the relationship between population size, sample size, and precision (or margin of error), and they may apply relatively simple formulas to determine the sample size needed to achieve a specific level of precision.
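This relationship between sample size and the spread of the sampling distribution is easy to verify by simulation. The sketch below is illustrative only: it draws repeated simple random samples of two sizes from a synthetic, made-up population and compares the variability of the resulting sample means.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 10,000 purchase-intent scores (made up for illustration).
population = [random.gauss(5.5, 2.0) for _ in range(10_000)]

def sampling_distribution(pop, n, replicates=2_000):
    """Means of many simple random samples (without replacement) of size n."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(replicates)]

for n in (100, 1_000):
    means = sampling_distribution(population, n)
    print(f"n={n:5d}  mean of sample means={statistics.mean(means):.3f}  "
          f"SD of sample means={statistics.stdev(means):.3f}")

# The standard deviation of the sample means (the standard error) shrinks roughly
# in proportion to 1/sqrt(n), so the samples of 1,000 vary far less than the samples of 100.
```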

Different areas of research practice have different standards or expectations for survey sampling error. Opinion polls conducted to forecast the outcome of an election may be designed for a margin of error of around 3% at a stated likelihood (usually 95%), by which we mean that if we repeated the poll 100 times with the same size probability sample, we would expect the estimated vote share to fall within three percentage points on either side of the sample estimate in 95 of those samples. For commercial purposes, the desired precision or margin of error is more likely to be a function of the cost of making a bad bet on some future outcome (the "loss function") and the magnitude of a meaningful difference in the real world. For example, a small difference in market share may represent a significant increase in revenue for one company but mere accounting noise for another, and each company will have different requirements for precision in order to make the right bet on a particular action.

Precision comes with a cost, however, and as Figure 1 illustrates, the relationship between precision and sample size is non-linear. Reducing the margin of error at 95% confidence from 3% to 2% requires nearly doubling the sample size; reducing it from 3% to 1% requires a seven-fold increase in sample size. For that reason, researchers must find the appropriate trade-off between cost and precision for a particular survey problem.

We should mention two other considerations with respect to precision. When estimating a proportion, the margin of error for a given sample size is

$ME = z\sqrt{p(1-p)/n}$

where p is the expected proportion, n is the sample size, and z is the critical value for the stated confidence level. The margin of error for a given sample size is greatest when the proportion is exactly 50%. If we have a prior belief that the population proportion of interest is less than 50%, we may be able to achieve a specified level of precision with a smaller sample. In the absence of that prior belief, however, 50% is the most conservative estimate, and many people use it as a default. Similarly, the degree of variability in the population affects precision, and if we have prior beliefs about the degree of homogeneity or heterogeneity in the population, we may be able to achieve the precision required for our decision-making needs with a smaller sample.

Despite the well-known math of probability sampling, market researchers often fail to conduct studies with samples that are large enough (based on sampling theory) to support their conclusions. Many researchers develop heuristics to simplify decisions about sample size. For example, psychology graduate students of a certain era were taught that a small sample (in particular for a randomized control-group experiment) was 30, because that was the point at which one could switch from Student's t to a z-test to compare means. Market researchers have similar rules of thumb for determining the minimum number of elements from a population subgroup or segment to include in a sample. These rules of thumb are often intuitive rather than empirically based.
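A minimal sketch of that margin-of-error arithmetic, assuming a very large (effectively infinite) population, no finite-population correction, and the conservative p = 0.5 default discussed above:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample, large population."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(me, p=0.5, z=1.96):
    """Sample size needed to achieve margin of error `me` under the same assumptions."""
    return math.ceil((z ** 2) * p * (1 - p) / me ** 2)

for target in (0.03, 0.02, 0.01):
    print(f"margin of error {target:.0%}: n >= {required_n(target)}")

# Tightening the margin of error from 3% to 2% more than doubles the required sample
# under these assumptions; a p further from 0.5 would reduce the required n.
```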

The Shrinking Market Research Survey Sample

Market researchers face a number of challenges in designing and implementing sampling schemes for survey research. Unlike public opinion polling, where the target population may be more or less the same from one poll to another, market research surveys serve a wide variety of information objectives, and last week's survey may have targeted a completely different population from this week's. The advent of online research, and in particular online panels, promised to make very large samples affordable. Alas, while online panels have driven down cost per interview, small samples (with perhaps fewer than 100 respondents) have become commonplace. Reasons include the targeting of niche and otherwise low-incidence segments and declining response rates.

Faced with the need to help marketers make reasonable business decisions using survey data obtained from relatively small samples, we set out to investigate the relationship between sample size, the variability of parameter estimates based on those sample sizes, and the implications for managerial decision-making. We could, of course, calculate sampling errors for our different sample sizes and let it go at that. In fact, the frequentist approach, based on the long-run frequency with which a parameter estimate occurs (such as the sampling distribution of the mean), stops at this point. However, this approach assumes that we are completely ignorant about the true population parameter value (even if we have measured it previously).

Our research was inspired in part by the story of Jean Baptiste Eugène Estienne, a French Army general who devised a method using Bayes' theorem that enabled assessment of the overall quality of a batch of 20,000 artillery shells by destructive testing of no more than 20 shells. At the outset of World War I, Germany seized much of France's manufacturing capability, making the existing ammunition stores that much more precious. Applying the standard frequentist approach (calculating a sample size based on an acceptable margin of error around some criterion, such as 10% of all shells) would have required the destruction of a few hundred shells. Estienne's method relied on updating the probability that a batch overall was defective (i.e., contained 10% or more bad shells) with each successive detonation.

Thomas Bayes was an 18th-century English clergyman and amateur mathematician who proposed a rule for accounting for uncertainty. Bayes' theorem, as it is known, was described in a paper published posthumously in 1763 by the Royal Society. This theorem is the foundation of Bayesian statistical inference. In Bayesian statistics, probabilities reflect a belief about the sample of data under study rather than the frequency of events across hypothetical samples. In effect, the Bayesian statistician asks, "Given the data I have in hand, what is the probability of any specific hypothesis about the population parameter value?" In contrast, the frequentist asks how probable the data are, given the hypothesis. In effect, the frequentist approach decides whether to accept the data as real.

With respect to small samples, we speculated that a Bayesian approach to inference would provide a means to account for uncertainty in a way that gives managers a better understanding of the probability of the sample data with respect to a specific decision. In this approach, we take the data as given and then calculate the probability of different possible true values.
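To make the artillery-shell story concrete, here is a minimal sketch of that kind of sequential Bayesian updating. The prior, the defect criterion, and the per-shell failure rates are assumptions chosen for illustration; they are not Estienne's actual figures.

```python
def update(prior, likelihood_if_bad, likelihood_if_good):
    """One Bayes' theorem update: posterior probability that the batch is bad."""
    numerator = prior * likelihood_if_bad
    return numerator / (numerator + (1 - prior) * likelihood_if_good)

# Assumed (illustrative) values:
P_FAIL_IF_BAD = 0.15   # chance a tested shell fails if the batch is bad (>=10% defective)
P_FAIL_IF_GOOD = 0.02  # chance a tested shell fails if the batch is acceptable
prior = 0.50           # initial belief that the batch is bad

# Hypothetical outcomes for the first ten tested shells (True = shell failed).
results = [False, False, True, False, False, False, True, False, False, False]

for i, failed in enumerate(results, start=1):
    if failed:
        prior = update(prior, P_FAIL_IF_BAD, P_FAIL_IF_GOOD)
    else:
        prior = update(prior, 1 - P_FAIL_IF_BAD, 1 - P_FAIL_IF_GOOD)
    print(f"after shell {i:2d} ({'fail' if failed else 'ok'}): P(batch bad) = {prior:.3f}")

# Each detonation shifts the belief about the batch; testing can stop as soon as the
# posterior is decisively high or low, rather than after a fixed frequentist sample size.
```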
This requires a shift in thinking about the marketer's decision problem. Suppose that a company is planning to launch a new product and wants to determine the potential adoption rate at a few different price points.

Imagine that the company conducts a survey employing a simple direct elicitation of willingness to pay, such as the Gabor-Granger method. Further imagine that the results indicate that 15% of the target market says they will definitely purchase the product at a price of $15 or less. The company has determined that it needs to achieve at least 20% market adoption at a price of $15 in order to move ahead with the launch. The standard frequentist approach is not much help in this case. If the survey sample is relatively small, the 20% threshold is likely to fall within the margin of error; if the sample is large, the resulting increase in precision will shrink the confidence interval around the 15% estimate such that the 20% threshold looks extremely unlikely.

We can use Bayes' theorem to reduce the uncertainty. Bayes' theorem exploits the fact that the joint probability of two events, A and B, can be written as the product of the probability of one event and the conditional probability of the second event given the first. While there are different ways to express the theorem, here is a simple representation:

$\mathrm{Prob}(H \mid \mathrm{data}) = \dfrac{xy}{xy + z(1 - x)}$

We wish to estimate the probability of our hypothesis H (for example, that the adoption rate will be 20%) given the data. The value x reflects our best guess about the likelihood of the hypothesis in the absence of any data (our prior probability belief), y is the probability of observing the data if the hypothesis is true, and z is the probability of observing the data if the hypothesis is not true.

Overview of Our Study

The overall objective of this study, as noted previously, was to assess the variability in parameter estimates for samples of different sizes. We followed the classic paradigm for evaluating parameter estimates under varying treatments or methods: we started with a population where the parameter values were known. In many studies such a population is synthetic; the observations are generated by specifying the parameter values and then using Monte Carlo simulation to create one or more synthetic populations with those values. In our case, we started with a reasonably large sample of actual survey responses and, treating that sample as the population, drew multiple simple random samples of varying size (as described below).

Using responses to a choice-based conjoint exercise embedded in an online survey of 897 individuals, we created a series of samples of different sizes using different restrictions to reflect the ways in which both probability and convenience samples might be generated. The choice-based conjoint was a simple brand-and-price exercise that included four brands of LCD television and four price levels. We conducted two separate experiments, as described below.
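Before turning to the two experiments, here is a minimal sketch of the Bayes' theorem calculation shown above, applied to the hypothetical adoption-rate decision. All three input probabilities are illustrative assumptions, not values from the study.

```python
def posterior(x, y, z):
    """Bayes' theorem as written above: x = prior P(H), y = P(data | H), z = P(data | not H)."""
    return (x * y) / (x * y + z * (1 - x))

# Hypothesis H: true adoption at $15 reaches the 20% launch threshold.
x = 0.30  # assumed prior belief that adoption reaches 20%, before seeing the survey
y = 0.15  # assumed probability of observing a 15% sample estimate if H is true
z = 0.40  # assumed probability of observing that estimate if H is false

print(f"P(adoption >= 20% | survey data) = {posterior(x, y, z):.2f}")

# With these inputs the posterior is about 0.14: the survey moves the decision-maker
# away from the 20% hypothesis, and quantifies by how much, rather than simply
# rejecting or failing to reject it.
```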

Experiment 1: We drew ten random samples at each of seven sizes (25, 50, 75, 100, 150, 225, and 450) from our population of 897 respondents, resulting in 70 individual samples. We estimated hierarchical Bayes (HB) models for each sample, using Sawtooth Software's CBC-HB program.

Experiment 2: We repeated the method of Experiment 1 but altered the sampling strategy so that samples were more homogeneous. We used two different sets of restrictions to achieve this, one based on demographics and one based on an attitudinal measure in the original survey. We applied the same overall design, with ten samples at each of four sizes (25, 50, 75, and 100), resulting in a total of 40 samples based on the demographic restriction and 40 based on the attitudinal restriction.

Results

When using results from choice-based conjoint analysis for research-on-research, we usually employ choice shares predicted by a market simulator (applying a logit transformation to generate purchase probabilities). This method is preferable to comparing samples on model-based parameters (e.g., regression coefficients) because, in the multinomial logit model that captures the likelihood of choosing an alternative given the alternative's attributes, each sample has a unique scale parameter. Transforming the model coefficients into predicted choice shares removes this difference between samples. In addition to comparing samples of different sizes with respect to the variance in predicted choice shares and deviation from the true population value, we also looked at aggregate and individual (i.e., hit-rate) validation using holdout choice tasks.

Experiment 1

Figure 2 shows the average prediction variance across the ten replicates at each sample size. There are two interesting patterns here. First, some brands have smaller prediction variance; these happen to be somewhat larger brands than the other two. Second, prediction variance shrinks as sample size increases, dropping roughly in half for samples of at least 100 compared to samples of 25.

Insert Figure 2 here.

Figure 3 compares aggregate holdout prediction errors for each of the sample replicates. Aggregate holdout prediction error is the difference between the shares predicted for each brand at the prices set in a holdout task (one not included in the modeling) and the actual choices respondents made in that task. Larger errors reflect more noise in the parameters, and we see that these errors are both larger on average and more variable when the sample is small than when it is larger.

Insert Figure 3 here.
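As an aside on the share calculations above, here is a minimal sketch of the logit (share-of-preference) transformation used to convert estimated utilities into predicted choice shares. The brand names and utilities are made up for illustration, not taken from the study.

```python
import math

def choice_shares(utilities):
    """Multinomial logit transformation: convert alternative utilities into predicted choice shares."""
    exp_u = [math.exp(u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

# Hypothetical total utilities for four brand/price alternatives in one simulated scenario.
utilities = {"Brand A": 1.2, "Brand B": 0.8, "Brand C": 0.1, "Brand D": -0.4}

shares = choice_shares(list(utilities.values()))
for brand, share in zip(utilities, shares):
    print(f"{brand}: predicted share = {share:.1%}")

# Because predicted shares always sum to 1, they can be compared across samples even
# though each sample's raw utilities carry a different scale parameter.
```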

Figure 4 compares individual hit rates for each of the sample replicates. The hit rate is the proportion of times the predicted choice for a given respondent matches the actual choice that respondent made in the holdout task. With one notable exception (samples of 100), the average hit rates and the variability in hit rates are similar across different sample sizes. This is probably a consequence of the HB method used to estimate the individual-level utilities, which borrows data from other respondents to derive an individual model for each respondent. It is possible that the hit rates for smaller samples reflect over-fitting, since there are fewer cases to borrow data from (which pulls the individual models toward the overall average), while with larger samples the individual parameter space is better represented, so the borrowed data is more probable.

Insert Figure 4 here.

The final indication of the potential error associated with sample size is the difference between the predicted choice shares based on each sample replicate and the overall population value (the modeled choice shares using the entire sample). Figure 5 shows these errors in predicted choice shares for just one of the brands. As with the other measures, individual sample prediction errors are larger for smaller samples, but when the predictions are averaged within each sample size, they are quite close to the actual population value.

Insert Figure 5 here.

Experiment 2

As we noted in the description of our second experiment, market research samples often are restricted in ways that might affect the variability, or heterogeneity, within the sample. All other things being equal, samples from more homogeneous populations should produce more consistent parameter estimates (as long as the population variability is related to the parameter of interest). We devised two constrained sampling approaches to yield samples that would be either demographically more similar (using age) or attitudinally more similar. Overall, as Figures 6 and 7 indicate, the patterns of variability in predicted choice shares in these constrained samples are similar to those in the unconstrained samples. Since our sample restrictions were arbitrary and only two of many possible restrictions, it is possible that any resulting increase in homogeneity was either small or not relevant to the parameters of interest. It is also possible that the HB method attenuates the impact of increased homogeneity on the individual-level choice models.

Insert Figures 6 and 7 about here.
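For completeness, here is a minimal sketch of the two holdout validation measures described above (hit rate and aggregate holdout prediction error), computed on hypothetical predicted and actual choices rather than the study data.

```python
from collections import Counter

def hit_rate(predicted, actual):
    """Share of respondents whose predicted holdout choice matches their actual choice."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

def aggregate_share_error(predicted, actual):
    """Mean absolute difference between predicted and actual choice shares across alternatives."""
    n = len(actual)
    pred_shares = Counter(predicted)
    act_shares = Counter(actual)
    alternatives = set(pred_shares) | set(act_shares)
    return sum(abs(pred_shares[a] / n - act_shares[a] / n) for a in alternatives) / len(alternatives)

# Hypothetical holdout-task choices (alternative chosen by each of eight respondents).
predicted = ["A", "B", "A", "C", "B", "A", "D", "B"]
actual    = ["A", "B", "C", "C", "A", "A", "D", "B"]

print(f"hit rate: {hit_rate(predicted, actual):.0%}")
print(f"aggregate share error: {aggregate_share_error(predicted, actual):.3f}")
```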

Accounting for Uncertainty

Looking across these sample replicates, we want to know, for a given sample size, how likely we are to make a seriously wrong decision. We applied Bayes' theorem to estimate the uncertainty associated with samples of different sizes. Knowing that the population choice share for Toshiba at a particular price is roughly 19%, and that if the price is lowered by $100 the choice share doubles, we can calculate the uncertainty for each of the samples. Figure 8 compares the results of this calculation for samples of 25 and of 100. We can see that we should have greater confidence in any one sample of 100 than in any one sample of 25.

Insert Figure 8 about here.

Conclusions

For us, these experiments indicate that sample size does still matter. Moreover, we now have greater confidence in drawing the line for minimum sample size at about 100 respondents, at least for studies involving relatively simple choice-based conjoint models estimated with a hierarchical Bayesian method. Regardless of the sample size, Bayes' theorem offers a way to quantify the uncertainty around population parameters. It requires that we alter our way of thinking about the data: rather than base our inferences on long-run frequencies from hypothetical sample replicates, Bayes' theorem allows us to ground our estimates in the data at hand. We do not view Bayesian inference as a total replacement for frequentist methods of estimating sampling error. Instead, we see Bayes' theorem as an additional tool that can help managers make the best possible decisions, or bets, based on all the information available.

Figures

Figures 1 through 8 are referenced in the text above but are not reproduced in this transcription.
