# Does Sample Size Still Matter?

David Bakken and Megan Bond, KJT Group

## Introduction

The survey has been an important tool in academic, governmental, and commercial research since the 1930s. Because in most cases the intent of a survey is to measure or estimate the value that a variable takes on in some population of interest, the development of sampling science has been integral to the advancement of survey research. While it may be possible to conduct a census among a small, easily accessed population, in most cases observing or measuring a sample of members of the population is necessary for reasons of cost, timing, and practicality.

Most of our understanding of sampling theory and method is based on probability sampling. A probability sample is one in which all members of the population of interest have a known probability of being included in the sample. The most basic form of probability sampling is the simple random sample (SRS) without replacement. With SRS without replacement, each population member or unit has an equal probability of being selected for the sample (that probability being n/N, where n is the size of the sample and N is the size of the population).

The importance of probability sampling becomes apparent when we want to make statements about the degree of difference between the value of a parameter (such as a mean, a proportion, or a regression coefficient) observed in the sample and the true population value of that parameter. Probability sampling allows us to estimate the error attributable to looking at a sample rather than the entire population. The math of probability sampling (based on the number of possible permutations, such as the number of ways that you can get a result of seven by rolling a pair of dice) is such that if we took a very large number of samples of a given size and measured a parameter for each sample, such as the mean of a variable, the distribution of these sample means (a.k.a. the sampling distribution of the mean) would be approximately normal, and the mean of this distribution would equal the population mean. Furthermore, we can calculate a margin of error around our sample mean based on this sampling distribution of means.

It turns out that the margin of error for a sample estimate is related to the size of the sample: larger probability samples, all other things being equal, will have smaller sampling errors. If we were to compare the sampling distributions of means based on SRS samples of 1,000 and 100, we would expect to find greater variability in the means based on samples of size 100. In other words, larger samples lead to more precise estimates of the parameter under study. This property has guided the design of survey samples. Most market researchers understand the relationship between population size, sample size, and precision (or margin of error), and they may apply relatively simple formulas to determine the sample size needed to achieve a specific level of precision.
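The relationship between sample size and the spread of the sampling distribution can be checked with a small simulation. The sketch below is illustrative only: the synthetic population (100,000 draws from a skewed exponential distribution), the number of replicates, and the function name `sample_means` are assumptions made for the example and are not taken from the paper.

```python
import random
import statistics


def sample_means(population, n, replicates, rng):
    """Means of `replicates` simple random samples (without replacement) of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(replicates)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: deliberately skewed, to show that the sample
# means still centre on the population mean.
population = [rng.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)

means_100 = sample_means(population, 100, 500, rng)
means_1000 = sample_means(population, 1000, 500, rng)

sd_100 = statistics.stdev(means_100)
sd_1000 = statistics.stdev(means_1000)

print(f"population mean: {pop_mean:.3f}")
print(f"n=100:  mean of sample means {statistics.fmean(means_100):.3f}, sd {sd_100:.3f}")
print(f"n=1000: mean of sample means {statistics.fmean(means_1000):.3f}, sd {sd_1000:.3f}")
```

Both sets of sample means should centre on the population mean, while the standard deviation of the means from samples of 100 should be roughly three times that from samples of 1,000, since the standard error shrinks with the square root of the sample size.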

Different areas of research practice have different standards or expectations for survey sampling error. Opinion polls conducted to forecast the outcome of an election may be designed for a margin of error of around 3% at a stated confidence level (usually 95%), by which we mean that if we repeated the poll 100 times with the same size probability sample, we would expect the interval of three percentage points on either side of the sample estimate to contain the true population value in about 95 of those samples. For commercial purposes, the desired precision or margin of error is more likely to be a function of the cost of making a bad bet on some future outcome (this is known as the "loss function") and the magnitude of a meaningful difference in the real world. For example, a small difference in market share may represent a significant increase in revenue for one company but mere accounting noise for another, and each company will have different requirements for precision in order to make the right bet on a particular action.

Precision comes with a cost, however, and as Figure 1 illustrates, the relationship between precision and sample size is non-linear. Because the margin of error shrinks with the square root of the sample size, reducing the margin of error at 95% confidence from 3% to 2% requires more than doubling the sample size; reducing it from 3% to 1% requires a roughly nine-fold increase. For that reason, researchers must find the appropriate trade-off between cost and precision for a particular survey problem.

We should mention two other considerations with respect to precision. When estimating proportions, the formula for calculating the margin of error for a specific sample size is:

$$\mathrm{ME} = z\sqrt{\frac{p(1 - p)}{n}}$$

where p is the expected proportion, n is the sample size, and z is the standard normal value for the chosen confidence level (about 1.96 for 95%). The margin of error for a given sample size is greatest when that proportion is exactly 50%. If we have a prior belief that the population proportion of interest is less than 50%, we may be able to achieve a specified level of precision with a smaller sample.
However, in the absence of that prior belief, 50% is the most conservative estimate, and many people use that value as a default. Similarly, the degree of variability in the population affects precision, and if we have prior beliefs about the degree of homogeneity or heterogeneity in the population, we may be able to achieve the precision required to satisfy our decision-making needs with a smaller sample.

Despite the well-known math of probability sampling, market researchers often fail to conduct studies with samples that are large enough (based on sampling theory) to support their conclusions. Many researchers develop heuristics to simplify decisions about sample size. For example, psychology graduate students of a certain era were taught that a "small" sample (in particular for a randomized control group experiment) was 30, because that was the point at which one could switch from Student's t-test to a z-test to compare means. Market researchers have similar rules of thumb for determining the minimum number of elements from a population subgroup or segment to include in a sample. These rules of thumb are often intuitive rather than empirically based.
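The margin-of-error formula for proportions discussed above, ME = z·sqrt(p(1 − p)/n), can be turned into a short calculator. This is a minimal sketch that assumes a large population (no finite population correction); the function names are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(n, p=0.5, confidence=0.95):
    """Margin of error for a proportion p estimated from a simple random sample of size n."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


def required_sample_size(me, p=0.5, confidence=0.95):
    """Smallest n whose margin of error does not exceed `me` (large-population assumption)."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / me ** 2)


for target in (0.03, 0.02, 0.01):
    print(f"ME {target:.0%}: n = {required_sample_size(target)}")

# A prior belief that the proportion is far from 50% lowers the requirement.
print("ME 3% with p = 0.15:", required_sample_size(0.03, p=0.15))
```

Because the required sample size under this formula scales with 1/ME², halving the margin of error quadruples the sample size, which is why precision becomes expensive quickly.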

[...] different price points. Imagine that the company conducts a survey, employing a simple direct elicitation of willingness to pay, such as the Gabor-Granger method. Further imagine that the results indicate that 15% of the target market says they will definitely purchase the product at a price of \$15 or less. The company has determined that they need to achieve at least 20% market adoption at a price of \$15 in order to move ahead with the launch. The standard frequentist approach is not much help in this case. If the survey sample is relatively small, the 20% threshold is likely to fall within the margin of error; if the sample is large, the resulting increase in precision will shrink the confidence interval around the 15% estimate such that the 20% threshold looks extremely unlikely.

We can use Bayes' theorem to reduce the uncertainty. Bayes' theorem exploits the fact that the joint probability of two events, A and B, can be written as the product of the probability of one event and the conditional probability of the second event, given the first event. While there are different ways to express the theorem, here is a simple representation:

$$\operatorname{Prob}(H) = \frac{xy}{xy + z(1 - x)}$$

We wish to estimate the probability of our hypothesis H given the data (for example, that the adoption rate will be 20%). The value x reflects our best guess about the likelihood of the hypothesis in the absence of any data (our prior probability belief), y is the probability of observing the data if the hypothesis is true, and z is the probability of observing the data if the hypothesis is not true.

## Overview of Our Study

The overall objective of this study, as noted previously, was to assess the variability in parameter estimates for samples of different sizes. We followed the classic paradigm for evaluating parameter estimates under varying treatments or methods. We started with a population where the parameter values were known.
In many studies such a population is synthetic; the observations are generated by specifying the parameter values and then using Monte Carlo simulation methods to create one or more synthetic populations with those parameter values. In our case, we started with a reasonably large sample of actual survey responses and, treating that sample as the population, drew multiple simple random samples of varying size (as described below).

Using responses to a choice-based conjoint exercise that was embedded in an online survey of 897 individuals, we created a series of samples of different sizes using different restrictions to reflect the ways in which both probability and convenience samples might be generated. The choice-based conjoint was a simple brand and price exercise that included four brands of LCD television and four price levels. We conducted two separate experiments, as described below.
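The resampling design described above, in which the 897 survey respondents are treated as the population and repeated simple random samples are drawn from it, can be sketched as follows. The sample sizes and the ten replicates per size are those reported for Experiment 1; the integer respondent IDs, the seed, and the function name `draw_replicates` are hypothetical stand-ins for the example.

```python
import random

POPULATION_SIZE = 897                            # respondents treated as the population
SAMPLE_SIZES = [25, 50, 75, 100, 150, 225, 450]  # sizes reported for Experiment 1
REPLICATES = 10                                  # samples drawn at each size


def draw_replicates(respondent_ids, sizes, replicates, seed=0):
    """Draw `replicates` simple random samples (without replacement) at each size."""
    rng = random.Random(seed)
    return {
        n: [rng.sample(respondent_ids, n) for _ in range(replicates)]
        for n in sizes
    }


respondent_ids = list(range(POPULATION_SIZE))    # hypothetical respondent identifiers
samples = draw_replicates(respondent_ids, SAMPLE_SIZES, REPLICATES)

total_samples = sum(len(reps) for reps in samples.values())
print("total samples drawn:", total_samples)
```

Each replicate would then be passed to whatever estimation routine is under study, so that the spread of the estimates across replicates can be compared between sample sizes.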

Experiment 1: We drew ten random samples at each of seven sizes (25, 50, 75, 100, 150, 225, and 450) from our population of 897 respondents, resulting in 70 individual samples. We estimated HB models for each sample (using Sawtooth Software's CBC-HB program).

Experiment 2: We repeated the method of Experiment 1 but altered the sampling strategy so that samples were more homogeneous. We used two different sets of restrictions to achieve this, one based on demographics and one based on an attitudinal measure in the original survey. We applied the same overall design, with ten samples at each of four sizes (25, 50, 75, and 100), resulting in a total of 40 samples based on the demographic restriction and 40 based on the attitudinal restriction.

## Results

When using results from choice-based conjoint analysis for research-on-research, we usually employ choice shares predicted by a market simulator (employing a logit transformation to generate purchase probabilities). This method is preferable to comparing different samples using model-based parameters (e.g., regression coefficients) because, in the multinomial logit model that captures the likelihood of choosing an alternative given the alternative's attributes, each sample has a unique scaling parameter. Transforming the model coefficients into predicted choice shares removes this difference between samples. In addition to comparing samples of different size with respect to the variance in predicted choice shares and deviation from the true population value, we also looked at aggregate and individual (i.e., "hit rate") validation using holdout choice tasks.

### Experiment 1

Figure 2 shows the average prediction variance across the 10 replicates at each sample size. There are two interesting patterns here. First, two of the four brands have smaller prediction variance; these happen to be somewhat larger brands than the other two.
The second pattern is that prediction variance shrinks as sample size increases, falling by roughly half once the sample size reaches 100, compared with samples of 25.

Insert Figure 2 here.

Figure 3 compares aggregate holdout prediction errors for each of the sample replicates. Aggregate holdout prediction error is the difference between the shares predicted for each brand at the prices set for a holdout task (one that is not included in the modeling) and the actual choices that respondents made in that task. Larger errors reflect more noise in the parameters, and we see that these errors are both larger on average and more variable when the sample is small than when it is large.

Insert Figure 3 here.
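The logit transformation used to turn utilities into predicted choice shares, and the comparison of those shares with holdout choices, can be sketched as follows. This is an illustration under stated assumptions, not the market simulator used in the study: the utilities and observed holdout shares are invented, the function names are hypothetical, and summarising the aggregate error as a mean absolute error across brands is an assumption (the paper only describes the error as a difference between predicted and actual shares).

```python
import math


def logit_shares(utilities):
    """Multinomial logit choice probabilities for one respondent's alternative utilities."""
    top = max(utilities)                       # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_shares(respondent_utilities):
    """Average the individual-level choice probabilities across respondents."""
    per_respondent = [logit_shares(u) for u in respondent_utilities]
    n_alternatives = len(per_respondent[0])
    return [
        sum(r[j] for r in per_respondent) / len(per_respondent)
        for j in range(n_alternatives)
    ]


def mean_absolute_error(predicted, actual):
    """Assumed summary of aggregate holdout error: mean absolute share difference."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical total utilities (brand plus price) for three respondents, four brands.
utilities = [
    [1.0, 0.5, 0.0, -0.5],
    [0.2, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
holdout_shares = [0.40, 0.30, 0.20, 0.10]      # hypothetical observed holdout shares

shares = predicted_shares(utilities)
print("predicted shares:", [round(s, 3) for s in shares])
print("aggregate holdout error:", round(mean_absolute_error(shares, holdout_shares), 3))
```

Because the shares always sum to one regardless of how the utilities are scaled, comparing shares rather than raw coefficients sidesteps the sample-specific scale parameter mentioned above.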

Figure 4 compares individual hit rates for each of the sample replicates. The hit rate is the proportion of times the predicted choice for a given respondent matches the actual choice the respondent made in that holdout task. With one notable exception (samples of 100), the average hit rates and the variability in hit rates are similar across different sample sizes. This is probably a consequence of the HB method used to estimate the individual-level utilities. This method borrows data from other respondents to derive individual models for each respondent. It is possible that the hit rates for smaller samples are the result of over-fitting, since there are fewer cases to borrow data from (which pulls the individual models in the direction of the overall average), while with larger samples the individual parameter space is better represented, so the borrowed data is more probable.

Insert Figure 4 here.

The final indication of the potential error associated with sample size is reflected in the differences between predicted choice shares based on each sample replicate and the overall population value (the modeled choice shares using the entire sample). Figure 5 shows these errors for predicted choice shares for just one of the brands. As with the other measures, individual sample prediction errors are larger for smaller samples, but when the samples are averaged (within sample size), the predictions are quite close to the actual population value.

Insert Figure 5 here.

### Experiment 2

As we noted in the description of our second experiment, market research samples often are restricted in ways that might affect the variability or heterogeneity within the sample. All other things being equal, samples from populations that are more homogeneous should produce more consistent parameter estimates (as long as the population variability is related to the parameter of interest).
We devised two constrained sampling approaches to yield samples that would be either demographically more similar (using age) or attitudinally more similar. Overall, as Figures 6 and 7 indicate, the patterns of variability in predicted choice shares in these constrained samples are similar to those in the unconstrained samples. Since our sample restrictions were arbitrary and only two of many possible sample restrictions, it is possible that any resulting increase in homogeneity was either small or not relevant to the parameters of interest. It is also possible that the HB method attenuates the impact of increased homogeneity on the individual-level choice models.

Insert Figures 6 and 7 about here.

## Accounting for Uncertainty

Looking across these sample replicates, we want to know, for a given sample size, how likely we are to make a seriously wrong decision. We applied Bayes' theorem to estimate the uncertainty associated with samples of different size. Knowing that the population choice share for Toshiba at a particular price is roughly 19% and that the choice share doubles if the price is lowered by \$100, we can calculate the uncertainty for each of the samples. Figure 8 compares the results of this calculation for samples of 25 and 100. We can see that we should have greater confidence in any one sample of 100 than in any one sample of 25.

Insert Figure 8 about here.

## Conclusions

For us, our experiments indicate that sample size does still matter. Moreover, we now have greater confidence in drawing the line for minimum sample size at about 100 respondents, at least for studies involving relatively simple choice-based conjoint models estimated using a hierarchical Bayesian method.

Regardless of the sample size, Bayes' theorem offers a way to quantify the uncertainty around population parameters. Bayes' theorem requires that we alter our way of thinking about the data. Rather than base our inferences on the long-term frequencies from hypothetical sample replicates, Bayes' theorem allows us to ground our estimates in the data at hand. We do not view Bayesian inference as a total replacement for frequentist methods of estimating sampling error. Instead, we see Bayes' theorem as an additional tool that can help managers make the best possible decisions or bets based on all the information we have available.
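As a closing illustration, the form of Bayes' theorem used in this paper, Prob(H) = xy / (xy + z(1 − x)), can be evaluated directly. The sketch below is not the calculation behind Figure 8. It assumes a simple two-hypothesis setup in which the likelihoods y and z come from a binomial model, and every number in it (the prior, the sample of 100, the 15 buyers, and the 10% alternative adoption rate) is hypothetical.

```python
import math


def posterior(x, y, z):
    """Posterior probability of H given the data.

    x: prior probability of the hypothesis
    y: probability of observing the data if the hypothesis is true
    z: probability of observing the data if the hypothesis is not true
    """
    return x * y / (x * y + z * (1 - x))


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical setup: H says the adoption rate is 20%; the single alternative
# considered says it is 10%. The survey finds 15 definite buyers among 100.
prior = 0.5
n_respondents, n_buyers = 100, 15
y = binomial_pmf(n_buyers, n_respondents, 0.20)   # likelihood of the data under H
z = binomial_pmf(n_buyers, n_respondents, 0.10)   # likelihood under the alternative

post = posterior(prior, y, z)
print(f"posterior probability of H: {post:.3f}")
```

Reducing "the hypothesis is not true" to a single alternative rate is a simplification; a fuller treatment would weigh a range of alternative rates. When y equals z the data do not discriminate between the hypotheses and the posterior equals the prior.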

## Figures

Figures 1 through 8 (images not reproduced).

### Sample Size Issues for Conjoint Analysis

Chapter 7 Sample Size Issues for Conjoint Analysis I m about to conduct a conjoint analysis study. How large a sample size do I need? What will be the margin of error of my estimates if I use a sample

### The HB. How Bayesian methods have changed the face of marketing research. Summer 2004

The HB How Bayesian methods have changed the face of marketing research. 20 Summer 2004 Reprinted with permission from Marketing Research, Summer 2004, published by the American Marketing Association.

### Trade-Off Study Sample Size: How Low Can We go?

Trade-Off Study Sample Size: How Low Can We go? The effect of sample size on model error is examined through several commercial data sets, using five trade-off techniques: ACA, ACA/HB, CVA, HB-Reg and

### A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling

A Procedure for Classifying New Respondents into Existing Segments Using Maximum Difference Scaling Background Bryan Orme and Rich Johnson, Sawtooth Software March, 2009 Market segmentation is pervasive

### Analyzing Portfolio Expected Loss

Analyzing Portfolio Expected Loss In this white paper we discuss the methodologies that Visible Equity employs in the calculation of portfolio expected loss. Portfolio expected loss calculations combine

### Survey Process White Paper Series The Six Steps in Conducting Quantitative Marketing Research

Survey Process White Paper Series The Six Steps in Conducting Quantitative Marketing Research POLARIS MARKETING RESEARCH, INC. 1455 LINCOLN PARKWAY, SUITE 320 ATLANTA, GEORGIA 30346 404.816.0353 www.polarismr.com

### An Introduction to Bayesian Statistics

An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA School of Public Health robweiss@ucla.edu April 2011 Robert Weiss (UCLA) An Introduction to Bayesian Statistics UCLA

### Determining Sample Size 1

Fact Sheet PEOD-6 November 1992 Determining Sample Size 1 Glenn D. Israel 2 Perhaps the most frequently asked question concerning sampling is, "What size sample do I need?" The answer to this question

### Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

### Sawtooth Software. Assessing the Monetary Value of Attribute Levels with Conjoint Analysis: Warnings and Suggestions RESEARCH PAPER SERIES

Sawtooth Software RESEARCH PAPER SERIES Assessing the Monetary Value of Attribute Levels with Conjoint Analysis: Warnings and Suggestions Bryan K. Orme, Sawtooth Software, Inc. 2001 Copyright 2001-2002,

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### Chapter 16 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 16 Multiple Choice Questions (The answers are provided after the last question.) 1. Which of the following symbols represents a population parameter? a. SD b. σ c. r d. 0 2. If you drew all possible

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Bayesian Statistical Analysis in Medical Research

Bayesian Statistical Analysis in Medical Research David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ams.ucsc.edu www.ams.ucsc.edu/ draper ROLE Steering

### DETERMINING SURVEY SAMPLE SIZE A SIMPLE PLAN

DETERMINING SURVEY SAMPLE SIZE A SIMPLE PLAN Prepared by Market Directions Market Directions B O S T O N 617-323-1862 800-475-9808 www.marketdirectionsmr.com info@marketdirectionsmr.com DETERMING SAMPLE

### Fundamental Probability and Statistics

Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

### MS&E 226: Small Data

MS&E 226: Small Data Lecture 16: Bayesian inference (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 35 Priors 2 / 35 Frequentist vs. Bayesian inference Frequentists treat the parameters as fixed (deterministic).

### Experimental data and survey data

Experimental data and survey data An experiment involves the collection of measurements or observations about populations that are treated or controlled by the experimenter. A survey is an examination

### Keep It Simple: Easy Ways To Estimate Choice Models For Single Consumers

Keep It Simple: Easy Ways To Estimate Choice Models For Single Consumers Christine Ebling, University of Technology Sydney, christine.ebling@uts.edu.au Bart Frischknecht, University of Technology Sydney,

### Appendix Methodology and Statistics

Appendix Methodology and Statistics Introduction Science and Engineering Indicators (SEI) contains data compiled from a variety of sources. This appendix explains the methodological and statistical criteria

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### Likelihood: Frequentist vs Bayesian Reasoning

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

### MAJOR FIELD TESTS Description of Test Reports

MAJOR FIELD TESTS Description of Test Reports NOTE: Some of the tests do not include Assessment Indicators and some tests do not include Subscores. Departmental Roster Includes scale scores for all students

### Sample Size Determination

Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

### When Does it Make Sense to Perform a Meta-Analysis?

CHAPTER 40 When Does it Make Sense to Perform a Meta-Analysis? Introduction Are the studies similar enough to combine? Can I combine studies with different designs? How many studies are enough to carry

### An Introduction to Sampling

An Introduction to Sampling Sampling is the process of selecting a subset of units from the population. We use sampling formulas to determine how many to select because it is based on the characteristics

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Posterior probability!

Posterior probability! P(x θ): old name direct probability It gives the probability of contingent events (i.e. observed data) for a given hypothesis (i.e. a model with known parameters θ) L(θ)=P(x θ):

### Checklists and Examples for Registering Statistical Analyses

Checklists and Examples for Registering Statistical Analyses For well-designed confirmatory research, all analysis decisions that could affect the confirmatory results should be planned and registered

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### 15.0 More Hypothesis Testing

15.0 More Hypothesis Testing 1 Answer Questions Type I and Type II Error Power Calculation Bayesian Hypothesis Testing 15.1 Type I and Type II Error In the philosophy of hypothesis testing, the null hypothesis

### Probability. a number between 0 and 1 that indicates how likely it is that a specific event or set of events will occur.

Probability Probability Simple experiment Sample space Sample point, or elementary event Event, or event class Mutually exclusive outcomes Independent events a number between 0 and 1 that indicates how

### Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm.

Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling

### Lecture 4 : Bayesian inference

Lecture 4 : Bayesian inference The Lecture dark 4 energy : Bayesian puzzle inference What is the Bayesian approach to statistics? How does it differ from the frequentist approach? Conditional probabilities,

### Bayesian data analysis: what it is and what it is not. Prof. Andrew Gelman Dept. of Statistics Columbia University

Bayesian data analysis: what it is and what it is not Prof. Andrew Gelman Dept. of Statistics Columbia University Talk for Columbia University Department of Computer Science, 15 Dec 2003 1 Themes Popular

### Hypothesis Testing. Chapter Introduction

Contents 9 Hypothesis Testing 553 9.1 Introduction............................ 553 9.2 Hypothesis Test for a Mean................... 557 9.2.1 Steps in Hypothesis Testing............... 557 9.2.2 Diagrammatic

### MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters

MAT140: Applied Statistical Methods Summary of Calculating Confidence Intervals and Sample Sizes for Estimating Parameters Inferences about a population parameter can be made using sample statistics for

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Marketing Mix Modelling and Big Data P. M Cain

1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

### Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

### Sampling Distributions and the Central Limit Theorem

135 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 10 Sampling Distributions and the Central Limit Theorem In the previous chapter we explained

### Statistical Inference

Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this

### Linear regression methods for large n and streaming data

Linear regression methods for large n and streaming data Large n and small or moderate p is a fairly simple problem. The sufficient statistic for β in OLS (and ridge) is: The concept of sufficiency is

### SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### KEYWORDS Monte Carlo simulation; profit shares; profit commission; reinsurance.

THE USE OF MONTE CARLO SIMULATION OF MORTALITY EXPERIENCE IN ESTIMATING THE COST OF PROFIT SHARING ARRANGEMENTS FOR GROUP LIFE POLICIES. By LJ Rossouw and P Temple ABSTRACT This paper expands previous

### R Simulations: Monty Hall problem

R Simulations: Monty Hall problem Monte Carlo Simulations Monty Hall Problem Statistical Analysis Simulation in R Exercise 1: A Gift Giving Puzzle Exercise 2: Gambling Problem R Simulations: Monty Hall

### Math 140: Introductory Statistics Instructor: Julio C. Herrera Exam 3 January 30, 2015

Name: Exam Score: Instructions: This exam covers the material from chapter 7 through 9. Please read each question carefully before you attempt to solve it. Remember that you have to show all of your work

### 9. Sampling Distributions

9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

### Lecture 3 : Hypothesis testing and model-fitting

Lecture 3 : Hypothesis testing and model-fitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and model-fitting

### POLI 300 Handout #2 N. R. Miller RANDOM SAMPLING. Key Definitions Pertaining to Sampling

POLI 300 Handout #2 N. R. Miller Key Definitions Pertaining to Sampling RANDOM SAMPLING 1. Population: the set of units (in survey research, usually individuals or households), N in number, that are to

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### The Margin of Error for Differences in Polls

The Margin of Error for Differences in Polls Charles H. Franklin University of Wisconsin, Madison October 27, 2002 (Revised, February 9, 2007) The margin of error for a poll is routinely reported. 1 But

### Valuation of transport time and reliability in freight transport

Summary TØI-report 1083/2010 Authors: Askill Harkjerr Halse, Hanne Samstad, Marit Killi and Stefan Flügel, Farideh Ramjerdi Oslo 2010, 177 pages in Norwegian Valuation of transport time and reliability

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS

LESSON SEVEN CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS An interval estimate for μ of the form a margin of error would provide the user with a measure of the uncertainty associated with the point estimate.

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Foundations of Statistics Frequentist and Bayesian

Mary Parker, http://www.austincc.edu/mparker/stat/nov04/ page 1 of 13 Foundations of Statistics Frequentist and Bayesian Statistics is the science of information gathering, especially when the information

### Sawtooth Software. Which Conjoint Method Should I Use? RESEARCH PAPER SERIES. Bryan K. Orme Sawtooth Software, Inc.

Sawtooth Software RESEARCH PAPER SERIES Which Conjoint Method Should I Use? Bryan K. Orme Sawtooth Software, Inc. Copyright 2009, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA 98382 (360) 681-2300

### OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance

Lecture 5: Hypothesis Testing What we know now: OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance (if the Gauss-Markov

### Chapter 7 Sampling and Sampling Distributions. Learning objectives

Chapter 7 Sampling and Sampling Distributions Slide 1 Learning objectives 1. Understand Simple Random Sampling 2. Understand Point Estimation and be able to compute point estimates 3. Understand Sampling

### Sampling (cont'd) and Confidence Intervals Lecture 9 8 March 2006 R. Ryznar

Sampling (cont'd) and Confidence Intervals 11.220 Lecture 9 8 March 2006 R. Ryznar Census Surveys Decennial Census Every (over 11 million) household gets the short form and 17% or 1/6 get the long form

### Description and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage

WHO/V&B/01.26 ENGLISH ONLY DISTR.: GENERAL Description and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage Written by Stacy Hoshaw-Woodard,

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Testing: is my coin fair?

Testing: is my coin fair? Formally: we want to make some inference about P(head) Try it: toss coin several times (say 7 times) Assume that it is fair (P(head) = 1/2), and see if this assumption is compatible

### Nonparametric statistics and model selection

Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### The Assumption(s) of Normality

The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I'll provide two versions. At a minimum, you should know the short one. It would be great if you knew

### MODELLING OCCUPATIONAL EXPOSURE USING A RANDOM EFFECTS MODEL: A BAYESIAN APPROACH ABSTRACT

MODELLING OCCUPATIONAL EXPOSURE USING A RANDOM EFFECTS MODEL: A BAYESIAN APPROACH Justin Harvey * and Abrie van der Merwe ** * Centre for Statistical Consultation, University of Stellenbosch ** University

### Task force on quality of BCS data. Analysis of sample size in consumer surveys

Task force on quality of BCS data Analysis of sample size in consumer surveys theoretical considerations and factors determining minimum necessary sample sizes, link between country size and sample size

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Bayesian and Classical Inference

Eco517, Part I Fall 2002 C. Sims Bayesian and Classical Inference Probability statements made in Bayesian and classical approaches to inference often look similar, but they carry different meanings. Because

### E205 Final: Version B

Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random

### Basic assumptions of conjoint analysis. * The product is a bundle of attributes. * Utility predicts behavior (i.e., purchases)

Basic assumptions of conjoint analysis * The product is a bundle of attributes * Utility of a product is a simple function of the utilities of the attributes * Utility predicts behavior (i.e., purchases)

### 1 Prior Probability and Posterior Probability

Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

### Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### AMS 5 HYPOTHESIS TESTING

AMS 5 HYPOTHESIS TESTING Hypothesis Testing Was it due to chance, or something else? Decide between two hypotheses that are mutually exclusive on the basis of evidence from observations. Test of Significance

### Chapter 8 CREDIBILITY HOWARD C. MAHLER AND CURTIS GARY DEAN

Chapter 8 CREDIBILITY HOWARD C. MAHLER AND CURTIS GARY DEAN 1. INTRODUCTION Credibility theory provides tools to deal with the randomness of data that is used for predicting future events or costs. For

### Sawtooth Software. The CBC System for Choice-Based Conjoint Analysis. Version 8 TECHNICAL PAPER SERIES. Sawtooth Software, Inc.

Sawtooth Software TECHNICAL PAPER SERIES The CBC System for Choice-Based Conjoint Analysis Version 8 Sawtooth Software, Inc. 1 Copyright 1993-2013, Sawtooth Software, Inc. 1457 E 840 N Orem, Utah +1 801

### AP Statistics 2012 Scoring Guidelines

AP Statistics 2012 Scoring Guidelines The College Board The College Board is a mission-driven not-for-profit organization that connects students to college success and opportunity. Founded in 1900, the

### THE INTERNET AND THE IRAQ WAR HOW ONLINE AMERICANS HAVE USED THE INTERNET TO

THE INTERNET AND THE IRAQ WAR HOW ONLINE AMERICANS HAVE USED THE INTERNET TO LEARN WAR NEWS, UNDERSTAND EVENTS, AND PROMOTE THEIR VIEWS AUTHORS LEE RAINIE, DIRECTOR SUSANNAH FOX, RESEARCH DIRECTOR DEBORAH

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 11 Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### 6.4 Normal Distribution

Contents: 6.4 Normal Distribution; 6.4.1 Characteristics of the Normal Distribution; 6.4.2 The Standardized Normal Distribution; 6.4.3 Meaning of Areas under

### Sawtooth Software. How Many Questions Should You Ask in Choice-Based Conjoint Studies? RESEARCH PAPER SERIES

Sawtooth Software RESEARCH PAPER SERIES How Many Questions Should You Ask in Choice-Based Conjoint Studies? Richard M. Johnson and Bryan K. Orme, Sawtooth Software, Inc. 1996 Copyright 1996-2002, Sawtooth

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### What is Bayesian statistics and why everything else is wrong

What is Bayesian statistics and why everything else is wrong Michael Lavine ISDS, Duke University, Durham, North Carolina Abstract We use a single example to explain (1), the Likelihood Principle, (2)

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Find the mean for the given sample data. 1) Bill kept track of the number of hours he spent

### research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

### Chris Slaughter, DrPH. GI Research Conference June 19, 2008

Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions

### Expected values, standard errors, Central Limit Theorem. Statistical inference

Expected values, standard errors, Central Limit Theorem FPP 16-18 Statistical inference Up to this point we have focused primarily on exploratory statistical analysis We now dive into the realm of statistical

### The problem with waiting time

The problem with waiting time Why the only way to real optimization of any process requires discrete event simulation Bill Nordgren, MS CIM, FlexSim Software Products Over the years there have been many

### Bayesian Updating: Odds Class 12, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating: Odds Class 12, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to convert between odds and probability. 2. Be able to update prior odds to posterior odds

### 5.1 Identifying the Target Parameter

University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying