Does Sample Size Still Matter?


David Bakken and Megan Bond, KJT Group

Introduction

The survey has been an important tool in academic, governmental, and commercial research since the 1930s. Because in most cases the intent of a survey is to measure or estimate the value that a variable takes on in some population of interest, the development of sampling science has been integral to the advancement of survey research. While it may be possible to conduct a census among a small, easily accessed population, in most cases observing or measuring a sample of members of the population is necessary for reasons of cost, timing, and practicality.

Most of our understanding of sampling theory and method is based on probability sampling. A probability sample is one in which all members of the population of interest have a known probability of being included in the sample. The most basic form of probability sampling is the simple random sample (SRS) without replacement. With SRS without replacement, each population member or unit has an equal probability of being selected for the sample (that probability being n/N, where n is the size of the sample and N is the size of the population).

The importance of probability sampling becomes apparent when we want to make statements about the degree of difference between the value of a parameter (such as a mean, a proportion, or a regression coefficient) observed in the sample and the true population value of that parameter. Probability sampling allows us to estimate the error attributable to looking at a sample rather than the entire population. The math of probability sampling (based on counting possible outcomes, such as the number of ways that you can get a result of seven by rolling a pair of dice) is such that if we took a very large number of samples of a given size and measured a statistic for each sample, such as the mean of a variable, the distribution of these sample means (a.k.a. the sampling distribution of the mean) would be approximately normal and the mean of this distribution would equal the population mean. Furthermore, we can calculate a margin of error around our sample mean based on this sampling distribution.

It turns out that the margin of error for a sample estimate is related to the size of the sample: larger probability samples, all other things being equal, have smaller sampling errors. If we were to compare the sampling distributions of means based on SRS samples of 1,000 and 100, we would expect to find greater variability in the means based on samples of size 100. In other words, larger samples lead to more precise estimates of the parameter under study. This property has guided the design of survey samples, and most market researchers understand the relationship between population size, sample size, and precision (or margin of error), and they may apply relatively simple formulas to determine the appropriate sample size to achieve a specific level of precision.
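
The comparison between samples of 100 and 1,000 can be checked with a small simulation. The sketch below is illustrative only: the population of 100,000 yes/no values, the 50% true proportion, the 500 replicates, and the function name are all assumptions made for the example, not anything taken from our data.

```python
import random
import statistics


def sd_of_sample_means(population, n, replicates, rng):
    """Standard deviation of the means of repeated SRS draws of size n."""
    means = [
        statistics.fmean(rng.sample(population, n))  # sample() draws without replacement
        for _ in range(replicates)
    ]
    return statistics.stdev(means)


rng = random.Random(42)

# Hypothetical population: 100,000 binary responses with a true proportion near 50%.
population = [1 if rng.random() < 0.5 else 0 for _ in range(100_000)]

sd_100 = sd_of_sample_means(population, 100, 500, rng)
sd_1000 = sd_of_sample_means(population, 1000, 500, rng)

print(f"SD of sample means, n = 100:  {sd_100:.4f}")
print(f"SD of sample means, n = 1000: {sd_1000:.4f}")
```

Under these assumptions the spread of the sample means at n = 100 should come out at roughly three times the spread at n = 1,000, in line with the square-root relationship between sample size and sampling error.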

Different areas of research practice have different standards or expectations for survey sampling error. Opinion polls conducted to forecast the outcome of an election may be designed for a margin of error of around 3% at a stated confidence level (usually 95%), by which we mean that if we repeated the poll 100 times with the same size probability sample, we would expect the interval extending three percentage points on either side of the sample estimate to contain the true population value in about 95 of those samples. For commercial purposes, the desired precision or margin of error is more likely to be a function of the cost of making a bad bet on some future outcome (this is known as the "loss function") and the magnitude of a meaningful difference in the real world. For example, a small difference in market share may represent a significant increase in revenue for one company but mere accounting noise for another company, and each company will have different requirements for precision in order to make the right bet on a particular action.

Precision comes with a cost, however, and as Figure 1 illustrates, the relationship between precision and sample size is nonlinear. Because the required sample size grows with the inverse square of the margin of error, reducing the margin of error at 95% confidence from 3% to 2% requires more than doubling the sample size; reducing it from 3% to 1% requires a ninefold increase in sample size. For that reason, researchers must find the appropriate tradeoff between cost and precision for a particular survey problem.

We should mention two other considerations with respect to precision. When estimating proportions, the formula for calculating the margin of error for a specific sample size is:

$$ ME = z \sqrt{\frac{p(1 - p)}{n}} $$

where p is the expected proportion, n is the sample size, and z is the standard normal critical value for the chosen confidence level (about 1.96 for 95%). The margin of error for a given sample size is greatest when that proportion is exactly 50%. If we have a prior belief that the population proportion of interest is less than 50%, we may be able to achieve a specified level of precision with a smaller sample.
However, in the absence of that prior belief, 50% is the most conservative estimate, and many people use that value as a default. Similarly, the degree of variability in the population impacts precision, and if we have prior beliefs about the degree of homogeneity or heterogeneity in the population, we may be able to achieve the precision required to satisfy our decision-making needs with a smaller sample.

Despite the well-known math of probability sampling, market researchers often fail to conduct studies with samples that are large enough (based on sampling theory) to support their conclusions. Many researchers develop heuristics to simplify decisions about sample size. For example, psychology graduate students of a certain era were taught that the boundary between a small and a large sample (in particular for a randomized control group experiment) was 30, because that was the point at which one could switch from Student's t to a z-test to compare means. Market researchers have similar rules of thumb for determining the minimum number of elements from a population subgroup or segment to include in a sample. These rules of thumb are often intuitive rather than empirically based.
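
The margin-of-error formula for proportions given above can be turned around to give a required sample size. The following is a minimal sketch, assuming the usual large-population normal approximation (no finite population correction). The function names and the 20% example proportion are illustrative choices.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def margin_of_error(p, n, z=Z_95):
    """Margin of error for a proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(me, p=0.5, z=Z_95):
    """Smallest n whose margin of error does not exceed `me`."""
    raw = z**2 * p * (1 - p) / me**2
    # Round first to absorb floating-point noise, then round up to a whole respondent.
    return math.ceil(round(raw, 6))


for me in (0.03, 0.02, 0.01):
    n_conservative = required_sample_size(me)       # p = 50%, the conservative default
    n_with_prior = required_sample_size(me, p=0.2)  # hypothetical prior belief: p = 20%
    print(f"ME = {me:.0%}: n = {n_conservative} at p = 50%, n = {n_with_prior} at p = 20%")
```

Because n is proportional to 1/ME², moving from a 3% to a 2% margin multiplies the sample size by 2.25, and moving from 3% to 1% multiplies it by 9. The second column shows how a prior belief that the proportion is well away from 50% lowers the requirement.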

The Shrinking Market Research Survey Sample

Market researchers face a number of challenges in designing and implementing sampling schemes for survey research. Unlike public opinion polling, where the target population may be more or less the same from one poll to another, market research surveys serve a wide variety of information objectives, and last week's survey may have targeted a completely different population from this week's. The advent of online research, in particular online panels, promised to make very large samples affordable. Alas, while online panels have driven down the cost per interview (CPI), small samples (with perhaps fewer than 100 respondents) have become commonplace. Reasons include the targeting of niche and otherwise low-incidence segments and declining response rates.

Faced with the need to help marketers make reasonable business decisions using survey data obtained from relatively small samples, we set out to investigate the relationship between sample size, the variability of parameter estimates based on those sample sizes, and the implications for managerial decision-making. We could, of course, calculate sampling errors for our different sample sizes and let it go at that. In fact, the frequentist approach, based on the long-term frequency with which a parameter estimate occurs, such as the sampling distribution of the mean, stops at this point. However, this approach assumes that we are completely ignorant about the true population parameter value (even if we have measured it previously).

Our research was inspired in part by the story of Jean Baptiste Eugène Estienne, a French Army general who devised a method using Bayes' theorem that enabled assessment of the overall quality of a batch of 20,000 artillery shells by destructive testing of no more than 20 shells. At the outset of World War I, Germany seized much of France's manufacturing capability, making the existing ammunition stores that much more precious.
Applying the standard frequentist approach (calculating a sample size based on an acceptable margin of error around some criterion, such as 10% of all shells) would have required destruction of a few hundred shells. Estienne's method relied on updating the probability that a batch overall was defective (i.e., 10% or more bad shells) with each successive detonation.

Thomas Bayes was an 18th-century English clergyman and amateur mathematician who proposed a rule for accounting for uncertainty. Bayes' theorem, as it is known, was described in a paper published posthumously in 1763 by the Royal Society. This theorem is the foundation of Bayesian statistical inference. In Bayesian statistics, probabilities reflect degrees of belief, given the sample of data under study, rather than the frequency of events across hypothetical samples. In effect, the Bayesian statistician asks, "Given the data I have in hand, what is the probability of any specific hypothesis about the population parameter value?" In contrast, the frequentist asks, "How probable is the data, given my hypothesis?" In effect, the frequentist approach decides whether to accept the data as real.

With respect to small samples, we speculated that a Bayesian approach to inference would provide a means to account for uncertainty in a way that gives managers a better understanding of the probability of the sample data with respect to a specific decision. In this approach, we take the data as given and then calculate the probability of different possible true values. This requires a shift in thinking about the marketer's decision problem.

Suppose that a company is planning to launch a new product and wants to determine the potential adoption rate at a few different price points. Imagine that the company conducts a survey, employing a simple direct elicitation of willingness to pay, such as the Gabor-Granger method. Further imagine that the results indicate that 15% of the target market says they will definitely purchase the product at a price of $15 or less. The company has determined that they need to achieve at least 20% market adoption at a price of $15 in order to move ahead with the launch. The standard frequentist approach is not much help in this case. If the survey sample is relatively small, the 20% threshold is likely to fall within the margin of error; if the sample is large, the resulting increase in precision will shrink the confidence interval around the 15% estimate such that the 20% threshold looks extremely unlikely. We can use Bayes' theorem to reduce the uncertainty.

Bayes' theorem exploits the fact that the joint probability of two events, A and B, can be written as the product of the probability of one event and the conditional probability of the second event, given the first event. While there are some different ways to express the theorem, here is a simple representation:

$$ P(H \mid D) = \frac{xy}{xy + z(1 - x)} $$

We wish to estimate the probability of our hypothesis H (for example, that the adoption rate will be 20%) given the data D. The value x reflects our best guess about the likelihood of the hypothesis in the absence of any data (our prior probability), y is the probability of observing the data if the hypothesis is true, and z is the probability of observing the data if the hypothesis is not true.

Overview of Our Study

The overall objective of this study, as noted previously, was to assess the variability in parameter estimates for samples of different sizes. We followed the classic paradigm for evaluation of parameter estimates under varying treatments or methods. We started with a population where the parameter values were known.
In many studies such a population is synthetic; the observations are generated by specifying the parameter values and then using Monte Carlo simulation methods to create one or more synthetic populations with those parameter values. In our case, we started with a reasonably large sample of actual survey responses and, treating that sample as the population, drew multiple simple random samples of varying size (as described below).

Using responses to a choice-based conjoint exercise that was embedded in an online survey of 897 individuals, we created a series of samples of different sizes using different restrictions to reflect the ways in which both probability and convenience samples might be generated. The choice-based conjoint was a simple brand and price exercise that included four brands of LCD television and four price levels. We conducted two separate experiments, as described below.
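
The resampling design described above, in which the full respondent file is treated as the population and repeated simple random samples are drawn from it, can be sketched as follows. The sample sizes and the replicate count are the ones used in Experiment 1. The integer respondent IDs, the seed, and the function name are placeholders for illustration and do not reproduce the actual draws.

```python
import random

POPULATION_SIZE = 897                            # respondents treated as the population
SAMPLE_SIZES = (25, 50, 75, 100, 150, 225, 450)  # sample sizes from Experiment 1
REPLICATES = 10                                  # samples drawn at each size


def draw_replicates(population_ids, sizes, replicates, seed=0):
    """Return {size: [sample, ...]} with each sample drawn by SRS without replacement."""
    rng = random.Random(seed)
    return {
        n: [rng.sample(population_ids, n) for _ in range(replicates)]
        for n in sizes
    }


respondent_ids = list(range(POPULATION_SIZE))  # placeholder IDs, not real respondents
samples = draw_replicates(respondent_ids, SAMPLE_SIZES, REPLICATES)

total_samples = sum(len(reps) for reps in samples.values())
print(f"{total_samples} samples drawn across {len(SAMPLE_SIZES)} sample sizes")
```

Each replicate would then be passed to whatever estimation routine is being evaluated, so that the spread of the resulting estimates can be compared across sample sizes.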

Experiment 1: We drew ten random samples at each of seven sample sizes (25, 50, 75, 100, 150, 225, and 450) from our population of 897 respondents, resulting in 70 individual samples. We estimated hierarchical Bayesian (HB) models for each sample (using Sawtooth Software's CBC/HB program).

Experiment 2: We repeated the method of Experiment 1 but altered the sampling strategy so that samples were more homogeneous. We used two different sets of restrictions to achieve this, one based on demographics and one based on an attitudinal measure in the original survey. We applied the same overall design, with ten samples at each of four sizes (25, 50, 75, and 100), resulting in a total of 40 samples based on the demographic restriction and 40 based on the attitudinal restriction.

Results

When using results from choice-based conjoint analysis for research-on-research, we usually employ choice shares predicted by a market simulator (employing a logit transformation to generate purchase probabilities). This method is preferable to comparing different samples using model-based parameters (e.g., regression coefficients) because, in the multinomial logit model that captures the likelihood of choosing an alternative given the alternative's attributes, each sample has a unique scaling parameter. Transforming the model coefficients into predicted choice shares removes this difference between samples. In addition to comparing samples of different size with respect to the variance in predicted choice shares and deviation from the true population value, we also looked at aggregate and individual (i.e., "hit rate") validation using holdout choice tasks.

Experiment 1

Figure 2 shows the average prediction variance across the 10 replicates at each sample size. There are two interesting patterns here. First, two of the brands have smaller prediction variance than the other two; these happen to be the somewhat larger brands.
The second pattern is that prediction variance shrinks as sample size increases, dropping roughly in half when the sample size is at least 100, compared to samples of 25.

Insert Figure 2 here.

Figure 3 compares aggregate holdout prediction errors for each of the sample replicates. Aggregate holdout prediction error is the difference between the shares predicted for each brand at the prices set for a holdout task (one that is not included in the modeling) and the actual choices that respondents made in that task. Larger errors reflect more noise in the parameters, and we see that these errors are both larger on average and more variable when the sample is small than when it is larger.

Insert Figure 3 here.
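
As a rough illustration of the two quantities discussed above, the sketch below converts a set of utilities into logit choice shares and then summarizes the aggregate holdout error as a mean absolute difference from the observed shares. The utilities and observed shares are made-up numbers for four unnamed brands, the mean absolute error is only one of several reasonable summaries, and a commercial market simulator would typically average shares over individual-level utilities rather than work from a single aggregate vector as is done here.

```python
import math


def logit_shares(utilities):
    """Multinomial logit shares: exp(u_i) / sum_j exp(u_j)."""
    shift = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and observed shares."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical total utilities for four brands at the holdout task's prices.
holdout_utilities = [0.8, 0.5, 0.1, -0.4]
# Hypothetical observed holdout choice shares for the same four brands.
observed_shares = [0.40, 0.30, 0.20, 0.10]

predicted_shares = logit_shares(holdout_utilities)
aggregate_error = mean_absolute_error(predicted_shares, observed_shares)

print("Predicted shares:", [round(s, 3) for s in predicted_shares])
print(f"Aggregate holdout error (MAE): {aggregate_error:.3f}")
```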

Figure 4 compares individual hit rates for each of the sample replicates. The hit rate is the proportion of times the predicted choice for a given respondent matches the actual choice the respondent made in that holdout task. With one notable exception (samples of 100), the average hit rates and the variability in hit rates are similar across different sample sizes. This is probably a consequence of the HB method used to estimate the individual-level utilities. This method borrows data from other respondents to derive individual models for each respondent. It is possible that the hit rates for smaller samples are the result of overfitting, since there are fewer cases to borrow data from (which pulls the individual models in the direction of the overall average), while with larger samples the individual parameter space is better represented, so the borrowed data is more probable.

Insert Figure 4 here.

The final indication of the potential error associated with sample size is reflected in the differences between predicted choice shares based on each sample replicate and the overall population value (the modeled choice shares using the entire sample). Figure 5 shows these errors for predicted choice shares for just one of the brands. As with the other measures, individual sample prediction errors are larger for smaller samples, but when the samples are averaged (within sample size), the predictions are pretty close to the actual population value.

Insert Figure 5 here.

Experiment 2

As we noted in the description of our second experiment, market research samples often are restricted in ways that might impact the variability or heterogeneity within the sample. All other things being equal, samples from populations that are more homogeneous should produce more consistent parameter estimates (as long as the population variability is related to the parameter of interest).
We devised two constrained sampling approaches to yield samples that would be either demographically more similar (using age) or attitudinally more similar. Overall, as Figures 6 and 7 indicate, the patterns of variability in predicted choice shares in these constrained samples are similar to those in the unconstrained samples. Since our sample restrictions were arbitrary and only two of many possible sample restrictions, it is possible that any resulting increase in homogeneity was either small or not relevant to the parameters of interest. It is also possible that the HB method attenuates the impact of increased homogeneity on the individual-level choice models.

Insert Figures 6 and 7 about here.

Accounting for Uncertainty

Looking across these sample replicates, we want to know, for a given sample size, how likely we are to make a seriously wrong decision. We applied Bayes' theorem to estimate the uncertainty associated with samples of different size. Knowing that the population choice share for Toshiba at a particular price is roughly 19% and that if the price is lowered by $100 the choice share doubles, we can calculate the uncertainty for each of the samples. Figure 8 compares the results of this calculation for samples of 25 and 100. We can see that we should have greater confidence in any one sample of 100 than in any one sample of size 25.

Insert Figure 8 about here.

Conclusions

Our experiments indicate that sample size does still matter. Moreover, we now have greater confidence in drawing the line for minimum sample size at about 100 respondents, at least for studies involving relatively simple choice-based conjoint models estimated using a hierarchical Bayesian method.

Regardless of the sample size, Bayes' theorem offers a way to quantify the uncertainty around population parameters. Bayes' theorem requires that we alter our way of thinking about the data. Rather than base our inferences on the long-term frequencies from hypothetical sample replicates, Bayes' theorem allows us to ground our estimates in the data at hand.

We do not view Bayesian inference as a total replacement for frequentist methods of estimating sampling error. Instead, we see Bayes' theorem as an additional tool that can help managers make the best possible decisions or bets based on all the information we have available.
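
For readers who want to experiment with the simple form of Bayes' theorem given earlier, P(H | D) = xy / (xy + z(1 - x)), the sketch below applies it to the new-product example. It is a deliberately simplified two-hypothesis illustration and not the calculation behind Figure 8: the 50% prior, the single alternative adoption rate of 15%, the binomial likelihood, and the two hypothetical survey outcomes (3 of 20 and 15 of 100 respondents, both 15%) are all assumptions made for the example.

```python
from math import comb


def posterior(x, y, z):
    """P(H | data) for prior x, likelihood y under H, and likelihood z under not-H."""
    return x * y / (x * y + z * (1 - x))


def binomial_likelihood(k, n, p):
    """Probability of k 'definitely purchase' answers among n respondents at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


PRIOR = 0.5      # assumed prior probability that H is true
RATE_H = 0.20    # H: the true adoption rate is 20% (the launch threshold)
RATE_ALT = 0.15  # hypothetical alternative: the true adoption rate is 15%

# Two hypothetical surveys that both observe 15% "definitely purchase".
results = {}
for n, k in ((20, 3), (100, 15)):
    y = binomial_likelihood(k, n, RATE_H)
    z = binomial_likelihood(k, n, RATE_ALT)
    results[n] = posterior(PRIOR, y, z)
    print(f"n = {n:3d}: P(adoption rate is 20% | data) = {results[n]:.3f}")
```

Under these assumptions the smaller survey leaves the posterior at about 0.46, close to the 0.5 prior, while the larger survey lowers it to about 0.30: the same observed 15% is stronger evidence against the 20% hypothesis when it comes from more respondents.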

Figures

Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
Figure 7.
Figure 8.
9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling
More informationLecture 3 : Hypothesis testing and modelfitting
Lecture 3 : Hypothesis testing and modelfitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and modelfitting
More informationPOLI 300 Handout #2 N. R. Miller RANDOM SAMPLING. Key Definitions Pertaining to Sampling
POLI 300 Handout #2 N. R. Miller Key Definitions Pertaining to Sampling RANDOM SAMPLING 1. Population: the set of units (in survey research, usually individuals or households), N in number, that are to
More informationAccurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
More informationThe Margin of Error for Differences in Polls
The Margin of Error for Differences in Polls Charles H. Franklin University of Wisconsin, Madison October 27, 2002 (Revised, February 9, 2007) The margin of error for a poll is routinely reported. 1 But
More informationValuation of transport time and reliability in freight transport
Summary TØIreport 1083/2010 Authors: Askill Harkjerr Halse, Hanne Samstad, Marit Killi and Stefan Flügel, Farideh Ramjerdi Oslo 2010, 177 pages in Norwegian Valuation of transport time and reliability
More informationWhat s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More informationINTRODUCTORY STATISTICS
INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore
More informationCONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS
LESSON SEVEN CONFIDENCE INTERVALS FOR MEANS AND PROPORTIONS An interval estimate for μ of the form a margin of error would provide the user with a measure of the uncertainty associated with the point estimate.
More informationSampling and Hypothesis Testing
Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus
More informationFoundations of Statistics Frequentist and Bayesian
Mary Parker, http://www.austincc.edu/mparker/stat/nov04/ page 1 of 13 Foundations of Statistics Frequentist and Bayesian Statistics is the science of information gathering, especially when the information
More informationSawtooth Software. Which Conjoint Method Should I Use? RESEARCH PAPER SERIES. Bryan K. Orme Sawtooth Software, Inc.
Sawtooth Software RESEARCH PAPER SERIES Which Conjoint Method Should I Use? Bryan K. Orme Sawtooth Software, Inc. Copyright 2009, Sawtooth Software, Inc. 530 W. Fir St. Sequim, 0 WA 98382 (360) 6812300
More informationOLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique  ie the estimator has the smallest variance
Lecture 5: Hypothesis Testing What we know now: OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique  ie the estimator has the smallest variance (if the GaussMarkov
More informationChapter 7 Sampling and Sampling Distributions. Learning objectives
Chapter 7 Sampling and Sampling Distributions Slide 1 Learning objectives 1. Understand Simple Random Sampling 2. Understand Point Estimation and be able to compute point estimates 3. Understand Sampling
More informationSampling (cont d) and Confidence Intervals Lecture 9 8 March 2006 R. Ryznar
Sampling (cont d) and Confidence Intervals 11.220 Lecture 9 8 March 2006 R. Ryznar Census Surveys Decennial Census Every (over 11 million) household gets the short form and 17% or 1/6 get the long form
More informationDescription and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage
WHO/V&B/01.26 ENGLISH ONLY DISTR.: GENERAL Description and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage Written by Stacy HoshawWoodard,
More informationHow to Conduct a Hypothesis Test
How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some
More informationTesting: is my coin fair?
Testing: is my coin fair? Formally: we want to make some inference about P(head) Try it: toss coin several times (say 7 times) Assume that it is fair ( P(head)= ), and see if this assumption is compatible
More informationNonparametric statistics and model selection
Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the ttest and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationThe Assumption(s) of Normality
The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew
More informationMODELLING OCCUPATIONAL EXPOSURE USING A RANDOM EFFECTS MODEL: A BAYESIAN APPROACH ABSTRACT
MODELLING OCCUPATIONAL EXPOSURE USING A RANDOM EFFECTS MODEL: A BAYESIAN APPROACH Justin Harvey * and Abrie van der Merwe ** * Centre for Statistical Consultation, University of Stellenbosch ** University
More informationTask force on quality of BCS data. Analysis of sample size in consumer surveys
Task force on quality of BCS data Analysis of sample size in consumer surveys theoretical considerations and factors determining minimum necessary sample sizes, link between country size and sample size
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationBayesian and Classical Inference
Eco517, Part I Fall 2002 C. Sims Bayesian and Classical Inference Probability statements made in Bayesian and classical approaches to inference often look similar, but they carry different meanings. Because
More informationE205 Final: Version B
Name: Class: Date: E205 Final: Version B Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The owner of a local nightclub has recently surveyed a random
More informationBasic assumptions of conjoint analysis. * The product is a bundle of attributes. * Utility predicts behavior (i.e., purchases)
Basic assumptions of conjoint analysis * The product is a bundle of attributes * Utility of a product is a simple function of the utilities of the attributes * Utility predicts behavior (i.e., purchases)
More information1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
More informationLikelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth GarrettMayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
More information4. Introduction to Statistics
Statistics for Engineers 41 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation
More informationAMS 5 HYPOTHESIS TESTING
AMS 5 HYPOTHESIS TESTING Hypothesis Testing Was it due to chance, or something else? Decide between two hypotheses that are mutually exclusive on the basis of evidence from observations. Test of Significance
More informationChapter 8 CREDIBILITY HOWARD C. MAHLER AND CURTIS GARY DEAN
Chapter 8 CREDIBILITY HOWARD C. MAHLER AND CURTIS GARY DEAN 1. INTRODUCTION Credibility theory provides tools to deal with the randomness of data that is used for predicting future events or costs. For
More informationSawtooth Software. The CBC System for ChoiceBased Conjoint Analysis. Version 8 TECHNICAL PAPER SERIES. Sawtooth Software, Inc.
Sawtooth Software TECHNICAL PAPER SERIES The CBC System for ChoiceBased Conjoint Analysis Version 8 Sawtooth Software, Inc. 1 Copyright 19932013, Sawtooth Software, Inc. 1457 E 840 N Orem, Utah +1 801
More informationAP Statistics 2012 Scoring Guidelines
AP Statistics 2012 Scoring Guidelines The College Board The College Board is a missiondriven notforprofit organization that connects students to college success and opportunity. Founded in 1900, the
More informationTHE INTERNET AND THE IRAQ WAR HOW ONLINE AMERICANS HAVE USED THE INTERNET TO
THE INTERNET AND THE IRAQ WAR HOW ONLINE AMERICANS HAVE USED THE INTERNET TO LEARN WAR NEWS, UNDERSTAND EVENTS, AND PROMOTE THEIR VIEWS AUTHORS LEE RAINIE, DIRECTOR SUSANNAH FOX, RESEARCH DIRECTOR DEBORAH
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationSawtooth Software. How Many Questions Should You Ask in ChoiceBased Conjoint Studies? RESEARCH PAPER SERIES
Sawtooth Software RESEARCH PAPER SERIES How Many Questions Should You Ask in ChoiceBased Conjoint Studies? Richard M. Johnson and Bryan K. Orme, Sawtooth Software, Inc. 1996 Copyright 19962002, Sawtooth
More informationRegression Analysis: Basic Concepts
The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance
More informationWhat is Bayesian statistics and why everything else is wrong
What is Bayesian statistics and why everything else is wrong 1 Michael Lavine ISDS, Duke University, Durham, North Carolina Abstract We use a single example to explain (1), the Likelihood Principle, (2)
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Find the mean for the given sample data. 1) Bill kept track of the number of hours he spent
More informationresearch/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other
1 Hypothesis Testing Richard S. Balkin, Ph.D., LPCS, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric
More informationChris Slaughter, DrPH. GI Research Conference June 19, 2008
Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008 Outline 1 2 3 Factors that Impact Power 4 5 6 Conclusions
More informationExpected values, standard errors, Central Limit Theorem. Statistical inference
Expected values, standard errors, Central Limit Theorem FPP 1618 Statistical inference Up to this point we have focused primarily on exploratory statistical analysis We know dive into the realm of statistical
More informationThe problem with waiting time
The problem with waiting time Why the only way to real optimization of any process requires discrete event simulation Bill Nordgren, MS CIM, FlexSim Software Products Over the years there have been many
More informationBayesian Updating: Odds Class 12, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating: Odds Class 12, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to convert between odds and probability. 2. Be able to update prior odds to posterior odds
More information5.1 Identifying the Target Parameter
University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying
More informationDecision Theory. 36.1 Rational prospecting
36 Decision Theory Decision theory is trivial, apart from computational details (just like playing chess!). You have a choice of various actions, a. The world may be in one of many states x; which one
More informationBayesian networks. Data Mining. Abraham Otero. Data Mining. Agenda. Bayes' theorem Naive Bayes Bayesian networks Bayesian networks (example)
Bayesian networks 1/40 Agenda Introduction Bayes' theorem Bayesian networks Quick reference 2/40 1 Introduction Probabilistic methods: They are backed by a strong mathematical formalism. They can capture
More information