Sampling Distributions and Simulation OPRE 6301


Basic Concepts...

We are often interested in calculating some property, i.e., a parameter, of a population. For very large populations, exact calculation of a parameter is typically prohibitive. A more economical and sensible approach is to take a random sample from the population of interest, calculate a statistic related to the parameter of interest, and then make an inference about the parameter based on the value of the statistic. This is called statistical inference.

Any statistic is a random variable. The distribution of a statistic is called its sampling distribution. The sampling distribution helps us understand how close a statistic is to its corresponding population parameter. Typical parameters of interest include:

- Mean
- Proportion
- Variance

Sample Mean...

The standard statistic used to infer about the population mean is the sample mean.

Rolling a Die... Suppose a fair die is rolled an infinite number of times, and imagine a population consisting of such sequences of outcomes. Let X be the outcome of a single roll. Then the probability mass function of X is:

X = x   P(x)
1       1/6
2       1/6
3       1/6
4       1/6
5       1/6
6       1/6

Rolling the die once can be viewed as one draw from an infinite population with mean µ = E(X) = 3.5 and variance σ² = V(X) ≈ 2.92. Let us pretend that we do not know the population mean and would like to estimate it from a sample. Each draw can be viewed as a sample of size 1. Suppose we simply used the outcome of a single roll, i.e., a realized value of X, as an estimate of µ. How good is this estimate? Our estimate could assume any of the values 1, 2, ..., 6, each with probability 1/6. If we used this approach repeatedly, then the average deviation of our estimates from µ would be E(X − µ) = 0. This certainly is desirable, but note that the variance of the deviation X − µ is σ² and, in particular, that the gap |X − µ| between our estimate and µ is never less than 0.5!
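The mean and variance quoted above can be verified exactly with Python's standard library; this is a quick sketch, not part of the original course materials:

```python
from fractions import Fraction

# Exact mean and variance of a single fair-die roll, using rational arithmetic.
outcomes = range(1, 7)
mu = sum(Fraction(x, 6) for x in outcomes)               # E(X) = 21/6 = 7/2
var = sum(Fraction(x * x, 6) for x in outcomes) - mu**2  # V(X) = E(X^2) - mu^2

print(mu)          # 7/2, i.e., 3.5
print(float(var))  # 2.9166..., i.e., the 2.92 quoted in the notes
```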

Can we improve this situation? The key idea is to increase the sample size. Consider now a sample of size 2. Denote the outcomes of two independent rolls by X₁ and X₂, and let X̄ ("X bar") be their average:

X̄ = (X₁ + X₂)/2.

How good is X̄ as an estimate of µ?

Terminology: A method for estimating a parameter of a population is called an estimator. Here, X̄ is an estimator for µ and, for this reason, it is often denoted µ̂ ("µ hat"). Note that µ̂ is a function of the observations X₁ and X₂.

Let us look at the sampling distribution of X̄. There are 36 possible pairs (X₁, X₂): (1,1), (1,2), (1,3), ..., (6,4), (6,5), (6,6). The corresponding values of X̄ for these pairs are 1, 1.5, 2, ..., 5, 5.5, 6. Using the fact that each pair has probability 1/36, we obtain (there are only 11 possible values for X̄, since some values, 3.5 for example, occur in more than one way):

X̄ = x̄   P(x̄)   Pairs
1.0     1/36   (1,1)
1.5     2/36   (1,2), (2,1)
2.0     3/36   (1,3), (2,2), (3,1)
2.5     4/36   (1,4), (2,3), (3,2), (4,1)
3.0     5/36   (1,5), (2,4), (3,3), (4,2), (5,1)
3.5     6/36   (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
4.0     5/36   (2,6), (3,5), (4,4), (5,3), (6,2)
4.5     4/36   (3,6), (4,5), (5,4), (6,3)
5.0     3/36   (4,6), (5,5), (6,4)
5.5     2/36   (5,6), (6,5)
6.0     1/36   (6,6)

As you can see, the sampling distribution of X̄ is more complicated. (Imagine doing this for n = 3, 4, ....)
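The table above can be reproduced by brute-force enumeration of all 36 pairs; this sketch is not from the original notes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally the sample mean over all 36 equally likely pairs (X1, X2).
dist = Counter()
for x1, x2 in product(range(1, 7), repeat=2):
    dist[Fraction(x1 + x2, 2)] += Fraction(1, 36)

# Probabilities print in lowest terms, e.g. 6/36 appears as 1/6.
for xbar in sorted(dist):
    print(f"{float(xbar):.1f}  {dist[xbar]}")
```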

[Chart: the sampling distribution of X̄, over the values 1.0, 1.5, ..., 6.0.] Notice that this distribution is symmetric around 3.5 and that it has less variability than that of X. [Chart: the uniform distribution of X over the values 1, 2, ..., 6.]

Formally, we have

E(X̄) = E((X₁ + X₂)/2) = (1/2)(E(X₁) + E(X₂)) = µ

and, since X₁ and X₂ are independent,

V(X̄) = V((X₁ + X₂)/2) = (1/4)(V(X₁) + V(X₂)) = σ²/2.

Thus, the variance of our estimator has been cut in half, while its mean remains centered at µ.

Standard Estimator for Mean...

More generally, consider a sample of size n with outcomes X₁, X₂, ..., Xₙ from a population, where the Xᵢ's are independent and have a common, arbitrary distribution. (In our previous example, X happens to be discrete uniform over the values 1, 2, ..., 6.) Define again

X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ.   (1)

Then,

µ_X̄ ≡ E(X̄) = µ   (2)

and

σ²_X̄ ≡ V(X̄) = σ²/n.   (3)

The standard (best, in a certain sense) estimator for the population mean µ is X̄. The standard deviation of X̄, σ_X̄, is called the standard error of the estimator.
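Formulas (2) and (3) can be checked by simulation with the die-rolling population; a minimal sketch (the sample size n = 10 and the repetition count are arbitrary choices, not from the notes):

```python
import random
import statistics

random.seed(1)
n, reps = 10, 20000
sigma2 = 35 / 12  # population variance of one fair-die roll (~2.92)

# Draw many samples of size n; the sample means should average to mu = 3.5,
# and their variance should be close to sigma^2 / n.
means = [statistics.fmean(random.choices(range(1, 7), k=n)) for _ in range(reps)]
print(statistics.fmean(means))      # close to 3.5
print(statistics.pvariance(means))  # close to sigma2 / n = 0.2917
```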

Observe that the mean of X̄, µ_X̄ = µ, does not depend on the sample size n. The variance V(X̄) does depend on n, and it shrinks to zero as n → ∞. (Recall the effect of increasing n in our example on discrimination.) Moreover, it is most important to realize that the calculations in (2) and (3) do not depend on the assumed distribution of X, i.e., of the population. Can we say the same about the sampling distribution of X̄? We may be asking too much, but it turns out...

Central Limit Theorem...

An interesting property of the normal distribution is that if X₁, X₂, ... are normally distributed, i.e., if the population from which successive samples are taken has a normal distribution, then X̄ is normally distributed (with parameters given in (2) and (3) above) for all n. What if the population (i.e., X) is not normally distributed?

Central Limit Theorem: For any infinite population with mean µ and variance σ², the sampling distribution of X̄ is well approximated by the normal distribution with mean µ and variance σ²/n, provided that n is sufficiently large.

This is the most fundamental result in statistics, because it applies to any infinite population. To facilitate understanding, we will look at several simulation examples in a separate Excel file (C9-01-Central Limit Theorem.xls).
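The course's simulations are in Excel; the same effect can be sketched in Python with a deliberately skewed population. Here the population is exponential with mean 1 and variance 1 (an illustrative choice, not from the notes); the means of samples of size 30 come out nearly normal:

```python
import random
import statistics

random.seed(7)

# Population: exponential with mean 1, variance 1, skewness 2 (heavily right-skewed).
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n, reps = 30, 20000
means = [sample_mean(n) for _ in range(reps)]

# CLT prediction: X-bar is approximately Normal(mu, sigma^2/n) = Normal(1, 1/30).
print(statistics.fmean(means))   # close to 1
print(statistics.pstdev(means))  # close to (1/30) ** 0.5 = 0.1826

# The skewness of the sample means (~ 2/sqrt(30) = 0.37) is far below the
# population's skewness of 2, showing the distribution becoming more symmetric.
m, s = statistics.fmean(means), statistics.pstdev(means)
skew = statistics.fmean(((x - m) / s) ** 3 for x in means)
print(skew)
```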

The definition of "sufficiently large" depends on the extent of nonnormality of X (e.g., heavily skewed, multimodal, ...). In general, the larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution. For most applications, a sample size of 30 is considered large enough for using the normal approximation.

For a finite population, the standard error of X̄ should be corrected to

σ_X̄ = (σ/√n) · √((N − n)/(N − 1)),

where N is the size of the population. The term √((N − n)/(N − 1)) is called the finite population correction factor. For large N, this factor, of course, approaches 1 and hence can be ignored. The usual rule of thumb is to consider N large enough if it is at least 20 times larger than n.
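A few values of the correction factor make the rule of thumb concrete; this sketch (with example values of N and n chosen for illustration) is not part of the notes:

```python
import math

# Finite population correction factor sqrt((N - n) / (N - 1)).
def fpc(N, n):
    return math.sqrt((N - n) / (N - 1))

print(fpc(1000, 50))    # N = 20n -> ~0.975, already close to 1
print(fpc(100000, 50))  # very large N -> ~1, safely ignored
print(fpc(200, 50))     # N = 4n -> ~0.868, correction matters
```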

Example 1: Soda in a Bottle

Suppose the amount of soda in each 32-ounce bottle is normally distributed with a mean of 32.2 ounces and a standard deviation of 0.3 ounce. If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces of soda?

Answer: This can be viewed as a question about the distribution of the population itself, i.e., of X (or of a sample of size one). We wish to find P(X > 32), where X is normally distributed with µ = 32.2 and σ = 0.3:

P(X > 32) = P((X − µ)/σ > (32 − 32.2)/0.3)
          = P(Z > −0.67)
          = 1 − P(Z ≤ −0.67)
          = 0.7486,

where the last equality comes from the Excel function NORMSDIST(). Hence, there is about a 75% chance for a single bottle to contain more than 32 ounces of soda.
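The notes use Excel's NORMSDIST(); the same number can be checked with Python's standard library (a sketch, not part of the course materials):

```python
from statistics import NormalDist

# P(X > 32) for X ~ Normal(32.2, 0.3), computed directly from the CDF
# without rounding Z, so the result differs slightly from the table value.
p = 1 - NormalDist(mu=32.2, sigma=0.3).cdf(32)
print(p)  # about 0.75, matching the roughly 75% chance in the text
```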

Suppose now that a customer buys a box of four bottles. What is the probability that the mean amount of soda in these four bottles is greater than 32 ounces?

Answer: We are now interested in X̄ for a sample of size 4. Since X is normally distributed, so is X̄. We also have µ_X̄ = µ = 32.2 and σ_X̄ = σ/√n = 0.3/√4 = 0.15. It follows that

P(X̄ > 32) = P((X̄ − µ_X̄)/σ_X̄ > (32 − 32.2)/0.15)
           = P(Z > −1.33)
           = 1 − P(Z ≤ −1.33)
           = 0.9082.

Thus, there is about a 91% chance for the mean amount of soda in four bottles to exceed 32 ounces. The answer here, 91%, is greater than that for a sample of size one. Is this expected? Note that the standard deviation of the sample mean (σ_X̄ = 0.15) is smaller than the standard deviation of the population (σ = 0.3). Pictorially, ...
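The same calculation for the box of four, again as a standard-library sketch rather than the course's Excel workflow:

```python
from statistics import NormalDist

# P(X-bar > 32) for a sample of n = 4 bottles:
# X-bar ~ Normal(32.2, 0.3 / sqrt(4)) = Normal(32.2, 0.15).
n = 4
p = 1 - NormalDist(mu=32.2, sigma=0.3 / n ** 0.5).cdf(32)
print(p)  # about 0.91; the text's 0.9082 rounds Z to -1.33 first
```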

Reducing Variability: [Figure: two normal density curves, both centered at the mean 32.2; the curve for a sample of size four is narrower than the curve for a sample of size one.]

Example 2: Average Salary

The Dean of a business school claims that the average salary of the school's graduates one year after graduation is $800 per week, with a standard deviation of $100. A second-year student would like to check whether the claim about the mean is correct. He carries out a survey of 25 people who graduated one year ago and discovers the sample mean to be $750. Based on this information, what can we say about the Dean's claim?

Analysis: To answer this question, we will calculate the probability for a sample of 25 graduates to have a mean of $750 or less when the population mean is $800 and the population standard deviation is $100, i.e., the p-value for the given sample mean. Although X is likely to be skewed, it seems reasonable to assume that X̄ is normally distributed. The mean of X̄ is µ_X̄ = 800 and the standard deviation of X̄ is σ_X̄ = σ/√n = 100/√25 = 20.

Therefore,

P(X̄ ≤ 750) = P((X̄ − µ_X̄)/σ_X̄ ≤ (750 − 800)/20)
            = P(Z ≤ −2.5)
            = 0.0062.

Since this p-value is extremely small (compared to 0.05 or 0.01), we conclude that the Dean's claim is not supported by the data.
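The p-value computation above, as a standard-library sketch:

```python
from statistics import NormalDist

# One-sided p-value: P(X-bar <= 750) when mu = 800, sigma = 100, n = 25.
mu, sigma, n, xbar = 800, 100, 25, 750
se = sigma / n ** 0.5                   # standard error of X-bar = 20
p = NormalDist().cdf((xbar - mu) / se)  # P(Z <= -2.5)
print(round(p, 4))  # 0.0062
```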

Standardizing the Sample Mean...

A common practice is to convert a calculation regarding X̄ into one regarding the standardized variable Z:

Z = (X̄ − µ_X̄)/σ_X̄ = (X̄ − µ)/(σ/√n).   (4)

Indeed, we did this in the previous two examples; it unifies discussion and notation. From the central limit theorem, Z as defined in (4) is, for large n, normally distributed with mean 0 and variance 1. That is, the distribution of Z can be well approximated by the standard normal distribution.

Define z_{α/2} as the value such that

P(−z_{α/2} < Z ≤ z_{α/2}) = 1 − α.   (5)

The choice of α, the so-called significance level, is usually 0.05 or 0.01.

The value z_{α/2} is called the two-tailed critical value at level α, in contrast with the one-tailed z_A, or z_α in our current notation, discussed earlier. Using NORMSINV() (see C7-07-Normal.xls; Heights example, last question), it is easily found that:

For α = 0.10, z_{α/2} = 1.645.
For α = 0.05, z_{α/2} = 1.96.
For α = 0.01, z_{α/2} = 2.576.

For α = 0.05, the most common choice, this means: [Figure: standard normal density with probability 0.025 in each tail, cut off at −1.96 and 1.96 on the Z axis.]
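The critical values listed above come from Excel's NORMSINV(); they can be reproduced with the standard-normal inverse CDF in Python's standard library (a sketch, not from the notes):

```python
from statistics import NormalDist

# Two-tailed critical value: z_{alpha/2} = Phi^{-1}(1 - alpha/2).
def z_crit(alpha):
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_crit(alpha), 3))  # 1.645, 1.96, 2.576
```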

Upon substitution of (4), (5) becomes

P(−z_{α/2} < (X̄ − µ)/(σ/√n) ≤ z_{α/2}) = 1 − α,

which is equivalent to:

P(µ − z_{α/2} σ/√n < X̄ ≤ µ + z_{α/2} σ/√n) = 1 − α.   (6)

This means that for a given α, the probability for the sample mean X̄ to fall in the interval

(µ − z_{α/2} σ/√n, µ + z_{α/2} σ/√n)   (7)

is 100(1 − α)%. We will often use this form for statistical inference, since it is easy to check whether a given X̄ is contained in the above interval. The end points µ ± z_{α/2} σ/√n are often called control limits; they can be computed without even taking any sample.

Example 2: Average Salary, Continued

We can also check the Dean's claim using a slightly different approach. Let α = 0.05 (say) and call an observed X̄ "rare" if it falls outside the interval (7).

Analysis: With µ = 800, σ = 100, and n = 25, our control limits are

800 ± 1.96 · 100/√25,

or from 760.8 to 839.2. Since the observed X̄ = 750 is outside these limits, the Dean's claim is not justified. Note that if we had observed an X̄ of 850 (50 above the claimed mean, as opposed to 50 below), we would also have rejected the Dean's claim. This suggests that we could also define a two-sided p-value as

P(X̄ ≤ 750 or X̄ ≥ 850).

This is just twice the original (one-sided) p-value, and hence equals 2 × 0.0062 = 0.0124. Since this is less than the given α = 0.05, we arrive, as expected, at the same conclusion.
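Both the control limits and the two-sided p-value can be checked in a few lines; again a standard-library sketch rather than the course's Excel approach:

```python
from statistics import NormalDist

mu, sigma, n, alpha = 800, 100, 25, 0.05
se = sigma / n ** 0.5
z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96

# Control limits mu +/- z_{alpha/2} * sigma / sqrt(n).
lo, hi = mu - z * se, mu + z * se
print(round(lo, 1), round(hi, 1))        # 760.8 839.2

# Two-sided p-value for the observed mean of 750 (50 below mu).
p2 = 2 * NormalDist().cdf((750 - mu) / se)
print(round(p2, 4))                      # 0.0124
```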

Sampling Distribution of a Proportion...

The central limit theorem also applies to sample proportions. Let X be a binomial random variable with parameters n and p. Since each trial results in either a success or a failure, we can define for trial i a variable Xᵢ that equals 1 if we have a success and 0 otherwise. Then, the proportion of trials that resulted in a success is given by:

p̂ ≡ (1/n) Σᵢ₌₁ⁿ Xᵢ = X/n.   (8)

In the Discrimination example (see C7-05-Binomial.xls), n is the number of available positions and p is the probability of hiring a female for a position. Let Xᵢ indicate whether or not the ith hire is a female. Then, p̂ as defined in (8) is the (random) proportion of hires that are female.

Observe that the middle term in (8) is an average, or a sample mean. Therefore, the central limit theorem implies that the sampling distribution of p̂ is approximately normal with mean

E(p̂) = p   (9)

(note that E(Xᵢ) = p) and variance

V(p̂) = p(1 − p)/n   (10)

(note that V(Xᵢ) = p(1 − p)). This discussion, in fact, explains why, under suitable conditions, the normal distribution can serve as a good approximation to the binomial; see C7-07-Normal.xls. From (9) and (10), we see that p̂ can be standardized to a Z variable:

Z = (p̂ − p)/√(p(1 − p)/n).

Example: Discrimination, Continued

For n = 50, p = 0.3 and p̂ = 0.1 (i.e., 10% of the 50 hires), we have

Z = (p̂ − p)/√(p(1 − p)/n) = (0.1 − 0.3)/√(0.3(1 − 0.3)/50) = −3.086.

For α = 0.01, we have z_α = 2.326 (NORMSINV(0.99); note that this is a one-tailed critical value). Since −3.086 is less than −2.326 (by the symmetry of the normal density, the lower-tail critical value is −z_α), we conclude that it is likely that discrimination exists. Note that our calculations here are based on a normal approximation to the exact binomial probability in C7-05-Binomial.xls. Since the approximation is good, the conclusions are consistent.
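The standardized proportion and the one-tailed critical value can be verified as follows (a standard-library sketch, not from the course files):

```python
from statistics import NormalDist

# Standardized sample proportion for the discrimination example.
n, p, p_hat = 50, 0.3, 0.1
z = (p_hat - p) / (p * (1 - p) / n) ** 0.5
print(round(z, 3))            # -3.086

# One-tailed critical value at alpha = 0.01, lower tail.
z_crit = NormalDist().inv_cdf(0.01)
print(round(z_crit, 3))       # -2.326
print(z < z_crit)             # True -> the observed proportion is in the rejection region
```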

Sampling Distribution of a Difference...

We are frequently interested in comparing two populations. One possible scenario is that we have two independent samples, one from each of two normal populations. In such a case, the sampling distribution of the difference between the two sample means, denoted X̄₁ − X̄₂, will be normally distributed with mean

µ_{X̄₁−X̄₂} = E(X̄₁ − X̄₂) = µ₁ − µ₂   (11)

and variance

σ²_{X̄₁−X̄₂} = V(X̄₁ − X̄₂) = σ₁²/n₁ + σ₂²/n₂.   (12)

If the two populations are not both normally distributed, then the above still applies, provided that the sample sizes n₁ and n₂ are large (e.g., greater than 30).

As usual, we can standardize the difference between the two means:

Z = ((X̄₁ − X̄₂) − (µ₁ − µ₂))/√(σ₁²/n₁ + σ₂²/n₂).   (13)

Example: MBA Salaries

The mean starting salaries of MBA graduates from two universities are $62,000 and $60,000, with respective standard deviations $14,500 and $18,300. Assume that the two populations of salaries are normally distributed. Suppose samples of sizes n₁ = 50 and n₂ = 60 are taken from these two universities. What is the probability for the first sample mean X̄₁ to exceed the second sample mean X̄₂?

Analysis: We wish to find P(X̄₁ − X̄₂ > 0), which can be computed via Z as:

P(Z > (0 − (62000 − 60000))/√(14500²/50 + 18300²/60))
  = P(Z > −0.64)
  = 1 − P(Z ≤ −0.64)
  = 1 − 0.261
  = 0.739.

Thus, there is about a 74% chance for the sample mean starting salary of the first university to exceed that of the second university.
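The calculation above, as one final standard-library sketch:

```python
from statistics import NormalDist

# P(X1-bar > X2-bar) for the MBA-salary example.
mu1, s1, n1 = 62000, 14500, 50
mu2, s2, n2 = 60000, 18300, 60

se = (s1**2 / n1 + s2**2 / n2) ** 0.5  # standard error of the difference
z = (0 - (mu1 - mu2)) / se             # about -0.64
p = 1 - NormalDist().cdf(z)
print(round(p, 3))                     # 0.739
```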