The normal approximation to the binomial




c 200, Jeffrey S. Simonoff

In order for a continuous distribution (like the normal) to be used to approximate a discrete one (like the binomial), a continuity correction should be used. There are two major reasons to employ such a correction.

First, recall that a discrete random variable can take on only specified values, whereas a continuous random variable used to approximate it can take on any value whatsoever within an interval around those specified values. Hence, when using the normal distribution to approximate the binomial, more accurate approximations are likely to be obtained if a continuity correction is used.

Second, recall that with a continuous distribution (such as the normal), the probability of obtaining a particular value of a random variable is zero. On the other hand, when the normal distribution is used to approximate a discrete distribution, a continuity correction can be employed so that we can approximate the probability of a specific value of the discrete distribution.

Consider an experiment where we toss a fair coin 12 times and observe the number of heads. Suppose we want to compute the probability of obtaining exactly 4 heads. Whereas a discrete random variable can have only a specified value (such as 4), a continuous random variable used to approximate it could take on any value within an interval around that specified value, as demonstrated in this figure:

[Figure: probability (vertical axis, 0.00 to 0.20) plotted against the values 0 to 12 (horizontal axis).]

The continuity correction requires adding or subtracting .5 from the value or values of the discrete random variable $X$ as needed. Hence to use the normal distribution to approximate the probability of obtaining exactly 4 heads (i.e., $X = 4$), we would find the area under the normal curve from $X = 3.5$ to $X = 4.5$, the lower and upper boundaries of 4. Moreover, to determine the approximate probability of observing at least 4 heads, we would find the area under the normal curve from $X = 3.5$ and above since, on a continuum, 3.5 is the lower boundary of $X$. Similarly, to determine the approximate probability of observing at most 4 heads, we would find the area under the normal curve from $X = 4.5$ and below since, on a continuum, 4.5 is the upper boundary of $X$.

When using the normal distribution to approximate discrete probability distribution functions, we see that semantics become important. To determine the approximate probability of observing fewer than 4 heads, we would find the area under the normal curve from 3.5 and below; to determine the approximate probability of observing at most 4 heads, we would find the area under the normal curve from 4.5 and below, since the latter event includes the value $X = 4$.

Now consider the binomial distribution in particular. Let $H$ be the number of heads in 12 flips of a fair coin. We know that the probability of observing exactly $k$ heads in 12 flips is

$$P(H = k) = \binom{12}{k} (.5)^k (.5)^{12-k}, \qquad k = 0, 1, 2, \ldots, 12.$$

We also know that the mean of this binomial distribution is given by

$$\mu = np = (12)(.5) = 6,$$

while the standard deviation is given by

$$\sigma = \sqrt{np(1-p)} = \sqrt{(12)(.5)(.5)} = 1.732.$$

Say we are interested in the probability of observing between 3 and 5 heads, inclusive; that is, $P(3 \le H \le 5)$. We can calculate this exactly, of course:

$$P(3 \le H \le 5) = P(H = 3) + P(H = 4) + P(H = 5) = .05371 + .12085 + .19336 = .36792$$

What about the normal approximation to this value? First, we should check whether the normal approximation is likely to work very well in this case. A useful rule of thumb is that the normal approximation should work well enough if both $np$ and $n(1-p)$ are greater than 5. For this example, both equal 6, so we're about at the limit of usefulness of the approximation.

Back to the question at hand. Since $H$ is a binomial random variable, the following statement (based on the continuity correction) is exactly correct:

$$P(3 \le H \le 5) = P(2.5 < H < 5.5).$$

Note that this statement is not an approximation; it is exactly correct! The reason for this is that we are adding the events $2.5 < H < 3$ and $5 < H < 5.5$ to get from the left side of the equation to the right side of the equation, but for the binomial random variable, these events have probability zero. The continuity correction is not where the approximation comes in; that comes when we approximate $H$ using a normal distribution with mean $\mu = 6$ and standard deviation $\sigma = 1.732$:

$$\begin{aligned}
P(3 \le H \le 5) &= P(2.5 < H < 5.5) \\
&\approx P\left(\frac{2.5 - 6}{1.732} < Z < \frac{5.5 - 6}{1.732}\right) \\
&= P(-2.02 < Z < -.29) \\
&= P(Z < -.29) - P(Z < -2.02) \\
&= .3859 - .0217 = .3642
\end{aligned}$$
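
The coin-flip calculation can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `exact_prob` and `normal_approx_prob` are illustrative choices, not part of the original handout. Because the code does not round the z-values to two decimals as a table lookup does, the approximate value it prints should be close to, but not identical to, the .3642 obtained by hand.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Exact binomial probability P(H = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_prob(lo, hi, n, p):
    """Exact P(lo <= H <= hi), summing the binomial probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


def normal_approx_prob(lo, hi, n, p):
    """Normal approximation to P(lo <= H <= hi) with the continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = NormalDist()  # standard normal
    # Continuity correction: widen the interval by .5 on each side.
    return z.cdf((hi + 0.5 - mu) / sigma) - z.cdf((lo - 0.5 - mu) / sigma)


n, p = 12, 0.5
exact = exact_prob(3, 5, n, p)
approx = normal_approx_prob(3, 5, n, p)
print(f"exact  P(3 <= H <= 5) = {exact:.5f}")
print(f"approx P(3 <= H <= 5) = {approx:.4f}")
print(f"relative error        = {abs(approx - exact) / exact:.1%}")
```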

Note that the normal approximation to $P(3 \le H \le 5)$ is only off by about 1%, which is pretty good for such a small sample size.

Example. Suppose that a sample of $n = 1{,}600$ tires of the same type is obtained at random from an ongoing production process in which 8% of all such tires produced are defective. What is the probability that in such a sample not more than 150 tires will be defective?

Answer. We approximate the $B(1600, .08)$ random variable $T$ with a normal, with mean $(1600)(.08) = 128$ and standard deviation $\sqrt{(1600)(.08)(.92)} = 10.85$. The probability calculation is thus

$$P(T \le 150) = P(T < 150.5) \approx P\left(Z < \frac{150.5 - 128}{10.85}\right) = P(Z < 2.07) = .9808$$

Example. Based on past experience, 7% of all luncheon vouchers are in error. If a random sample of 400 vouchers is selected, what is the approximate probability that
(a) exactly 25 are in error?
(b) fewer than 25 are in error?
(c) between 20 and 25 (inclusive) are in error?

Answer. We approximate the $B(400, .07)$ random variable $V$ with a normal, with mean $(400)(.07) = 28$ and standard deviation $\sqrt{(400)(.07)(.93)} = 5.103$. The probability calculations are thus

(a)
$$\begin{aligned}
P(V = 25) &= P(24.5 < V < 25.5) \\
&\approx P\left(\frac{24.5 - 28}{5.103} < Z < \frac{25.5 - 28}{5.103}\right) \\
&= P(-.69 < Z < -.49) \\
&= .3121 - .2451 = .0670
\end{aligned}$$

(b)
$$P(V < 25) = P(V < 24.5) \approx P\left(Z < \frac{24.5 - 28}{5.103}\right) = P(Z < -.69) = .2451$$

(c)
$$\begin{aligned}
P(20 \le V \le 25) &= P(19.5 < V < 25.5) \\
&\approx P\left(\frac{19.5 - 28}{5.103} < Z < \frac{25.5 - 28}{5.103}\right) \\
&= P(-1.67 < Z < -.49) \\
&= .3121 - .0475 = .2646
\end{aligned}$$

The normal approximation to the binomial is the principle underlying an important tool in statistical quality control, the Np chart. Say we have an assembly line that turns out thousands of units per day. Periodically (daily, say), we sample $n$ items from the assembly line, and count up the number of defective items, $D$. What distribution does $D$ have? $B(n, p)$, of course, with $p$ being the probability that a particular item is defective. Thus, examination of $D$ allows us to see if the probability of a defective item is changing over time; that is, whether the process is getting out of control.

Consider the following situation. Our assembly line has been running for a while, and based on this historical data, we've seen that the probability of an individual item being defective is .02 (this would come from sampling and getting the empirical frequency of defectives). Our online quality control system samples 1000 items per day, and counts up the number of defectives $D$. We know that $D \sim B(1000, .02)$, so $E(D) = (1000)(.02) = 20$ and $SD(D) = \sqrt{(1000)(.02)(.98)} = 4.427$. Thus, $D$ is approximately normally distributed with mean $\mu = 20$ and standard deviation $\sigma = 4.427$. This allows us to assess whether future values of $D$ are unusual, by seeing whether they get too far from 20.

Here is an example of an Np chart.
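
One way to make "too far from 20" concrete is to convert a daily count into a z-score under the in-control distribution. The sketch below does this in Python; the function name `defect_count_z` and the daily counts in the loop are made up for illustration and do not come from the handout's data.

```python
from math import sqrt


def defect_count_z(d, n, p):
    """z-score of an observed defect count d when D ~ B(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return (d - mu) / sigma


n, p = 1000, 0.02
mu = n * p
sigma = sqrt(n * p * (1 - p))
print(f"E(D) = {mu:.0f}, SD(D) = {sigma:.3f}")

# Hypothetical daily counts, chosen only to show the calculation.
for d in (12, 20, 27, 35):
    print(f"D = {d:2d}  ->  z = {defect_count_z(d, n, p):+.2f}")
```

Counts whose z-score is large in absolute value are the ones that deserve a closer look.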

[Figure: NP chart (panel title "NP Chart for In contr"), plotting sample count (0 to 40) against sample number (0 to 200), with center line NP = 20.00 and control limits 3.0SL = 33.28 and -3.0SL = 6.718.]

The chart consists of the values for the process plotted with three lines: the expected number of defects in the center, and two control limits at $\mu \pm 3\sigma$ (the number 3 is arbitrary, but standard). Since $D$ is roughly normally distributed, we know that the probability of $D$ being outside the control limits, assuming that $p$ is staying at .02, is $P(|Z| > 3) = .0026$. So in this case, where there are 200 days' worth of data, we're not surprised to see a day or two outside the limits. There are actually three, but this is just random fluctuation; we know that because the process settles down to its correct value immediately. In fact, these data do come from a stable process.

Now consider this chart:

[Figure: NP chart (panel title "NP Chart for Number o"), plotting sample count (0 to 60) against sample number (0 to 200), with center line NP = 20.00 and control limits 3.0SL = 33.28 and -3.0SL = 6.718.]

This chart is very different. About 100 days into the sample, the process starts to go out of control. Eventually the $D$ values move outside the control limits, and the process should be stopped and corrected. In fact, for these data, starting with the 101st observation, I increased $p$ steadily by .0003 per day.

Control charts also can be constructed based on other statistics, such as means and standard deviations. They are an integral part of the idea of kaizen, or continuous improvement, that has revolutionized manufacturing around the world in recent years.
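
The numbers behind the two charts can be reproduced with a short calculation. The Python sketch below computes the control limits, the probability of falling outside them, and, for the drifting process, the first day on which the expected count exceeds the upper limit. It assumes that the drift means $p = .02 + .0003\,(t - 100)$ on day $t \ge 101$; that reading, and the name `expected_count`, are assumptions made here for illustration. Note that the exact value of $P(|Z| > 3)$ is about .0027; the .0026 quoted earlier is what results from a table rounded to four decimals.

```python
from math import sqrt
from statistics import NormalDist

n, p0 = 1000, 0.02                 # daily sample size and in-control defect rate
mu = n * p0
sigma = sqrt(n * p0 * (1 - p0))
lower, upper = mu - 3 * sigma, mu + 3 * sigma
print(f"control limits: {lower:.3f} to {upper:.2f}")

# Chance that an in-control day falls outside the limits (normal approximation).
tail = 2 * NormalDist().cdf(-3)
print(f"P(|Z| > 3) = {tail:.4f}")
print(f"expected out-of-limit days in 200: {200 * tail:.2f}")


def expected_count(day, start=101, step=0.0003):
    """Expected defects on a given day, assuming p rises by `step` per day from `start` on."""
    drift_days = max(0, day - start + 1)
    return n * (p0 + step * drift_days)


# First day on which the *expected* count is above the upper control limit.
first_day = next(day for day in range(1, 201) if expected_count(day) > upper)
print(f"expected count first exceeds the upper limit on day {first_day}")
```

Individual daily counts are random, so an observed count can cross the limit somewhat earlier or later than the day on which the expected count does.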

MINITAB commands

To obtain an Np chart, click on Stat → Control Charts → NP. Enter the variable with the binomial counts in it next to Variable:, and insert the number of binomial trials each observation refers to next to Subgroup size:. If there is a value of p that is known from historical data to correspond to the process being in control, enter that value next to Historical p:.
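
For readers without MINITAB, the same kind of check can be sketched in a few lines of Python. This is a rough illustration of the calculation, not a reproduction of MINITAB's output. The function names and the sample counts are hypothetical, and two choices here are assumptions: when no historical p is supplied, p is estimated by the pooled proportion of defectives, and the lower limit is not allowed to go below zero.

```python
from math import sqrt


def np_chart_limits(counts, subgroup_size, historical_p=None):
    """Return (center, lower, upper) for an Np chart with 3-sigma limits."""
    if historical_p is None:
        # Assumption: estimate p by pooling all observed counts.
        p = sum(counts) / (subgroup_size * len(counts))
    else:
        p = historical_p
    center = subgroup_size * p
    sigma = sqrt(subgroup_size * p * (1 - p))
    lower = max(0.0, center - 3 * sigma)   # a count cannot be negative
    upper = center + 3 * sigma
    return center, lower, upper


def out_of_control(counts, subgroup_size, historical_p=None):
    """Return the 1-based positions of counts outside the control limits."""
    _, lower, upper = np_chart_limits(counts, subgroup_size, historical_p)
    return [i for i, d in enumerate(counts, start=1) if d < lower or d > upper]


# Hypothetical daily defect counts from samples of 1000 items.
counts = [18, 22, 35, 19, 5, 21]
center, lower, upper = np_chart_limits(counts, 1000, historical_p=0.02)
flagged = out_of_control(counts, 1000, historical_p=0.02)
print(f"center = {center:.2f}, limits = ({lower:.3f}, {upper:.2f})")
print("observations outside the limits:", flagged)
```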