The normal approximation to the binomial




© Jeffrey S. Simonoff

The binomial probability function is not useful for calculating probabilities when the number of trials $n$ is large, as it involves multiplying a potentially very large number $\binom{n}{k}$ with a potentially very small one $p^k(1-p)^{n-k}$. Fortunately, the normal distribution provides a very good approximation to the binomial when $n$ is large enough (say $np > 5$ and $n(1-p) > 5$). The approximation involves two steps, one obvious and one not so obvious:

(1) Instead of using the $B(n, p)$ distribution, use the $N(\mu, \sigma^2)$ distribution, where $\mu = np$ and $\sigma^2 = np(1-p)$.

(2) Use a continuity correction to correct for the use of a continuous distribution to approximate a discrete one.

The need for a continuity correction can be seen in the following situation. Consider an experiment where we toss a fair coin 12 times and observe the number of heads. Suppose we want to compute the probability of obtaining exactly 4 heads. Whereas a discrete random variable can have only a specified value (such as 4), a continuous random variable used to approximate it could take on any value within an interval around that specified value, as demonstrated in this figure:

[Figure: bar chart of the binomial probabilities for 0 to 12 heads, with the approximating normal curve overlaid; vertical axis: Probability. The area under the curve between 3.5 and 4.5 is shaded.]

The goal is to approximate the area of the bar above 4 with an area under the normal curve. Using the normal probability corresponding to $P(X = 4)$ is clearly wrong, since that is 0; by using the area between 3.5 and 4.5 (the shaded area above), the area between the bar and curve missed on the left is roughly made up for on the right. This is the essence of the calculation: whenever the binomial probability for an integer value $k$ is needed, use the area under the corresponding normal curve for $(k - .5,\ k + .5)$.

Example. Let $H$ be the number of heads in 12 flips of a fair coin. What is the probability of observing between 3 and 5 heads, inclusive; that is, $P(3 \le H \le 5)$? In this case we can calculate the value exactly: since

$$P(H = k) = \binom{12}{k} (.5)^k (.5)^{12-k}, \qquad k = 0, 1, 2, \ldots, 12,$$

we have

$$P(3 \le H \le 5) = P(H = 3) + P(H = 4) + P(H = 5) = .05371 + .12085 + .19336 = .36792.$$

To approximate this we first need the mean and standard deviation:

$$\mu = np = (12)(.5) = 6, \qquad \sigma = \sqrt{np(1-p)} = \sqrt{(12)(.5)(.5)} = 1.732.$$

Since $H$ is a binomial random variable, the following statement (based on the continuity correction) is exactly correct:

$$P(3 \le H \le 5) = P(2.5 < H < 5.5).$$

Note that this statement is not an approximation; it is exactly correct! The reason is that we are adding the events $2.5 < H < 3$ and $5 < H < 5.5$ to get from the left side of the equation to the right side, but for the binomial random variable these events have probability zero. The continuity correction is not where the approximation comes in; that comes when we approximate $H$ using a normal distribution with mean $\mu = 6$ and standard deviation $\sigma = 1.732$:

$$\begin{aligned}
P(3 \le H \le 5) &= P(2.5 < H < 5.5) \\
&\approx P\left(\frac{2.5 - 6}{1.732} < Z < \frac{5.5 - 6}{1.732}\right) \\
&= P(-2.02 < Z < -.29) \\
&= P(Z < -.29) - P(Z < -2.02) \\
&= .3859 - .0217 = .3642
\end{aligned}$$
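The exact sum and the continuity-corrected normal approximation for the coin example can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the function names (`normal_cdf`, `binom_exact`, `binom_normal_approx`) are hypothetical helpers introduced here for illustration, and the normal CDF is computed from the error function rather than read from a table.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, P(Z < z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binom_exact(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ B(n, p), summing the probability function."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


def binom_normal_approx(n, p, lo, hi):
    """Normal approximation to P(lo <= X <= hi), using the continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # Continuity correction: widen the integer range by 0.5 on each side.
    z_lo = (lo - 0.5 - mu) / sigma
    z_hi = (hi + 0.5 - mu) / sigma
    return normal_cdf(z_hi) - normal_cdf(z_lo)


# Coin example: 12 flips of a fair coin, between 3 and 5 heads inclusive.
exact = binom_exact(12, 0.5, 3, 5)
approx = binom_normal_approx(12, 0.5, 3, 5)
print(f"exact  = {exact:.5f}")
print(f"approx = {approx:.4f}")
```

The approximate value printed here can differ from the hand calculation in the third or fourth decimal place, because the hand calculation rounds the z-values to two decimals before using the normal table.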

Note that the approximation is only off by about 1%, which is pretty good for such a small sample size.

Example. Suppose that a sample of $n = 1{,}600$ tires of the same type is obtained at random from an ongoing production process in which 8% of all such tires produced are defective. What is the probability that in such a sample not more than 150 tires will be defective?

Answer. We approximate the $B(1600, .08)$ random variable $T$ with a normal, with mean $(1600)(.08) = 128$ and standard deviation $\sqrt{(1600)(.08)(.92)} = 10.85$. The probability calculation is thus

$$P(T \le 150) = P(T < 150.5) \approx P\left(Z < \frac{150.5 - 128}{10.85}\right) = P(Z < 2.07) = .9808.$$

Note that if the question had been about the probability of there being fewer than 150 defective tires, the calculation would have been

$$P(T < 150) = P(T < 149.5) \approx P\left(Z < \frac{149.5 - 128}{10.85}\right) = P(Z < 1.98) = .9761,$$

the difference in these two values (.0047) being the estimated probability of exactly 150 tires with defects.

The normal approximation to the binomial is the underlying principle of an important tool in statistical quality control, the Np chart. Say we have an assembly line that turns out thousands of units per day. Periodically (daily, say), we sample $n$ items from the assembly line, and count up the number of defective items, $D$. What distribution does $D$ have? $B(n, p)$, of course, with $p$ being the probability that a particular item is defective. Thus, examination of $D$ allows us to see if the probability of a defective item is changing over time; that is, whether the process is getting out of control.

Consider the following situation. Our assembly line has been running for a while, and based on this historical data, we've seen that the probability of an individual item being defective is .02 (this would come from sampling and getting the empirical frequency of defectives). Our online quality control system samples 1000 items per day, and counts up the number of defectives $D$. We know that $D \sim B(1000, .02)$, so $E(D) = (1000)(.02) = 20$ and $SD(D) = \sqrt{(1000)(.02)(.98)} = 4.427$. Thus, $D$ is approximately normally distributed with mean $\mu = 20$ and standard deviation $\sigma = 4.427$. This allows us to assess whether future values of $D$ are unusual, by seeing whether they get too far from 20. Here is an example of an Np chart.

[Figure: "NP Chart for In contr"; vertical axis: Sample Count; horizontal axis: Sample Number; center line NP = 20.00; control limits 3.0SL = 33.28 and -3.0SL = 6.718.]

The chart consists of the values for the process plotted with three lines: the expected number of defects in the center, and two control limits at $\mu \pm 3\sigma$ (the number 3 is arbitrary, but standard). Since $D$ is roughly normally distributed, we know that the probability of $D$ being outside the control limits, assuming that $p$ is staying at .02, is $P(|Z| > 3) = .0026$. So in this case, where there are 200 days' worth of data, we're not surprised to see a day or two outside the limits. There are actually three, but this is just random fluctuation; we know that because the process settles down to its correct value immediately. In fact, these data do come from a stable process. Now consider this chart:

[Figure: "NP Chart for Number o"; vertical axis: Sample Count; horizontal axis: Sample Number; center line NP = 20.00; control limits 3.0SL = 33.28 and -3.0SL = 6.718.]

This chart is very different. About 100 days into the sample, the process starts to go out of control. Eventually the $D$ values move outside the control limits, and the process should be stopped and corrected. In fact, for these data, starting with the 101st observation, I increased $p$ steadily by .0003 per day.

Control charts can also be constructed based on other statistics, such as means and standard deviations. They are an integral part of the idea of kaizen, or continuous improvement, that has revolutionized manufacturing around the world in recent years.

Minitab commands

To obtain an Np chart, click on Stat → Control Charts → NP. Enter the variable with the binomial counts in it next to Variable:, and insert the number of binomial trials each observation refers to next to Subgroup size:. If there is a value of p that is known from historical data to correspond to the process being in control, enter that value next to Historical p:.
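Outside Minitab, the center line and control limits of an Np chart can be computed directly from the subgroup size and the historical p. The following is a minimal Python sketch, not part of the original handout and not a reproduction of Minitab's own procedure; the function names (`np_chart_limits`, `out_of_control`) and the short list of daily counts are hypothetical and made up purely for illustration.

```python
from math import sqrt


def np_chart_limits(n, p, width=3.0):
    """Return (lower limit, center line, upper limit) at mu -/+ width * sigma."""
    center = n * p
    sigma = sqrt(n * p * (1 - p))
    return center - width * sigma, center, center + width * sigma


def out_of_control(counts, n, p, width=3.0):
    """Return the 1-based sample numbers whose counts fall outside the limits."""
    lower, _, upper = np_chart_limits(n, p, width)
    return [i for i, d in enumerate(counts, start=1) if d < lower or d > upper]


# Limits for the handout's setting: 1000 items per day, historical p = .02.
lower, center, upper = np_chart_limits(1000, 0.02)
print(f"-3.0SL = {lower:.3f}, NP = {center:.2f}, 3.0SL = {upper:.2f}")

# Hypothetical daily defective counts (illustrative only, not real data).
example_counts = [18, 22, 25, 35, 19, 5, 21]
print(out_of_control(example_counts, 1000, 0.02))
```

With the hypothetical counts above, the fourth and sixth samples are flagged, since 35 lies above the upper limit and 5 lies below the lower one. A single flagged point is not by itself proof of a problem; as the discussion of the first chart shows, an in-control process occasionally lands outside the limits by chance.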