Normal Approximation to Binomial Distributions

Charlie Vollmer
Department of Statistics, Colorado State University, Fort Collins, CO
charlesv@rams.colostate.edu
September 18, 2015

Abstract

This document is a supplement to class lectures for STAT 307-003, Fall 2015. It details how the Normal Distribution can approximate the Binomial Distribution as the number of trials, n, gets large. How large does n need to be? How well does the Normal Distribution approximate a Binomial Distribution? Let us find out...

1 Setup: Defining some terms

1.1 Expected Value

If we go to Wikipedia, the following is the very first sentence that we'll see:

"In probability theory, the expected value of a random variable is intuitively the long-run average value of repetitions of the experiment it represents."

Great! It is simply the average we would expect to see if we did something over and over and over again! And if we go down a few more sentences on the Wikipedia page, we find something even more useful:

"The expected value is also known as the expectation, mathematical expectation, EV, mean, or first moment."

Bam! Look at that fourth synonym: the mean! That is exactly the value my results would average out to if I did an experiment over and over and over lots of times!

Note: if a Random Variable is Binomially Distributed, its mean is np, where n is the number of trials and p is the probability of success on each trial.
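To make the "long-run average" idea concrete, here is a minimal simulation sketch. It is not part of the original notes: the function names (count_heads, average_heads), the seed, and the number of repetitions are all assumptions chosen for illustration. It tosses a coin n times, repeats that many times, and compares the average number of heads with np.

```python
import random


def count_heads(n, p, rng):
    """Toss a coin n times; return how many tosses came up heads."""
    return sum(1 for _ in range(n) if rng.random() < p)


def average_heads(n, p, repetitions, seed=0):
    """Repeat the n-toss experiment and return the average number of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = sum(count_heads(n, p, rng) for _ in range(repetitions))
    return total / repetitions


if __name__ == "__main__":
    n, p = 100, 0.5
    print("theoretical mean np:", n * p)
    print("simulated average:  ", average_heads(n, p, repetitions=2000))
```

With a couple of thousand repetitions the simulated average should land close to np, though any single run will differ slightly from it.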

1.2 Standard Error, SE

If we go to Wikipedia, the following are the first two sentences that we'll see:

"The standard error (SE) is the standard deviation of the sampling distribution of a statistic, most commonly of the mean. The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate."

OK, this is wordy, but it's actually very accurate and descriptive. It's just saying that the standard error, SE, is the standard deviation of our statistic. So... if our statistic is

$$S_n = \sum_{i=1}^{n} X_i,$$

where $X_i$ is simply a 1 or 0, in the case of a coin toss (heads or tails), then the SE of this statistic is its standard deviation.

Now, we know from class that this statistic, $S_n$, is a Binomially Distributed Random Variable (it follows a Binomial Distribution). In the case of a binomial, we (humans... and now you, too!) know that the variance of a Binomially Distributed Random Variable is simply $npq$, where $q = 1 - p$.

Do you remember how to find the standard deviation from the variance? You take the square root. Well, if you need the SE, it's just the standard deviation. So, now that we know how to get the variance of a binomial, we have the standard deviation or, in other words, the standard error: $SE = \sqrt{npq}$.
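The relationship between the variance npq and the standard error can be checked numerically. The sketch below is an illustration rather than anything from the lectures: binomial_se and simulated_sd are hypothetical helper names, and the seed and repetition count are arbitrary. It computes the square root of npq and compares it with the standard deviation of simulated sums of 0/1 coin tosses.

```python
import math
import random
import statistics


def binomial_se(n, p):
    """Standard error of S_n: the square root of the variance n*p*q."""
    q = 1 - p
    return math.sqrt(n * p * q)


def simulated_sd(n, p, repetitions, seed=0):
    """Standard deviation of S_n = X_1 + ... + X_n over many simulated runs."""
    rng = random.Random(seed)
    sums = [
        sum(1 for _ in range(n) if rng.random() < p)  # each X_i is 1 or 0
        for _ in range(repetitions)
    ]
    return statistics.pstdev(sums)


if __name__ == "__main__":
    print("sqrt(npq):   ", binomial_se(100, 0.5))
    print("simulated SD:", simulated_sd(100, 0.5, repetitions=2000))
```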

2 The Approximation: Toss a coin 100 times

The object of this section is to illustrate that the histogram of the number of heads from repeated coin tosses is well approximated by a Normal Distribution as the number of tosses, n, gets large.

For instance, say that we toss a fair coin 100 times and see how many times we get heads. We could do this and get 88 heads (extremely unlikely, but possible). We could also do it and get 45 heads. Let's say we do the entire experiment (toss the coin 100 times) 50 times. Thus, we'll get 50 numbers, one per experiment. Let's see what that plot looks like:

[Figure: histogram of the 50 results; x-axis: Number of Heads, y-axis: count]

It looks like in one experiment we got 40 heads and in another we got 41 heads. In another experiment we got 61 heads. Then again, in 6 experiments we got 48 heads, and in 6 more experiments we got 59 heads. You get the picture.

So, in this situation, we only did this experiment (toss a coin 100 times) 50 times, and the plot above shows our results from those 50 experiments. What happens if we did this experiment 100 times? Or a thousand times? Let us see...
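The 50-experiment setup described above can be sketched in a few lines. This is an assumption-laden illustration, not the code behind the original plot: run_experiments and text_histogram are made-up names, the seed is arbitrary, and the "plot" is just rows of asterisks, so the counts it prints will not match the figure.

```python
import random
from collections import Counter


def run_experiments(num_experiments, n=100, p=0.5, seed=0):
    """Run the n-toss experiment num_experiments times; return the head counts."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n) if rng.random() < p)
        for _ in range(num_experiments)
    ]


def text_histogram(head_counts):
    """Render one row per number of heads, with one '*' per experiment."""
    tally = Counter(head_counts)
    rows = []
    for heads in range(min(tally), max(tally) + 1):
        rows.append(f"{heads:4d} | " + "*" * tally[heads])
    return "\n".join(rows)


if __name__ == "__main__":
    results = run_experiments(50)
    print(text_histogram(results))
```

Changing the first argument of run_experiments from 50 to 500 or 5000 repeats the experiment more often without changing n, which is the distinction the next part of the notes draws.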

Below, we see what happens when we do this experiment 500, 1000, and 5000 times, next to the original 50:

[Figure: four histograms of the number of heads in 100 tosses; panels: fifty, five_hundred, thousand, five_thousand]

Whoa! We see that our histograms start to look like a bell curve! Clearly, this is no coincidence! This is because a Binomial Random Variable begins to look like a Normally Distributed Random Variable as the number of trials, n, grows large!

Careful!! Take notice that we did NOT increase n yet, only the number of times that we did the experiment! Repeating the experiment more often only fills in the shape of the distribution for n = 100 more smoothly. So, now if we increase n, we would expect to see this bell-shaped curve actually start to look more and more like a Normal Distribution. As of now, you can notice that it doesn't quite look like a Normal Distribution, but rather just a similar-looking curve.
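One way to see what "more experiments" buys us, as opposed to "more tosses", is to measure how far the simulated histogram sits from the exact Binomial(100, 1/2) probabilities. The sketch below is only an illustration under assumptions: binomial_pmf and max_gap_to_pmf are hypothetical names, and the largest absolute gap between observed frequency and exact probability is just one convenient yardstick.

```python
import math
import random
from collections import Counter


def binomial_pmf(k, n, p):
    """Exact probability of k heads in n tosses."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap_to_pmf(num_experiments, n=100, p=0.5, seed=0):
    """Largest gap between observed frequencies and the exact pmf."""
    rng = random.Random(seed)
    tally = Counter(
        sum(1 for _ in range(n) if rng.random() < p)
        for _ in range(num_experiments)
    )
    return max(
        abs(tally[k] / num_experiments - binomial_pmf(k, n, p))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for num_experiments in (50, 500, 1000, 5000):
        gap = max_gap_to_pmf(num_experiments)
        print(f"{num_experiments:5d} experiments: max gap = {gap:.4f}")
```

The gap generally shrinks as the number of experiments grows, but the target it shrinks toward is still the n = 100 binomial, not an exact Normal curve.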

3 The Approximation: Toss a coin 1000 times

Now, we do the same thing as above, but each experiment is tossing the coin 1000 times. What do you think this does to the Expected Value?

Ponder this question: Is it easier to get all heads if I only toss the coin 10 times? Would it be harder to get all heads if I tossed the coin 1000 times? These questions have us think about the expected value and the standard error. As we do more and more trials, do we expect the mean of our sample to get closer to the true mean more often?

So, let's do the experiment where we toss the coin 1,000 times. And let's do this experiment 50 times, as we did before. These are our results:

[Figure: histogram of the 50 results; x-axis: Number of Heads, y-axis: count]

And we see that it's centered around 500 heads, as per our intuition of the outcome, and goes from around 450 heads in some experiments to about 550 heads in others. Does it look, upon quick glance, about the same as our first plot?
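For readers who want a number to go with the all-heads question, here is a short worked calculation. It assumes a fair coin and independent tosses, which is the setting used throughout these notes.

```latex
P(\text{all heads in } n \text{ tosses}) = \left(\tfrac{1}{2}\right)^{n}
\qquad\Longrightarrow\qquad
\begin{aligned}
n = 10:   &\quad \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024} \approx 0.00098, \\
n = 1000: &\quad \left(\tfrac{1}{2}\right)^{1000} \approx 9.3 \times 10^{-302}.
\end{aligned}
```

So extreme outcomes become dramatically less likely as n grows, which is one way of saying that the results concentrate around the expected value.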

4 Examine the difference between n = 100 and n = 1000:

As per the first section of these notes, we know what the variance of a Binomially Distributed Random Variable is: $npq$. So, if we look at our two different situations, we see that our variances are:

$$\mathrm{Var}(S_n) = npq = 100 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 25$$

in our first context of n = 100, and we have:

$$\mathrm{Var}(S_n) = npq = 1000 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 250$$

in our second context of n = 1000.

Careful! What we care about is our standard error, SE! We actually have that our standard errors are:

$$\sqrt{\mathrm{Var}(S_n)} = \sqrt{npq} = \sqrt{100 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = \sqrt{25} = 5$$

in our first context of n = 100, and we have:

$$\sqrt{\mathrm{Var}(S_n)} = \sqrt{npq} = \sqrt{1000 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = \sqrt{250} \approx 15.8$$

in our second context of n = 1000.

Take a second to examine this further... this is actually striking! We know that most (about 95%) of our data will lie within 2 standard deviations (standard errors) of the mean in the context of a Normal Distribution. And here, that means in our first context that most will lie between 40 and 60 heads, while it will be between roughly 470 and 530 in the second.

However!... An interval of length 20 is actually 20% of the possible values in the first context (since we could get anywhere between 0 and 100 heads in 100 coin tosses), and an interval of length about 60 is only about 6% of the possible values in the second context (since we could get anywhere between 0 and 1000 heads if we flip a coin 1000 times). That means our distribution is MUCH tighter about the mean, relative to the range of possible values, when we made 1000 tosses (as n got larger) than when we only made 100 tosses.
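The two-standard-error arithmetic above is easy to reproduce. The helper below, two_se_interval, is a hypothetical name introduced only for this sketch; it assumes the "mean plus or minus 2 SE" rule of thumb from the text and reports the interval together with its length as a fraction of the n possible values.

```python
import math


def two_se_interval(n, p=0.5):
    """Return (low, high, fraction) for the interval mean +/- 2*SE.

    fraction is the interval length divided by n, the range of possible
    head counts.
    """
    mean = n * p
    se = math.sqrt(n * p * (1 - p))
    low, high = mean - 2 * se, mean + 2 * se
    return low, high, (high - low) / n


if __name__ == "__main__":
    for n in (100, 1000):
        low, high, fraction = two_se_interval(n)
        print(f"n = {n}: {low:.1f} to {high:.1f} heads, "
              f"{fraction:.1%} of the possible values")
```

Because the SE grows like the square root of n while the range grows like n, the fraction keeps shrinking as n increases.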

5 Visualize 1000 tosses:

Let's see what it looks like when we do the 1000-toss experiment many times. Below are the results for 50, 500, 1000, and 5000 experiments of 1000 tosses:

[Figure: four histograms of the number of heads in 1000 tosses; panels: fifty, five_hundred, thousand, five_thousand]

The important thing to look at is the five-thousand-experiment plot in the lower-right corner. If we compare this to the same plot in the previous 100-toss experiment, this should look more similar to a Normal Distribution. Let's see as n gets even larger...

6 As n gets larger and larger:

We see what happens when n = 10,000 below:

[Figure: histogram; x-axis: Number of Heads (ticks from 498000 to 502000), y-axis: count]

And again for n = 100,000:

[Figure: histogram; x-axis: Number of Heads (ticks from 497000 to 502000), y-axis: count]

And this looks pretty Normal to me!

Note: In fact... we can check that this is extremely close to a Normal Curve.
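One possible way to run such a check, offered here as a sketch rather than as the method used in class, is to compare the exact binomial cumulative probabilities with those of a Normal curve that has the same mean and standard deviation. The names binomial_pmf, normal_cdf, and max_cdf_gap are assumptions, as is the use of a continuity correction (evaluating the Normal curve at k + 0.5).

```python
import math


def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability, computed with logs to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def normal_cdf(x, mean, sd):
    """P(Z <= x) for a Normal distribution with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def max_cdf_gap(n, p=0.5):
    """Largest gap between P(S_n <= k) and its Normal approximation."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        approx = normal_cdf(k + 0.5, mean, sd)  # continuity correction
        worst = max(worst, abs(cumulative - approx))
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n = {n:5d}: largest CDF gap = {max_cdf_gap(n):.6f}")
```

For a fair coin the largest gap gets smaller as n increases, which is the numerical counterpart of the histograms looking more and more Normal.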

7 Is n = 1000 a good Approximation?

If we perform the n = 1,000 experiment many, many times, we can actually get a good idea of how well it is approximated by a Normal Distribution. We plot the 1000-toss experiment done 100,000 times below:

[Figure: histogram of the 100,000 results; x-axis: heads]

What does this show us? Well... if we have a random variable that follows a Binomial Distribution where n is at least 1,000 (and p is not too close to 0 or 1, as with our fair coin), we find that it is almost a Normal Distribution! This is a very important discovery of ours!

Careful! Recall that a Normal Distribution is defined by two things: its mean and variance. If that's all we need, the mean and variance... well, we're gold! We have both of those things!
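Since the mean np and the variance npq are all we need, we can build the approximating Normal curve and use it to estimate a binomial probability. The sketch below does this for the chance of seeing between 470 and 530 heads in 1000 tosses and compares the estimate with the exact binomial sum. The function names and the continuity correction (widening the interval by 0.5 on each side) are assumptions made for this illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability, computed with logs to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def normal_cdf(x, mean, variance):
    """P(Z <= x) for a Normal distribution with the given mean and variance."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * variance)))


def exact_prob(low, high, n, p):
    """Exact P(low <= S_n <= high) by summing binomial probabilities."""
    return math.fsum(binomial_pmf(k, n, p) for k in range(low, high + 1))


def normal_prob(low, high, n, p):
    """Normal estimate of P(low <= S_n <= high) using mean np, variance npq."""
    mean, variance = n * p, n * p * (1 - p)
    # Continuity correction: treat the integer k as the interval [k-0.5, k+0.5].
    upper = normal_cdf(high + 0.5, mean, variance)
    lower = normal_cdf(low - 0.5, mean, variance)
    return upper - lower


if __name__ == "__main__":
    n, p = 1000, 0.5
    print("exact binomial:", exact_prob(470, 530, n, p))
    print("normal approx: ", normal_prob(470, 530, n, p))
```

The two printed values should agree to several decimal places for this n and p; the agreement is typically worse for small n or for p near 0 or 1.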