Lecture 5 : The Poisson Distribution




Jonathan Marchini
November 10, 2008

1 Introduction

Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length etc. For example,

- The number of cases of a disease in different towns
- The number of mutations in set sized regions of a chromosome
- The number of dolphin pod sightings along a flight path through a region
- The number of particles emitted by a radioactive source in a given time
- The number of births per hour during a given day

In such situations we are often interested in whether the events occur randomly in time or space. Consider the Babyboom dataset we saw in Lecture 2. The birth times of the babies throughout the day are shown in Figure 1. If we divide the day into 24 one-hour intervals and count the number of births in each hour, we can plot the counts as a histogram, shown in Figure 2. How does this compare to the histogram of counts for a process that isn't random? Suppose the 44 birth times were distributed in time as shown in Figure 3. The histogram of these birth times per hour is shown in Figure 4. We see that the non-random clustering of events in time produces more hours with zero births and more hours with large numbers of births than in the histogram of the real birth times.

This example illustrates that the distribution of counts is useful in uncovering whether the events might occur randomly or non-randomly in time (or space). Simply looking at the histogram isn't sufficient if we want to decide whether the events occur randomly or not. To answer this question we need a probability model for the distribution of counts of random events, which tells us the type of distribution we should expect to see.

Figure 1: The birth times of the babies in the Babyboom dataset

Figure 2: Histogram of birth times per hour of the babies in the Babyboom dataset

Figure 3: The birth times of the non-random (clustered) data

Figure 4: Histogram of birth times per hour of the non-random data

2 The Poisson Distribution

The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space).

If we let X = the number of events in a given interval, then, if the mean number of events per interval is λ, the probability of observing x events in a given interval is given by

$$P(X = x) = e^{-\lambda} \frac{\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, 4, \ldots$$

Note: e is a mathematical constant, e ≈ 2.718282. There should be a button on your calculator, e^x, that calculates powers of e.

If the probabilities of X are distributed in this way, we write

$$X \sim \mathrm{Po}(\lambda)$$

λ is the parameter of the distribution. We say X follows a Poisson distribution with parameter λ.

Note: A Poisson random variable can take on any non-negative integer value (0, 1, 2, ...). In contrast, the Binomial distribution always has a finite upper limit.

2.1 Examples

Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital?

Let X = No. of births in a given hour. Since (i) events occur randomly and (ii) the mean rate is λ = 1.8, we have X ~ Po(1.8).

We can now use the formula to calculate the probability of observing exactly 4 births in a given hour:

$$P(X = 4) = e^{-1.8} \frac{1.8^{4}}{4!} = 0.0723$$

What about the probability of observing 2 or more births in a given hour at the hospital? We want

$$P(X \ge 2) = P(X = 2) + P(X = 3) + \cdots$$

i.e. an infinite number of probabilities to calculate. However, we can use the complement instead:

$$\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) \\
&= 1 - \big(P(X = 0) + P(X = 1)\big) \\
&= 1 - \left(e^{-1.8} \frac{1.8^{0}}{0!} + e^{-1.8} \frac{1.8^{1}}{1!}\right) \\
&= 1 - (0.16529 + 0.29753) \\
&= 0.537
\end{aligned}$$

3 The shape of the Poisson distribution

    # Plot the probability functions of Po(3), Po(5) and Po(10) side by side
    par(mfrow = c(1, 3))
    for (lambda in c(3, 5, 10)) {
      plot(0:20, dpois(0:20, lambda), type = "h", ylim = c(0, 0.25),
           xlab = "X", ylab = "P(X)", main = paste0("Po(", lambda, ")"),
           lwd = 3, cex.lab = 1.5, cex.axis = 2, cex.main = 2)
    }

Using the formula we can calculate the probabilities for a specific Poisson distribution and plot the probabilities to observe the shape of the distribution. For example, Figure 5 shows 3 different Poisson distributions. We observe that the distributions

(i) are unimodal
(ii) exhibit positive skew (that decreases as λ increases)
(iii) are centred roughly on λ
(iv) have a variance (spread) that increases as λ increases

4 Mean and Variance of the Poisson distribution

The mean μ, the standard deviation σ and the variance σ² of a Poisson distribution are all given by simple formulas. If X ~ Po(λ) then

$$\mu = \lambda, \qquad \sigma = \sqrt{\lambda}, \qquad \sigma^{2} = \lambda$$
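The worked examples in Section 2.1 and the mean and variance formulas above can be checked numerically. The sketch below is an illustration rather than part of the original notes: it uses base R's `dpois` and `ppois`, and the variable names are chosen here for illustration only. The sums for the mean and variance are truncated at x = 100, an assumption that is harmless for λ = 1.8 because the remaining tail probability is negligible.

```r
# Births per hour at the hospital: X ~ Po(1.8)
lambda <- 1.8

# P(X = 4), once directly from the formula and once via dpois()
p_formula   <- exp(-lambda) * lambda^4 / factorial(4)
p_exactly_4 <- dpois(4, lambda)

# P(X >= 2) = 1 - P(X <= 1), using the cumulative distribution function
p_at_least_2 <- 1 - ppois(1, lambda)

# Mean and variance computed from the probability function
# (assumption: truncating the infinite sum at x = 100 is accurate enough)
x      <- 0:100
px     <- dpois(x, lambda)
mean_x <- sum(x * px)
var_x  <- sum((x - mean_x)^2 * px)

print(round(c(p_formula, p_exactly_4, p_at_least_2), 4))
print(c(mean = mean_x, variance = var_x, sd = sqrt(var_x)))
```

The two probabilities should agree with the values 0.0723 and 0.537 obtained by hand, and the computed mean and variance should both be very close to λ = 1.8.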

5 Changing the size of the interval

Suppose we know that births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability that we observe 5 births in a given 2 hour interval?

If births occur randomly at a rate of 1.8 births per 1 hour interval, then births occur randomly at a rate of 3.6 births per 2 hour interval.

Let Y = No. of births in a 2 hour period. Then Y ~ Po(3.6) and

$$P(Y = 5) = e^{-3.6} \frac{3.6^{5}}{5!} = 0.13768$$

This example illustrates the following rule:

If X ~ Po(λ) on a 1 unit interval, then Y ~ Po(kλ) on k unit intervals.

Figure 5: Three different Poisson distributions.

6 Sum of two Poisson variables

Now suppose we know that in hospital A births occur randomly at an average rate of 2.3 births per hour and in hospital B births occur randomly at an average rate of 3.1 births per hour. What is the probability that we observe 7 births in total from the two hospitals in a given 1 hour period?

To answer this question we can use the following rule:

If X ~ Po(λ1) on a 1 unit interval and Y ~ Po(λ2) on a 1 unit interval, and X and Y are independent, then X + Y ~ Po(λ1 + λ2) on a 1 unit interval.

So if we let X = No. of births in a given hour at hospital A and Y = No. of births in a given hour at hospital B, then X ~ Po(2.3), Y ~ Po(3.1) and X + Y ~ Po(5.4), so

$$P(X + Y = 7) = e^{-5.4} \frac{5.4^{7}}{7!} = 0.11999$$

7 Using the Poisson to approximate the Binomial

The Binomial and Poisson distributions are both discrete probability distributions. In some circumstances the distributions are very similar. For example, consider the Bin(100, 0.02) and Po(2) distributions shown in Figure 6. Visually these distributions are almost identical.

In general, if n is large (say > 50) and p is small (say < 0.1) then a Bin(n, p) can be approximated with a Po(λ) where λ = np.

The idea of using one distribution to approximate another is widespread throughout statistics and is one we will meet again. In many situations it is extremely difficult to use the exact distribution and so approximations are very useful.

Example: Given that 5% of a population are left-handed, use the Poisson distribution to estimate the probability that a random sample of 100 people contains 2 or more left-handed people.

Let X = No. of left-handed people in a sample of 100. Then X ~ Bin(100, 0.05).

Poisson approximation: X ~ Po(λ) with λ = 100 × 0.05 = 5.

We want P(X ≥ 2):

$$\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) \\
&= 1 - \big(P(X = 0) + P(X = 1)\big) \\
&\approx 1 - \left(e^{-5} \frac{5^{0}}{0!} + e^{-5} \frac{5^{1}}{1!}\right) \\
&\approx 1 - 0.040428 \\
&\approx 0.959572
\end{aligned}$$
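The interval rule, the sum rule and the Poisson approximation in Sections 5 to 7 can be checked in the same way. This is a minimal sketch, not part of the original notes: it assumes base R only, the variable names are illustrative, and the sum rule is checked under the assumption that the two hospitals are independent. The last lines compare the Poisson approximation with the exact Binomial probability from `pbinom`.

```r
# Section 5: rate 1.8 per hour, so 2 * 1.8 = 3.6 per 2-hour interval
p_five_in_two_hours <- dpois(5, 2 * 1.8)

# Section 6: independent hospitals A and B, so X + Y ~ Po(2.3 + 3.1)
p_seven_total <- dpois(7, 2.3 + 3.1)

# The same probability, summing over every split of the 7 births between A and B
p_seven_by_convolution <- sum(dpois(0:7, 2.3) * dpois(7:0, 3.1))

# Section 7: X ~ Bin(100, 0.05), approximated by Po(100 * 0.05)
n <- 100
p <- 0.05
p_approx <- 1 - ppois(1, n * p)   # Poisson approximation to P(X >= 2)
p_exact  <- 1 - pbinom(1, n, p)   # exact Binomial P(X >= 2)

print(round(c(p_five_in_two_hours, p_seven_total, p_seven_by_convolution), 5))
print(round(c(approx = p_approx, exact = p_exact), 5))
```

The convolution line is there only to show that the sum rule gives the same answer as adding up the joint probabilities term by term.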

Figure 6: A Binomial and Poisson distribution that are very similar.

If we use the exact Binomial distribution we get the answer 0.96292.

8 Fitting a Poisson distribution

Consider the two sequences of birth times we saw in Section 1. Both of these examples consisted of a total of 44 births in 24 one-hour intervals. Therefore the mean birth rate for both sequences is 44/24 = 1.8333 births per hour.

What would be the expected counts if birth times were really random, i.e. what is the expected histogram for a Poisson random variable with mean rate λ = 1.8333?

Using the Poisson formula we can calculate the probabilities of obtaining each possible value (see Footnote 1):

x           0        1        2        3        4        5        ≥6
P(X = x)    0.15989  0.29312  0.26869  0.16419  0.07525  0.02759  0.01127

Footnote 1: In practice we group values with low probability into one category. Here the final column collects all values of 6 or more.

Then, if we observe 24 one-hour intervals, we can calculate the expected frequencies as 24 × P(X = x) for each value of x:

x                                     0      1      2      3      4      5      ≥6
Expected frequency, 24 × P(X = x)     3.837  7.035  6.448  3.941  1.806  0.662  0.271

We say we have fitted a Poisson distribution to the data.

This consisted of 3 steps:

(i) Estimating the parameters of the distribution from the data
(ii) Calculating the probability distribution
(iii) Multiplying the probability distribution by the number of observations

Once we have fitted a distribution to the data we can compare the expected frequencies to those we actually observed from the real Babyboom dataset. We see that the agreement is quite good.

x           0      1      2      3      4      5      ≥6
Expected    3.837  7.035  6.448  3.941  1.806  0.662  0.271
Observed    3      8      6      4      3      0      0

When we compare the expected frequencies to those observed from the non-random clustered sequence in Section 1, we see that there is much less agreement.

x           0      1      2      3      4      5      ≥6
Expected    3.837  7.035  6.448  3.941  1.806  0.662  0.271
Observed    12     3      0      2      2      4      1

In Lecture 7 we will see how we can formally test for a difference between the expected and observed counts. For now it is enough just to know how to fit a distribution.
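The three fitting steps can be sketched in R as follows. This is an illustration under stated assumptions, not the code used to produce the tables: the observed counts are copied from the two tables above, the last category is treated as "6 or more" so that the probabilities sum to 1, and the names `observed_real`, `observed_clustered` and `comparison` are hypothetical.

```r
# Observed number of hours with 0, 1, ..., 5 and 6-or-more births
observed_real      <- c(3, 8, 6, 4, 3, 0, 0)
observed_clustered <- c(12, 3, 0, 2, 2, 4, 1)

# (i) Estimate the parameter: 44 births in 24 one-hour intervals
n_intervals <- 24
lambda_hat  <- 44 / n_intervals

# (ii) Probability of each category; the last one is P(X >= 6)
#      (assumption: values of 6 or more are grouped into one category)
probs <- c(dpois(0:5, lambda_hat), 1 - ppois(5, lambda_hat))

# (iii) Expected frequencies = number of intervals * probability
expected <- n_intervals * probs

comparison <- data.frame(
  x         = c(0:5, ">=6"),
  expected  = round(expected, 3),
  real      = observed_real,
  clustered = observed_clustered
)
print(comparison)
```

Because the sketch uses the unrounded rate 44/24 rather than 1.8333, the expected frequencies may differ from the tabulated ones in the third decimal place.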