Negative Binomial. Paul Johnson and Andrea Vieux. June 10, 2013




1 Introduction

The Negative Binomial is a discrete probability distribution, one that can be used for counts or other integer-valued variables. It gives probabilities for integer values from 0 to infinity. It is sometimes also known as the Pascal distribution or as Polya's distribution.

The Negative Binomial is best thought of as a variant of the Binomial distribution. The reader will recall that a Bernoulli trial is a coin-flip experiment, one that returns yes or no, failure or success. The Binomial distribution describes the number of successes out of a given number of Bernoulli trials. As such, its values are defined from 0 up to the number of trials. The Negative Binomial describes the same sequence of Bernoulli trials that is described by the Binomial distribution. The difference is what is being described. Whereas the Binomial tracks the number of successes, the Negative Binomial represents the number of failures that occur at the beginning of the sequence as we wait for a given number of successes to be achieved.

The most frequent use of the Negative Binomial model has nothing to do with the property that it describes the chances of a string of failures. Rather, it is used to describe distributions that represent counts of events. It is a replacement for the commonly used Poisson distribution, which is often considered the default distribution for integer-valued count data. The Poisson has the property that its expected value equals its variance, a property that is not found in some empirically observed distributions. The Negative Binomial, by contrast, allows the variance to exceed the mean.

2 Mathematical Description

Recall that the Binomial distribution gives the probability of observing r successes out of N trials when the probability of observing a success is fixed at p:

    B(r|N) = (N choose r) p^r (1-p)^(N-r) = N!/(r!(N-r)!) p^r (1-p)^(N-r)    (1)

The symbol (N choose r) is the binomial coefficient (from intermediate algebra) and it is spoken out loud as "N choose r", meaning the number of ways one can choose subsets of r from a larger set of N:

    (N choose r) = N!/(r!(N-r)!)    (2)

Thinking of one particular sequence of the N Bernoulli trials, one that has r successes, we can easily calculate the probabilities. The chance of observing one particular sequence, S, S, F, S, F, ..., S, will be p·p·(1-p)·p·(1-p)·...·p. Regrouping indicates that the probability of that particular sequence is p^r (1-p)^(N-r). That is the probability of one particular sequence that has r successes. Put together the chances of all such sequences, of which there are (N choose r), and the formula in (1) should be clearly understood.

The Negative Binomial is defined with reference to the Binomial. The R help page for the rnbinom function describes it as the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached (R documentation).
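That waiting-time definition can be checked directly by simulating Bernoulli trials. The sketch below is my own (in Python rather than the R used later in these notes); it counts failures before the r-th success and compares the empirical frequencies with the pmf comb(x+r-1, r-1) p^r (1-p)^x derived in the next section. The function name and the choices r = 3, p = 0.5 are mine.

```python
import random
from math import comb

random.seed(42)
r, p, trials = 3, 0.5, 50000

def failures_before_rth_success(r, p):
    """Run Bernoulli(p) trials until the r-th success; return the failure count."""
    successes = failures = 0
    while successes < r:
        if random.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

# Tally the simulated failure counts.
counts = {}
for _ in range(trials):
    x = failures_before_rth_success(r, p)
    counts[x] = counts.get(x, 0) + 1

# Compare empirical frequencies with the analytic pmf for the first few x.
for x in range(5):
    empirical = counts.get(x, 0) / trials
    exact = comb(x + r - 1, r - 1) * p**r * (1 - p)**x
    print(x, round(empirical, 3), round(exact, 3))
```

With 50,000 replications the empirical and exact columns should agree to roughly two decimal places.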

We found the description on the MathWorld web site (http://mathworld.wolfram.com/NegativeBinomialDistribution.html) to have more intuitive appeal: the Negative Binomial distribution gives the probability of r-1 successes and x failures in x+r-1 trials, together with a success on the (x+r)th trial.

Take the first part of the definition, r-1 successes out of x+r-1 trials. By the Binomial distribution, that is

    Prob(r-1 | x+r-1) = (x+r-1)!/((r-1)! x!) p^(r-1) (1-p)^x    (3)

The probability of observing a success on the last trial, the (x+r)th trial, is p. So, by the multiplication rule of probabilities, we multiply equation (3) by p to obtain

    NB(x|r, p) = (x+r-1)!/((r-1)! x!) p^r (1-p)^x    (4)

Or, if you prefer to write it with the binomial coefficient,

    NB(x|r, p) = (x+r-1 choose r-1) p^r (1-p)^x    (5)

3 Central Tendency and Dispersion

If x has a Negative Binomial distribution, then x can take on integer values ranging from 0 to infinity. The upper limit is infinity because we let the process run, accumulating failures, until there are exactly r-1 successes, and then conduct one additional trial on which we obtain a success; the number of failures accumulated along the way is unbounded. The expected value and variance are:

    E[x] = r(1-p)/p    (6)

    Var(x) = r(1-p)/p^2    (7)

and the skewness and kurtosis are

    Skewness(x) = (2-p)/sqrt((1-p)r)

    Kurtosis(x) = (p^2 - 6p + 6)/(r(1-p))

These are worth considering because of their dependence on r. As one would expect, the expected number of failures is increasing in r, the target number of successes. However, the important effects of r are seen in the variance and skewness. As r increases, the variance increases, but the skewness shrinks. Hence, for small values of r, we should expect to see a very skewed distribution, whereas for higher values of r, the distribution should be more symmetrical.

4 Illustrations

The following code is for Figure 1, which displays example distributions as a function of a target number of successes and probabilities for successes on individual trials.
Recall that the probabilities indicate the chances of observing a given number of failures after observing a given number of successes.

    drawbinom <- function(S, p) {
        x <- seq(0, 120)
        y <- dnbinom(x, size = S, prob = p)
        mytitle <- paste("x", S, p)

        plot(x, y, type = "s", main = mytitle,
             xlab = paste("number of failures before", S, "successes"))
    }

    par(mfcol = c(3, 2))
    sizes <- c(2, 4, 6)
    pvals <- c(0.33, 0.67)
    for (i in 1:2) {
        for (j in 1:3) {
            drawbinom(sizes[j], pvals[i])
        }
    }

In Figure 1 we can see how changes in the size and probability alter the Negative Binomial distribution. The size moves the graph right to left, and the probability widens or shrinks the bell shape. For a graph which resembles the Normal distribution, refer to Figure 2.

5 The Negative Binomial as a Mixture Distribution

From J. Scott Long's book (Citation??): "The negative binomial distribution is a useful tool in the analysis of count data. It differs from the Poisson distribution in that it allows for the conditional variance to exceed the conditional mean."

The negative binomial distribution is the end result of a process which begins with a Poisson distribution with mean λ. The fixed λ is replaced with a random variable that has a mean of λ but nonzero variance. With this new framework, we are representing the possibility that there is unobserved heterogeneity.

Recall that the definition of the Poisson gives the probability of observing x events as a function of a parameter λ:

    Poisson(x|λ) = λ^x e^(-λ) / x!    (8)

Instead of thinking of λ as a constant, think of it as a random variable, representing the fact that the average count is likely to vary across experiments (counties, cities, etc.). If we think of λ as a variable, then the probability of observing x has to be the result of a 2-part process. Part 1 chooses λ and part 2 chooses x. That means there are possibly many different routes through which we can observe a particular value x, since λ can vary from 0 to infinity, and for each possible value of λ, there is some chance of observing any positive value:

    P(x) = ∫ P(x|λ) p(λ) dλ    (9)

If one chooses just the right distribution for λ, then the mixture distribution will be Negative Binomial.
Suppose we take the lucky guess that λ has a Gamma probability distribution,

    Prob(λ) = 1/(β^α Γ(α)) λ^(α-1) e^(-λ/β)    (10)

The shape parameter is α and the scale parameter is β; the expected value is αβ and the variance is αβ^2. Inserting the given distributions into (9), we are able to calculate the marginal distribution of x.
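Before attempting the integral, the two-stage process can be simulated to see the promised overdispersion: draw λ from the Gamma, then draw a Poisson count given that λ. The sketch below is my own (Python rather than R; `random.gammavariate(shape, scale)` matches the α, β parameterization above, and the Poisson sampler is Knuth's standard multiplication method, not anything from these notes).

```python
import math
import random

random.seed(1)
alpha, beta = 3.0, 2.0   # Gamma shape and scale: E[lambda] = 6, Var[lambda] = 12

def poisson_draw(mu):
    """Knuth's multiplication method; adequate for small mu."""
    L, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

draws = []
for _ in range(20000):
    lam = random.gammavariate(alpha, beta)   # stage 1: draw the Poisson mean
    draws.append(poisson_draw(lam))          # stage 2: draw the count given lambda

m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
print(round(m, 2), round(v, 2))  # mean near alpha*beta = 6; variance near 6 + 12 = 18
```

The sample mean stays near E[λ] = αβ, but the sample variance is far larger than the mean, which is exactly the feature a plain Poisson cannot produce.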

[Figure 1: Negative Binomial. Six panels plot the probability of the number of failures before S successes, for S = 2, 4, 6 (rows) and p = 0.33, 0.67 (columns).]

Figure 2: Negative Binomial 2

    x7 <- seq(0, 100)
    y7 <- dnbinom(x7, size = 50, prob = 0.5)
    plot(y7, type = "s")
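The Normal resemblance in Figure 2 can be quantified by comparing the Negative Binomial pmf with a Normal density of matching mean r(1-p)/p and variance r(1-p)/p^2. This Python sketch is my own check, not part of the original notes; the function names and the grid 0..100 are mine.

```python
import math

def nb_pmf(x, r, p):
    """Probability of x failures before the r-th success."""
    return math.comb(x + r - 1, r - 1) * p**r * (1 - p)**x

r, p = 50, 0.5
mu = r * (1 - p) / p                      # mean = 50
sigma = math.sqrt(r * (1 - p) / p**2)     # sd = 10

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Largest pointwise gap between the pmf and the moment-matched Normal density.
gap = max(abs(nb_pmf(x, r, p) - normal_pdf(x, mu, sigma)) for x in range(0, 101))
print("max gap:", gap)
```

With size = 50 the skewness (2-p)/sqrt((1-p)r) is only 0.3, so the gap is small relative to the peak density of about 0.04, consistent with the bell shape in the figure.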

    P(x) = ∫ [e^(-λ) λ^x / x!] [1/(β^α Γ(α))] λ^(α-1) e^(-λ/β) dλ    (11)

         = 1/(x! β^α Γ(α)) ∫ e^(-λ) λ^x λ^(α-1) e^(-λ/β) dλ    (12)

         = 1/(x! β^α Γ(α)) ∫ λ^(x+α-1) e^(-λ) e^(-λ/β) dλ    (13)

         = 1/(x! β^α Γ(α)) ∫ λ^(x+α-1) e^(-λ(1+1/β)) dλ    (14)

I have tried to simplify this through brute force, but have been unable to do so. I feel sure there is a specialized result in calculus that will help, namely the Gamma function, which would be adapted to this case as:

    Γ(x+α) = ∫ λ^(x+α-1) e^(-λ) dλ    (15)

And the end result, the Negative Binomial distribution, is stated either as

    P(x) = Γ(α+x)/(x! Γ(α)) · β^x/(1+β)^(α+x)    (16)

or

    P(x) = Γ(α+x)/(x! Γ(α)) (1/(1+β))^α (β/(1+β))^x    (17)

This is the form of the Negative Binomial that was discussed earlier. I have been unable to find the brute force method to make the jump from (14) to (17), but in Gelman, Carlin, Stern, and Rubin's Bayesian Data Analysis, 2ed (p. 53), an alternative derivation is offered. Recall Bayes's theorem,

    p(λ|x) = p(x|λ) p(λ) / p(x)    (18)

Rearrange that:

    p(x) = p(x|λ) p(λ) / p(λ|x)    (19)

The left hand side is our target, the marginal distribution of x. We already know formulae for the two components in the numerator, but the denominator requires a result from probability theory. Most statistics books, and all Bayesian statistics books, will have a result about updating from a Poisson distribution. If x is a Poisson variable with parameter λ, and the conjugate prior for λ is the Gamma distribution Gamma(λ|α, β) (note that Gelman et al parameterize the Gamma by its rate β, so the density is proportional to λ^(α-1) e^(-βλ)), then Bayes's theorem (expression 18) leads to the conclusion that if x events are observed, then

    p(λ|x) = Gamma(λ|α+x, β+1)    (20)

That is the last ingredient we need to solve expression (19). Gelman et al put it this way:

    p(x) = Poisson(x|λ) Gamma(λ|α, β) / Gamma(λ|α+x, β+1)    (21)

Writing in the expressions obtained above,

    p(x) = [λ^x e^(-λ)/x!] [β^α/Γ(α) λ^(α-1) e^(-βλ)] / [(1+β)^(α+x)/Γ(α+x) λ^(α+x-1) e^(-(1+β)λ)]    (22)
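Although the brute-force integration step was left unresolved above, the claimed result can at least be confirmed numerically: integrate the Poisson × Gamma product in (11) over a fine grid of λ and compare it with the closed form (17), then check that the substitution β = (1-p)/p, α = n turns (17) into the Negative Binomial pmf of Section 2. The sketch below uses only the Python standard library; the grid bounds, step size, and test values are my own choices.

```python
import math

def closed_form(x, alpha, beta):
    """Eq. (17): Gamma(a+x)/(x! Gamma(a)) * (1/(1+b))**a * (b/(1+b))**x."""
    c = math.gamma(alpha + x) / (math.factorial(x) * math.gamma(alpha))
    return c * (1 / (1 + beta)) ** alpha * (beta / (1 + beta)) ** x

def integrand(lam, x, alpha, beta):
    """Poisson(x|lam) times the Gamma density with shape alpha, scale beta."""
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    gamma_pdf = lam ** (alpha - 1) * math.exp(-lam / beta) / (beta ** alpha * math.gamma(alpha))
    return poisson * gamma_pdf

# Riemann-sum approximation of the mixture integral on (0, 60).
x, alpha, beta = 3, 2.5, 1.5
h = 0.001
numeric = h * sum(integrand(i * h, x, alpha, beta) for i in range(1, 60000))
assert abs(numeric - closed_form(x, alpha, beta)) < 1e-3

# The substitution beta = (1-p)/p, alpha = n recovers NB(x|n, p) from Section 2.
n, p = 5, 0.3
for x in range(25):
    nb = math.comb(x + n - 1, n - 1) * p ** n * (1 - p) ** x
    assert abs(closed_form(x, n, (1 - p) / p) - nb) < 1e-12
print("checks pass")
```

The first assertion confirms the jump from (14) to (17) at one parameter setting; the second confirms, term by term, that 1/(1+β) plays the role of p.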

After cancellation and rearrangement, that simplifies to:

    p(x) = Γ(α+x)/(Γ(α) x!) · β^α/(1+β)^(α+x)    (23)

Using the definition of Γ and rearranging again, we find:

    p(x) = (α+x-1)!/((α-1)! x!) (β/(1+β))^α (1/(1+β))^x    (24)

Recognizing that the first term is the binomial coefficient, the derivation is complete. (Because Gelman et al parameterize the Gamma by its rate, β here plays the role that 1/β played in (10); substituting 1/β for β in (23) recovers the form in (17).)

We choose the shape and scale parameters so that the Gamma-distributed λ reproduces the Negative Binomial in (17). In that expression, 1/(1+β) plays the role of the probability of success on an individual trial, p = 1/(1+β), and α is the desired number of successes. That means that, if we want to set the Gamma parameters to the proper levels, we choose β = (1-p)/p and α = n (the number of desired successes). If n represents the number of successes and the number of failures is x = N - n, then

    Prob(x) = Γ(x+n)/(Γ(n) x!) p^n (1-p)^x

Another way to think of the mixture: instead of replacing λ in the Poisson definition, suppose we multiply it by a randomly chosen value δ. If δ has an expected value of 1, then E(λδ) = λ, so, on average, the original λ is reproduced. However, because δ might be higher or lower, λδ will have random variation, and so the number of observed events will fluctuate. Its average will still be λ, but it will have greater variance. In this way, one can see the Poisson with λ replaced by λδ:

    P(x|δ) = e^(-λδ) (λδ)^x / x!

Here is the question: how can one select a mathematically workable model for δ so that E(δ) = 1? I've seen this done in several ways. Recall that the Gamma distribution has expected value αβ and variance αβ^2. So if we drew a variate y from any Gamma distribution and then divided by αβ, the result y/(αβ) would have expected value E(y)/(αβ) = 1. More commonly, attention is focused on a subset of possible Gamma distributions, the ones for which β = 1/α. When δ follows a Gamma distribution Gamma(δ|α, 1/α), it has an expected value of 1 and its variance is 1/α.

As α tends to 0, the variance tends toward infinity. Think of λ as an attribute of an observation. If δ is a Gamma variate with mean 1 and variance 1/α, then λδ is also a Gamma variate, with mean λ and variance λ^2/α. Maybe it is just a lucky guess, but it appears to me that λδ has the distribution Gamma(α, λ/α).
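The δ-multiplier construction can be checked by simulation. The sketch below is mine, not the authors': draw δ from Gamma(α, 1/α), then a Poisson count with mean λδ, and compare the sample moments with the implied values E[x] = λ and Var(x) = λ + λ^2/α. As before, the Poisson sampler is Knuth's multiplication method, and the settings λ = 4, α = 2 are my choices.

```python
import math
import random

random.seed(7)
lam, alpha = 4.0, 2.0   # delta ~ Gamma(2, 1/2): mean 1, variance 1/2

def poisson_draw(mu):
    """Knuth's multiplication method; adequate for small mu."""
    L, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

draws = []
for _ in range(20000):
    delta = random.gammavariate(alpha, 1 / alpha)  # mean 1, variance 1/alpha
    draws.append(poisson_draw(lam * delta))        # Poisson with perturbed mean

m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
print(round(m, 2), round(v, 2))  # mean near lam = 4; variance near lam + lam**2/alpha = 12
```

The mean is preserved at λ while the variance is inflated by λ^2/α, which is precisely the overdispersion the Negative Binomial contributes; shrinking α toward 0 inflates the variance further.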