Lecture 7: Special Probability Distributions - 2 Assist. Prof. Dr. Emel YAVUZ DUMAN Introduction to Probability and Statistics İstanbul Kültür University
Outline 1 The Hypergeometric Distribution 2 The Poisson Distribution
We have often used sampling with and without replacement to illustrate the multiplication rules for independent and dependent events. To obtain a formula analogous to the binomial distribution that applies to sampling without replacement, in which case the trials are not independent, consider a set of $N$ elements of which $M$ are regarded as successes and the other $N - M$ as failures. As with the binomial distribution, we are interested in the probability of getting $x$ successes in $n$ trials, but now we are choosing, without replacement, $n$ of the $N$ elements contained in the set.
There are $\binom{M}{x}$ ways of choosing $x$ of the $M$ successes and $\binom{N-M}{n-x}$ ways of choosing $n - x$ of the $N - M$ failures, and hence $\binom{M}{x}\binom{N-M}{n-x}$ ways of choosing $x$ successes and $n - x$ failures. Since there are $\binom{N}{n}$ ways of choosing $n$ of the $N$ elements in the set, and we assume that they are all equally likely (which is what we mean when we say that the selection is random), the probability of $x$ successes in $n$ trials is
$$\frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}.$$
Definition 1 A random variable $X$ has a hypergeometric distribution, and it is referred to as a hypergeometric random variable, if and only if its probability distribution is given by
$$h(x; n, N, M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
for $x = 0, 1, 2, \ldots, n$, $x \le M$ and $n - x \le N - M$. Thus, for sampling without replacement, the number of successes in $n$ trials is a random variable having a hypergeometric distribution with parameters $n$, $N$, and $M$.
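As a quick check of Definition 1, the formula can be evaluated directly with integer binomial coefficients. The following is a minimal sketch only; the function name `hypergeom_pmf` and the sample parameters are assumptions of this illustration, not part of the lecture.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, N: int, M: int) -> float:
    """Return h(x; n, N, M): the probability of x successes in n draws
    without replacement from N elements of which M are successes."""
    # Outside the support stated in Definition 1 the probability is zero.
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Illustrative parameters (hypothetical): n = 6 draws, N = 24 elements, M = 4 successes.
probabilities = [hypergeom_pmf(x, 6, 24, 4) for x in range(7)]
total = sum(probabilities)  # a valid probability distribution sums to 1
print(probabilities, total)
```

Using `math.comb` keeps the numerator and denominator as exact integers, so rounding only occurs in the final division.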
Example 2 As part of an air-pollution survey, an inspector decides to examine the exhaust of six of a company's 24 trucks. If four of the company's trucks emit excessive amounts of pollutants, what is the probability that none of them will be included in the inspector's sample?
Solution. Substituting $x = 0$, $n = 6$, $N = 24$, and $M = 4$ into the formula for the hypergeometric distribution, we get
$$h(0; 6, 24, 4) = \frac{\binom{4}{0}\binom{20}{6}}{\binom{24}{6}} = 0.2880.$$
Example 3 Draw 6 cards from a standard 52-card deck without replacement. What is the probability of getting exactly two hearts?
Solution. Substituting $x = 2$, $n = 6$, $N = 52$, and $M = 13$ into the formula for the hypergeometric distribution, we get
$$h(2; 6, 52, 13) = \frac{\binom{13}{2}\binom{39}{4}}{\binom{52}{6}} = 0.31513.$$
Example 4 In a lottery, 49 balls are numbered 1 to 49. You select six numbers between 1 and 49 and write them on your lotto card. What is the probability that your six numbers match (a) exactly 4 of the six numbers drawn? (b) all 6?
Solution. (a) Substituting $x = 4$, $n = 6$, $N = 49$, and $M = 6$ into the formula for the hypergeometric distribution, we get
$$h(4; 6, 49, 6) = \frac{\binom{6}{4}\binom{43}{2}}{\binom{49}{6}} = \frac{15 \cdot 903}{13\,983\,816} = 9.6862 \times 10^{-4}.$$
(b) Substituting $x = 6$, $n = 6$, $N = 49$, and $M = 6$ into the formula for the hypergeometric distribution, we get
$$h(6; 6, 49, 6) = \frac{\binom{6}{6}\binom{43}{0}}{\binom{49}{6}} = \frac{1}{13\,983\,816} = 7.1511 \times 10^{-8}.$$
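The lottery probabilities in Example 4 can be tabulated for every possible number of matches. The sketch below uses exact fractions so that the odds can be read off without rounding; the helper name `match_probability` and its default arguments are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb


def match_probability(k: int, picks: int = 6, balls: int = 49) -> Fraction:
    """Exact probability that exactly k of the `picks` chosen numbers are
    among the `picks` numbers drawn from `balls` (hypergeometric with
    n = M = picks and N = balls)."""
    return Fraction(comb(picks, k) * comb(balls - picks, picks - k),
                    comb(balls, picks))


for k in range(7):
    p = match_probability(k)
    print(f"match {k}: {float(p):.4e}  (about 1 in {float(1 / p):,.0f})")
```

Because the results are `Fraction` objects, summing them over `k = 0, ..., 6` gives exactly 1, which is a convenient check that the counting argument is complete.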
Theorem 5 The mean and the variance of the hypergeometric distribution are
$$\mu = \frac{nM}{N} \quad\text{and}\quad \sigma^2 = \frac{nM(N-M)(N-n)}{N^2(N-1)}.$$
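One way to gain confidence in Theorem 5 is to compare the closed forms with the mean and variance obtained by summing over the distribution itself. The sketch below does this numerically; all function names are assumptions of this illustration, and a numerical agreement is of course not a proof.

```python
from math import comb


def hypergeom_pmf(x, n, N, M):
    """h(x; n, N, M) from Definition 1 (zero outside the support)."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def mean_var_by_summation(n, N, M):
    """Mean and variance computed directly from the probability distribution."""
    mean = sum(x * hypergeom_pmf(x, n, N, M) for x in range(n + 1))
    second_moment = sum(x * x * hypergeom_pmf(x, n, N, M) for x in range(n + 1))
    return mean, second_moment - mean ** 2


def mean_var_by_formula(n, N, M):
    """Closed forms: mu = nM/N, sigma^2 = nM(N-M)(N-n) / (N^2 (N-1))."""
    mean = n * M / N
    var = n * M * (N - M) * (N - n) / (N ** 2 * (N - 1))
    return mean, var


# Parameters taken from Example 6: n = 20, N = 200, M = 12.
print(mean_var_by_summation(20, 200, 12))
print(mean_var_by_formula(20, 200, 12))
```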
Example 6 Suppose that a researcher goes to a small college with 200 faculty members, 12 of whom have blood type O-negative. She obtains a simple random sample of 20 of the faculty. Determine the mean and standard deviation of the number of randomly selected faculty members who will have blood type O-negative.
Solution. Substituting $n = 20$, $N = 200$, and $M = 12$ into the formulas for the mean and variance of the hypergeometric distribution, we obtain
$$\mu = \frac{nM}{N} = \frac{20 \cdot 12}{200} = 1.2$$
and
$$\sigma = \sqrt{\frac{nM(N-M)(N-n)}{N^2(N-1)}} = \sqrt{\frac{20 \cdot 12\,(200-12)(200-20)}{200^2\,(200-1)}} = 1.0101.$$
We expect that, in a random sample of 20 faculty members, 1.2 on average will have blood type O-negative.
Example 7 A case of wine has 12 bottles, 3 of which contain spoiled wine. A sample of 4 bottles is randomly selected from the case. (a) Find the probability distribution of $X$, the number of bottles of spoiled wine in the sample. (b) What are the mean and variance of $X$?
Solution. For this example $n = 4$, $N = 12$, and $M = 3$. Then
$$h(x; 4, 12, 3) = \frac{\binom{3}{x}\binom{9}{4-x}}{\binom{12}{4}}.$$
(a) The possible values of $X$ are 0, 1, 2 and 3, with probabilities (rounded to two decimals)
$$h(0; 4, 12, 3) = \frac{\binom{3}{0}\binom{9}{4}}{\binom{12}{4}} = 0.25, \qquad h(1; 4, 12, 3) = \frac{\binom{3}{1}\binom{9}{3}}{\binom{12}{4}} = 0.51,$$
$$h(2; 4, 12, 3) = \frac{\binom{3}{2}\binom{9}{2}}{\binom{12}{4}} = 0.22, \qquad h(3; 4, 12, 3) = \frac{\binom{3}{3}\binom{9}{1}}{\binom{12}{4}} = 0.02.$$
(b) The mean is given by
$$\mu = \frac{nM}{N} = \frac{4 \cdot 3}{12} = 1$$
and the variance is
$$\sigma^2 = \frac{nM(N-M)(N-n)}{N^2(N-1)} = \frac{4 \cdot 3\,(12-3)(12-4)}{12^2\,(12-1)} = 0.5455.$$
Binomial Approximation to Hypergeometric Distribution
When $N$ is large and $n$ is relatively small compared to $N$ (the usual rule of thumb is that $n$ should not exceed 5 percent of $N$), there is not much difference between sampling with replacement and sampling without replacement, and the formula for the binomial distribution with the parameters $n$ and $\theta = M/N$ may be used to approximate hypergeometric probabilities.
Example 8 Among the 120 applicants for a job, only 80 are actually qualified. If five of the applicants are randomly selected for an in-depth interview, find the probability that only two of the five will be qualified for the job by using (a) the formula for the hypergeometric distribution; (b) the formula for the binomial distribution with θ =80/120 as an approximation.
Solution. (a) Substituting $x = 2$, $n = 5$, $N = 120$, and $M = 80$ into the formula for the hypergeometric distribution, we get
$$h(2; 5, 120, 80) = \frac{\binom{80}{2}\binom{40}{3}}{\binom{120}{5}} = 0.164$$
rounded to three decimals;
(b) substituting $x = 2$, $n = 5$, and $\theta = \frac{80}{120} = \frac{2}{3}$ into the formula for the binomial distribution, we get
$$b\left(2; 5, \tfrac{2}{3}\right) = \binom{5}{2}\left(\tfrac{2}{3}\right)^{2}\left(1 - \tfrac{2}{3}\right)^{3} = 0.165$$
rounded to three decimals. As can be seen from these results, the approximation is very close.
Example 9 Boxes contain 2000 items, of which 10% are defective. Find the probability that no more than 2 defectives will be obtained in a sample of size 10.
Solution. For this question $x$ is equal to 0, 1 or 2, $n = 10$, $N = 2000$ and $M = 2000 \times 0.10 = 200$. Since $n = 10 \le 100 = 2000 \times 0.05 = 0.05N$, $n$ does not exceed 5 percent of $N$, so we may also use the binomial approximation to the hypergeometric distribution.
(a) The hypergeometric distribution:
$$P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \frac{\binom{200}{0}\binom{1800}{10}}{\binom{2000}{10}} + \frac{\binom{200}{1}\binom{1800}{9}}{\binom{2000}{10}} + \frac{\binom{200}{2}\binom{1800}{8}}{\binom{2000}{10}}$$
$$= 0.3478 + 0.3884 + 0.1941 = 0.9303.$$
(b) Binomial approximation to the hypergeometric distribution with $\theta = M/N = 200/2000 = 0.1$:
$$P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \binom{10}{0}(0.1)^{0}(0.9)^{10} + \binom{10}{1}(0.1)^{1}(0.9)^{9} + \binom{10}{2}(0.1)^{2}(0.9)^{8}$$
$$= 0.3487 + 0.3874 + 0.1937 = 0.9298.$$
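The two calculations in Example 9 can be reproduced side by side to see how small the approximation error is when $n$ is well below 5 percent of $N$. This is a minimal sketch; the helper names `hypergeom_cdf` and `binom_cdf` are assumptions of this illustration.

```python
from math import comb


def hypergeom_cdf(k, n, N, M):
    """P(X <= k) for the hypergeometric distribution with parameters n, N, M
    (assumes 0 <= k <= n)."""
    return sum(comb(M, x) * comb(N - M, n - x) for x in range(k + 1)) / comb(N, n)


def binom_cdf(k, n, theta):
    """P(X <= k) for the binomial distribution with parameters n and theta."""
    return sum(comb(n, x) * theta**x * (1 - theta) ** (n - x) for x in range(k + 1))


# Example 9: N = 2000 items, M = 200 defective, sample of n = 10, at most 2 defectives.
n, N, M = 10, 2000, 200
exact = hypergeom_cdf(2, n, N, M)
approx = binom_cdf(2, n, M / N)
print(f"hypergeometric: {exact:.4f}")
print(f"binomial:       {approx:.4f}")
print(f"absolute error: {abs(exact - approx):.4f}")
```

Rerunning the comparison with a larger sample, for example `n = 500`, shows the two distributions drifting apart once the 5 percent rule of thumb is violated.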
When $n$, the number of trials, is large, the calculation of binomial probabilities with the formula for the binomial distribution usually involves a prohibitive amount of work. In this section we present a probability distribution that can be used to approximate binomial probabilities of this kind. Specifically, we investigate the limiting form of the binomial distribution when $n \to \infty$ and $\theta \to 0$, while $n\theta$ remains constant.
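The limit can be sketched as follows. This is the standard argument, written with $\lambda = n\theta$ held fixed and $b(x; n, \theta)$ denoting the binomial probability; it is included as an outline and is not necessarily the derivation used in the lecture.

```latex
\begin{align*}
b(x; n, \theta)
  &= \binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \qquad \theta = \frac{\lambda}{n} \\
  &= \frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{\lambda}{n}\right)^{x}
     \left(1-\frac{\lambda}{n}\right)^{n-x} \\
  &= \frac{\lambda^{x}}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^{x}}\cdot
     \left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x} \\
  &\longrightarrow \frac{\lambda^{x}}{x!}\cdot 1\cdot e^{-\lambda}\cdot 1
   = \frac{\lambda^{x}e^{-\lambda}}{x!} \qquad (n \to \infty).
\end{align*}
```

For fixed $x$, the middle factor is a product of $x$ terms each tending to 1, and $(1 - \lambda/n)^{n} \to e^{-\lambda}$.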
Definition 10 A random variable $X$ has a Poisson distribution, and it is referred to as a Poisson random variable, if and only if its probability distribution is given by
$$p(x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad\text{for } x = 0, 1, 2, \ldots$$
where $\lambda$ is the mean number of successes.
In general, the Poisson distribution will provide a good approximation to binomial probabilities when $n \ge 20$ and $\theta \le 0.05$. When $n \ge 100$ and $n\theta < 10$, the approximation will generally be excellent.
Example 11 If 2 percent of books bound at a certain bindery have defective bindings, use the Poisson approximation to the binomial distribution to determine the probability that five of 400 books bound by this bindery will have defective bindings.
Solution. Substituting $x = 5$ and $\lambda = n\theta = 400 \times 0.02 = 8$ into the formula for the Poisson distribution, we get
$$p(5; 8) = \frac{8^{5} e^{-8}}{5!} = 0.09160.$$
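To see how close the approximation in Example 11 is, the Poisson value can be compared with the binomial probability it replaces. The sketch below is illustrative only; the function names `poisson_pmf` and `binom_pmf` are assumptions of this example.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lambda) = lambda^x e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)


def binom_pmf(x: int, n: int, theta: float) -> float:
    """b(x; n, theta) = C(n, x) theta^x (1 - theta)^(n - x)"""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


# Example 11: n = 400 books, theta = 0.02, x = 5 defective bindings.
n, theta, x = 400, 0.02, 5
approx = poisson_pmf(x, n * theta)
exact = binom_pmf(x, n, theta)
print(f"Poisson approximation: {approx:.5f}")
print(f"binomial probability:  {exact:.5f}")
```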
Example 12 Records show that the probability is 0.00005 that a car will have a flat tire while crossing a certain bridge. Use the Poisson distribution to approximate the binomial probabilities that, among 10,000 cars crossing the bridge, (a) exactly two will have a flat tire; (b) at most two will have a flat tire.
Solution. (a) Substituting $x = 2$ and $\lambda = n\theta = 10{,}000 \times 0.00005 = 0.5$ into the formula for the Poisson distribution, we get
$$p(2; 0.5) = \frac{0.5^{2} e^{-0.5}}{2!} = 0.07582.$$
(b)
$$p(2; 0.5) + p(1; 0.5) + p(0; 0.5) = \frac{0.5^{2} e^{-0.5}}{2!} + \frac{0.5^{1} e^{-0.5}}{1!} + \frac{0.5^{0} e^{-0.5}}{0!} = 0.07582 + 0.30327 + 0.60653 = 0.98562.$$
Having derived the Poisson distribution as a limiting form of the binomial distribution, we can obtain formulas for its mean and its variance by applying the same limiting conditions ($n \to \infty$, $\theta \to 0$, and $n\theta = \lambda$ remains constant) to the mean and the variance of the binomial distribution. For the mean we get $\mu = n\theta = \lambda$, and for the variance we get $\sigma^2 = n\theta(1 - \theta) = \lambda(1 - \theta)$, which approaches $\lambda$ when $\theta \to 0$.
Theorem 13 The mean and the variance of the Poisson distribution are given by $\mu = \lambda$ and $\sigma^2 = \lambda$.
Theorem 14 The moment generating function of the Poisson distribution is given by $M_X(t) = e^{\lambda(e^{t} - 1)}$.
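Theorem 13 can also be recovered from the moment generating function in Theorem 14 by differentiating at $t = 0$. The short derivation below is a standard check and is not taken from the original slides.

```latex
\begin{align*}
M_X(t)   &= e^{\lambda(e^{t}-1)}, \\
M_X'(t)  &= \lambda e^{t}\, e^{\lambda(e^{t}-1)}
  &&\Rightarrow\quad \mu = M_X'(0) = \lambda, \\
M_X''(t) &= \left(\lambda e^{t} + \lambda^{2} e^{2t}\right) e^{\lambda(e^{t}-1)}
  &&\Rightarrow\quad E(X^{2}) = M_X''(0) = \lambda + \lambda^{2}, \\
\sigma^{2} &= E(X^{2}) - \mu^{2} = \lambda + \lambda^{2} - \lambda^{2} = \lambda.
\end{align*}
```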
Although the Poisson distribution has been derived as a limiting form of the binomial distribution, it has many applications that have no direct connection with the binomial distribution. In many practical situations we are interested in measuring how many times a certain event occurs in a specific time interval or in a specific length or area. For instance:
1 the number of phone calls received at an exchange or call center in an hour;
2 the number of customers arriving at a toll booth per day;
3 the number of flaws on a length of cable;
4 the number of cars using a stretch of road during a day.
The Poisson distribution plays a key role in modeling such problems.
Suppose we are given an interval (this could be time, length, area or volume) and we are interested in the number of successes in that interval. Assume that the interval can be divided into very small subintervals such that:
1 the probability of more than one success in any subinterval is zero;
2 the probability of one success in a subinterval is constant for all subintervals and is proportional to its length;
3 subintervals are independent of each other.
We assume the following.
1 The random variable $X$ denotes the number of successes in the whole interval.
2 $\lambda$ is the mean number of successes in the interval.
Then $X$ has a Poisson distribution with parameter $\lambda$ and
$$P(X = x) = p(x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$
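The subinterval assumptions can be explored numerically: if the interval is cut into $k$ independent pieces, each containing one success with probability $\lambda/k$, the count of successes is binomial, and it should approach the Poisson formula as $k$ grows. The sketch below illustrates this under those assumptions; the function names and the value $\lambda = 1.2$ are choices made for this example only.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lambda) = lambda^x e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)


def subdivided_pmf(x: int, lam: float, k: int) -> float:
    """Probability of x successes when the interval is cut into k independent
    subintervals, each holding one success with probability lam / k."""
    p = lam / k
    return comb(k, x) * p**x * (1 - p) ** (k - x)


lam = 1.2  # illustrative mean number of successes in the whole interval
for k in (10, 100, 10_000):
    # Largest gap between the subdivided model and the Poisson formula for x = 0..10.
    gap = max(abs(subdivided_pmf(x, lam, k) - poisson_pmf(x, lam)) for x in range(11))
    print(f"k = {k:>6}: max |difference| = {gap:.6f}")
```

The printed gaps shrink as the subintervals become finer, which is the numerical counterpart of the limiting argument behind the Poisson distribution.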
Example 15 The average number of trucks arriving on any one day at a truck depot in a certain city is known to be 12. What is the probability that on a given day fewer than nine trucks will arrive at this depot?
Solution. Let $X$ be the number of trucks arriving on a given day. Then, using the Poisson distribution with $\lambda = 12$, we get
$$P(X < 9) = \sum_{x=0}^{8} p(x; 12) = \sum_{x=0}^{8} \frac{12^{x} e^{-12}}{x!} = e^{-12}\left(\frac{12^{0}}{0!} + \frac{12^{1}}{1!} + \frac{12^{2}}{2!} + \frac{12^{3}}{3!} + \frac{12^{4}}{4!} + \frac{12^{5}}{5!} + \frac{12^{6}}{6!} + \frac{12^{7}}{7!} + \frac{12^{8}}{8!}\right) = 0.1550.$$
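Sums like the one in Example 15 are tedious by hand but short in code. The following sketch evaluates the cumulative probability directly; the function name `poisson_cdf` is an assumption of this illustration.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson random variable with mean lam."""
    return exp(-lam) * sum(lam**x / factorial(x) for x in range(k + 1))


# Example 15: lambda = 12 trucks per day; "fewer than nine" means X <= 8.
p_fewer_than_nine = poisson_cdf(8, 12)
print(f"P(X < 9) = {p_fewer_than_nine:.4f}")
```

Upper-tail probabilities follow from the complement, for example `1 - poisson_cdf(k - 1, lam)` for $P(X \ge k)$.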
Example 16 The number of flaws in a fiber optic cable follows a Poisson distribution. The average number of flaws in 50 m of cable is 1.2. (a) What is the probability of exactly three flaws in 150 m of cable? (b) What is the probability of at least two flaws in 100 m of cable? (c) What is the probability of exactly one flaw in the first 50 m of cable and exactly one flaw in the second 50 m of cable?
Solution. (a) The mean number of flaws in 150 m of cable is $1.2 \times 3 = 3.6$. So the probability of exactly three flaws in 150 m of cable is
$$p(3; 3.6) = \frac{3.6^{3} e^{-3.6}}{3!} = 0.21247.$$
(b) The mean number of flaws in 100 m of cable is $1.2 \times 2 = 2.4$. Let $X$ be the number of flaws in 100 m of cable. Then
$$P(X \ge 2) = 1 - P(X < 2) = 1 - \big(P(X=0) + P(X=1)\big) = 1 - p(0; 2.4) - p(1; 2.4) = 1 - \frac{2.4^{0} e^{-2.4}}{0!} - \frac{2.4^{1} e^{-2.4}}{1!} = 0.69156.$$
(c) Now let $X$ denote the number of flaws in a 50 m section of cable. Then we know that
$$P(X = 1) = p(1; 1.2) = \frac{1.2^{1} e^{-1.2}}{1!} = 0.36143.$$
Under the Poisson assumptions, the numbers of flaws in the first and second 50 m of cable are independent, since the two sections do not overlap. Thus the probability of exactly one flaw in the first 50 m and exactly one flaw in the second 50 m is $(0.36143)(0.36143) = 0.13063$.
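All three parts of Example 16 rest on the same idea: the Poisson mean scales in proportion to the length of cable considered. The sketch below makes that scaling explicit; the names `RATE_PER_50_M` and `mean_flaws` are assumptions introduced for this illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lambda) = lambda^x e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)


RATE_PER_50_M = 1.2  # mean number of flaws per 50 m of cable (Example 16)


def mean_flaws(length_m: float) -> float:
    """The Poisson mean scales in proportion to the length of cable."""
    return RATE_PER_50_M * length_m / 50


# (a) exactly three flaws in 150 m
part_a = poisson_pmf(3, mean_flaws(150))
# (b) at least two flaws in 100 m, via the complement of {0, 1}
part_b = 1 - poisson_pmf(0, mean_flaws(100)) - poisson_pmf(1, mean_flaws(100))
# (c) exactly one flaw in each of two disjoint 50 m sections (independent counts)
part_c = poisson_pmf(1, mean_flaws(50)) ** 2
print(f"(a) {part_a:.5f}  (b) {part_b:.5f}  (c) {part_c:.5f}")
```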
Example 17 Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital?
Solution. If we let $X$ be the number of births in an hour, then $X$ has a Poisson distribution with $\lambda = 1.8$:
$$P(X = 4) = p(4; 1.8) = \frac{1.8^{4} e^{-1.8}}{4!} = 0.072302.$$
Example 18 Consider a telephone operator who, on the average, handles five calls every 3 minutes. (a) What is the probability that there will be no calls in the next minute? (b) At least one call?
Solution. If we let $X$ be the number of calls in a minute, then $X$ has a Poisson distribution with $\lambda = \frac{5}{3}$. So
(a) $P(\text{no calls in the next minute}) = P(X = 0) = p(0; 5/3) = \dfrac{(5/3)^{0} e^{-5/3}}{0!} = 0.1889$;
(b) $P(\text{at least one call}) = P(X \ge 1) = 1 - P(X = 0) = 1 - p(0; 5/3) = 1 - 0.1889 = 0.8111$.
Example 19 A certain kind of sheet metal has, on the average, five defects per 10 square feet. If we assume a Poisson distribution, what is the probability that a 15-square-foot sheet of the metal will have at least six defects?
Solution. Let $X$ denote the number of defects in a 15-square-foot sheet of the metal. Then, since the unit area is 10 square feet, we have $\lambda = 5 \times 1.5 = 7.5$ and
$$P(X \ge 6) = 1 - P(X \le 5) = 1 - \big(P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)\big)$$
$$= 1 - e^{-7.5}\left(\frac{7.5^{0}}{0!} + \frac{7.5^{1}}{1!} + \frac{7.5^{2}}{2!} + \frac{7.5^{3}}{3!} + \frac{7.5^{4}}{4!} + \frac{7.5^{5}}{5!}\right) = 1 - 0.2414 = 0.7586.$$