Common Probability Distributions
Shyue Ping Ong
Readings: Chapters 4.1-4.7
Overview
Discrete: Uniform, Bernoulli, Binomial, Geometric, Negative Binomial, Poisson
Continuous: Uniform, Exponential, Normal/Gaussian
Uniform distribution
Discrete uniform RV:
$$p_X(x) = \begin{cases} \frac{1}{n} & \text{for each of the } n \text{ possible values of } x \\ 0 & \text{otherwise} \end{cases}$$
The value 1/n follows from the normalization property.
Continuous uniform RV:
$$f_X(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \qquad F_X(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b \end{cases}$$
$$E[X] = \frac{a+b}{2}, \qquad \mathrm{var}(X) = \frac{(b-a)^2}{12}$$
Bernoulli Random Variable
Used to model one-off probabilistic experiments with only two outcomes:
- Result of a coin toss
- Medical test result (positive/negative)
- Status of a network link (up/down)
$$p_X(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$
$$E[X] = p, \qquad \mathrm{var}(X) = p(1-p)$$
Derivation of E[X] and var(X).
Binomial Random Variable
Random variable defining the number of successes in a sequence of n repetitions of a Bernoulli trial, e.g., the number of heads in n tosses of a coin. Denoted as X ~ B(n, p), where n and p are the parameters specifying the binomial distribution.
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!(n-k)!}$$
$$E[X] = np, \qquad \mathrm{var}(X) = np(1-p)$$
Derive E[X] and var(X). Hint: how is the binomial RV related to the Bernoulli RV?
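These formulas can be checked numerically. The following is a minimal sketch (added for illustration, not part of the original slides) using only Python's standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial PMF: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                         # normalization: should be 1
mean = sum(k * q for k, q in enumerate(pmf))             # should equal np = 5
var = sum((k - mean)**2 * q for k, q in enumerate(pmf))  # should equal np(1-p) = 2.5
```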
[Figure: binomial PMFs for n = 10 and n = 100, each with p = 0.5 and p = 0.1]
Example
Samples of 20 parts from a metal punching process are selected every hour. Typically, 1% of the parts require rework. Let X denote the number of parts in the sample of 20 that require rework. A process problem is suspected if X exceeds its mean by more than 3 standard deviations. If the percentage of parts that require rework remains at 1%, what is the probability that X exceeds its mean by more than 3 standard deviations?
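One way to sketch the solution in Python (added for illustration): the mean is 0.2 and σ ≈ 0.445, so mean + 3σ ≈ 1.53 and the question reduces to P(X ≥ 2).

```python
from math import comb, sqrt

n, p = 20, 0.01
mean = n * p                      # 0.2
sd = sqrt(n * p * (1 - p))        # ≈ 0.445
threshold = mean + 3 * sd         # ≈ 1.53, so we need P(X >= 2)

# Use the complement: P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_exceeds = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))
# p_exceeds ≈ 0.017
```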
Geometric random variable
Random variable defining the number of Bernoulli trials up to and including the first success, e.g., the number of tosses of a coin until the first head. If the probability of success in a single trial is p, the PMF is given by
$$p_X(k) = (1-p)^{k-1} p, \quad k = 1, 2, \ldots$$
(k - 1 failures, followed by a success on the k-th trial)
$$E[X] = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2}$$
(Figure: PMF for p = 0.5)
Proof that p_X(k) is a legitimate probability law.
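The normalization and mean can be checked numerically. An illustrative sketch (not from the slides), truncating the infinite sum at k = 200, where the remaining terms are negligible:

```python
p = 0.5
ks = range(1, 200)
pmf = [(1 - p)**(k - 1) * p for k in ks]    # geometric PMF
total = sum(pmf)                            # → 1 (geometric series)
mean = sum(k * q for k, q in zip(ks, pmf))  # → 1/p = 2
```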
Negative Binomial Random Variable
Random variable describing the number of successes before reaching r failures in a sequence of Bernoulli trials. The binomial coefficient counts the number of ways to place the x successes among the first x + r - 1 trials; the final trial is the r-th failure by definition.
$$p_X(x) = \binom{x + r - 1}{x} p^x (1-p)^r, \quad x = 0, 1, 2, 3, \ldots$$
$$E[X] = \frac{pr}{1-p}, \qquad \mathrm{var}(X) = \frac{pr}{(1-p)^2}$$
(Figure: PMF for r = 3, p = 0.5)
Poisson process
A stochastic process that counts the number of events, and the time points at which these events occur, in a given time interval τ, with the following properties:
- The probability of more than one event in a small subinterval tends to zero.
- The probability of exactly one event in a small subinterval of length Δt tends to λΔt.
- The occurrence of an event in each subinterval is independent of other subintervals.
Random variables associated with the Poisson process
The probability distribution of the number of events N_τ in a given interval is a discrete Poisson distribution with mean λτ, where λ is known as the rate parameter:
$$p_{N_\tau}(k) = \frac{e^{-\lambda\tau}(\lambda\tau)^k}{k!}, \quad k = 0, 1, 2, 3, 4, \ldots$$
$$E[N_\tau] = \mathrm{var}(N_\tau) = \lambda\tau$$
Note: a potential source of confusion is that λ is sometimes used to denote the mean of the Poisson distribution. In the Poisson process, λ denotes the rate, which is multiplied by the time interval τ to obtain the average number of events.
The probability distribution of the waiting time T until the next occurrence is a continuous exponential distribution:
$$f_T(t) = \begin{cases} \lambda e^{-\lambda t} & \text{if } t \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
Poisson random variable
Expresses the probability of a given number of events occurring in a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event.
Example: a network receives on average λ packets per hour. If the packets are independent of each other, it is reasonable to assume that the number of packets received per hour obeys a Poisson distribution.
Can also be used for the number of events in other kinds of intervals, such as distance, volume, etc.
$$p_X(k) = \frac{e^{-\lambda\tau}(\lambda\tau)^k}{k!}, \quad k = 0, 1, 2, 3, \ldots$$
λτ is the mean and λ is the rate of the process.
Proof that p_X(k) is a legitimate probability law.
[Figure: Poisson PMFs for λ = 0.5, λ = 2, and λ = 10, compared with B(100, 0.1)]
The Poisson distribution is a good approximation for B(n, p) when np = λ, n is large, and p is small!
Example
Inclusions are defects in poured metal caused by contaminants. The number of (large) inclusions in cast iron follows a Poisson distribution with a mean of 2.5 per cubic millimeter. Determine the following:
a) Probability of at least one inclusion in a cubic millimeter.
b) Probability of at least four inclusions in 5.0 cubic millimeters.
c) Volume of material to inspect such that the probability of at least one inclusion is 0.99.
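A sketch of the three parts in Python (added for illustration; uses only the standard library):

```python
from math import exp, factorial, log

lam = 2.5  # mean number of inclusions per cubic millimeter

# (a) P(X >= 1) in 1 mm^3
p_a = 1 - exp(-lam)                                    # ≈ 0.918

# (b) P(X >= 4) in 5 mm^3, where the mean scales to 2.5 * 5 = 12.5
mu = lam * 5.0
p_b = 1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(4))

# (c) Solve 1 - exp(-lam * V) = 0.99 for the volume V
V = log(100) / lam                                     # ≈ 1.84 mm^3
```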
Exponential distribution
$$f_T(t) = \begin{cases} \lambda e^{-\lambda t} & \text{if } t \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad F_T(t) = \begin{cases} 0 & \text{if } t < 0 \\ 1 - e^{-\lambda t} & \text{if } t \ge 0 \end{cases}$$
$$E[T] = \frac{1}{\lambda}, \qquad \mathrm{var}(T) = \frac{1}{\lambda^2}$$
Memorylessness
The exponential distribution has the property of memorylessness:
$$P(X > x + \delta \mid X > x) = P(X > \delta)$$
E.g., if X is the waiting time for an event to occur relative to an initial time, memorylessness implies that if X is conditioned on a failure to observe the event over an initial period x, the distribution of the remaining waiting time is the same as the original unconditional distribution.
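Memorylessness follows directly from the survival function P(X > t) = e^{-λt}. A quick numerical check with illustrative values (not from the slides):

```python
from math import exp

lam, x, delta = 0.2, 3.0, 1.5

def survival(t):
    """P(X > t) for an exponential RV with rate lam."""
    return exp(-lam * t)

lhs = survival(x + delta) / survival(x)  # P(X > x + delta | X > x)
rhs = survival(delta)                    # P(X > delta)
# lhs == rhs, regardless of the choice of x
```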
Random processes in nature
The exponential distribution is used to model many processes in nature:
- Radioactive decay
- Distance between mutations on DNA strands
The memorylessness property is particularly useful in modeling failure rates in engineering and manufacturing.
Relationship between half-life and λ
In science, you often encounter the term half-life instead of the rate constant, especially in radioactive decay. The half-life is essentially the time at which the cumulative probability of the exponential distribution is exactly 0.5, i.e., a 50% chance of the event occurring within the half-life. For a sizable sample of atoms, that means half the sample would have decayed by that time, hence the origin of the term half-life. The relationship between the half-life and the rate constant can be derived as follows:
$$F_T(t_{1/2}) = 1 - e^{-\lambda t_{1/2}} = \frac{1}{2} \implies t_{1/2} = \frac{\ln 2}{\lambda}$$
Example
Let X denote the time between detections of a particle with a Geiger counter, and assume that X has an exponential distribution with E[X] = 60 seconds.
i. What is the probability of detecting a particle within 30 seconds of starting the counter?
ii. Suppose that we turn on the Geiger counter and wait 2 minutes without detecting a particle. What is the probability that a particle is detected in the next 30 seconds?
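A sketch of the solution (E[X] = 60 s means λ = 1/60 per second; part ii is identical to part i by memorylessness):

```python
from math import exp

lam = 1 / 60               # rate per second, since E[X] = 1/lam = 60 s
p_i = 1 - exp(-lam * 30)   # P(detection within 30 s) ≈ 0.393
p_ii = p_i                 # memorylessness: the fruitless 2-minute wait is irrelevant
```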
Another example: carbon dating
All living organisms contain carbon with a constant ratio of C-14 to C-12, due to the constant exchange of carbon with the environment. When an organism dies, the C-14 begins to decay with a half-life of 5730 years. By checking the quantity of C-14 in a fossil, we can determine its age, a process known as carbon dating.
Consider a fossil specimen that today contains 12 g of C-14. It has been estimated that it initially contained 32 g of C-14. Estimate its age.
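A sketch of the calculation (the age t solves 32 e^{-λt} = 12, with λ = ln 2 / 5730 from the half-life relation above):

```python
from math import log

half_life = 5730             # years
lam = log(2) / half_life     # decay rate constant
age = log(32 / 12) / lam     # ≈ 8100 years
```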
Normal distribution
Normal Random Variable
- Probably the most widely used model for a continuous random variable. Also known as the Gaussian distribution or the bell curve.
- The sample mean of a large number of independent experiments follows a normal distribution (central limit theorem).
- Arises in the study of numerous basic physical phenomena, e.g., the velocities of gas molecules in thermal equilibrium, the position of a diffusing particle, etc.
- Random variables with different means and variances can be modeled by normal probability density functions with appropriate choices of the center and width of the curve.
Denoted as N(µ, σ²):
$$f_Z(z) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(z-\mu)^2}{2\sigma^2}}$$
$$E[Z] = \mu, \qquad \mathrm{var}(Z) = \sigma^2$$
Properties of the normal distribution
1. Symmetric. This implies that the normal distribution is not a good model for variables that are inherently skewed, e.g., the price of a financial derivative.
2. Non-zero over the entire real line.
3. Preserved under linear transformations, i.e., if X ~ N(µ, σ²), then Y = aX + b ~ N(aµ + b, a²σ²).
The last property is particularly important: it means we only need to know the values for one normal distribution, and we can then easily map all other normal distributions with different means and variances to it.
The Standard Normal
N(0, 1), i.e., the normal distribution with mean 0 and variance 1.
PDF:
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$
CDF:
$$\Phi(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$$
The CDF cannot be easily written as an algebraic expression. Instead, its values are tabulated in a standard normal table (one is provided in the class wiki -> Resources, but versions are widely available on the internet).
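In Python, Φ can be evaluated without a table via the error function, using the identity Φ(z) = (1 + erf(z/√2))/2. A minimal helper (added for illustration, not part of the slides):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

phi(0.0)   # 0.5, by symmetry
phi(1.96)  # ≈ 0.975
```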
Using the standard normal table
Let's calculate the following probabilities for a standard normal random variable:
i. P(Z < 0.25)
ii. P(Z > -1.23)
iii. P(Z > 1)
iv. P(-1 < Z < 1)
v. P(-2.25 < Z < 1.25)
For the purposes of these examples, we will only use the table for positive z, to illustrate various properties of the standard normal. This is how the tables were provided in the past, though nowadays you can use computers. It is important for you to understand these properties, and you should refrain from using computers for solutions (which will not be allowed in the exams).
[Table: portion of the standard normal table]
The reverse problem
How do we compute a z value given a probability? Find the values of z that have the following probabilities:
i. P(Z < z) = 0.95
ii. P(-z < Z < z) = 0.95
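Numerically, this is the inverse CDF (quantile function); Python's standard-library statistics.NormalDist provides it directly. An illustrative check of the two answers (though in the exam you would read them from the table):

```python
from statistics import NormalDist

z_i = NormalDist().inv_cdf(0.95)    # P(Z < z) = 0.95  ->  z ≈ 1.645
# For a symmetric interval, P(-z < Z < z) = 0.95 means P(Z < z) = 0.975:
z_ii = NormalDist().inv_cdf(0.975)  # z ≈ 1.960
```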
Normal random variables with arbitrary mean and variance
We can standardize normal random variables with arbitrary means and variances. Consider a normal random variable X with mean µ and variance σ². Let us now define a new random variable Z as follows:
$$Z = \frac{X - \mu}{\sigma}$$
Since this is a linear function of X, we know Z must be normally distributed, with mean and variance
$$E[Z] = \frac{E[X] - \mu}{\sigma} = 0, \qquad \mathrm{var}(Z) = \frac{\mathrm{var}(X)}{\sigma^2} = 1$$
i.e., Z has the standard normal distribution!
Example
You are measuring the yield strength of a large number of samples of a form of steel. If the measured yield strength is normally distributed with mean 200 MPa and standard deviation 50 MPa, what is the probability that a random sample chosen has measured yield strength:
i. Greater than 300 MPa?
ii. Between 100 and 250 MPa?
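A sketch of the solution using statistics.NormalDist (equivalently, standardize and read Φ from the table):

```python
from statistics import NormalDist

X = NormalDist(mu=200, sigma=50)   # yield strength in MPa

p_i = 1 - X.cdf(300)               # P(X > 300) = 1 - Phi(2)  ≈ 0.023
p_ii = X.cdf(250) - X.cdf(100)     # Phi(1) - Phi(-2)         ≈ 0.819
```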
Rules of thumb for the Normal distribution
- ~68% of samples are within one standard deviation of the mean
- ~95% of samples are within two standard deviations of the mean
Using statistical software to compute probabilities
Most scientific/statistical software packages contain methods to calculate normal probabilities, typically by standardizing via z = (x - µ)/σ, e.g., Python + scipy. Matlab has a similar function called normcdf, with the same arguments.
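For example, the scipy version of part i of the yield-strength example (scipy.stats.norm takes the mean and standard deviation as the loc and scale arguments):

```python
from scipy.stats import norm

# P(X > 300) for X ~ N(200, 50^2); loc = mean, scale = standard deviation
p = 1 - norm.cdf(300, loc=200, scale=50)   # ≈ 0.023
```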
Central limit theorem
Let {X₁, …, X_n} be a random sample of size n of i.i.d. random variables drawn from any distribution with expected value µ and finite variance σ². The sample average is given by:
$$S_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
The central limit theorem implies that for sufficiently large n,
$$S_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ (approximately)}$$
http://bit.ly/1z2acwv
Application of the Central Limit Theorem
The CLT is surprisingly general: the only assumptions are independence and finite mean and variance. This implies that the normal distribution is a good model in many science and engineering applications:
- Independent repeated experiments
- Noise
$$S_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ (approximately)}$$
The variance decreases as n increases, which means that with more independent experiments, you become increasingly confident that the measured sample mean is close to the true population mean.
Example
You are conducting nanoindentation experiments to measure the hardness of carbon steel. The steel is known to have a hardness of 80 HV5 (HV5 is the unit of hardness) with a standard deviation of 10 HV5. If you perform 30 experiments on samples of the steel, what is the probability that the average hardness measured differs from the population average by more than 5%?
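A sketch of the solution: by the CLT, the sample mean of n = 30 measurements is approximately N(80, 10²/30), and 5% of the mean is 4 HV5.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 10, 30
se = sigma / sqrt(n)            # std dev of the sample mean, ≈ 1.83
S = NormalDist(mu, se)

# Two-sided tail: P(|S_n - mu| > 0.05 * mu), by symmetry of the normal
p = 2 * (1 - S.cdf(mu * 1.05))  # ≈ 0.028
```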
Another example
You are still performing the nanoindentation experiments. How many experiments do you need to perform to ensure that the probability that your measured sample average differs from the population mean by more than 1% is less than 5%?
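One way to sketch this is a direct search over n using the CLT variance σ²/n (by hand: requiring 0.08√n > 1.96 gives n ≥ 601):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 80, 10
n = 1
while True:
    se = sigma / sqrt(n)
    # P(|S_n - mu| > 1% of mu), two-sided tail under the CLT
    p = 2 * (1 - NormalDist(mu, se).cdf(mu * 1.01))
    if p < 0.05:
        break
    n += 1
# n == 601 experiments
```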
Binomial distribution when n is large
The normal distribution can be used to approximate the binomial distribution (a discrete distribution!) when n is large. This is particularly effective in situations where straightforward computation of the binomial probability is challenging. E.g., you are performing high-throughput testing of the fracture of a material. Let's say the material fractures with probability p. If you perform 10000 tests, what is the probability that there are more than 1000 fractured samples?
$$P(X > 1000) = \sum_{x=1001}^{10000} \binom{10000}{x} p^x (1-p)^{10000-x}$$
That is 9000 terms to sum (or 1000 if you use the complement)!
[Figure: binomial PMFs for n = 10 and n = 100, each with p = 0.5 and p = 0.1]
The normal approximation is valid when np is large.
de Moivre-Laplace approximation
If X is a binomial random variable with parameters n and p, the distribution of X can be approximated as:
$$P(X \le x) \approx P\!\left(Z \le \frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
$$P(X \ge x) \approx P\!\left(Z \ge \frac{x - 0.5 - np}{\sqrt{np(1-p)}}\right)$$
Here np is the mean and np(1-p) is the variance of B(n, p), and the ±0.5 terms are continuity corrections. The approximation is good for np > 5 and n(1-p) > 5.
Example
Using the previous example, let's say p = 0.095.
Side note: you may say, well, nowadays I can compute the binomial probability exactly by writing a short computer program. That is true only up to a certain extent. Computing even the first term, $\binom{10000}{1001}$, will test the precision limits of most computers!
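A sketch of the de Moivre-Laplace approximation for p = 0.095 (mean np = 950, σ = √(np(1-p)) ≈ 29.3, with the continuity correction):

```python
from math import sqrt
from statistics import NormalDist

n, p = 10000, 0.095
mu = n * p                  # 950
sd = sqrt(n * p * (1 - p))  # ≈ 29.3

# P(X > 1000) = P(X >= 1001) ≈ P(Z >= (1001 - 0.5 - mu) / sd)
approx = 1 - NormalDist().cdf((1001 - 0.5 - mu) / sd)   # ≈ 0.043
```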
Approximating the Poisson distribution
Given that we have shown earlier that the Poisson distribution is a good approximation for the binomial distribution when n is large, it follows that the normal distribution is also a good approximation for the Poisson distribution when its mean λ is large. If X is a Poisson random variable with parameter λ, the distribution of X can be approximated as:
$$P(X \le x) \approx P\!\left(Z \le \frac{x - \lambda}{\sqrt{\lambda}}\right)$$
The approximation is good for λ > 5.