MAT 2379 3X (Summer 2012)

Continuous Distributions

Up to now we have been working with discrete random variables, whose range $R_X$ is finite or countable. However, we must also allow for variables that can take any value in an interval of real numbers. We will call such variables continuous.

Examples of continuous random variables: length, area, volume, pressure, temperature, mass and many others.

For a continuous random variable $X$, we will also specify its probability distribution in two different ways:

1. with its cumulative distribution function (c.d.f.), which is $F_X(x) = P(X \le x)$, where $x$ is a real number;
2. with its probability density function. We will define the density below.

We have seen that for a discrete random variable the cumulative distribution function is a non-decreasing step function: each time we encounter a value in the range of the random variable, a probability mass is added to the accumulated probability. If all values in an interval are possible values, then we could accumulate probabilities continuously. This motivates the following definition.
Definition: Let $X$ be a random variable with cumulative distribution function $F_X$. If the function $F_X$ is continuous, then we say the random variable $X$ is continuous.

Definition: Let $X$ be a continuous random variable with c.d.f. $F_X$. We define its probability density function (p.d.f.) as the function $f_X$ such that

\[ f_X(x) = \begin{cases} F_X'(x), & \text{if it exists,} \\ 0, & \text{otherwise,} \end{cases} \]

where $F_X'$ is the derivative of $F_X$. In other words, the density is the rate at which the probabilities are accumulated.

By the Fundamental Theorem of Calculus, we can obtain another interpretation of the density:

\[ P(a < X \le b) = P(X \le b) - P(X \le a) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx. \]

Remarks:

So the probability that $X$ will fall in the interval $(a, b]$ is the area under the probability density from $x = a$ to $x = b$.

A single real number does not carry any probability mass; that is, for a continuous random variable $X$, $P(X = x) = 0$ for all $x$.
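As a small numerical illustration of the link between a c.d.f. and its density, the following Python sketch uses a hypothetical distribution that does not come from these notes: $F(x) = x^2$ on $[0, 1]$, whose density is $f(x) = 2x$. It approximates the area under $f$ over $(a, b]$ with a midpoint Riemann sum and compares it with $F(b) - F(a)$. The function names and the values of $a$ and $b$ are assumptions made only for illustration.

```python
# Hypothetical example: F(x) = x^2 on [0, 1], so f(x) = F'(x) = 2x there.

def cdf(x):
    """C.d.f. of the illustrative distribution."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x


def pdf(x):
    """Density: the derivative of the c.d.f. where it exists, 0 elsewhere."""
    return 2.0 * x if 0.0 <= x < 1.0 else 0.0


def area_under(f, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


a, b = 0.2, 0.7                         # arbitrary illustrative endpoints
prob_from_cdf = cdf(b) - cdf(a)         # F(b) - F(a)
prob_from_pdf = area_under(pdf, a, b)   # area under the density from a to b

print(prob_from_cdf, prob_from_pdf)
```

Both quantities should agree up to rounding, which is what the Fundamental Theorem of Calculus predicts.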
For $a, b \in \mathbb{R}$ such that $a < b$,

\[ P(a < X < b) = P(a \le X < b) = P(a \le X \le b) = P(a < X \le b) = \int_a^b f(x)\,dx = F(b) - F(a). \]

See the accompanying graph. It is the graph of a c.d.f. and the corresponding density. The area under the density from $x = 39.1$ to $x = 53$ is 0.23. This means that $P(39.1 < X < 53) = 0.23$.

Interpretation: If we repeat the experiment a large number of times, about 23% of the values should fall between 39.1 and 53.
Properties of a p.d.f.:

1. $f(x) \ge 0$;
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$;
3. [Computational Property] Let $A \subseteq \mathbb{R}$; then $P(X \in A) = \int_A f(x)\,dx$. In particular, let $F$ be the c.d.f. of $X$; then

\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt. \]

Definition: Let $X$ be a continuous random variable with probability density function $f_X$. The expected value of $X$ is defined as

\[ E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx. \]

Definition: Let $X$ be a continuous random variable with probability density function $f$. Its mean is defined as

\[ \mu_X = E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx. \]

Its variance is defined as

\[ \sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx. \]

Its standard deviation is defined as $\sigma_X = \sqrt{\mathrm{Var}(X)}$.
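The integrals defining the mean, the variance and the standard deviation can be approximated numerically. The Python sketch below does this for a hypothetical density that is not part of these notes, $f(x) = 2x$ on $[0, 1]$ and $0$ elsewhere, using a midpoint Riemann sum. The helper names are assumptions chosen for illustration.

```python
import math


def pdf(x):
    """Hypothetical density f(x) = 2x on [0, 1), 0 elsewhere."""
    return 2.0 * x if 0.0 <= x < 1.0 else 0.0


def integrate(g, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


# The density vanishes outside [0, 1], so integrating over [0, 1] suffices.
total_area = integrate(pdf, 0.0, 1.0)                               # should be 1
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)                    # E[X]
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, 1.0)  # E[(X - mean)^2]
std_dev = math.sqrt(variance)

print(total_area, mean, variance, std_dev)
```

For this density the exact values are $E[X] = 2/3$ and $\mathrm{Var}(X) = 1/18$, so the printed approximations can be checked by hand.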
Remark: The mean, the variance and the standard deviation for a continuous random variable are interpreted in the same way as the corresponding measures for a discrete random variable. The mean of X represents the center of mass of the distribution and also the expected value of the random variable. The variance of X is a measure of the variability or dispersion of the values about the mean, in units squared. The standard deviation also measures the variability or the dispersion of the values about the mean, but in the same units as the original measurements.
We now present a distribution, known as the normal distribution, that is often used as an approximation to the true distribution of a random variable. It is often, but not always, a reasonable model. We will see a theorem later that explains why the normal distribution is often a reasonable model.

Normal Distribution

Definition: A continuous random variable $X$ with p.d.f.

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty, \]

is said to follow a normal distribution with parameters $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $\sigma > 0$.

Note: Let $X$ be a normal random variable with parameters $\mu$ and $\sigma$; then its mean and variance are respectively $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$.

Notation: $X \sim N(\mu, \sigma^2)$ will mean that $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$.
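The normal density can be coded directly from its formula and checked numerically. The Python sketch below does so for hypothetical parameter values $\mu = 10$ and $\sigma = 2$ (assumptions, not taken from these notes). It verifies by a midpoint Riemann sum that the area under the curve is about 1 and that the mean and variance come out close to $\mu$ and $\sigma^2$, and it compares one value with `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma (sigma > 0)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def integrate(g, a, b, n=20_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


mu, sigma = 10.0, 2.0                        # hypothetical parameter values
lo, hi = mu - 10 * sigma, mu + 10 * sigma    # the tails beyond this range are negligible

area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
variance = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

# Cross-check one density value against the standard library.
library_value = NormalDist(mu, sigma).pdf(11.0)

print(area, mean, variance, normal_pdf(11.0, mu, sigma), library_value)
```

Truncating the integrals at ten standard deviations is a design choice for this sketch; the omitted tail area is far below the accuracy of the Riemann sum.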
Properties of Normal Curves:

1. The graph of the density of any normal random variable is a symmetric, bell-shaped curve centered about its mean $\mu$. Note: We call $\mu$ a location parameter.
2. The points of inflection of the curve occur one standard deviation away from the mean, i.e. at the values $x = \mu \pm \sigma$. Note: We call $\sigma$ a scale parameter; it controls the spread, and hence the shape, of the curve.

Empirical Rule:

about 68% (roughly 2/3) of the values are within 1 standard deviation of the mean;
about 95% of the values are within 2 standard deviations of the mean;
about 99.7% of the values are within 3 standard deviations of the mean.
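The percentages in the Empirical Rule can be reproduced from the normal c.d.f. The Python sketch below uses `statistics.NormalDist` from the standard library; the helper name `prob_within` and the second set of parameter values are assumptions made for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # The result does not depend on mu or sigma, only on k.
    print(k, prob_within(k), prob_within(k, mu=50.0, sigma=10.0))
```

The three probabilities are roughly 0.683, 0.954 and 0.997, whatever the values of $\mu$ and $\sigma$.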
Definition: A standard normal random variable $Z$ is a normal random variable with mean $E[Z] = 0$ and variance $\mathrm{Var}(Z) = 1$. Its p.d.f. is

\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty. \]

Its c.d.f. is

\[ \Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(x)\,dx = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx. \]

Remark: Some values of $\Phi(z) = P[Z \le z]$ are found in a table given on the web page accompanying these notes. We will also learn to use R to compute these values.

Properties of the standard normal: Let $Z$ be a standard normal random variable; then

1. its p.d.f. $\phi$ is symmetric about the origin, i.e. about $z = 0$;
2. $\Phi(-z) = 1 - \Phi(z)$, that is, $P(Z \le -z) = P(Z \ge z)$.
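As an alternative to the table, $\Phi$ can be evaluated through the error function, using the identity $\Phi(z) = \tfrac12\,[1 + \mathrm{erf}(z/\sqrt{2})]$. The Python sketch below is only an illustration (the notes themselves refer to a table and to R); the function names and the value of `z` are assumptions. It also checks the two symmetry properties numerically.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal c.d.f., written with the error function erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = 1.25                       # arbitrary illustrative value
left_tail = Phi(-z)            # P(Z <= -z)
right_tail = 1.0 - Phi(z)      # P(Z >= z)

print(phi(z), phi(-z))         # equal, by symmetry of the density
print(left_tail, right_tail)   # equal up to rounding
```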
Example 1: Using the table for the standard normal, and also using Minitab, answer the following questions. Let $Z$ be a standard normal random variable, that is, $Z$ follows a $N(0, 1)$ distribution. Find

1. $P(0.53 < Z < 2.06)$
2. $P(-2.63 \le Z \le 0.51)$
3. $P(|Z| > 1.96)$
4. $c$ such that $P(-c \le Z \le c) = 0.95$
5. $c$ such that $P(Z > c) = 0.10$
6. $c$ such that $P(Z < c) = 0.99$

Standardization Theorem: If $X$ is a normal random variable with mean $E[X]$ and variance $V[X]$, then

\[ Z = \frac{X - E[X]}{\sigma_X}, \qquad \text{where } \sigma_X = \sqrt{V[X]}, \]

is a standard normal random variable.

Consequences of the standardization theorem:

\[ P(X \le x) = P\left( \frac{X - E[X]}{\sigma_X} \le \frac{x - E[X]}{\sigma_X} \right) = P\left( Z \le \frac{x - E[X]}{\sigma_X} \right) = \Phi\left( \frac{x - E[X]}{\sigma_X} \right) \]

and

\[ P(c \le X \le d) = \Phi\left( \frac{d - E[X]}{\sigma_X} \right) - \Phi\left( \frac{c - E[X]}{\sigma_X} \right). \]
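The consequences of the standardization theorem can be checked numerically. The Python sketch below uses hypothetical values (mean 50, standard deviation 10, and the points 40 and 65, none of which come from these notes) and compares the probability obtained by standardizing with the one computed directly from the $N(50, 10^2)$ distribution via `statistics.NormalDist`. The helper names are assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def standardize(x, mean, sd):
    """z-value of x for a variable with the given mean and standard deviation."""
    return (x - mean) / sd


def prob_at_most(x, mean, sd):
    """P(X <= x), computed as Phi((x - mean) / sd)."""
    return STANDARD_NORMAL.cdf(standardize(x, mean, sd))


def prob_between(c, d, mean, sd):
    """P(c <= X <= d), computed as a difference of two Phi values."""
    return prob_at_most(d, mean, sd) - prob_at_most(c, mean, sd)


mean, sd = 50.0, 10.0                            # hypothetical parameters
p_direct = NormalDist(mean, sd).cdf(65.0)        # without standardizing
p_standardized = prob_at_most(65.0, mean, sd)    # Phi(1.5)
p_interval = prob_between(40.0, 65.0, mean, sd)  # Phi(1.5) - Phi(-1)

print(p_direct, p_standardized, p_interval)
```

Working through $\Phi$ mirrors what is done with the printed table, where only standard normal probabilities are available.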
Example 2: Among diabetics, the fasting blood glucose level $X$ (in mg per 100 ml) may be assumed to be approximately normally distributed with mean 106 and standard deviation 8.

(a) What percentage of diabetics have fasting blood glucose levels between 90 and 120?
(b) Find a level $x$ such that 25% of diabetics have a fasting glucose level lower than $x$.
(c) If we select 5 diabetics at random, what is the probability that at most 1 would have a fasting blood glucose level between 90 and 120?
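One possible way to check answers to Example 2 numerically is sketched below in Python, using `statistics.NormalDist` from the standard library instead of the table or Minitab. Part (c) rests on an additional assumption that is not stated in the example: the 5 selections are independent, so the number of diabetics with a level between 90 and 120 follows a binomial distribution with $n = 5$ and success probability equal to the answer of part (a). The variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

glucose = NormalDist(mu=106.0, sigma=8.0)   # X ~ N(106, 8^2), as given in the example

# (a) P(90 < X < 120) = Phi((120 - 106)/8) - Phi((90 - 106)/8)
p_between = glucose.cdf(120.0) - glucose.cdf(90.0)

# (b) the level x with P(X < x) = 0.25, i.e. the 25th percentile
x_25 = glucose.inv_cdf(0.25)

# (c) assumed model: Y ~ Binomial(5, p_between), independent selections
n = 5
p_at_most_one = sum(
    comb(n, k) * p_between ** k * (1.0 - p_between) ** (n - k) for k in (0, 1)
)

print(p_between, x_25, p_at_most_one)
```

Under these assumptions, (a) is roughly 93.7%, (b) is roughly 100.6 mg per 100 ml, and (c) is a very small probability, of the order of $7 \times 10^{-5}$.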