PROPERTIES OF PROBABILITY

$S$ is the sample space; $A$, $B$ are arbitrary events; $A'$ is the complement of $A$.

Proposition: For any event $A$, $P(A') = 1 - P(A)$.

Proposition: If $A$ and $B$ are mutually exclusive, that is, $A \cap B = \emptyset$, then $P(A \cap B) = 0$.

Proposition: For any two events $A$ and $B$,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Definition: For any two events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given that $B$ has occurred is defined by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Multiplication Rule: $P(A \cap B) = P(A \mid B)\,P(B)$.

The Law of Total Probability: Let $A_1, A_2, \ldots, A_n$ be mutually exclusive and exhaustive events. Then for any other event $B$,
$$P(B) = P(B \mid A_1)P(A_1) + \cdots + P(B \mid A_n)P(A_n) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i).$$

Definition: Two events $A$ and $B$ are independent if $P(A \mid B) = P(A)$, and are dependent otherwise.

Proposition: $A$ and $B$ are independent if and only if $P(A \cap B) = P(A)P(B)$.
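Example (numerical check, not part of the original notes): the rules above can be verified exactly on a small classical sample space. The following Python sketch uses a hypothetical two-dice experiment and exact fractions; the events $A$ (sum is 8) and $B$ (first die shows 3) are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered rolls of two fair dice (hypothetical example)
S = list(product(range(1, 7), repeat=2))

def P(event):
    # Classical probability: favorable outcomes over total outcomes
    return Fraction(sum(1 for s in S if event(s)), len(S))

A = lambda s: s[0] + s[1] == 8   # event A: the sum is 8
B = lambda s: s[0] == 3          # event B: the first die shows 3

p_union = P(lambda s: A(s) or B(s))
p_inter = P(lambda s: A(s) and B(s))

# Inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
assert p_union == P(A) + P(B) - p_inter

# Conditional probability and the multiplication rule
p_A_given_B = p_inter / P(B)
assert p_inter == p_A_given_B * P(B)
print(P(A), P(B), p_A_given_B)   # 5/36 1/6 1/6
```

Here $P(A \mid B) = P(A \cap B)/P(B) = (1/36)/(1/6) = 1/6 \ne P(A) = 5/36$, so $A$ and $B$ are dependent.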
DISCRETE RANDOM VARIABLES

Definition: For a given sample space $S$ of some experiment, a random variable (rv) is any rule that associates a number with each outcome in $S$.

Definition: Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable.

Example: Consider a coin-tossing game, so $S = \{H, T\}$. Let $X$ be the random variable equal to 0 if the outcome is $T$ and equal to 1 if the outcome is $H$.

Definition: A random variable is said to be discrete if its set of possible values is a discrete set, i.e. if it either consists of a finite number of elements or its elements can be listed in a sequence as $x_1, x_2, \ldots, x_n, \ldots$.

Definition: The probability mass function (pmf) of a discrete rv is defined for every real number $x$ as $p(x) = P(X = x)$.

Remark: For every possible value $x$ of the random variable, the pmf specifies the probability of observing that value when the experiment is performed. The following conditions are required of any pmf:
$$p(x) \ge 0 \quad \text{and} \quad \sum_{x} p(x) = 1.$$

Definition: The cumulative distribution function (cdf) $F(x)$ of a discrete rv $X$ with pmf $p(x)$ is defined for every number $x$ by
$$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y).$$
For any number $x$, $F(x)$ is the probability that the observed value of $X$ will be at most $x$.

Definition: Let $X$ be a discrete rv with set of possible values $D$ and pmf $p(x)$. The expected value or mean value of $X$, denoted by $E(X)$ or $\mu_X$, is
$$E(X) = \mu_X = \sum_{x \in D} x\,p(x).$$

Proposition (Rules of Expected Value): For any constants $a$ and $b$ and random variables $X$ and $Y$,
$$E(aX + b) = a\,E(X) + b, \qquad E(aX + bY) = a\,E(X) + b\,E(Y).$$
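Example (numerical check, not from the original notes): the pmf conditions, the cdf, and the linearity of expectation can all be verified on a hypothetical fair six-sided die, again with exact fractions.

```python
from fractions import Fraction

# pmf of a fair six-sided die (hypothetical discrete rv X, uniform on 1..6)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# pmf conditions: p(x) >= 0 and the probabilities sum to 1
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

# E(X) = sum over D of x * p(x)
EX = sum(x * p for x, p in pmf.items())

# cdf F(x) = P(X <= x) = sum of p(y) over y <= x
def F(x):
    return sum(p for v, p in pmf.items() if v <= x)

# Linearity: E(aX + b) = a E(X) + b, checked for a = 2, b = 3
E_lin = sum((2 * x + 3) * p for x, p in pmf.items())
assert E_lin == 2 * EX + 3
print(EX, F(4))   # 7/2 2/3
```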
Definition: Let $X$ have pmf $p(x)$ and expected value $\mu$. Then the variance of $X$, denoted by $\mathrm{Var}(X)$ or $\sigma_X^2$, is
$$\mathrm{Var}(X) = \sum_{x \in D} (x - \mu)^2 p(x) = E[(X - \mu)^2].$$
The standard deviation (SD) of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.

Proposition:
$$\mathrm{Var}(X) = \sigma_X^2 = \left[\sum_{x \in D} x^2 p(x)\right] - \mu^2 = E(X^2) - (E(X))^2.$$

Proposition (Rules of Variance): For any constants $a$ and $b$,
$$\mathrm{Var}(aX + b) = a^2 \sigma_X^2, \qquad \sigma_{aX+b} = |a|\,\sigma_X.$$
For two independent random variables $X$ and $Y$,
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y), \qquad \sigma_{aX+bY} = \sqrt{a^2 \sigma_X^2 + b^2 \sigma_Y^2}.$$
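Example (numerical check, not from the original notes): the shortcut formula $\mathrm{Var}(X) = E(X^2) - (E(X))^2$ and the rule $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$ can be verified on the same hypothetical fair-die pmf.

```python
from fractions import Fraction

# Fair-die pmf as a worked example (assumption: X uniform on 1..6)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())

# Definition: Var(X) = E[(X - mu)^2]
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: Var(X) = E(X^2) - (E X)^2
EX2 = sum(x * x * p for x, p in pmf.items())
assert var_def == EX2 - mu ** 2

# Rule: Var(aX + b) = a^2 Var(X), e.g. a = 3, b = 7
a, b = 3, 7
var_lin = sum((a * x + b - (a * mu + b)) ** 2 * p for x, p in pmf.items())
assert var_lin == a ** 2 * var_def
print(var_def)   # 35/12
```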
CONTINUOUS RANDOM VARIABLES

Definition: A random variable $X$ is said to be continuous if its set of possible values is an entire interval of numbers; that is, for some $A \le B$, any number $x$ between $A$ and $B$ is a possible value of $X$.

Definition: The probability density function (pdf) of a continuous rv $X$ is a function $f(x)$ such that for any two numbers $a$ and $b$ with $a \le b$,
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Remark: The above definition means that the probability that $X$ takes on a value in the interval $[a, b]$ is the area under the graph of the density function $f(x)$ over that interval.

Proposition: For $f(x)$ to be a legitimate pdf, it must satisfy the following two conditions:
1. $f(x) \ge 0$ for all $x$;
2. $\int_{-\infty}^{+\infty} f(x)\,dx = 1$, that is, the area under the entire graph of $f(x)$ is equal to 1.

Proposition: If $X$ is a continuous rv, then for any number $c$, $P(X = c) = 0$.

Definition: The cumulative distribution function (cdf) $F(x)$ of a continuous rv $X$ with pdf $f(x)$ is defined for every number $x$ by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$

Remark: For each $x$, $F(x)$ is the area under the density curve to the left of $x$; $F(x)$ increases (from 0 to 1) as $x$ increases.

Definition: The expected value or mean value of a continuous rv $X$ with pdf $f(x)$ is
$$E(X) = \mu_X = \int_{-\infty}^{+\infty} x\,f(x)\,dx.$$

Proposition: If $X$ is a continuous rv with pdf $f(x)$ and $h(X)$ is any function of $X$, then
$$E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{+\infty} h(x)\,f(x)\,dx.$$
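Example (numerical check, not from the original notes): for a hypothetical density $f(x) = 2x$ on $[0, 1]$, the two legitimacy conditions, an interval probability, and $E(X) = \int x f(x)\,dx = 2/3$ can all be checked with a simple midpoint-rule integrator.

```python
# Hypothetical pdf f(x) = 2x on [0, 1] (0 elsewhere); a legitimate density
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(g, a, b, n=100_000):
    # Simple midpoint rule; accurate enough for a smooth integrand
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Total area is 1, and P(0.2 <= X <= 0.5) = 0.5^2 - 0.2^2 = 0.21
assert abs(integrate(f, 0.0, 1.0) - 1.0) < 1e-6
assert abs(integrate(f, 0.2, 0.5) - 0.21) < 1e-6

# E(X) = integral of x f(x) dx = 2/3 for this density
EX = integrate(lambda x: x * f(x), 0.0, 1.0)
assert abs(EX - 2.0 / 3.0) < 1e-6
```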
Definition: The variance of a continuous rv $X$ with pdf $f(x)$ and expected value $\mu$ is
$$\sigma_X^2 = \mathrm{Var}(X) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx = E[(X - \mu)^2].$$
The standard deviation (SD) of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.

Proposition:
$$\mathrm{Var}(X) = \sigma_X^2 = \left[\int_{-\infty}^{+\infty} x^2 f(x)\,dx\right] - \mu^2 = E(X^2) - (E(X))^2.$$
IMPORTANT DISCRETE RANDOM VARIABLES

1. Binomial Random Variable

Definition: An experiment for which the following four conditions hold is called a binomial experiment.
1. The experiment consists of a sequence of $n$ trials, where $n$ is fixed in advance of the experiment.
2. The trials are identical, and each trial can result in one of the same two possible outcomes, which we denote by success (S) or failure (F).
3. The trials are independent, so that the outcome of any particular trial does not influence the outcome of any other trial.
4. The probability of success is constant from trial to trial; we denote this probability by $p$.

Definition: Given a binomial experiment consisting of $n$ trials, the binomial random variable $X$ associated with this experiment is defined as $X =$ the number of S's among the $n$ trials.

Remark: A binomial random variable $X$ has two parameters, $n$ and $p$. We will use the notation $X \sim B(n, p)$.

Theorem: Let $X \sim B(n, p)$, that is, $X$ is a binomial rv with parameters $n$ and $p$. Then the pmf of $X$ is
$$f(x) = P(X = x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & \text{if } x = 0, 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases}$$

Theorem: Let $X \sim B(n, p)$. Then $E(X) = \mu_X = np$ and $\mathrm{Var}(X) = \sigma_X^2 = np(1-p)$.

2. Poisson Random Variable

Definition: A random variable $X$ is said to have a Poisson distribution with parameter $\lambda > 0$, that is $X \sim \mathcal{P}(\lambda)$, if the pmf of $X$ is
$$P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

The value of $\lambda$ is frequently a rate per unit time or unit area of occurrence of a certain event, and $X$ denotes the number of occurrences of this event during the unit time or area.
The Poisson probability model assumes that
1. the events occur independently,
2. the probability that an event occurs does not change in time,
3. the probability that an event will occur in an interval is proportional to the length of the interval,
4. the probability of more than one event occurring at the same time is vanishingly small.

Proposition: If $X$ has a Poisson distribution with parameter $\lambda$, then $E(X) = \mathrm{Var}(X) = \lambda$.

Proposition: Suppose that we have a sequence of binomial rv's $B(n, p)$, and we let $n \to \infty$ and $p \to 0$ in such a way that $np$ remains fixed at a value $\lambda > 0$. Then $B(n, p) \to \mathcal{P}(\lambda)$.

Remark: According to this proposition, in any binomial experiment in which $n$ is large and $p$ is small, $B(n, p) \approx \mathcal{P}(\lambda)$, where $\lambda = np$. As a rule of thumb, this approximation can safely be applied if $n \ge 100$, $p \le .01$, and $np \le 20$.

3. Geometric Random Variable

A geometric rv and distribution are based on an experiment satisfying the following conditions:
1. The experiment consists of a sequence of independent trials.
2. Each trial can result in either a success (S) or a failure (F).
3. The probability of success is constant from trial to trial, so $P(S \text{ on trial } i) = p$ for $i = 1, 2, 3, \ldots$.
4. The experiment continues (trials are performed) until the first success has been observed.

Proposition: The pmf of a geometric rv $X$ with parameter $p = P(S)$ is
$$P(X = x) = (1-p)^{x-1} p, \qquad x = 1, 2, \ldots$$

Proposition: If $X$ is a geometric rv with parameter $p$, then
$$E(X) = \frac{1}{p}, \qquad \mathrm{Var}(X) = \frac{1-p}{p^2}.$$
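Example (numerical check, not from the original notes): the Poisson approximation can be seen directly by comparing the two pmfs for a large $n$ and small $p$; the geometric moments follow from truncated series. The choices $n = 1000$, $p = 0.005$ (so $\lambda = 5$) and the geometric $p = 0.25$ are arbitrary, and the tolerance on the pointwise gap is an illustrative assumption.

```python
import math

# Poisson approximation: n large, p small, lam = n p
n, p = 1000, 0.005
lam = n * p

def binom_pmf(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

# The two pmfs agree closely, value by value
max_gap = max(abs(binom_pmf(x) - poisson_pmf(x)) for x in range(20))
assert max_gap < 1e-3

# Geometric: E(X) = 1/p and Var(X) = (1-p)/p^2, via a truncated series
q = 0.25
geo_mean = sum(x * (1 - q) ** (x - 1) * q for x in range(1, 500))
geo_ex2 = sum(x * x * (1 - q) ** (x - 1) * q for x in range(1, 500))
assert abs(geo_mean - 1 / q) < 1e-9
assert abs(geo_ex2 - geo_mean ** 2 - (1 - q) / q ** 2) < 1e-9
```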
4. Hypergeometric Random Variable

The assumptions leading to the hypergeometric distribution are as follows:
1. The population or set to be sampled consists of $N$ individuals, objects, or elements (a finite population).
2. Each individual can be characterized as a success (S) or a failure (F), and there are $M$ successes in the population.
3. A sample of $n$ individuals is drawn in such a way that each subset of size $n$ is equally likely to be chosen.

The random variable of interest is $X =$ the number of S's in the sample. The probability distribution of $X$ depends on the parameters $n$, $M$, and $N$.

Example: Suppose that a sample of size $n$ is to be chosen randomly (without replacement) from an urn containing $N$ balls, of which $M$ are white and $N - M$ are black. If $X$ denotes the number of white balls selected, then $X$ has a hypergeometric distribution with parameters $n$, $M$, and $N$.

Proposition: The pmf of a hypergeometric random variable $X$ with parameters $n$, $M$, and $N$ is given by
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
for $x$ an integer satisfying $\max(0,\, n - N + M) \le x \le \min(n, M)$.
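Example (numerical check, not from the original notes): the pmf is easy to evaluate with `math.comb`. The urn parameters below are hypothetical, and the mean formula $E(X) = nM/N$ is stated here as a known fact about the hypergeometric distribution, not derived in the notes above.

```python
import math

def hypergeom_pmf(x, n, M, N):
    # P(X = x) = C(M, x) C(N-M, n-x) / C(N, n)
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

# Hypothetical urn: N = 20 balls, M = 7 white, draw n = 5 without replacement
N, M, n = 20, 7, 5
lo, hi = max(0, n - (N - M)), min(n, M)
probs = {x: hypergeom_pmf(x, n, M, N) for x in range(lo, hi + 1)}

# pmf sums to 1 over the support
assert abs(sum(probs.values()) - 1.0) < 1e-12

# Known fact (not derived above): E(X) = n M / N
mean = sum(x * p for x, p in probs.items())
assert abs(mean - n * M / N) < 1e-12
```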
IMPORTANT CONTINUOUS RANDOM VARIABLES

1. Normal Distribution

Definition: A continuous rv $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma^2$, where $-\infty < \mu < +\infty$ and $0 < \sigma$, if the pdf of $X$ is
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < +\infty.$$

Remark: The statement that $X$ is normally distributed with parameters $\mu$ and $\sigma^2$ is abbreviated $X \sim \mathcal{N}(\mu, \sigma^2)$.

Definition: The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution, and a random variable that has this distribution is called a standard normal random variable, denoted by $Z$. The pdf of $Z$ is
$$f(z; 0, 1) = \varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < +\infty.$$
The cdf of $Z$ is
$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \varphi(y)\,dy.$$

Notation: $z_\alpha$ will denote the value on the measurement axis for which $\alpha$ of the area under the $z$ curve lies to the right of $z_\alpha$, that is, $P(Z \ge z_\alpha) = \alpha$.

Proposition: If $X \sim \mathcal{N}(\mu, \sigma^2)$, then
$$Z = \frac{X - \mu}{\sigma}$$
is a standard normal rv.

Empirical Rule: If the population distribution of a variable is (approximately) normal, then
1. roughly 68% of the values are within 1 SD (standard deviation) of the mean;
2. roughly 95% of the values are within 2 SDs of the mean;
3. roughly 99.7% of the values are within 3 SDs of the mean.
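Example (numerical check, not from the original notes): $\Phi$ can be expressed through the standard library's error function as $\Phi(z) = \tfrac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$, which lets us verify the Empirical Rule and the standardization proposition; the $\mathcal{N}(100, 15^2)$ example is a hypothetical choice.

```python
import math

def Phi(z):
    # Standard normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Empirical Rule: probability mass within 1, 2, 3 SDs of the mean
within = [Phi(k) - Phi(-k) for k in (1, 2, 3)]
assert abs(within[0] - 0.6827) < 1e-3   # roughly 68%
assert abs(within[1] - 0.9545) < 1e-3   # roughly 95%
assert abs(within[2] - 0.9973) < 1e-3   # roughly 99.7%

# Standardizing: if X ~ N(mu, sigma^2), then P(X <= x) = Phi((x - mu)/sigma)
mu, sigma = 100.0, 15.0
assert abs(Phi((115.0 - mu) / sigma) - Phi(1.0)) < 1e-12
```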
Proposition: Let $X$ be a binomial rv based on $n$ trials with success probability $p$. Then, if the binomial probability histogram is not too skewed, $X$ has approximately a normal distribution with $\mu = np$ and $\sigma = \sqrt{npq}$, where $q = 1 - p$. In particular, for a possible value $k$ of $X$, the continuity-corrected approximation is
$$P(X \le k) \approx \Phi\!\left(\frac{k + .5 - np}{\sqrt{npq}}\right).$$
In practice, the approximation is adequate provided that both $np \ge 5$ and $nq \ge 5$.

2. Lognormal Distribution

Definition: A nonnegative rv $X$ is said to have a lognormal distribution if the rv $Y = \ln(X)$ has a normal distribution. The resulting pdf of a lognormal rv, when $\ln(X) \sim \mathcal{N}(\mu, \sigma^2)$, is
$$f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-[\ln(x)-\mu]^2/(2\sigma^2)} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

Remark: Be careful here; $\mu$ and $\sigma$ are not the mean and standard deviation of $X$ but of $\ln(X)$.

Proposition: The mean and variance of $X$ can be shown to be
$$E(X) = e^{\mu + \sigma^2/2}, \qquad \mathrm{Var}(X) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).$$

Because $\ln(X)$ has a normal distribution, the cdf of $X$ can be expressed in terms of the cdf $\Phi(z)$ of a standard normal rv $Z$. For $x \ge 0$,
$$F(x; \mu, \sigma) = P(X \le x) = P(\ln(X) \le \ln(x)) = P\!\left(Z \le \frac{\ln(x) - \mu}{\sigma}\right) = \Phi\!\left(\frac{\ln(x) - \mu}{\sigma}\right).$$

Remark: Suppose that $X_1$ and $X_2$ are independent rv's from a lognormal distribution with the same parameters, and let $Y_1 = \ln X_1$ and $Y_2 = \ln X_2$. Then
$$E\!\left(\frac{Y_1 + Y_2}{2}\right) = E\!\left(\frac{\ln X_1 + \ln X_2}{2}\right) = E\!\left(\ln\sqrt{X_1 X_2}\right).$$
In general, if $X_1, X_2, \ldots, X_n$ are independent lognormals with the same parameters and $Y_i = \ln X_i$, $i = 1, \ldots, n$, then
$$\frac{Y_1 + \cdots + Y_n}{n} = \ln \sqrt[n]{\prod_{i=1}^{n} X_i}.$$
Thus the mean of the transformed variables corresponds to the (logarithm of the) geometric mean of the original variables.
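Example (numerical check, not from the original notes): the continuity-corrected normal approximation and the lognormal mean formula can both be verified numerically. The parameters $n = 50$, $p = 0.4$ and $\mu = 0.5$, $\sigma = 0.4$ are arbitrary, and the 0.01 tolerance on the approximation is an illustrative assumption.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Normal approximation with continuity correction, n = 50, p = 0.4
n, p = 50, 0.4
mu_b, sigma_b = n * p, math.sqrt(n * p * (1 - p))
k = 22
exact = sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))
approx = Phi((k + 0.5 - mu_b) / sigma_b)
# np = 20 >= 5 and nq = 30 >= 5, so the approximation should be close
assert abs(exact - approx) < 0.01

# Lognormal mean E(X) = exp(mu + sigma^2/2), checked by midpoint integration
m, s = 0.5, 0.4
def pdf(x):
    return math.exp(-(math.log(x) - m) ** 2 / (2 * s * s)) / (x * s * math.sqrt(2 * math.pi))
h = 0.001
num_mean = sum(x * pdf(x) * h for x in (i * h + h / 2 for i in range(20000)))
assert abs(num_mean - math.exp(m + s * s / 2)) < 1e-3
```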
3. Exponential Distribution

Definition: A nonnegative rv $X$ is said to have an exponential distribution with parameter $\lambda$ if the pdf of $X$ is
$$f(x) = \begin{cases} \dfrac{1}{\lambda}\, e^{-x/\lambda} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$
The cdf of $X$ is $F(x) = P(X \le x) = 1 - e^{-x/\lambda}$ for $x \ge 0$.

Proposition: The mean and variance of $X$ can be shown to be $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda^2$.
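Example (numerical check, not from the original notes): the stated cdf matches the integral of the pdf. The final assertion checks the memoryless property $P(X > s + t \mid X > s) = P(X > t)$, a well-known fact about the exponential distribution that is not derived in the notes above; $\lambda = 2$ and the points $s, t$ are arbitrary.

```python
import math

# Exponential with scale parameter lam (mean lam), as parameterized above
lam = 2.0
def f(x):
    return math.exp(-x / lam) / lam
def F(x):
    return 1.0 - math.exp(-x / lam)

# cdf matches the integral of the pdf (midpoint rule on [0, x0])
h = 1e-4
x0 = 3.0
num = sum(f(i * h + h / 2) for i in range(int(x0 / h))) * h
assert abs(num - F(x0)) < 1e-6

# Known memoryless property (not derived above): P(X > s+t | X > s) = P(X > t)
s, t = 1.0, 2.5
assert abs((1 - F(s + t)) / (1 - F(s)) - (1 - F(t))) < 1e-12
```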
DISTRIBUTIONS DERIVED FROM THE NORMAL DISTRIBUTION

Definition: A random variable $X$ with pdf
$$g(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x \ge 0,$$
has a gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$. The gamma function $\Gamma(x)$ is defined as
$$\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\,du.$$

Properties of the Gamma Function:
(i) $\Gamma(x + 1) = x\,\Gamma(x)$;
(ii) $\Gamma(n + 1) = n!$ for any nonnegative integer $n$;
(iii) $\Gamma(1/2) = \sqrt{\pi}$.

Remarks:
1. Notice that an exponential rv with parameter $1/\theta = \lambda$ is a special case of a gamma rv with parameters $\alpha = 1$ and $\lambda$.
2. The sum of $n$ independent identically distributed (iid) exponential rv's with parameter $\lambda$ has a gamma distribution with parameters $n$ and $\lambda$.
3. The sum of $n$ iid gamma rv's with parameters $\alpha$ and $\lambda$ has a gamma distribution with parameters $n\alpha$ and $\lambda$.

Definition: If $Z$ is a standard normal rv, the distribution of $U = Z^2$ is called the chi-square distribution with 1 degree of freedom. The density function of $U \sim \chi^2_1$ is
$$f_U(x) = \frac{x^{-1/2}}{\sqrt{2\pi}}\, e^{-x/2}, \qquad x > 0.$$

Remark: A $\chi^2_1$ random variable has the same density as a random variable with a gamma distribution with parameters $\alpha = 1/2$ and $\lambda = 1/2$.

Definition: If $U_1, U_2, \ldots, U_k$ are independent chi-square rv's with 1 degree of freedom, the distribution of $V = U_1 + U_2 + \cdots + U_k$ is called the chi-square distribution with $k$ degrees of freedom. By Remark 3 and the remark above, a $\chi^2_k$ rv follows a gamma distribution with parameters $\alpha = k/2$ and $\lambda = 1/2$. Thus the density function of $V \sim \chi^2_k$ is
$$f_V(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \qquad x > 0.$$

Proposition: If $V$ has a chi-square distribution with $k$ degrees of freedom, then $E(V) = k$ and $\mathrm{Var}(V) = 2k$.
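Example (numerical check, not from the original notes): the gamma-function properties can be verified directly with `math.gamma`, and the chi-square density with $k$ degrees of freedom can be checked to have mean $k$ by numerical integration; $k = 4$ and the test points are arbitrary choices.

```python
import math

# Gamma function properties, using math.gamma (math.gamma(x) computes Gamma(x))
assert abs(math.gamma(5 + 1) - math.factorial(5)) < 1e-9        # Gamma(n+1) = n!
assert abs(math.gamma(3.7 + 1) - 3.7 * math.gamma(3.7)) < 1e-9  # Gamma(x+1) = x Gamma(x)
assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12        # Gamma(1/2) = sqrt(pi)

# chi-square with k df as gamma(alpha = k/2, lambda = 1/2)
def chi2_pdf(x, k):
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# E(V) = k, checked numerically for k = 4 (midpoint rule on [0, 80])
k, h = 4, 0.001
mean = sum(x * chi2_pdf(x, k) for x in (i * h + h / 2 for i in range(80000))) * h
assert abs(mean - k) < 1e-3
```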
Definition: If $Z \sim \mathcal{N}(0, 1)$ and $U \sim \chi^2_n$, and $Z$ and $U$ are independent, then the distribution of
$$T = \frac{Z}{\sqrt{U/n}}$$
is called the $t$ distribution with $n$ degrees of freedom.

Proposition: The density function of the $t$ distribution with $n$ degrees of freedom is
$$f(t) = \frac{\Gamma[(n+1)/2]}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}.$$

Remarks: For the above density, $f(t) = f(-t)$, so the $t$ density is symmetric about zero. As the number of degrees of freedom approaches infinity, the $t$ distribution tends to the standard normal distribution.

Definition: Let $U$ and $V$ be independent chi-square variables with $m$ and $n$ degrees of freedom, respectively. The distribution of
$$W = \frac{U/m}{V/n}$$
is called the $F$ distribution with $m$ and $n$ degrees of freedom and is denoted by $F_{m,n}$.

Remarks:
(i) If $T \sim t_n$, then $T^2 \sim F_{1,n}$.
(ii) If $X \sim F_{n,m}$, then $X^{-1} \sim F_{m,n}$.
COVARIANCE and CORRELATION of RANDOM VARIABLES

Definition: Let $X$ and $Y$ be random variables with expected values $\mu_X$ and $\mu_Y$, respectively. The covariance of $X$ and $Y$ is
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],$$
provided that the expectation exists.

Proposition: $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$.

Proof: By definition,
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY - X\mu_Y - Y\mu_X + \mu_X\mu_Y)$$
$$= E(XY) - \mu_X\mu_Y - \mu_X\mu_Y + \mu_X\mu_Y = E(XY) - E(X)E(Y).$$

Proposition:
(i) If $X$ and $Y$ are independent random variables, then $\mathrm{Cov}(X, Y) = 0$.
(ii) If $X = Y$ with $\mathrm{Var}(X) = \mathrm{Var}(Y) = \sigma^2$, then $\mathrm{Cov}(X, Y) = \mathrm{Var}(X) = \sigma^2$.

Definition: If $X$ and $Y$ are random variables, the variances and covariance exist, and the variances are nonzero, then the correlation of $X$ and $Y$, denoted by $\rho$, is
$$\rho = \mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

Proposition:
(i) $-1 \le \rho \le 1$.
(ii) $\rho = \pm 1$ if and only if $X = a + bY$ for some constants $a$ and $b \ne 0$.

Proposition: Let $X$ and $Y$ be arbitrary random variables whose variances and covariance exist. Then
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

Proof:
$$\mathrm{Var}(X + Y) = E[(X + Y - \mu_X - \mu_Y)^2] = E[((X - \mu_X) + (Y - \mu_Y))^2]$$
$$= E[(X - \mu_X)^2 + (Y - \mu_Y)^2 + 2(X - \mu_X)(Y - \mu_Y)]$$
$$= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$
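Example (numerical check, not from the original notes): the covariance shortcut and the variance-of-a-sum identity can be verified exactly on a small made-up joint pmf; the probabilities below are hypothetical numbers chosen only to sum to 1.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y) on a small grid (made-up values summing to 1)
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8), (2, 1): Fraction(1, 8),
}
assert sum(joint.values()) == 1

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

# Shortcut: Cov(X, Y) = E(XY) - E(X)E(Y)
cov = E(lambda x, y: (x - EX) * (y - EY))
assert cov == E(lambda x, y: x * y) - EX * EY

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
var = lambda g, m: E(lambda x, y: (g(x, y) - m) ** 2)
VX, VY = var(lambda x, y: x, EX), var(lambda x, y: y, EY)
VXY = var(lambda x, y: x + y, EX + EY)
assert VXY == VX + VY + 2 * cov
```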