Sociology 6Z03
Topic 10: Probability (Part I)
John Fox
McMaster University
Fall 2016

John Fox (McMaster University) Soc 6Z03: Probability I Fall 2016

Outline: Probability (Part I)
Introduction
Probability Basics
Introduction: Probability Theory

Probability theory is the area of mathematics that deals with random phenomena. Individual random events are intrinsically unpredictable, but repeated random events are orderly and patterned. The purpose of probability theory is to describe these patterns: literally, to bring order to chaos.

Much of modern mathematics (for example, calculus, algebra, and geometry) is of ancient origin, but probability theory did not exist before the European Renaissance (specifically, the 17th century: the late Renaissance or early Enlightenment).

One use of probability theory is to provide a foundation for statistical inference. Statistical inference is the process of drawing conclusions about characteristics of a population based on a sample drawn at random from the population.

Probability Basics: Experiment, Outcomes, Sample Space, Realization

In probability theory:
an experiment is a repeatable procedure for making an observation;
an outcome is a possible observation resulting from an experiment; and
the sample space of the experiment is the set of all possible outcomes.

Any specific realization of the experiment produces a particular outcome in the sample space.
Probability Basics: Finite and Continuous Sample Spaces

Sample spaces may be discrete or continuous.

If, for example, we flip a coin twice and record on each flip whether the coin shows heads (H) or tails (T), then the sample space of the experiment is discrete and finite, consisting of the outcomes
S = {HH, HT, TH, TT}

If, in contrast, we burn a light bulb until it fails, recording the failure time in hours and fractions of an hour, then the sample space of the experiment is continuous and consists of all positive real numbers (not bothering to specify an upper limit for the life of a bulb):
S = {x : x > 0}

Probability Basics: Sample Spaces (Thought Question)

Suppose that we flip a coin only once and observe whether the coin comes up H or T. What is the sample space of this experiment?
A. S = {HH, TT}
B. S = {HH, HT, TH, TT}
C. S = {H, T}
D. S = {HT}
E. I don't know.
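The difference between the two kinds of sample space shows up as soon as we try to represent them on a computer. As a minimal sketch in Python (the language and the function names `sample_space` and `in_bulb_sample_space` are illustrative assumptions, not part of the course materials), a discrete, finite sample space can be listed outcome by outcome, while a continuous one can only be described by a rule for membership:

```python
from itertools import product

def sample_space(n_flips):
    """List every H/T sequence of length n_flips: the outcomes of flipping a coin n_flips times."""
    return ["".join(seq) for seq in product("HT", repeat=n_flips)]

def in_bulb_sample_space(x):
    """A continuous sample space cannot be listed; we can only test membership in S = {x : x > 0}."""
    return x > 0

print(sample_space(2))  # ['HH', 'HT', 'TH', 'TT']
print(in_bulb_sample_space(1234.5), in_bulb_sample_space(-1.0))  # True False
```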
Probability Basics: Events

An event is a subset of the sample space of an experiment, that is, a set of outcomes. An event is said to occur in a realization of the experiment if one of its constituent outcomes occurs.

For example, for S = {HH, HT, TH, TT}, the event E = {HH, HT}, representing "a head on the first flip of the coin," occurs if we obtain either the outcome HH or the outcome HT.

Probability Basics: Axioms of Probability

Probabilities are numbers assigned to events in a manner consistent with the following axioms (rules, as given by Moore):
P1: The probability of an event E is a number between 0 and 1: $0 \le P(E) \le 1$.
P2: The sample space S is exhaustive (some outcome must occur): $P(S) = 1$.
P3: Two events A and B are disjoint if they have no outcomes in common; disjoint events cannot occur simultaneously. The probability that one or the other of two disjoint events occurs is the sum of their separate probabilities of occurrence: for A and B disjoint, $P(A \text{ or } B) = P(A) + P(B)$.
P4: The probability that an event E does not occur is the complement of its probability of occurrence: $P(\text{not } E) = 1 - P(E)$.
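A probability model for the two-flip experiment can be written down as a table of outcome probabilities, and events as sets of outcomes. The Python sketch below assumes equally likely outcomes and uses exact fractions; the helper names `prob`, `satisfies_axioms`, and `occurs` are hypothetical. Computing an event's probability by adding outcome probabilities relies on P3 applied to single-outcome events, and the last line checks P4 for the event E:

```python
from fractions import Fraction

# Equally likely outcomes for two flips of a fair coin (an assumed model for illustration)
model = {outcome: Fraction(1, 4) for outcome in ["HH", "HT", "TH", "TT"]}

def prob(event, model):
    """P(event): add up the probabilities of the outcomes that make up the event."""
    return sum((model[o] for o in event), Fraction(0))

def satisfies_axioms(model):
    """Check P1 for every outcome and P2: the outcome probabilities sum to P(S) = 1."""
    return all(0 <= p <= 1 for p in model.values()) and sum(model.values()) == 1

def occurs(event, outcome):
    """An event occurs in a realization if the realized outcome is one of its outcomes."""
    return outcome in event

E = {"HH", "HT"}        # "a head on the first flip"
not_E = set(model) - E  # the outcomes not in E: {"TH", "TT"}

print(satisfies_axioms(model))                   # True
print(occurs(E, "HT"), occurs(E, "TH"))          # True False
print(prob(not_E, model) == 1 - prob(E, model))  # True (P4)
```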
Probability Basics: Interpretation of Probability

Probabilities can be interpreted as long-run proportions. For example, to say that the probability of an event is .5 means that the event will occur approximately half the time if the experiment is repeated a very large number of times, with the approximation tending to improve as the number of repetitions of the experiment increases.

This interpretation provides a way to estimate probabilities: repeat the experiment many times and observe the proportion of times that the event occurs.

This objective interpretation of probability is the basis of the classical approach to statistical inference.

There are also subjective or personal approaches to probability, in which a probability is interpreted as the strength of belief that an event will occur or that a proposition is true.

Probability Basics: Axioms of Probability, Probability Models

The fourth axiom, P4, is not really needed: it can be deduced from the others. (Can you see how?)

Consider the event $E = \{o_a, o_b, \ldots, o_m\}$, where each $o_i$ is an outcome, that is, an element of the sample space S. Then, by the third axiom (applied repeatedly, since distinct single outcomes are disjoint events), the probability of E is the sum of the probabilities of its constituent outcomes:
$$P(E) = P(o_a) + P(o_b) + \cdots + P(o_m)$$
Thus, if we know the probabilities of all of the outcomes in the sample space, we can figure out the probability of any event.

A probability model for an experiment consists of the sample space for the experiment and an assignment of probabilities to events in a manner consistent with the axioms.

The axioms are not so restrictive as to imply a unique assignment of probabilities to a sample space. As long as the sample space contains more than one outcome, there are always infinitely many probability models for an experiment.
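The long-run-proportion interpretation described above can be illustrated by simulation. This is a sketch in Python, assuming a fair coin simulated with the standard `random` module; the function name `estimate_probability` and the seed are hypothetical choices. It estimates the probability of "a head on the first flip" (which is .5 for a fair coin) from increasingly many realizations of the two-flip experiment:

```python
import random

def estimate_probability(event, n_reps, seed=1):
    """Estimate P(event) as the proportion of n_reps simulated realizations in which it occurs.

    Each realization is two flips of a fair coin, simulated with Python's random module.
    """
    rng = random.Random(seed)  # seeded so the results are reproducible
    hits = 0
    for _ in range(n_reps):
        outcome = rng.choice("HT") + rng.choice("HT")  # one realization, e.g. "TH"
        if outcome in event:
            hits += 1
    return hits / n_reps

E = {"HH", "HT"}  # "a head on the first flip"
for n in (10, 100, 10_000, 100_000):
    print(n, estimate_probability(E, n))
```

With only 10 repetitions the estimate can be far from .5; with 100,000 it should typically be within a few thousandths of .5.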
Probability Basics: Probability Models, Examples

Suppose, for example, that all outcomes in the sample space S = {HH, HT, TH, TT} are equally likely, so that
P(HH) = P(HT) = P(TH) = P(TT) = .25
This corresponds to a fair coin flipped in a fair manner.

Then, for E = {HH, HT} ("a head on the first flip"), the probability of E is P(E) = .25 + .25 = .5.

Let A = {TH, TT} be the event "a tail on the first flip," and B = {HH} the event "two heads." The events A and B are disjoint, and the event "A or B" is {TH, TT, HH}; thus,
P(A or B) = .75 = P(A) + P(B) = .5 + .25

Probability Basics: Probability Models, Examples (Thought Question)

Continuing with the preceding example, with S = {HH, HT, TH, TT} and equally likely outcomes, as before let A = {TH, TT} be the event "a tail on the first flip." Now let C be the event "a tail on the second flip." What outcomes are in C?
A. C = {TH, TT}
B. C = {HT, TT}
C. C = {TT}
D. C = {HH, TH}
E. I don't know.
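The equally likely model and the events E, A, and B from the example above can be checked with exact arithmetic. In this Python sketch (the names `S`, `P`, and `prob` are illustrative), "A or B" is the set union `A | B`, and additivity (P3) holds because A and B share no outcomes:

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]
P = {o: Fraction(1, 4) for o in S}  # equally likely outcomes

def prob(event):
    """Probability of an event: the sum of its outcomes' probabilities."""
    return sum((P[o] for o in event), Fraction(0))

E = {"HH", "HT"}  # a head on the first flip
A = {"TH", "TT"}  # a tail on the first flip
B = {"HH"}        # two heads

print(prob(E))             # 1/2
print(A.isdisjoint(B))     # True
print(prob(A | B))         # 3/4
print(prob(A) + prob(B))   # 3/4
```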
Probability Basics: Probability Models, Examples (Thought Question)

Are the events A = {TH, TT} ("a tail on the first flip") and C = {HT, TT} ("a tail on the second flip") disjoint?
A. Yes.
B. No.
C. I don't know.

Probability Basics: Probability Models, Examples

Equally likely outcomes produce a simple example, but any assignment of probabilities between 0 and 1 to the outcomes that sums to 1 is consistent with the axioms.

For example, a coin that is weighted to produce 2/3 heads yields the following probabilities for two independent flips:
P(HH) = 4/9, P(HT) = P(TH) = 2/9, P(TT) = 1/9
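Where do these numbers come from? A sketch in Python, under the assumption (consistent with "two independent flips," though the multiplication rule itself is not stated on these slides) that each outcome's probability is the product of the single-flip probabilities; the names `p_head`, `flip`, and `model` are illustrative:

```python
from fractions import Fraction
from itertools import product

p_head = Fraction(2, 3)                # coin weighted to produce 2/3 heads
flip = {"H": p_head, "T": 1 - p_head}  # probabilities for a single flip

# Assumption: the two flips are independent, so each outcome's probability is the
# product of the single-flip probabilities (the multiplication rule).
model = {a + b: flip[a] * flip[b] for a, b in product("HT", repeat=2)}

for outcome, p in model.items():
    print(outcome, p)  # HH 4/9, HT 2/9, TH 2/9, TT 1/9 (one per line)
print("total:", sum(model.values()))  # total: 1
```

The total of 1 confirms that this weighted-coin model, like the equally likely one, is consistent with the axioms.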
Probability Basics: Probability Models, Examples (Thought Question)

With this assignment of probabilities to outcomes,
P(HH) = 4/9, P(HT) = P(TH) = 2/9, P(TT) = 1/9
what is the probability of the event E = {HH, HT}?
A. 6/9 = 2/3
B. 4/9
C. 2/4 = 1/2
D. 8/9
E. I don't know.

A random variable is a rule that assigns a number to each outcome in the sample space of an experiment. For example, the following random variable X counts the number of heads in each outcome of the coin-flipping experiment:

outcome    value x of X
HH         2
HT         1
TH         1
TT         0

It is sometimes useful to distinguish between the random variable X (denoted by an upper-case letter) and a particular value x of the random variable (denoted by a lower-case letter).
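Because a random variable is a rule assigning a number to each outcome, it maps naturally onto a function. A minimal Python sketch (the function name `X` simply mirrors the slide's notation; representing outcomes as strings is an assumption):

```python
def X(outcome):
    """Random variable X: the number of heads in an outcome such as "HT"."""
    return outcome.count("H")

for outcome in ["HH", "HT", "TH", "TT"]:
    print(outcome, X(outcome))  # HH 2, HT 1, TH 1, TT 0 (one pair per line)
```

Here the upper-case `X` is the rule itself, while each printed number is a particular value x.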
Probability Distributions

The probability distribution of a discrete random variable lists all possible values $x_i$ of the variable and shows the probability $p_i$ of observing each. For example, for the coin-flipping experiment with equally likely outcomes, the probability distribution of the number of heads X is

outcomes    x_i    p_i = P(X = x_i)
TT          0      .25
HT, TH      1      .50
HH          2      .25
sum                1.00

Notice that although the outcomes in the original sample space of the experiment are equally likely, the values of the random variable X are not equally likely.

Probability Distributions

In general, a discrete, finite random variable has a number of possible values, $x_1, x_2, \ldots, x_k$, with probabilities $p_1, p_2, \ldots, p_k$.

Following from the axioms of probability theory, each probability $p_i$ is a number between 0 and 1, and the sum of all probabilities is $p_1 + p_2 + \cdots + p_k = 1$.

We can find the probability of particular events that refer to the random variable by summing the probabilities for the values that make up the event. For example, the probability of getting at least one head is
$$P(X \ge 1) = P(X = 1) + P(X = 2) = .50 + .25 = .75$$
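The distribution table is built by grouping outcomes according to the value the random variable assigns them and adding their probabilities. A Python sketch of that step, assuming equally likely outcomes; `distribution` and `number_of_heads` are hypothetical helper names:

```python
from collections import defaultdict
from fractions import Fraction

# Equally likely outcomes for two fair flips, as on the slide
model = {o: Fraction(1, 4) for o in ["HH", "HT", "TH", "TT"]}

def distribution(model, rv):
    """Return {x: P(X = x)} by adding the probabilities of all outcomes that rv maps to x."""
    dist = defaultdict(Fraction)  # Fraction() is 0
    for outcome, p in model.items():
        dist[rv(outcome)] += p
    return dict(sorted(dist.items()))

def number_of_heads(outcome):
    return outcome.count("H")

dist = distribution(model, number_of_heads)
for x, p in dist.items():
    print(x, p)  # 0 1/4, then 1 1/2, then 2 1/4
print("P(X >= 1) =", sum(p for x, p in dist.items() if x >= 1))  # P(X >= 1) = 3/4
```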
Probability Distributions

The probability distribution of a discrete random variable can be graphed as follows:

[Figure: bar graph of the probability distribution of the number of heads; horizontal axis Number of Heads, x_i (0, 1, 2); vertical axis Probability, p_i (0.0 to 0.5)]

Probability Distributions (Thought Question)

Continuing with the sample space S = {HH, HT, TH, TT}, define the random variable Y = 1 when both flips are the same (i.e., HH or TT) and Y = 0 when they are different (i.e., HT or TH). If the four outcomes are equally likely, each with probability 1/4 = 0.25, is the following probability distribution for Y correct?

y_i    p_i = P(Y = y_i)
0      0.5
1      0.5
sum    1.0

A. Yes.
B. No.
C. I don't know.
Mean of a Random Variable

The mean µ of the random variable X, also called the expectation or expected value of X, is defined as
$$\mu = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum x_i p_i$$
It is conventional to use Greek letters like µ to represent numerical summaries of probability distributions. The expected value of X is also written as E(X).

For our example:

x_i    p_i     x_i p_i
0      .25     0.00
1      .50     0.50
2      .25     0.50
sum    1.00    µ = 1.00

Mean of a Random Variable

The mean of X gives the average value of the random variable in the following senses:

The mean µ is the average of the possible values of X, each weighted by its probability of occurrence.

If you think of probabilities as weights arranged along a bar, the mean µ is the point at which the bar balances:

[Figure: probabilities .25, .50, and .25 drawn as weights at 0, 1, and 2 on a bar marked from 0.0 to 2.0 (Number of Heads, x_i), with the balance point µ indicated]

If we repeat the experiment many times and calculate the value of X for each realization, then the average of these values of X is approximately µ, with the approximation tending to get better as the number of repetitions increases.
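Both senses of "average" can be seen side by side. The Python sketch below computes µ as the probability-weighted sum for the number-of-heads distribution, then averages X over many simulated realizations of two fair flips (the seed and sample size are arbitrary assumptions; `mean` is an illustrative helper name):

```python
import random
from fractions import Fraction

# Distribution of X = number of heads in two fair flips
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(dist):
    """mu = x_1 p_1 + x_2 p_2 + ... + x_k p_k."""
    return sum(x * p for x, p in dist.items())

mu = mean(dist)
print("mu =", mu)  # mu = 1

# Long-run sense of the mean: average X over many simulated realizations
rng = random.Random(2016)  # arbitrary seed, for reproducibility
n = 100_000
values = [(rng.choice("HT") + rng.choice("HT")).count("H") for _ in range(n)]
print("average of", n, "simulated values:", sum(values) / n)
```

The simulated average should come out close to 1, and would tend to get closer with more repetitions.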
Mean of a Random Variable (Thought Question)

Recall the random variable Y = 1 when both flips in the coin-flipping experiment are the same and Y = 0 when they are different. With equally likely outcomes, Y has the probability distribution

y_i    p_i = P(Y = y_i)
0      0.5
1      0.5
sum    1.0

What is the mean of Y?
A. 0
B. 1
C. 0.5

Variance and Standard Deviation of a Random Variable

The variance σ² of X measures how spread out the distribution of X is around its mean µ:
$$\sigma^2 = (x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \cdots + (x_k - \mu)^2 p_k = \sum (x_i - \mu)^2 p_i$$
The variance of X is also written as V(X) or Var(X).

The standard deviation σ of X is just the square root of the variance (and restores the units of the variable).

Continuing the example (where µ = 1):

x_i    p_i     x_i - µ    (x_i - µ)^2 p_i
0      .25     -1         0.25
1      .50      0         0.00
2      .25      1         0.25
sum    1.00               σ² = 0.50

$\sigma = \sqrt{0.50} = 0.707$ heads
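The variance table above is a direct translation of the formula. A short Python sketch for the number-of-heads distribution (the helper names `mean` and `variance` are illustrative; exact fractions are used until the square root):

```python
import math
from fractions import Fraction

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # X = number of heads

def mean(dist):
    """mu = sum of x_i p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma^2 = sum of (x_i - mu)^2 p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

var = variance(dist)
sd = math.sqrt(var)  # math.sqrt converts the Fraction to a float
print("variance =", var)       # variance = 1/2
print("sd = %.3f heads" % sd)  # sd = 0.707 heads
```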
Variance and Standard Deviation of a Random Variable (Thought Question)

Recall the random variable Y with probability distribution

y_i    p_i = P(Y = y_i)
0      0.5
1      0.5
sum    1.0

and mean µ = 0.5. What are the variance and standard deviation of Y?
A. σ² = 1 and σ = 1
B. σ² = 0.25 and σ = 0.5
C. σ² = 0.5 and σ = 0.25
D. I don't know.

Mean, Variance and Standard Deviation

The formulas for µ, σ², and σ are very similar to the formulas for the mean $\bar{x}$, variance $s^2$, and standard deviation $s$ of a variable in a data set:

mean
  random variable: $\mu = \sum x_i p_i$
  variable in a data set: $\bar{x} = \frac{1}{n}\sum x_i = \sum x_i \frac{1}{n}$
variance
  random variable: $\sigma^2 = \sum (x_i - \mu)^2 p_i$
  variable in a data set: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
standard deviation
  random variable: $\sigma = \sqrt{\sigma^2}$
  variable in a data set: $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$
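The parallel between the two columns can be made concrete by simulating a data set from the number-of-heads distribution and applying the data-set formulas. This Python sketch uses the standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor shown above; the seed and sample size are arbitrary assumptions:

```python
import random
import statistics

# Simulated data: n realizations of X = number of heads in two fair flips
rng = random.Random(6)  # arbitrary seed
n = 50_000
xs = [(rng.choice("HT") + rng.choice("HT")).count("H") for _ in range(n)]

x_bar = statistics.mean(xs)   # (1/n) * sum of x_i
s2 = statistics.variance(xs)  # 1/(n - 1) * sum of (x_i - x_bar)^2
s = statistics.stdev(xs)      # square root of s2

# The data-set summaries should be close to mu = 1, sigma^2 = 0.5, sigma = 0.707
print(round(x_bar, 3), round(s2, 3), round(s, 3))
```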
Continuous Random Variables

Random variables defined on continuous sample spaces may themselves be continuous.

The probability distribution of a continuous random variable X is described by a density curve, p(x).

It is not useful to talk of the probability of observing specific, individual values of a continuous random variable (each such probability is 0), but areas under the density curve give the probability of observing specific ranges of values of the random variable.

Continuous Random Variables

A continuous random variable, like a discrete random variable, has a mean, variance, and standard deviation.

The formulas for the mean and variance of a continuous random variable are very similar to the corresponding formulas for a discrete random variable (substituting integrals for sums):
$$\mu = \int_{\text{all } x} x \, p(x) \, dx$$
$$\sigma^2 = \int_{\text{all } x} (x - \mu)^2 \, p(x) \, dx$$
(If you are unfamiliar with calculus, integrals are the continuous analogs of sums, but don't worry about these formulas.)
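To see that the integrals really are "continuous sums," they can be approximated by ordinary sums over many thin slices. This Python sketch assumes an exponential density $p(x) = \lambda e^{-\lambda x}$ for $x > 0$ with $\lambda = 0.5$, chosen only for illustration (it does not appear on the slides), whose mean is $1/\lambda = 2$ and variance $1/\lambda^2 = 4$. It uses the midpoint rule and truncates the infinite range at a hypothetical cutoff `UPPER`, beyond which the remaining area is negligible:

```python
import math

RATE = 0.5  # hypothetical rate for an exponential density (mean 1/RATE = 2)

def p(x):
    """Exponential density: an assumed example, not from the slides."""
    return RATE * math.exp(-RATE * x)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f from a to b (a sum over n thin slices)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

UPPER = 60.0  # truncate the infinite range; the remaining tail area is negligible
total = integrate(p, 0.0, UPPER)                             # area under the curve, about 1
mu = integrate(lambda x: x * p(x), 0.0, UPPER)               # about 2
var = integrate(lambda x: (x - mu) ** 2 * p(x), 0.0, UPPER)  # about 4
print(round(total, 4), round(mu, 4), round(var, 4))
```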
Continuous Random Variables

Any density curve gives the distribution of a continuous random variable.

A particularly important family of random variables is the family of normal distributions, which is already familiar.

Recall that a normal distribution is uniquely specified by its mean µ and standard deviation σ.

An example, for the standard normal distribution, $Z \sim N(0, 1)$:

[Figure: standard normal density p(z) for Z from -4 to 4 (vertical axis 0.0 to 0.4), with the area between 0 and 2 shaded: P(0 < Z < 2) = .4772]
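The shaded area can be computed rather than read from a table. This Python sketch relies on the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$ for the standard normal cumulative probability, using `math.erf`; the helper name `std_normal_cdf` is hypothetical. The area between 0 and 2 is then $\Phi(2) - \Phi(0)$:

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = std_normal_cdf(2.0) - std_normal_cdf(0.0)  # area under p(z) between 0 and 2
print(round(p, 4))  # 0.4772
```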