Business Statistics 41000: Probability 1




Business Statistics 41000: Probability 1 Drew D. Creal University of Chicago, Booth School of Business Week 3: January 24 and 25, 2014

Class information Drew D. Creal Email: dcreal@chicagobooth.edu Office: 404 Harper Center Office hours: email me for an appointment Office phone: 773.834.5249 Course homepage: http://faculty.chicagobooth.edu/drew.creal/teaching/index.html

Course schedule Week # 1: Plotting and summarizing univariate data Week # 2: Plotting and summarizing bivariate data Week # 3: Probability 1 Week # 4: Probability 2 Week # 5: Probability 3 Week # 6: In-class exam Week # 7: Statistical inference 1 Week # 8: Statistical inference 2 Week # 9: Simple linear regression Week # 10: Multiple linear regression

Outline of today's topics
I. Discrete random variables (AWZ p. 215-216): discrete probability distributions (AWZ p. 949-950), the Bernoulli distribution, computing the probabilities of subsets of outcomes
II. Expectation and variance of a discrete random variable
III. Mode of a discrete random variable
IV. Conditional, marginal, and joint distributions (AWZ p. 230-236)
V. Several random variables

Why probability?

Why probability? In lectures #1 and #2, we looked at various types of data in different ways. We learned to use plots and numerical summary statistics to identify patterns in the data and see how variables related to one another. If we find patterns, we can use them to predict. For example, we used regression to predict the sales price of a house given its size.

Why probability? To make predictions, we use a mathematical model for the relationship. However, in business and economic applications, these relationships are rarely exact. Instead of saying "if x is this, then y must be that," we want to say "if x is this, then y will probably be within this range of values." Probability is a way of modelling uncertainty mathematically.

Example: Gallup poll Gallup (1/22/14): In U.S., 65% Dissatisfied With How Gov't System Works. Sixty-five percent of Americans are dissatisfied with the nation's system of government and how well it works, the highest percentage in Gallup's trend since 2001. Dissatisfaction is up five points since last year, and has edged above the previous high from 2012 (64%)... Results: ...are based on telephone interviews conducted Jan. 5-8, 2014, with a random sample of 1,018 adults... The margin of sampling error is ±3 percentage points at the 95% confidence level. Source: www.gallup.com/poll

Example: Rasmussen poll Rasmussen: 68% Expect NSA Phone Spying To Stay the Same or Increase. Despite President Obama's announcement of tighter controls on the National Security Agency's domestic spying efforts, two out of three U.S. voters think spying on the phone calls of ordinary Americans will stay the same or increase... The margin of sampling error for the full sample of 1,000 Likely Voters is ±3 percentage points with a 95% level of confidence. Source: www.rasmussenreports.com

Why probability? In the previous examples, both polls mention the sampling error. What do they mean by this? How are they estimating the error?

Why probability? Answer: they took a random sample and computed 2 × 0.5/√1018 = 0.0313 ≈ 3% and 2 × 0.5/√1000 = 0.0316 ≈ 3%. These calculations come from a probability model, which we will study extensively!! Importantly, this model is based on a set of assumptions that could be wrong! You need to understand these assumptions and be able to think critically about them!
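A quick check of these figures in Python; a minimal sketch that just re-does the arithmetic above (the helper name margin_of_error is illustrative, not from the course):

```python
import math

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for a sample proportion:
    two standard errors, using the conservative value p = 0.5."""
    return 2 * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1018), 4))  # 0.0313 -> about 3 points (Gallup)
print(round(margin_of_error(1000), 4))  # 0.0316 -> about 3 points (Rasmussen)
```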

Discrete Random Variables

Discrete random variables Suppose you are a manager trying to estimate the number of units of a product you will sell next quarter. Suppose you know (unrealistically) that sales will be 1, 2, 3, or 4 (thousand) units. But, you are not sure which one it will be. First, why is sales a discrete random variable?

Discrete random variables Let the random variable S denote sales. Since S can only take on the values 1, 2, 3, or 4, it is a discrete random variable. A probability distribution is a way to express this uncertainty mathematically.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

The left column is the list of possible values (or outcomes); the right column gives the probability of each value.

Discrete random variables In a probability distribution, the probabilities always sum to one by definition.

s     p(s)
1     0.095
2     0.230
3     0.440
4     0.235
sum   1.000

Here p(s) = Prob(S = s).

Remarks on notation In words, the notation Prob(X = x) means the probability that the random variable X takes on the number x. It is common convention to use capital letters (or words) such as X or Z to denote a random variable. The possible values that a random variable can take on are also known as outcomes. It is common to use lower case letters such as x or z to denote the outcomes. It is common to abbreviate "random variable" as "r.v."

A picture of the discrete random variable's distribution: a bar chart of p(s) against s = 1, ..., 4, with bar heights 0.095, 0.230, 0.440, and 0.235.

Discrete random variable A discrete random variable is a numeric quantity that can take on any one of a countable number of possible values. However, it is unknown in advance which value will occur. Remarks: This is how we quantify or model uncertainty when a random event or experiment can take on a countable number of values. We list the possible values the variable can take on (i.e. the outcomes). We assign to each number (or outcome) a probability.

Discrete random variable Remarks continued: A probability is a number between 0 and 1. When the probabilities are summed up over the possible outcomes, the probabilities always sum to one. The word discrete emphasizes that the number of outcomes is finite (we can create a list of them). In our sales example, there were only 4 possible outcomes for the r.v. S. Later, we will study continuous random variables which may take on a continuous range of values.

Example: coin tossing Imagine a random experiment where we toss two coins. We define the random variable X to be the number of heads in two tosses. We assume each coin is fair so that the probability of tossing a head or a tail is 1/2. Before tossing the coins, we know that there are 3 possible outcomes: x = 0, 1, and 2.

x    p(x)
0    0.25
1    0.50
2    0.25

This is the probability distribution of the random variable X.

Probability distribution of a discrete r.v. The probability distribution of a discrete random variable has two parts: 1.) a list of the possible outcomes; 2.) a list of the probabilities for each outcome.

x      p(x)
x_1    p_1
x_2    p_2
...    ...

For a discrete r.v., we can think of a probability distribution as a table.
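In code, a discrete distribution is literally this table. A minimal Python sketch, using the sales numbers from the example above (the variable name p_sales is illustrative):

```python
# The sales distribution as a table: outcome -> probability.
p_sales = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

# Sanity checks: every probability is between 0 and 1,
# and the probabilities sum to one.
assert all(0 <= p <= 1 for p in p_sales.values())
assert abs(sum(p_sales.values()) - 1.0) < 1e-9
```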

Remarks on notation You will often see probabilities written as p(x), Prob(X = x), Pr(X = x), P(X = x), or p_X(x). These are all common notation for the same thing; it just depends on the author's preferences. With the notation p(x), it should be understood from the context that you are talking about the random variable X, which may take on an outcome x. In our sales example, p(1) is the probability that our sales during the next quarter are 1,000 units.

Interpreting probabilities The easiest way to interpret probabilities is this: probability is a measure of uncertainty with values between 0 and 1. An outcome with a probability of 0 will basically never happen. An outcome with a probability of 1 will basically always happen.

Interpreting probabilities There are more philosophical ways of interpreting probabilities. Two common ones are frequentist and subjective (Bayesian). Consider again the example where we toss two fair coins.

x    p(x)
0    0.25
1    0.50
2    0.25

Frequentist: in the long run, if I toss the two coins over and over and over..., I will get 1 head 50% of the time. Subjective: I am indifferent between betting on the event "1 head" or the event "0 or 2 heads."
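The frequentist reading is easy to check by simulation. A sketch in Python (the seed and trial count are arbitrary choices):

```python
import random

random.seed(0)
trials = 100_000

# Count how often two fair coin tosses give exactly one head.
one_head = sum(
    (random.random() < 0.5) + (random.random() < 0.5) == 1
    for _ in range(trials)
)
print(one_head / trials)  # close to 0.50 in the long run
```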

Interpreting probabilities Consider the sales example again.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

(The slide repeats the bar chart of p(s) against s.) It's about twice as likely that we will sell 3,000 units as it is that we will sell 2,000 or 4,000 units. If all our quarters were like this, we would see sales of 1,000 units about once in every 10 quarters.

Assigning Probabilities to Categorical Variables Remember that part of our definition of a random variable is that its value always takes on a number. What about situations where we have a random experiment and the variable of interest is a categorical variable (which is typically not a number)? In this case, we just assign a number to each category. We can then assign a probability to each number.

Assigning Probabilities to Categorical Variables Example: For the variable Reg from the British marketing data set, we assigned each region a number.

1   Scotland
2   North West
3   North
4   Yorkshire & Humberside
5   East Midlands
6   East Anglia
7   South East
8   Greater London
9   South West
10  Wales
11  West Midlands

Assigning Probabilities to Categorical Variables Example: Who will win the MVP at the Super Bowl? We let the random variable M be the football player that wins the MVP. We label each player with a number.

1  Peyton Manning    4  Russell Wilson
2  Wes Welker        5  Marshawn Lynch
3  Eric Decker       6  Richard Sherman

The outcomes are m = 1, 2, 3, 4, 5, 6. We then assign probabilities to each outcome: P(M = 1) = p(1) = P(Peyton Manning wins the MVP)

The Bernoulli (and uniform) distributions

The Bernoulli distribution Our fundamental discrete random variable is the dummy variable, with two outcomes 0 and 1. Some examples: Clinical trials: T is a r.v. describing a test for a disease; T = 0 if the person does not have the disease, T = 1 if they do. Marketing: B is a r.v. describing whether a person buys a product; B = 0 if the person does not buy the product, B = 1 if they do. Sports: Rafael Nadal is about to hit a first serve; A is a r.v. describing whether he hits an ace, A = 0 if he does not hit an ace, A = 1 if he does.

The Bernoulli distribution In general, we say a discrete random variable taking on only two values (such as a dummy variable) has a Bernoulli distribution. You may often hear it called a Bernoulli trial. The value 1 is often called a "success." Suppose we label this random variable X; then we have

x    p(x)
0    1 - p
1    p

so that Pr(X = 1) = p. Notation: X ~ Bernoulli(p)

The Bernoulli distribution X ~ Bernoulli(p) means that X is a discrete r.v. with the following probability distribution:

x    p(x)
0    1 - p
1    p

where p is the probability that X equals 1. In words, the random variable X is distributed as Bernoulli with parameter p. The Bernoulli is a family of probability distributions, where each probability distribution is indexed by the parameter p.

The Bernoulli distribution: further examples Example: tossing a fair coin. Let X = 1 if the toss is heads and 0 otherwise. Then X ~ Bernoulli(0.5).

x    p(x)
0    0.5
1    0.5

The discrete Uniform distribution X ~ Discrete Uniform means that X is a discrete r.v. taking on a finite number of values with equal probabilities. If there are N outcomes, the probabilities are all 1/N.

Probabilities of Subsets of Outcomes Example: Suppose we toss a fair six-sided die. Let Z denote the outcome of the toss. What is the probability Pr(2 < Z < 5)? In other words, what is the probability that we roll a 3 or a 4?

z    p(z)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6

Probabilities of Subsets of Outcomes To compute the probability that any one of a group of outcomes occurs, we sum up their probabilities: Pr(a < X < b) = Σ_{a < x < b} p(x) Example: tossing a die. Pr(2 < Z < 5) = p(3) + p(4) = 1/6 + 1/6 = 1/3

Probabilities of Subsets of Outcomes Sometimes, we may also want to know if something is greater (less) than or equal to! Example: tossing a die. Pr(2 ≤ Z < 5) = p(2) + p(3) + p(4) = 1/6 + 1/6 + 1/6 = 1/2
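These sums are one line of code. A sketch using the fair-die distribution above (Fraction keeps the arithmetic exact; the variable names are illustrative):

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
p_die = {z: Fraction(1, 6) for z in range(1, 7)}

# Pr(2 < Z < 5): sum the probabilities of the outcomes in the event.
print(sum(p for z, p in p_die.items() if 2 < z < 5))   # 1/3

# Pr(2 <= Z < 5): now the endpoint 2 is included.
print(sum(p for z, p in p_die.items() if 2 <= z < 5))  # 1/2
```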

Probabilities of Subsets of Outcomes Example: Let's return to our sales example, where S denotes the sales of units of our product (in thousands).

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell more than 1,000 units next quarter? Pr(S > 1) = p(2) + p(3) + p(4) = 0.23 + 0.44 + 0.235 = 0.905

Probabilities of Subsets of Outcomes Example: let's do it again!

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell more than 1,000 units next quarter? We could have done it like this: Pr(S > 1) = Pr(S ≠ 1) = 1 - p(1) = 0.905

Probabilities of Subsets of Outcomes Example: one more time!

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

What is the probability that we sell 3,000 units or less next quarter? Pr(S ≤ 3) = p(1) + p(2) + p(3) = 0.095 + 0.23 + 0.44 = 0.765

Probabilities of Subsets of Outcomes Here are two helpful reminders. 1. OR means ADD: Pr(X = a OR X = b) = p(a) + p(b) As long as two events cannot both happen, the probability of either one is the sum of the probabilities. 2. NOT means ONE MINUS: Pr(X ≠ a) = 1 - p(a) The probability that something does NOT happen is one minus the probability that it does.

Expectation and Variance of a Random Variable

Expectation of a discrete random variable Example: consider again the random variable S denoting sales.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

Now, imagine your boss asks you to predict sales next quarter. You have to come up with one number (a "guess") even though you are not sure. What number would you choose?

Expectation of a discrete random variable One option (and not the only one) is to report the expected value. The expected value of a discrete random variable is: E[X] = Σ_{all x} x · p(x) In words, the expected value is the sum of the possible outcomes x, where each one is weighted by its probability p(x). IMPORTANT: This is similar to the sample mean, but it is NOT the same thing. We will discuss this later on below.

Computing the expected value Example: consider again the random variable S which denotes the sales of our product.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

E(S) = 0.095 × 1 + 0.23 × 2 + 0.44 × 3 + 0.235 × 4 = 2.815 Yes, it does seem weird that 2.815, which is our guess for sales, is not one of the possible values. Think of this as saying we think sales are likely to be somewhere around 3 thousand units, but more likely to be under 3 thousand than over.
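The same computation as a Python sketch, reusing the distribution-as-dict idea from above (the helper name expected_value is illustrative):

```python
p_sales = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

def expected_value(dist):
    """E[X] = sum over outcomes of x * p(x)."""
    return sum(x * p for x, p in dist.items())

print(expected_value(p_sales))  # 2.815
```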

Notation for the expected value Different authors use different notation for the expected value, such as E[X] or E(X). It is common notation in statistics to use the Greek symbol µ or µ_x, which is pronounced "mu." We often say "mean" instead of "expected value." What we mean by this is that the expected value of X is the mean of the r.v. X.

Sample Mean vs. the Expected Value The sample mean: a variable in a data set is an observed set of values, and the sample mean of a variable in our data is (1/n) Σ_{i=1}^{n} x_i. It is the average of the observed values in the data set. The expected value: a random variable is a mathematical model for an uncertain quantity, and the expected value (mean) of a r.v. is E[X] = Σ_{all x} x · p(x). It is the average of the possible values taken by the r.v., weighted by their probabilities.

Expected Value of a Function of a Discrete Random Variable Sometimes we will be interested in the expected value of some function of a random variable. For example, let W be the prize a game show contestant ends up with. Example: Deal or No Deal. George has cases worth $5, $400, $10,000, and $1,000,000 remaining. There are 4 outcomes and each is equally likely. The banker's offer is $189,000. The expected value is E[W] = 0.25 × 5 + 0.25 × 400 + 0.25 × 10,000 + 0.25 × 1,000,000 = $252,601.25 Is the banker's offer a good or bad deal?

Expected Value of a Function of a Discrete Random Variable BUT, this assumes people choose based on expected values. Economists believe in diminishing marginal utility of income: the more wealth you have, the less utility you get from each additional $1. This is often modeled with a utility function over wealth. Let's assume something simple such as U(W) = √W. (The slide shows a plot of this utility function: a concave curve of U(W) against wealth.)

Expected Value of a Function of a Discrete Random Variable To compute the expected value E[f(W)] = E[√W] in the case of a discrete random variable, just take the function f(·) of each possible outcome, then multiply by the probability and add them together. What is George's expected utility? E[√W] = 0.25 × √5 + 0.25 × √400 + 0.25 × √10,000 + 0.25 × √1,000,000 = 280.56 Compare this to the utility of the banker's offer: √189,000 = 434.74 By this measure, the sure offer looks better than playing on.
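A sketch of the expected-utility comparison, under the U(W) = √W assumption from the slide (the helper names are illustrative):

```python
import math

# George's four equally likely cases, in dollars.
p_prize = {5: 0.25, 400: 0.25, 10_000: 0.25, 1_000_000: 0.25}

def expected_value(dist, f=lambda w: w):
    """E[f(W)]: apply f to each outcome, then weight by its probability."""
    return sum(f(w) * p for w, p in dist.items())

print(expected_value(p_prize))             # 252601.25, the expected prize
print(expected_value(p_prize, math.sqrt))  # about 280.56, expected utility
print(math.sqrt(189_000))                  # about 434.74, utility of the offer
```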

Variance of a Discrete Random Variable To understand how much the discrete random variable X varies about its mean (expected value), we define the variance. The variance of a discrete random variable X is: Var[X] = Σ_{all x} p(x) (x - µ_x)² = E[(X - µ_x)²] In words, the variance is the expected squared distance of the r.v. X from its mean. If we take µ_x to be our prediction for X, you can think of the variance as a weighted average of the squared prediction error.

Variance of a Discrete Random Variable Example: consider again the random variable S denoting sales next quarter.

s    p(s)
1    0.095
2    0.230
3    0.440
4    0.235

Imagine your boss asks you to also report the uncertainty associated with your predicted sales next quarter. E[S] = 2.815 V[S] = 0.095 (1 - 2.815)² + 0.23 (2 - 2.815)² + 0.44 (3 - 2.815)² + 0.235 (4 - 2.815)² = 0.811 The units are in squared thousands of units sold.
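Continuing the sketch, the variance is the same kind of weighted sum:

```python
p_sales = {1: 0.095, 2: 0.230, 3: 0.440, 4: 0.235}

def expected_value(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var[X] = sum of p(x) * (x - mu)^2 over the outcomes."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

print(round(variance(p_sales), 3))         # 0.811
print(round(variance(p_sales) ** 0.5, 3))  # about 0.9, the standard deviation
```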

Remarks on Notation For the variance Var[X] of X, it is also common to use the abbreviated version V[X]. It is common notation in statistics to use the Greek symbol σ² or σ²_x, which is pronounced "sigma squared."

Standard deviation of a Discrete Random Variable The standard deviation of a discrete random variable X is: σ_X = √(σ²_X) That is, the standard deviation of a random variable X is the square root of the variance of X.

Example: consider again the random variable S denoting sales next quarter. Consider two different distributions for sales, denoted by p_1(s) and p_2(s).

s    p_1(s)   p_2(s)
1    0.01     0.30
2    0.10     0.30
3    0.80     0.20
4    0.09     0.20

(The slide plots both distributions as bar charts of p(s) against s.) Which distribution (p_1(s) or p_2(s)) has the larger expected value and/or variance? (Answers on the next slide.)

Example: If these were the distributions, what are the expected values and variances? E_1(S) = 0.01 × 1 + 0.1 × 2 + 0.8 × 3 + 0.09 × 4 = 2.97 E_2(S) = 0.3 × 1 + 0.3 × 2 + 0.2 × 3 + 0.2 × 4 = 2.3 V_1(S) = 0.01 (1 - 2.97)² + 0.1 (2 - 2.97)² + 0.8 (3 - 2.97)² + 0.09 (4 - 2.97)² = 0.2291 V_2(S) = 0.3 (1 - 2.3)² + 0.3 (2 - 2.3)² + 0.2 (3 - 2.3)² + 0.2 (4 - 2.3)² = 1.21 (NOTE: the notation E_1(S) and V_1(S) denotes the mean and variance of the first probability distribution p_1(s).)

Example: consider again the random variable S denoting sales next quarter. Consider three more distributions for sales, denoted by p_3(s), p_4(s), and p_5(s). (NOTE: p_4(s) is the same as the original distribution above.)

s    p_3(s)   p_4(s)   p_5(s)
1    0.20     0.095    0.05
2    0.30     0.230    0.20
3    0.30     0.440    0.50
4    0.20     0.235    0.25

(The slide plots the three distributions together.) The means are E_3(S) = 2.5, E_4(S) = 2.815, E_5(S) = 2.95, while the variances are V_3(S) = 1.05, V_4(S) = 0.811, V_5(S) = 0.648.

Mean and Variance of a Bernoulli Distribution Suppose X ~ Bernoulli(p); then the mean and variance are E(X) = p × 1 + (1 - p) × 0 = p V(X) = p (1 - p)² + (1 - p) (0 - p)² = p (1 - p) [(1 - p) + p] = p (1 - p) For what value of p is the mean the smallest (biggest)? For what value of p is the variance the smallest (biggest)?
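A quick numerical look at the last question; a sketch that scans p over a small grid:

```python
# Mean and variance of a Bernoulli(p) as p varies.
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    mean, var = p, p * (1 - p)
    print(f"p={p:.1f}  mean={mean:.1f}  variance={var:.2f}")
# The variance p(1-p) is 0 at p = 0 or p = 1 and largest (0.25) at p = 0.5.
```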

Final Comments on the Mean and Variance The sample mean, sample variance, and sample standard deviation of a set of numbers are sample statistics computed from observed data. The mean, variance, and standard deviation of a random variable are properties of its probability distribution which is a mathematical model of uncertainty. They do share a lot of the same properties. The distinction between them is subtle but important for later on in the course!!

The mode of a discrete distribution For a discrete r.v. X, the mode of its probability distribution is the most likely value. In other words, the mode is the outcome x that has the largest probability. The mode does not have to be unique because there could be multiple outcomes that share the largest probability.

The mode of a discrete distribution Consider the two different distributions for sales S denoted by p_1(s) and p_2(s).

s    p_1(s)   p_2(s)
1    0.01     0.30
2    0.10     0.30
3    0.80     0.20
4    0.09     0.20

What are the modes of the distributions p_1(s) and p_2(s)?

Remarks on the mode The mode of a probability distribution is not the same thing as the sample mode. The sample mode is the value that occurs most frequently in a dataset. For discrete numeric data, you may occasionally see the sample mode reported.

Conditional, Marginal, and Joint Distributions

Conditional, Marginal, and Joint Distributions What happens when there are two (or more) variables that we are uncertain about? How do we describe them probabilistically? We want to use probability to understand how two (or more) variables are related. In this section, we extend the results above to more than one variable.

Extending the sales example to include economic conditions Example: consider a slightly more complicated but (potentially) more realistic version of our sales example where we also take into account the condition of the economy. We want to think about the economy and our sales together, that is, jointly. For simplicity, our model thinks of the economy next quarter as either up or down. It is a Bernoulli random variable!

Extending the sales example to include economic conditions Example continued: Again, the random variable S denotes sales (in thousands of units) next quarter. Let E denote the economy next quarter, where E = 1 if the economy is up and E = 0 if it is down. How can we think about E and S together?

Example continued: First: what do we think will happen with the economy? Up or down? Second: given the economy is up (down), what will happen to sales? Suppose we know p(E = 1) = p(up) = 0.7, which of course implies that p(E = 0) = p(down) = 0.3. Our model for the economy is E ~ Bernoulli(0.7)

Example continued: Question: If the economy is up, will it be more or less likely that sales will take on higher values? How can we represent this mathematically?

Example continued: Answer: Specify two different probability distributions for S, one for each possible value that E can take on! p(S = s | E = 1): the distribution of sales given that the economy is up. p(S = s | E = 0): the distribution of sales given that the economy is down.

Example continued: Suppose we decide

s    p(s | E = 1)   p(s | E = 0)
1    0.05           0.20
2    0.20           0.30
3    0.50           0.30
4    0.25           0.20

These are called conditional probability distributions. (NOTE: These are the same as the earlier distributions p_5(s) and p_3(s), respectively.) Conditional on the economy being up (E = 1), sales of our product are more likely to be higher than when the economy is down (E = 0). If our product is actually procyclical, then this is likely to be a better model of reality than our earlier model.

We just defined two different probability distributions for the random variable S depending on the value of E. We can easily compute the expected value and variance of each of these distributions.

s    p(s | E = 1)   p(s | E = 0)
1    0.05           0.20
2    0.20           0.30
3    0.50           0.30
4    0.25           0.20

E(S | E = 1) = 0.05 × 1 + 0.2 × 2 + 0.5 × 3 + 0.25 × 4 = 2.95 E(S | E = 0) = 0.2 × 1 + 0.3 × 2 + 0.3 × 3 + 0.2 × 4 = 2.5 These are called conditional means.

We can also compute the variances of the conditional probability distributions: V(S | E = 1) = 0.05 (1 - 2.95)² + 0.2 (2 - 2.95)² + 0.5 (3 - 2.95)² + 0.25 (4 - 2.95)² = 0.05 × 3.8025 + 0.2 × 0.9025 + 0.5 × 0.0025 + 0.25 × 1.1025 = 0.1901 + 0.1805 + 0.00125 + 0.2756 = 0.6475 V(S | E = 0) = 0.2 (1 - 2.5)² + 0.3 (2 - 2.5)² + 0.3 (3 - 2.5)² + 0.2 (4 - 2.5)² = 0.2 × 2.25 + 0.3 × 0.25 + 0.3 × 0.25 + 0.2 × 2.25 = 0.45 + 0.075 + 0.075 + 0.45 = 1.05 These are called conditional variances.

Conditional means and variances The mean (variance) of the conditional distribution is called a conditional mean (variance). The distributions p(E | S) and p(S | E) are both conditional distributions. Both of these distributions have a conditional mean: E[E | S] = Σ_{all e} e · p(E = e | S) E[S | E] = Σ_{all s} s · p(S = s | E) The conditional mean of p(E | S) depends on the outcome of the random variable S.

Example continued: We've said what we think will happen for the economy E. We've said what we think will happen for sales S given we know E. What will happen for E and S jointly? 70% of the time the economy goes up, and 1/4 of those times sales = 4. 25% of 70% is 17.5%: Pr(S = 4 and E = 1) = Pr(E = 1) Pr(S = 4 | E = 1) = 0.7 × 0.25 = 0.175

Computing joint probabilities There are eight possible outcomes for (S, E). Multiplying along each branch of the tree (Pr(E) times Pr(S | E)):

E = 1 (UP), probability 0.7:
  S = 4: P(S = 4 and E = 1) = 0.7 × 0.25 = 0.175
  S = 3: P(S = 3 and E = 1) = 0.7 × 0.50 = 0.35
  S = 2: P(S = 2 and E = 1) = 0.7 × 0.20 = 0.14
  S = 1: P(S = 1 and E = 1) = 0.7 × 0.05 = 0.035
E = 0 (DOWN), probability 0.3:
  S = 4: P(S = 4 and E = 0) = 0.3 × 0.20 = 0.06
  S = 3: P(S = 3 and E = 0) = 0.3 × 0.30 = 0.09
  S = 2: P(S = 2 and E = 0) = 0.3 × 0.30 = 0.09
  S = 1: P(S = 1 and E = 0) = 0.3 × 0.20 = 0.06

When both variables are discrete, we can display the joint probability distribution of (E, S) in a table:

(e, s)    Pr(E = e and S = s)
(1, 4)    0.175
(1, 3)    0.350
(1, 2)    0.140
(1, 1)    0.035
(0, 4)    0.060
(0, 3)    0.090
(0, 2)    0.090
(0, 1)    0.060

There are eight possible values for the pair of random variables (E, S). We list the eight outcomes. Then, we list the probability of each outcome (which we calculated on the previous slide).
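A sketch that builds this joint distribution in code from the marginal of E and the conditionals of S given E (the numbers are the slide's; the variable names are illustrative):

```python
p_E = {1: 0.7, 0: 0.3}                        # marginal of the economy
p_S_given_E = {                               # conditional distributions of sales
    1: {1: 0.05, 2: 0.20, 3: 0.50, 4: 0.25},
    0: {1: 0.20, 2: 0.30, 3: 0.30, 4: 0.20},
}

# Joint: p(e, s) = p(e) * p(s | e), one entry per (e, s) pair.
p_joint = {
    (e, s): p_E[e] * p_s
    for e, cond in p_S_given_E.items()
    for s, p_s in cond.items()
}
print(p_joint[(1, 4)])  # 0.175
```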

When there are only two discrete random variables, we can also display the joint distribution of E and S in a different table. Rows are values of E, columns are values of S.

         S = 1    S = 2    S = 3    S = 4
E = 0    0.060    0.090    0.090    0.060
E = 1    0.035    0.140    0.350    0.175

What is Pr(E = 1 and S = 4)? Answer: 0.175 If we don't know anything about E, what is Pr(S = 4)? Answer: 0.06 + 0.175 = 0.235 = p_S(4)

Marginal distributions What is the probability of S if we know nothing about E?

          S = 1    S = 2    S = 3    S = 4
E = 0     0.060    0.090    0.090    0.060
E = 1     0.035    0.140    0.350    0.175
p_S(s)    0.095    0.230    0.440    0.235

To obtain the probability distribution p_S(s), we add the joint probabilities for each outcome (i.e., add down the columns). For example: p_S(1) = P(S = 1, E = 0) + P(S = 1, E = 1) = 0.06 + 0.035 = 0.095

Marginal distributions What is the probability of E if we know nothing about S?

          S = 1    S = 2    S = 3    S = 4    p_E(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
p_S(s)    0.095    0.230    0.440    0.235

To obtain the probability distribution p_E(e), we add the joint probabilities for each outcome (i.e., add across the rows). For example: p_E(0) = P(S = 1, E = 0) + P(S = 2, E = 0) + P(S = 3, E = 0) + P(S = 4, E = 0) = 0.060 + 0.090 + 0.090 + 0.060 = 0.3
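In code, the marginals are just the row and column sums of the joint table; a self-contained sketch:

```python
# The joint table from this slide: (e, s) -> probability.
p_joint = {
    (0, 1): 0.060, (0, 2): 0.090, (0, 3): 0.090, (0, 4): 0.060,
    (1, 1): 0.035, (1, 2): 0.140, (1, 3): 0.350, (1, 4): 0.175,
}

# Marginals: sum the joint over the variable you are removing.
p_S = {s: sum(p for (e2, s2), p in p_joint.items() if s2 == s) for s in (1, 2, 3, 4)}
p_E = {e: sum(p for (e2, s2), p in p_joint.items() if e2 == e) for e in (0, 1)}
print(p_S)  # close to {1: 0.095, 2: 0.23, 3: 0.44, 4: 0.235}
print(p_E)  # close to {0: 0.3, 1: 0.7}
```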

Marginal distributions

          S = 1    S = 2    S = 3    S = 4    p_E(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
p_S(s)    0.095    0.230    0.440    0.235

The distributions p_E(e) and p_S(s) are called marginal distributions. Why? Because they are computed in the margins of the joint probability table, as above.

Conditional versus Marginals Remember the three distributions p_3(s), p_4(s), and p_5(s) from above? It turns out that: p_3(s) = p(s | E = 0), p_4(s) = p_S(s), p_5(s) = p(s | E = 1). That is, p_4(s) is the marginal distribution. (The slide plots all three distributions together.) Notice that the marginal lies in between the two conditional distributions.

Conditional Probability Distribution The conditional probability that Y turns out to be y given that you know X = x is denoted by Pr(Y = y | X = x) In words, the conditional probability distribution of the random variable Y conditional on X gives the probability that Y = y given that we know X = x. A conditional probability distribution is a new probability distribution for the random variable Y given that we know X = x. (NOTE: In our example, S was analogous to Y and E to X.)

Joint Probability Distribution The joint probability that Y turns out to be y and that X turns out to be x is denoted by Pr(Y = y, X = x) = Pr(Y = y and X = x) In words, a joint probability distribution specifies the probability that Y = y AND X = x. It describes our uncertainty over both Y and X at the same time. (NOTE: In our example, S was analogous to Y and E to X.)

Remarks on Notation The notation for the conditional, marginal, and joint distributions often gets abused and may be confusing. For the joint distribution, you may often see: P(Y = y, X = x) = Pr(Y = y and X = x) = p(y, x). The order in which the variables are written does not matter: Pr(Y = y and X = x) is the same as Pr(X = x and Y = y). For the conditional distribution, authors often write: P(Y = y | X = x) = p(y | x) For the marginal distribution, we can use p_X(x) or p(x) or P(X = x) as before. These are all the same.

Two Important Relationships Relationship between Joint and Conditional: p(y, x) = p(x) p(y | x) = p(y) p(x | y) Relationship between Joint and Marginal: p(x) = Σ_y p(y, x) and p(y) = Σ_x p(y, x)

Example: consider again the sales example with the r.v.s (S, E). JOINT: p(4, 1) = 0.175 In words: what's the chance the economy is up AND sales are 4 units? CONDITIONAL: p(4 | 1) = 0.25 In words: GIVEN you know the economy is up, what is the chance sales turn out to be 4 units?

Example continued: MARGINAL: p(4) = p_S(4) = 0.235 = 0.175 + 0.06 In words: what's the chance sales turn out to be 4 units? MARGINAL: p(1) = p_E(1) = 0.7 = 0.175 + 0.35 + 0.14 + 0.035 In words: what's the chance the economy will be up? (NOTE: This last one can be a bit confusing because p(1) is ambiguous; both E and S can take on the value 1.)

Conditionals from Joints We derived the joint distribution of (E, S) by first considering the marginal of E and then thinking about the conditional distribution of S | E. An alternative approach is to start with a joint distribution p(y, x) and the marginal p_X(x) and then obtain the conditional distribution: p(y, x) = p_X(x) p(y | x) => p(y | x) = p(y, x) / p_X(x) (Note: the denominator is the marginal probability.)

Example: given that the economy is up (E = 1), what is the probability that sales is 4?

          S = 1    S = 2    S = 3    S = 4    p_E(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
p_S(s)    0.095    0.230    0.440    0.235

Using the marginal P(E = 1) and the joint probability P(S = 4, E = 1), we have P(S = 4 | E = 1) = P(S = 4, E = 1) / P(E = 1) = 0.175 / 0.7 = 0.25

Example: given that sales is 4 (S = 4), what is the probability that the economy is up?

          S = 1    S = 2    S = 3    S = 4    p_E(e)
E = 0     0.060    0.090    0.090    0.060    0.300
E = 1     0.035    0.140    0.350    0.175    0.700
p_S(s)    0.095    0.230    0.440    0.235

Using the marginal P(S = 4) and the joint probability P(S = 4, E = 1), we have P(E = 1 | S = 4) = P(S = 4, E = 1) / P(S = 4) = 0.175 / 0.235 = 0.745 (NOTE: even though we started with distributions for E and S | E, we can still calculate p(E | S).)
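The same inversion in a short sketch, reading the numbers straight from the table above:

```python
# Joint probabilities from the table: P(S=4, E=1) and P(S=4, E=0).
p_S4_E1, p_S4_E0 = 0.175, 0.060
p_S4 = p_S4_E1 + p_S4_E0           # marginal P(S = 4) = 0.235

print(round(p_S4_E1 / p_S4, 3))    # P(E = 1 | S = 4), about 0.745
print(p_S4_E1 / 0.7)               # P(S = 4 | E = 1) = 0.25
```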

In general, you can compute the joint from marginals and conditionals, and the other way around. Which way you do it will depend on the problem. Example: suppose you toss two fair coins: X is the first, Y is the second. (NOTE: X = 1 is a head.) What is P(X = 1 and Y = 1) = P(two heads)? There are 4 possible outcomes for the two coins and each is equally likely, so it is 1/4. Equivalently, P(X = 1 and Y = 1) = P(X = 1) P(Y = 1 | X = 1) = (1/2) × (1/2) = 1/4.

Bayes Theorem

Bayes Theorem In many situations, you will know one conditional distribution p(y | x) and the marginal distribution p_X(x), but you are really interested in the other conditional distribution p(x | y). Given that we know p(y | x) and p_X(x), can we compute p(x | y)?

Example: Testing for a Disease Let D = 1 indicate you have a certain (rare) disease and let T = 1 indicate that you tested positive for it. Suppose we know the marginal probability P(D = 1) and the conditional probabilities P(T = 1 | D = 1) and P(T = 1 | D = 0). Multiplying along each branch of the tree:

D = 1, probability 0.02:
  T = 1: P(D = 1 and T = 1) = 0.02 × 0.95 = 0.019
  T = 0: P(D = 1 and T = 0) = 0.02 × 0.05 = 0.001
D = 0, probability 0.98:
  T = 1: P(D = 0 and T = 1) = 0.98 × 0.01 = 0.0098
  T = 0: P(D = 0 and T = 0) = 0.98 × 0.99 = 0.9702

We start with information about D and T | D. But if you are the patient who tests positive for a disease, you care about P(D = 1 | T = 1)! Given that you have tested positive, what is the probability that you have the disease?

         D = 0     D = 1
T = 0    0.9702    0.001
T = 1    0.0098    0.019

P(D = 1 | T = 1) = P(D = 1, T = 1) / P(T = 1) = 0.019 / (0.019 + 0.0098) = 0.66

Bayes Theorem Computing p(x | y) from p_X(x) and p(y | x) is called Bayes Theorem: p(x | y) = p(y, x) / p_Y(y) = p(y, x) / Σ_{all x} p(y, x) = p_X(x) p(y | x) / Σ_{all x} p_X(x) p(y | x) Example (from the last slide): p(D = 1 | T = 1) = p(T = 1 | D = 1) p(D = 1) / [p(T = 1 | D = 1) p(D = 1) + p(T = 1 | D = 0) p(D = 0)]
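A sketch of this calculation with the disease numbers from the slide (the variable names are illustrative):

```python
p_D = 0.02            # P(D = 1), prevalence of the disease
p_T_given_D1 = 0.95   # P(T = 1 | D = 1), probability of a true positive
p_T_given_D0 = 0.01   # P(T = 1 | D = 0), probability of a false positive

# Bayes Theorem: P(D = 1 | T = 1) = joint / marginal probability of T = 1.
numerator = p_T_given_D1 * p_D
denominator = numerator + p_T_given_D0 * (1 - p_D)
print(round(numerator / denominator, 2))  # 0.66
```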

Bayes Theorem Suppose that 52% of the U.S. population is currently Democrat and the remainder is Republican. Let the r.v. D = 1 if a person is a Democrat and zero otherwise. Recently, a poll was taken asking each voter their party and whether or not they would vote for the healthcare bill. Let the r.v. H = 1 if they would vote for the bill and zero otherwise. The results of the poll indicated that 55% of Democrats would vote for the healthcare bill while only 10% of Republicans would. A distant friend of yours said that if given the chance she would vote for the bill. For her, what is P(D = 1 | H = 1)?

Bayes Theorem We can apply Bayes Theorem: p(D = 1 | H = 1) = p(H = 1 | D = 1) p(D = 1) / [p(H = 1 | D = 1) p(D = 1) + p(H = 1 | D = 0) p(D = 0)] = (0.55)(0.52) / [(0.55)(0.52) + (0.10)(0.48)] = 0.286 / (0.286 + 0.048) = 0.856

Many Random Variables

Many Random variables As we have seen in looking at data, we often want to think about more than two variables at a time. We can extend the approach we used with two variables. Suppose we have three random variables (Y_1, Y_2, Y_3): p(y_1, y_2, y_3) = p(y_3 | y_2, y_1) p(y_2 | y_1) p(y_1) The joint distribution of all three variables can be broken down into marginal and conditional distributions.

Sampling without Replacement Example: Suppose we have 10 voters; 4 are Republican and 6 are Democrat. We randomly choose 3. Let Y_i = 1 if the i-th voter chosen is a Democrat and 0 if they are Republican, for i = 1, 2, 3. What is the probability of three Democrats? In other words, what is P(Y_1 = 1, Y_2 = 1, Y_3 = 1) = p(1, 1, 1)?

Sampling without Replacement The answer is p(Y_1 = 1) p(Y_2 = 1 | Y_1 = 1) p(Y_3 = 1 | Y_1 = 1, Y_2 = 1) = (6/10) × (5/9) × (4/8) = 1/6 Step 1: p(Y_1 = 1) = 6/10 because 6 out of 10 voters are Democrats. Step 2: p(Y_2 = 1 | Y_1 = 1) = 5/9 because, conditional on Y_1 = 1, there are now only 9 voters remaining and 5 are Democrats. Step 3: p(Y_3 = 1 | Y_2 = 1, Y_1 = 1) = 4/8 because, conditional on Y_1 = 1 and Y_2 = 1, there are only 8 voters remaining and 4 are Democrats. Key Point: If Y_1 = 1, then we do not replace the Democrat that was chosen first (and so on). This person can't be chosen again.
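A sketch that checks the 1/6 by simulating draws without replacement (random.sample draws without replacement; the seed and trial count are arbitrary):

```python
import random

random.seed(1)
voters = [1] * 6 + [0] * 4   # 6 Democrats (1), 4 Republicans (0)
trials = 100_000

# Draw 3 voters without replacement; count the all-Democrat samples.
hits = sum(sum(random.sample(voters, 3)) == 3 for _ in range(trials))
print(hits / trials)  # close to 1/6 = 0.1667
```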

Example continued: There are a total of 8 outcomes. The logic behind how each probability is calculated is the same as on the last slide.

(y_1, y_2, y_3)    p(y_1, y_2, y_3)
(0, 0, 0)          1/30
(0, 0, 1)          1/10
(0, 1, 0)          1/10
(1, 0, 0)          1/10
(0, 1, 1)          1/6
(1, 0, 1)          1/6
(1, 1, 0)          1/6
(1, 1, 1)          1/6

What is the marginal distribution of Y_1? Find all the outcomes where Y_1 = 1 and add the probabilities: p(Y_1 = 1) = p(1, 0, 0) + p(1, 0, 1) + p(1, 1, 0) + p(1, 1, 1) = 1/10 + 1/6 + 1/6 + 1/6 = 6/10

Many Random variables Above, we had three random variables (Y_1, Y_2, Y_3). Then, we decomposed their joint distribution as p(y_1, y_2, y_3) = p(y_3 | y_2, y_1) p(y_2 | y_1) p(y_1) This is true for as many variables as you want: p(y_1, y_2, ..., y_n) = p(y_n | y_{n-1}, y_{n-2}, ..., y_2, y_1) ... p(y_3 | y_2, y_1) p(y_2 | y_1) p(y_1) This is important because it allows us to extend our results to n random variables.