Review of Probability


David M. Blei
Columbia University
September 5, 2016

Before we discuss data and models, we will learn about graphical models. Graphical models are a way of using graphs to represent and compute with families of probability distributions. In this lecture we review the concepts from probability important for understanding graphical models: random variables, joint distributions, conditional distributions, the chain rule, marginalization, (conditional) independence, and Bayes rule. Later in the semester, we will discuss the core statistical concepts that are important for understanding how to use probability models to analyze data.

To begin, consider the card problem (via David MacKay). There are three cards: (R/R), (R/B), (B/B). I go through the following process:

- Close my eyes and pick a card
- Pick a side at random
- Show you that side

I show you R. What's the probability the other side is red too?

Random variables, atoms, and events. Probability is about random variables. A random variable is any probabilistic outcome. As two examples: (a) the flip of a coin and (b) the height of someone chosen randomly from a population. It is often useful to think of quantities that are not strictly probabilistic as random variables. For example: (a) the temperature on December 18 of next year; (b) the temperature on 03/04/1905; (c) the number of times the term "streetlight" appears on the internet.

Random variables take on values in a sample space. They can be discrete or continuous. For example:

- Coin flip: {H, T}

- Height: positive real values (0, ∞)
- Temperature: real values (-∞, ∞)
- Number of words in a document: positive integers {1, 2, ...}

We call the values of random variables atoms. For now we will consider discrete random variables.

Notation: we denote the random variable with a capital letter and a realization of the random variable with a lower case letter. E.g., X is a coin flip and x is the value (H or T) of that coin flip.

A discrete probability distribution assigns probability to every atom in the sample space. For example, if X is an (unfair) coin, then

P(X = H) = 0.7
P(X = T) = 0.3

The probabilities over the space must sum to one,

∑_x P(X = x) = 1.

(Note how we use summation. This is shorthand for a sum over all possible values of x.)

A subset of atoms is called an event. (Note that atoms are events too.) The probability of an event is the sum of the probabilities of its atoms. E.g., the atoms of a die D are its individual sides (1, 2, 3, 4, 5, 6). The probability that the die is bigger than 3 is

P(D > 3) = P(D = 4) + P(D = 5) + P(D = 6).

Here is a useful picture to understand atoms, events, and random variables:

[Figure: a box of atoms representing the sample space, with an event x and its complement x̄.]
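To make the event calculation concrete, here is a minimal Python sketch (the dictionary representation is just one convenient choice) that stores a fair die as atom probabilities and sums over the event D > 3:

```python
# A fair six-sided die: each atom (side) gets probability 1/6.
die = {side: 1 / 6 for side in range(1, 7)}

# The probabilities over the sample space must sum to one.
assert abs(sum(die.values()) - 1.0) < 1e-12

# An event is a subset of atoms; its probability is the sum
# of the probabilities of its atoms.
p_bigger_than_3 = sum(p for side, p in die.items() if side > 3)
print(p_bigger_than_3)  # ≈ 0.5
```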

An atom is a point in the box; all the atoms together are the sample space. The sample space is endowed with a probability function p(ω); it sums to one, ∑_{ω ∈ Ω} p(ω) = 1. For example, each atom might be a possible outcome of rolling 2 dice. The probability of each atom is the probability of obtaining exactly that configuration. An event is a subset of atoms (e.g., the sum is equal to seven). E.g., two events in this picture are x and x̄. The probability of an event is the sum of the probabilities of its atoms.

Random variables. Technically, a random variable is a function from the sample space to the reals. However, for our purposes now, a random variable is a partition of the sample space into unique values. The distribution of the random variable is determined by the underlying probability measure on the sample space. (We will omit measure theory. A cartoon view of measure theory is that it generalizes this story to continuous spaces, and invokes the real analysis needed to do so.)

Joint distributions. The central object of this class is the joint distribution. Typically, we reason about collections of random variables. The joint distribution is a distribution over the configurations of all the random variables in the collection. For example, imagine flipping 4 fair coins. The joint distribution is over the space of all possible outcomes of the four coins:

P(HHHH) = 1/16
P(HHHT) = 1/16
P(HHTH) = 1/16
...

Notice you can think of this as a single random variable with 16 values.

Q: What is an atom here? What is an event?

Conditional distributions. A conditional distribution is the distribution of a random variable given the value of other random variables.
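The sixteen-outcome joint distribution can be enumerated directly. This short Python sketch (names are illustrative) treats each length-4 tuple of H/T as one atom:

```python
from itertools import product

# Joint distribution of four fair coin flips: 16 equally likely atoms.
joint = {outcome: 1 / 16 for outcome in product("HT", repeat=4)}

assert len(joint) == 16
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Viewed as a single random variable with 16 values, an event is any
# subset of atoms, e.g. "exactly three heads":
p_three_heads = sum(p for o, p in joint.items() if o.count("H") == 3)
print(p_three_heads)  # 0.25
```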

The quantity P(X = x | Y = y) is the probability that X = x when Y = y. For example,

P(I listen to Steely Dan) = 0.5
P(I listen to Steely Dan | Toni is home) = 0.1
P(I listen to Steely Dan | Toni is not home) = 0.7

Note that P(X = x | Y = y) is a different distribution for each value of y. Thus

∑_x P(X = x | Y = y) = 1,

but

∑_y P(X = x | Y = y) ≠ 1 (at least, not necessarily).

To define the conditional probability, we use this picture:

[Figure: a Venn diagram over the cross-product sample space, with four regions: (x̄, ȳ), (x̄, y), (x, y), (x, ȳ).]

We define the conditional probability as

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y),

which holds when P(Y = y) > 0. Here the Venn diagram represents the space of cross-product atoms, i.e., the space of X coupled with the space of Y. The conditional probability is the relative probability of X = x in the space where Y = y.

Now we can solve the card problem. Let X_1 be the random side of the random card I chose. Let X_2 be the other side of that card. Compute P(X_2 = red | X_1 = red):

P(X_2 = red | X_1 = red) = P(X_1 = red, X_2 = red) / P(X_1 = red)   (1)

The numerator is 1/3: only one card has two red sides.
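The definition of conditional probability is easy to compute from a joint table. Here is a small Python sketch with a made-up joint distribution over two binary variables:

```python
# A small (made-up) joint distribution over X in {0, 1} and Y in {0, 1},
# stored as P(X = x, Y = y).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def conditional(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y), for P(Y = y) > 0."""
    p_y = sum(p for (xi, yi), p in joint.items() if yi == y)
    return joint[(x, y)] / p_y

# Each conditional distribution sums to one over x, as it must.
assert abs(conditional(joint, 0, 1) + conditional(joint, 1, 1) - 1.0) < 1e-12
```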

The denominator is 1/2: three of the six sides are red.

The solution is 2/3.

The chain rule. Conditional probability lets us derive the chain rule. The chain rule writes the joint distribution as a product of conditionals,

P(X, Y) = [P(X, Y) / P(Y)] P(Y)
        = P(X | Y) P(Y).

In general, for any set of N random variables,

P(X_1, ..., X_N) = ∏_{n=1}^{N} P(X_n | X_1, ..., X_{n-1}).

Marginalization. Given a collection of random variables, we are often only interested in a subset of them. For example, compute P(X) from a joint distribution P(X, Y, Z). This is called the marginal of X. We compute it with marginalization,

P(X) = ∑_y ∑_z P(X, Y = y, Z = z).

We can see how this works from the Venn diagram. Atoms are configurations of {x, y, z}, and so P(X = x) sums the distribution over the set of configurations where X = x.

We can also derive marginalization from the chain rule,

∑_y ∑_z p(x, y, z) = ∑_y ∑_z p(x) p(y, z | x)
                   = p(x) ∑_y ∑_z p(y, z | x)
                   = p(x).

Bayes rule. From the chain rule and marginalization, we obtain Bayes rule,

P(Y | X) = P(X | Y) P(Y) / ∑_y P(X | Y = y) P(Y = y).
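The 2/3 answer can also be checked by simulation. This Monte Carlo sketch (sample size and seed are arbitrary) follows the process from the start of the lecture: pick a card, pick a side, and condition on the shown side being red:

```python
import random

# Monte Carlo check of the card problem.
# Cards are pairs of side colors: (R/R), (R/B), (B/B).
cards = [("R", "R"), ("R", "B"), ("B", "B")]

random.seed(0)
shown_red = other_red = 0
for _ in range(100_000):
    card = random.choice(cards)      # close my eyes and pick a card
    side = random.randrange(2)       # pick a side at random
    if card[side] == "R":            # I show you R...
        shown_red += 1
        if card[1 - side] == "R":    # ...is the other side red too?
            other_red += 1

print(other_red / shown_red)  # close to 2/3
```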

Example: let Y be a disease and X be a symptom. From the distribution of the symptom given the disease, P(X | Y), and the probability of the disease, P(Y), we can compute the (more useful) distribution of the disease given the symptom, P(Y | X). Bayes rule is important in Bayesian statistics, where Y is a parameter that controls the distribution of X.

Independence. Random variables X and Y are independent if knowing about X tells us nothing about Y,

P(Y | X) = P(Y).

This means that their joint distribution factorizes,

X ⊥ Y  ⟺  P(X, Y) = P(X) P(Y).

Why? It is because of the chain rule,

P(X, Y) = P(X) P(Y | X) = P(X) P(Y).

Notice this is a property of the probability function on the sample space. This perspective is important to understanding how graphical models work.

Some examples of independent random variables:

- Flipping a coin once / flipping the same coin a second time
- You use an electric toothbrush / blue is your favorite color

Examples of random variables that are not independent:

- Registered as a Republican / voted for Trump
- The color of the sky / the time of day

Q: Are these independent?

- Two twenty-sided dice
- Rolling three dice and computing (D_1 + D_2, D_2 + D_3)
- # enrolled students and the temperature outside today
- # attending students and the temperature outside today
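The disease/symptom example can be worked through numerically with Bayes rule; all the probabilities below are made up for illustration:

```python
# Bayes rule with made-up numbers: Y = has disease, X = symptom present.
p_disease = 0.01                 # P(Y = 1), the prior
p_symptom_given_disease = 0.9    # P(X = 1 | Y = 1)
p_symptom_given_healthy = 0.05   # P(X = 1 | Y = 0)

# Denominator: the marginal P(X = 1) = sum_y P(X = 1 | Y = y) P(Y = y).
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# P(Y = 1 | X = 1) = P(X = 1 | Y = 1) P(Y = 1) / P(X = 1).
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 3))  # ≈ 0.154: still unlikely
```

With a rare disease, even a fairly reliable symptom leaves the posterior probability of disease low, because the prior P(Y) dominates.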

Suppose we have two coins, one fair and one biased,

P(C_1 = H) = 0.5
P(C_2 = H) = 0.7.

We choose one of the coins at random, Z ∈ {1, 2}, flip C_Z twice, and record the outcome (X, Y).

Q: Are X and Y independent? What if we knew the value of Z, i.e., which coin was flipped?

Conditional independence. Variables X and Y are conditionally independent given Z if

P(Y | X, Z = z) = P(Y | Z = z)

for all possible values of z. Again, this implies a factorization,

X ⊥ Y | Z  ⟺  P(X, Y | Z = z) = P(X | Z = z) P(Y | Z = z),

for all possible values of z.

Summary. Next we will discuss graphical models, which are a formalism for computing with and reasoning about families of joint distributions. The concepts from probability important for learning about graphical models are:

- A random variable
- A joint distribution of a collection of random variables
- A conditional distribution
- The chain rule
- Marginalization
- Independence and conditional independence

Later, we will review statistical concepts and models such as maximum likelihood estimation and Bayesian statistics, to relate graphical models to algorithms for making estimates from data. At that point, we will review the fundamental concepts of expectation, conditional expectation, and continuous random variables.
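Returning to the two-coin question above, here is a short exact computation (a Python sketch with the stated coin biases) showing that X and Y are not marginally independent, even though the two flips are independent given Z:

```python
# Two coins: P(C_1 = H) = 0.5 (fair), P(C_2 = H) = 0.7 (biased).
# Z picks a coin uniformly; X and Y are two flips of coin C_Z.
p_heads = {1: 0.5, 2: 0.7}
p_z = {1: 0.5, 2: 0.5}

def p_xy(x, y):
    """Marginal P(X = x, Y = y): sum out Z (flips independent given Z)."""
    total = 0.0
    for z, pz in p_z.items():
        px = p_heads[z] if x == "H" else 1 - p_heads[z]
        py = p_heads[z] if y == "H" else 1 - p_heads[z]
        total += pz * px * py
    return total

p_x_h = p_xy("H", "H") + p_xy("H", "T")  # P(X = H) = 0.6 (= P(Y = H))

# Marginally, P(X = H, Y = H) ≈ 0.37 but P(X = H) P(Y = H) = 0.36,
# so the joint does NOT factorize: X and Y are not independent.
print(p_xy("H", "H"), p_x_h * p_x_h)

# Given Z = 2, however, P(X = H, Y = H | Z = 2) = 0.7 * 0.7 = 0.49:
# the flips are conditionally independent given the coin.
```

Seeing a head on the first flip is evidence that the biased coin was chosen, which raises the probability of a head on the second flip; conditioning on Z removes that dependence.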