Formal Probability Basics


Formal Probability Basics

We like to think of probability formally as a function that assigns a real number to an event. We denote by H the basic experimental context in which events will arise; very often H will be a hypothesis. Its complement is denoted by H^c or H̄. Let E and F be any events that might occur under H. Then a probability function P(E | H) (spoken as "E given H") is defined by:

P1: 0 ≤ P(E | H) ≤ 1 for all E, H.
P2: P(H | H) = 1 and P(H̄ | H) = 0.
P3: P(E ∪ F | H) = P(E | H) + P(F | H) whenever E ∩ F ∩ H = ∅, i.e., whenever it is impossible for any two of the events E, F and H to occur together. Usually we consider E ∩ F = ∅ and say that E and F are mutually exclusive.

Formal Probability Basics (contd.)

If E is an event, then we denote its complement (not E) by Ē or E^c. Since E ∩ Ē = ∅, we have from P3: P(Ē) = 1 − P(E).

Conditional probability of E given F:

P(E | F, H) = P(E ∩ F | H) / P(F | H).

We will often write EF for E ∩ F. Compound probability rule: write the above as

P(E | FH) P(F | H) = P(EF | H).

Formal Probability Basics (contd.)

Independent events: E and F are said to be independent (we will write E ⊥ F) if the occurrence of one does not change the probability of the other. Then P(E | FH) = P(E | H), and we have the following multiplication rule:

P(EF | H) = P(E | H) P(F | H).

Homework: Can you formally show that if P(E | FH) = P(E | H), then P(F | EH) = P(F | H)?

Marginalization: We can express P(E | H) by marginalizing over the event F:

P(E | H) = P(EF | H) + P(E F̄ | H) = P(F | H) P(E | FH) + P(F̄ | H) P(E | F̄H).

Application - prognosis example

Joint and conditional distributions:

Table 1: Survival and Stage

                E = early   Ē = late   Marginals
F = survive       0.72        0.02       0.74
F̄ = dead          0.18        0.08       0.26
Marginals         0.90        0.10       1.00

Can you identify the different numbers with joint, conditional and marginal probabilities?

Odds and log-odds: Any probability p can be expressed as odds O, where O = p/(1 − p). The natural logarithm of the odds is called the logit: logit(p) = log[p/(1 − p)].
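As an illustration (not part of the original slides), a minimal Python sketch that recovers the marginal and conditional probabilities from Table 1 and computes odds and logits:

import numpy as np

# Joint probabilities from Table 1 (rows: survive, dead; columns: early, late)
joint = np.array([[0.72, 0.02],
                  [0.18, 0.08]])

row_marginals = joint.sum(axis=1)   # P(F), P(F-bar)  -> [0.74, 0.26]
col_marginals = joint.sum(axis=0)   # P(E), P(E-bar)  -> [0.90, 0.10]

# Conditional probability of survival given early stage: P(F | E) = P(FE) / P(E)
p_survive_given_early = joint[0, 0] / col_marginals[0]   # 0.72 / 0.90 = 0.8

def odds(p):
    return p / (1 - p)

def logit(p):
    return np.log(p / (1 - p))

print(p_survive_given_early, odds(p_survive_given_early), logit(p_survive_given_early))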

Bayes Theorem

Observe that:

P(EF | H) = P(E | FH) P(F | H) = P(F | EH) P(E | H),

so that

P(F | EH) = P(F | H) P(E | FH) / P(E | H).

This is Bayes Theorem, named after the Reverend Thomas Bayes, an English clergyman with a passion for gambling! Often this is written as:

P(F | EH) = P(F | H) P(E | FH) / [ P(F | H) P(E | FH) + P(F̄ | H) P(E | F̄H) ].

Principles of Bayesian Statistics

Two hypotheses:
H0: the excess relative risk for thrombosis for women taking the pill exceeds 2;
H1: it is under 2.

Data x collected from a controlled trial show a relative risk of 3.6.

The probability, or likelihood, of the data under each hypothesis, given our prior beliefs, is P(x | H), where H is H0 or H1.

Bayes Theorem updates the probability of each hypothesis as

P(H | x) = P(H) P(x | H) / P(x),

with marginal probability

P(x) = P(H0) P(x | H0) + P(H1) P(x | H1).

Re-expressed: P(H | x) ∝ P(H) P(x | H).
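To make the update concrete, here is a minimal Python sketch. The prior weights and likelihood values are assumed purely for illustration; the slide reports only the observed relative risk of 3.6, not P(x | H).

# Two-hypothesis Bayes update (illustrative numbers only)
prior = {"H0": 0.5, "H1": 0.5}          # P(H0), P(H1): assumed
likelihood = {"H0": 0.30, "H1": 0.05}   # P(x | H0), P(x | H1): assumed

marginal = sum(prior[h] * likelihood[h] for h in prior)               # P(x)
posterior = {h: prior[h] * likelihood[h] / marginal for h in prior}   # P(H | x)
print(posterior)   # e.g. {'H0': 0.857..., 'H1': 0.142...}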

Famous Game Show Example!

Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car; behind the others, goats. You pick a door, say number 1, and the host, who knows what is behind the doors, opens another door, say number 3, which has a goat. He says to you, "Do you want to pick door number 2?" Is it to your advantage to switch your choice of doors?

Homework: Provide a formal solution to the above problem using Bayes Theorem.
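A Monte Carlo simulation sketch (not the formal Bayes-theorem solution the homework asks for) can be used to check your answer:

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)          # door hiding the car
        pick = random.randrange(3)         # contestant's initial pick
        # Host opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print("stay:  ", play(switch=False))   # approximately 1/3
print("switch:", play(switch=True))    # approximately 2/3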

Likelihood and Prior

Bayes theorem in English:

Posterior distribution = (prior × likelihood) / Σ (prior × likelihood)

The denominator is summed over all parameter values supported by the prior. It is a fixed normalizing factor that is (usually) extremely difficult to evaluate: the curse of dimensionality. Markov Chain Monte Carlo to the rescue!

WinBUGS software: www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml

Prior Elicitation

A clinician is interested in π: the proportion of children between ages 5 and 9 in a particular population having asthma symptoms. The clinician has prior beliefs about π, summarized as prior support points and prior weights.

Data: a random sample of 15 children shows 2 having asthma symptoms. The likelihood is obtained from the Binomial distribution:

(15 choose 2) π^2 (1 − π)^13

Note: the binomial coefficient (15 choose 2) is a constant and *can* be ignored in the computations, though it is accounted for in the next table.

Computation Table

Prior support   Prior weight   Likelihood   Prior × Likelihood   Posterior
0.10            0.10           0.267        0.027                0.098
0.12            0.15           0.287        0.043                0.157
0.14            0.25           0.290        0.072                0.265
0.16            0.25           0.279        0.070                0.255
0.18            0.15           0.258        0.039                0.141
0.20            0.10           0.231        0.023                0.084
Total           1.00                        0.274                1.000

Posterior: obtained by dividing each Prior × Likelihood entry by the normalizing constant 0.274.

HOMEWORK: Redo the posterior computations with prior weights of 1/6 for each of the above six support values; this is an uninformative prior. Redo (again!) the computations with a more informative prior assigning weight 0.4 to each of the support values 0.14 and 0.16, and weight 0.05 to each of the remaining four values. Comment on the sensitivity of the posteriors to the priors.
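The table can be reproduced with a short Python sketch (an illustration, not part of the original slides); the same code also handles the homework priors:

from math import comb

# Discrete-prior posterior computation for the asthma example:
# binomial likelihood with n = 15 trials and y = 2 successes.
support = [0.10, 0.12, 0.14, 0.16, 0.18, 0.20]
prior   = [0.10, 0.15, 0.25, 0.25, 0.15, 0.10]
n, y = 15, 2

likelihood = [comb(n, y) * p**y * (1 - p)**(n - y) for p in support]
joint      = [pr * lik for pr, lik in zip(prior, likelihood)]
norm       = sum(joint)                       # about 0.274, the normalizing constant
posterior  = [j / norm for j in joint]

for p, po in zip(support, posterior):
    print(f"pi = {p:.2f}  posterior = {po:.3f}")

# Swapping in prior = [1/6] * 6, or [0.05, 0.05, 0.4, 0.4, 0.05, 0.05],
# reproduces the two homework variants.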

The Sampling Perspective

Previous example: direct evaluation of the posterior probabilities. Feasible for simpler discrete problems.

Modern Bayesian analysis: derive complete posterior densities, say p(θ | y), by drawing samples from that density. Samples are of the parameters themselves, or of functions of them.

If θ_1, ..., θ_M are samples from p(θ | y), then densities are created by feeding them into a density plotter. Similarly, samples from f(θ), for some function f, are obtained by simply feeding the θ_i's to f(·).

In principle M can be arbitrarily large: it comes from the computer and depends only upon the time we have for analysis. Do not confuse this with the data sample size n, which is limited by experimental constraints.
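A minimal sketch of this idea, assuming (for illustration only) the conjugate Beta posterior for the asthma data (2 successes out of 15) under a uniform Beta(1, 1) prior; the slides do not prescribe this choice:

import numpy as np

rng = np.random.default_rng(0)
M = 100_000
theta = rng.beta(1 + 2, 1 + 13, size=M)      # samples from p(theta | y)

# Posterior summaries come straight from the samples...
print(theta.mean(), np.quantile(theta, [0.025, 0.975]))

# ...and samples of any function f(theta) are just f applied to the draws,
# e.g. the log-odds (logit) of theta.
logit_theta = np.log(theta / (1 - theta))
print(logit_theta.mean(), np.quantile(logit_theta, [0.025, 0.975]))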

Issues in Sampling-based Analysis

Direct Monte Carlo: Some algorithms can be designed to generate independent samples exactly from the posterior distribution. In these situations there are NO convergence problems or issues. Sampling is called exact.

Markov Chain Monte Carlo (MCMC): In general, exact sampling may not be possible or feasible. MCMC is a far more versatile set of algorithms that can be invoked to fit more general models. Note: wherever direct Monte Carlo applies, MCMC will provide excellent results too.

Convergence issues: There is no free lunch! The power of MCMC comes at a cost. The initial samples do not necessarily come from the desired posterior distribution; rather, they need to converge to the true posterior distribution. Therefore, one needs to assess convergence, discard output from before convergence, and retain only post-convergence samples. The period before convergence, which is discarded, is called the burn-in.

Diagnosing convergence: Usually a few parallel chains are run from rather different starting points. The sample values are plotted (trace plots) for each of the chains. The time for the chains to mix together is taken as the time to convergence.

Good news! All this is automated in WinBUGS. So, as users, we only need to learn how to specify good Bayesian models and implement them in WinBUGS. This will be the focus of the course.
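For intuition, a minimal random-walk Metropolis sketch (one particular MCMC algorithm, chosen here only for illustration) showing parallel chains, burn-in, and the samples that are retained. The target is the same assumed Beta-shaped posterior used above; WinBUGS automates all of this.

import numpy as np

def log_post(theta):
    if not (0 < theta < 1):
        return -np.inf
    return 2 * np.log(theta) + 13 * np.log(1 - theta)   # log of theta^2 (1-theta)^13

def metropolis(start, n_iter=5000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    theta = start
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()       # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop                                   # accept
        chain[i] = theta                                   # plotting chain gives the trace plot
    return chain

# Two parallel chains from very different starting points
chains = [metropolis(0.9, seed=1), metropolis(0.01, seed=2)]
burn_in = 1000
kept = np.concatenate([c[burn_in:] for c in chains])       # discard pre-convergence output
print(kept.mean())   # about 3/17 = 0.176, the posterior mean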

Principle of Predictions

Classical: impute a point estimate of θ into the model. In Bayesian analysis we summarize θ by its entire posterior distribution p(θ | y). In that spirit we obtain complete predictive distributions by averaging the likelihood over the full posterior distribution.

In the sampling approach, the posterior density p(θ | y) is simulated using samples θ_1, ..., θ_M. The out-of-sample prediction of z is:

p(z | y) = ∫_θ p(z | θ) p(θ | y) dθ

Implementation: we sample from p(z | y). For each θ_i from p(θ | y), this amounts to drawing z_i from the data likelihood p(z | θ) with θ = θ_i.
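A posterior predictive sampling sketch, continuing the assumed Beta posterior from the earlier sketches (not part of the original slides): for each posterior draw θ_i, draw z_i from the data likelihood, here the number of asthma cases in a hypothetical new sample of 15 children.

import numpy as np

rng = np.random.default_rng(0)
M = 50_000
theta = rng.beta(3, 14, size=M)               # theta_i ~ p(theta | y)
z = rng.binomial(15, theta)                   # z_i ~ p(z | theta_i)

# The z_i are draws from the posterior predictive distribution p(z | y)
values, counts = np.unique(z, return_counts=True)
for v, c in zip(values, counts):
    print(v, c / M)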