Introduction to Markov Chain Monte Carlo



Similar documents
Gaussian Processes to Speed up Hamiltonian Monte Carlo

Tutorial on Markov Chain Monte Carlo

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Introduction to Monte Carlo. Astro 542 Princeton University Shirley Ho

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

Bayesian Statistics: Indian Buffet Process

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

LECTURE 4. Last time: Lecture outline

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Dirichlet Processes A gentle tutorial

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 12 04/08/2008. Sven Zenker

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Inference on Phase-type Models via MCMC

Perron vector Optimization applied to search engines

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Chenfeng Xiong (corresponding), University of Maryland, College Park

Polarization codes and the rate of polarization

STA 4273H: Statistical Machine Learning

Optimising and Adapting the Metropolis Algorithm

The Basics of Graphical Models

Markov Chain Monte Carlo and Numerical Differential Equations

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

On the mathematical theory of splitting and Russian roulette

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

M/M/1 and M/M/m Queueing Systems

Parallelization Strategies for Multicore Data Analysis

Mathematical finance and linear programming (optimization)

9th Max-Planck Advanced Course on the Foundations of Computer Science (ADFOCS) Primal-Dual Algorithms for Online Optimization: Lecture 1

11. Time series and dynamic linear models

Estimating the evidence for statistical models

Markov Chain Monte Carlo Simulation Made Simple

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Bayesian Statistics in One Hour. Patrick Lam

Model-based Synthesis. Tony O Hagan

Monte Carlo Methods in Finance

Statistics Graduate Courses

Multi Skilled Staff Scheduling with Simulated Annealing

Math 526: Brownian Motion Notes

Welcome to the course Algorithm Design

Message-passing sequential detection of multiple change points in networks

Lecture 13: Martingales

Monte Carlo Simulation

Hydrodynamic Limits of Randomized Load Balancing Networks

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

The Monte Carlo Framework, Examples from Finance and Generating Correlated Random Variables

Monte Carlo-based statistical methods (MASM11/FMS091)

Basics of Statistical Machine Learning

Model Calibration with Open Source Software: R and Friends. Dr. Heiko Frings Mathematical Risk Consulting

MASTER S THESIS for stud.techn. Kristian Stormark. Faculty of Information Technology, Mathematics and Electrical Engineering NTNU

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

Echantillonnage exact de distributions de Gibbs d énergies sous-modulaires. Exact sampling of Gibbs distributions with submodular energies

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Reinforcement Learning with Factored States and Actions

An Introduction to Basic Statistics and Probability

Monte Carlo-based statistical methods (MASM11/FMS091)

Computational Statistics for Big Data

Topic models for Sentiment analysis: A Literature Survey

MCMC-Based Assessment of the Error Characteristics of a Surface-Based Combined Radar - Passive Microwave Cloud Property Retrieval

IEOR 6711: Stochastic Models, I Fall 2012, Professor Whitt, Final Exam SOLUTIONS

Master s Theory Exam Spring 2006

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

Performance Analysis of a Telephone System with both Patient and Impatient Customers

Christfried Webers. Canberra February June 2015

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

Scheduling Shop Scheduling. Tim Nieberg

Course: Model, Learning, and Inference: Lecture 5

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Lecture 7: Continuous Random Variables

Statistical Machine Learning

Generating Valid 4 4 Correlation Matrices

Parametric fractional imputation for missing data analysis

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method

Program description for the Master s Degree Program in Mathematics and Finance

Visualization of Collaborative Data

Some Examples of (Markov Chain) Monte Carlo Methods

5. Continuous Random Variables

An Application of Inverse Reinforcement Learning to Medical Records of Diabetes Treatment

Load Balancing and Switch Scheduling

Gossip Algorithms. Devavrat Shah MIT

The Advantages and Disadvantages of Online Linear Optimization

Choosing Probability Distributions in Simulation

Average System Performance Evaluation using Markov Chain

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

Hacking-proofness and Stability in a Model of Information Security Networks

Analysis of an Artificial Hormone System (Extended abstract)

Walk, Not Wait: Faster Sampling Over Online Social Networks

Lecture 2: Universality

A Sublinear Bipartiteness Tester for Bounded Degree Graphs

Computational Game Theory and Clustering

Marshall-Olkin distributions and portfolio credit risk

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Bias in the Estimation of Mean Reversion in Continuous-Time Lévy Processes

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics

Lectures 5-6: Taylor Series

Negative Examples for Sequential Importance Sampling of. Binary Contingency Tables

HT2015: SC4 Statistical Data Mining and Machine Learning

A linear algebraic method for pricing temporary life annuities

Transcription:

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem solving technique decision/optimization/value problems generic, but not necessarily very efficient Based on - Neal Madras: Lectures on Monte Carlo Methods; AMS 2002

Lecture Outline Markov Chains notation & terminology fundamental properties of Markov Chains Sampling from prob. distributions using MCMC uniform desired target distribution Problem solving using MCMC optimization Relevance to Bayesian Networks

Markov Chains Notation & Terminology Countable (finite) state space Ω (e.g. N) Sequence of random variables {X t } on Ω for t =0,1,2,... Definition: {X t } is a Markov Chain if P[X t+1 = y X t =x t,...,x 0 =x 0 ] = P[X t+1 =y X t =x t ] Notation: P[X t+1 = i X t = j ] = p ji time-homogeneous

Markov Chains Examples Markov Chain Drawing a number from {1,2,3} with replacement. X t = last number seen at time t NOT a Markov Chain Drawing a number from {1,2,3} WITHOUT replacement. X t = last number seen at time t

Markov Chains Notation & Terminology Let P = (p ij ) transition probability matrix dimension Ω x Ω Let t (j) = P[X t = j] 0 initial probability distribution Then t (j) = i t-1 (i)p ij = ( t-1 P)(j) = ( o P t )(j) Example: graph vs. matrix representation

Markov Chains Fundamental Properties Theorem: Under some conditions (irreducibility and aperiodicity), the limit lim t P t exists and is ij independent of i; call it (j). If Ω is finite, then j (j) = 1 and ( P)(j) = (j) and such is a unique solution to xp=x ( is called a stationary distribution) Nice: no matter where we start, after some time, we will be in any state j with probability ~ (j) DEMO

Markov Chains Fundamental Properties Proposition: Assume a Markov Chain with discrete state space Ω. Assume there exist positive distribution on Ω ( (i)>0 and i (i) = 1) and for every i,j: (i)p ij = (j)p ji (detailed balance property) then is the stationary distribution of P Corollary: If transition matrix P is symmetric and Ω finite, then the stationary distribution is (i)=1/ Ω DEMO

Markov Chain Monte Carlo Random Walk on {0,1} m Ω={0,1} m generate chain: pick J {1,...,m} uniformly at random and set X t =(z 1,...,1-z J,...,z m ) where (z 1,...,z m )=X t-1 Markov Chain Monte Carlo basic idea: Given a prob. distribution on a set Ω, the problem is to generate random elements of Ω with distribution. MCMC does that by constructing a Markov Chain with stationary distribution and simulating the chain.

MCMC: Uniform Sampler Problem: sample elements uniformly at random from set (large but finite) Ω Idea: construct an irreducible symmetric Markov Chain with states Ω and run it for sufficient time by Theorem and Corollary, this will work Example: generate uniformly at random a feasible solution to the Knapsack Problem

MCMC: Uniform Sampler Example Knapsack Problem Definition Given: m items and their weight w i and value v i, knapsack with weight limit b Find: what is the most valuable subset of items that will fit into the knapsack? Representation: z=(z 1,...,z m ) {0,1} m, z i means whether we take item i feasible solutions Ω = { z {0,1} m ; i w i z i b} problem: maximize i v i z i subject to z Ω

MCMC Example: Knapsack Problem Uniform sampling using MCMC: given current X t =(z 1,...,z m ), generate X t+1 by: (1) choose J {1,...,m} uniformly at random (2) flip z J, i.e. let y = (z 1,...,1-z J,...,z m ) (3) if y is feasible, then set X t+1 = y, else set X t+1 = X t Comments: P ij is symmetric uniform sampling how long should we run it? can we use this to find a good solution?

MCMC Example: Knapsack Problem Can we use MCMC to find good solution? Yes: keep generating feasible solutions uniformly at random and remember the best one seen so far. this may take very long time, if number of good solutions is small Better: generate good solutions with higher probability => sample from a distribution where good solutions have higher probabilities (z) = C -1 exp( i v i z i )

MCMC: Target Distribution Sampler Let Ω be a countable (finite) state space Let Q be a symmetric transition prob. matrix Let be any prob. distribution on Ω s.t. (i)>0 the target distribution we can define a new Markov Chain {X i } such that its stationary distribution is this allows to sample from Ω according to

MCMC: Metropolis Algorithm Given such Ω,,Q creates a new MC {X t }: (1) choose proposal y randomly using Q P[Y=j X t = i ] = q ij (2) let = min{1, (Y)/ (i)} (acceptance probability) (3) accept y with probability, i.e. X t+1 =Y with prob., X t+1 =X t otherwise Resulting p ij : p ij =q ij min{1, (j)/ (i)} for i j p ii = 1 - j i p ij

MCMC: Metropolis Algorithm Proposition (Metropolis works): The p ij 's from Metropolis Algorithm satisfy detailed balance property w.r.t i.e. (i)p ij = (j)p ji the new Markov Chain has a stationary distr. Remarks: we only need to know ratios of values of the MC might converge to exponentially slowly

MCMC: Metropolis Algorithm Knapsack Problem Target distribution: -1 (z) = C b exp( b i v i z i ) Algorithm: (1) choose J {1,...,m} uniformly at random (2) let y = (z 1,...,1-z J,...,z m ) (3) if y is not feasible, then X t+1 = X t (4) if y is feasible, set X t+1 = y with prob., else X t+1 = X t where = min{1, exp( b i v i (y i -z i )}

MCMC: Optimization Metropolis Algorithm theoretically works, but: needs large b to make good states more likely its convergence time may be exponential in b try changing b over time Simulated Annealing for Knapsack Problem: = min{1, exp( b(t) i v i (y i -z i )} b(t) increases slowly with time (e.g. =log(t), =(1.001) t )

MCMC: Simulated Annealing General optimization problem: maximize function G(z) on all feasible solutions Ω let Q be again symmetric transition prob. matrix on Ω Simulated Annealing is Metropolis Algorithm with p ij =q ij min{1, exp( b(t) [G(j)-G(i)]) } for i j p ii = 1 - j i p ij Effect of b(t): exploration vs. exploitation trade-off

MCMC: Gibbs Sampling Consider a factored state space z Ω is a vector z=(z 1,...,z m ) notation: z -i = (z 1,...,z i-1,z i+1,...,z m ) Assume that target is s.t. P[Z i z -i ] is known Algorithm: (1) pick a component i {1,...,m} (2) sample value of z i from P[Z i z -i ], set X t =(z 1,...,z m ) A special case of generalized Metropolis Sampling (Metropolis-Hastings)

MCMC: Relevance to Bayesian Networks In Bayesian Networks, we know P[Z i z -i ] = P[Z i MarkovBlanket(Z i )] BN Inference Problem: compute P[Z i =z i E=e] Possible solution: (1) sample from worlds according to P[Z=z E=e] (2) compute fraction of those worlds where Z i =z i Gibbs Sampler works: let (z) = P[Z=z E=e], then P[Z i z -i ] satisfies detailed balance property w.r.t (z) (z) is stationary

MCMC: Inference in BN Example P[H S=true, B=true]

MCMC: Inference in BN Example h,l p (h,l) ( h,l) =? h,l h, l h, l Smoking and Breathing difficulties are fixed

MCMC: Inference in BN Example P[z i MB(Z i )] P[z i Par(Z i )] Y Chld(Z) P[y Par(Y)] p (h,l) ( h,l) = P[h gets picked].p[ h MB(H)] = ½.P[ h l,s,b] = ½.αP[ h s].p[b h,l]