Markov Chain Monte Carlo Simulation Made Simple


Alastair Smith
Department of Politics
New York University
April 2, 2003

Markov Chain Monte Carlo (MCMC) simulation is a powerful technique for numerical integration. It can be used to numerically estimate complex econometric models. In this paper I describe the intuition behind the process and show its flexibility and applicability. I conclude by demonstrating that these methods are often simpler to implement than many common techniques such as MLE.

This paper serves as a brief introduction. I do not intend to derive any results or prove any theorems. I believe that MCMC offers a powerful estimation tool, and this paper is designed to remove the mystery surrounding the process. Not only is it extremely powerful and flexible, it is also easy to implement. Given the recent growth in the power of computers, I believe that numerical procedures will be the estimation tools of the future.

I outline the underlying logic and show why these techniques work. MCMC techniques are most often used in the Bayesian context, so I start by outlining the simple linear model in the Bayesian framework. Although analytical techniques exist for this model, they are complex. In general, more complex models are analytically intractable. Having set up the estimation technique, I examine the properties of Markov chains. These properties provide the basis for the estimation procedure.

1 The Bayesian Model

While I believe that the Bayesian approach is a superior and more consistent approach to statistics than the standard frequentist approach, this debate is voluminous and not the topic of this paper. For practical purposes it is usually possible to use diffuse priors that do not influence the posterior results.

prior: $f(\theta)$
likelihood: $L(Y \mid \theta)$
posterior: $f(\theta \mid Y) \propto f(\theta)\, L(Y \mid \theta)$

For example, in the simple linear model $\theta = \{\beta, \sigma^2\}$.

2 Markov Chains

A Markov chain is a stochastic process. It generates a series of observations, $X$. To illustrate the concept I focus on a discrete time, discrete state space model. At each time period the process generates a sample, $X_t$, from

the state space. For a simple example, suppose that the state space is the numbers 1, 2, and 3. A Markov chain is simply a string of these numbers. The Markov property is that the probability distribution over the next observation depends only upon the current observation. Let $p_{ij}$ represent the probability that the next observation is $j$ ($X_{t+1} = j$), given that the current observation is $i$ ($X_t = i$). A convenient way to present these transition probabilities is through a transition matrix $P$,

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}.$$

The elements of the first row represent the probabilities of moving to the different states if the current state is 1. Therefore, $p_{11}$ represents the probability that $X_{t+1} = 1$ if the current state is also 1; $p_{12}$ represents the probability that $X_{t+1} = 2$ if the current state is 1, etc.

Suppose that our initial observation is indeed 1 ($X_0 = 1$). The probability distribution for state $X_1$ is given by the first row of $P$. The next question is what the probability distribution over the following observation is. To illustrate, I consider the more specific question: with what probability does $X_2 = 3$? There are three possible paths by which the second observation could equal 3; they are illustrated in the table below.

Pathway   X_0   X_1   X_2   Probability
#1        1     1     3     p_11 p_13
#2        1     2     3     p_12 p_23
#3        1     3     3     p_13 p_33

Thus the probability that $X_2 = 3$ is $p_{11}p_{13} + p_{12}p_{23} + p_{13}p_{33}$.

Thus, for any initial state, we can calculate the probability distribution over the states after a given number of moves. Obviously, as the number of moves increases these become increasingly difficult to calculate. Yet matrix notation simplifies the calculation. Suppose that, rather than starting with a specific state, we consider a probability distribution over these states,

$$v^{(0)} = \begin{pmatrix} v^{(0)}_1 \\ v^{(0)}_2 \\ v^{(0)}_3 \end{pmatrix}.$$

If we randomly select the initial state from this distribution, then the probability distribution of the next state in the chain is given by

$$v^{(1)} = \begin{pmatrix} v^{(1)}_1 \\ v^{(1)}_2 \\ v^{(1)}_3 \end{pmatrix} = P^\top v^{(0)} = \begin{pmatrix} p_{11} & p_{21} & p_{31} \\ p_{12} & p_{22} & p_{32} \\ p_{13} & p_{23} & p_{33} \end{pmatrix} \begin{pmatrix} v^{(0)}_1 \\ v^{(0)}_2 \\ v^{(0)}_3 \end{pmatrix} = \begin{pmatrix} p_{11}v^{(0)}_1 + p_{21}v^{(0)}_2 + p_{31}v^{(0)}_3 \\ p_{12}v^{(0)}_1 + p_{22}v^{(0)}_2 + p_{32}v^{(0)}_3 \\ p_{13}v^{(0)}_1 + p_{23}v^{(0)}_2 + p_{33}v^{(0)}_3 \end{pmatrix}.$$

The transpose $P^\top$ appears because row $i$ of $P$ holds the probabilities of moving out of state $i$, so the probability of arriving in state $j$ is $\sum_i p_{ij} v^{(0)}_i$.
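A minimal numerical sketch of this update may help. The transition matrix below is hypothetical (the paper does not give one at this point), and `propagate` is a helper name introduced only for illustration. Because row i of P holds the probabilities of leaving state i, the code multiplies a column vector of probabilities by the transpose of P, so that component j of the new vector is the sum over i of p_ij v_i. It also checks the three-pathway sum for X_2 = 3 against two applications of the update.

```python
import numpy as np

# Hypothetical transition matrix (an assumption, not taken from the paper).
# Row i holds the probabilities of moving from state i+1 to states 1, 2, 3.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])


def propagate(v0, P, steps):
    """Return the distribution over states after `steps` moves."""
    v = np.asarray(v0, dtype=float)
    for _ in range(steps):
        v = P.T @ v  # component j becomes sum_i p_ij * v_i
    return v


# Start in state 1 with certainty, as in the text (X_0 = 1).
v0 = np.array([1.0, 0.0, 0.0])
v1 = propagate(v0, P, 1)  # equals the first row of P
v2 = propagate(v0, P, 2)

# Probability that X_2 = 3, summed over the three pathways in the table.
path_sum = P[0, 0] * P[0, 2] + P[0, 1] * P[1, 2] + P[0, 2] * P[2, 2]

print("distribution of X_1:", v1)
print("distribution of X_2:", v2)
print("P(X_2 = 3) via pathways:", path_sum)
```

The third component of `v2` and `path_sum` are the same quantity computed two ways, which is the point of the matrix notation.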

This idea can be extended: the probability distribution over the states after the second move is simply $v^{(2)} = P^\top v^{(1)} = (P^\top)^2 v^{(0)}$. In general, $v^{(t)} = (P^\top)^t v^{(0)}$, where $P^\top$ is the transpose of $P$ (row $i$ of $P$ holds the probabilities of leaving state $i$).

Of particular interest is the distribution as the chain becomes long. As the chain's length increases, the distribution over the states becomes less and less determined by the starting distribution and more and more determined by the transition probabilities. Indeed, provided the chain satisfies certain regularity conditions (for instance, it does not get stuck in one state), there exists a unique invariant distribution associated with every transition matrix. Let $\pi$ represent this invariant distribution. So for any starting distribution, $\pi^{(0)}$, as the chain becomes long $\pi^{(t)}$ tends to $\pi$:

$$\lim_{t \to \infty} \pi^{(t)} = \pi.$$

There are two ways to calculate this invariant distribution. The first is analytical. This method exploits the fact that $\pi = P^\top \pi$ and solves this system of equations. The second, and of more relevance for this paper, is to simulate $\pi$ by actually running the Markov chain. This involves choosing a starting value and simply running the chain. The initial values in the chain depend strongly upon the starting values. However, as the chain becomes longer, the elements of the chain represent random draws from the probability distribution $\pi$.

Suppose, for example, that the transition matrix is $P = [\ldots]$. We could start by setting $X_0 = 1$ and then running the Markov chain. We could estimate the density of each state by examining the frequency of each state. Figure 1 demonstrates that, as the number of iterations becomes large, the relative frequency of each state converges to its invariant density. We can arbitrarily increase the accuracy of these estimates simply by taking more iterations.

Figure 1: [plot not reproduced; legible labels: x1, x2, x, N]

In this example I use a discrete state space model; however, these ideas are readily extendable to continuous state space models, where the transition matrix is replaced by a transition kernel (a probability density over the next state that depends only upon the current state).

2.1 Exploiting Markov chains for estimation

Most of Markov theory revolves around finding the invariant distribution of Markov chains. MCMC turns the problem around. Rather than finding

the invariant distribution of a specific Markov chain, it starts with a specific invariant distribution and asks: can I find a Markov chain that has this invariant distribution? [1] Typically, we already know the distribution of interest: the posterior distribution of the parameters. The key is to find a transition kernel that has this invariant distribution, $f(\theta \mid Y)$.

In Bayesian estimation we want to find the posterior distribution of the parameters, $f(\theta \mid Y)$. As discussed above, this is often analytically intractable. However, suppose we have a Markov chain, $P$, whose invariant distribution is $f(\theta \mid Y)$. If we run this Markov process then, as the chain becomes long, its elements represent random draws from the posterior distribution $f(\theta \mid Y)$. To illustrate how the process works, consider the following algorithm.

1. Choose starting values, $\theta^{(0)}$, and the length of the chain, $n_0 + m$. Set $t = 0$.
2. Given the current element in the chain, $\theta^{(t)}$, use the Markov process $P$ to draw the next element, $\theta^{(t+1)}$.
3. If $t \geq m$, then store $\theta^{(t+1)}$.
4. Let $t = t + 1$. If $t < m + n_0$, then return to step 2; otherwise calculate and report the descriptive statistics for the elements stored in step 3.

This algorithm generates and stores $n_0$ elements from the chain. These elements represent random samples from the posterior distribution $f(\theta \mid Y)$. Thus the sample average represents an estimate of the expected value of $\theta$. Other properties of $f(\theta \mid Y)$ can also be estimated by examining the properties of the sample. The accuracy of these estimates depends upon the number of draws, $n_0$. Accuracy is improved by running the chain longer.

Note that the first $m$ iterations of the chain were discarded. The initial elements in the chain are strongly influenced by the starting value (as the figure above demonstrates). If these starting values are drawn from a low density region of the posterior density, then the chain contains too many draws from this region. [2]

[1] Each Markov process has a unique invariant distribution. Yet many Markov chains could have the same invariant distribution. Thus, we are free to use any of these processes to simulate the invariant distribution.

[2] Another practical problem with running this algorithm is the high autocorrelation between elements in the chain. This reduces the rate at which convergence is achieved. A practical solution is to subsample from the elements stored at step 3.
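The four steps above can be sketched on a small discrete chain. Everything specific here is an assumption for illustration: the three-state transition matrix, the function name `run_chain`, and the choices of m and n_0. For this particular matrix the invariant distribution can be worked out by hand as (5/21, 3/7, 1/3), which gives something to compare the stored draws against.

```python
import random
from collections import Counter

# Hypothetical three-state transition matrix (an assumption for illustration).
# P[i] lists the probabilities of moving from state i to states 1, 2, 3.
P = {
    1: [0.5, 0.3, 0.2],
    2: [0.2, 0.6, 0.2],
    3: [0.1, 0.3, 0.6],
}
STATES = [1, 2, 3]

# Invariant distribution of this particular matrix, solved by hand.
invariant = {1: 5 / 21, 2: 3 / 7, 3: 1 / 3}


def run_chain(start, m, n0, seed=0):
    """Run the chain for m + n0 moves and keep the last n0 elements."""
    rng = random.Random(seed)
    x = start                      # step 1: starting value
    stored = []
    for t in range(m + n0):        # step 4: stop after m + n0 moves
        x = rng.choices(STATES, weights=P[x])[0]  # step 2: draw next element
        if t >= m:                 # step 3: store once burn-in is over
            stored.append(x)
    return stored


draws = run_chain(start=1, m=1_000, n0=50_000)
freq = {s: c / len(draws) for s, c in Counter(draws).items()}

for s in STATES:
    print(f"state {s}: simulated {freq[s]:.3f}, invariant {invariant[s]:.3f}")
```

Increasing `n0` tightens the agreement, and thinning `draws` (keeping every k-th element) is one way to act on the autocorrelation remark in note [2].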

In summary, if we can find a Markov process with transition kernel $P$ such that its invariant distribution is $f(\theta \mid Y)$, then we can numerically estimate this posterior distribution by running the Markov chain. Obviously, there are many important convergence considerations that I have not discussed. However, the basic point is that if an appropriate transition kernel can be found, then estimation involves nothing more than running the Markov process. So far I have said nothing about how to find an appropriate transition kernel. It is to this point that I turn next.

3 Transition Kernels

Table 1 compares the analytically calculated probability distribution with the numerically simulated values. The accuracy of the simulation can be increased simply by increasing the number of iterations of the chain. [3] Most of Markov theory revolves around finding the invariant distribution of Markov chains. MCMC turns the problem around. Typically, we already know the distribution of interest: the posterior distribution of the parameters. The key is to find a transition kernel that has this invariant distribution. Then, to estimate this distribution, we simply need to run the Markov chain for a suitably long period.

4 Joint, marginal and conditional distributions

In the linear model we want to estimate $f(\beta, \sigma^2 \mid Y)$. Being somewhat informal, this is the probability density of seeing a particular value of $\beta$ and $\sigma^2$. Bayesians have calculated this density. It turns out that, with suitable conjugate priors [4], $f(\beta, \sigma^2 \mid Y)$ is distributed normal-inverse-gamma.

[3] This is a convenient time to discuss several aspects associated with implementing MCMC methods. First, the starting values of the Markov chain affect the initial values of the chain. Over time their effect diminishes. However, if the starting values represent very low density portions of the state space, then the choice of starting values affects the results. The usual solution is to discard the early part of the chain. This tends to disregard those draws from the chain that are highly dependent upon the starting values. Convergence criteria? Literature?

[4] What is a conjugate prior?

Unfortunately, this is about the most complicated model for which we can work with

the joint posterior density analytically. For more complex models the joint density is simply intractable. Yet generally our interest is in the marginal density of a particular parameter. In the particular case of the simple linear model we typically want to know about $\beta$ and $\sigma^2$ separately. For example, the distribution of $\beta$ is all we report from a regression model. This marginal density is simply the joint density of $\beta$ and $\sigma^2$ integrated across all possible values of $\sigma^2$: $f(\beta \mid Y) = \int f(\beta, \sigma^2 \mid Y)\, d\sigma^2$.

The key to using MCMC is to stop thinking in terms of calculating things analytically and to imagine how you could simulate a single parameter in a model if you knew all the other parameters. Suppose, for example, that you knew the marginal distribution of $\sigma^2$ and wanted to calculate the marginal distribution of $\beta$. To estimate the marginal density of $\beta$ I could simply integrate $\sigma^2$ out of the joint density. While merely tricky in this problem, this is impossible in more complex econometric models. However, knowing the marginal density of $\sigma^2$, I can take a large number of random draws from this density. For each of these draws, the conditional density of $\beta$ is simple to calculate (with normal priors, $f(\beta \mid Y, \sigma^2)$ is also normally distributed). To numerically estimate $\beta$ I can draw a random sample from this distribution.

Algorithm to calculate the marginal density of $\beta$ given that the density of $\sigma^2$ is known:

1. Set $t = 1$.
2. Randomly draw $(\sigma^2)^{(t)}$ from its known posterior marginal distribution.
3. Calculate the posterior density of $\beta$ given $(\sigma^2)^{(t)}$, $f(\beta \mid Y, (\sigma^2)^{(t)})$.
4. Randomly draw $\beta^{(t)}$ from $f(\beta \mid Y, (\sigma^2)^{(t)})$.
5. Let $t = t + 1$ and go to step 2.

Suppose this algorithm is repeated $T$ times. Then the $T$ samples of $\beta$ represent random draws from its marginal density. The algorithm effectively integrates out $\sigma^2$.

As an analogy, in our introductory ("101") econometrics classes we learn how to estimate the mean of a variable if we know its variance. We then learn to calculate the variance if we know the mean.
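The five-step algorithm above can be sketched for the simplest possible case, where $\beta$ is the mean of a normal sample. The setup is an assumption made for illustration and is not taken from the paper: simulated data, and the prior $p(\beta, \sigma^2) \propto 1/\sigma^2$, under which the marginal posterior of $\sigma^2$ is inverse gamma with shape $(n-1)/2$ and scale $S/2$ (with $S = \sum_i (y_i - \bar{y})^2$) and the conditional posterior of $\beta$ is $N(\bar{y}, \sigma^2/n)$. In this case the marginal of $\beta$ is known to be a Student t with variance $S / (n(n-3))$, so the draws can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): y_i ~ N(beta, sigma^2) with beta = 2, sigma = 3.
y = rng.normal(loc=2.0, scale=3.0, size=50)
n = y.size
ybar = y.mean()
S = np.sum((y - ybar) ** 2)

T = 100_000  # number of repetitions of steps 2 to 4

# Step 2: draw sigma^2 from its marginal posterior, inverse gamma with
# shape (n - 1) / 2 and scale S / 2 (valid under the assumed prior).
# An inverse gamma draw is the reciprocal of a gamma draw with rate S / 2.
sigma2 = 1.0 / rng.gamma(shape=(n - 1) / 2, scale=2.0 / S, size=T)

# Steps 3 and 4: for each sigma^2, draw beta from N(ybar, sigma^2 / n).
beta = rng.normal(loc=ybar, scale=np.sqrt(sigma2 / n))

# The beta draws now come from the marginal posterior of beta.
print("simulated mean and variance:", beta.mean(), beta.var())
print("analytic  mean and variance:", ybar, S / (n * (n - 3)))
```

No integral over $\sigma^2$ is ever evaluated; averaging over the simulated $\sigma^2$ values does that work.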
Being an order of magnitude harder, the calculation of the joint distribution of the mean and variance is typically

omitted. Calculating the posterior density of the mean and the variance together is much harder than calculating either conditional density. However, provided we can break a model down into a series of simple conditional densities, we can estimate the marginal density of a parameter.

The algorithm above assumed that the distribution of $\sigma^2$ was known, and it produced a random sample from the posterior density of $\beta$. However, if the draws from the algorithm represent random draws from the marginal density of $\beta$, then we could simply reverse the logic of the argument and draw random samples from the conditional density of $\sigma^2$ given the current value of $\beta$. Given that the $\beta$'s are random draws from the marginal density of $\beta$, the resulting draws of $\sigma^2$ represent random draws from the marginal density of $\sigma^2$. Hence the following algorithm simulates the posterior distributions of $\beta$ and $\sigma^2$.

Algorithm to calculate the marginal densities of $\beta$ and $\sigma^2$:

1. Set $t = 0$ and choose starting values, $\beta^{(0)}$ and $(\sigma^2)^{(0)}$.
2. Calculate the posterior density of $\beta$ given $(\sigma^2)^{(t)}$, $f(\beta \mid Y, (\sigma^2)^{(t)})$.
3. Randomly draw $\beta^{(t+1)}$ from this distribution.
4. Calculate the posterior density of $\sigma^2$ given $\beta^{(t+1)}$, $f(\sigma^2 \mid Y, \beta^{(t+1)})$.
5. Randomly draw $(\sigma^2)^{(t+1)}$ from $f(\sigma^2 \mid Y, \beta^{(t+1)})$.
6. Let $t = t + 1$ and go to step 2.

Provided the priors are appropriately chosen, the calculations of $f(\beta \mid Y, (\sigma^2)^{(t)})$ and $f(\sigma^2 \mid Y, \beta^{(t+1)})$ are straightforward. The following code shows how simply this algorithm can be implemented in Stata. See the program OLS_MCMC.do.

5 Bayesian Updates for simple models

Suppose we assume that the likelihood function is normal and so is our prior:

Likelihood: $p(y \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}(y - \theta)^2\right)$

Normal prior: $f(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}(\theta - \mu_0)^2\right)$.

To make life as simple as possible, suppose initially that the variance of both the likelihood and the prior density is one. By Bayes' rule the posterior density is proportional to the product of the prior and the likelihood: $p(\theta \mid y) \propto p(y \mid \theta) f(\theta)$. We can show that the posterior density is also normal. Specifically,

$$p(\theta \mid y) \propto \exp\left(-\tfrac{1}{2}(y - \theta)^2\right) \exp\left(-\tfrac{1}{2}(\theta - \mu_0)^2\right) = \exp\left(-\tfrac{1}{2}\left[(y - \theta)^2 + (\theta - \mu_0)^2\right]\right).$$

We can expand the terms in the exponential and then collect them (completing the square):

$$\tfrac{1}{2}\left[(y - \theta)^2 + (\theta - \mu_0)^2\right] = \theta^2 + (-y - \mu_0)\theta + \tfrac{1}{2}y^2 + \tfrac{1}{2}\mu_0^2.$$

We only care about the terms in $\theta$, since everything else is absorbed into the normalizing constant. Matching $\theta^2 + (-y - \mu_0)\theta$ with $(\theta - b)^2 = \theta^2 - 2\theta b + b^2$ gives $b = \tfrac{1}{2}(y + \mu_0)$. Hence

$$p(\theta \mid y) \propto \exp\left(-\left(\theta - \tfrac{1}{2}(y + \mu_0)\right)^2\right),$$

so $p(\theta \mid y)$ is distributed normal with mean $\tfrac{1}{2}(y + \mu_0)$ and variance $\tfrac{1}{2}$. The normal prior is referred to as a conjugate prior since it results in a posterior density from the same class of distributions. We can now move to a more realistic example.

5.1 Simple Linear Model

Consider the OLS simple linear model, $y_i = x_i \beta + e_i$, where $e_i \sim N(0, \sigma^2)$. Using conjugate priors, a normal prior $\beta \sim N(\beta_0, B_0)$ and an inverse gamma prior for $\sigma^2$ with parameters $\upsilon_0$ and $\delta_0$, we can derive the posterior conditional densities.

The posterior $\beta$ parameter is normally distributed: $\beta \mid y, \sigma^2 \sim N(\hat{\beta}, B)$, where

$$\hat{\beta} = B\left(B_0^{-1}\beta_0 + \sigma^{-2}\sum_{i=1}^{N} x_i' y_i\right) \quad \text{and} \quad B = \left(B_0^{-1} + \sigma^{-2}\sum_{i=1}^{N} x_i' x_i\right)^{-1}.$$

The posterior $\sigma^2$ is inverse gamma distributed:

$$\sigma^2 \mid y, \beta \sim IG\left(\frac{\upsilon_0 + N}{2}, \frac{\delta_0 + SSE}{2}\right),$$

where $SSE = \sum_{i=1}^{N} (y_i - x_i \beta)^2$, i.e. the sum of squared errors.

5.2 More Complex Models

A key advantage of MCMC is that models can be built up in a simple stepwise fashion. Suppose, for example, that instead of a continuous dependent variable we have binary outcomes. Such data are typically analysed with a probit model. Specifically, $z_i = x_i \beta + e_i$ where $e_i \sim N(0, 1)$, and if $y_i = 1$ then $z_i > 0$, while if $y_i = 0$ then $z_i < 0$. The variable $z_i$ is referred to as a latent variable since we never actually observe it. The standard approach

to estimating such a model is to integrate out the latent variable and then apply maximum likelihood. A simple MCMC approach utilizes a data augmentation technique (Tanner and Wong, 198?). If we knew the values of these latent data then we could simulate the $\beta$'s just as we did in the OLS model above. Although we don't care directly about the latent data, we can simulate them. The probit model tells us the distribution of the latent data. Specifically, if $y_i = 1$ then $z_i$ has a left-truncated normal distribution with mean $x_i \beta$ and variance 1: $z_i \sim TN_{[0, +\infty]}(x_i \beta, 1)$. Similarly, if $y_i = 0$ then we know that the corresponding latent variable lies between $-\infty$ and 0: $z_i \sim TN_{[-\infty, 0]}(x_i \beta, 1)$. We now have the tools to implement this model. Let $Z$ refer to the set of latent data (i.e. all the $z_i$'s).

Algorithm to calculate the marginal density of $\beta$ in a probit model:

1. Set $t = 0$ and choose starting values, $\beta^{(0)}$ and $Z^{(0)}$.
2. Calculate the posterior density of $\beta$ given $Z^{(t)}$, $f(\beta \mid Z^{(t)})$.
3. Randomly draw $\beta^{(t+1)}$ from this distribution.
4. Calculate the posterior density of $Z$ given $\beta^{(t+1)}$, $f(Z \mid Y, \beta^{(t+1)})$.
5. Randomly draw $Z^{(t+1)}$ from $f(Z \mid Y, \beta^{(t+1)})$.
6. Let $t = t + 1$ and go to step 2.

The key simplification here is that, given $Z$, the posterior distribution of $\beta$ is independent of the observed binary dependent variable. Now, while in this context the MLE approach provides highly reliable estimates, in more complex models, such as multivariate, multinomial, or censored discrete choice models, MLE is less reliable. MCMC provides a powerful tool in these cases, being easy to program and less prone to the convergence failures of MLE.

The construction of MCMC can be done piecewise. For example, the OLS code above will estimate the probit model with two additions. First, set the variance, $\sigma^2$, equal to one. Second, add the step to draw the latent data, $Z$. This is easily achieved using the following simulation. If $z$ is a truncated normal variable with mean $x\beta$ and variance 1 with a range $p$ to $q$, then the following algorithm readily provides a method to generate a random sample

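from the distribution of $z$. If $x \sim TN_{[p,q]}(\mu, \sigma^2)$ and $u$ is a uniform random number on $[0, 1]$, then

$$x = \mu + \sigma\, \Phi^{-1}\left(\Phi\left(\frac{p - \mu}{\sigma}\right) + u\left[\Phi\left(\frac{q - \mu}{\sigma}\right) - \Phi\left(\frac{p - \mu}{\sigma}\right)\right]\right)$$

represents a random draw of $x$.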
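The piecewise construction described in Section 5.2 can be sketched in Python. This is not the paper's Stata program OLS_MCMC.do, which is not reproduced here; it is an illustrative sketch under stated assumptions. The function names, the diffuse prior settings ($\beta_0 = 0$, $B_0 = 100 I$, $\upsilon_0 = \delta_0 = 1$), the chain lengths and the simulated data are all hypothetical. The $\beta$ draw uses the standard conditional with the data terms weighted by $1/\sigma^2$. The probit sampler reuses that same $\beta$ draw with the variance fixed at one and adds the truncated normal draw from the formula above.

```python
import numpy as np
from statistics import NormalDist

_N01 = NormalDist()
Phi = np.vectorize(_N01.cdf)          # standard normal CDF, elementwise
Phi_inv = np.vectorize(_N01.inv_cdf)  # its inverse, elementwise


def draw_beta(X, y, sigma2, b0, B0_inv, rng):
    """Draw beta from N(beta_hat, B), the conditional of Section 5.1."""
    B = np.linalg.inv(B0_inv + X.T @ X / sigma2)
    beta_hat = B @ (B0_inv @ b0 + X.T @ y / sigma2)
    return rng.multivariate_normal(beta_hat, B)


def draw_sigma2(X, y, beta, nu0, delta0, rng):
    """Draw sigma^2 from inverse gamma((nu0 + N)/2, (delta0 + SSE)/2)."""
    sse = np.sum((y - X @ beta) ** 2)
    return 1.0 / rng.gamma((nu0 + len(y)) / 2, 2.0 / (delta0 + sse))


def draw_truncnorm(mu, sigma, p, q, rng):
    """Inverse-CDF draw from N(mu, sigma^2) truncated to [p, q]."""
    lo, hi = Phi((p - mu) / sigma), Phi((q - mu) / sigma)
    u = rng.random(np.shape(mu))
    prob = np.clip(lo + u * (hi - lo), 1e-12, 1 - 1e-12)  # keep inv_cdf defined
    return mu + sigma * Phi_inv(prob)


def gibbs_linear(X, y, n_iter=2000, burn=500, seed=0):
    """Alternate beta and sigma^2 draws; return stored rows (beta, sigma2)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    b0, B0_inv = np.zeros(k), np.eye(k) / 100.0  # diffuse prior (assumed)
    sigma2, keep = 1.0, []                       # starting value
    for t in range(n_iter):
        beta = draw_beta(X, y, sigma2, b0, B0_inv, rng)
        sigma2 = draw_sigma2(X, y, beta, 1.0, 1.0, rng)  # nu0 = delta0 = 1
        if t >= burn:
            keep.append(np.append(beta, sigma2))
    return np.array(keep)


def gibbs_probit(X, y, n_iter=1500, burn=500, seed=0):
    """Same beta draw with variance one, plus a latent-data draw."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    b0, B0_inv = np.zeros(k), np.eye(k) / 100.0
    p = np.where(y == 1, 0.0, -np.inf)          # z_i > 0 when y_i = 1
    q = np.where(y == 1, np.inf, 0.0)           # z_i < 0 when y_i = 0
    z, keep = np.where(y == 1, 0.5, -0.5), []   # starting latent data
    for t in range(n_iter):
        beta = draw_beta(X, z, 1.0, b0, B0_inv, rng)  # variance fixed at one
        z = draw_truncnorm(X @ beta, 1.0, p, q, rng)  # added latent-data step
        if t >= burn:
            keep.append(beta)
    return np.array(keep)


# Simulated data (hypothetical) with known coefficients.
data_rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), data_rng.normal(size=200)])
true_beta = np.array([0.5, -1.0])
y_lin = X @ true_beta + data_rng.normal(scale=2.0, size=200)
y_bin = (X @ true_beta + data_rng.normal(size=200) > 0).astype(int)

lin = gibbs_linear(X, y_lin)
pro = gibbs_probit(X, y_bin)
print("linear model posterior means (beta, sigma2):", lin.mean(axis=0))
print("probit posterior means (beta):", pro.mean(axis=0))
```

The elementwise `NormalDist` calls keep the sketch free of extra dependencies at some cost in speed. The inverse-CDF draw also loses precision when the truncation region lies far out in a tail, which the clipping only papers over.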


Andreas Svensson, Johan Dahlin and Thomas B. Schön. June 21, 2015 Exercises Summer school on foundations and advances in stochastic filtering (FASF 215) Course 1: Nonlinear system identification using sequential Monte Carlo methods Andreas Svensson, Johan Dahlin and

More information

Hidden Markov Chain Models

Hidden Markov Chain Models ECO 513 Fall 2010 C. Sims Hidden Markov Chain Models November 10, 2010 c 2010 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain

More information

Approximate Inference

Approximate Inference Approximate Inference IPAM Summer School Ruslan Salakhutdinov BCS, MIT Deprtment of Statistics, University of Toronto 1 Plan 1. Introduction/Notation. 2. Illustrative Examples. 3. Laplace Approximation.

More information

Sampling Methods, Particle Filtering, and Markov-Chain Monte Carlo

Sampling Methods, Particle Filtering, and Markov-Chain Monte Carlo Sampling Methods, Particle Filtering, and Markov-Chain Monte Carlo Vision-Based Tracking Fall 2012, CSE Dept, Penn State Univ References Recall: Bayesian Filtering Rigorous general framework for tracking.

More information

Markov Chains on Continuous State Space

Markov Chains on Continuous State Space Markov Chains on Continuous State Space 1 Markov Chains Monte Carlo 1. Consider a discrete time Markov chain {X i, i = 1, 2,...} that takes values on a continuous state space S. Examples of continuous

More information

Appendix A: Background

Appendix A: Background 12 Appendix A: Background The purpose of this Appendix is to review background material on the normal distribution and its relatives, and an outline of the basics of estimation and hypothesis testing as

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Abstract

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Abstract !!" # $&%('),.-0/12-35417698:4= 12-@?>ABAC8ED.-03GFH-I -J&KL.M>I = NPO>3Q/>= M,%('>)>A('(RS$B$&%BTU8EVXWY$&A>Z[?('\RX%Y$]W>\U8E^7_a`bVXWY$&A>Z[?('\RX)>?BT(' A Gentle Tutorial of the EM Algorithm and

More information

Bayesian Mixture Models and the Gibbs Sampler

Bayesian Mixture Models and the Gibbs Sampler Bayesian Mixture Models and the Gibbs Sampler David M. Blei Columbia University October 19, 2015 We have discussed probabilistic modeling, and have seen how the posterior distribution is the critical quantity

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo February 2012 J. Corso (SUNY at Buffalo) Parametric Techniques February 2012 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed

More information

General Random Effect Latent Variable Modeling: Random Subjects, Items, Contexts, and Parameters

General Random Effect Latent Variable Modeling: Random Subjects, Items, Contexts, and Parameters General Random Effect Latent Variable Modeling: Random Subjects, Items, Contexts, and Parameters Tihomir Asparouhov and Bengt Muthén July 18, 2012 Abstract Bayesian methodology is well-suited for estimating

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component) Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 2: The Gibbs Sampler, via motivation from Metropolis-Hastings. Statistical applications

More information

Uncertainty Quantification in Engineering Science

Uncertainty Quantification in Engineering Science Course Info Uncertainty Quantification in Engineering Science Prof. Costas Papadimitriou, University of Thessaly, costasp@uth.gr Winter Semester 2014-2015 Course Prerequisites DESIRABLE Elementary knowledge

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

Credit Risk Models: An Overview

Credit Risk Models: An Overview Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

4. Joint Distributions of Two Random Variables

4. Joint Distributions of Two Random Variables 4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 12: Laplace Approximation for Logistic Regression Exponential Families Generalized Linear Models Many

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Introduction to Bayesian Statistics with WinBUGS Part I Introdcution

Introduction to Bayesian Statistics with WinBUGS Part I Introdcution Introduction to Bayesian Statistics with WinBUGS Part I Introdcution Matthew S. Johnson Teachers College; Columbia University johnson@tc.columbia.edu New York ASA Chapter CUNY Graduate Center New York,

More information

Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23

Probability Theory. Elementary rules of probability Sum rule. Product rule. p. 23 Probability Theory Uncertainty is key concept in machine learning. Probability provides consistent framework for the quantification and manipulation of uncertainty. Probability of an event is the fraction

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012 Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

More information

5. Conditional Distributions

5. Conditional Distributions Virtual Laboratories > 2. Distributions > 1 2 3 4 5 6 7 8 5. Conditional Distributions Basic Theory As usual, we start with a random experiment with probability measure P on an underlying sample space

More information

A simple jags demo. December 14, An introduction to BUGs/WinBUGs/JAGs

A simple jags demo. December 14, An introduction to BUGs/WinBUGs/JAGs A simple jags demo December 14, 2009 1 An introduction to BUGs/WinBUGs/JAGs BUGs/WinBUGs/JAGs is a succesful general purpose McMC sampler for inference. It is very useful for inference, and particularly

More information

Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Posterior probability!

Posterior probability! Posterior probability! P(x θ): old name direct probability It gives the probability of contingent events (i.e. observed data) for a given hypothesis (i.e. a model with known parameters θ) L(θ)=P(x θ):

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

The Application of Markov Chain Monte Carlo to Infectious Diseases

The Application of Markov Chain Monte Carlo to Infectious Diseases The Application of Markov Chain Monte Carlo to Infectious Diseases Alyssa Eisenberg March 16, 2011 Abstract When analyzing infectious diseases, there are several current methods of estimating the parameters

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

Markov Chain Monte Carlo (MCMC) Sampling methods

Markov Chain Monte Carlo (MCMC) Sampling methods Markov Chain Monte Carlo (MCMC) Sampling methods Sampling Selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population A probability sample

More information

2.3 Forward probabilities and inverse probabilities

2.3 Forward probabilities and inverse probabilities 2.3: Forward probabilities and inverse probabilities 27 Bayesians also use probabilities to describe inferences. 2.3 Forward probabilities and inverse probabilities Probability calculations often fall

More information

STATISTICAL ANALYSIS OF THE ADDITIVE AND MULTIPLICATIVE HYPOTHESES OF MULTIPLE EXPOSURE SYNERGY FOR COHORT AND CASE-CONTROL STUDIES

STATISTICAL ANALYSIS OF THE ADDITIVE AND MULTIPLICATIVE HYPOTHESES OF MULTIPLE EXPOSURE SYNERGY FOR COHORT AND CASE-CONTROL STUDIES Chapter 5 STATISTICAL ANALYSIS OF THE ADDITIVE AND MULTIPLICATIVE HYPOTHESES OF MULTIPLE EXPOSURE SYNERGY FOR COHORT AND CASE-CONTROL STUDIES 51 Introduction In epidemiological studies, where there are

More information

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

3 Random vectors and multivariate normal distribution

3 Random vectors and multivariate normal distribution 3 Random vectors and multivariate normal distribution As we saw in Chapter 1, a natural way to think about repeated measurement data is as a series of random vectors, one vector corresponding to each unit.

More information

Bayesian Methods for Regression in R

Bayesian Methods for Regression in R Bayesian Methods for Regression in R Nels Johnson Lead Collaborator, Laboratory for Interdisciplinary Statistical Analysis Department of Statistics, Virginia Tech 03/13/2012 Nels Johnson (LISA) Bayesian

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Curve Fitting & Multisensory Integration

Curve Fitting & Multisensory Integration Curve Fitting & Multisensory Integration Using Probability Theory Hannah Chen, Rebecca Roseman and Adam Rule COGS 202 4.14.2014 Overheard at Porters You re optimizing for the wrong thing! What would you

More information

Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Numerical Methods for Option Pricing

Numerical Methods for Option Pricing Chapter 9 Numerical Methods for Option Pricing Equation (8.26) provides a way to evaluate option prices. For some simple options, such as the European call and put options, one can integrate (8.26) directly

More information

Generalized Linear Model Theory

Generalized Linear Model Theory Appendix B Generalized Linear Model Theory We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses. B.1

More information

Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests

Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests Hypothesis Testing 1 Introduction This document is a simple tutorial on hypothesis testing. It presents the basic concepts and definitions as well as some frequently asked questions associated with hypothesis

More information

4 Alternative Gravity Model Estimators. 4 Alternative Gravity Model Estimators

4 Alternative Gravity Model Estimators. 4 Alternative Gravity Model Estimators 4 Alternative Gravity Model Estimators 49 4 Alternative Gravity Model Estimators The previous section primarily used OLS as the estimation methodology for a variety of gravity models, both intuitive and

More information

HIDDEN MARKOV CHAIN MODELS

HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2005 HIDDEN MARKOV CHAIN MODELS 1. The class of models We consider a class of models in which we have a parametric model for the conditional distribution of the observation y t {y s, s < t}

More information

1. First, write our query in general form: 2. Next, loop iteratively to calculate main result:

1. First, write our query in general form: 2. Next, loop iteratively to calculate main result: Variable Elimination Class #17: Inference in Bayes Nets, II Artificial Intelligence (CS 452/552): M. Allen, 16 Oct. 15! Want to know the probability of some value of X, given known evidence e, and other

More information

Introduction to Machine Learning

Introduction to Machine Learning . {circular,large,light,smooth,thick}, malignant Outline Contents Introduction to Machine Learning Bayesian Classification Varun Chandola January 3, 07 Learning Probabilistic Classifiers. Treating Output

More information

On Bayesian Model Assessment and Choice Using Cross-Validation Predictive Densities: Appendix

On Bayesian Model Assessment and Choice Using Cross-Validation Predictive Densities: Appendix On Bayesian Model Assessment and Choice Using Cross-Validation Predictive Densities: Appendix Appendix of Research report B23 Aki Vehtari and Jouko Lampinen Laboratory of Computational Engineering Helsinki

More information