# Tutorial on Markov Chain Monte Carlo


## 1. Tutorial on Markov Chain Monte Carlo

Kenneth M. Hanson, Los Alamos National Laboratory

Presented at the 20th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology, Gif-sur-Yvette, France, July 8-13, 2000. Revised 14/05/08. LA-UR

## 2. Acknowledgements

- MCMC experts: Julian Besag, Jim Gubernatis, John Skilling, Malvin Kalos
- General discussions: Greg Cunningham, Richard Silver

## 3. Problem statement

- Parameter space of $n$ dimensions is represented by the vector $x$
- Given an arbitrary target probability density function (pdf) $q(x)$, draw a set of samples $\{x_k\}$ from it
- Typically the only requirement is that, given $x$, one be able to evaluate $C q(x)$, where $C$ is an unknown constant; MCMC algorithms do not require knowledge of the normalization constant of the target pdf, so from now on the multiplicative constant $C$ is left implicit
- Although the focus here is on continuous variables, MCMC can be applied to discrete variables as well

## 4. Uses of MCMC

- Permits evaluation of expectation values of functions of $x$, e.g.
  $\langle f(x) \rangle = \int f(x)\, q(x)\, dx \approx \frac{1}{K} \sum_k f(x_k)$
  - typical use: calculate the mean $\langle x \rangle$ and variance $\langle (x - \langle x \rangle)^2 \rangle$
- Also useful for evaluating integrals, such as the partition function for properly normalizing the pdf
- Dynamic display of sequences provides visualization of uncertainties in the model and the range of model variations
- Automatic marginalization: when considering any subset of the parameters of an MCMC sequence, the remaining parameters are marginalized over
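As a concrete illustration of the sample-average estimate above (not part of the slides), here is a minimal Python sketch; for simplicity the "MCMC samples" are stand-ins drawn directly from a unit-variance Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an MCMC sample set {x_k}: K draws from a unit-variance Gaussian.
samples = rng.standard_normal(10_000)

# <f(x)> = integral of f(x) q(x) dx  ~  (1/K) sum_k f(x_k)
mean_est = samples.mean()                      # estimates <x> = 0
var_est = ((samples - mean_est) ** 2).mean()   # estimates <(x - <x>)^2> = 1
```

With K = 10,000 samples, both estimates land within a few percent of the exact values 0 and 1.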

## 5. Markov Chain Monte Carlo

- Generates a sequence of random samples from an arbitrary probability density function
- Metropolis algorithm:
  - draw a trial step from a symmetric pdf, i.e. $t(\delta x) = t(-\delta x)$
  - accept or reject the trial step
  - simple and generally applicable
  - relies only on evaluation of the target pdf for any $x$
- [Figure: random walk over contours of Probability$(x_1, x_2)$, showing accepted and rejected steps]

## 6. Metropolis algorithm

- Select an initial parameter vector $x_0$
- Iterate as follows, at iteration number $k$:
  1. create a new trial position $x^* = x_k + \delta x$, where $\delta x$ is randomly chosen from $t(\delta x)$
  2. calculate the ratio $r = q(x^*)/q(x_k)$
  3. accept the trial position, i.e. set $x_{k+1} = x^*$, if $r \ge 1$, or with probability $r$ if $r < 1$; otherwise stay put, $x_{k+1} = x_k$
- Requires only computation of $q(x)$
- Creates a Markov chain, since $x_{k+1}$ depends only on $x_k$
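The three steps above can be sketched in Python (a minimal illustration, with a 2-D unit-variance Gaussian target assumed for the demo; the function name `metropolis` is mine, not the slides'). Working with $\log q$ keeps the arithmetic stable and shows that the normalization constant never enters:

```python
import numpy as np

def metropolis(log_q, x0, n_steps, width, rng):
    """Metropolis sampler with a symmetric Gaussian trial of the given width.

    log_q evaluates log q(x) up to an additive constant -- the unknown
    normalization C never enters, since only differences of log q are used.
    """
    x = np.array(x0, dtype=float)
    log_qx = log_q(x)
    chain, accepted = [x.copy()], 0
    for _ in range(n_steps):
        x_star = x + width * rng.standard_normal(x.shape)     # step (1)
        log_qs = log_q(x_star)                                # step (2): log r
        if np.log(rng.uniform()) < log_qs - log_qx:           # step (3)
            x, log_qx = x_star, log_qs
            accepted += 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_steps

rng = np.random.default_rng(1)
# Target: 2-D isotropic unit-variance Gaussian, log q(x) = -x.x/2 + const.
chain, acc = metropolis(lambda x: -0.5 * x @ x, [0.0, 0.0], 20_000, 1.0, rng)
```

Comparing `np.log(u)` against the log-ratio accepts with probability $\min(1, r)$, exactly the rule in step (3).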

## 7. Choice of trial distribution

- Loose requirements on the trial distribution $t()$: stationary, i.e. independent of position
- Often-used functions include:
  - $n$-D Gaussian, isotropic and uncorrelated
  - $n$-D Cauchy, isotropic and uncorrelated
- Choose the width to optimize MCMC efficiency; rule of thumb: aim for an acceptance fraction of about 25%

## 8. Experiments with the Metropolis algorithm

- Target distribution $q(x)$ is an $n$-dimensional Gaussian: uncorrelated, isotropic with unit variance; the most generic case
- Trial distribution $t(\delta x)$ is an $n$-dimensional Gaussian: uncorrelated, equivariate, with various widths
- [Figure: target contours with trial distributions of several widths centered at $x_k$]

## 9. MCMC sequences for 2D Gaussian

- Results of running Metropolis with ratios of trial width to target width of 0.25, 1, and 4:
  - when the trial pdf is much narrower than the target pdf, movement across the target pdf is slow
  - when the trial width matches the target width, the samples appear to cover the target pdf well
  - when the trial pdf is much wider than the target, the chain stays put for long periods, but the jumps are large
- This example is from Hanson and Cunningham (SPIE, 1998)

## 10. MCMC sequences for 2D Gaussian (continued)

- Same Metropolis runs, with trial-to-target width ratios of 0.25, 1, and 4, displayed as the accumulated 2D distribution for 1000 trials
- viewed this way, it is difficult to see the difference between the top two images
- when the trial pdf is much wider than the target, there are fewer "splats," but they are farther apart

## 11. MCMC: autocorrelation and efficiency

- In an MCMC sequence, subsequent parameter values are usually correlated
- The degree of correlation is quantified by the autocorrelation function
  $\rho(l) = \frac{1}{N} \sum_{i=1}^{N} y(i)\, y(i-l)$,
  where $y(i)$ is the (standardized) sequence and $l$ is the lag
- For a Markov chain, expect an exponential form: $\rho(l) = \exp[-l/\lambda]$
- The sampling efficiency is
  $\eta = \Big[ 1 + 2 \sum_{l=1}^{\infty} \rho(l) \Big]^{-1} \approx \frac{1}{1 + 2\lambda}$
- In other words, $\eta^{-1}$ iterates are required to achieve one statistically independent sample
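These formulas can be checked numerically. The sketch below (mine, not from the slides) estimates $\rho(l)$ and $\eta$ for an AR(1) sequence, for which $\rho(l) = a^l$ exactly and hence the efficiency is known in closed form, $\eta = (1-a)/(1+a)$:

```python
import numpy as np

def autocorr(y, max_lag):
    """Normalized autocovariance rho(l) of sequence y, for lags 0..max_lag."""
    y = y - y.mean()
    var = (y * y).mean()
    n = len(y)
    return np.array([(y[l:] * y[:n - l]).mean() / var for l in range(max_lag + 1)])

def efficiency(y, max_lag=200):
    """eta = [1 + 2 sum_{l>=1} rho(l)]^(-1); 1/eta iterates per independent sample."""
    rho = autocorr(y, max_lag)
    return 1.0 / (1.0 + 2.0 * rho[1:].sum())

rng = np.random.default_rng(2)
a, n = 0.9, 200_000
y = np.empty(n)
y[0] = 0.0
for i in range(1, n):                 # AR(1): y_i = a*y_{i-1} + noise
    y[i] = a * y[i - 1] + rng.standard_normal()
eta = efficiency(y)                   # theory: (1 - 0.9)/(1 + 0.9) ~ 0.053
```

Truncating the sum at lag 200 is harmless here because $\rho(200) = 0.9^{200}$ is negligible; for real MCMC output the truncation lag must be chosen with care.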

## 12. Autocorrelation for 2D Gaussian

- the plot confirms that the autocorrelation drops slowly when the trial width is much smaller than the target width; MCMC efficiency is then poor
- best efficiency is when the trial is about the same size as the target (for 2D)
- [Figure: normalized autocovariance $\rho(l)$ for trial widths of 0.25, 1, and 4 relative to the target]

## 13. Efficiency as a function of the width of the trial pdf

- for unit-variance Gaussians in 1 to 64 dimensions: efficiency as a function of the width of the trial distribution
- boxes are predictions of optimal efficiency from diffusion theory [A. Gelman et al., 1996]
- efficiency drops reciprocally with the number of dimensions

## 14. Efficiency as a function of acceptance fraction

- for the same Gaussians in 1 to 64 dimensions: efficiency as a function of the acceptance fraction
- best efficiency is achieved when about 25% of trials are accepted, for a moderate number of dimensions
- [Figure: efficiency vs. acceptance fraction, for 1 to 64 dimensions]

## 15. Efficiency of the Metropolis algorithm

- Results of this experimental study agree with predictions from diffusion theory (A. Gelman et al., 1996)
- The optimum width of a Gaussian trial distribution occurs at an acceptance fraction of about 25% (but this is a weak function of the number of dimensions)
- Optimal statistical efficiency: $\eta \approx 0.3/n$ for the simplest case of an uncorrelated, equivariate Gaussian; correlations and unequal variances generally decrease efficiency

## 16. Further considerations

- When the target distribution $q(x)$ is not isotropic:
  - it is difficult to accommodate with an isotropic $t(\delta x)$; each parameter can have a different efficiency
  - it is desirable to vary the widths of the components of $t(\delta x)$ to approximately match $q(x)$, which recovers the efficiency of the univariate case
- When $q(x)$ has correlations, $t(\delta x)$ should match the shape of $q(x)$
- [Figure: correlated $q(x)$ contour with a matched $t(\delta x)$]

## 17. MCMC: issues

- Identification of convergence to the target pdf:
  - is the sequence in thermodynamic equilibrium with the target pdf?
  - validity of the estimated properties of the parameters (e.g. covariance)
- Burn-in: at the beginning of a sequence, one may need to run MCMC for a while to achieve convergence to the target pdf
- Use of multiple sequences with different starting values:
  - can help confirm convergence
  - a natural choice when using computers with multiple CPUs
- Accuracy of the estimated properties of the parameters; related to efficiency, described above
- Optimization of the efficiency of MCMC

## 18. MCMC: multiple runs

- Multiple runs starting from different random number seeds confirm that the MCMC sequences have converged to the target pdf
- [Figure: first MCMC sequence; second, independent MCMC sequence]
- Examples of multiple MCMC runs from my talk on the analysis of Rossi data
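One standard numerical form of this multiple-run check (not described in the slides) is the Gelman-Rubin statistic, which compares within-chain and between-chain variance; values near 1 indicate the chains agree. A minimal sketch, with synthetic chains standing in for MCMC output:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for m equal-length 1-D chains (shape (m, n)).

    Compares within-chain variance W with between-chain variance B;
    values near 1 suggest the chains sample the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    B = n * means.var(ddof=1)                 # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
# Two chains that agree: independent draws from the same Gaussian.
good = [rng.standard_normal(5000), rng.standard_normal(5000)]
# Two chains stuck near different peaks of a multimodal target.
bad = [rng.standard_normal(5000) - 3.0, rng.standard_normal(5000) + 3.0]
```

For the agreeing chains R-hat is very close to 1; for the stuck chains the between-chain variance dominates and R-hat is far above 1, flagging the lack of convergence.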

## 19. Annealing

- Introduce a fictitious temperature $T$:
  - define the functional $\varphi(x)$ as minus the logarithm of the target probability: $\varphi(x) = -\log q(x)$
  - scale $\varphi$ by an inverse temperature to form a new pdf: $q'(x, T) = \exp[-\varphi(x)/T]$
  - $q'(x, T)$ is flatter than $q(x)$ for $T > 1$ (called annealing)
- Uses of annealing (also called tempering):
  - allows MCMC to move between multiple peaks in $q(x)$
  - the simulated annealing optimization algorithm (takes the limit $T \to 0$)
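A sketch of the construction in code (mine; the peak locations and widths are invented for illustration). Dividing the log density by $T$ flattens the valleys between peaks by the same factor:

```python
import numpy as np

def log_q(x):
    """Log of an unnormalized target with three narrow, well-separated peaks.

    The peak locations (-10, 0, 10) and width (0.1) are invented for illustration.
    """
    centers = np.array([-10.0, 0.0, 10.0])
    return np.logaddexp.reduce(-0.5 * ((x - centers) / 0.1) ** 2)

def log_q_tempered(x, T):
    """phi(x) = -log q(x);  q'(x, T) = exp[-phi(x)/T], flatter for T > 1."""
    return log_q(x) / T
```

Midway between peaks (e.g. at x = 5) the log density sits roughly 1250 units below the peak value, so a Metropolis chain at T = 1 essentially never crosses; at T = 100 the barrier shrinks by a factor of 100 and crossings become routine.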

## 20. Annealing to handle multiple peaks

- Example: the target distribution is three narrow, well-separated peaks
- For the original distribution ($T = 1$), a long MCMC run rarely moves between the peaks
- At temperature $T = 100$ (right), MCMC moves easily between the peaks and through the surrounding regions
- From M-D Wu and W. J. Fitzgerald, Maximum Entropy and Bayesian Methods (1996)
- [Figure: MCMC samples at $T = 1$ (left) and $T = 100$ (right)]

## 21. Other MCMC algorithms

- Gibbs:
  - vary only one component of $x$ at a time
  - draw the new value of $x_j$ from the conditional $q(x_j \mid x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots)$
- Metropolis-Hastings:
  - allows the use of nonsymmetric trial functions $t(\delta x;\, x_k)$, suitably chosen to improve efficiency
  - use $r = \dfrac{t(-\delta x;\, x^*)\, q(x^*)}{t(\delta x;\, x_k)\, q(x_k)}$
- Langevin technique:
  - uses the gradient* of minus-log-probability to shift the trial function toward regions of higher probability
  - accepts or rejects with the Metropolis-Hastings rule
- \* adjoint differentiation affords efficient gradient calculation
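A minimal Metropolis-Hastings sketch in Python (mine, not from the slides). The shrink-toward-the-origin trial is an invented example of a nonsymmetric proposal; because it is not symmetric, the Hastings correction in the acceptance ratio is what keeps the chain targeting $q$:

```python
import numpy as np

def metropolis_hastings(log_q, x0, n_steps, scale, rng):
    """Metropolis-Hastings in 1-D with a nonsymmetric trial distribution.

    The trial (invented for illustration) shrinks toward the origin:
    x* ~ Normal(0.9 * x_k, scale^2).  Because t is not symmetric, the
    acceptance ratio needs the Hastings correction
        r = [t(x_k | x*) q(x*)] / [t(x* | x_k) q(x_k)].
    """
    def log_t(x_new, x_old):   # log density (up to a constant) of x_old -> x_new
        return -0.5 * ((x_new - 0.9 * x_old) / scale) ** 2

    x = float(x0)
    chain = [x]
    for _ in range(n_steps):
        x_star = 0.9 * x + scale * rng.standard_normal()
        log_r = (log_q(x_star) + log_t(x, x_star)) - (log_q(x) + log_t(x_star, x))
        if np.log(rng.uniform()) < log_r:
            x = x_star
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(4)
# Target: unit-variance Gaussian; the chain should reproduce mean 0, variance 1.
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 50_000, 1.0, rng)
```

Omitting the `log_t` terms would bias the chain toward the origin; with them, the sample mean and variance recover the target's values.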

## 22. Hamiltonian hybrid algorithm

- Called "hybrid" because it alternates Gibbs and Metropolis steps
- Associate with each parameter $x_i$ a momentum $p_i$
- Define a Hamiltonian $H = \varphi(x) + \sum_i p_i^2/(2 m_i)$, where $\varphi(x) = -\log q(x)$
- New pdf: $q'(x, p) = \exp[-H(x, p)] = q(x)\, \exp\big[-\sum_i p_i^2/(2 m_i)\big]$
- The chain can easily move long distances in $(x, p)$ space at constant $H$ using Hamiltonian dynamics, so the Metropolis step is very efficient; this uses the gradient* of $\varphi$ (minus-log-prob)
- The Gibbs step in $p$ at constant $x$ is easy
- Efficiency may be better than Metropolis for large numbers of dimensions
- \* adjoint differentiation affords efficient gradient calculation
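A minimal leapfrog implementation of this scheme (a sketch, assuming unit masses $m_i = 1$; the step size and trajectory length are arbitrary choices, not values from the slides):

```python
import numpy as np

def hmc_step(log_q, grad_log_q, x, eps, n_leap, rng):
    """One hybrid step: Gibbs draw of p, leapfrog at ~constant H, Metropolis accept.

    H(x, p) = phi(x) + sum_i p_i^2 / 2, with phi = -log q and unit masses m_i = 1
    (an assumption of this sketch); eps and n_leap are tuning choices.
    """
    p = rng.standard_normal(x.shape)          # Gibbs step in p (exact draw)
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration: moves far in (x, p) while nearly conserving H.
    p_new = p_new + 0.5 * eps * grad_log_q(x_new)
    for i in range(n_leap):
        x_new = x_new + eps * p_new
        if i < n_leap - 1:
            p_new = p_new + eps * grad_log_q(x_new)
    p_new = p_new + 0.5 * eps * grad_log_q(x_new)
    # Metropolis step on the small residual change in H.
    h_old = -log_q(x) + 0.5 * p @ p
    h_new = -log_q(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < h_old - h_new else x

rng = np.random.default_rng(5)
x = np.zeros(2)          # target: 2-D unit Gaussian, so grad log q(x) = -x
samples = []
for _ in range(5000):
    x = hmc_step(lambda z: -0.5 * z @ z, lambda z: -z, x, 0.2, 10, rng)
    samples.append(x)
samples = np.array(samples)
```

Because the leapfrog trajectory nearly conserves $H$, the final Metropolis test accepts almost every proposal while the chain travels far across the target, which is the source of the method's efficiency.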

## 23. Hamiltonian hybrid algorithm (continued)

- [Figure: typical trajectories in the $(x_i, p_i)$ plane from step $k$ to $k+1$ to $k+2$: red path = Gibbs sample from the momentum distribution; green path = trajectory at constant $H$, followed by a Metropolis step]

## 24. Conclusions

- MCMC provides a good tool for exploring the posterior, and hence for drawing inferences about models and parameters
- For valid results, care must be taken to verify convergence of the sequence:
  - exclude the early part of the sequence, before convergence is reached
  - be wary of multiple peaks that need to be sampled
- For good efficiency, care must be taken to adjust the size and shape of the trial distribution; the rule of thumb is to aim for 25% trial acceptance for $5 < n < 100$
- A lot of research is happening; don't worry, be patient

## 25. Short bibliography

- "Posterior sampling with improved efficiency," K. M. Hanson and G. S. Cunningham, Proc. SPIE 3338 (1998); includes an introduction to MCMC
- *Markov Chain Monte Carlo in Practice*, W. R. Gilks et al. (Chapman and Hall, 1996); excellent general-purpose book
- "Efficient Metropolis jumping rules," A. Gelman et al., in *Bayesian Statistics 5*, J. M. Bernardo et al., eds. (Oxford Univ., 1996); diffusion theory
- "Bayesian computation and stochastic systems," J. Besag et al., Stat. Sci. 10, 3-66 (1995); MCMC applied to image analysis
- *Bayesian Learning for Neural Networks*, R. M. Neal (Springer, 1996); Hamiltonian hybrid MCMC
- "Bayesian multimodal evidence computation by adaptive tempered MCMC," M-D Wu and W. J. Fitzgerald, in *Maximum Entropy and Bayesian Methods*, K. M. Hanson and R. N. Silver, eds. (Kluwer Academic, 1996); annealing

## 26. Short bibliography (continued)

- "Inversion based on complex simulations," K. M. Hanson, in *Maximum Entropy and Bayesian Methods*, G. J. Erickson et al., eds. (Kluwer Academic, 1998); describes adjoint differentiation and its usefulness
- "Bayesian analysis in nuclear physics," four tutorials presented at LANSCE, July 25 - August 1, 2005, including more detail about MCMC and its application [ ]


### Chapter 2, part 2. Petter Mostad

Chapter 2, part 2 Petter Mostad mostad@chalmers.se Parametrical families of probability distributions How can we solve the problem of learning about the population distribution from the sample? Usual procedure:

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### Portfolio Distribution Modelling and Computation. Harry Zheng Department of Mathematics Imperial College h.zheng@imperial.ac.uk

Portfolio Distribution Modelling and Computation Harry Zheng Department of Mathematics Imperial College h.zheng@imperial.ac.uk Workshop on Fast Financial Algorithms Tanaka Business School Imperial College

### Calibration of Dynamic Traffic Assignment Models

Calibration of Dynamic Traffic Assignment Models m.hazelton@massey.ac.nz Presented at DADDY 09, Salerno Italy, 2-4 December 2009 Outline 1 Statistical Inference as Part of the Modelling Process Model Calibration

### Bayesian Image Super-Resolution

Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian

### C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}

C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu)

Paper Author (s) Chenfeng Xiong (corresponding), University of Maryland, College Park (cxiong@umd.edu) Lei Zhang, University of Maryland, College Park (lei@umd.edu) Paper Title & Number Dynamic Travel

### Intro Inverse Modelling

Intro Inverse Modelling Thomas Kaminski (http://fastopt.com) Thanks to: Simon Blessing (FastOpt), Ralf Giering (FastOpt), Nadine Gobron (JRC), Wolfgang Knorr (QUEST), Thomas Lavergne (Met.No), Bernard

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss