# bayesian inference: principles and practice


1 bayesian inference: principles and practice (july 9, 2009)

2-3 motivation

we would like models that:

- provide predictive and explanatory power
- are complex enough to describe observed phenomena
- are simple enough to generalize to future observations

claim: bayesian inference provides a systematic framework to infer such models from observed data

4 motivation

the ideas behind the bayesian interpretation of probability and bayesian inference are well established (bayes, laplace, etc., 18th century); recent advances in mathematical techniques and computational resources have enabled successful applications of these ideas to real-world problems

5 motivating example: a bayesian approach to network modularity

6 outline

1. (what we'd like to do)
    - background: joint, marginal, and conditional probabilities
    - bayes theorem: inverting conditional probabilities
    - bayesian probability: unknowns as random variables
    - bayesian inference: bayesian probability + bayes theorem
2. (what we're able to do)
    - monte carlo methods: representative samples
    - variational methods: bound optimization

7 joint, marginal, and conditional probabilities

- joint distribution p(X = x, Y = y): probability that X = x and Y = y
- conditional distribution p(X = x | Y = y): probability that X = x given Y = y
- marginal distribution p(X = x): probability that X = x (regardless of Y)

8 sum and product rules

sum rule (sum out settings of irrelevant variables):

    p(x) = Σ_{y ∈ Ω_Y} p(x, y)    (1)

product rule (the joint as the product of the conditional and the marginal):

    p(x, y) = p(x|y) p(y)    (2)
            = p(y|x) p(x)    (3)
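The sum and product rules can be checked numerically on a small discrete joint distribution. The following is a minimal python sketch; the 2×2 table and all names are my own, not from the slides:

```python
# a hypothetical 2x2 joint distribution p(x, y)
p_joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

# sum rule (1): marginalize by summing the joint over the other variable
p_x, p_y = {}, {}
for (x, y), p in p_joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# product rule (2): p(x, y) = p(x|y) p(y), so p(x|y) = p(x, y) / p(y)
def p_x_given_y(x, y):
    return p_joint[(x, y)] / p_y[y]
```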


10 inverting conditional probabilities

equate the far right- and left-hand sides of the product rule,

    p(y|x) p(x) = p(x, y) = p(x|y) p(y)    (4)

and divide to obtain bayes theorem (bayes and price 1763), which gives the probability of Y given X from the probability of X given Y:

    p(y|x) = p(x|y) p(y) / p(x)    (5)

where p(x) = Σ_{y ∈ Ω_Y} p(x|y) p(y) is the normalization constant

11-12 example: diagnoses a la bayes

- population of 10,000, of which 1% has a (rare) disease
- the test is 99% (relatively) effective, i.e.
    - given a patient is sick, 99% test positive
    - given a patient is healthy, 99% test negative

given a positive test, what is the probability that the patient is sick? (follows wiggins 2006)

13-14 example: diagnoses a la bayes

- sick population (100 ppl): 99% test positive (99 ppl), 1% test negative (1 ppl)
- healthy population (9900 ppl): 1% test positive (99 ppl), 99% test negative (9801 ppl)

99 sick patients test positive and 99 healthy patients test positive, so given a positive test there is a 50% probability that the patient is sick

15-16 example: diagnoses a la bayes

we know the probability of testing positive/negative given sick/healthy; use bayes theorem to invert this to the probability of sick/healthy given a positive/negative test:

    p(sick | test +) = p(test + | sick) p(sick) / p(test +)
                     = (99/100 × 1/100) / (198/100²)
                     = 1/2    (6)

most of the work is in calculating the denominator (normalization)
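The arithmetic in (6) is easy to reproduce. A minimal python sketch (variable names are my own):

```python
# diagnosis example: 1% prevalence, 99% sensitivity and specificity
p_sick = 0.01
p_pos_given_sick = 0.99      # sick patients testing positive
p_pos_given_healthy = 0.01   # healthy patients testing positive

# denominator p(test +) via the sum rule: most of the work is here
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1.0 - p_sick)

# bayes theorem: p(sick | test +)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
```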


18-19 interpretations of probabilities (just enough philosophy)

- frequentists: the limit of the relative frequency of events for a large number of trials
- bayesians: a measure of a state of knowledge, quantifying degrees of belief (jaynes 2003)

key difference: bayesians permit the assignment of probabilities to unknown/unobservable hypotheses (frequentists do not)

20 interpretations of probabilities (just enough philosophy)

e.g., inferring model parameters Θ from observed data D:

- frequentist approach: calculate the parameter setting that maximizes the likelihood of the data (a point estimate),

    Θ* = argmax_Θ p(D|Θ)    (7)

- bayesian approach: calculate a distribution over parameter settings given the data,

    p(Θ|D) = ?    (8)


22 bayesian inference: bayesian probability + bayes theorem

treat unknown quantities as random variables and use bayes theorem to systematically update prior knowledge in the presence of observed data:

    posterior = likelihood × prior / evidence

    p(Θ|D) = p(D|Θ) p(Θ) / p(D)    (9)

23-28 example: coin flipping

- observe independent coin flips (bernoulli trials); infer a distribution over the coin bias
- start with a prior p(Θ) over the coin bias, before observing any flips
- observe flips: HTHHHTTHHHH
- update the posterior p(Θ|D) using bayes theorem
- observe more flips: HHHHHHHHHHHHTHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHH HHHHHHTHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHH HHHHHHHHT
- update the posterior p(Θ|D) again using bayes theorem
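For a beta prior on the bias this posterior update has a closed form: a beta(a, b) prior combined with h observed heads and t tails yields a beta(a + h, b + t) posterior. A small python sketch under that assumption (the conjugate-update shortcut is standard, though the slides don't name it):

```python
# conjugate beta-bernoulli update for the coin bias Θ
def update(a, b, flips):
    h = flips.count("H")
    t = flips.count("T")
    return a + h, b + t

a, b = 1.0, 1.0                      # beta(1, 1), a uniform prior on [0, 1]
a, b = update(a, b, "HTHHHTTHHHH")   # observe the first batch of flips
posterior_mean = a / (a + b)         # E[Θ|D] = a / (a + b) for a beta posterior
```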

29-30 quantities of interest

bayesian inference maintains full posterior distributions over unknowns; many quantities of interest require expectations under these posteriors, e.g. the posterior mean and the predictive distribution:

    Θ̄ = E_{p(Θ|D)}[Θ] = ∫ dΘ Θ p(Θ|D)    (10)

    p(x|D) = E_{p(Θ|D)}[p(x|Θ, D)] = ∫ dΘ p(x|Θ, D) p(Θ|D)    (11)

often we can't compute the posterior (normalization), let alone expectations with respect to it, so we turn to approximation methods


32-34 representative samples

general approach: approximate intractable expectations via a sum over representative samples (follows mackay 2003):

    Φ = E_{p(x)}[φ(x)] = ∫ dx φ(x) p(x)    (12)

where φ(x) is an arbitrary function and p(x) is the target density; estimate

    Φ̂ = (1/R) Σ_{r=1}^R φ(x^(r))    (13)

this shifts the problem to finding good samples
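As a concrete instance of (12)-(13), the sketch below estimates Φ = E[x²] under a standard normal target, whose true value is 1 (the example is mine, not from the slides):

```python
import random

random.seed(0)
R = 200_000

# draw representative samples x^(r) directly from the target p(x) = N(0, 1)
samples = [random.gauss(0.0, 1.0) for _ in range(R)]

# estimator (13): Φ̂ = (1/R) Σ φ(x^(r)) with φ(x) = x²
phi_hat = sum(x * x for x in samples) / R
```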

35 representative samples

further complication: in general we can only evaluate the target density to within a multiplicative (normalization) constant, i.e.

    p(x) = p*(x) / Z    (14)

where p*(x) can be evaluated but Z is unknown

36 monte carlo methods

- uniform sampling
- importance sampling
- rejection sampling
- ...
- markov chain monte carlo (mcmc) methods
    - metropolis-hastings
    - gibbs sampling
    - ...

37-38 uniform sampling

- sample uniformly from the state space of all x values
- evaluate the non-normalized density p*(x^(r)) at each x^(r)
- approximate the normalization constant as

    Z_R = Σ_{r=1}^R p*(x^(r))    (15)

- estimate the expectation as

    Φ̂ = Σ_{r=1}^R φ(x^(r)) p*(x^(r)) / Z_R    (16)

requires a prohibitively large number of samples in high dimensions or with concentrated density
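A one-dimensional sketch of (15)-(16), using a uniform sampler over [-5, 5] and the unnormalized target p*(x) = exp(-x²/2) (a standard normal whose Z the sampler never uses); the self-normalized estimate of E[x²] should come out close to 1. The interval and names are my own:

```python
import math
import random

random.seed(1)
R = 200_000

xs = [random.uniform(-5.0, 5.0) for _ in range(R)]   # uniform samples
p_star = [math.exp(-x * x / 2.0) for x in xs]        # unnormalized target

z_r = sum(p_star)                                            # eq. (15)
phi_hat = sum(x * x * p for x, p in zip(xs, p_star)) / z_r   # eq. (16)
```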

39 importance sampling

modify uniform sampling by introducing a sampler density q(x) = q*(x)/Z_Q; choose q(x) simple enough that q*(x) can be sampled from, with the hope that q(x) is a reasonable approximation to p(x)

[figure: functions involved in importance sampling, from mackay (2003)]

40-41 importance sampling

adjust the estimator by weighting the importance of each sample:

    Φ̂ = Σ_{r=1}^R w_r φ(x^(r)) / Σ_{r=1}^R w_r    (17)

where

    w_r = p*(x^(r)) / q*(x^(r))    (18)

it is difficult to choose a good q*(x), as well as to estimate the reliability of the estimator
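A sketch of (17)-(18) with the same unnormalized target p*(x) = exp(-x²/2) and a wider gaussian sampler q = N(0, 2²); the weighted estimate of E[x²] should again be close to 1 (the sampler choice is mine):

```python
import math
import random

random.seed(2)
R = 100_000
sigma_q = 2.0

def p_star(x):   # unnormalized target density
    return math.exp(-x * x / 2.0)

def q_star(x):   # unnormalized sampler density, N(0, sigma_q²)
    return math.exp(-x * x / (2.0 * sigma_q ** 2))

xs = [random.gauss(0.0, sigma_q) for _ in range(R)]
ws = [p_star(x) / q_star(x) for x in xs]                     # eq. (18)
phi_hat = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)   # eq. (17)
```

Note that a sampler narrower than the target would make the weights (18) heavy-tailed, which is exactly the reliability problem the slide warns about.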

42 rejection sampling

similar to importance sampling, but the proposal density strictly bounds the target density, i.e.

    c q*(x) > p*(x)    (19)

for some known value c and all x

[figure: p*(x) bounded above by c q*(x), from mackay (2003)]

43-44 rejection sampling

- generate a sample x from q*(x)
- generate a uniformly random number u from [0, c q*(x)]
- add x to the set {x^(r)} if p*(x) > u
- estimate the expectation as

    Φ̂ = (1/R) Σ_{r=1}^R φ(x^(r))

c is often prohibitively large for a poor choice of q*(x) or in high dimensions

[figure: a sample is kept when u falls in the area under p*(x), from mackay (2003)]
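A sketch of the procedure for the same target p*(x) = exp(-x²/2), with an unnormalized laplace proposal q*(x) = exp(-|x|); since |x| - x²/2 ≤ 1/2 for all x, c = e^(1/2) satisfies (19). The proposal choice is mine:

```python
import math
import random

random.seed(3)
c = math.exp(0.5)   # c q*(x) >= p*(x) holds since |x| - x²/2 <= 1/2

def p_star(x):      # unnormalized target
    return math.exp(-x * x / 2.0)

def q_star(x):      # unnormalized laplace proposal
    return math.exp(-abs(x))

def sample_q():     # inverse-cdf sample from the normalized laplace density
    u = random.random() - 0.5
    return -math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

accepted = []
while len(accepted) < 50_000:
    x = sample_q()
    u = random.uniform(0.0, c * q_star(x))
    if p_star(x) > u:           # keep x if u falls under p*(x)
        accepted.append(x)

phi_hat = sum(x * x for x in accepted) / len(accepted)   # ≈ E[x²] = 1
```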

45 metropolis-hastings

in general, use a local proposal density q(x'; x^(t)) that depends on the current state x^(t), and construct a markov chain through state space that converges to the target density

note: the proposal density needn't closely approximate the target density

46-47 metropolis-hastings

- at time t, generate a tentative state x' from q(x'; x^(t))
- evaluate the acceptance ratio

    a = p*(x') q(x^(t); x') / [p*(x^(t)) q(x'; x^(t))]    (20)

- if a ≥ 1, accept the new state; else accept the new state with probability a
- if the new state is rejected, set x^(t+1) = x^(t)

effective for high-dimensional problems, but it is difficult to assess convergence of the markov chain (see neal 1993)
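A sketch of the algorithm for the target p*(x) = exp(-x²/2) with a symmetric gaussian random-walk proposal; for a symmetric proposal the q terms in (20) cancel and a reduces to p*(x')/p*(x^(t)). The example and step size are mine:

```python
import math
import random

random.seed(4)

def p_star(x):   # unnormalized target
    return math.exp(-x * x / 2.0)

x = 0.0
chain = []
for t in range(100_000):
    x_prime = x + random.gauss(0.0, 1.0)   # tentative state from q(x'; x^(t))
    a = p_star(x_prime) / p_star(x)        # acceptance ratio (symmetric q)
    if a >= 1.0 or random.random() < a:
        x = x_prime                        # accept the new state
    chain.append(x)                        # on rejection, x^(t+1) = x^(t)

burn_in = chain[10_000:]                   # discard early, unconverged samples
second_moment = sum(v * v for v in burn_in) / len(burn_in)   # ≈ E[x²] = 1
```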

48-49 gibbs sampling

a metropolis method where the proposal density is chosen as the conditional distribution, i.e.

    q(x_i'; x^(t)) = p(x_i' | {x_j^(t)}_{j ≠ i})    (21)

useful when the joint density factorizes, as in a sparse graphical model (see wainwright & jordan 2008)

similar difficulties to metropolis, but no concerns about adjustable parameters
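A sketch for a bivariate gaussian with unit variances and correlation ρ, where each full conditional (21) is itself gaussian: x₁|x₂ ~ N(ρx₂, 1-ρ²), and symmetrically for x₂. This is a standard textbook case; the example is mine:

```python
import math
import random

random.seed(5)
rho = 0.8
cond_sd = math.sqrt(1.0 - rho * rho)   # sd of each full conditional

x1, x2 = 0.0, 0.0
pairs = []
for t in range(100_000):
    x1 = random.gauss(rho * x2, cond_sd)   # sample from p(x1 | x2)
    x2 = random.gauss(rho * x1, cond_sd)   # sample from p(x2 | x1)
    pairs.append((x1, x2))

burn_in = pairs[10_000:]
corr_hat = sum(a * b for a, b in burn_in) / len(burn_in)   # ≈ rho
```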


51 bound optimization

general approach: replace integration with optimization; construct an auxiliary function upper-bounded by the log-evidence, then maximize the auxiliary function

[figure: the EM algorithm alternately computes a lower bound L(q, θ) on the log likelihood ln p(X|θ) for the current parameter values, then maximizes this bound to obtain the new parameter values θ_new from θ_old, from bishop (2006)]

52 variational bayes

bound the log of an expected value by the expected value of the log using jensen's inequality:

    ln p(D) = ln ∫ dΘ p(D|Θ) p(Θ)
            = ln ∫ dΘ q(Θ) [p(D|Θ) p(Θ) / q(Θ)]
            ≥ ∫ dΘ q(Θ) ln [p(D|Θ) p(Θ) / q(Θ)]

for a sufficiently simple (i.e. factorized) approximating distribution q(Θ), the right-hand side can be easily evaluated and optimized
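The direction of jensen's inequality can be checked numerically: for any positive random variable z, concavity of ln gives ln E[z] ≥ E[ln z]. A tiny sketch, with z uniform on [0.5, 2] (the choice of distribution is arbitrary and mine):

```python
import math
import random

random.seed(6)
zs = [random.uniform(0.5, 2.0) for _ in range(10_000)]   # positive z's

log_of_mean = math.log(sum(zs) / len(zs))            # ln E[z]
mean_of_log = sum(math.log(z) for z in zs) / len(zs)  # E[ln z]
jensen_gap = log_of_mean - mean_of_log               # >= 0 by jensen's inequality
```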

53-54 variational bayes

- an iterative coordinate ascent algorithm provides controlled analytic approximations to the posterior and the evidence
- the approximate posterior q(Θ) minimizes the kullback-leibler divergence to the true posterior
- the resulting deterministic algorithm is often fast and scalable

but:

- the complexity of the approximation is often limited (to, e.g., mean-field theory, assuming weak interactions between unknowns)
- the iterative algorithm requires restarts, and there are no guarantees on the quality of the approximation

55-59 example: a bayesian approach to network modularity

- nodes: authors; edges: co-authored papers
- can we infer (community) structure in the giant component?
- inferred topological communities correspond to sub-disciplines


61 references

- information theory, inference, and learning algorithms, mackay (2003)
- pattern recognition and machine learning, bishop (2006)
- bayesian data analysis, gelman et al. (2003)
- probabilistic inference using markov chain monte carlo methods, neal (1993)
- graphical models, exponential families, and variational inference, wainwright & jordan (2008)
- probability theory: the logic of science, jaynes (2003)
- what is bayes theorem..., wiggins (2006)
- bayesian inference view on cran
- variational-bayes.org
- variational bayesian inference for network modularity


### Section 5. Stan for Big Data. Bob Carpenter. Columbia University

Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model

### Parallelization Strategies for Multicore Data Analysis

Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Bayesian Phylogeny and Measures of Branch Support

Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The

### Un point de vue bayésien pour des algorithmes de bandit plus performants

Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech)

### Lecture 9: Bayesian hypothesis testing

Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian

### Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

### STAT3016 Introduction to Bayesian Data Analysis

STAT3016 Introduction to Bayesian Data Analysis Course Description The Bayesian approach to statistics assigns probability distributions to both the data and unknown parameters in the problem. This way,

### Review for Calculus Rational Functions, Logarithms & Exponentials

Definition and Domain of Rational Functions A rational function is defined as the quotient of two polynomial functions. F(x) = P(x) / Q(x) The domain of F is the set of all real numbers except those for

### Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

### PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

### Towards running complex models on big data

Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

### Gaussian Classifiers CS498

Gaussian Classifiers CS498 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary

### Variational Mean Field for Graphical Models

Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not tree-structured) All basic

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### More details on the inputs, functionality, and output can be found below.

Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing

### AALBORG UNIVERSITY. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants

AALBORG UNIVERSITY An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants by Jesper Møller, Anthony N. Pettitt, Kasper K. Berthelsen and Robert W. Reeves

### 11 TRANSFORMING DENSITY FUNCTIONS

11 TRANSFORMING DENSITY FUNCTIONS It can be expedient to use a transformation function to transform one probability density function into another. As an introduction to this topic, it is helpful to recapitulate

### Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach

Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach Rémi Bardenet Arnaud Doucet Chris Holmes Department of Statistics, University of Oxford, Oxford OX1 3TG, UK REMI.BARDENET@GMAIL.COM

### Tracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking

Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local

### Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

### Estimating the evidence for statistical models

Estimating the evidence for statistical models Nial Friel University College Dublin nial.friel@ucd.ie March, 2011 Introduction Bayesian model choice Given data y and competing models: m 1,..., m l, each

### Estimation and comparison of multiple change-point models

Journal of Econometrics 86 (1998) 221 241 Estimation and comparison of multiple change-point models Siddhartha Chib* John M. Olin School of Business, Washington University, 1 Brookings Drive, Campus Box

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### Supplement to Regional climate model assessment. using statistical upscaling and downscaling techniques

Supplement to Regional climate model assessment using statistical upscaling and downscaling techniques Veronica J. Berrocal 1,2, Peter F. Craigmile 3, Peter Guttorp 4,5 1 Department of Biostatistics, University

### Random graphs with a given degree sequence

Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

### Basic Bayesian Methods

6 Basic Bayesian Methods Mark E. Glickman and David A. van Dyk Summary In this chapter, we introduce the basics of Bayesian data analysis. The key ingredients to a Bayesian analysis are the likelihood

### Probabilistic Fitting

Probabilistic Fitting Shape Modeling April 2016 Sandro Schönborn University of Basel 1 Probabilistic Inference for Face Model Fitting Approximate Inference with Markov Chain Monte Carlo 2 Face Image Manipulation

### Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Basics of Probability

Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space

### Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and

### Optimising and Adapting the Metropolis Algorithm

Optimising and Adapting the Metropolis Algorithm by Jeffrey S. Rosenthal 1 (February 2013; revised March 2013 and May 2013 and June 2013) 1 Introduction Many modern scientific questions involve high dimensional

### A Bayesian Antidote Against Strategy Sprawl

A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)

### Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### Bayesian probability theory

Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

### 11. Time series and dynamic linear models

11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

### Imperfect Debugging in Software Reliability

Imperfect Debugging in Software Reliability Tevfik Aktekin and Toros Caglar University of New Hampshire Peter T. Paul College of Business and Economics Department of Decision Sciences and United Health

### Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 6 Sequential Monte Carlo methods II February 3, 2012 Changes in HA1 Problem

### BAYESIAN ANALYSIS OF AN AGGREGATE CLAIM MODEL USING VARIOUS LOSS DISTRIBUTIONS. by Claire Dudley

BAYESIAN ANALYSIS OF AN AGGREGATE CLAIM MODEL USING VARIOUS LOSS DISTRIBUTIONS by Claire Dudley A dissertation submitted for the award of the degree of Master of Science in Actuarial Management School