bayesian inference: principles and practice


1 bayesian inference: principles and practice. jake hofman. july 9, 2009
2-3 motivation
would like models that:
- provide predictive and explanatory power
- are complex enough to describe observed phenomena
- are simple enough to generalize to future observations
claim: bayesian inference provides a systematic framework to infer such models from observed data
4 motivation
the ideas behind the bayesian interpretation of probability and bayesian inference are well established (bayes, laplace, etc., 18th century); recent advances in mathematical techniques and computational resources have enabled successful applications of these ideas to real-world problems
5 motivation: a bayesian approach to network modularity
6 outline
1 (what we'd like to do)
- background: joint, marginal, and conditional probabilities
- bayes theorem: inverting conditional probabilities
- bayesian probability: unknowns as random variables
- bayesian inference: bayesian probability + bayes theorem
2 (what we're able to do)
- monte carlo methods: representative samples
- variational bayes: bound optimization
7 joint, marginal, and conditional probabilities
joint distribution p_{XY}(X = x, Y = y): probability that X = x and Y = y
conditional distribution p_{X|Y}(X = x | Y = y): probability that X = x given Y = y
marginal distribution p_X(X = x): probability that X = x (regardless of Y)
8 sum and product rules
sum rule (sum out settings of irrelevant variables):
p(x) = \sum_{y \in \Omega_Y} p(x, y)   (1)
product rule (the joint as the product of the conditional and marginal):
p(x, y) = p(x | y) p(y)   (2)
        = p(y | x) p(x)   (3)
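the sum and product rules above can be sketched on a small discrete joint distribution; the table values below are illustrative assumptions, not from the slides.

```python
# toy joint distribution p(x, y) over x in {0, 1}, y in {0, 1, 2};
# the values are made up for illustration and sum to 1
p_xy = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

# sum rule, eq. (1): marginalize out the irrelevant variable
def marginal_x(x):
    return sum(p for (xi, yi), p in p_xy.items() if xi == x)

def marginal_y(y):
    return sum(p for (xi, yi), p in p_xy.items() if yi == y)

# conditional via the product rule, eq. (2): p(x | y) = p(x, y) / p(y)
def conditional_x_given_y(x, y):
    return p_xy[(x, y)] / marginal_y(y)

# check the product rule on one entry: p(x | y) p(y) = p(x, y)
assert abs(conditional_x_given_y(1, 2) * marginal_y(2) - p_xy[(1, 2)]) < 1e-12
print(marginal_x(0))  # approximately 0.4
```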
10 inverting conditional probabilities
equate the far right- and left-hand sides of the product rule,
p(y | x) p(x) = p(x, y) = p(x | y) p(y)   (4)
and divide:
bayes theorem (bayes and price 1763): the probability of Y given X from the probability of X given Y,
p(y | x) = p(x | y) p(y) / p(x)   (5)
where p(x) = \sum_{y \in \Omega_Y} p(x | y) p(y) is the normalization constant
11-12 example: diagnoses a la bayes
population of 10,000; 1% has a (rare) disease
test is 99% (relatively) effective, i.e.:
- given a patient is sick, 99% test positive
- given a patient is healthy, 99% test negative
given a positive test, what is the probability the patient is sick? [1]

[1] follows wiggins (2006)
13-14 example: diagnoses a la bayes
sick population (100 people): 99% test positive (99 people), 1% test negative (1 person)
healthy population (9900 people): 1% test positive (99 people), 99% test negative (9801 people)
99 sick patients test positive, 99 healthy patients test positive
given a positive test, 50% probability that the patient is sick
15-16 example: diagnoses a la bayes
we know the probability of testing positive/negative given sick/healthy; use bayes theorem to invert to the probability of sick/healthy given a positive/negative test:
p(sick | test+) = p(test+ | sick) p(sick) / p(test+) = (99/100)(1/100) / (198/100^2) = 1/2   (6)
most of the work is in calculating the denominator (normalization)
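the inversion in eq. (6) is a few lines of arithmetic; this sketch just restates the slide's numbers (1% prevalence, 99% sensitivity, 99% specificity) in code.

```python
# bayes theorem for the diagnosis example on the slides
p_sick = 0.01                 # prevalence: 1% of the population is sick
p_pos_given_sick = 0.99       # sensitivity: 99% of sick patients test positive
p_pos_given_healthy = 0.01    # 1 - specificity: 1% of healthy patients test positive

# normalization constant p(test+) via the sum rule
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# bayes theorem, eq. (5): p(sick | test+)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # 0.5
```

note that almost all the effort is in the denominator p(test+), exactly as the slide says.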
18-19 interpretations of probabilities (just enough philosophy)
frequentists: limit of relative frequency of events for a large number of trials
bayesians: measure of a state of knowledge, quantifying degrees of belief (jaynes 2003)
key difference: bayesians permit assignment of probabilities to unknown/unobservable hypotheses (frequentists do not)
20 interpretations of probabilities (just enough philosophy)
e.g., inferring model parameters Θ from observed data D:
frequentist approach: calculate the parameter setting that maximizes the likelihood of the data (point estimate),
Θ* = argmax_Θ p(D | Θ)   (7)
bayesian approach: calculate the distribution over parameter settings given the data,
p(Θ | D) = ?   (8)
22 bayesian inference
bayesian probability + bayes theorem: treat unknown quantities as random variables and use bayes theorem to systematically update prior knowledge in the presence of observed data,
posterior = likelihood × prior / evidence, i.e.
p(Θ | D) = p(D | Θ) p(Θ) / p(D)   (9)
23-28 example: coin flipping
observe independent coin flips (bernoulli trials); infer a distribution over the coin bias
- start with a prior p(Θ) over the coin bias before observing flips
- observe flips: HTHHHTTHHHH
- update the posterior p(Θ | D) using bayes theorem
- observe more flips: HHHHHHHHHHHHTHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHH HHHHHHTHHHHHHHHHHHHHHHH HHHHHHHHHHHHHHHHHHHHHHH HHHHHHHHT
- update the posterior p(Θ | D) again
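the posterior update for the first batch of flips can be sketched with a conjugate beta prior; the slides don't specify the prior, so the uniform Beta(1, 1) below is an assumption.

```python
# conjugate bayesian update for coin flips (bernoulli likelihood);
# a Beta(1, 1) (uniform) prior over the bias theta is assumed here
flips = "HTHHHTTHHHH"   # observed data from the slides
heads = flips.count("H")
tails = flips.count("T")

a0, b0 = 1.0, 1.0              # beta prior hyperparameters (assumption)
a, b = a0 + heads, b0 + tails  # beta posterior p(theta | D) = Beta(a, b)

posterior_mean = a / (a + b)
print(heads, tails, posterior_mean)  # 8 3 and a mean near 0.69
```

with more flips (the second, heads-heavy batch) the same two-line update applies and the posterior concentrates near 1.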
29-30 quantities of interest
bayesian inference maintains full posterior distributions over unknowns; many quantities of interest require expectations under these posteriors, e.g. the posterior mean and the predictive distribution:
Θ* = E_{p(Θ|D)}[Θ] = \int dΘ Θ p(Θ | D)   (10)
p(x | D) = E_{p(Θ|D)}[p(x | Θ, D)] = \int dΘ p(x | Θ, D) p(Θ | D)   (11)
often we can't compute the posterior (normalization), let alone expectations with respect to it, so we turn to approximation methods
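in one dimension, eqs. (10)-(11) can be evaluated by brute-force numerical integration on a grid; the sketch below uses the Beta(9, 4) posterior implied by 8 heads / 3 tails under a uniform prior (the prior is an assumption, not stated on the slides).

```python
# posterior mean, eq. (10), and predictive distribution, eq. (11),
# by midpoint-rule integration over theta in (0, 1)
N = 10_000
d = 1.0 / N
grid = [(i + 0.5) * d for i in range(N)]

post = [t**8 * (1 - t)**3 for t in grid]   # unnormalized posterior p*(theta | D)
Z = sum(post) * d                          # evidence (normalization constant)
post = [p / Z for p in post]               # normalized posterior p(theta | D)

# eq. (10): posterior mean of the coin bias
posterior_mean = sum(t * p for t, p in zip(grid, post)) * d
# eq. (11): predictive probability that the next flip is tails
pred_tails = sum((1 - t) * p for t, p in zip(grid, post)) * d
print(posterior_mean, pred_tails)
```

this is exactly the kind of integral that becomes intractable in high dimensions, which motivates the sampling and variational methods that follow.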
32-34 representative samples
general approach: approximate intractable expectations via a sum over representative samples [2]
Φ = E_{p(x)}[φ(x)] = \int dx φ(x) p(x)   (12)
with φ(x) an arbitrary function and p(x) the target density;
Φ̂ = (1/R) \sum_{r=1}^{R} φ(x^{(r)})   (13)
this shifts the problem to finding good samples

[2] follows mackay (2003), including stolen images
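eqs. (12)-(13) can be sketched on a case where the answer is known; the choices φ(x) = x² and x ~ Uniform(0, 1), for which Φ = 1/3, are illustrative.

```python
import random

# monte carlo estimate, eq. (13): average phi over samples from the target
random.seed(0)
R = 100_000
samples = [random.random() for _ in range(R)]   # x ~ Uniform(0, 1)
phi_hat = sum(x * x for x in samples) / R       # estimates E[x^2] = 1/3
print(phi_hat)  # close to 1/3
```

here sampling from p(x) is trivial; the following slides deal with the realistic case where it is not.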
35 representative samples
further complication: in general we can only evaluate the target density to within a multiplicative (normalization) constant, i.e.
p(x) = p*(x) / Z   (14)
where p*(x^{(r)}) can be evaluated but Z is unknown
36 monte carlo methods
- uniform sampling
- importance sampling
- rejection sampling
- ...
markov chain monte carlo (mcmc) methods:
- metropolis-hastings
- gibbs sampling
- ...
37-38 uniform sampling
sample uniformly from the state space of all x values; evaluate the unnormalized density p*(x^{(r)}) at each x^{(r)}
approximate the normalization constant as
Ẑ_R = \sum_{r=1}^{R} p*(x^{(r)})   (15)
and estimate the expectation as
Φ̂ = \sum_{r=1}^{R} φ(x^{(r)}) p*(x^{(r)}) / Ẑ_R   (16)
requires a prohibitively large number of samples in high dimensions with concentrated density
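a sketch of eqs. (15)-(16): the unnormalized target p*(x) = exp(-x²/2) restricted to [-5, 5] and the test function φ(x) = x² (whose expectation is about 1) are illustrative choices.

```python
import math
import random

# uniform sampling: draw x uniformly, weight by the unnormalized target
random.seed(1)
R = 200_000
xs = [random.uniform(-5.0, 5.0) for _ in range(R)]
p_star = [math.exp(-x * x / 2) for x in xs]   # unnormalized target density

Z_R = sum(p_star)                                             # eq. (15)
phi_hat = sum(x * x * w for x, w in zip(xs, p_star)) / Z_R    # eq. (16)
print(phi_hat)  # close to 1 (the variance of a standard normal)
```

this works in one dimension because the uniform draws land in the region where p* is large; in high dimensions almost none of them would.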
39 importance sampling
modify uniform sampling by introducing a sampler density q(x) = q*(x) / Z_Q from which we can sample
choose q(x) simple enough that it can be sampled from, with the hope that q(x) is a reasonable approximation to p(x)
[figure from mackay (2003): the functions p*(x), q*(x), and φ(x) involved in importance sampling]
40-41 importance sampling
adjust the estimator by weighting the importance of each sample:
Φ̂ = \sum_{r=1}^{R} w_r φ(x^{(r)}) / \sum_{r=1}^{R} w_r   (17)
where
w_r = p*(x^{(r)}) / q*(x^{(r)})   (18)
difficult to choose a good q*(x), as well as to estimate the reliability of the estimator
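the self-normalized estimator of eqs. (17)-(18) can be sketched with an unnormalized standard-normal target and a wider gaussian sampler; both densities and φ(x) = x² are illustrative choices.

```python
import math
import random

# importance sampling: draw from q, reweight by w_r = p*(x) / q*(x)
random.seed(2)
R = 100_000
sigma_q = 2.0
xs = [random.gauss(0.0, sigma_q) for _ in range(R)]   # samples from q

def p_star(x):
    # unnormalized target: standard normal up to a constant
    return math.exp(-x * x / 2)

def q_star(x):
    # unnormalized sampler density; its normalization cancels in eq. (17)
    return math.exp(-x * x / (2 * sigma_q**2))

ws = [p_star(x) / q_star(x) for x in xs]                        # eq. (18)
phi_hat = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)      # eq. (17)
print(phi_hat)  # close to 1
```

note that neither Z nor Z_Q is ever needed; only the ratio of unnormalized densities enters the weights.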
42 rejection sampling
similar to importance sampling, but the proposal density strictly bounds the target density, i.e.
c q*(x) > p*(x)   (19)
for some known value c and all x
[figure from mackay (2003): the target p*(x) under the envelope c q*(x)]
43-44 rejection sampling
- generate a sample x from q*(x)
- generate a uniformly random number u from [0, c q*(x)]
- add x to the set {x^{(r)}} if p*(x) > u
estimate the expectation as
Φ̂ = (1/R) \sum_{r=1}^{R} φ(x^{(r)})
c is often prohibitively large for a poor choice of q*(x) or in high dimensions
[figure from mackay (2003): a point (x, u) drawn under c q*(x) is accepted if it falls below p*(x)]
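the three steps above can be sketched with an unnormalized standard-normal target on [-5, 5] and a flat proposal q*(x) = 1 with c = 1, so that c q*(x) = 1 ≥ p*(x) everywhere; these choices are illustrative.

```python
import math
import random

# rejection sampling under the envelope c * q*(x) = 1, eq. (19)
random.seed(3)

def p_star(x):
    # unnormalized target: standard normal up to a constant
    return math.exp(-x * x / 2)

accepted = []
while len(accepted) < 20_000:
    x = random.uniform(-5.0, 5.0)   # sample from q*(x) (uniform)
    u = random.uniform(0.0, 1.0)    # uniform on [0, c * q*(x)]
    if p_star(x) > u:               # accept if the point falls under p*(x)
        accepted.append(x)

# accepted samples are draws from p(x); estimate E[x^2] as in eq. (13)
phi_hat = sum(x * x for x in accepted) / len(accepted)
print(phi_hat)  # close to 1
```

the acceptance rate here is about 25%, which is fine in one dimension but collapses exponentially as the dimension grows, as the slide warns.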
45 metropolis-hastings
in general, use a local proposal density q(x'; x^{(t)}) depending on the current state x^{(t)}
construct a markov chain through state space, converging to the target density
note: the proposal density needn't closely approximate the target
46-47 metropolis-hastings
at time t, generate a tentative state x' from q(x'; x^{(t)})
evaluate
a = p*(x') q(x^{(t)}; x') / [ p*(x^{(t)}) q(x'; x^{(t)}) ]   (20)
if a ≥ 1, accept the new state; else accept the new state with probability a
if the new state is rejected, set x^{(t+1)} = x^{(t)}
effective for high-dimensional problems, but difficult to assess convergence of the markov chain [3]

[3] see neal (1993)
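the accept/reject rule above can be sketched with a symmetric gaussian random-walk proposal q(x'; x) = N(x, step²), for which the q ratio in eq. (20) cancels; the unnormalized standard-normal target and step size are illustrative choices.

```python
import math
import random

# metropolis-hastings with a symmetric random-walk proposal
random.seed(4)

def p_star(x):
    # unnormalized target: standard normal up to a constant
    return math.exp(-x * x / 2)

step = 1.0
x = 0.0
chain = []
for t in range(50_000):
    x_new = x + random.gauss(0.0, step)   # tentative state from q(x'; x)
    a = p_star(x_new) / p_star(x)         # eq. (20); q terms cancel (symmetric)
    if a >= 1 or random.random() < a:
        x = x_new                          # accept the new state
    chain.append(x)                        # on rejection, x^{(t+1)} = x^{(t)}

burned = chain[5_000:]                     # discard burn-in samples
var_hat = sum(xi * xi for xi in burned) / len(burned)
print(var_hat)  # close to 1
```

the burn-in discard is a crude stand-in for the convergence assessment the slide flags as difficult.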
48-49 gibbs sampling
a metropolis method where the proposal density is chosen as the conditional distribution, i.e.
q(x_i'; x^{(t)}) = p(x_i' | {x_j^{(t)}}_{j≠i})   (21)
useful when the joint density factorizes, as in a sparse graphical model [4]
similar difficulties to metropolis, but no concerns about adjustable parameters

[4] see wainwright & jordan (2008)
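eq. (21) can be sketched on a bivariate gaussian with correlation rho, whose conditionals are themselves gaussian, so every draw is accepted; the bivariate-normal target and rho = 0.8 are illustrative choices.

```python
import math
import random

# gibbs sampling for a standard bivariate normal with correlation rho:
# the exact conditionals are p(x1 | x2) = N(rho * x2, 1 - rho^2) and
# symmetrically for x2, so each update samples eq. (21) directly
random.seed(5)
rho = 0.8
cond_sd = math.sqrt(1 - rho * rho)

x1, x2 = 0.0, 0.0
samples = []
for t in range(50_000):
    x1 = random.gauss(rho * x2, cond_sd)   # draw from p(x1 | x2)
    x2 = random.gauss(rho * x1, cond_sd)   # draw from p(x2 | x1)
    samples.append((x1, x2))

burned = samples[5_000:]                   # discard burn-in sweeps
corr_hat = sum(a * b for a, b in burned) / len(burned)
print(corr_hat)  # close to rho = 0.8
```

note there is no step size or other adjustable parameter, matching the slide's remark.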
51 bound optimization
general approach: replace integration with optimization
construct an auxiliary function upper-bounded by the log-evidence, and maximize the auxiliary function [5]
[figure from bishop (2006): the em algorithm alternately computes a lower bound L(q, θ) on the log likelihood ln p(X | θ) for the current parameter values, then maximizes this bound to obtain the new parameter values θ_new from θ_old]

[5] image from bishop (2006)
52 variational bayes
bound the log of an expected value by the expected value of the log using jensen's inequality [6]:
ln p(D) = ln \int dΘ p(D | Θ) p(Θ)
        = ln \int dΘ q(Θ) [ p(D | Θ) p(Θ) / q(Θ) ]
        ≥ \int dΘ q(Θ) ln [ p(D | Θ) p(Θ) / q(Θ) ]
for a sufficiently simple (i.e. factorized) approximating distribution q(Θ), the right-hand side can be easily evaluated and optimized

[6] image from feynman (1972)
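jensen's bound can be checked numerically on the coin model (8 heads / 3 tails, uniform prior), where ln p(D) is known in closed form via the beta function; the trial q distributions below are illustrative assumptions.

```python
import math

# numerical check of the jensen bound: for any q(theta), the lower bound
#   E_q[ ln p(D | theta) p(theta) / q(theta) ]
# is <= ln p(D), with equality when q is the true posterior Beta(9, 4)
heads, tails = 8, 3

def log_beta(a, b):
    # ln B(a, b), the beta function, via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

log_evidence = log_beta(heads + 1, tails + 1)   # ln p(D) under a uniform prior

def elbo(a, b, n=20_000):
    """jensen lower bound on ln p(D) for q(theta) = Beta(a, b), midpoint grid."""
    total, d = 0.0, 1.0 / n
    for i in range(n):
        t = (i + 0.5) * d
        log_q = (a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta(a, b)
        log_joint = heads * math.log(t) + tails * math.log(1 - t)  # ln p(D|t)p(t)
        total += math.exp(log_q) * (log_joint - log_q) * d
    return total

print(elbo(5, 5), elbo(9, 4), log_evidence)
```

the gap between the bound and ln p(D) is the kl divergence from q to the true posterior, which is why the q of the next slide is chosen to minimize it.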
53-54 variational bayes
an iterative coordinate ascent algorithm provides controlled analytic approximations to the posterior and evidence
the approximate posterior q(Θ) minimizes the kullback-leibler divergence to the true posterior
the resulting deterministic algorithm is often fast and scalable
the complexity of the approximation is often limited (to, e.g., mean-field theory, assuming weak interaction between unknowns)
the iterative algorithm requires restarts, and there are no guarantees on the quality of the approximation
55-59 example: a bayesian approach to network modularity
nodes: authors; edges: coauthored papers
can we infer (community) structure in the giant component?
inferred topological communities correspond to subdisciplines
61 references
- information theory, inference, and learning algorithms, mackay (2003)
- pattern recognition and machine learning, bishop (2006)
- bayesian data analysis, gelman et al. (2003)
- probabilistic inference using markov chain monte carlo methods, neal (1993)
- graphical models, exponential families, and variational inference, wainwright & jordan (2008)
- probability theory: the logic of science, jaynes (2003)
- what is bayes theorem..., wiggins (2006)
- bayesian inference view on cran
- variationalbayes.org
- variational bayesian inference for network modularity
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationProbability and Statistics
CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b  0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute  Systems and Modeling GIGA  Bioinformatics ULg kristel.vansteen@ulg.ac.be
More informationSection 5. Stan for Big Data. Bob Carpenter. Columbia University
Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model
More informationParallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis WeiChen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
More informationMessagepassing sequential detection of multiple change points in networks
Messagepassing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationBayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationBayesian Phylogeny and Measures of Branch Support
Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The
More informationUn point de vue bayésien pour des algorithmes de bandit plus performants
Un point de vue bayésien pour des algorithmes de bandit plus performants Emilie Kaufmann, Telecom ParisTech Rencontre des Jeunes Statisticiens, Aussois, 28 août 2013 Emilie Kaufmann (Telecom ParisTech)
More informationLecture 9: Bayesian hypothesis testing
Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian
More informationComparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the pvalue and a posterior
More informationSTAT3016 Introduction to Bayesian Data Analysis
STAT3016 Introduction to Bayesian Data Analysis Course Description The Bayesian approach to statistics assigns probability distributions to both the data and unknown parameters in the problem. This way,
More informationReview for Calculus Rational Functions, Logarithms & Exponentials
Definition and Domain of Rational Functions A rational function is defined as the quotient of two polynomial functions. F(x) = P(x) / Q(x) The domain of F is the set of all real numbers except those for
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special DistributionsVI Today, I am going to introduce
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationGaussian Classifiers CS498
Gaussian Classifiers CS498 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary
More informationVariational Mean Field for Graphical Models
Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not treestructured) All basic
More informationChristfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
More informationMore details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a twoarmed trial comparing
More informationAALBORG UNIVERSITY. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants
AALBORG UNIVERSITY An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants by Jesper Møller, Anthony N. Pettitt, Kasper K. Berthelsen and Robert W. Reeves
More information11 TRANSFORMING DENSITY FUNCTIONS
11 TRANSFORMING DENSITY FUNCTIONS It can be expedient to use a transformation function to transform one probability density function into another. As an introduction to this topic, it is helpful to recapitulate
More informationTowards scaling up Markov chain Monte Carlo: an adaptive subsampling approach
Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach Rémi Bardenet Arnaud Doucet Chris Holmes Department of Statistics, University of Oxford, Oxford OX1 3TG, UK REMI.BARDENET@GMAIL.COM
More informationTracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking
Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationTutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab
Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational
More informationBayesX  Software for Bayesian Inference in Structured Additive Regression
BayesX  Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, LudwigMaximiliansUniversity Munich
More informationEstimating the evidence for statistical models
Estimating the evidence for statistical models Nial Friel University College Dublin nial.friel@ucd.ie March, 2011 Introduction Bayesian model choice Given data y and competing models: m 1,..., m l, each
More informationEstimation and comparison of multiple changepoint models
Journal of Econometrics 86 (1998) 221 241 Estimation and comparison of multiple changepoint models Siddhartha Chib* John M. Olin School of Business, Washington University, 1 Brookings Drive, Campus Box
More informationP (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationSupplement to Regional climate model assessment. using statistical upscaling and downscaling techniques
Supplement to Regional climate model assessment using statistical upscaling and downscaling techniques Veronica J. Berrocal 1,2, Peter F. Craigmile 3, Peter Guttorp 4,5 1 Department of Biostatistics, University
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationBasic Bayesian Methods
6 Basic Bayesian Methods Mark E. Glickman and David A. van Dyk Summary In this chapter, we introduce the basics of Bayesian data analysis. The key ingredients to a Bayesian analysis are the likelihood
More informationProbabilistic Fitting
Probabilistic Fitting Shape Modeling April 2016 Sandro Schönborn University of Basel 1 Probabilistic Inference for Face Model Fitting Approximate Inference with Markov Chain Monte Carlo 2 Face Image Manipulation
More informationProbability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationBayesian networks  Timeseries models  Apache Spark & Scala
Bayesian networks  Timeseries models  Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup  November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationBasics of Probability
Basics of Probability 1 Sample spaces, events and probabilities Begin with a set Ω the sample space e.g., 6 possible rolls of a die. ω Ω is a sample point/possible world/atomic event A probability space
More informationIncorporating cost in Bayesian Variable Selection, with application to costeffective measurement of quality of health care.
Incorporating cost in Bayesian Variable Selection, with application to costeffective measurement of quality of health care University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and
More informationOptimising and Adapting the Metropolis Algorithm
Optimising and Adapting the Metropolis Algorithm by Jeffrey S. Rosenthal 1 (February 2013; revised March 2013 and May 2013 and June 2013) 1 Introduction Many modern scientific questions involve high dimensional
More informationA Bayesian Antidote Against Strategy Sprawl
A Bayesian Antidote Against Strategy Sprawl Benjamin Scheibehenne (benjamin.scheibehenne@unibas.ch) University of Basel, Missionsstrasse 62a 4055 Basel, Switzerland & Jörg Rieskamp (joerg.rieskamp@unibas.ch)
More informationParametric Models Part I: Maximum Likelihood and Bayesian Density Estimation
Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationBayesian probability theory
Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations
More information11. Time series and dynamic linear models
11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd
More informationImperfect Debugging in Software Reliability
Imperfect Debugging in Software Reliability Tevfik Aktekin and Toros Caglar University of New Hampshire Peter T. Paul College of Business and Economics Department of Decision Sciences and United Health
More informationMonte Carlobased statistical methods (MASM11/FMS091)
Monte Carlobased statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 6 Sequential Monte Carlo methods II February 3, 2012 Changes in HA1 Problem
More informationBAYESIAN ANALYSIS OF AN AGGREGATE CLAIM MODEL USING VARIOUS LOSS DISTRIBUTIONS. by Claire Dudley
BAYESIAN ANALYSIS OF AN AGGREGATE CLAIM MODEL USING VARIOUS LOSS DISTRIBUTIONS by Claire Dudley A dissertation submitted for the award of the degree of Master of Science in Actuarial Management School
More informationVariational Bayesian Inference for Big Data Marketing Models 1
Variational Bayesian Inference for Big Data Marketing Models 1 Asim Ansari Yang Li Jonathan Z. Zhang 2 December 2014 1 This is a preliminary version. Please do not cite or circulate. 2 Asim Ansari is the
More informationLecture 11: Graphical Models for Inference
Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference  the Bayesian network and the Join tree. These two both represent the same joint probability
More information