Inference of Probability Distributions for Trust and Security applications


1 Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi

2 Outline
Motivations
Bayesian vs Frequentist approach
A class of functions to estimate the distribution
Measuring the precision of an estimation function

7 Motivations
Inferring the probability distribution of a random variable. Examples of applications in Trust & Security:
how much we can trust an individual or a set of individuals
the input distribution in a noisy channel, needed to compute the Bayes risk
application of the Bayesian approach to hypothesis testing (anonymity, information flow) ...

8 Setting and assumptions
For simplicity we consider only binary random variables X = {succ, fail} (honest/dishonest, secure/insecure, ...).
Goal: infer (an approximation of) the probability of success Pr(succ) = θ.
Means: a sequence of n trials.
Observation (evidence): s = #succ, f = #fail, with s + f = n.

9 Using the evidence to infer θ
The frequentist method: F(n, s) = s/n.
The Bayesian method: assume an a priori probability distribution for θ (representing your partial knowledge about θ, whatever the source may be) and combine it with the evidence, using Bayes' theorem, to obtain the a posteriori distribution.

10 Bayesian vs Frequentist
Criticisms of the frequentist approach:
Limited applicability: sometimes it is not possible to measure the frequencies (in this talk we consider the case in which it is possible). E.g.: what is the probability that my submitted paper will be accepted?
Misleading evidence: for small samples (small n) we can be unlucky, i.e. get unlikely results. This is less dramatic for the Bayesian approach, because the a priori distribution reduces the effect of misleading evidence, provided it is close enough to the real distribution.

11 Bayesian vs Frequentist
Criticisms of the Bayesian approach:
We need to assume an a priori probability distribution; as we usually do not know the real distribution, the assumption can be somewhat arbitrary and differ significantly from reality.
Observe that the two approaches give the same result as n tends to infinity: the true distribution.
Frequentist approach: because of the law of large numbers.
Bayesian approach: because the a priori washes out for large values of n.

13 Bayesian vs Frequentist
The surprising thing is that the frequentist approach can be worse than the Bayesian approach even when the trials give a good result, or when we consider the average difference (from the true θ) w.r.t. all possible results.
Example: true θ = 1/2, n = 1.
F(n, s) = s/n, i.e. 0 for s = 0 and 1 for s = 1. The difference from the true distribution is 1/2.
[Plot: Pr(s) against θ, with the estimates 0 and 1 at distance 1/2 from θ = 1/2]
A better function would be F_c(n, s) = (s+1)/(n+2), i.e. 1/3 for s = 0 and 2/3 for s = 1. The difference from the true distribution is 1/6.

15 Bayesian vs Frequentist
Example: true θ = 1/2, n = 2.
F(n, s) = s/n, i.e. 0 for s = 0, 1/2 for s = 1, and 1 for s = 2. The average distance from the true distribution is 1/4.
[Plot: Pr(s) against θ for n = 2]
Again, a better function would be F_c(n, s) = (s+1)/(n+2), i.e. 1/4 for s = 0, 1/2 for s = 1, and 3/4 for s = 2. The average distance from the true distribution is 1/8.
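As a sanity check (a minimal Python sketch of our own, not part of the talk), the averages 1/4 and 1/8 above can be reproduced exactly with rational arithmetic:

```python
from fractions import Fraction
from math import comb

def avg_distance(estimator, n, theta):
    """Average |estimate - theta| over all outcomes s, weighted by the
    binomial likelihood Pr(s | theta) = C(n, s) theta^s (1-theta)^(n-s)."""
    total = Fraction(0)
    for s in range(n + 1):
        pr_s = comb(n, s) * theta**s * (1 - theta)**(n - s)
        total += pr_s * abs(estimator(n, s) - theta)
    return total

F  = lambda n, s: Fraction(s, n)          # frequentist: s/n
Fc = lambda n, s: Fraction(s + 1, n + 2)  # corrected:   (s+1)/(n+2)

theta = Fraction(1, 2)
print(avg_distance(F,  2, theta))  # 1/4
print(avg_distance(Fc, 2, theta))  # 1/8
```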

21 Bayesian vs Frequentist
We will see that F_c(s, n) = (s+1)/(n+2) corresponds to one of the possible Bayesian approaches.
Of course, if the true θ is different from 1/2 then F_c can be worse than F. And, of course, the problem is that we don't know what θ is (the value of θ is exactly what we are trying to find out!).
However, F_c is still better than F if we consider the average distance w.r.t. all possible θ ∈ [0, 1], assuming that they are all equally likely (i.e. that θ has a uniform distribution).
In fact we can prove that, under a suitable notion of difference, and for θ uniformly distributed, F_c is the best function of the kind G(s, n) = (s+t)/(n+m).
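This optimality claim can be checked numerically. A sketch of our own (assuming the squared difference as the notion of difference, which the talk adopts later), searching a small grid of (t, m) pairs for the family G(s, n) = (s+t)/(n+m):

```python
from math import comb

def risk_uniform(t, m, n, grid=2000):
    """Average squared distance of G(n, s) = (s+t)/(n+m) from theta,
    over the binomial outcomes s and over theta uniform on [0, 1]
    (midpoint rule for the integral in theta)."""
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        for s in range(n + 1):
            pr_s = comb(n, s) * theta**s * (1 - theta)**(n - s)
            total += pr_s * ((s + t) / (n + m) - theta) ** 2
    return total / grid

n = 5
candidates = [(t, m) for t in (0.0, 0.5, 1.0, 1.5, 2.0)
                     for m in (1.0, 2.0, 3.0, 4.0)]
best = min(candidates, key=lambda tm: risk_uniform(*tm, n))
print(best)  # (1.0, 2.0): F_c is the best of the family on this grid
```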

22 A Bayesian approach
Assumption: θ is the generic value of a continuous random variable whose probability density is a Beta distribution with (unknown) parameters σ, ϕ:
B(σ, ϕ)(θ) = Γ(σ+ϕ) / (Γ(σ) Γ(ϕ)) · θ^(σ−1) (1−θ)^(ϕ−1)
where Γ is the extension of the factorial function, i.e. Γ(n) = (n−1)! for n a natural number.
Note that the uniform distribution is a particular case of Beta distribution, with σ = ϕ = 1. B(s+1, f+1) can be seen as the a posteriori probability density of θ given by a uniform a priori (principle of maximum entropy) and a trial sequence resulting in s successes and f failures.
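A minimal sketch of this density (our own illustration, using Python's math.gamma, which extends the factorial exactly as stated):

```python
from math import gamma

def beta_pdf(sigma, phi, theta):
    """Density of B(sigma, phi) at theta, for 0 < theta < 1."""
    norm = gamma(sigma + phi) / (gamma(sigma) * gamma(phi))
    return norm * theta**(sigma - 1) * (1 - theta)**(phi - 1)

# sigma = phi = 1 gives the uniform density: constant 1 on (0, 1)
print(beta_pdf(1, 1, 0.3))  # 1.0
# gamma extends the factorial: gamma(n) = (n-1)!
print(gamma(5))  # 24.0
```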

23 Examples of Beta Distribution
[Plots of Beta densities B(σ, ϕ) for various parameter values]

27 The Bayesian Approach
Assume an a priori probability distribution for Θ (representing our partial knowledge about Θ, whatever the source may be) and combine it with the evidence, using Bayes' theorem, to obtain the a posteriori probability distribution:
Pd(θ | s) = Pr(s | θ) Pd(θ) / Pr(s)
(a posteriori = likelihood × a priori / evidence)
One possible definition for the estimation function (algorithm) is the mean of the a posteriori distribution:
A(n, s) = E_{Pd(θ|s)}(Θ) = ∫₀¹ θ Pd(θ | s) dθ
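This recipe works for any prior, conjugate or not; a grid-based sketch of our own (the discretization and function names are assumptions, not from the talk), applying Bayes' theorem pointwise and returning the mean of the a posteriori:

```python
from math import comb

def posterior_mean(prior_pdf, n, s, grid=1000):
    """Discretize theta on (0, 1); apply Bayes' theorem pointwise:
    Pd(theta | s) is proportional to Pr(s | theta) * Pd(theta).
    Normalize and return the mean of the a posteriori distribution."""
    thetas = [(i + 0.5) / grid for i in range(grid)]
    # unnormalized posterior: binomial likelihood times prior density
    post = [comb(n, s) * t**s * (1 - t)**(n - s) * prior_pdf(t) for t in thetas]
    evidence = sum(post)  # plays the role of Pr(s)
    return sum(t * p for t, p in zip(thetas, post)) / evidence

uniform = lambda t: 1.0
# with a uniform a priori, n = 10, s = 7: the mean should be (7+1)/(10+2) = 2/3
print(round(posterior_mean(uniform, 10, 7), 4))
```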

28 The Bayesian Approach
Since the distribution of Θ is assumed to be a Beta distribution B(σ, ϕ), it is natural to take as a priori a function of the same class, i.e. B(α, β). In general we don't know the real parameters, hence (α, β) may be different from (σ, ϕ).
The likelihood Pr(s | θ) is a binomial, i.e.
Pr(s | θ) = C(s+f, s) θ^s (1−θ)^f
The Beta distribution is a conjugate of the binomial, which means that the application of Bayes' theorem gives as a posteriori a function of the same class, and more precisely
Pd(θ | s) = B(α + s, β + f)

29 The Bayesian Approach
Summarizing, we are considering three probability density functions for Θ:
B(σ, ϕ): the real distribution of Θ
B(α, β): the a priori (the distribution of Θ up to our best knowledge)
B(s + α, f + β): the a posteriori
The result of the mean-based algorithm is:
A_{α,β}(n, s) = E_{B(s+α, f+β)}(Θ) = (s + α) / (s + f + α + β) = (s + α) / (n + α + β)
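In code, the conjugate update and the mean-based estimator are one-liners (a sketch of our own; the function names are ours, not the talk's):

```python
def posterior_params(alpha, beta, s, f):
    """Conjugacy: a priori B(alpha, beta) plus evidence (s, f)
    yields the a posteriori B(alpha + s, beta + f)."""
    return alpha + s, beta + f

def A(alpha, beta, n, s):
    """Mean-based estimator: the mean of B(s + alpha, f + beta),
    i.e. (s + alpha) / (n + alpha + beta)."""
    return (s + alpha) / (n + alpha + beta)

# alpha = beta = 1 (uniform a priori) recovers F_c(n, s) = (s+1)/(n+2):
print(A(1, 1, 10, 7))        # 8/12 = 0.666...
# alpha = beta -> 0 recovers the frequentist F(n, s) = s/n:
print(A(1e-9, 1e-9, 10, 7))  # ~ 0.7
```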

30 The Bayesian Approach
The frequentist method can be seen as the limit of the Bayesian mean-based algorithms, for α, β → 0.
Intuitively, the Bayesian mean-based algorithms give the best result for α = σ and β = ϕ.
How can we compare two Bayesian algorithms in general, i.e. independently of σ and ϕ?

31 Measuring the precision of Bayesian algorithms
Define a difference D(A(n,s), θ) (possibly a distance, but not necessarily: it does not need to be symmetric):
non-negative; zero iff A(n,s) = θ; what else?
Consider the expected value D_E(A, n, θ) of D(A(n,s), θ) with respect to the likelihood (the conditional probability of s given θ):
D_E(A, n, θ) = Σ_{s=0}^{n} Pr(s | θ) D(A(n, s), θ)
Risk of A: the expected value R(A, n) of D_E(A, n, θ) with respect to the true distribution of Θ:
R(A, n) = ∫₀¹ Pd(θ) D_E(A, n, θ) dθ

32 Measuring the precision of Bayesian algorithms
Note that the definition of the risk of A is general, i.e. it is a natural definition for any estimation algorithm (not necessarily Bayesian or mean-based).
What other conditions should D satisfy? It seems natural to require that D be such that R(A, n) has a minimum (for all n's) when the a priori distribution coincides with the true distribution. It is not obvious that such a D exists.

33 Measuring the precision of Bayesian algorithms
We have considered the following candidates for D(x, y) (all of which can be extended to the n-ary case):
The norms: |x − y|, |x − y|², ..., |x − y|^k, ...
The Kullback-Leibler divergence:
D_KL((y, 1−y) ‖ (x, 1−x)) = y log₂(y/x) + (1−y) log₂((1−y)/(1−x))
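The two candidates can be written directly (a sketch of our own; note that the binary KL divergence is not symmetric, consistent with D not needing to be a distance):

```python
from math import log2

def d_kl_binary(y, x):
    """D_KL((y, 1-y) || (x, 1-x)), in bits; assumes 0 < x < 1, 0 < y < 1."""
    return y * log2(y / x) + (1 - y) * log2((1 - y) / (1 - x))

def sq(x, y):
    """The 2nd-norm candidate: (x - y)^2."""
    return (x - y) ** 2

print(d_kl_binary(0.5, 0.5))  # 0.0: zero iff the distributions coincide
print(d_kl_binary(0.5, 0.25), d_kl_binary(0.25, 0.5))  # not symmetric
```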

34 Measuring the precision of Bayesian algorithms
Theorem. For the mean-based Bayesian algorithms, with a priori B(α, β), the condition is satisfied (i.e. the risk is minimum when α, β coincide with the parameters σ, ϕ of the true distribution) by the following functions:
the 2nd norm (x − y)²
the Kullback-Leibler divergence
We find it very surprising that the condition is satisfied by these two very different functions, and not by any of the other norms |x − y|^k for k ≠ 2.
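The theorem can be illustrated numerically for the 2nd norm; a sketch of our own, with an arbitrarily chosen true distribution B(2, 2), computing the risk R(A_{α,β}, n) over a small grid of a priori parameters:

```python
from math import comb, gamma

def risk(alpha, beta, sigma, phi, n, grid=1000):
    """R(A_{alpha,beta}, n): expected squared distance of the mean-based
    estimate (s+alpha)/(n+alpha+beta) from theta, averaged over the
    binomial likelihood and over theta ~ B(sigma, phi) (midpoint rule)."""
    norm = gamma(sigma + phi) / (gamma(sigma) * gamma(phi))
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        pd = norm * theta**(sigma - 1) * (1 - theta)**(phi - 1)
        d_e = sum(comb(n, s) * theta**s * (1 - theta)**(n - s)
                  * ((s + alpha) / (n + alpha + beta) - theta) ** 2
                  for s in range(n + 1))
        total += pd * d_e
    return total / grid

# True distribution B(2, 2): the risk is smallest when the a priori matches it.
sigma, phi, n = 2, 2, 5
grid_pts = [(a, b) for a in (1, 2, 3) for b in (1, 2, 3)]
best = min(grid_pts, key=lambda ab: risk(*ab, sigma, phi, n))
print(best)  # (2, 2)
```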

35 [Plots, for D(x, y) = (x − y)² and σ = 1, ϕ = 1: the risk R(A_{α,β}, 5) as a function of α, β, and D_E(A_{α,β}, 5, 1/2) for n = 5, θ = 1/2]
For the Kullback-Leibler divergence the plots are similar, but much steeper, and they diverge for α → 0 or β → 0.

36 Work in progress
Note that for the 2nd norm D(x, y) = (x − y)², the average D_E is a distance. This contrasts with the case of D(x, y) = D_KL(y ‖ x), and makes the first more appealing.
How robust is the theorem that certifies that the 2nd-norm-based D_E is a good distance? In particular: does it extend to the case of multi-valued random variables? Note that in the multi-valued case the likelihood is a multinomial, the conjugate a priori is a Dirichlet, and the D is the Euclidean distance (squared).
What are the possible applications?

37 Possible applications (work in progress)
We can use D_E to compare two different estimation algorithms: mean-based vs other ways of selecting the estimate; Bayesian vs non-Bayesian. In more complicated scenarios there may be different Bayesian mean-based algorithms. Example: noisy channels.
D_E induces a metric on distributions. Bayes' equations define transformations on this metric space, from the a priori to the a posteriori. We intend to study the properties of such transformations, in the hope that they will reveal interesting properties of the corresponding Bayesian methods, independent of the a priori.


Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

A New Interpretation of Information Rate

A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes

Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

A logistic approximation to the cumulative normal distribution

A logistic approximation to the cumulative normal distribution Shannon R. Bowling 1 ; Mohammad T. Khasawneh 2 ; Sittichai Kaewkuekool 3 ; Byung Rae Cho 4 1 Old Dominion University (USA); 2 State University

The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

bayesian inference: principles and practice

bayesian inference: and http://jakehofman.com july 9, 2009 bayesian inference: and motivation background bayes theorem bayesian probability bayesian inference would like models that: provide predictive

We shall turn our attention to solving linear systems of equations. Ax = b

59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system

Point Biserial Correlation Tests

Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

Bayesian and Classical Inference

Eco517, Part I Fall 2002 C. Sims Bayesian and Classical Inference Probability statements made in Bayesian and classical approaches to inference often look similar, but they carry different meanings. Because

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

1 Fixed Point Iteration and Contraction Mapping Theorem

1 Fixed Point Iteration and Contraction Mapping Theorem Notation: For two sets A,B we write A B iff x A = x B. So A A is true. Some people use the notation instead. 1.1 Introduction Consider a function