L4: Bayesian Decision Theory


1 L4: Bayesian Decision Theory

Likelihood ratio test · Probability of error · Bayes risk · Bayes, MAP and ML criteria · Multi-class problems · Discriminant functions

CSCE 666 Pattern Analysis, Ricardo Gutierrez-Osuna, CSE@TAMU

2 Likelihood ratio test (LRT)

Assume we are to classify an object based on the evidence provided by feature vector x. Would the following decision rule be reasonable? "Choose the class that is most probable given observation x." More formally: evaluate the posterior probability of each class, P(ω_i|x), and choose the class with the largest P(ω_i|x).

Let's examine this rule for a 2-class problem. In this case the decision rule becomes: if P(ω_1|x) > P(ω_2|x) choose ω_1, else choose ω_2. Or, in a more compact form:

$$P(\omega_1|x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P(\omega_2|x)$$

Applying Bayes rule:

$$\frac{p(x|\omega_1)P(\omega_1)}{p(x)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{p(x|\omega_2)P(\omega_2)}{p(x)}$$

3 Since p(x) does not affect the decision rule, it can be eliminated*. Rearranging the previous expression:

$$\Lambda(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P(\omega_2)}{P(\omega_1)}$$

The term Λ(x) is called the likelihood ratio, and the decision rule is known as the likelihood ratio test.

*p(x) can be disregarded in the decision rule since it is constant regardless of the class ω_i. However, p(x) will be needed if we want to estimate the posterior P(ω_i|x), which, unlike p(x|ω_i)P(ω_i), is a true probability value and, therefore, gives us an estimate of the goodness of our decision.
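As a concrete illustration, here is a minimal sketch of the LRT in code; the helper name lrt_decide and the particular Gaussian likelihoods (those of the example on the next slide) are my own choices, not part of the lecture.

```python
import numpy as np
from scipy.stats import norm

def lrt_decide(x, p1, p2, prior1=0.5, prior2=0.5):
    """Likelihood ratio test: choose w1 iff
    Lambda(x) = p1(x)/p2(x) exceeds P(w2)/P(w1)."""
    likelihood_ratio = p1(x) / p2(x)
    threshold = prior2 / prior1
    return np.where(likelihood_ratio > threshold, 1, 2)

# Two unit-variance Gaussian likelihoods, as in the example that follows
p1 = norm(loc=4, scale=1).pdf
p2 = norm(loc=10, scale=1).pdf
print(lrt_decide(np.array([3.0, 7.5, 11.0]), p1, p2))  # -> [1 2 2]
```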

4 Likelihood ratio test: an example

Problem: given the likelihoods below, derive a decision rule based on the LRT (assume equal priors):

$$p(x|\omega_1) = N(4, 1); \quad p(x|\omega_2) = N(10, 1)$$

Solution: substituting into the LRT expression,

$$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-4)^2}}{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-10)^2}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$

Simplifying the LRT expression:

$$\Lambda(x) = e^{-\frac{1}{2}\left[(x-4)^2 - (x-10)^2\right]} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$

Changing signs and taking logs:

$$(x-4)^2 - (x-10)^2 \underset{\omega_1}{\overset{\omega_2}{\gtrless}} 0$$

which yields

$$x \underset{\omega_1}{\overset{\omega_2}{\gtrless}} 7$$

that is, R_1 = {x < 7} ("say ω_1") and R_2 = {x > 7} ("say ω_2"). [Figure: the likelihoods p(x|ω_1) and p(x|ω_2), centered at 4 and 10, with the decision boundary at x = 7.]

This LRT result is intuitive since the likelihoods differ only in their means: each x is assigned to the class whose mean is closest. How would the LRT decision rule change if the priors were such that P(ω_1) = 2P(ω_2)?
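A quick numerical check of the boundary, and of the closing question about unequal priors: for two unit-variance Gaussians, taking logs of the LRT reduces the boundary to a closed form that follows from the algebra above. The function name boundary is mine, a sketch rather than anything from the lecture.

```python
import numpy as np

def boundary(prior1, prior2, mu1=4.0, mu2=10.0):
    """LRT decision boundary for N(mu1,1) vs N(mu2,1), mu1 < mu2:
    choose w1 iff x is below the returned value."""
    return (mu1 + mu2) / 2 + np.log(prior1 / prior2) / (mu2 - mu1)

print(boundary(0.5, 0.5))    # 7.0, the equal-priors result above
print(boundary(2/3, 1/3))    # ~7.116: R1 grows when P(w1) = 2 P(w2)
```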

5 Probability of error

The performance of any decision rule can be measured by P[error]. Making use of the theorem of total probability (L2):

$$P[\text{error}] = \sum_{i=1}^{C} P[\text{error}|\omega_i]\, P[\omega_i]$$

The class-conditional probability P[error|ω_i] can be expressed as

$$P[\text{error}|\omega_i] = P[\text{choose } \omega_j|\omega_i] = \int_{R_j} p(x|\omega_i)\, dx = \varepsilon_i$$

So, for our 2-class problem, P[error] becomes

$$P[\text{error}] = P[\omega_1] \int_{R_2} p(x|\omega_1)\, dx + P[\omega_2] \int_{R_1} p(x|\omega_2)\, dx = P[\omega_1]\,\varepsilon_1 + P[\omega_2]\,\varepsilon_2$$

where ε_i is the integral of p(x|ω_i) over the region R_j where we choose ω_j. For the previous example, since we assumed equal priors, P[error] = (ε_1 + ε_2)/2. How would you compute P[error] numerically? [Figure: the two likelihoods with decision regions R_1 ("say ω_1") and R_2 ("say ω_2"); ε_1 and ε_2 are the tail areas that fall in the wrong region.]
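One way to answer the closing question: integrate each likelihood over the opposite decision region, either in closed form through the Gaussian CDF or by numerical quadrature. A sketch, assuming the N(4,1)/N(10,1) example with the boundary at x = 7:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

p1, p2 = norm(4, 1), norm(10, 1)

eps1 = 1 - p1.cdf(7)                   # integral of p(x|w1) over R2 = (7, inf)
eps2 = p2.cdf(7)                       # integral of p(x|w2) over R1 = (-inf, 7)
eps1_quad, _ = quad(p1.pdf, 7, np.inf) # the same quantity by quadrature
P_error = 0.5 * eps1 + 0.5 * eps2      # equal priors
print(eps1, eps1_quad, P_error)        # about 1.35e-3 for each term
```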

6 How good is the LRT decision rule?

To answer this question, it is convenient to express P[error] in terms of the posterior P[error|x]:

$$P[\text{error}] = \int_{-\infty}^{\infty} P[\text{error}|x]\, p(x)\, dx$$

The optimal decision rule will minimize P[error|x] at every value of x in feature space, so that the integral above is minimized.

7 At each x′, P[error|x′] equals the posterior P[ω_i|x′] of the class we did not choose (i ≠ j when we choose ω_j). This is illustrated in the figure below, which compares the LRT with an arbitrary alternative (ALT) decision rule. [Figure: the posteriors P(ω_1|x) and P(ω_2|x), with the decision boundaries of the ALT and LRT rules; at a given x′, P[error|x′] for the ALT rule is larger than P[error|x′] for the LRT rule.]

From the figure it becomes clear that, for any value of x′, the LRT will always have a lower P[error|x′]. Therefore, when we integrate over the real line, the LRT decision rule will yield a lower P[error].

For any given problem, the minimum probability of error is achieved by the LRT decision rule; this probability of error is called the Bayes error rate and is the best any classifier can do.

8 Bayes risk

So far we have assumed that the penalty of misclassifying x ∈ ω_1 as ω_2 is the same as that of the converse error. In general, this is not the case. For example, misclassifying a cancer sufferer as a healthy patient is a much more serious problem than the other way around. This concept can be formalized in terms of a cost function C_ij, where C_ij represents the cost of choosing class ω_i when ω_j is the true class. We define the Bayes risk as the expected value of the cost:

$$\mathcal{R} = E[C] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\, P[\text{choose } \omega_i \text{ and } x \in \omega_j] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\, P[x \in R_i|\omega_j]\, P[\omega_j]$$
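Once decision regions are fixed, the double sum can be evaluated directly. A sketch for a two-class problem on the real line, assuming for simplicity that R_1 is a single interval (-inf, t); the likelihoods and the cost values here are illustrative choices of mine:

```python
import numpy as np
from scipy.stats import norm

C = np.array([[0.0, 1.0],     # C[i, j]: cost of choosing w_{i+1}
              [2.0, 0.0]])    # when w_{j+1} is true (illustrative)
priors = np.array([0.5, 0.5])
likelihoods = [norm(4, 1), norm(10, 1)]

def bayes_risk(t):
    """Expected cost when R1 = (-inf, t) and R2 = [t, inf)."""
    # P[x in R_i | w_j], rows indexed by i, columns by j
    P = np.array([[lik.cdf(t) for lik in likelihoods],
                  [1 - lik.cdf(t) for lik in likelihoods]])
    return np.sum(C * P * priors)   # sum_ij C_ij P[x in R_i|w_j] P[w_j]

print(bayes_risk(7.0))
```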

9 What is the decision rule that minimizes the Bayes risk?

First notice that

$$P[x \in R_i|\omega_j] = \int_{R_i} p(x|\omega_j)\, dx$$

so we can express the Bayes risk as

$$\mathcal{R} = \int_{R_1} \left[C_{11} P(\omega_1) p(x|\omega_1) + C_{12} P(\omega_2) p(x|\omega_2)\right] dx + \int_{R_2} \left[C_{21} P(\omega_1) p(x|\omega_1) + C_{22} P(\omega_2) p(x|\omega_2)\right] dx$$

Then we note that, for either likelihood, one can write

$$\int_{R_1} p(x|\omega_i)\, dx + \int_{R_2} p(x|\omega_i)\, dx = 1 \;\;\Rightarrow\;\; \int_{R_2} p(x|\omega_i)\, dx = 1 - \int_{R_1} p(x|\omega_i)\, dx$$

10 Merging the last equation into the Bayes risk expression and cancelling out all the integrals over R_2 yields

$$\mathcal{R} = C_{21} P(\omega_1) + C_{22} P(\omega_2) + \int_{R_1} \left[(C_{12} - C_{22}) P(\omega_2) p(x|\omega_2) - (C_{21} - C_{11}) P(\omega_1) p(x|\omega_1)\right] dx$$

The first two terms are constant with respect to R_1, so they can be ignored. Thus, we seek the decision region R_1 that minimizes the remaining integral:

$$R_1 = \arg\min_{R_1} \int_{R_1} \underbrace{\left[(C_{12} - C_{22}) P(\omega_2) p(x|\omega_2) - (C_{21} - C_{11}) P(\omega_1) p(x|\omega_1)\right]}_{g(x)}\, dx$$

11 Let's forget about the actual expression of g(x) to develop some intuition for the kind of decision region we are looking for. Intuitively, we will select for R_1 those regions that minimize the integral of g(x); in other words, those regions where g(x) < 0. [Figure: a curve g(x) crossing zero; the intervals A, B and C where g(x) < 0 constitute R_1.]

So we will choose R_1 such that

$$(C_{21} - C_{11})\, P(\omega_1)\, p(x|\omega_1) > (C_{12} - C_{22})\, P(\omega_2)\, p(x|\omega_2)$$

and, rearranging,

$$\frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12} - C_{22})\, P(\omega_2)}{(C_{21} - C_{11})\, P(\omega_1)}$$

Therefore, minimization of the Bayes risk also leads to an LRT.

12 The Bayes risk: an example

Consider a problem with likelihoods p(x|ω_1) = N(0, 3) (variance 3) and p(x|ω_2) = N(2, 1). Sketch the two densities. What is the likelihood ratio? Assume P(ω_1) = P(ω_2), C_ii = 0, C_12 = 1 and C_21 = 3^{1/2}, and determine the decision rule that minimizes the Bayes risk:

$$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}\sqrt{3}}\, e^{-\frac{x^2}{6}}}{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-2)^2}{2}}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12} - C_{22})\, P(\omega_2)}{(C_{21} - C_{11})\, P(\omega_1)} = \frac{1}{\sqrt{3}}$$

Taking logs and simplifying,

$$2x^2 - 12x + 12 \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 0 \;\;\Rightarrow\;\; x = 3 \pm \sqrt{3} = \{1.27,\; 4.73\}$$

so we choose ω_2 on R_2 = (1.27, 4.73) and ω_1 elsewhere. [Figure: the broad density N(0, 3) and the narrow density N(2, 1); R_2 is the interval between the two crossings at 1.27 and 4.73.]
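A quick check of the algebra: the log-ratio reduces to the quadratic above, whose roots bound R_2, and evaluating Λ(x) directly confirms that the tails belong to ω_1. A sketch:

```python
import numpy as np
from scipy.stats import norm

print(np.sort(np.roots([2, -12, 12])))   # [1.268, 4.732] = 3 -/+ sqrt(3)

# Evaluate the LRT itself against the threshold 1/sqrt(3)
lam = lambda x: norm(0, np.sqrt(3)).pdf(x) / norm(2, 1).pdf(x)
for x in [0.0, 3.0, 6.0]:
    print(x, "w1" if lam(x) > 1 / np.sqrt(3) else "w2")  # tails -> w1
```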

13 LRT variations

Bayes criterion. This is the LRT that minimizes the Bayes risk:

$$\Lambda_{Bayes}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12} - C_{22})\, P(\omega_2)}{(C_{21} - C_{11})\, P(\omega_1)}$$

Maximum A Posteriori criterion. Sometimes we may be interested in minimizing P[error] alone; this is the special case of Λ_Bayes(x) that uses the zero-one cost function

$$C_{ij} = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$

It is known as the MAP criterion, since it seeks to maximize the posterior P(ω_i|x):

$$\Lambda_{MAP}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P(\omega_2)}{P(\omega_1)} \;\;\Leftrightarrow\;\; P(\omega_1|x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P(\omega_2|x)$$

Maximum Likelihood criterion. For equal priors P[ω_i] = 1/2 and the 0/1 loss function, the LRT is known as the ML criterion, since it seeks to maximize the likelihood p(x|ω_i):

$$\Lambda_{ML}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$
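The three criteria differ only in the threshold applied to the same likelihood ratio, which a few lines of code make explicit; lrt_threshold is my own naming, a sketch rather than a standard API:

```python
def lrt_threshold(criterion, priors=(0.5, 0.5), costs=None):
    """Return eta such that we choose w1 iff Lambda(x) > eta."""
    P1, P2 = priors
    if criterion == "bayes":             # needs the full cost matrix
        (C11, C12), (C21, C22) = costs
        return (C12 - C22) * P2 / ((C21 - C11) * P1)
    if criterion == "map":               # zero-one costs
        return P2 / P1
    if criterion == "ml":                # zero-one costs, equal priors
        return 1.0
    raise ValueError(criterion)

print(lrt_threshold("bayes", costs=[[0, 1], [3**0.5, 0]]))  # 1/sqrt(3)
print(lrt_threshold("map", priors=(2/3, 1/3)))              # 0.5
print(lrt_threshold("ml"))                                  # 1.0
```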

14 Two more decision rules are commonly cited in the literature.

The Neyman-Pearson criterion, used in detection and estimation theory, also leads to an LRT: it fixes one class's error probability, say ε_1 ≤ α, and seeks to minimize the other. For instance, for the sea-bass/salmon classification problem of L1, there may be some kind of government regulation stating that we must not misclassify more than 1% of salmon as sea bass. The Neyman-Pearson criterion is very attractive since it requires knowledge of neither the priors nor a cost function.

The minimax criterion, used in game theory, is derived from the Bayes criterion and seeks to minimize the maximum Bayes risk. The minimax criterion does not require knowledge of the priors, but it does need a cost function.

For more information on these methods, refer to Detection, Estimation and Modulation Theory, by H. L. Van Trees.

15 Minimum P[error] for multi-class problems

Minimizing P[error] generalizes well to multiple classes. For clarity in the derivation, we express P[error] in terms of the probability of making a correct assignment, P[error] = 1 − P[correct]. The probability of making a correct assignment is

$$P[\text{correct}] = \sum_{i=1}^{C} P[\omega_i] \int_{R_i} p(x|\omega_i)\, dx$$

Minimizing P[error] is equivalent to maximizing P[correct], so, expressing the latter in terms of posteriors,

$$P[\text{correct}] = \sum_{i=1}^{C} \int_{R_i} p(x)\, P(\omega_i|x)\, dx$$

To maximize P[correct] we must maximize each integral over R_i, which we achieve by choosing, at every x, the class with the largest posterior. So each R_i is the region where P(ω_i|x) is maximum, and the decision rule that minimizes P[error] is the MAP criterion. [Figure: three posteriors P(ω_1|x), P(ω_2|x) and P(ω_3|x); the decision regions alternate according to whichever posterior is largest.]
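In code, the multi-class MAP rule is just an argmax of P(ω_i|x) over classes; since p(x) is common to all classes, the argmax can be taken over p(x|ω_i)P(ω_i). A sketch with three hypothetical Gaussian classes:

```python
import numpy as np
from scipy.stats import norm

classes = [norm(-2, 1), norm(0, 1), norm(3, 1)]   # p(x|w_i), illustrative
priors = np.array([0.2, 0.5, 0.3])

def map_decide(x):
    """Choose the class with the largest posterior P(w_i|x)."""
    joint = np.array([c.pdf(x) for c in classes]) * priors[:, None]
    return np.argmax(joint, axis=0) + 1           # p(x) cancels in the argmax

print(map_decide(np.array([-2.5, 0.3, 4.0])))     # -> [1 2 3]
```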

16 Minimum Bayes risk for multi-class problems

Minimizing the Bayes risk also generalizes well. As before, we use a slightly different formulation: we denote by α_i the decision to choose class ω_i, and by α(x) the overall decision rule that maps feature vectors x into classes ω_i, with α(x) ∈ {α_1, α_2, ..., α_C}. The (conditional) risk R(α_i|x) of assigning x to class ω_i is

$$\mathcal{R}(\alpha_i|x) = \sum_{j=1}^{C} C_{ij}\, P(\omega_j|x)$$

and the Bayes risk associated with decision rule α(x) is

$$\mathcal{R}(\alpha(x)) = \int \mathcal{R}(\alpha(x)|x)\, p(x)\, dx$$

To minimize this expression, we must minimize the conditional risk R(α(x)|x) at each x, which is equivalent to choosing the ω_i for which R(α_i|x) is minimum. [Figure: three conditional risks R(α_1|x), R(α_2|x) and R(α_3|x); the decision regions alternate according to whichever conditional risk is smallest.]
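The minimum-risk rule has the same structure, with an argmin over conditional risks computed as a matrix-vector product. A sketch with illustrative costs of my own choosing:

```python
import numpy as np

C = np.array([[0.0, 2.0, 1.0],   # C[i, j]: cost of action a_{i+1}
              [1.0, 0.0, 4.0],   # when the true class is w_{j+1}
              [1.0, 1.0, 0.0]])  # (illustrative values)

def min_risk_decide(posteriors):
    """posteriors[j] = P(w_j|x); choose the action a_i minimizing
    the conditional risk R(a_i|x) = sum_j C_ij P(w_j|x)."""
    risks = C @ posteriors
    return np.argmin(risks) + 1

print(min_risk_decide(np.array([0.1, 0.3, 0.6])))   # -> 3
```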

17 Discriminant functions

All the decision rules shown in L4 have the same structure: at each point x in feature space, choose the class ω_i that maximizes (or minimizes) some measure g_i(x). This structure can be formalized with a set of discriminant functions g_i(x), i = 1..C, and the decision rule "assign x to class ω_i if g_i(x) > g_j(x) for all j ≠ i". Therefore, we can visualize the decision rule as a network that computes C discriminant functions and selects the class with the highest discriminant. [Figure: a network with features x_1, x_2, x_3, ..., x_d as inputs, discriminant functions g_1(x), g_2(x), ..., g_C(x) as intermediate nodes (the Bayes discriminants also take the costs as input), a select-max stage, and the class assignment as output.]

The three decision rules can then be summarized as:

Criterion | Discriminant function
Bayes     | g_i(x) = −R(α_i|x)
MAP       | g_i(x) = P(ω_i|x)
ML        | g_i(x) = p(x|ω_i)
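The table maps directly onto code: each criterion is a different choice of g_i plugged into the same select-max rule. A minimal sketch using ML discriminants; the function name classify is mine:

```python
import numpy as np
from scipy.stats import norm

def classify(x, discriminants):
    """Evaluate every g_i(x) and assign x to the class with the largest."""
    return np.argmax([g(x) for g in discriminants]) + 1

# ML discriminants g_i(x) = p(x|w_i) for the two-Gaussian example
g = [norm(4, 1).pdf, norm(10, 1).pdf]
print(classify(6.0, g))   # -> 1, since x = 6 lies below the boundary at 7
```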
