L4: Bayesian Decision Theory
1 L4: Bayesian Decision Theory
- Likelihood ratio test
- Probability of error
- Bayes risk
- Bayes, MAP and ML criteria
- Multi-class problems
- Discriminant functions
2 Likelihood ratio test (LRT)
- Assume we are to classify an object based on the evidence provided by a feature vector x
- Would the following decision rule be reasonable? "Choose the class that is most probable given observation x"
- More formally: evaluate the posterior probability of each class $P(\omega_i|x)$ and choose the class with the largest $P(\omega_i|x)$
- Let's examine this rule for a 2-class problem
  - In this case the decision rule becomes: if $P(\omega_1|x) > P(\omega_2|x)$ choose $\omega_1$, else choose $\omega_2$
  - Or, in a more compact form: $P(\omega_1|x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P(\omega_2|x)$
  - Applying Bayes rule: $\frac{p(x|\omega_1)P(\omega_1)}{p(x)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{p(x|\omega_2)P(\omega_2)}{p(x)}$
3 Since $p(x)$ does not affect the decision rule, it can be eliminated*
- Rearranging the previous expression:
  $$\Lambda(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P(\omega_2)}{P(\omega_1)}$$
- The term $\Lambda(x)$ is called the likelihood ratio, and the decision rule is known as the likelihood ratio test
- *$p(x)$ can be disregarded in the decision rule since it is constant regardless of class $\omega_i$. However, $p(x)$ will be needed if we want to estimate the posterior $P(\omega_i|x)$ which, unlike $p(x|\omega_i)P(\omega_i)$, is a true probability value and, therefore, gives us an estimate of the goodness of our decision
4 Likelihood ratio test: an example
- Problem
  - Given the likelihoods below, derive a decision rule based on the LRT (assume equal priors)
    $$p(x|\omega_1) = N(4,1); \quad p(x|\omega_2) = N(10,1)$$
- Solution
  - Substituting into the LRT expression:
    $$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-4)^2}}{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-10)^2}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$
  - Simplifying the LRT expression:
    $$\Lambda(x) = e^{-\frac{1}{2}\left[(x-4)^2 - (x-10)^2\right]} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$
  - Changing signs and taking logs:
    $$(x-4)^2 - (x-10)^2 \underset{\omega_1}{\overset{\omega_2}{\gtrless}} 0$$
  - Which yields:
    $$x \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} 7$$
  - This LRT result is intuitive since the likelihoods differ only in their mean
  - How would the LRT decision rule change if the priors were such that $P(\omega_1) = 2P(\omega_2)$?
- [Figure: the two likelihoods $p(x|\omega_1)$ and $p(x|\omega_2)$, centered at x = 4 and x = 10, with decision regions $R_1$ (say $\omega_1$) and $R_2$ (say $\omega_2$) separated at x = 7]
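To make the rule concrete, here is a minimal numerical sketch of this example, assuming SciPy is available; the function names and test points are illustrative and not part of the original slides.

```python
from scipy.stats import norm

# Class-conditional densities from the example: p(x|w1) = N(4,1), p(x|w2) = N(10,1)
def likelihood_ratio(x):
    return norm.pdf(x, loc=4, scale=1) / norm.pdf(x, loc=10, scale=1)

def lrt_decide(x, prior1=0.5, prior2=0.5):
    # Choose w1 when Lambda(x) exceeds the prior ratio P(w2)/P(w1), otherwise w2
    return 1 if likelihood_ratio(x) > prior2 / prior1 else 2

# With equal priors the rule reduces to "choose w1 if x < 7"
for x in (3.0, 6.9, 7.1, 12.0):
    print(x, lrt_decide(x))          # -> 1, 1, 2, 2

# With P(w1) = 2 P(w2) the LRT threshold drops to 1/2, which pushes the
# boundary slightly to the right of x = 7 (to 7 + ln(2)/6, about 7.12)
print(lrt_decide(7.05, prior1=2/3, prior2=1/3))   # -> 1 (would be 2 under equal priors)
```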
5 Probability of error
- The performance of any decision rule can be measured by P[error]
- Making use of the Theorem of total probability (L2):
  $$P[error] = \sum_{i=1}^{C} P[error|\omega_i]\, P[\omega_i]$$
- The class-conditional probability $P[error|\omega_i]$ can be expressed as
  $$P[error|\omega_i] = P[\text{choose } \omega_j|\omega_i] = \int_{R_j} p(x|\omega_i)\,dx = \varepsilon_i$$
- So, for our 2-class problem, P[error] becomes
  $$P[error] = P[\omega_1]\underbrace{\int_{R_2} p(x|\omega_1)\,dx}_{\varepsilon_1} + P[\omega_2]\underbrace{\int_{R_1} p(x|\omega_2)\,dx}_{\varepsilon_2}$$
  where $\varepsilon_i$ is the integral of $p(x|\omega_i)$ over the region $R_j$ where we choose $\omega_j$
- For the previous example, since we assumed equal priors, $P[error] = (\varepsilon_1 + \varepsilon_2)/2$
- How would you compute P[error] numerically? (see the sketch below)
- [Figure: the two likelihoods $p(x|\omega_1)$ and $p(x|\omega_2)$ with decision regions $R_1$ (say $\omega_1$) and $R_2$ (say $\omega_2$)]
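One possible answer to the numerical question, as a hedged sketch: for the Gaussian example the error terms are tail areas, so they can be evaluated with the normal CDF or estimated by Monte Carlo sampling. The sample size and seed below are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Decision regions from the example: R1 = {x < 7}, R2 = {x >= 7}, equal priors
eps1 = 1 - norm.cdf(7, loc=4, scale=1)   # integral of p(x|w1) over R2
eps2 = norm.cdf(7, loc=10, scale=1)      # integral of p(x|w2) over R1
print(0.5 * eps1 + 0.5 * eps2)           # exact P[error], about 0.00135

# Monte Carlo alternative: draw samples from each class and count misclassifications
rng = np.random.default_rng(0)
x1 = rng.normal(4, 1, 200_000)           # samples from class w1
x2 = rng.normal(10, 1, 200_000)          # samples from class w2
print(0.5 * np.mean(x1 >= 7) + 0.5 * np.mean(x2 < 7))   # estimate of P[error]
```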
6 How good is the LRT decision rule?
- To answer this question, it is convenient to express P[error] in terms of the posterior P[error|x]:
  $$P[error] = \int_{-\infty}^{\infty} P[error|x]\, p(x)\, dx$$
- The optimal decision rule will minimize $P[error|x]$ at every value of x in feature space, so that the integral above is minimized
7 At each x, $P[error|x]$ is equal to $P[\omega_i|x]$ when we choose $\omega_j$
- This is illustrated in the figure below, which compares the decision regions of the LRT ($R_{1,LRT}$, $R_{2,LRT}$) against those of an alternative rule ($R_{1,ALT}$, $R_{2,ALT}$)
- [Figure: the posteriors $P(\omega_1|x)$ and $P(\omega_2|x)$, with $P[error|x']$ marked for the ALT and LRT decision rules]
- From the figure it becomes clear that, for any value of x', the LRT will always have a lower $P[error|x']$
- Therefore, when we integrate over the real line, the LRT decision rule will yield a lower P[error]
- For any given problem, the minimum probability of error is achieved by the LRT decision rule; this probability of error is called the Bayes Error Rate and is the best any classifier can do (the sketch below illustrates this for the earlier Gaussian example)
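The optimality claim can be checked empirically for the earlier example: sweeping the decision threshold and evaluating P[error] for each candidate rule never beats the LRT threshold at x = 7. This is an illustrative sketch under those assumptions, not part of the original slides.

```python
import numpy as np
from scipy.stats import norm

def p_error(threshold, prior1=0.5, prior2=0.5):
    """P[error] for the rule 'choose w1 if x < threshold' in the N(4,1) / N(10,1) example."""
    eps1 = 1 - norm.cdf(threshold, loc=4, scale=1)
    eps2 = norm.cdf(threshold, loc=10, scale=1)
    return prior1 * eps1 + prior2 * eps2

bayes_error = p_error(7.0)                       # the LRT (Bayes) threshold
for t in np.linspace(4.0, 10.0, 13):
    assert p_error(t) >= bayes_error - 1e-12     # no other threshold does better
print(bayes_error)
```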
8 Bayes risk
- So far we have assumed that the penalty of misclassifying $x \in \omega_1$ as $\omega_2$ is the same as that of the converse error
  - In general, this is not the case
  - For example, misclassifying a cancer sufferer as a healthy patient is a much more serious problem than the other way around
- This concept can be formalized in terms of a cost function $C_{ij}$
  - $C_{ij}$ represents the cost of choosing class $\omega_i$ when $\omega_j$ is the true class
- We define the Bayes Risk as the expected value of the cost
  $$\mathcal{R} = E[C] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\, P[\text{choose } \omega_i \text{ and } x \in \omega_j] = \sum_{i=1}^{2}\sum_{j=1}^{2} C_{ij}\, P[x \in R_i|\omega_j]\, P[\omega_j]$$
9 What is the decision rule that minimizes the Bayes Risk?
- First notice that
  $$P[x \in R_i|\omega_j] = \int_{R_i} p(x|\omega_j)\,dx$$
- We can express the Bayes Risk as
  $$\mathcal{R} = \int_{R_1} \left[C_{11} P(\omega_1) p(x|\omega_1) + C_{12} P(\omega_2) p(x|\omega_2)\right] dx + \int_{R_2} \left[C_{21} P(\omega_1) p(x|\omega_1) + C_{22} P(\omega_2) p(x|\omega_2)\right] dx$$
- Then we note that, for either likelihood, one can write
  $$\int_{R_1} p(x|\omega_i)\,dx + \int_{R_2} p(x|\omega_i)\,dx = \int_{-\infty}^{\infty} p(x|\omega_i)\,dx = 1$$
10 Merging the last equation into the Bayes Risk expression (each integral over $R_2$ becomes one minus the corresponding integral over $R_1$, and the $R_2$ terms cancel) yields
  $$\mathcal{R} = C_{21} P(\omega_1) + C_{22} P(\omega_2) + \int_{R_1} \left[(C_{12}-C_{22}) P(\omega_2) p(x|\omega_2) - (C_{21}-C_{11}) P(\omega_1) p(x|\omega_1)\right] dx$$
- The first two terms are constant w.r.t. $R_1$, so they can be ignored
- Thus, we seek a decision region $R_1$ that minimizes
  $$R_1 = \arg\min_{R_1} \int_{R_1} \left[(C_{12}-C_{22}) P(\omega_2) p(x|\omega_2) - (C_{21}-C_{11}) P(\omega_1) p(x|\omega_1)\right] dx = \arg\min_{R_1} \int_{R_1} g(x)\,dx$$
11 Let's forget about the actual expression of g(x) to develop some intuition for what kind of decision region we are looking for
- Intuitively, we will select for $R_1$ those regions that minimize $\int_{R_1} g(x)\,dx$; in other words, those regions where $g(x) < 0$
- [Figure: a curve g(x) with candidate regions A, B, C; only the regions where g(x) < 0 are included in $R_1$]
- So we will choose $R_1$ such that
  $$(C_{21}-C_{11})\, P(\omega_1)\, p(x|\omega_1) > (C_{12}-C_{22})\, P(\omega_2)\, p(x|\omega_2)$$
- And rearranging
  $$\frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12}-C_{22})\, P(\omega_2)}{(C_{21}-C_{11})\, P(\omega_1)}$$
- Therefore, minimization of the Bayes Risk also leads to an LRT
12 The Bayes risk: an example
- Consider a problem with likelihoods $p(x|\omega_1) = N(0,3)$ and $p(x|\omega_2) = N(2,1)$
  - Sketch the two densities
  - What is the likelihood ratio?
  - Assume $P(\omega_1) = P(\omega_2)$, $C_{ii} = 0$, $C_{12} = 1$ and $C_{21} = 3^{1/2}$
  - Determine the decision rule that minimizes the Bayes risk
- Solution
  - The likelihood ratio and Bayes threshold are
    $$\Lambda(x) = \frac{\frac{1}{\sqrt{2\pi \cdot 3}}\, e^{-\frac{x^2}{6}}}{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-2)^2}{2}}} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12}-C_{22})\, P(\omega_2)}{(C_{21}-C_{11})\, P(\omega_1)} = \frac{1}{\sqrt{3}}$$
  - Taking logs and simplifying yields the quadratic
    $$2x^2 - 12x + 12 \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 0 \quad\Rightarrow\quad x = 3 \pm \sqrt{3} \approx 1.27,\ 4.73$$
  - so we choose $\omega_1$ for $x < 1.27$ or $x > 4.73$, and $\omega_2$ in between (see the numerical sketch below)
- [Figure: the likelihoods N(0,3) and N(2,1), with $R_2$ the region between the two boundary points and $R_1$ the region outside]
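A quick numerical check of this example, assuming SciPy; the variable names are illustrative. It confirms that the boundary points are the roots of 2x^2 - 12x + 12 and that the likelihood ratio equals the Bayes threshold 1/sqrt(3) there.

```python
import numpy as np
from scipy.stats import norm

# Likelihoods from the example: p(x|w1) = N(0, 3) (variance 3), p(x|w2) = N(2, 1)
def likelihood_ratio(x):
    return norm.pdf(x, loc=0, scale=np.sqrt(3)) / norm.pdf(x, loc=2, scale=1)

# Bayes threshold (C12 - C22) P(w2) / ((C21 - C11) P(w1)) with C12 = 1, C21 = sqrt(3), equal priors
threshold = 1 / np.sqrt(3)

roots = np.sort(np.roots([2, -12, 12]))        # boundary points of the quadratic
print(roots)                                   # -> approximately [1.27, 4.73]
print(likelihood_ratio(roots), threshold)      # the ratio equals 1/sqrt(3) at both boundaries
```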
13 LRT variations
- Bayes criterion
  - This is the LRT that minimizes the Bayes risk:
    $$\Lambda_{Bayes}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{(C_{12}-C_{22})\, P(\omega_2)}{(C_{21}-C_{11})\, P(\omega_1)}$$
- Maximum A Posteriori criterion
  - Sometimes we may be interested in minimizing P[error]; this is a special case of $\Lambda_{Bayes}(x)$ that uses a zero-one cost function
    $$C_{ij} = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$
  - Known as the MAP criterion, since it seeks to maximize the posterior $P(\omega_i|x)$:
    $$\Lambda_{MAP}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P(\omega_2)}{P(\omega_1)} \;\Leftrightarrow\; P(\omega_1|x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P(\omega_2|x)$$
- Maximum Likelihood criterion
  - For equal priors $P[\omega_i] = 1/2$ and a 0/1 loss function, the LRT is known as the ML criterion, since it seeks to maximize the likelihood $p(x|\omega_i)$:
    $$\Lambda_{ML}(x) = \frac{p(x|\omega_1)}{p(x|\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} 1$$
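The three criteria differ only in the threshold applied to the same likelihood ratio, which a short sketch can make explicit; these helper functions are illustrative and not from the slides.

```python
def bayes_decide(p1, p2, P1, P2, C11, C12, C21, C22):
    """Generic two-class LRT: p1, p2 are the likelihoods p(x|w1), p(x|w2) evaluated at x."""
    threshold = (C12 - C22) * P2 / ((C21 - C11) * P1)
    return 1 if p1 / p2 > threshold else 2

def map_decide(p1, p2, P1, P2):
    # Zero-one costs: the threshold becomes P(w2)/P(w1), i.e. pick the largest posterior
    return bayes_decide(p1, p2, P1, P2, C11=0, C12=1, C21=1, C22=0)

def ml_decide(p1, p2):
    # Equal priors and zero-one costs: the threshold is 1, i.e. pick the largest likelihood
    return bayes_decide(p1, p2, P1=0.5, P2=0.5, C11=0, C12=1, C21=1, C22=0)
```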
14 Two more decision rules are commonly cited in the literature
- The Neyman-Pearson Criterion, used in Detection and Estimation Theory, also leads to an LRT: it fixes one of the class error probabilities, say $\varepsilon_1 \leq \alpha$, and seeks to minimize the other
  - For instance, for the sea-bass/salmon classification problem of L1, there may be some kind of government regulation saying that we must not misclassify more than 1% of salmon as sea bass
  - The Neyman-Pearson Criterion is very attractive since it requires knowledge of neither the priors nor the cost function
- The Minimax Criterion, used in Game Theory, is derived from the Bayes criterion, and seeks to minimize the maximum Bayes Risk
  - The Minimax Criterion does not require knowledge of the priors, but it needs a cost function
- For more information on these methods, refer to "Detection, Estimation and Modulation Theory" by H. L. Van Trees
15 Minimum P[error] for multi-class problems
- Minimizing P[error] generalizes well to multiple classes
- For clarity in the derivation, we express P[error] in terms of the probability of making a correct assignment: $P[error] = 1 - P[correct]$
- The probability of making a correct assignment is
  $$P[correct] = \sum_{i=1}^{C} P[\omega_i] \int_{R_i} p(x|\omega_i)\,dx$$
- Minimizing P[error] is equivalent to maximizing P[correct], so expressing the latter in terms of posteriors
  $$P[correct] = \sum_{i=1}^{C} \int_{R_i} p(x)\, P(\omega_i|x)\,dx$$
- To maximize P[correct], we must maximize each integral over $R_i$, which we achieve by choosing the class with the largest posterior
- So each $R_i$ is the region where $P(\omega_i|x)$ is maximum, and the decision rule that minimizes P[error] is the MAP criterion
- [Figure: the posteriors $P(\omega_1|x)$, $P(\omega_2|x)$, $P(\omega_3|x)$ and the decision regions $R_1$, $R_2$, $R_3$ where each posterior is largest]
16 Minimum Bayes risk for multi-class problems
- Minimizing the Bayes risk also generalizes well; as before, we use a slightly different formulation
  - We denote by $\alpha_i$ the decision to choose class $\omega_i$
  - We denote by $\alpha(x)$ the overall decision rule that maps feature vectors x into classes $\omega_i$: $\alpha(x) \in \{\alpha_1, \alpha_2, \ldots, \alpha_C\}$
- The (conditional) risk $R(\alpha_i|x)$ of assigning x to class $\omega_i$ is
  $$R(\alpha_i|x) = \sum_{j=1}^{C} C_{ij}\, P(\omega_j|x)$$
- And the Bayes Risk associated with decision rule $\alpha(x)$ is
  $$\mathcal{R}_{\alpha(x)} = \int R(\alpha(x)|x)\, p(x)\, dx$$
- To minimize this expression, we must minimize the conditional risk $R(\alpha(x)|x)$ at each x, which is equivalent to choosing the $\omega_i$ for which $R(\alpha_i|x)$ is minimum (see the sketch below)
- [Figure: the conditional risks $R(\alpha_1|x)$, $R(\alpha_2|x)$, $R(\alpha_3|x)$ and the decision regions $R_1$, $R_2$, $R_3$ where each risk is smallest]
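A minimal sketch of the multi-class minimum-risk rule, assuming the posteriors P(ω_j|x) are already available as a vector; with a zero-one cost matrix it reduces to the MAP rule of the previous slide. The example numbers are made up.

```python
import numpy as np

def min_risk_decide(posteriors, costs):
    """Pick the action a_i minimizing the conditional risk R(a_i|x) = sum_j C[i, j] P(w_j|x)."""
    conditional_risks = np.asarray(costs) @ np.asarray(posteriors)
    return int(np.argmin(conditional_risks))

posteriors = [0.2, 0.5, 0.3]          # example posterior vector at some x
zero_one_costs = 1 - np.eye(3)        # C_ij = 0 if i == j, 1 otherwise
print(min_risk_decide(posteriors, zero_one_costs))   # -> 1, the class with the largest posterior
```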
17 Discriminant functions
- All the decision rules shown in L4 have the same structure: at each point x in feature space, choose the class $\omega_i$ that maximizes (or minimizes) some measure $g_i(x)$
- This structure can be formalized with a set of discriminant functions $g_i(x)$, $i = 1 \ldots C$, and the decision rule
  $$\text{assign } x \text{ to class } \omega_i \text{ if } g_i(x) > g_j(x) \;\; \forall j \neq i$$
- Therefore, we can visualize the decision rule as a network that computes C discriminant functions and selects the class with the highest discriminant
- [Figure: a network with feature inputs $x_1 \ldots x_d$, discriminant functions $g_1(x) \ldots g_C(x)$ (parameterized by the costs), and a "select max" stage producing the class assignment]
- And the three decision rules can be summarized as

  Criterion | Discriminant function
  Bayes     | $g_i(x) = -R(\alpha_i|x)$
  MAP       | $g_i(x) = P(\omega_i|x)$
  ML        | $g_i(x) = p(x|\omega_i)$
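A minimal sketch of the discriminant-function view, using illustrative ML discriminants g_i(x) = p(x|ω_i) for three univariate Gaussian classes; the class means are assumptions chosen only for the example.

```python
import numpy as np
from scipy.stats import norm

def classify(x, discriminants):
    """Assign x to the class whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants]))

# Illustrative ML discriminants g_i(x) = p(x|w_i) for classes centered at 0, 3 and 6
ml_discriminants = [lambda x, m=m: norm.pdf(x, loc=m, scale=1.0) for m in (0.0, 3.0, 6.0)]
print(classify(4.2, ml_discriminants))   # -> 1, the class centered at 3
```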