Logistic Regression. Jia Li. Department of Statistics, The Pennsylvania State University.


Logistic Regression

Preserve linear classification boundaries. By the Bayes rule:

    \hat{G}(x) = \arg\max_k \Pr(G = k \mid X = x).

The decision boundary between classes k and l is determined by the equation:

    \Pr(G = k \mid X = x) = \Pr(G = l \mid X = x).

Divide both sides by \Pr(G = l \mid X = x) and take the log. The above equation is equivalent to

    \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = 0.
Since we enforce a linear boundary, we can assume

    \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = a_0^{(k,l)} + \sum_{j=1}^{p} a_j^{(k,l)} x_j.

For logistic regression, there are restrictive relations between the a^{(k,l)} for different pairs (k, l).
Assumptions

    \log \frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{10} + \beta_1^T x
    \log \frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{20} + \beta_2^T x
        \vdots
    \log \frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{(K-1)0} + \beta_{K-1}^T x
For any pair (k, l):

    \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \beta_{k0} - \beta_{l0} + (\beta_k - \beta_l)^T x.

Number of parameters: (K-1)(p+1). Denote the entire parameter set by

    \theta = \{\beta_{10}, \beta_1, \beta_{20}, \beta_2, \ldots, \beta_{(K-1)0}, \beta_{K-1}\}.

The log ratios of posterior probabilities are called log-odds or logit transformations.
Under the assumptions, the posterior probabilities are given by:

    \Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}   for k = 1, \ldots, K-1,

    \Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}.

For \Pr(G = k \mid X = x) given above, obviously the probabilities sum to 1:

    \sum_{k=1}^{K} \Pr(G = k \mid X = x) = 1.

A simple calculation shows that the assumptions are satisfied.
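The posterior formulas above can be sketched directly in code. A minimal standard-library version, with hypothetical parameter values chosen only for illustration; `posteriors` and its arguments are names introduced here, not from the slides:

```python
import math

def posteriors(intercepts, coefs, x):
    """Class posteriors under the logistic model.

    intercepts[k-1] and coefs[k-1] hold beta_{k0} and beta_k for
    k = 1, ..., K-1; class K is the reference class whose linear
    score is fixed at 0.
    """
    scores = [b0 + sum(bj * xj for bj, xj in zip(b, x))
              for b0, b in zip(intercepts, coefs)]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]  # classes 1..K-1
    probs.append(1.0 / denom)                      # reference class K
    return probs

# Hypothetical K = 3, p = 2 example: every probability lies in (0, 1)
# and the K probabilities sum to 1, as the slide claims.
p = posteriors([0.5, -1.0], [[1.0, -2.0], [0.3, 0.7]], [0.2, 0.4])
assert abs(sum(p) - 1.0) < 1e-12
```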
Comparison with Linear Regression on Indicators

Similarities:
- Both attempt to estimate \Pr(G = k \mid X = x).
- Both have linear classification boundaries.

Differences:
- Linear regression on the indicator matrix approximates \Pr(G = k \mid X = x) by a linear function of x. The estimates are not guaranteed to fall between 0 and 1 or to sum to 1.
- Under logistic regression, \Pr(G = k \mid X = x) is a nonlinear function of x. It is guaranteed to lie between 0 and 1 and to sum to 1.
Fitting Logistic Regression Models

Criterion: find the parameters that maximize the conditional likelihood of G given X using the training data. Denote p_k(x_i; \theta) = \Pr(G = k \mid X = x_i; \theta).

Given the first input x_1, the posterior probability of its class being g_1 is \Pr(G = g_1 \mid X = x_1). Since samples in the training data set are independent, the posterior probability for the N samples each having class g_i, i = 1, 2, \ldots, N, given their inputs x_1, x_2, \ldots, x_N, is:

    \prod_{i=1}^{N} \Pr(G = g_i \mid X = x_i).
The conditional log-likelihood of the class labels in the training data set is

    L(\theta) = \sum_{i=1}^{N} \log \Pr(G = g_i \mid X = x_i) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta).
Binary Classification

For binary classification, if g_i = 1, denote y_i = 1; if g_i = 2, denote y_i = 0. Let p_1(x; \theta) = p(x; \theta); then p_2(x; \theta) = 1 - p_1(x; \theta) = 1 - p(x; \theta). Since K = 2, the parameters are \theta = \{\beta_{10}, \beta_1\}. We denote \beta = (\beta_{10}, \beta_1)^T.
If y_i = 1, i.e., g_i = 1,

    \log p_{g_i}(x; \beta) = \log p_1(x; \beta) = y_i \log p(x; \beta).

If y_i = 0, i.e., g_i = 2,

    \log p_{g_i}(x; \beta) = \log p_2(x; \beta) = \log(1 - p(x; \beta)) = (1 - y_i) \log(1 - p(x; \beta)).

Since either y_i = 0 or 1 - y_i = 0, we have

    \log p_{g_i}(x; \beta) = y_i \log p(x; \beta) + (1 - y_i) \log(1 - p(x; \beta)).
The conditional log-likelihood is

    L(\beta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \beta)
             = \sum_{i=1}^{N} [y_i \log p(x_i; \beta) + (1 - y_i) \log(1 - p(x_i; \beta))].

There are p + 1 parameters in \beta. Assume a column vector form:

    \beta = (\beta_{10}, \beta_{11}, \beta_{12}, \ldots, \beta_{1p})^T.
Here we add the constant term 1 to x to accommodate the intercept:

    x = (1, x_1, x_2, \ldots, x_p)^T.
By the assumption of the logistic regression model:

    p(x; \beta) = \Pr(G = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)},

    1 - p(x; \beta) = \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}.

Substituting the above into L(\beta):

    L(\beta) = \sum_{i=1}^{N} \left[ y_i \beta^T x_i - \log(1 + e^{\beta^T x_i}) \right].
To maximize L(\beta), we set the first-order partial derivatives of L(\beta) to zero:

    \frac{\partial L(\beta)}{\partial \beta_{1j}}
        = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N} \frac{x_{ij} e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}
        = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N} p(x_i; \beta) x_{ij}
        = \sum_{i=1}^{N} x_{ij} (y_i - p(x_i; \beta)) = 0

for all j = 0, 1, \ldots, p.
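The log-likelihood and its gradient above can be cross-checked numerically. A small standard-library sketch with a tiny hypothetical data set (the data and tolerances are illustrative, not from the slides); each analytic partial derivative is compared against a central finite difference:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(beta, X, y):
    # L(beta) = sum_i [ y_i beta^T x_i - log(1 + exp(beta^T x_i)) ]
    total = 0.0
    for xi, yi in zip(X, y):
        t = sum(b * v for b, v in zip(beta, xi))
        total += yi * t - math.log(1.0 + math.exp(t))
    return total

def gradient(beta, X, y):
    # dL/dbeta_j = sum_i x_ij (y_i - p(x_i; beta))
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j, v in enumerate(xi):
            g[j] += v * r
    return g

# Tiny hypothetical data set; the first entry of each x is the constant 1.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [1, 0, 1, 0]
beta = [0.3, -0.4]
g = gradient(beta, X, y)

# Central finite-difference check of each partial derivative.
h = 1e-6
for j in range(len(beta)):
    bp = list(beta); bp[j] += h
    bm = list(beta); bm[j] -= h
    approx = (log_likelihood(bp, X, y) - log_likelihood(bm, X, y)) / (2 * h)
    assert abs(approx - g[j]) < 1e-6
```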
In matrix form, we write

    \frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i (y_i - p(x_i; \beta)).

To solve the set of p + 1 nonlinear equations \partial L(\beta) / \partial \beta_{1j} = 0, j = 0, 1, \ldots, p, use the Newton-Raphson algorithm. The Newton-Raphson algorithm requires the second derivatives, i.e., the Hessian matrix:

    \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -\sum_{i=1}^{N} x_i x_i^T \, p(x_i; \beta)(1 - p(x_i; \beta)).
The element in the j-th row and n-th column (counting from 0) is

    \frac{\partial^2 L(\beta)}{\partial \beta_{1j} \, \partial \beta_{1n}}
        = -\sum_{i=1}^{N} \frac{(1 + e^{\beta^T x_i}) e^{\beta^T x_i} x_{ij} x_{in} - (e^{\beta^T x_i})^2 x_{ij} x_{in}}{(1 + e^{\beta^T x_i})^2}
        = -\sum_{i=1}^{N} \left( x_{ij} x_{in} p(x_i; \beta) - x_{ij} x_{in} p(x_i; \beta)^2 \right)
        = -\sum_{i=1}^{N} x_{ij} x_{in} \, p(x_i; \beta)(1 - p(x_i; \beta)).
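The Hessian elements above can also be checked against finite differences of the gradient. A standard-library sketch with illustrative data (function names and tolerances are introduced here, not from the slides):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_j(beta, X, y, j):
    # j-th partial: sum_i x_ij (y_i - p(x_i; beta))
    return sum(xi[j] * (yi - sigmoid(sum(b * v for b, v in zip(beta, xi))))
               for xi, yi in zip(X, y))

def hessian_jn(beta, X, j, n):
    # (j, n) element: -sum_i x_ij x_in p(x_i)(1 - p(x_i)); note it is free of y.
    total = 0.0
    for xi in X:
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        total -= xi[j] * xi[n] * p * (1.0 - p)
    return total

X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
beta = [0.2, -0.1]
h = 1e-6
for j in range(2):
    for n in range(2):
        bp = list(beta); bp[n] += h
        bm = list(beta); bm[n] -= h
        approx = (grad_j(bp, X, y, j) - grad_j(bm, X, y, j)) / (2 * h)
        assert abs(approx - hessian_jn(beta, X, j, n)) < 1e-6
```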
Starting with \beta^{old}, a single Newton-Raphson update is

    \beta^{new} = \beta^{old} - \left( \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} \right)^{-1} \frac{\partial L(\beta)}{\partial \beta},

where the derivatives are evaluated at \beta^{old}.
The iteration can be expressed compactly in matrix form.

- Let y be the column vector of y_i.
- Let X be the N × (p+1) input matrix.
- Let p be the N-vector of fitted probabilities with i-th element p(x_i; \beta^{old}).
- Let W be an N × N diagonal matrix of weights with i-th diagonal element p(x_i; \beta^{old})(1 - p(x_i; \beta^{old})).

Then:

    \frac{\partial L(\beta)}{\partial \beta} = X^T (y - p),
    \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -X^T W X.
The Newton-Raphson step is

    \beta^{new} = \beta^{old} + (X^T W X)^{-1} X^T (y - p)
                = (X^T W X)^{-1} X^T W (X \beta^{old} + W^{-1}(y - p))
                = (X^T W X)^{-1} X^T W z,

where z = X \beta^{old} + W^{-1}(y - p). If z is viewed as a response and X as the input matrix, \beta^{new} is the solution to a weighted least squares problem:

    \beta^{new} \leftarrow \arg\min_\beta (z - X\beta)^T W (z - X\beta).

Recall that linear regression by least squares solves

    \arg\min_\beta (z - X\beta)^T (z - X\beta).

z is referred to as the adjusted response, and the algorithm is referred to as iteratively reweighted least squares, or IRLS.
Pseudo Code

1. \beta \leftarrow 0.
2. Compute y by setting its elements to y_i = 1 if g_i = 1, and y_i = 0 if g_i = 2, for i = 1, 2, \ldots, N.
3. Compute p by setting its elements to p(x_i; \beta) = e^{\beta^T x_i} / (1 + e^{\beta^T x_i}), i = 1, 2, \ldots, N.
4. Compute the diagonal matrix W; its i-th diagonal element is p(x_i; \beta)(1 - p(x_i; \beta)), i = 1, 2, \ldots, N.
5. z \leftarrow X\beta + W^{-1}(y - p).
6. \beta \leftarrow (X^T W X)^{-1} X^T W z.
7. If the stopping criterion is met, stop; otherwise go back to step 3.
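The pseudo-code above can be sketched in pure Python (standard library only; the dense solver and the fixed iteration count are illustrative choices, not from the slides). Steps 5 and 6 are folded into the algebraically equivalent Newton update \beta \leftarrow \beta + (X^T W X)^{-1} X^T (y - p):

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def irls(X, y, iters=25):
    """Fit binary logistic regression by Newton-Raphson / IRLS.

    X: list of rows, each starting with the constant 1; y: 0/1 labels.
    A fixed iteration count stands in for the stopping criterion.
    """
    d = len(X[0])
    beta = [0.0] * d                               # step 1: beta <- 0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
             for xi in X]                          # step 3: fitted probabilities
        w = [pi * (1.0 - pi) for pi in p]          # step 4: diagonal of W
        XtWX = [[sum(wi * xi[j] * xi[n] for wi, xi in zip(w, X))
                 for n in range(d)] for j in range(d)]
        Xtr = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p))
               for j in range(d)]                  # X^T (y - p)
        step = solve(XtWX, Xtr)                    # steps 5-6 combined
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

On linearly separable data the maximum-likelihood estimate does not exist and the iterates diverge, so a real implementation needs a convergence test and step-size safeguards.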
Computational Efficiency

Since W is an N × N diagonal matrix, direct matrix operations with it may be very inefficient. A modified pseudo code is provided next.
1. \beta \leftarrow 0.
2. Compute y by setting its elements to y_i = 1 if g_i = 1, and y_i = 0 if g_i = 2, for i = 1, 2, \ldots, N.
3. Compute p by setting its elements to p(x_i; \beta) = e^{\beta^T x_i} / (1 + e^{\beta^T x_i}), i = 1, 2, \ldots, N.
4. Compute the N × (p+1) matrix \tilde{X} by multiplying the i-th row of X by p(x_i; \beta)(1 - p(x_i; \beta)), i = 1, 2, \ldots, N:

    X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{pmatrix}, \qquad
    \tilde{X} = \begin{pmatrix} p(x_1; \beta)(1 - p(x_1; \beta)) x_1^T \\ p(x_2; \beta)(1 - p(x_2; \beta)) x_2^T \\ \vdots \\ p(x_N; \beta)(1 - p(x_N; \beta)) x_N^T \end{pmatrix}.

5. \beta \leftarrow \beta + (X^T \tilde{X})^{-1} X^T (y - p).
6. If the stopping criterion is met, stop; otherwise go back to step 3.
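The point of the modification is that X^T W X equals X^T \tilde{X} when \tilde{X} is X with its rows rescaled, so the N × N matrix W never needs to be formed. A small standard-library sketch comparing the two computations (both helper functions and the example weights are hypothetical):

```python
def xtwx_dense(X, w):
    # Form W as a full N x N diagonal matrix, then compute X^T W X (inefficient).
    n, d = len(X), len(X[0])
    W = [[w[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    WX = [[sum(W[i][k] * X[k][j] for k in range(n)) for j in range(d)]
          for i in range(n)]
    return [[sum(X[i][j] * WX[i][m] for i in range(n)) for m in range(d)]
            for j in range(d)]

def xtwx_scaled(X, w):
    # Scale row i of X by w_i to get X~, then X^T X~; no N x N matrix needed.
    d = len(X[0])
    Xt = [[wi * v for v in xi] for wi, xi in zip(w, X)]
    return [[sum(xi[j] * xti[m] for xi, xti in zip(X, Xt)) for m in range(d)]
            for j in range(d)]

X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
w = [0.25, 0.21, 0.09]   # hypothetical p_i (1 - p_i) values
A, B = xtwx_dense(X, w), xtwx_scaled(X, w)
assert all(abs(A[j][m] - B[j][m]) < 1e-12 for j in range(2) for m in range(2))
```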
Example: Diabetes Data Set

The input X is two-dimensional. X_1 and X_2 are the two principal components of the original 8 variables. Class 1: without diabetes; Class 2: with diabetes. Applying logistic regression, we obtain

    \hat{\beta} = (0.7679, \; , \; )^T.
The posterior probabilities are:

    \Pr(G = 1 \mid X = x) = \frac{e^{\hat{\beta}^T x}}{1 + e^{\hat{\beta}^T x}}, \qquad
    \Pr(G = 2 \mid X = x) = \frac{1}{1 + e^{\hat{\beta}^T x}}.

The classification rule is:

    \hat{G}(x) = 1 if \hat{\beta}^T x \geq 0, and \hat{G}(x) = 2 if \hat{\beta}^T x < 0.
Solid line: decision boundary obtained by logistic regression. Dashed line: decision boundary obtained by LDA. Within the training data set, the classification error rate is 28.12%; sensitivity: 45.9%; specificity: 85.8%.
Multiclass Case (K ≥ 3)

When K ≥ 3, \beta is a (K-1)(p+1)-vector:

    \beta = \begin{pmatrix} \beta_{10} \\ \beta_1 \\ \beta_{20} \\ \beta_2 \\ \vdots \\ \beta_{(K-1)0} \\ \beta_{K-1} \end{pmatrix}
          = \begin{pmatrix} \beta_{10} \\ \beta_{11} \\ \vdots \\ \beta_{1p} \\ \beta_{20} \\ \vdots \\ \beta_{2p} \\ \vdots \\ \beta_{(K-1)0} \\ \vdots \\ \beta_{(K-1)p} \end{pmatrix}.
Let \bar{\beta}_l = (\beta_{l0}, \beta_l^T)^T, with the convention \bar{\beta}_K = 0 for the reference class K. The likelihood function becomes

    L(\beta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \beta)
             = \sum_{i=1}^{N} \log \frac{e^{\bar{\beta}_{g_i}^T x_i}}{1 + \sum_{l=1}^{K-1} e^{\bar{\beta}_l^T x_i}}
             = \sum_{i=1}^{N} \left[ \bar{\beta}_{g_i}^T x_i - \log\left(1 + \sum_{l=1}^{K-1} e^{\bar{\beta}_l^T x_i}\right) \right].
Note: the indicator function I(\cdot) equals 1 when its argument is true and 0 otherwise. First-order derivatives:

    \frac{\partial L(\beta)}{\partial \beta_{kj}}
        = \sum_{i=1}^{N} \left[ I(g_i = k) x_{ij} - \frac{e^{\bar{\beta}_k^T x_i} x_{ij}}{1 + \sum_{l=1}^{K-1} e^{\bar{\beta}_l^T x_i}} \right]
        = \sum_{i=1}^{N} x_{ij} \left( I(g_i = k) - p_k(x_i; \beta) \right).
Second-order derivatives:

    \frac{\partial^2 L(\beta)}{\partial \beta_{kj} \, \partial \beta_{mn}}
        = \sum_{i=1}^{N} x_{ij} x_{in} \left( -p_k(x_i; \beta) I(k = m) + p_k(x_i; \beta) p_m(x_i; \beta) \right)
        = -\sum_{i=1}^{N} x_{ij} x_{in} \, p_k(x_i; \beta) \left[ I(k = m) - p_m(x_i; \beta) \right].
Matrix form. y is the concatenated indicator vector of dimension N(K-1):

    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{K-1} \end{pmatrix},  where  y_k = \begin{pmatrix} I(g_1 = k) \\ I(g_2 = k) \\ \vdots \\ I(g_N = k) \end{pmatrix},  1 \leq k \leq K-1.

p is the concatenated vector of fitted probabilities of dimension N(K-1):

    p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_{K-1} \end{pmatrix},  where  p_k = \begin{pmatrix} p_k(x_1; \beta) \\ p_k(x_2; \beta) \\ \vdots \\ p_k(x_N; \beta) \end{pmatrix},  1 \leq k \leq K-1.
\tilde{X} is an N(K-1) × (p+1)(K-1) block-diagonal matrix with K-1 copies of X on the diagonal:

    \tilde{X} = \begin{pmatrix} X & & & \\ & X & & \\ & & \ddots & \\ & & & X \end{pmatrix}.
The matrix W is an N(K-1) × N(K-1) square matrix:

    W = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1(K-1)} \\ W_{21} & W_{22} & \cdots & W_{2(K-1)} \\ \vdots & & & \vdots \\ W_{(K-1)1} & W_{(K-1)2} & \cdots & W_{(K-1)(K-1)} \end{pmatrix}.

Each submatrix W_{km}, 1 \leq k, m \leq K-1, is an N × N diagonal matrix. When k = m, the i-th diagonal element of W_{kk} is p_k(x_i; \beta^{old})(1 - p_k(x_i; \beta^{old})). When k \neq m, the i-th diagonal element of W_{km} is -p_k(x_i; \beta^{old}) p_m(x_i; \beta^{old}), matching the second-order derivatives above.
As in binary classification,

    \frac{\partial L(\beta)}{\partial \beta} = \tilde{X}^T (y - p),
    \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -\tilde{X}^T W \tilde{X}.

The formula for updating \beta^{new} in the binary classification case carries over to the multiclass case:

    \beta^{new} = (\tilde{X}^T W \tilde{X})^{-1} \tilde{X}^T W z,  where  z = \tilde{X} \beta^{old} + W^{-1}(y - p),

or simply:

    \beta^{new} = \beta^{old} + (\tilde{X}^T W \tilde{X})^{-1} \tilde{X}^T (y - p).
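The multiclass gradient \partial L / \partial \beta_{kj} = \sum_i x_{ij}(I(g_i = k) - p_k(x_i; \beta)) can be verified numerically in the same way as the binary case. A standard-library sketch with hypothetical K = 3 data, using class K as the reference class with linear score 0 (all names and values here are illustrative):

```python
import math

def scores(betas, xi):
    # Linear scores for classes 1..K-1; class K is the reference with score 0.
    return [sum(b * v for b, v in zip(bk, xi)) for bk in betas]

def log_lik(betas, X, g):
    # L = sum_i [ score_{g_i}(x_i) - log(1 + sum_l exp(score_l(x_i))) ]
    total = 0.0
    for xi, gi in zip(X, g):
        s = scores(betas, xi)
        total += s[gi - 1] if gi <= len(betas) else 0.0
        total -= math.log(1.0 + sum(math.exp(v) for v in s))
    return total

def grad(betas, X, g, k, j):
    # dL/dbeta_{kj} = sum_i x_ij (I(g_i = k) - p_k(x_i))
    total = 0.0
    for xi, gi in zip(X, g):
        s = scores(betas, xi)
        pk = math.exp(s[k - 1]) / (1.0 + sum(math.exp(v) for v in s))
        total += xi[j] * ((1.0 if gi == k else 0.0) - pk)
    return total

# Hypothetical K = 3, p = 1 data; the first coordinate is the constant 1.
X = [[1.0, 0.3], [1.0, -0.7], [1.0, 1.5], [1.0, 0.0]]
g = [1, 2, 3, 1]
betas = [[0.1, -0.2], [0.05, 0.3]]   # parameters for classes 1 and 2
h = 1e-6
for k in (1, 2):
    for j in (0, 1):
        bp = [row[:] for row in betas]; bp[k - 1][j] += h
        bm = [row[:] for row in betas]; bm[k - 1][j] -= h
        approx = (log_lik(bp, X, g) - log_lik(bm, X, g)) / (2 * h)
        assert abs(approx - grad(betas, X, g, k, j)) < 1e-6
```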
Computation Issues

Initialization: one option is to set \beta = 0. Convergence is not guaranteed, but usually occurs. The log-likelihood typically increases after each iteration, but overshooting can occur. In the rare cases where the log-likelihood decreases, cut the step size in half.
Connection with LDA

Under the model of LDA:

    \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)}
        = \log \frac{\pi_k}{\pi_K} - \frac{1}{2} (\mu_k + \mu_K)^T \Sigma^{-1} (\mu_k - \mu_K) + x^T \Sigma^{-1} (\mu_k - \mu_K)
        = a_{k0} + a_k^T x.

The model of LDA therefore satisfies the assumption of the linear logistic model. The linear logistic model only specifies the conditional distribution \Pr(G = k \mid X = x); no assumption is made about \Pr(X).
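The linearity of the LDA log-odds can be checked numerically in one dimension, where \Sigma reduces to a shared variance. A standard-library sketch with hypothetical parameter values: the log posterior ratio computed directly from the Gaussian densities should agree with the closed-form intercept and slope from the formula above at every x:

```python
import math

def gauss(x, mu, var):
    # Gaussian density with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical 1-D LDA model: two classes with a shared variance.
pi1, pi2, mu1, mu2, var = 0.4, 0.6, -1.0, 2.0, 1.5

# Closed-form intercept and slope from the slide's formula (K = 2, k = 1).
a0 = math.log(pi1 / pi2) - 0.5 * (mu1 + mu2) * (mu1 - mu2) / var
a1 = (mu1 - mu2) / var

for x in (-2.0, 0.0, 0.5, 3.0):
    log_odds = math.log((pi1 * gauss(x, mu1, var)) /
                        (pi2 * gauss(x, mu2, var)))
    assert abs(log_odds - (a0 + a1 * x)) < 1e-10
```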
The LDA model specifies the joint distribution of X and G. \Pr(X) is a mixture of Gaussians:

    \Pr(X) = \sum_{k=1}^{K} \pi_k \, \phi(X; \mu_k, \Sigma),

where \phi is the Gaussian density function. Linear logistic regression maximizes the conditional likelihood of G given X, \Pr(G = k \mid X = x). LDA maximizes the joint likelihood of G and X, \Pr(X = x, G = k).
If the additional assumption made by LDA is appropriate, LDA tends to estimate the parameters more efficiently, since it uses more information about the data; samples without class labels can also be used under the LDA model. On the other hand, LDA is not robust to gross outliers. Because logistic regression relies on fewer assumptions, it tends to be more robust. In practice, logistic regression and LDA often give similar results.
Simulation

Assume the input X is one-dimensional. The two classes have equal priors, and the class-conditional densities of X are shifted versions of each other. Each conditional density is a mixture of two normals:

- Class 1 (red): 0.6 N(-2, 1/4) + 0.4 N(0, 1).
- Class 2 (blue): 0.6 N(0, 1/4) + 0.4 N(2, 1).

The class-conditional densities are shown below.
[Figure: the two class-conditional densities.]
LDA Result

Training data set: 2000 samples per class. Test data set: 1000 samples per class. LDA estimates \hat{\mu}_1, \hat{\mu}_2, and \hat{\sigma}^2 from the training data; the boundary value between the two classes is (\hat{\mu}_1 + \hat{\mu}_2)/2, and the classification error rate is evaluated on the test data. Based on the true distribution, the Bayes (optimal) boundary value between the two classes and its error rate can be computed for comparison.
Logistic Regression Result

Linear logistic regression obtains \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^T. The boundary value satisfies \hat{\beta}_0 + \hat{\beta}_1 x = 0, hence equals -\hat{\beta}_0 / \hat{\beta}_1; the error rate is evaluated on the test data set. The estimated posterior probability is:

    \Pr(G = 1 \mid X = x) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x}}.
The estimated posterior probability \Pr(G = 1 \mid X = x) and its true value based on the true distribution are compared in the graph below.
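The simulation above can be reproduced in outline with the standard library alone. This is a sketch, not the slides' exact experiment: it uses a smaller sample (500 per class instead of 2000) and plain gradient ascent instead of IRLS, so the boundary estimates will differ from whatever the slides reported:

```python
import math
import random

random.seed(0)

def draw_class1():
    # Class 1 density: 0.6 N(-2, 1/4) + 0.4 N(0, 1); gauss takes the sd.
    return (random.gauss(-2.0, 0.5) if random.random() < 0.6
            else random.gauss(0.0, 1.0))

def draw_class2():
    # Class 2 density: 0.6 N(0, 1/4) + 0.4 N(2, 1).
    return (random.gauss(0.0, 0.5) if random.random() < 0.6
            else random.gauss(2.0, 1.0))

n = 500   # smaller than the 2000 per class on the slide, for speed
x1 = [draw_class1() for _ in range(n)]
x2 = [draw_class2() for _ in range(n)]

# LDA with equal priors: boundary at the midpoint of the estimated means.
lda_boundary = (sum(x1) / n + sum(x2) / n) / 2.0

# 1-D logistic regression (y = 1 for class 1) fit by plain gradient ascent.
xs = x1 + x2
ys = [1] * n + [0] * n
b0 = b1 = 0.0
rate = 0.5
for _ in range(500):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        r = y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += r
        g1 += r * x
    b0 += rate * g0 / len(xs)   # ascent on the average gradient
    b1 += rate * g1 / len(xs)
logistic_boundary = -b0 / b1    # where the fitted posterior equals 1/2
```

Because class 1 sits to the left, the fitted slope b1 is negative, and both boundary estimates land between the two class centers; the exact values depend on the random seed.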
More informationClassification by Pairwise Coupling
Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating
More informationL10: Probability, statistics, and estimation theory
L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian
More informationThe equivalence of logistic regression and maximum entropy models
The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.winvector.com/blog/20/09/thesimplerderivationoflogisticregression/
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationLinear Algebra Notes for Marsden and Tromba Vector Calculus
Linear Algebra Notes for Marsden and Tromba Vector Calculus ndimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of
More informationLecture 8 February 4
ICS273A: Machine Learning Winter 2008 Lecture 8 February 4 Scribe: Carlos Agell (Student) Lecturer: Deva Ramanan 8.1 Neural Nets 8.1.1 Logistic Regression Recall the logistic function: g(x) = 1 1 + e θt
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationMultiple Choice Models II
Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical
More information4. Joint Distributions of Two Random Variables
4. Joint Distributions of Two Random Variables 4.1 Joint Distributions of Two Discrete Random Variables Suppose the discrete random variables X and Y have supports S X and S Y, respectively. The joint
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANIKALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationChapter 4: NonParametric Classification
Chapter 4: NonParametric Classification Introduction Density Estimation Parzen Windows KnNearest Neighbor Density Estimation KNearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationLocal classification and local likelihoods
Local classification and local likelihoods November 18 knearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationDecember 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in twodimensional space (1) 2x y = 3 describes a line in twodimensional space The coefficients of x and y in the equation
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More information