Logistic Regression
Jia Li
Department of Statistics, The Pennsylvania State University
Logistic Regression

Preserve linear classification boundaries. By the Bayes rule:
$$\hat{G}(x) = \arg\max_k \Pr(G = k \mid X = x).$$
The decision boundary between classes $k$ and $l$ is determined by the equation
$$\Pr(G = k \mid X = x) = \Pr(G = l \mid X = x).$$
Dividing both sides by $\Pr(G = l \mid X = x)$ and taking logs, the above equation is equivalent to
$$\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = 0.$$
Since we enforce a linear boundary, we can assume
$$\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = a_0^{(k,l)} + \sum_{j=1}^{p} a_j^{(k,l)} x_j.$$
For logistic regression, there are restrictive relations between the $a^{(k,l)}$ for different pairs $(k, l)$.
Assumptions

$$\log \frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{10} + \beta_1^T x$$
$$\log \frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{20} + \beta_2^T x$$
$$\vdots$$
$$\log \frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{(K-1)0} + \beta_{K-1}^T x$$
For any pair $(k, l)$:
$$\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \beta_{k0} - \beta_{l0} + (\beta_k - \beta_l)^T x.$$
Number of parameters: $(K-1)(p+1)$.

Denote the entire parameter set by $\theta = \{\beta_{10}, \beta_1, \beta_{20}, \beta_2, \dots, \beta_{(K-1)0}, \beta_{K-1}\}$.

The log-ratios of posterior probabilities are called log-odds or logit transformations.
Under the assumptions, the posterior probabilities are given by
$$\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)} \quad \text{for } k = 1, \dots, K-1,$$
$$\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}.$$
For $\Pr(G = k \mid X = x)$ given above, the probabilities obviously sum to 1: $\sum_{k=1}^{K} \Pr(G = k \mid X = x) = 1$. A simple calculation shows that the assumptions are satisfied.
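As a quick numerical illustration of these formulas (the coefficients below are made up, not from the lecture), the posteriors can be computed directly and checked to sum to 1:

```python
import numpy as np

# Hypothetical parameters for K = 3 classes and p = 2 inputs.
# Row l holds (beta_l0, beta_l); class K carries no parameters.
betas = np.array([[0.5, 1.0, -1.0],    # beta_10, beta_1
                  [-0.2, 0.3, 0.8]])   # beta_20, beta_2

def posteriors(x, betas):
    """Pr(G = k | X = x) for k = 1, ..., K under the logistic model."""
    z = betas[:, 0] + betas[:, 1:] @ x          # beta_l0 + beta_l^T x
    denom = 1.0 + np.exp(z).sum()
    return np.append(np.exp(z) / denom, 1.0 / denom)

x = np.array([0.4, -1.2])
probs = posteriors(x, betas)
print(probs, probs.sum())   # the K probabilities sum to 1
```

The last entry is the class-$K$ probability $1/(1 + \sum_l e^{\beta_{l0} + \beta_l^T x})$, matching the formula above.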
Comparison with Linear Regression on Indicators

Similarities:
- Both attempt to estimate $\Pr(G = k \mid X = x)$.
- Both have linear classification boundaries.

Differences:
- Linear regression on the indicator matrix approximates $\Pr(G = k \mid X = x)$ by a linear function of $x$; the estimate is not guaranteed to fall between 0 and 1, or to sum to 1 over $k$.
- In logistic regression, $\Pr(G = k \mid X = x)$ is a nonlinear function of $x$; it is guaranteed to lie between 0 and 1 and to sum to 1.
Fitting Logistic Regression Models

Criterion: find the parameters that maximize the conditional likelihood of $G$ given $X$ on the training data.

Denote $p_k(x_i; \theta) = \Pr(G = k \mid X = x_i; \theta)$. Given the first input $x_1$, the posterior probability of its class being $g_1$ is $\Pr(G = g_1 \mid X = x_1)$. Since the samples in the training data set are independent, the posterior probability of the $N$ samples having classes $g_i$, $i = 1, 2, \dots, N$, given their inputs $x_1, x_2, \dots, x_N$, is
$$\prod_{i=1}^{N} \Pr(G = g_i \mid X = x_i).$$
The conditional log-likelihood of the class labels in the training data set is
$$L(\theta) = \sum_{i=1}^{N} \log \Pr(G = g_i \mid X = x_i) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \theta).$$
Binary Classification

For binary classification, if $g_i = 1$, denote $y_i = 1$; if $g_i = 2$, denote $y_i = 0$. Let $p_1(x; \theta) = p(x; \theta)$; then $p_2(x; \theta) = 1 - p_1(x; \theta) = 1 - p(x; \theta)$. Since $K = 2$, the parameter set is $\theta = \{\beta_{10}, \beta_1\}$. We denote $\beta = (\beta_{10}, \beta_1)^T$.
If $y_i = 1$, i.e., $g_i = 1$,
$$\log p_{g_i}(x; \beta) = \log p_1(x; \beta) = 1 \cdot \log p(x; \beta) = y_i \log p(x; \beta).$$
If $y_i = 0$, i.e., $g_i = 2$,
$$\log p_{g_i}(x; \beta) = \log p_2(x; \beta) = 1 \cdot \log(1 - p(x; \beta)) = (1 - y_i) \log(1 - p(x; \beta)).$$
Since either $y_i = 0$ or $1 - y_i = 0$, we have
$$\log p_{g_i}(x; \beta) = y_i \log p(x; \beta) + (1 - y_i) \log(1 - p(x; \beta)).$$
The conditional log-likelihood is
$$L(\beta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \beta) = \sum_{i=1}^{N} \left[ y_i \log p(x_i; \beta) + (1 - y_i) \log(1 - p(x_i; \beta)) \right].$$
There are $p + 1$ parameters in $\beta = (\beta_{10}, \beta_1)^T$. Assume a column vector form for $\beta$:
$$\beta = (\beta_{10}, \beta_{11}, \beta_{12}, \dots, \beta_{1,p})^T.$$
Here we add the constant term 1 to $x$ to accommodate the intercept:
$$x = (1, x_1, x_2, \dots, x_p)^T.$$
By the assumption of the logistic regression model,
$$p(x; \beta) = \Pr(G = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)},$$
$$1 - p(x; \beta) = \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}.$$
Substituting the above into $L(\beta)$:
$$L(\beta) = \sum_{i=1}^{N} \left[ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right].$$
To maximize $L(\beta)$, we set the first-order partial derivatives of $L(\beta)$ to zero:
$$\frac{\partial L(\beta)}{\partial \beta_{1j}} = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N} \frac{x_{ij}\, e^{\beta^T x_i}}{1 + e^{\beta^T x_i}} = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N} p(x_i; \beta)\, x_{ij} = \sum_{i=1}^{N} x_{ij} \left( y_i - p(x_i; \beta) \right)$$
for all $j = 0, 1, \dots, p$.
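The closed-form gradient can be sanity-checked against a finite-difference approximation of $L(\beta)$; the sketch below uses made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # leading intercept column
y = rng.integers(0, 2, size=N).astype(float)
beta = rng.normal(size=p + 1)

def loglik(beta):
    """L(beta) = sum_i [ y_i beta^T x_i - log(1 + exp(beta^T x_i)) ]."""
    z = X @ beta
    return np.sum(y * z - np.log1p(np.exp(z)))

prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
grad = X.T @ (y - prob)          # closed form: sum_i x_i (y_i - p(x_i; beta))

# central finite difference in the first coordinate (the intercept)
eps = 1e-6
e0 = np.zeros(p + 1); e0[0] = eps
fd = (loglik(beta + e0) - loglik(beta - e0)) / (2 * eps)
print(grad[0], fd)               # the two values agree closely
```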
In matrix form, we write
$$\frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i \left( y_i - p(x_i; \beta) \right).$$
To solve the set of $p + 1$ nonlinear equations $\frac{\partial L(\beta)}{\partial \beta_{1j}} = 0$, $j = 0, 1, \dots, p$, we use the Newton-Raphson algorithm. The Newton-Raphson algorithm requires the second derivatives, or Hessian matrix:
$$\frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -\sum_{i=1}^{N} x_i x_i^T \, p(x_i; \beta)\left(1 - p(x_i; \beta)\right).$$
The element on the $j$th row and $n$th column is (counting from 0):
$$\frac{\partial^2 L(\beta)}{\partial \beta_{1j} \, \partial \beta_{1n}} = -\sum_{i=1}^{N} \frac{(1 + e^{\beta^T x_i})\, e^{\beta^T x_i}\, x_{ij} x_{in} - (e^{\beta^T x_i})^2\, x_{ij} x_{in}}{(1 + e^{\beta^T x_i})^2} = \sum_{i=1}^{N} \left( -x_{ij} x_{in}\, p(x_i; \beta) + x_{ij} x_{in}\, p(x_i; \beta)^2 \right) = -\sum_{i=1}^{N} x_{ij} x_{in}\, p(x_i; \beta)\left(1 - p(x_i; \beta)\right).$$
Starting with $\beta^{\text{old}}$, a single Newton-Raphson update is
$$\beta^{\text{new}} = \beta^{\text{old}} - \left( \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} \right)^{-1} \frac{\partial L(\beta)}{\partial \beta},$$
where the derivatives are evaluated at $\beta^{\text{old}}$.
The iteration can be expressed compactly in matrix form.
- Let $\mathbf{y}$ be the column vector of the $y_i$.
- Let $\mathbf{X}$ be the $N \times (p+1)$ input matrix.
- Let $\mathbf{p}$ be the $N$-vector of fitted probabilities, with $i$th element $p(x_i; \beta^{\text{old}})$.
- Let $\mathbf{W}$ be an $N \times N$ diagonal matrix of weights, with $i$th diagonal element $p(x_i; \beta^{\text{old}})(1 - p(x_i; \beta^{\text{old}}))$.

Then
$$\frac{\partial L(\beta)}{\partial \beta} = \mathbf{X}^T (\mathbf{y} - \mathbf{p}), \qquad \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -\mathbf{X}^T \mathbf{W} \mathbf{X}.$$
The Newton-Raphson step is
$$\beta^{\text{new}} = \beta^{\text{old}} + (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T (\mathbf{y} - \mathbf{p}) = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \left( \mathbf{X} \beta^{\text{old}} + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}) \right) = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{z},$$
where $\mathbf{z} \triangleq \mathbf{X} \beta^{\text{old}} + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p})$.

If $\mathbf{z}$ is viewed as a response and $\mathbf{X}$ as the input matrix, $\beta^{\text{new}}$ is the solution to a weighted least squares problem:
$$\beta^{\text{new}} \leftarrow \arg\min_{\beta} (\mathbf{z} - \mathbf{X}\beta)^T \mathbf{W} (\mathbf{z} - \mathbf{X}\beta).$$
Recall that linear regression by least squares solves
$$\arg\min_{\beta} (\mathbf{z} - \mathbf{X}\beta)^T (\mathbf{z} - \mathbf{X}\beta).$$
$\mathbf{z}$ is referred to as the adjusted response, and the algorithm is referred to as iteratively reweighted least squares, or IRLS.
Pseudo Code
1. $\beta \leftarrow 0$.
2. Compute $\mathbf{y}$ by setting its elements to $y_i = 1$ if $g_i = 1$ and $y_i = 0$ if $g_i = 2$, $i = 1, 2, \dots, N$.
3. Compute $\mathbf{p}$ by setting its elements to $p(x_i; \beta) = \dfrac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}$, $i = 1, 2, \dots, N$.
4. Compute the diagonal matrix $\mathbf{W}$; the $i$th diagonal element is $p(x_i; \beta)(1 - p(x_i; \beta))$, $i = 1, 2, \dots, N$.
5. $\mathbf{z} \leftarrow \mathbf{X}\beta + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$.
6. $\beta \leftarrow (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{z}$.
7. If the stopping criterion is met, stop; otherwise go back to step 3.
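A minimal NumPy rendering of this pseudo code (a sketch: it assumes $\mathbf{X}$ already carries the leading column of 1s, uses a fixed iteration count as the stopping rule, and the toy data are made up):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Fit binary logistic regression by IRLS, following the pseudo code.

    X : (N, p+1) input matrix whose first column is all 1s.
    y : (N,) vector of 0/1 labels (y_i = 1 iff g_i = 1).
    """
    beta = np.zeros(X.shape[1])                    # step 1: beta <- 0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # step 3: fitted probabilities
        w = p * (1.0 - p)                          # step 4: diagonal of W
        z = X @ beta + (y - p) / w                 # step 5: adjusted response
        # step 6: weighted least squares, beta <- (X^T W X)^{-1} X^T W z
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

# toy data: one feature, class 1 more likely for larger x (true beta = (0.5, 2))
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-(0.5 + 2 * x)))).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta_hat = irls_logistic(X, y)
print(beta_hat)   # estimates of the true (0.5, 2)
```

Because $\mathbf{W}$ is diagonal, the code keeps only its diagonal `w` and lets broadcasting do the weighting.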
Computational Efficiency

Since $\mathbf{W}$ is an $N \times N$ diagonal matrix, direct matrix operations with it may be very inefficient. A modified pseudo code is provided next.
1. $\beta \leftarrow 0$.
2. Compute $\mathbf{y}$ by setting its elements to $y_i = 1$ if $g_i = 1$ and $y_i = 0$ if $g_i = 2$, $i = 1, 2, \dots, N$.
3. Compute $\mathbf{p}$ by setting its elements to $p(x_i; \beta) = \dfrac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}$, $i = 1, 2, \dots, N$.
4. Compute the $N \times (p+1)$ matrix $\tilde{\mathbf{X}}$ by multiplying the $i$th row of $\mathbf{X}$ by $p(x_i; \beta)(1 - p(x_i; \beta))$, $i = 1, 2, \dots, N$:
$$\mathbf{X} = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{pmatrix}, \qquad \tilde{\mathbf{X}} = \begin{pmatrix} p(x_1; \beta)(1 - p(x_1; \beta))\, x_1^T \\ p(x_2; \beta)(1 - p(x_2; \beta))\, x_2^T \\ \vdots \\ p(x_N; \beta)(1 - p(x_N; \beta))\, x_N^T \end{pmatrix}.$$
5. $\beta \leftarrow \beta + (\mathbf{X}^T \tilde{\mathbf{X}})^{-1} \mathbf{X}^T (\mathbf{y} - \mathbf{p})$.
6. If the stopping criterion is met, stop; otherwise go back to step 3.
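In NumPy this row-scaling trick is just broadcasting. A short sketch (with made-up data) verifying that the scaled matrix $\tilde{\mathbf{X}}$ reproduces $\mathbf{X}^T \mathbf{W} \mathbf{X}$ without ever forming the $N \times N$ matrix $\mathbf{W}$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta = rng.normal(size=p + 1)
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
w = prob * (1.0 - prob)

# inefficient: build the full N x N diagonal W
W = np.diag(w)
H_full = X.T @ W @ X

# efficient: scale the rows of X by w (this is X-tilde), never forming W
X_tilde = w[:, None] * X
H_fast = X.T @ X_tilde

print(np.allclose(H_full, H_fast))  # True
```

The efficient form costs $O(N(p+1)^2)$ instead of the $O(N^2)$ storage and arithmetic of the dense diagonal.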
Example: Diabetes Data Set

- Input $X$ is two dimensional. $X_1$ and $X_2$ are the two principal components of the original 8 variables.
- Class 1: without diabetes; Class 2: with diabetes.
- Applying logistic regression, we obtain $\beta = (0.7679, \; , \; )^T$.
The posterior probabilities are
$$\Pr(G = 1 \mid X = x) = \frac{e^{\beta^T x}}{1 + e^{\beta^T x}}, \qquad \Pr(G = 2 \mid X = x) = \frac{1}{1 + e^{\beta^T x}},$$
with $\beta^T x = 0.7679 + \beta_1 X_1 + \beta_2 X_2$ for the fitted coefficients. The classification rule is
$$\hat{G}(x) = \begin{cases} 1 & 0.7679 + \beta_1 X_1 + \beta_2 X_2 \geq 0 \\ 2 & 0.7679 + \beta_1 X_1 + \beta_2 X_2 < 0 \end{cases}$$
Solid line: decision boundary obtained by logistic regression. Dashed line: decision boundary obtained by LDA. Within the training data set, the classification error rate is 28.12%; sensitivity 45.9%; specificity 85.8%.
Multiclass Case ($K \geq 3$)

When $K \geq 3$, $\beta$ is a $(K-1)(p+1)$-vector:
$$\beta = \begin{pmatrix} \tilde{\beta}_1 \\ \tilde{\beta}_2 \\ \vdots \\ \tilde{\beta}_{K-1} \end{pmatrix} = \begin{pmatrix} \beta_{10} \\ \beta_{11} \\ \vdots \\ \beta_{1p} \\ \beta_{20} \\ \vdots \\ \beta_{2p} \\ \vdots \\ \beta_{(K-1)0} \\ \vdots \\ \beta_{(K-1)p} \end{pmatrix}$$
Let
$$\tilde{\beta}_l = \begin{pmatrix} \beta_{l0} \\ \beta_l \end{pmatrix}.$$
The likelihood function becomes
$$L(\beta) = \sum_{i=1}^{N} \log p_{g_i}(x_i; \beta) = \sum_{i=1}^{N} \log \frac{e^{\tilde{\beta}_{g_i}^T x_i}}{1 + \sum_{l=1}^{K-1} e^{\tilde{\beta}_l^T x_i}} = \sum_{i=1}^{N} \left[ \tilde{\beta}_{g_i}^T x_i - \log\left( 1 + \sum_{l=1}^{K-1} e^{\tilde{\beta}_l^T x_i} \right) \right],$$
with the convention $\tilde{\beta}_K = 0$ so that the formula also covers samples with $g_i = K$.
Note: the indicator function $I(\cdot)$ equals 1 when its argument is true and 0 otherwise. First-order derivatives:
$$\frac{\partial L(\beta)}{\partial \beta_{kj}} = \sum_{i=1}^{N} \left[ I(g_i = k)\, x_{ij} - \frac{x_{ij}\, e^{\tilde{\beta}_k^T x_i}}{1 + \sum_{l=1}^{K-1} e^{\tilde{\beta}_l^T x_i}} \right] = \sum_{i=1}^{N} x_{ij} \left( I(g_i = k) - p_k(x_i; \beta) \right).$$
Second-order derivatives:
$$\frac{\partial^2 L(\beta)}{\partial \beta_{kj} \, \partial \beta_{mn}} = \sum_{i=1}^{N} \frac{x_{ij}}{\left(1 + \sum_{l=1}^{K-1} e^{\tilde{\beta}_l^T x_i}\right)^2} \left[ -e^{\tilde{\beta}_k^T x_i} I(k = m)\, x_{in} \left( 1 + \sum_{l=1}^{K-1} e^{\tilde{\beta}_l^T x_i} \right) + e^{\tilde{\beta}_k^T x_i} e^{\tilde{\beta}_m^T x_i}\, x_{in} \right]$$
$$= \sum_{i=1}^{N} x_{ij} x_{in} \left( -p_k(x_i; \beta)\, I(k = m) + p_k(x_i; \beta)\, p_m(x_i; \beta) \right) = -\sum_{i=1}^{N} x_{ij} x_{in}\, p_k(x_i; \beta) \left[ I(k = m) - p_m(x_i; \beta) \right].$$
Matrix form. $\mathbf{y}$ is the concatenated indicator vector of dimension $N(K-1)$:
$$\mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_{K-1} \end{pmatrix}, \qquad \mathbf{y}_k = \begin{pmatrix} I(g_1 = k) \\ I(g_2 = k) \\ \vdots \\ I(g_N = k) \end{pmatrix}, \quad 1 \leq k \leq K-1.$$
$\mathbf{p}$ is the concatenated vector of fitted probabilities, of dimension $N(K-1)$:
$$\mathbf{p} = \begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_{K-1} \end{pmatrix}, \qquad \mathbf{p}_k = \begin{pmatrix} p_k(x_1; \beta) \\ p_k(x_2; \beta) \\ \vdots \\ p_k(x_N; \beta) \end{pmatrix}, \quad 1 \leq k \leq K-1.$$
$\tilde{\mathbf{X}}$ is an $N(K-1) \times (p+1)(K-1)$ block-diagonal matrix:
$$\tilde{\mathbf{X}} = \begin{pmatrix} \mathbf{X} & 0 & \cdots & 0 \\ 0 & \mathbf{X} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{X} \end{pmatrix}$$
The matrix $\mathbf{W}$ is an $N(K-1) \times N(K-1)$ square matrix:
$$\mathbf{W} = \begin{pmatrix} \mathbf{W}_{11} & \mathbf{W}_{12} & \cdots & \mathbf{W}_{1(K-1)} \\ \mathbf{W}_{21} & \mathbf{W}_{22} & \cdots & \mathbf{W}_{2(K-1)} \\ \vdots & & & \vdots \\ \mathbf{W}_{(K-1),1} & \mathbf{W}_{(K-1),2} & \cdots & \mathbf{W}_{(K-1),(K-1)} \end{pmatrix}$$
Each submatrix $\mathbf{W}_{km}$, $1 \leq k, m \leq K-1$, is an $N \times N$ diagonal matrix. When $k = m$, the $i$th diagonal element of $\mathbf{W}_{kk}$ is $p_k(x_i; \beta^{\text{old}})(1 - p_k(x_i; \beta^{\text{old}}))$. When $k \neq m$, the $i$th diagonal element of $\mathbf{W}_{km}$ is $-p_k(x_i; \beta^{\text{old}})\, p_m(x_i; \beta^{\text{old}})$, consistent with the Hessian elements derived above.
As with binary classification,
$$\frac{\partial L(\beta)}{\partial \beta} = \tilde{\mathbf{X}}^T (\mathbf{y} - \mathbf{p}), \qquad \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} = -\tilde{\mathbf{X}}^T \mathbf{W} \tilde{\mathbf{X}}.$$
The formula for updating $\beta^{\text{new}}$ in the binary classification case holds for the multiclass case:
$$\beta^{\text{new}} = (\tilde{\mathbf{X}}^T \mathbf{W} \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T \mathbf{W} \mathbf{z}, \qquad \text{where } \mathbf{z} \triangleq \tilde{\mathbf{X}} \beta^{\text{old}} + \mathbf{W}^{-1} (\mathbf{y} - \mathbf{p}),$$
or simply
$$\beta^{\text{new}} = \beta^{\text{old}} + (\tilde{\mathbf{X}}^T \mathbf{W} \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T (\mathbf{y} - \mathbf{p}).$$
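To make the block matrices concrete, here is a didactic NumPy sketch of the multiclass update on made-up data; a real implementation would never form the $N(K-1) \times N(K-1)$ matrix $\mathbf{W}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, K = 300, 2, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
g = rng.integers(1, K + 1, size=N)                 # labels in {1, ..., K}

# concatenated indicator vector y and block-diagonal X-tilde, as on the slides
y = np.concatenate([(g == k).astype(float) for k in range(1, K)])
Xt = np.kron(np.eye(K - 1), X)

def fitted_probs(beta):
    """N x (K-1) matrix of p_k(x_i; beta)."""
    B = beta.reshape(K - 1, p + 1)                 # row l holds (beta_l0, beta_l)
    Z = np.exp(X @ B.T)
    return Z / (1.0 + Z.sum(axis=1))[:, None]

beta = np.zeros((K - 1) * (p + 1))
for _ in range(30):
    P = fitted_probs(beta)
    pvec = P.T.ravel()                             # concatenated fitted probabilities
    # assemble W from its (K-1)^2 diagonal N x N blocks
    W = np.zeros(((K - 1) * N, (K - 1) * N))
    for k in range(K - 1):
        for m in range(K - 1):
            d = P[:, k] * ((k == m) - P[:, m])     # p_k(1-p_k) or -p_k p_m
            W[k*N:(k+1)*N, m*N:(m+1)*N] = np.diag(d)
    beta = beta + np.linalg.solve(Xt.T @ W @ Xt, Xt.T @ (y - pvec))

grad = Xt.T @ (y - fitted_probs(beta).T.ravel())
print(np.abs(grad).max())                          # near 0 at the MLE
```

At convergence the gradient $\tilde{\mathbf{X}}^T(\mathbf{y} - \mathbf{p})$ vanishes, which is what the final line checks.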
Computation Issues

- Initialization: one option is $\beta = 0$.
- Convergence is not guaranteed, but it usually occurs.
- Usually the log-likelihood increases after each iteration, but overshooting can occur.
- In the rare cases in which the log-likelihood decreases, cut the step size by half.
Connection with LDA

Under the model of LDA,
$$\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \log \frac{\pi_k}{\pi_K} - \frac{1}{2} (\mu_k + \mu_K)^T \Sigma^{-1} (\mu_k - \mu_K) + x^T \Sigma^{-1} (\mu_k - \mu_K) = a_{k0} + a_k^T x.$$
The model of LDA thus satisfies the assumption of the linear logistic model. The linear logistic model only specifies the conditional distribution $\Pr(G = k \mid X = x)$; no assumption is made about $\Pr(X)$.
The LDA model specifies the joint distribution of $X$ and $G$. $\Pr(X)$ is a mixture of Gaussians:
$$\Pr(X) = \sum_{k=1}^{K} \pi_k \, \phi(X; \mu_k, \Sigma),$$
where $\phi$ is the Gaussian density function.

Linear logistic regression maximizes the conditional likelihood of $G$ given $X$: $\Pr(G = k \mid X = x)$. LDA maximizes the joint likelihood of $G$ and $X$: $\Pr(X = x, G = k)$.
If the additional assumption made by LDA is appropriate, LDA tends to estimate the parameters more efficiently, since it uses more information about the data. Samples without class labels can be used under the model of LDA. LDA is not robust to gross outliers; because logistic regression relies on fewer assumptions, it tends to be more robust. In practice, logistic regression and LDA often give similar results.
Simulation

Assume the input $X$ is one-dimensional. The two classes have equal priors, and the class-conditional densities of $X$ are shifted versions of each other. Each conditional density is a mixture of two normals:
- Class 1 (red): $0.6\,N(-2, \tfrac{1}{4}) + 0.4\,N(0, 1)$.
- Class 2 (blue): $0.6\,N(0, \tfrac{1}{4}) + 0.4\,N(2, 1)$.

The class-conditional densities are shown below.
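Drawing samples from these two mixtures is straightforward (note that variance $\tfrac{1}{4}$ means standard deviation $\tfrac{1}{2}$); a sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_class1(n):
    """0.6 N(-2, 1/4) + 0.4 N(0, 1): pick a component, then draw from it."""
    comp = rng.uniform(size=n) < 0.6
    return np.where(comp, rng.normal(-2.0, 0.5, n), rng.normal(0.0, 1.0, n))

def sample_class2(n):
    """0.6 N(0, 1/4) + 0.4 N(2, 1): the class-1 density shifted by +2."""
    return sample_class1(n) + 2.0

x1 = sample_class1(2000)
x2 = sample_class2(2000)
print(x1.mean(), x2.mean())   # near the mixture means -1.2 and 0.8
```

The mixture means follow from $0.6 \cdot (-2) + 0.4 \cdot 0 = -1.2$ for class 1, shifted by 2 for class 2.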
[Figure: the two class-conditional densities]
LDA Result

Training data set: 2000 samples for each class. Test data set: 1000 samples for each class. The estimation by LDA: $\hat{\mu}_1 = $ , $\hat{\mu}_2 = $ , $\hat{\sigma}^2 = $ . The boundary value between the two classes is $(\hat{\mu}_1 + \hat{\mu}_2)/2 = $ . The classification error rate on the test data is . Based on the true distribution, the Bayes (optimal) boundary value between the two classes is and the error rate is .
Logistic Regression Result

Linear logistic regression obtains $\hat{\beta} = ( \; , \; )^T$. The boundary value satisfies $\hat{\beta}_0 + \hat{\beta}_1 X = 0$, hence equals $-\hat{\beta}_0 / \hat{\beta}_1$. The error rate on the test data set is . The estimated posterior probability is
$$\Pr(G = 1 \mid X = x) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x}}.$$
The estimated posterior probability $\Pr(G = 1 \mid X = x)$ and its true value based on the true distribution are compared in the graph below.
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationLocal classification and local likelihoods
Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationImplementation of Logistic Regression with Truncated IRLS
Implementation of Logistic Regression with Truncated IRLS Paul Komarek School of Computer Science Carnegie Mellon University komarek@cmu.edu Andrew Moore School of Computer Science Carnegie Mellon University
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationDecember 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS
December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationThe Probit Link Function in Generalized Linear Models for Data Mining Applications
Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications
More informationNumerical methods for American options
Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment
More informationANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING
ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING BY OMID ROUHANI-KALLEH THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
More informationGaussian Conjugate Prior Cheat Sheet
Gaussian Conjugate Prior Cheat Sheet Tom SF Haines 1 Purpose This document contains notes on how to handle the multivariate Gaussian 1 in a Bayesian setting. It focuses on the conjugate prior, its Bayesian
More informationSolution to Homework 2
Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationL4: Bayesian Decision Theory
L4: Bayesian Decision Theory Likelihood ratio test Probability of error Bayes risk Bayes, MAP and ML criteria Multi-class problems Discriminant functions CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna
More information