Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)
- Joanna Stephens
Generalized Linear Models

Last time: definition of exponential family, derivation of mean and variance (memorize).

Today: definition of GLM, maximum likelihood estimation.
- Include predictors $x_i$ through a regression model for $\theta_i$
- Involves choice of a link function (systematic component)
- Examples for counts, binomial data
- Algorithm for maximizing likelihood
Systematic Component, Link Functions

Instead of modeling the mean, $\mu_i$, as a linear function of predictors, $x_i$, we introduce a one-to-one, continuously differentiable transformation $g(\cdot)$ and focus on
$$\eta_i = g(\mu_i),$$
where $g(\cdot)$ is called the link function and $\eta_i$ the linear predictor.

We assume that the transformed mean follows a linear model, $\eta_i = x_i'\beta$.

Since the link function is one-to-one and therefore invertible, we have
$$\mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta).$$
Note that we are transforming the expected value, $\mu_i$, instead of the raw data, $y_i$.

For classical linear models, the mean is the linear predictor. In this case, the identity link is reasonable since both $\mu_i$ and $\eta_i$ can take any value on the real line. This is not the case in general.
Link Functions for Poisson Data

For example, if $Y_i \sim \text{Poi}(\mu_i)$ then $\mu_i$ must be $> 0$. In this case, a linear model for the mean is not reasonable, since for some values of $x_i$ it gives $\mu_i \le 0$.

By using the model $\eta_i = \log(\mu_i) = x_i'\beta$, we are guaranteed to have $\mu_i > 0$ for all $\beta \in \mathbb{R}^p$ and all values of $x_i$.

In general, a link function for count data should map the interval $(0, \infty) \to \mathbb{R}$ (i.e., from the positive real numbers to the entire real line). The log link is a natural choice.
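To make the point about positivity concrete, here is a small R sketch. The predictor values and coefficients are hypothetical, chosen only so that the linear predictor takes both signs; they are not from the lecture.

```r
# Hypothetical predictor values and coefficients (illustration only)
x    <- seq(-3, 3, by = 1)
beta <- c(0.5, 1.2)

eta <- beta[1] + beta[2] * x   # linear predictor, any real value

mu_identity <- eta             # identity link: implied mean can be <= 0
mu_log      <- exp(eta)        # log link: implied mean is always > 0

print(cbind(x, eta, mu_identity, mu_log))
```

With the identity link some implied Poisson means are negative, which is not a valid parameter value; with the log link every implied mean is strictly positive.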
Link Functions for Binomial Data

For the binomial distribution, $0 < \mu_i < 1$ (the mean of $y_i$ is $n_i\mu_i$). Therefore, the link function should map from $(0, 1) \to \mathbb{R}$.

Standard choices:
1. logit: $\eta_i = \log\{\mu_i/(1-\mu_i)\}$.
2. probit: $\eta_i = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the $N(0, 1)$ cdf.
3. complementary log-log: $\eta_i = \log\{-\log(1-\mu_i)\}$.

Each of these choices is important in applications and will be considered in detail later in the course.
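The three links listed above can be compared numerically. The following is a minimal sketch using base R's `make.link()`, which returns the link function and its inverse; the probabilities in `mu` are arbitrary illustration values.

```r
# Arbitrary probabilities in (0, 1), for illustration only
mu <- c(0.1, 0.5, 0.9)

for (name in c("logit", "probit", "cloglog")) {
  lk   <- make.link(name)
  eta  <- lk$linkfun(mu)    # eta = g(mu), on the whole real line
  back <- lk$linkinv(eta)   # g^{-1}(eta), back in (0, 1)
  cat(sprintf("%-8s eta = %s\n", name, paste(round(eta, 3), collapse = ", ")))
  stopifnot(isTRUE(all.equal(back, mu)))   # the link is invertible
}
```

The logit and probit links are symmetric about $\mu_i = 0.5$ (both give $\eta_i = 0$ there), while the complementary log-log link is not.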
Recall that the exponential family density has the following form:
$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\},$$
where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions.

Specifying the GLM involves choosing $a(\cdot)$, $b(\cdot)$, $c(\cdot)$:
1. Specify $a(\cdot)$, $c(\cdot)$ to correspond to a particular distribution (e.g., Binomial, Poisson)
2. Specify $b(\cdot)$ to correspond to a particular link function
Recall that the mean and variance are $\mu_i = b'(\theta_i)$ and $\sigma_i^2 = b''(\theta_i)\,\phi$.

Using $b'(\theta_i) = g^{-1}(x_i'\beta)$, we can express the density as $f(y_i; x_i, \beta, \phi)$, so that the conditional likelihood of $y_i$ given $x_i$ depends on the parameters $\beta$ and $\phi$.

It would seem that a natural choice for $b(\cdot)$, and hence $g(\cdot)$, would correspond to $\theta_i = \eta_i = x_i'\beta$, so that $b'(\cdot)$ is the inverse link.
Canonical Links and Sufficient Statistics

Each of the distributions we have considered has a special, canonical, link function for which there exists a sufficient statistic equal in dimension to $\beta$.

Canonical links occur when $\theta_i = \eta_i = x_i'\beta$, with $\theta_i$ the canonical parameter.

As a homework exercise, please show for next Thursday that the following distributions are in the exponential family and have the listed canonical links:

Normal: $\eta_i = \mu_i$
Poisson: $\eta_i = \log \mu_i$
Binomial: $\eta_i = \log\{\mu_i/(1-\mu_i)\}$
Gamma: $\eta_i = \mu_i^{-1}$

For the canonical links, the sufficient statistic is $X'y$, with components $\sum_i x_{ij} y_i$, for $j = 1, \ldots, p$.
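One practical consequence of the sufficient statistic $X'y$ is that, under a canonical link, the fitted means reproduce it: $X'y = X'\hat\mu$ at the MLE. The sketch below checks this numerically for a logistic regression on simulated data. The data, sample size, and "true" coefficients are all hypothetical and serve only as an illustration.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
X  <- cbind(1, x1)                       # design matrix with intercept
beta_true <- c(-0.5, 1)                  # hypothetical coefficients
y  <- rbinom(n, size = 1, prob = plogis(drop(X %*% beta_true)))

fit <- glm(y ~ x1, family = binomial)    # logit is the default (canonical) link

suff        <- crossprod(X, y)           # X'y, the sufficient statistic
fitted_suff <- crossprod(X, fitted(fit)) # X'mu-hat at the MLE
print(cbind(suff, fitted_suff))          # the two columns should agree
```

The agreement is exact only up to the convergence tolerance of the fitting algorithm, and it does not hold in general for non-canonical links such as probit.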
Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression

Suppose $y_i \sim \text{Bin}(1, p_i)$, for $i = 1, \ldots, n$, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth) and let $x_i$ denote a $p \times 1$ vector of predictors for individual $i$ (e.g., dose of DDE exposure, race, age, etc.).

The likelihood is as follows:
$$
\begin{aligned}
f(y \mid \beta) &= \prod_{i=1}^n p_i^{y_i} (1-p_i)^{1-y_i}
 = \prod_{i=1}^n \left(\frac{p_i}{1-p_i}\right)^{y_i} (1-p_i) \\
&= \prod_{i=1}^n \exp\left\{ y_i \log\left(\frac{p_i}{1-p_i}\right) - \log\left(\frac{1}{1-p_i}\right) \right\} \\
&= \exp\left[ \sum_{i=1}^n \left\{ y_i\theta_i - \log(1 + e^{\theta_i}) \right\} \right],
\end{aligned}
$$
where $\theta_i = \log\{p_i/(1-p_i)\}$.
Choosing the canonical link,
$$\theta_i = \log\left(\frac{p_i}{1-p_i}\right) = x_i'\beta,$$
the likelihood has the following form:
$$f(y \mid \beta) = \exp\left[ \sum_{i=1}^n \left\{ y_i x_i'\beta - \log(1 + e^{x_i'\beta}) \right\} \right].$$

This is logistic regression, which is widely used in epidemiology and other applications for modeling of binary response data.

In general, if $f(y_i; \theta_i, \phi)$ is in the exponential family and $\theta_i = \theta(\eta_i)$, $\eta_i = x_i'\beta$, then the model is called a generalized linear model (GLM).
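As a quick check on the Bernoulli form above, where $b(\theta) = \log(1 + e^{\theta})$, the general results $\mu = b'(\theta)$ and variance function $b''(\theta)$ should give $p$ and $p(1-p)$. The R sketch below verifies this with finite differences at one arbitrary value of $\theta$; the value 0.7 and the step size are illustration choices.

```r
b <- function(theta) log(1 + exp(theta))   # b(theta) for Bin(1, p)

theta <- 0.7     # arbitrary canonical parameter value
h     <- 1e-4    # finite-difference step

b1 <- (b(theta + h) - b(theta - h)) / (2 * h)              # approximates b'(theta)
b2 <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2   # approximates b''(theta)

p <- plogis(theta)   # e^theta / (1 + e^theta)
print(c(b1 = b1, p = p, b2 = b2, var = p * (1 - p)))
```

The first derivative matches the success probability and the second matches the Bernoulli variance, as the exponential family results predict.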
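Model fitting

Choosing a GLM results in a likelihood function:
$$L(y; \beta, \phi, x) = \prod_{i=1}^n \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\},$$
where $\theta_i$ is a function of $\eta_i = x_i'\beta$.

The maximum likelihood estimate is defined as the value of $\beta$ that maximizes the likelihood,
$$\hat\beta = \arg\max_{\beta} L(y; \beta, \phi, x),$$
with $\phi$ initially assumed to be known.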
Frequentist inferences for GLMs typically rely on $\hat\beta$ and asymptotic approximations.

In the normal linear model special case, the MLE corresponds to the least squares estimator.

In general, there is no closed-form expression, so we need an algorithm to calculate $\hat\beta$.
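The normal special case is easy to confirm numerically. This is a minimal sketch on simulated data (the data-generating values are hypothetical) comparing `glm()` with the Gaussian family and identity link against ordinary least squares from `lm()`.

```r
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)   # hypothetical linear model with normal errors

coef_glm <- coef(glm(y ~ x, family = gaussian))  # MLE under the normal GLM
coef_lm  <- coef(lm(y ~ x))                      # least squares estimator

print(rbind(glm = coef_glm, lm = coef_lm))
```

The two rows agree up to numerical precision.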
Maximum Likelihood Estimation of GLMs

All GLMs can be fit using the same algorithm, a form of iteratively re-weighted least squares (IWLS):

1. Given an initial value $\hat\beta$, calculate the estimated linear predictor $\hat\eta_i = x_i'\hat\beta$ and use it to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Calculate the adjusted dependent variable,
$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left(\frac{d\eta_i}{d\mu_i}\right)_0,$$
where the subscript 0 indicates that the derivative is evaluated at $\hat\mu_i$.
2. Calculate the iterative weights
$$W_i^{-1} = \left(\frac{d\eta_i}{d\mu_i}\right)_0^2 V_i,$$
where $V_i$ is the variance function evaluated at $\hat\mu_i$.
3. Regress $z_i$ on $x_i$ with weight $W_i$ to give new estimates of $\beta$.

Steps 1 to 3 are repeated with the updated estimate until the estimates stabilize.
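The three steps above can be sketched directly in R. The function below is a minimal illustration for the Bernoulli/logit case only, not the implementation used by any package: the name `iwls_logit`, the zero starting value, the stopping rule, and the simulated data are all assumptions made for this example.

```r
# Minimal IWLS sketch for logistic regression (logit link, Bernoulli response).
iwls_logit <- function(X, y, tol = 1e-10, max_iter = 25) {
  beta <- rep(0, ncol(X))                    # assumed starting value
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)                  # step 1: linear predictor
    mu  <- plogis(eta)                       #         fitted values g^{-1}(eta)
    deta_dmu <- 1 / (mu * (1 - mu))          #         d eta / d mu for the logit link
    z <- eta + (y - mu) * deta_dmu           #         adjusted dependent variable
    V <- mu * (1 - mu)                       # step 2: variance function
    W <- 1 / (deta_dmu^2 * V)                #         iterative weights
    # step 3: weighted least squares of z on X, solving (X'WX) b = X'Wz
    beta_new <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, iterations = iter)
}

# Hypothetical simulated data, used only to compare against glm()
set.seed(3)
n  <- 300
x1 <- rnorm(n)
X  <- cbind(1, x1)
y  <- rbinom(n, 1, plogis(-0.3 + 0.8 * x1))

fit_iwls <- iwls_logit(X, y)
fit_glm  <- glm(y ~ x1, family = binomial)
print(cbind(iwls = fit_iwls$coefficients, glm = unname(coef(fit_glm))))
```

The sketch has no safeguards for fitted probabilities at 0 or 1 or for separated data, which production code would need.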
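Justification for the IWLS procedure

Note that the log-likelihood can be expressed as
$$l = \sum_{i=1}^n \left[ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].$$

To maximize this log-likelihood we need $\partial l/\partial\beta_j$:
$$
\begin{aligned}
\frac{\partial l}{\partial \beta_j}
&= \sum_{i=1}^n \frac{\partial l_i}{\partial \theta_i}\,\frac{d\theta_i}{d\mu_i}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\,\frac{1}{V_i}\,\frac{d\mu_i}{d\eta_i}\,x_{ij} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\,W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ij},
\end{aligned}
$$
since $\mu_i = b'(\theta_i)$ and $b''(\theta_i) = V_i$ implies $d\mu_i/d\theta_i = V_i$.

With constant dispersion ($a(\phi) = \phi$), the MLE equations for $\beta_j$ are
$$\sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ij} = 0.$$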
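As a worked special case (added here as an illustration, using the IWLS weights $W_i^{-1} = (d\eta_i/d\mu_i)^2 V_i$), consider the Bernoulli model with the canonical logit link. The weight and the derivative cancel, and the MLE equations reduce to matching the sufficient statistic:

```latex
\begin{aligned}
\eta_i &= \log\frac{\mu_i}{1-\mu_i}
  \;\Rightarrow\; \frac{d\eta_i}{d\mu_i} = \frac{1}{\mu_i(1-\mu_i)},
  \qquad V_i = \mu_i(1-\mu_i), \\
W_i &= \left\{\left(\frac{d\eta_i}{d\mu_i}\right)^{2} V_i\right\}^{-1} = \mu_i(1-\mu_i),
  \qquad W_i\,\frac{d\eta_i}{d\mu_i} = 1, \\
\sum_{i=1}^n W_i (y_i-\mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ij} = 0
  &\;\Longleftrightarrow\; \sum_{i=1}^n x_{ij}\,y_i = \sum_{i=1}^n x_{ij}\,\mu_i,
  \qquad j = 1,\ldots,p .
\end{aligned}
```

The same cancellation occurs for any canonical link, because then $\theta_i = \eta_i$ and $d\mu_i/d\eta_i = V_i$.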
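Fisher's scoring method uses the gradient vector, $\partial l/\partial\beta = u$, and minus the expected value of the Hessian matrix,
$$-E\left(\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}\right) = A.$$

Given the current estimate $b$ of $\beta$, choose the adjustment $\delta b$ so that $A\,\delta b = u$.

Excluding $\phi$, the components of $u$ are
$$u_r = \sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ir},$$
so we have
$$A_{rs} = -E\left(\frac{\partial u_r}{\partial\beta_s}\right)
= -E\sum_{i=1}^n \left[ (y_i - \mu_i)\,\frac{\partial}{\partial\beta_s}\left\{ W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir} \right\} + W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial}{\partial\beta_s}(y_i - \mu_i) \right].$$

The expectation of the first term is 0 and the second term is
$$\sum_{i=1}^n W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial\mu_i}{\partial\beta_s}
= \sum_{i=1}^n W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial\eta_i}{\partial\beta_s}
= \sum_{i=1}^n W_i\,x_{ir}\,x_{is}.$$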
The new estimate $b^* = b + \delta b$ of $\beta$ thus satisfies
$$A b^* = A b + A\,\delta b = A b + u,$$
where
$$(A b)_r = \sum_s A_{rs}\,b_s = \sum_{i=1}^n W_i\,x_{ir}\,\eta_i.$$

Thus, the new estimate $b^*$ satisfies
$$(A b^*)_r = \sum_{i=1}^n W_i\,x_{ir}\left\{ \eta_i + (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i} \right\}.$$

These equations have the form of linear weighted least squares equations with weight $W_i$ and dependent variable $z_i$.
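In matrix notation the correspondence with weighted least squares may be easier to see. The symbols $W = \mathrm{diag}(W_1, \ldots, W_n)$ and $z = (z_1, \ldots, z_n)'$ are introduced here for illustration; $X$ is the $n \times p$ matrix with rows $x_i'$.

```latex
\begin{aligned}
A &= X' W X, \qquad (A b^*)_r = \sum_{i=1}^n W_i\,x_{ir}\,z_i
  \;\Longleftrightarrow\; X' W X\, b^* = X' W z, \\
b^* &= (X' W X)^{-1} X' W z,
  \qquad z_i = \eta_i + (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}.
\end{aligned}
```

This is exactly the normal-equation solution for regressing $z$ on $X$ with weights $W_i$, with $W$ and $z$ recomputed from the current estimate $b$ at every iteration.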
Some Comments

The IWLS procedure is simple to implement and converges rapidly in most cases.

Procedures are available to calculate MLEs and implement frequentist inferences for GLMs in most software packages.

In R or S-PLUS the glm() function can be used; try help(glm).

In Matlab the glmfit() function can be used.
Example: Smoking and Obesity

$y_i = 1$ if the child is obese and $y_i = 0$ otherwise, for $i = 1, \ldots, n$.

$x_i = (1, \text{age}_i, \text{smoke}_i, \text{age}_i \times \text{smoke}_i)'$

Bernoulli likelihood,
$$L(y; \beta, x) = \prod_{i=1}^n \mu_i^{y_i} (1-\mu_i)^{1-y_i},$$
where $\mu_i = \Pr(y_i = 1 \mid x_i, \beta)$.

Choosing the canonical link, $\mu_i = 1/\{1 + \exp(-x_i'\beta)\}$, results in a logistic regression model:
$$\Pr(y_i = 1 \mid x_i, \beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

Hence, the probability of obesity depends on age and smoking through a non-linear model.
Letting X = cbind(age, smoke, age*smoke) and y = 0/1 obesity outcome in R, we use

    fit <- glm(y ~ age + smoke + age*smoke, family=binomial, data=obese)

to implement IWLS and fit the model.

Note that the data are available on the web; try to replicate the results (note that children a year or younger have been discarded).

The command summary(fit) yields the results:
Coefficients:
              Value   Std. Error   t value
(Intercept)
age
smoke
age:smoke

Null Deviance: on 3874 degrees of freedom
Residual Deviance: on 3871 degrees of freedom
Number of Fisher Scoring Iterations: 6

Correlation of Coefficients:
              (Intercept)   age   smoke
age
smoke
age:smoke
Thus, the IWLS algorithm converged in 6 iterations to the MLE:
$$\hat\beta = ( 2.365, 0.066, 0.043, 0.008)$$

For any value of the covariates we can calculate the probability of obesity.

For example, for non-smokers the age curves can be plotted by using:

    beta <- fit$coef
    ## introduce grid spanning range of observed ages
    x <- seq(min(obese$age), max(obese$age), length=100)
    ## calculate fitted probability of obesity (smoke = 0)
    py <- 1/(1+exp(-(beta[1]+beta[2]*x)))
    plot(x, py, xlab="age in years", ylab="pr(obesity)")

The meaning of the rest of the R/S-PLUS output will be clear after next class.
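The manual calculation of fitted probabilities can be cross-checked against `predict()`. Since the obesity data are not reproduced here, the sketch below uses a simulated stand-in: the data frame `obese_sim`, its sample size, and the coefficients used to generate it are all hypothetical and are not the lecture's data or results.

```r
# Simulated stand-in for the obesity data (hypothetical values throughout)
set.seed(4)
n <- 500
obese_sim <- data.frame(age = runif(n, 1, 18), smoke = rbinom(n, 1, 0.3))
eta_true  <- -2 + 0.05 * obese_sim$age + 0.1 * obese_sim$smoke
obese_sim$y <- rbinom(n, 1, plogis(eta_true))

# age * smoke expands to age + smoke + age:smoke
fit_sim <- glm(y ~ age * smoke, family = binomial, data = obese_sim)
beta <- coef(fit_sim)

# Grid of ages; fitted curve for non-smokers (smoke = 0)
x <- seq(min(obese_sim$age), max(obese_sim$age), length.out = 100)
py_manual  <- 1 / (1 + exp(-(beta[1] + beta[2] * x)))
py_predict <- predict(fit_sim, newdata = data.frame(age = x, smoke = 0),
                      type = "response")

print(max(abs(py_manual - py_predict)))   # should be numerically zero
```

For smokers the linear predictor would also include the `smoke` and `age:smoke` coefficients, i.e. `beta[1] + beta[3] + (beta[2] + beta[4]) * x`.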
Next Class

Topic: Frequentist inference for GLMs

Have the homework exercise completed and written up for next Thursday.

Complete the following exercise:
1. Write down generalized linear models for the Caesarian data (grouping the two different infection types) and the cellular differentiation data.
2. Show the different components of the GLM, expressing the likelihood in exponential family form and using a canonical link function.
3. Fit the GLM using maximum likelihood and report the parameter estimates.
More informationGENERALIZED LINEAR MODELS IN VEHICLE INSURANCE
ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková
More informationMultiple Choice: 2 points each
MID TERM MSF 503 Modeling 1 Name: Answers go here! NEATNESS COUNTS!!! Multiple Choice: 2 points each 1. In Excel, the VLOOKUP function does what? Searches the first row of a range of cells, and then returns
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationErrata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page
Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationEstimating an ARMA Process
Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators
More informationReject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
More information7.1 The Hazard and Survival Functions
Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More information