Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)


 Joanna Stephens
Generalized Linear Models

Last time: definition of exponential family, derivation of mean and variance (memorize)

Today: definition of GLM, maximum likelihood estimation
  Include predictors $x_i$ through a regression model for $\theta_i$
  Involves choice of a link function (systematic component)
  Examples for counts, binomial data
  Algorithm for maximizing likelihood

Systematic Component, Link Functions

Instead of modeling the mean, $\mu_i$, as a linear function of predictors, $x_i$, we introduce a one-to-one, continuously differentiable transformation $g(\cdot)$ and focus on
$$\eta_i = g(\mu_i),$$
where $g(\cdot)$ will be called the link function and $\eta_i$ the linear predictor.

We assume that the transformed mean follows a linear model, $\eta_i = x_i'\beta$.

Since the link function is invertible and one-to-one, we have
$$\mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta).$$

Note that we are transforming the expected value, $\mu_i$, instead of the raw data, $y_i$.

For classical linear models, the mean is the linear predictor. In this case, the identity link is reasonable since both $\mu_i$ and $\eta_i$ can take any value on the real line. This is not the case in general.

Link Functions for Poisson Data

For example, if $Y_i \sim \text{Poi}(\mu_i)$ then $\mu_i$ must be $> 0$. In this case, a linear model is not reasonable since for some values of $x_i$ we would have $\mu_i \le 0$.

By using the model $\eta_i = \log(\mu_i) = x_i'\beta$, we are guaranteed to have $\mu_i > 0$ for all $\beta \in \mathbb{R}^p$ and all values of $x_i$.

In general, a link function for count data should map the interval $(0, \infty) \to \mathbb{R}$ (i.e., from the positive real numbers to the entire real line). The log link is a natural choice.
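
As a minimal sketch of why the log link suits count data, the R snippet below evaluates a linear predictor over a grid of covariate values and converts it to a mean under the identity link and under the log link. The coefficient values and the grid are illustrative assumptions, not taken from the lecture.

```r
# Illustrative (assumed) coefficients and covariate grid
beta <- c(0.5, -1.2)
x    <- seq(-2, 2, by = 0.5)
eta  <- beta[1] + beta[2] * x      # linear predictor eta_i = x_i' beta

mu_identity <- eta                 # identity link: mu_i = eta_i
mu_log      <- exp(eta)            # log link:      mu_i = exp(eta_i)

# The identity link gives non-positive (invalid) Poisson means for some x,
# while the log link keeps every mean strictly positive.
print(data.frame(x, eta, mu_identity, mu_log))
print(any(mu_identity <= 0))
print(all(mu_log > 0))
```

Any other choice of coefficients with a nonzero slope would show the same behavior once the grid of `x` is wide enough.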

Link Functions for Binomial Data

For the binomial distribution, $0 < \mu_i < 1$ (the mean of $y_i$ is $n_i\mu_i$). Therefore, the link function should map from $(0, 1) \to \mathbb{R}$.

Standard choices:
1. logit: $\eta_i = \log\{\mu_i/(1-\mu_i)\}$.
2. probit: $\eta_i = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the $N(0, 1)$ cdf.
3. complementary log-log: $\eta_i = \log\{-\log(1-\mu_i)\}$.

Each of these choices is important in applications and will be considered in detail later in the course.
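
The three links listed above can be written directly in R. This is a small sketch; the function names (`logit`, `inv_cloglog`, and so on) and the grid of probabilities are chosen here for illustration and are not part of the lecture.

```r
# The three standard links for a binomial mean mu in (0, 1)
logit   <- function(mu) log(mu / (1 - mu))
probit  <- function(mu) qnorm(mu)            # inverse of the N(0, 1) cdf
cloglog <- function(mu) log(-log(1 - mu))

# Corresponding inverse links, mapping eta on the real line back to (0, 1)
inv_logit   <- function(eta) 1 / (1 + exp(-eta))
inv_probit  <- function(eta) pnorm(eta)
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

mu <- c(0.05, 0.25, 0.5, 0.75, 0.95)
links <- data.frame(mu,
                    logit   = logit(mu),
                    probit  = probit(mu),
                    cloglog = cloglog(mu))
print(links)
```

The logit and probit links are symmetric about $\mu_i = 0.5$, whereas the complementary log-log link is not, which is visible in the printed table.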

Recall that the exponential family density has the following form:
$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\},$$
where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions.

Specifying the GLM involves choosing $a(\cdot)$, $b(\cdot)$, $c(\cdot)$:
1. Specify $a(\cdot)$, $c(\cdot)$ to correspond to a particular distribution (e.g., Binomial, Poisson)
2. Specify $b(\cdot)$ to correspond to a particular link function

Recall that the mean and variance are $\mu_i = b'(\theta_i)$ and $\sigma^2 = b''(\theta_i)\,\phi$.

Using $b'(\theta_i) = g^{-1}(x_i'\beta)$, we can express the density as $f(y_i; x_i, \beta, \phi)$, so that the conditional likelihood of $y_i$ given $x_i$ depends on parameters $\beta$ and $\phi$.

It would seem that a natural choice for $b(\cdot)$, and hence $g(\cdot)$, would correspond to $\theta_i = \eta_i = x_i'\beta$, so that $b'(\cdot)$ is the inverse link.

Canonical Links and Sufficient Statistics

Each of the distributions we have considered has a special, canonical, link function for which there exists a sufficient statistic equal in dimension to $\beta$.

Canonical links occur when $\theta_i = \eta_i = x_i'\beta$, with $\theta_i$ the canonical parameter.

As a homework exercise, please show for next Thursday that the following distributions are in the exponential family and have the listed canonical links:

  Normal    $\eta_i = \mu_i$
  Poisson   $\eta_i = \log \mu_i$
  binomial  $\eta_i = \log\{\mu_i/(1-\mu_i)\}$
  gamma     $\eta_i = \mu_i^{-1}$

For the canonical links, the sufficient statistic is $X'y$, with components $\sum_i x_{ij} y_i$, for $j = 1, \ldots, p$.
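
One way to see the role of $X'y$ numerically: with a canonical link the likelihood equations reduce to $X'y = X'\hat\mu$, so the fitted means reproduce the sufficient statistic. The sketch below checks this for a Poisson model with the log link on simulated data; the sample size, seed, and true coefficients are assumptions made for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)                          # n x p design matrix (p = 2)
y <- rpois(n, lambda = exp(0.3 + 0.5 * x))

# Poisson GLM with its canonical (log) link; a tight convergence tolerance
# is requested so the comparison below is accurate
fit <- glm(y ~ x, family = poisson(link = "log"),
           control = glm.control(epsilon = 1e-12))

suff_stat   <- drop(crossprod(X, y))            # X'y, components sum_i x_ij y_i
fitted_stat <- drop(crossprod(X, fitted(fit)))  # X'mu-hat

# With a canonical link these two rows should agree up to numerical error
print(rbind(suff_stat, fitted_stat))
```

Refitting with a non-canonical link, for example `poisson(link = "sqrt")`, would generally break this equality.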

Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression

Suppose $y_i \sim \text{Bin}(1, p_i)$, for $i = 1, \ldots, n$, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth) and let $x_i$ denote a $p \times 1$ vector of predictors for individual $i$ (e.g., dose of DDE exposure, race, age, etc.).

The likelihood is as follows:
$$
\begin{aligned}
f(y \mid \beta) &= \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i} \\
&= \prod_{i=1}^n \left(\frac{p_i}{1-p_i}\right)^{y_i}(1-p_i) \\
&= \prod_{i=1}^n \exp\left\{ y_i \log\left(\frac{p_i}{1-p_i}\right) - \log\left(\frac{1}{1-p_i}\right) \right\} \\
&= \exp\left[ \sum_{i=1}^n \left\{ y_i\theta_i - \log(1+e^{\theta_i}) \right\} \right],
\end{aligned}
$$
where $\theta_i = \log\{p_i/(1-p_i)\}$.

Choosing the canonical link,
$$\theta_i = \log\left(\frac{p_i}{1-p_i}\right) = x_i'\beta,$$
the likelihood has the following form:
$$f(y \mid \beta) = \exp\left[ \sum_{i=1}^n \left\{ y_i x_i'\beta - \log(1+e^{x_i'\beta}) \right\} \right].$$

This is logistic regression, which is widely used in epidemiology and other applications for modeling of binary response data.

In general, if $f(y_i; \theta_i, \phi)$ is in the exponential family and $\theta_i = \theta(\eta_i)$, $\eta_i = x_i'\beta$, then the model is called a generalized linear model (GLM).
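
The logistic regression log-likelihood, $\sum_i \{y_i x_i'\beta - \log(1+e^{x_i'\beta})\}$, is short enough to code by hand. The sketch below evaluates it on simulated Bernoulli data and compares it with the value reported by R's `logLik()`; the helper name `loglik`, the seed, and the true coefficients are assumptions for illustration.

```r
set.seed(2)
n <- 300
x <- rnorm(n)
X <- cbind(1, x)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x))

# Log-likelihood: sum_i { y_i * x_i'beta - log(1 + exp(x_i'beta)) }
loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(y * eta - log1p(exp(eta)))
}

fit <- glm(y ~ x, family = binomial)
ll_manual <- loglik(coef(fit), X, y)      # formula evaluated at the MLE
ll_glm    <- as.numeric(logLik(fit))      # value reported by R
print(c(manual = ll_manual, glm = ll_glm))
```

Evaluating `loglik()` at any other coefficient vector gives a smaller value, since the fitted coefficients maximize this function.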

Model fitting

Choosing a GLM results in a likelihood function:
$$L(y; \beta, \phi, x) = \prod_{i=1}^n \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\},$$
where $\theta_i$ is a function of $\eta_i = x_i'\beta$.

The maximum likelihood estimate is defined as
$$\hat\beta = \arg\max_{\beta} L(y; \beta, \phi, x),$$
with $\phi$ initially assumed to be known.

Frequentist inferences for GLMs typically rely on $\hat\beta$ and asymptotic approximations.

In the normal linear model special case, the MLE corresponds to the least squares estimator.

In general, there is no closed-form expression, so we need an algorithm to calculate $\hat\beta$.

Maximum Likelihood Estimation of GLMs

All GLMs can be fit using the same algorithm, a form of iteratively reweighted least squares:

1. Given an initial value for $\hat\beta$, calculate the estimated linear predictor $\hat\eta_i = x_i'\hat\beta$ and use that to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Calculate the adjusted dependent variable,
$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left(\frac{d\eta_i}{d\mu_i}\right)_0,$$
where the derivative is evaluated at $\hat\mu_i$.
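2. Calculate the iterative weights
$$W_i^{-1} = \left(\frac{d\eta_i}{d\mu_i}\right)_0^2 V_i,$$
where $V_i$ is the variance function evaluated at $\hat\mu_i$.

3. Regress $z_i$ on $x_i$ with weight $W_i$ to give new estimates of $\beta$.

Steps 1 to 3 are repeated with the updated estimate until the estimates stop changing.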
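
Steps 1 to 3 can be sketched in a few lines of R for the logistic regression case, where $d\eta_i/d\mu_i = 1/\{\mu_i(1-\mu_i)\}$ and $V_i = \mu_i(1-\mu_i)$. This is an illustrative implementation on simulated data, not the code behind `glm()`; the function name `iwls_logit`, the zero starting value, and the stopping rule are assumptions.

```r
set.seed(3)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))

iwls_logit <- function(X, y, tol = 1e-10, max_iter = 25) {
  beta <- rep(0, ncol(X))                    # initial value for beta
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)                  # step 1: linear predictor
    mu  <- 1 / (1 + exp(-eta))               # fitted values g^{-1}(eta)
    deta_dmu <- 1 / (mu * (1 - mu))          # d eta / d mu for the logit link
    z   <- eta + (y - mu) * deta_dmu         # adjusted dependent variable
    V   <- mu * (1 - mu)                     # step 2: variance function
    W   <- 1 / (deta_dmu^2 * V)              # iterative weights
    # step 3: weighted least squares regression of z on X
    beta_new  <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, iterations = iter)
}

fit_iwls <- iwls_logit(X, y)
fit_glm  <- glm(y ~ x, family = binomial)
print(rbind(iwls = unname(fit_iwls$coefficients), glm = unname(coef(fit_glm))))
print(fit_iwls$iterations)
```

Swapping in a different link only requires changing the lines that compute `mu`, `deta_dmu`, and `V`.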
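
Justification for the IWLS procedure

Note that the log-likelihood can be expressed as
$$l = \sum_{i=1}^n \left[ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].$$

To maximize this log-likelihood we need $\partial l/\partial\beta_j$:
$$
\begin{aligned}
\frac{\partial l}{\partial\beta_j}
&= \sum_{i=1}^n \frac{\partial l_i}{\partial\theta_i}\,\frac{d\theta_i}{d\mu_i}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial\eta_i}{\partial\beta_j} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\,\frac{1}{V_i}\,\frac{d\mu_i}{d\eta_i}\,x_{ij} \\
&= \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\,W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ij},
\end{aligned}
$$
since $\mu_i = b'(\theta_i)$ and $b''(\theta_i) = V_i$ implies $d\mu_i/d\theta_i = V_i$. The last equality uses $W_i^{-1} = (d\eta_i/d\mu_i)^2 V_i$.

With constant dispersion ($a(\phi) = \phi$), the MLE equations for $\beta_j$ are
$$\sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ij} = 0.$$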
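
Fisher's scoring method uses the gradient vector, $\partial l/\partial\beta = u$, and minus the expected value of the Hessian matrix,
$$-E\left(\frac{\partial^2 l}{\partial\beta_r\,\partial\beta_s}\right) = A.$$

Given the current estimate $b$ of $\beta$, choose the adjustment $\delta b$ so that $A\,\delta b = u$.

Excluding $\phi$, the components of $u$ are
$$u_r = \sum_{i=1}^n W_i (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i}\,x_{ir},$$
so we have
$$A_{rs} = -E\left(\frac{\partial u_r}{\partial\beta_s}\right) = -E\sum_{i=1}^n \left[ (y_i - \mu_i)\,\frac{\partial}{\partial\beta_s}\left\{ W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir} \right\} + W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial}{\partial\beta_s}(y_i - \mu_i) \right].$$

The expectation of the first term is 0 and the second term is
$$\sum_{i=1}^n W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{\partial\mu_i}{\partial\beta_s} = \sum_{i=1}^n W_i\,\frac{d\eta_i}{d\mu_i}\,x_{ir}\,\frac{d\mu_i}{d\eta_i}\,\frac{\partial\eta_i}{\partial\beta_s} = \sum_{i=1}^n W_i\,x_{ir}\,x_{is}.$$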

The new estimate $b^* = b + \delta b$ of $\beta$ thus satisfies
$$A b^* = A b + A\,\delta b = A b + u,$$
where
$$(Ab)_r = \sum_s A_{rs} b_s = \sum_{i=1}^n W_i\,x_{ir}\,\eta_i.$$

Thus, the new estimate $b^*$ satisfies
$$(Ab^*)_r = \sum_{i=1}^n W_i\,x_{ir}\left\{ \eta_i + (y_i - \mu_i)\,\frac{d\eta_i}{d\mu_i} \right\}.$$

These equations have the form of linear weighted least squares equations with weight $W_i$ and dependent variable $z_i$.
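
The quantities in this derivation can be checked numerically. The sketch below, on simulated logistic regression data, forms $A = X'WX$ and the score $u$ at the fitted coefficients and compares $A^{-1}$ with the covariance matrix that `glm()` reports. The data-generating values and the seed are assumptions; for the logit link $W_i = \mu_i(1-\mu_i)$ and $W_i\,d\eta_i/d\mu_i = 1$.

```r
set.seed(4)
n <- 400
x <- rnorm(n)
X <- cbind(1, x)
y <- rbinom(n, size = 1, prob = plogis(0.2 - 0.7 * x))

# Tight tolerance so the fitted values are essentially at the MLE
fit <- glm(y ~ x, family = binomial,
           control = glm.control(epsilon = 1e-12))
mu  <- fitted(fit)

# For the logit link, W_i = (d mu_i / d eta_i)^2 / V_i = mu_i (1 - mu_i)
W <- mu * (1 - mu)

# Expected information A with entries A_rs = sum_i W_i x_ir x_is
A <- crossprod(X, W * X)

# Score u_r = sum_i W_i (y_i - mu_i) (d eta_i / d mu_i) x_ir,
# which reduces to X'(y - mu) for this canonical link
u <- drop(crossprod(X, y - mu))

print(u)                    # close to zero at the MLE
print(unname(solve(A)))     # inverse information
print(unname(vcov(fit)))    # covariance matrix reported by glm()
```

The agreement between `solve(A)` and `vcov(fit)` reflects the asymptotic approximation on which the frequentist standard errors are based.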

Some Comments

The IWLS procedure is simple to implement and converges rapidly in most cases.

Procedures are available to calculate MLEs and implement frequentist inferences for GLMs in most software packages.

In R or S-PLUS the glm() function can be used; try help(glm).

In Matlab the glmfit() function can be used.

Example: Smoking and Obesity

$y_i = 1$ if the child is obese and $y_i = 0$ otherwise, for $i = 1, \ldots, n$.

$x_i = (1, \text{age}_i, \text{smoke}_i, \text{age}_i \times \text{smoke}_i)$

Bernoulli likelihood,
$$L(y; \beta, x) = \prod_{i=1}^n \mu_i^{y_i}(1-\mu_i)^{1-y_i},$$
where $\mu_i = \Pr(y_i = 1 \mid x_i, \beta)$.

Choosing the canonical link, $\mu_i = 1/\{1 + \exp(-x_i'\beta)\}$, results in a logistic regression model:
$$\Pr(y_i = 1 \mid x_i, \beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

Hence, the probability of obesity depends on age and smoking through a nonlinear model.

Letting X = cbind(age, smoke, age*smoke) and Y = 0/1 obesity outcome in R, we use

    fit <- glm(y ~ age + smoke + age*smoke, family=binomial, data=obese)

to implement IWLS and fit the model.

Note that the data are available on the web; try to replicate the results (note that children a year or younger have been discarded).

The command summary(fit) yields the results:

Coefficients:
              Value   Std. Error   t value
(Intercept)
age
smoke
age:smoke

Null Deviance: on 3874 degrees of freedom
Residual Deviance: on 3871 degrees of freedom
Number of Fisher Scoring Iterations: 6

Correlation of Coefficients:
              (Intercept)   age   smoke
age
smoke
age:smoke

Thus, the IWLS algorithm converged in 6 iterations to the MLE:
$$\hat\beta = (-2.365,\ 0.066,\ 0.043,\ 0.008)$$

For any value of the covariates we can calculate the probability of obesity.

For example, for nonsmokers the age curves can be plotted by using:

    beta <- fit$coef
    ## introduce grid spanning range of observed ages
    x <- seq(min(obese$age), max(obese$age), length.out=100)
    ## calculate fitted probability of obesity (inverse logit of the linear predictor)
    py <- 1/(1 + exp(-(beta[1] + beta[2]*x)))
    plot(x, py, xlab="age in years", ylab="Pr(obesity)")

The meaning of the rest of the R/S-PLUS output will be clear after next class.
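
Because the obesity data are not reproduced here, the following sketch repeats the nonsmoker age-curve calculation on a simulated stand-in data frame. The name `obese_sim`, its columns, and the coefficients used to simulate it are hypothetical. It also shows that `predict()` with `type = "response"` returns the same fitted probabilities as the hand-coded inverse logit.

```r
set.seed(5)
n <- 1000
# Simulated stand-in for the obesity data (hypothetical values)
obese_sim <- data.frame(age = runif(n, 1, 18), smoke = rbinom(n, 1, 0.3))
eta_true  <- -2 + 0.05 * obese_sim$age + 0.1 * obese_sim$smoke
obese_sim$y <- rbinom(n, 1, plogis(eta_true))

fit  <- glm(y ~ age + smoke + age:smoke, family = binomial, data = obese_sim)
beta <- coef(fit)

# Grid of ages for nonsmokers (smoke = 0)
grid <- data.frame(age = seq(min(obese_sim$age), max(obese_sim$age),
                             length.out = 100),
                   smoke = 0)

# Fitted probability by hand: inverse logit of the linear predictor
py_manual  <- 1 / (1 + exp(-(beta["(Intercept)"] + beta["age"] * grid$age)))
# Same quantity from predict() on the response scale
py_predict <- predict(fit, newdata = grid, type = "response")

print(head(cbind(age = grid$age, py_manual, py_predict)))
```

Calling `plot(grid$age, py_manual, type = "l")` would draw the curve; setting `smoke = 1` in `grid` and adding the `smoke` and `age:smoke` terms to the linear predictor gives the corresponding curve for smokers.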

Next Class

Topic: Frequentist inference for GLMs

Have the homework exercise completed and written up for next Thursday.

Complete the following exercise:
1. Write down generalized linear models for the Caesarian data (grouping the two different infection types) and the cellular differentiation data.
2. Show the different components of the GLM, expressing the likelihood in exponential family form and using a canonical link function.
3. Fit the GLM using maximum likelihood and report the parameter estimates.
More informationModel Selection and Claim Frequency for Workers Compensation Insurance
Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationMultiple Choice: 2 points each
MID TERM MSF 503 Modeling 1 Name: Answers go here! NEATNESS COUNTS!!! Multiple Choice: 2 points each 1. In Excel, the VLOOKUP function does what? Searches the first row of a range of cells, and then returns
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS OneSample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationModern Methods for Missing Data
Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years
More informationThe Exponential Family
The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form where p.x j / D h.x/ expf > t.x/ a./g; (1) is the natural
More informationChapter 14: Analyzing Relationships Between Variables
Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between
More informationPackage dsmodellingclient
Package dsmodellingclient Maintainer Author Version 4.1.0 License GPL3 August 20, 2015 Title DataSHIELD client site functions for statistical modelling DataSHIELD
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationInfinitely Imbalanced Logistic Regression
Journal of Machine Learning Research () 1 13 Submitted 9/06;Revised 12/06; Published Infinitely Imbalanced Logistic Regression Art B. Owen Department of Statistics Stanford Unversity Stanford CA, 94305,
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More information