# Generalized Linear Models

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the response variable is discrete and the error terms do not follow a normal distribution, namely logistic regression and Poisson regression. Both belong to a family of regression models called generalized linear models. Generalized linear models are extensions of traditional regression models that allow the mean to depend on the explanatory variables through a link function, and the response variable to be any member of a set of distributions called the exponential family (e.g., Normal, Poisson, Binomial). We can use the function glm() to work with generalized linear models in R. It s usage is similar to that of the function lm() which we previously used for multiple linear regression. The main difference is that we need to include an additional argument family to describe the error distribution and link function to be used in the model. In this tutorial we show how glm() can be used to fit logistic regression and Poisson regression models. A. Logistic Regression Logistic regression is appropriate when the response variable is categorical with two possible outcomes (i.e., binary outcomes). Binary variables can be represented using an indicator variable Y i, taking on values 0 or 1, and modeled using a binomial distribution with probability P(Y i =1) = i. Logistic regression models this probability as a function of one or more explanatory variables. To perform logistic regression in R, use the command: > glm(response ~ explanantory_variables, family=binomial) Note that the option family is set to binomial, which tells R to perform logistic regression. Ex. A car manufacturer was interested in creating a model for determining the probability that families will purchase a new car during the next year. A random sample of 33 suburban families was selected. Data on annual income (in thousands of dollars) and the current age of the oldest family car (in years) was obtained. A follow-up interview was conducted a year later to determine whether or not the family actually purchased a new car during the year (Y=1 if the family purchased a car and 0 otherwise). We are interested in determining the probability that a family purchases a new car given their income and the age of their oldest car.

2 To read in the data set and fit a logistic regression model we type: > dat = read.table("purchase.txt",header=true) > results = glm(new ~ income + age, family=binomial) > results Call: glm(formula = new ~ income + age, family = binomial) Coefficients: (Intercept) income age Degrees of Freedom: 32 Total (i.e. Null); 30 Residual Null Deviance: Residual Deviance: AIC: According to the output, the model is logit( i ) = *income *age. After fitting the model, we can test the overall model fit and hypothesis regarding a subset of regression parameters using a likelihood ratio test (LRT). Likelihood ratio tests are similar to partial F-tests in the sense that they compare the full model with a restricted model where the explanatory variables of interest are omitted. The p-values of the tests are calculated using the 2 distribution. To test the hypothesis H 0 : 1= 2 =0 we can compare our model with a reduced model that only contains an intercept term. A likelihood ratio test comparing the full and reduced models can be performed using the anova() function with the additional option test="chisq". > results.reduced =glm(new ~ 1, family=binomial) > anova(results.reduced,results, test="chisq") Analysis of Deviance Table Model 1: new ~ 1 Model 2: new ~ income + age Resid. Df Resid. Dev Df Deviance P(> Chi ) The likelihood ratio test statistic is 2 =8.298 with a p-value= Hence, we have relatively strong evidence in favor of rejecting H 0.

3 As a next step, we perform tests on the individual regression parameters. > summary(results) Call: glm(formula = new ~ income + age, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) * income * age Signif. codes: 0 *** ** 0.01 * Null deviance: on 32 degrees of freedom Residual deviance: on 30 degrees of freedom AIC: Number of Fisher Scoring iterations: 4 To test H 0 : 1 =0, we use z = (p-value=0.0158). Hence, the family s income appears to have a significant impact on the probability of purchasing a new car, while controlling for the age of the families oldest car. To test H 0 : 2 =0, we use z = (p-value=0.1249). Hence, the age of a family s oldest car does not appear to have a significant impact on the probability of purchasing a new car, once income is included in the model. To compute how the odds of purchasing a car changes as a function of income use the commands: > exp(coef(results)) (Intercept) income age To create a 95% confidence interval for the estimate, type: > exp(confint.default(results)) 2.5 % 97.5 % (Intercept) income age

4 We see that the odds ratio corresponding to income is (95% CI: (1.013, 1.131)). This implies that if we fix the age of the oldest car, increasing family income by one thousand dollars will increase the odds of purchasing a new car by We are often interested in using the fitted logistic regression curve to estimate probabilities and construct confidence intervals for these estimates. We can do this using the function predict.glm. The usage is similar to that of the function predict which we previously used when working on multiple linear regression problems. The main difference is the option type, which tells R which type of prediction is required. The default predictions are given on the logit scale (i.e. predictions are made in terms of the log odds), while using type = "response" gives the predicted probabilities. To predict the probability that a family with an annual income of \$53 thousand and whose oldest car is 1 year old will purchase a new car in the next year, type: > pi.hat = predict.glm(results, data.frame(income=53, age=1), type="response", se.fit=true) > pi.hat\$fit [1] This tells us that the predicted probability is In order to obtain confidence intervals we instead need to work on the logit scale and thereafter transform the results into probabilities. To create a 95% confidence interval for the estimate, type: > l.hat = predict.glm(results, data.frame(income=53, age=1), se.fit=true) > ci = c(l.hat\$fit *l.hat\$se.fit, l.hat\$fit *l.hat\$se.fit) To transform the results to probabilities type: > exp(ci)/(1+exp(ci)) [1] For a family with an annual income of \$53 thousand and whose oldest car is 1 year old, the estimated probability of purchasing a new car is A 95% CI is given by (0.115, 0.720).

5 B. Poisson Regression Data is often collected in counts (e.g. the number of heads in 12 flips of a coin or the number of car thefts in a city during a year). Many discrete response variables have counts as possible outcomes. Binomial counts are the number of successes in a fixed number of trials, n. Poisson counts are the number of occurrences of some event in a certain interval of time (or space). While Binomial counts only take values between 0 and n, Poisson counts have no upper bound. We now consider a nonlinear regression model where the response outcomes are discrete counts that follow a Poisson distribution. Poisson regression provides a model that describes how the mean response, changes as a function of one or more explanatory variables. To perform logistic regression in R, we use the command: > glm(response ~ explanantory_variables, family=poisson) Note that we specified the family to be poisson, which tells R to perform Poisson regression. Ex. Researchers studied 41 male African elephants over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful matings during the 8 years were recorded. We assume the number of matings follows a Poisson distribution, where the mean depends on the age of the elephant in question. We can fit a Poisson regression model using the following code: > dat = read.table("elephants.txt", header=true) > attach(dat) > results = glm(mating ~ age, family=poisson) > summary(results) Call: glm(formula = mating ~ age, family = poisson) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) ** age e-07 *** --- Signif. codes: 0 *** ** 0.01 * (Dispersion parameter for poisson family taken to be 1) Null deviance: on 40 degrees of freedom

6 Residual deviance: on 39 degrees of freedom AIC: Number of Fisher Scoring iterations: 5 To determine whether there is a significant relationship between the mean number of matings and the age of the elephants we test H 0 : 1 =0. The test statistic is z=4.997 (p-value<0.0001). Hence, it appears that age does impact the mean number of matings. To see how the mean number of matings increases per year use the commands: > beta =coef(results) > beta (Intercept) age > exp(beta[2]) age To create a 95% confidence interval for the estimate, type: > exp(confint.default(results)) 2.5 % 97.5 % (Intercept) age Hence, each additional year is associated with a 7.1% increase in the mean number of matings. A 95% confidence interval is given by (1.043, 1.100), which represents a % increase.

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

### Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### Logistic regression (with R)

Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as

### Régression logistique : introduction

Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence

### Lab 13: Logistic Regression

Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Correlation and Simple Linear Regression

Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

### Logistic Models in R

Logistic Models in R Jim Bentley 1 Sample Data The following code reads the titanic data that we will use in our examples. > titanic = read.csv( + "http://bulldog2.redlands.edu/facultyfolder/jim_bentley/downloads/math111/titanic.csv

### Examining a Fitted Logistic Model

STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

### 13. Poisson Regression Analysis

136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

### Introduction to Generalized Linear Models

to Generalized Linear Models Heather Turner ESRC National Centre for Research Methods, UK and Department of Statistics University of Warwick, UK WU, 2008 04 22-24 Copyright c Heather Turner, 2008 to Generalized

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Lecture 8: Gamma regression

Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

### Lecture 6: Poisson regression

Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### Simple example of collinearity in logistic regression

1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially

### L3: Statistical Modeling with Hadoop

L3: Statistical Modeling with Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

### Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

### Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

### Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are

### BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016

BIOSTAT640 R-Solution for HW7 Logistic Minming Li & Steele H. Valenzuela Mar.10, 2016 Required Libraries If there is an error in loading the libraries, you must first install the packaged library (install.packages("package

### Regression 3: Logistic Regression

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic regression Logistic regression in R Outline Logistic regression Introduction The model Looking at and comparing

### GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

### Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

### Lecture 12: Generalized Linear Models for Binary Data

Lecture 12: Generalized Linear Models for Binary Data Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of

### 5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits

Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. Hosmer-Lemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Appendix 1: Estimation of the two-variable saturated model in SPSS, Stata and R using the Netherlands 1973 example data

Appendix 1: Estimation of the two-variable saturated model in SPSS, Stata and R using the Netherlands 1973 example data A. SPSS commands and corresponding parameter estimates Copy the 1973 data from the

### Linear Regression in SPSS

Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### ANOVA. February 12, 2015

ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### Comparing Nested Models

Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller

### Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2

Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS t-test X 2 X 2 AOVA (F-test) t-test AOVA

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### All Models are wrong, but some are useful.

1 Goodness of Fit in Logistic Regression As in linear regression, goodness of fit in logistic regression attempts to get at how well a model fits the data. It is usually applied after a final model has

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### Lecture 13: Introduction to generalized linear models

Lecture 13: Introduction to generalized linear models 21 November 2007 1 Introduction Recall that we ve looked at linear models, which specify a conditional probability density P(Y X) of the form Y = α

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS

BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended

### Local classification and local likelihoods

Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

### Package dsmodellingclient

Package dsmodellingclient Maintainer Author Version 4.1.0 License GPL-3 August 20, 2015 Title DataSHIELD client site functions for statistical modelling DataSHIELD

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### How to set the main menu of STATA to default factory settings standards

University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

### Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data By LEVON BARSEGHYAN, JEFFREY PRINCE, AND JOSHUA C. TEITELBAUM I. Empty Test Intervals Here we discuss the conditions

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Outline. Dispersion Bush lupine survival Quasi-Binomial family

Outline 1 Three-way interactions 2 Overdispersion in logistic regression Dispersion Bush lupine survival Quasi-Binomial family 3 Simulation for inference Why simulations Testing model fit: simulating the

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### 1 Logistic Regression

Generalized Linear Models 1 May 29, 2012 Generalized Linear Models (GLM) In the previous chapter on regression, we focused primarily on the classic setting where the response y is continuous and typically

### data visualization and regression

data visualization and regression Sepal.Length 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 I. setosa I. versicolor I. virginica I. setosa I. versicolor I. virginica Species Species

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### Psychology 205: Research Methods in Psychology

Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready