Logistic regression: introduction




Chapter 16. Introduction à la statistique avec R. Logistic regression: introduction

A binary response variable. We want to explain a high suicide risk in prison using: the length of the sentence (duree), the existence of disciplinary measures (discip), and a history of abuse during childhood (abus).

The limits of linear regression. A linear model would be written: high suicide risk = a + b duree + c discip + d abus + noise. The distribution of the "noise" term is normal, so "a + b duree + c discip + d abus + noise" varies between plus and minus infinity, whereas the variable "high suicide risk" is binary.
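To make this limitation concrete, here is a minimal R sketch on simulated data (not the course's smp.l data set): a linear regression fitted to a 0/1 outcome can produce fitted values outside [0, 1], which a logistic regression cannot do.

# Simulated data only, to illustrate the range problem of linear regression on a binary outcome
set.seed(42)
x <- rnorm(200, mean = 0, sd = 2)
y <- rbinom(200, size = 1, prob = plogis(-1 + 1.5 * x))  # 0/1 outcome generated from a logistic model
mod.lin <- lm(y ~ x)                                     # linear regression on the binary outcome
range(fitted(mod.lin))                                   # fitted values can fall below 0 or above 1
mod.logit <- glm(y ~ x, family = "binomial")             # logistic regression
range(fitted(mod.logit))                                 # fitted probabilities stay within (0, 1)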

The limits of linear regression. The solution is to model the log-odds of the probability rather than the probability itself: log( P(high suicide risk = yes) / (1 − P(high suicide risk = yes)) ) = a + b duree + c discip + d abus. This left-hand side varies between plus and minus infinity, so both sides of the equation now have the same range.
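In R, this logit transformation and its inverse are available as qlogis() and plogis(); a quick sketch with arbitrary illustrative values shows how probabilities in (0, 1) are mapped to log-odds over the whole real line and back:

p <- c(0.01, 0.25, 0.50, 0.75, 0.99)  # probabilities strictly between 0 and 1
qlogis(p)                             # log-odds log(p / (1 - p)): unbounded in both directions
plogis(qlogis(p))                     # the inverse transformation recovers the original probabilities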

A single explanatory variable. log( P(high suicide risk) / (1 − P(high suicide risk)) ) = a + b abus. In R, this model is fitted with glm() and family="binomial":

> mod1 <- glm(suicide.hr~abus, data=smp.l, family="binomial")
> summary(mod1)

Call:
glm(formula = suicide.hr ~ abus, family = "binomial", data = smp.l)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8446  -0.6020  -0.6020  -0.6020   1.8959  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.6161     0.1154 -14.003  < 2e-16 ***
abus          0.7688     0.1897   4.052 5.07e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 760.21  on 752  degrees of freedom
Residual deviance: 744.26  on 751  degrees of freedom
  (46 observations deleted due to missingness)
AIC: 748.26

Number of Fisher Scoring iterations: 4
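Beyond the Wald test printed in the coefficient table, the drop from the null deviance (760.21) to the residual deviance (744.26) gives a likelihood-ratio test of the abus effect; a short sketch, assuming mod1 has been fitted as above:

anova(mod1, test = "Chisq")                          # likelihood-ratio test: deviance drop of about 15.95 on 1 df
pchisq(760.21 - 744.26, df = 1, lower.tail = FALSE)  # the same p-value computed by hand from the printed deviances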


A single explanatory variable. The coefficient of abus is the logarithm of an odds ratio; exponentiating it gives the odds ratio itself, which can be checked against the 2 x 2 table produced by twoby2() from the Epi package:

> exp(0.7688)
[1] 2.157176
> library(Epi)
> twoby2(1-smp.l$suicide.hr, 1-smp.l$abus)
2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : 0 
Comparing : 0 vs. 1 

      0    1    P(0) 95% conf. interval
0    63   90  0.4118    0.3366   0.4913
1   147  453  0.2450    0.2122   0.2810

                                   95% conf. interval
             Relative Risk: 1.6807    1.3276   2.1276
         Sample Odds Ratio: 2.1571    1.4873   3.1287
Conditional MLE Odds Ratio: 2.1547    1.4577   3.1764
    Probability difference: 0.1668    0.0837   0.2525

             Exact P-value: 1e-04 
        Asymptotic P-value: 1e-04 
------------------------------------------------------
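The odds ratio and a confidence interval can also be obtained directly from the fitted model, without retyping the coefficient; a minimal sketch, again assuming mod1 from above:

exp(coef(mod1))                             # odds ratios for the intercept and for abus (about 2.16 for abus)
exp(confint(mod1))                          # profile-likelihood 95% confidence intervals on the odds-ratio scale
exp(cbind(OR = coef(mod1), confint(mod1)))  # both combined in a single table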

Conclusion. The complete analysis in R:

mod1 <- glm(suicide.hr~abus, data=smp.l, family="binomial")
summary(mod1)
exp(0.7688)
library(Epi)
twoby2(1-smp.l$suicide.hr, 1-smp.l$abus)
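Finally, the fitted model can be used to recover predicted probabilities of a high suicide risk; a minimal sketch, assuming mod1 from above and that abus is coded 0/1:

predict(mod1, newdata = data.frame(abus = c(0, 1)), type = "response")  # predicted probability for abus = 0 and abus = 1
plogis(-1.6161 + 0.7688 * c(0, 1))                                      # the same computation by hand from the printed coefficients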