BSTA 6651 Cat. Data Anal. Homework #3 Fall, 2011

Problem 5.1

Table 5.11 shows the logistic regression output for modeling the probability of cancer remission using the labeling index (LI) as the explanatory variable. The following optional SAS code can be used to reproduce the results shown in Table 5.11.

DATA PROB_5_1;
  Input LI N Remiss;
  datalines;
;
Run;
PROC LOGISTIC Data=PROB_5_1;
  Model Remiss/N = LI / COVB;  * The COVB option displays the covariance matrix;
  Output Out=Logit5 p=pi_hat Lower=Lower Upper=Upper;
Run;
* Lower and Upper are the confidence limits for Pi;
PROC PRINT Data=Logit5(where=(LI in (8 10)));
Run;
* Display the Pi_Hat values for LI = 8 and LI = 10;

5.1a. The model being fit is logit(π) = α + β·LI, where the estimates of α and β are listed under Estimate in Table 5.11. To obtain π, first rewrite the model in terms of the odds, θ = exp(α + β·LI), and then compute π = θ/(1 + θ). For LI = 8, θ(LI = 8) = exp(α̂ + 8β̂) and π(LI = 8) = θ(LI = 8)/[1 + θ(LI = 8)].

5.1b. Using the same equations, π(LI = 26) follows from θ(LI = 26) = exp(α̂ + 26β̂) and π(LI = 26) = θ(LI = 26)/[1 + θ(LI = 26)].

5.1c. The rate of change in π with respect to LI is ∂π(LI)/∂LI = β·π(LI)·[1 - π(LI)]. At LI = 8 this is β̂·π̂(8)·[1 - π̂(8)].

At LI = 26, where π̂ is approximately 0.5, the rate of change is β̂·(0.5)·[1 - 0.5] = 0.25β̂.

5.1d. We can calculate π at the lower and upper quartiles of LI (14 and 28): θ(LI = 14) = exp(α̂ + 14β̂) gives π(LI = 14) = θ/(1 + θ), and θ(LI = 28) = exp(α̂ + 28β̂) gives π(LI = 28) = θ/(1 + θ). The change in π over the middle half of the LI values is then π(LI = 28) - π(LI = 14).

5.1e. From part a, we can rewrite the model as θ = exp(α + β·LI) = exp(α)·exp(β·LI). For a unit change in LI, write LI* = LI + 1. Noting that exp(β̂) = 1.16, θ* = exp(α)·exp(β(LI + 1)) = exp(α)·exp(β)·exp(β·LI) = 1.16·exp(α)·exp(β·LI), and thus θ* = 1.16θ. A unit increase in LI therefore multiplies the odds of remission by exp(β̂) = 1.16.

5.1f. The 95% C.I. for β is 0.1449 ± 1.96(0.0593) = (0.0287, 0.2611), so the 95% C.I. for exp(β), the odds ratio per unit increase in LI, is (exp(0.0287), exp(0.2611)) = (1.029, 1.298).

5.1g. The Wald test statistic for LI is χ²(W) = (β̂ / S.E.(β̂))² = (β̂ / 0.0593)² = 5.96. The upper-tail probability of a chi-square distribution with 1 d.f. at 5.96 is smaller than 0.05. We can therefore conclude that LI has a significant effect on the remission rate (at the 5% significance level).

5.1h. Using the output at the top of Table 5.11, we can construct the likelihood ratio statistic from the -2 Log L values listed under Intercept Only and Intercept and Covariates. Writing L0 and L1 for the corresponding maximized log likelihoods, the statistic is -2(L0 - L1), that is, the Intercept Only value of -2 Log L minus the Intercept and Covariates value. This value agrees with the value shown in Table 5.11 for Likelihood Ratio under the section titled Testing Global Null Hypothesis: BETA=0. The statistic has a chi-square distribution with 1 df, because the fitted model adds a single term (LI) to the intercept-only model. The p-value listed in Table 5.11 for the parameter LI is highly significant, so we can reject H0: β = 0.

5.1i. Find the 95% C.I. for logit(π) first. The MLE of logit(π) is logit(π̂) = α̂ + β̂·LI, so its asymptotic variance at LI = 8 is var(α̂) + 2·LI·cov(α̂, β̂) + LI²·var(β̂) = var(α̂) + 2(8)(-0.077) + 8²(0.004). Therefore, the 95% C.I. for logit(π) at LI = 8 is

(α̂ + β̂·LI) ± 1.96·S.E.(α̂ + β̂·LI), evaluated at LI = 8, which gives (-4.505, -0.735). Applying the inverse logit to both limits, the 95% C.I. for π is (exp(-4.505)/[1 + exp(-4.505)], exp(-0.735)/[1 + exp(-0.735)]) = (0.01, 0.32).

Problem 5.2

The data in Table 5.12 show the flight number (Ft), temperature (Temp, °F), and O-ring thermal distress response (TD: 1 = yes; 0 = no) for 23 space shuttle flights prior to the Challenger disaster in 1986. The data are based on Table 1 in J. Amer. Statist. Assoc., 84: 945-957, 1989, by S. R. Dalal, E. B. Fowlkes, and B. Hoadley.

5.2a. (The SAS code used to produce the following results can be found in the appendix.) PROC LOGISTIC was used to model the effect of temperature on the probability of thermal distress in O-rings. The fitted model has the form logit(π̂) = α̂ + β̂·Temperature, with a negative slope. A plot of π̂ across the range of temperatures is given in Figure 1, which shows that the probability of thermal distress decreases as the temperature increases. The probability of thermal distress decreases most steeply at π̂ = 0.5, which corresponds to the temperature -α̂/β̂.

Figure 1. Plot for Problem 5.2 showing the predicted probability of thermal distress for a range of temperatures.
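The statement in 5.2a that the fitted curve is steepest at π = 0.5 can be checked with a short sketch. Python is used here purely for illustration (the homework itself uses SAS). The function names `logistic` and `steepest_point` and the demo coefficients are hypothetical; they are not the estimates from the PROC LOGISTIC output.

```python
import math


def logistic(eta):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def steepest_point(alpha, beta):
    """Return (x, slope) at the point where pi(x) changes fastest.

    The slope d(pi)/dx = beta * pi * (1 - pi) is largest in magnitude
    when pi = 0.5, which happens where alpha + beta * x = 0.
    """
    x_half = -alpha / beta
    slope = 0.25 * beta
    return x_half, slope


# Hypothetical coefficients for illustration only (not the fitted values).
alpha_demo, beta_demo = 10.0, -0.2
x_half, slope = steepest_point(alpha_demo, beta_demo)
print(x_half, slope, logistic(alpha_demo + beta_demo * x_half))
```

Substituting the intercept and slope from the SAS output for the demo values would give the temperature at which the estimated probability equals 0.5 and the corresponding rate of decrease per degree.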

5.2b. The estimated probability of thermal distress at 31 °F is 99.96%. However, 31 °F lies outside the range of temperatures in the data, so this estimate is an extrapolation, which is not recommended.

5.2c. Shown below is the SAS output from PROC GENMOD detailing the parameter estimates. The confidence interval for the effect of temperature on thermal distress can be obtained from the Wald 95% confidence interval for β (exponentiating the limits gives the interval on the odds scale). With a chi-square value of 4.6 and df = 1, the p-value is 0.032; therefore, the null hypothesis H0: β = 0 is rejected at the 5% significance level.

Analysis Of Parameter Estimates
Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limits | Chi-Square | Pr > ChiSq
Intercept |
Temp |

If we use PROC LOGISTIC with the CLODDS=PL model option, we obtain a point estimate of the odds ratio together with a 95% profile likelihood confidence interval of (0.597, 0.941) for the effect on the odds of TD per 1 °F change in temperature. Thus a 1 °F increase in temperature reduces the odds of TD to between 59.7% and 94.1% of the original odds. The CLPARM=PL option gives the 95% profile likelihood confidence intervals for the model parameters.

5.2d. If we re-run the analysis using a more complex model (model d), logit(π) = α + β1·Temperature + β2·Temperature², we obtain the -2 Log L value, and hence the log likelihood, for this model. The log likelihood for the model with the linear term only (model a) is known from part a. The likelihood ratio statistic is -2 times the difference between the log likelihoods of model a and model d. It follows a chi-square distribution with 1 df. Based on its p-value, we conclude that adding the quadratic term does not significantly improve the fit.

Problem 5.9

The output in Table 5.14 shows the result of fitting a logit model to the death penalty data in Table 2.6. Let def be the defendant's race and vic be the victim's race. The fitted model has the form logit(π̂) = α̂ + β̂1·def + β̂2·vic.

5.9a. Since the def coefficient is negative and the vic coefficient is larger and positive, cases with a white victim (vic = 1) and a black defendant (def = 0) have the highest estimated probability of the death penalty.

Changing the defendant from black to white (0 to 1) changes the odds of the death penalty by a multiplicative factor of exp(β̂1). Similarly, changing the victim from black to white (0 to 1) changes the odds of the death penalty by a multiplicative factor of exp(β̂2).

5.9b. Since there is no interaction between def and vic in the model, the conditional odds ratios for def are the same for black and white victims. The 95% confidence interval for the conditional odds ratio for def is (0.21, 0.89), and for the conditional odds ratio for vic it is (3.69, 41.16). The confidence interval for the victim is substantially wider than the one for the defendant. Because the intervals are computed on the log scale and then exponentiated, they are not centered about the estimates. These C.I.s can be interpreted as follows. Controlling for the victim's race, the odds of the death penalty when the defendant was white are between 0.209 and exp(-0.114) = 0.892 times the odds when the defendant was black. Likewise, controlling for the defendant's race, the odds of the death penalty when the victim was white are between exp(1.3068) = 3.69 and exp(3.7175) = 41.16 times the odds when the victim was black.

5.9c. The hypothesis of conditional independence of defendant's race and death penalty, controlling for victim's race, is H0: β1 = 0.
(i) The Wald statistic is (β̂1/S.E.)² = (β̂1/0.3671)² = 5.59.
(ii) The chi-square statistic for the LR test is 5.01, which is comparable to the Wald value. Both give small p-values (< 0.05), so we reject the null hypothesis and conclude that there is a significant effect of the defendant's race on the death penalty.

5.9d. The deviance G² and the Pearson χ², both with df = 1, have p-values of 0.54 and 0.66, respectively. Neither test rejects H0 (that the model fits), so the fit is reasonable.

Problem 5.15

Table 5.17, repeated below, shows the parameter estimates for the logistic regression model for esophageal cancer. The model is: logit(π) = α + β1·A + β2·S + β3·R + β4·RS

Variable | Effect | P-value
Intercept | -7.0 | <0.01
Alcohol use (A) |
Smoking (S) | 1.2 | <0.01
Race (R) |
Race × smoking (RS) |

Based on the parameter estimates, the fitted model is logit(π̂) = α̂ + β̂1·A + β̂2·S + β̂3·R + β̂4·RS. For blacks (R = 1), substitution gives logit(π̂) = (α̂ + β̂3) + β̂1·A + (β̂2 + β̂4)·S. When R = 0, the equation becomes logit(π̂) = α̂ + β̂1·A + β̂2·S.

The YS conditional odds ratio is exp(1.4) = 4.055 for blacks and exp(1.2) = 3.32 for whites. When S = 1, the model equation is logit(π̂) = (α̂ + β̂2) + β̂1·A + (β̂3 + β̂4)·R; when S = 0, it is logit(π̂) = α̂ + β̂1·A + β̂3·R. The YR conditional odds ratio is exp(0.5) = 1.65 for S = 1 and exp(0.3) = 1.35 for S = 0.

When R = 0, R×S is also zero, so the coefficient of S in the fitted equation is the log odds ratio between Y and S for whites (R = 0). Similarly, the coefficient of R is the log odds ratio between Y and R for S = 0. Therefore, the p-values for R and S test the null hypotheses of no effect of R on Y given S = 0 and of no effect of S on Y for whites, respectively.

Problem 5.18

(a) From the computer output in Table 5.1, the fitted model has the form log(odds) = α̂ + β̂·width.
i) At a width of 26.3 cm, the estimated odds are exp(α̂ + 26.3β̂) = 2.07.
ii) At a width of 27.3 cm, the estimated odds are exp(α̂ + 27.3β̂) = 3.40.
iii) The odds ratio of 27.3 cm to 26.3 cm is therefore 3.40/2.07 = 1.64. The odds increase by 64% as the width increases from 26.3 cm to 27.3 cm.

(b) The 95% confidence interval (C.I.) for the slope parameter β has lower limit 0.3084. The instantaneous rate of change of the probability of having satellites is βπ(1 - π), which equals 0.25β at π = 0.5. Therefore, the 95% C.I. for the instantaneous rate of change of π at π = 0.5 is 0.25 times the C.I. for β, which gives (0.07, 0.17).
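The arithmetic in Problem 5.18 can be retraced with a brief sketch. It is written in Python for illustration only, and the helper names `odds_ratio` and `max_slope` are hypothetical. It uses only the numbers quoted in parts (a) and (b): the two estimated odds and the lower confidence limit for β.

```python
import math


def odds_ratio(odds_new, odds_old):
    """Ratio of two estimated odds."""
    return odds_new / odds_old


def max_slope(beta):
    """Rate of change of pi at pi = 0.5, i.e. beta * 0.5 * (1 - 0.5)."""
    return 0.25 * beta


# Estimated odds quoted in part (a) at widths 27.3 cm and 26.3 cm.
ratio = odds_ratio(3.40, 2.07)
percent_increase = (ratio - 1.0) * 100.0

# The widths differ by exactly 1 cm, so the ratio estimates exp(beta)
# and its logarithm approximates the slope.
implied_slope = math.log(ratio)

# Lower 95% confidence limit for beta quoted in part (b).
lower_rate = max_slope(0.3084)

print(round(ratio, 2), round(percent_increase), round(implied_slope, 3), round(lower_rate, 4))
```

The lower limit works out to 0.25 × 0.3084 = 0.0771, which the solution reports to two decimals as 0.07 (truncated rather than rounded).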

BSTA 6651 Cat. Data Anal. Homework #2 Fall, 2011 Dr. Fan

Appendix

************************ SAS code for problem 5.2. ***************************;
DATA PROB_5_2;
  Input Ft Temp TD;
  label TD = 'Thermal Distress (1=Yes, 0=No)';
  datalines;
;
* NOTE: Added Obs #24 is to produce the Pi_Hat values for Temp = 31 °F.;
Run;

PROC LOGISTIC Data=PROB_5_2 Descending;
  Model TD = Temp / CLODDS=PL CLPARM=PL;
  Output Out=Logit2 p=pi_hat Lower=Lower Upper=Upper;
Run;
* Lower and Upper are the confidence limits for Pi;

PROC PRINT Data=Logit2(where=(Temp in (31)));
Run;
* Display the Pi_Hat values for Temp = 31 °F;

PROC SORT Data=Logit2;
  By Temp Ft;
Run;
* Sort data for plotting.;

* Setup plotting symbols and axes definitions to improve plot appearance. ;
*************************************************************************;
Symbol1 value=dot h=1.2 i=spline w=2 c=blue l=1;  * Dot symbol with spline through points. ;
Symbol2 value= + h=1.5 i=join w=1.5 c=red l=42;   * Circle symbol with dashed line.;
Axis1 label=(angle=90 f=swissb height=1.5 'Estimated Probability') value=(height=1.5); run;
Axis2 label=(f=swissb height=1.5 "Temperature (°F)") value=(height=1.5); run;
Legend1 label=(height=1.5 "Key:") value=(height=1.5);

PROC GPLOT Data = Logit2(Where=(Temp GT 50));
  Plot (Pi_Hat TD) * Temp / GRID vaxis=axis1 haxis=axis2 Overlay Legend=Legend1;
Run;
Quit;

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables. SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

More information

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

208-25 LEGEND OPTIONS USING MULTIPLE PLOT STATEMENTS IN PROC GPLOT

208-25 LEGEND OPTIONS USING MULTIPLE PLOT STATEMENTS IN PROC GPLOT Paper 28-25 LEGEND OPTIONS USING MULTIPLE PLOT STATEMENTS IN PROC GPLOT Julie W. Pepe, University of Central Florida, Orlando, FL ABSTRACT A graph with both left and right vertical axes is easy to construct

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

13. Poisson Regression Analysis

13. Poisson Regression Analysis 136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Lecture 18: Logistic Regression Continued

Lecture 18: Logistic Regression Continued Lecture 18: Logistic Regression Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Logistic (RLOGIST) Example #1

Logistic (RLOGIST) Example #1 Logistic (RLOGIST) Example #1 SUDAAN Statements and Results Illustrated EFFECTS RFORMAT, RLABEL REFLEVEL EXP option on MODEL statement Hosmer-Lemeshow Test Input Data Set(s): BRFWGT.SAS7bdat Example Using

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

More information

Discrete Distributions

Discrete Distributions Discrete Distributions Chapter 1 11 Introduction 1 12 The Binomial Distribution 2 13 The Poisson Distribution 8 14 The Multinomial Distribution 11 15 Negative Binomial and Negative Multinomial Distributions

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

More information

Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Nominal and Real U.S. GDP 1960-2001

Nominal and Real U.S. GDP 1960-2001 Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 318- Managerial Economics Use the data set for gross domestic product (gdp.xls) to answer the following questions. (1) Show graphically

More information

Logit and Probit. Brad Jones 1. April 21, 2009. University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science

Logit and Probit. Brad Jones 1. April 21, 2009. University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science Logit and Probit Brad 1 1 Department of Political Science University of California, Davis April 21, 2009 Logit, redux Logit resolves the functional form problem (in terms of the response function in the

More information

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics. Getting Correlations Using PROC CORR Correlation analysis provides a method to measure the strength of a linear relationship between two numeric variables. PROC CORR can be used to compute Pearson product-moment

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

ABSTRACT INTRODUCTION

ABSTRACT INTRODUCTION Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel

More information

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into

More information

Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

Statistics and Data Analysis

Statistics and Data Analysis NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Cool Tools for PROC LOGISTIC

Cool Tools for PROC LOGISTIC Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT

More information

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

One-Degree-of-Freedom Tests: The test for group × occasion interactions has (number of groups − 1) × (number of occasions − 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.

DATA INTERPRETATION AND STATISTICS

PholC60 September 001 Books: An easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

SUMAN DUVVURU STAT 567 PROJECT REPORT

SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

SUGI 29 Statistics and Data Analysis

Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

Final Exam Practice Problem Answers

The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

Poisson Models for Count Data

Chapter 4. In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

Chapter 5 Analysis of variance SPSS Analysis of variance

Data file used: gss.sav. How to get there: Analyze > Compare Means > One-way ANOVA. To test the null hypothesis that several population means are equal,

Algebra II End of Course Exam Answer Key Segment I. Scientific Calculator Only

Question 1 Reporting Category: Algebraic Concepts & Procedures Common Core Standard: A-APR.3: Identify zeros of polynomials

Poisson Regression or Regression of Counts (& Rates)

Carolyn J. Anderson, Department of Educational Psychology, University of Illinois at Urbana-Champaign. Generalized Linear Models. Outline

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

Chapter 6: Multivariate Cointegration Analysis

Department of Empirical Research and Econometrics (Lehrstuhl für Empirische Wirtschaftsforschung und Ökonometrie). Contents: VI. Multivariate Cointegration

Pearson's Correlation

Correlation: the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

Introduction to Quantitative Methods

October 15, 2009. Contents: 1 Definition of Key Terms; 2 Descriptive Statistics; 2.1 Frequency Tables; 2.2 Measures of Central Tendencies

From the help desk: Swamy's random-coefficients model

The Stata Journal (2003) 3, Number 3, pp. 302-308. Brian P. Poi, Stata Corporation. Abstract. This article discusses the Swamy (1970) random-coefficients

MORE ON LOGISTIC REGRESSION

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 I. AGENDA: A. Logistic regression 1. Multiple independent variables 2. Example: The Bell Curve 3. Evaluation

Premaster Statistics Tutorial 4 Full solutions

Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

MATH 60 NOTEBOOK CERTIFICATIONS

Chapter #1: Integers and Real Numbers 1.1a 1.1b 1.2 1.3 1.4 1.8 Chapter #2: Algebraic Expressions, Linear Equations, and Applications 2.1a 2.1b 2.1c 2.2 2.3a 2.3b 2.4 2.5

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

ABSTRACT: This project attempted to determine the relationship

Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

Regression step-by-step using Microsoft Excel

Notes prepared by Pamela Peterson Drake, James Madison University. Step 1: Type the data into the spreadsheet. The example used throughout this How to is a regression

Tests for Two Survival Curves Using Cox's Proportional Hazards Model

Chapter 730. Introduction: A clinical trial is often employed to test the equality of survival distributions of two treatment groups.

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 Question 1: a. Below are the scatter plots of hourly wages

Standard errors of marginal effects in the heteroskedastic probit model

Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

Regression III: Advanced Methods

Lecture 16: Generalized Additive Models. Bill Jacoby, Michigan State University, http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture: Introduce Additive Models

Review of Fundamental Mathematics

As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

The KaleidaGraph Guide to Curve Fitting

Contents: Chapter 1 Curve Fitting Overview; 1.1 Purpose of Curve Fitting; 1.2 Types of Curve Fits; Least Squares Curve Fits; Nonlinear Curve Fits

Statistics in Retail Finance. Chapter 2: Statistical models of default

Overview: We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

SPSS Guide: Regression Analysis

I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

Dongfeng Li. Autumn 2010

Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Algorithms for Predictive Modelling. Contents: Regression; Classification; Auxiliary topics: Estimation of prediction

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

Lecture 3: Linear methods for classification

Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

EQUATIONS and INEQUALITIES

Linear Equations and Slope 1. Slope a. Calculate the slope of a line given two points b. Calculate the slope of a line parallel to a given line. c. Calculate the slope of a line

Univariate Regression

Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Note: This handout assumes you understand factor variables,

Example: Boats and Manatees

Figure 9-6. Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

Comparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function

The Empirical Econometrics and Quantitative Economics Letters, ISSN 2286 7147, Volume 1, Number 4 (December 2012), pp. 89-106.

Distribution (Weibull) Fitting

Chapter 550. Introduction: This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions