BSTA 6651 Cat. Data Anal. Homework #3 Fall, 2011
- Merry Higgins
- 7 years ago
Problem 5.1

Table 5.11 shows the output of a logistic regression model for the probability of cancer remission, with the labeling index (LI) as the explanatory variable. The following optional SAS code reproduces the results in Table 5.11:

```sas
DATA PROB_5_1;
  Input LI N Remiss;
  datalines;
...
;
Run;

PROC LOGISTIC Data=PROB_5_1;
  Model Remiss/N = LI / COVB;   * The COVB option displays the covariance matrix;
  Output Out=Logit5 p=pi_hat Lower=Lower Upper=Upper;
Run;   * Lower and Upper are the confidence limits for pi;

PROC PRINT Data=Logit5(where=(LI in (8 10)));
Run;   * Display the pi_hat values for LI = 8 and LI = 10;
```

5.1a. The model being fit is

$$\operatorname{logit}(\pi) = \alpha + \beta\,\mathrm{LI},$$

where the estimates $\hat\alpha$ and $\hat\beta$ are listed under Estimate in Table 5.11. To recover $\pi$, first rewrite the model in terms of the odds,

$$\theta = e^{\alpha + \beta\,\mathrm{LI}}, \qquad \pi = \frac{\theta}{1 + \theta}.$$

For LI = 8, $\hat\theta(\mathrm{LI}=8) = e^{\hat\alpha + 8\hat\beta}$, and then $\hat\pi(\mathrm{LI}=8) = \hat\theta/(1+\hat\theta)$.

5.1b. By the same equations, $\hat\theta(\mathrm{LI}=26) = e^{\hat\alpha + 26\hat\beta}$ and $\hat\pi(\mathrm{LI}=26) = \hat\theta/(1+\hat\theta)$.

5.1c. Using the formula from section … of the text, the rate of change in $\pi$ is

$$\frac{\partial \pi(\mathrm{LI})}{\partial\,\mathrm{LI}} = \beta\,\pi(\mathrm{LI})\,[1 - \pi(\mathrm{LI})].$$

At LI = 8 this is $\hat\beta\,\hat\pi(8)\,[1 - \hat\pi(8)]$.
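Parts a–c of Problem 5.1 rest on three formulas: the odds $\theta = e^{\alpha+\beta\,\mathrm{LI}}$, the probability $\pi = \theta/(1+\theta)$, and the slope $\beta\pi(1-\pi)$. The Python sketch below evaluates them. The coefficient values are an assumption: the digits of Table 5.11 did not survive in this transcript, so $\hat\alpha = -3.7771$ and $\hat\beta = 0.1449$ are used as stand-ins ($\hat\beta$ is the midpoint of the interval (0.0287, 0.2611) in part f, and these values give $\hat\pi \approx 0.5$ at LI = 26, as in part c). Check them against the table before relying on the output.

```python
import math

# ASSUMED stand-ins for the Table 5.11 estimates (verify against the table).
ALPHA_HAT = -3.7771
BETA_HAT = 0.1449


def odds(alpha: float, beta: float, li: float) -> float:
    """theta = exp(alpha + beta * LI): estimated odds of remission."""
    return math.exp(alpha + beta * li)


def prob(alpha: float, beta: float, li: float) -> float:
    """pi = theta / (1 + theta): estimated probability of remission."""
    theta = odds(alpha, beta, li)
    return theta / (1.0 + theta)


def slope(alpha: float, beta: float, li: float) -> float:
    """Rate of change d(pi)/d(LI) = beta * pi * (1 - pi)."""
    p = prob(alpha, beta, li)
    return beta * p * (1.0 - p)


if __name__ == "__main__":
    for li in (8, 26):
        print(f"LI={li}: pi_hat={prob(ALPHA_HAT, BETA_HAT, li):.4f}, "
              f"slope={slope(ALPHA_HAT, BETA_HAT, li):.4f}")
```

With these stand-in values, $\hat\pi$ is about 0.068 at LI = 8 and about 0.50 at LI = 26, where the slope is essentially its maximum possible value $0.25\hat\beta$.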
At LI = 26, where $\hat\pi \approx 0.5$, the rate of change is $\hat\beta\,(0.5)\,[1 - 0.5] = 0.25\hat\beta$.

5.1d. At the lower and upper quartiles of LI (14 and 28),

$$\hat\theta(\mathrm{LI}=14) = e^{\hat\alpha + 14\hat\beta}, \quad \hat\pi(14) = \frac{\hat\theta(14)}{1+\hat\theta(14)}; \qquad \hat\theta(\mathrm{LI}=28) = e^{\hat\alpha + 28\hat\beta}, \quad \hat\pi(28) = \frac{\hat\theta(28)}{1+\hat\theta(28)}.$$

The change in $\pi$ over the middle half of the LI values is therefore $\hat\pi(28) - \hat\pi(14)$.

5.1e. From part a, the model can be written as

$$\theta = e^{\alpha + \beta\,\mathrm{LI}} = e^{\alpha}e^{\beta\,\mathrm{LI}}.$$

For a unit change in LI, let $\mathrm{LI}^* = \mathrm{LI} + 1$. Then

$$\theta^* = e^{\alpha}e^{\beta(\mathrm{LI}+1)} = e^{\beta}\,e^{\alpha}e^{\beta\,\mathrm{LI}} = e^{\beta}\,\theta,$$

and with $e^{\hat\beta} = 1.16$ we get $\hat\theta^* = 1.16\,\hat\theta$. So each unit increase in LI multiplies the estimated odds of remission by $e^{\hat\beta} = 1.16$; equivalently, the odds ratio for a one-unit increase in LI is 1.16.

5.1f. The 95% CI for $\beta$ is $\hat\beta \pm 1.96(0.0593) = (0.0287, 0.2611)$, so the 95% CI for the odds ratio $e^{\beta}$ is $(e^{0.0287}, e^{0.2611}) = (1.029, 1.298)$.

5.1g. The Wald test statistic for LI is

$$\chi^2_W = \left(\frac{\hat\beta}{\mathrm{SE}(\hat\beta)}\right)^2 = \left(\frac{\hat\beta}{0.0593}\right)^2 = 5.96.$$

The upper-tail probability of a chi-squared distribution with 1 df beyond 5.96 is smaller than 0.05, so we conclude that LI has a significant effect on the remission rate at the 5% significance level.

5.1h. Let $L_0$ and $L_1$ be the maximized log likelihoods of the intercept-only model and of the model with LI; Table 5.11 reports $-2L_0$ and $-2L_1$ as the −2 Log L values under Intercept Only and Intercept and Covariates. The likelihood-ratio statistic is

$$-2(L_0 - L_1) = (-2L_0) - (-2L_1).$$

This value agrees with the Likelihood Ratio entry under Testing Global Null Hypothesis: BETA=0 in Table 5.11. The statistic is approximately chi-squared with df = 1, because the fitted model adds the single term LI to the intercept-only model. Its p-value (listed in Table 5.11) is highly significant, so we reject $H_0\colon \beta = 0$.

5.1i. First find the 95% CI for logit(π). The ML estimate of logit(π) is $\hat\alpha + \hat\beta\,\mathrm{LI}$, with asymptotic variance

$$\widehat{\mathrm{var}}(\hat\alpha + \hat\beta\,\mathrm{LI}) = \widehat{\mathrm{var}}(\hat\alpha) + 2\,\mathrm{LI}\,\widehat{\mathrm{cov}}(\hat\alpha, \hat\beta) + \mathrm{LI}^2\,\widehat{\mathrm{var}}(\hat\beta).$$

At LI = 8 this is $\widehat{\mathrm{var}}(\hat\alpha) + 2(8)(-0.077) + 8^2(0.004)$. Therefore, the 95% CI for logit(π) at LI = 8 is
$$(\hat\alpha + 8\hat\beta) \pm 1.96\,\mathrm{SE}(\hat\alpha + 8\hat\beta) = (-4.505,\ -0.735).$$

Applying the inverse logit $e^{x}/(1+e^{x})$ to the endpoints gives the 95% CI for $\pi$:

$$\left(\frac{e^{-4.505}}{1+e^{-4.505}},\ \frac{e^{-0.735}}{1+e^{-0.735}}\right) = (0.01,\ 0.32).$$

Problem 5.2

The data in Table 5.12 give the flight number (Ft), temperature (Temp, °F), and O-ring thermal distress response (TD: 1 = yes, 0 = no) for the 23 space shuttle flights before the Challenger disaster. The data are based on Table 1 of S. R. Dalal, E. B. Fowlkes, and B. Hoadley, J. Amer. Statist. Assoc., 84: 945–957, 1989.

5.2a. (The SAS code used to produce the following results is in the Appendix.) PROC LOGISTIC was used to model the effect of temperature on the probability of thermal distress in the O-rings. The fitted model has the form

$$\operatorname{logit}(\hat\pi) = \hat\alpha + \hat\beta \times \text{Temperature}.$$

Figure 1 plots $\hat\pi$ across the range of temperatures and shows that the probability of thermal distress decreases as temperature increases. The probability decreases most steeply where $\hat\pi = 0.5$, which corresponds to the temperature $-\hat\alpha/\hat\beta$.

Figure 1. Predicted probability of thermal distress over a range of temperatures (Problem 5.2).
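The claim in 5.2a that the curve falls fastest where $\hat\pi = 0.5$ follows from the slope $\beta\pi(1-\pi)$: its magnitude is largest when $\pi(1-\pi)$ is largest, at $\pi = 0.5$, which occurs where $\alpha + \beta\,\text{Temp} = 0$, i.e. at $-\alpha/\beta$. The Python sketch below checks this numerically. The estimates $\hat\alpha = 15.043$ and $\hat\beta = -0.2322$ are assumed (the fitted coefficients did not survive in this transcript); they are consistent with the 99.96% at 31°F reported in part b.

```python
import math

# ASSUMED estimates for logit(pi) = alpha + beta * Temp (verify against the SAS output).
ALPHA_HAT = 15.043
BETA_HAT = -0.2322


def prob(temp: float, alpha: float = ALPHA_HAT, beta: float = BETA_HAT) -> float:
    """Estimated probability of thermal distress at a temperature in deg F."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * temp)))


def slope(temp: float, alpha: float = ALPHA_HAT, beta: float = BETA_HAT) -> float:
    """d(pi)/d(Temp) = beta * pi * (1 - pi); negative when beta < 0."""
    p = prob(temp, alpha, beta)
    return beta * p * (1.0 - p)


def steepest_temp(alpha: float = ALPHA_HAT, beta: float = BETA_HAT) -> float:
    """Temperature where pi = 0.5, i.e. where alpha + beta * Temp = 0."""
    return -alpha / beta


def steepest_temp_by_grid(lo: float = 50.0, hi: float = 80.0, step: float = 0.001) -> float:
    """Brute-force check: the grid temperature with the most negative slope."""
    n = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n + 1))
    return min(grid, key=slope)
```

Under these assumed values, `steepest_temp()` returns about 64.8°F; the grid search is redundant but confirms the algebra. Note that `prob(31)` is an extrapolation for the reason given in part b.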
5.2b. At 31°F the estimated probability of thermal distress is $\hat\pi = e^{\hat\alpha + 31\hat\beta}/(1 + e^{\hat\alpha + 31\hat\beta}) = 0.9996$, i.e., 99.96%. However, 31°F lies outside the range of temperatures in the data, so this is an extrapolation and is not recommended.

5.2c. The PROC GENMOD output below gives the parameter estimates. A confidence interval for the effect of temperature on the odds of thermal distress is obtained by exponentiating the endpoints of the Wald 95% confidence interval for $\beta$. The Wald chi-square is 4.6 with df = 1, giving a p-value of 0.032, so $H_0\colon \beta = 0$ is rejected at the 5% significance level.

Analysis Of Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Confidence Limits | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | | | | | |
| Temp | | | | | | |

Using PROC LOGISTIC with the CLODDS=PL model option gives the point estimate $e^{\hat\beta}$ of the odds ratio for a 1°F increase in temperature, with a 95% profile-likelihood confidence interval of (0.597, 0.941). Thus a 1°F increase in temperature multiplies the odds of TD by a factor between 0.597 and 0.941, i.e., reduces them to between 59.7% and 94.1% of the original odds. The CLPARM=PL option gives 95% profile-likelihood confidence intervals for the model parameters.

5.2d. Refitting with the more complex model

$$\operatorname{logit}(\pi) = \alpha + \beta_1\,\text{Temperature} + \beta_2\,\text{Temperature}^2$$

gives a new −2 Log L value and hence the maximized log likelihood $L_d$ of this model; from part a, the maximized log likelihood of the linear model is $L_a$. The likelihood-ratio statistic $-2(L_a - L_d)$ has an approximate chi-squared distribution with 1 df (one added parameter). Based on its p-value, adding the quadratic term does not significantly improve the fit.

Problem 5.9

The output in Table 5.14 shows the result of fitting a logit model to the death penalty data in Table 2.6. Let def denote the defendant's race and vic the victim's race.
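The fitted model is then

$$\operatorname{logit}(\hat\pi) = \hat\alpha + \hat\beta_1\,\mathrm{def} + \hat\beta_2\,\mathrm{vic}.$$

5.9a. Since the def coefficient is negative and the vic coefficient is positive and larger, the estimated probability of the death penalty is highest for cases with a white victim (vic = 1) and a black defendant (def = 0); that probability is $\hat\pi = e^{\hat\alpha + \hat\beta_2}/(1 + e^{\hat\alpha + \hat\beta_2})$.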
Changing the defendant from black to white (0 to 1) multiplies the odds of the death penalty by $e^{\hat\beta_1}$. Similarly, changing the victim from black to white (0 to 1) multiplies the odds of the death penalty by $e^{\hat\beta_2}$.

5.9b. Since the model has no interaction between def and vic, the conditional odds ratio for def is the same for black and white victims (and the conditional odds ratio for vic is the same for black and white defendants). The 95% confidence interval for the conditional odds ratio for def is (0.21, 0.89), and for vic it is $(e^{1.3068}, e^{3.7175}) = (3.69, 41.16)$. The interval for vic is much wider than the interval for def. Because the intervals are obtained by exponentiating intervals for the log odds ratios, they are not centered on the point estimates.

These intervals can be interpreted as follows. Controlling for victim's race, the odds of the death penalty when the defendant was white are between 0.209 and $e^{-0.114} = 0.892$ times the odds when the defendant was black. Likewise, controlling for defendant's race, the odds of the death penalty when the victim was white are between $e^{1.3068} = 3.69$ and $e^{3.7175} = 41.16$ times the odds when the victim was black.

5.9c. The hypothesis of conditional independence of defendant's race and the death penalty, controlling for victim's race, is $H_0\colon \beta_1 = 0$.

(i) Wald statistic: $(\hat\beta_1/\mathrm{SE})^2 = (\hat\beta_1/0.3671)^2 = 5.59$.

(ii) The likelihood-ratio chi-square is 5.01, comparable to the Wald value. Both give small p-values (< 0.05), so we reject $H_0$ and conclude that defendant's race has a significant effect on the death penalty, controlling for victim's race.

5.9d. The deviance $G^2$ and the Pearson $\chi^2$ goodness-of-fit statistics, each with df = 1, have p-values of 0.54 and 0.66, respectively. Neither rejects the null hypothesis that the model holds, so the fit is reasonable.

Problem 5.15

Table 5.17, repeated below, shows the parameter estimates for the logistic regression model for esophageal cancer.
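The model is

$$\operatorname{logit}(\pi) = \alpha + \beta_1 A + \beta_2 S + \beta_3 R + \beta_4 RS$$

| Variable | Effect | P-value |
|---|---|---|
| Intercept | -7.0 | <0.01 |
| Alcohol use (A) | | |
| Smoking (S) | 1.2 | <0.01 |
| Race (R) | | |
| Race × smoking (RS) | | |

Based on the parameter estimates, the fitted model is $\operatorname{logit}(\hat\pi) = \hat\alpha + \hat\beta_1 A + \hat\beta_2 S + \hat\beta_3 R + \hat\beta_4 RS$. For blacks (R = 1), substitution gives

$$\operatorname{logit}(\hat\pi) = (\hat\alpha + \hat\beta_3) + \hat\beta_1 A + (\hat\beta_2 + \hat\beta_4) S.$$

For whites (R = 0):

$$\operatorname{logit}(\hat\pi) = \hat\alpha + \hat\beta_1 A + \hat\beta_2 S.$$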
The YS conditional odds ratio is given by $e^{\hat\beta_2 + \hat\beta_4} = e^{1.4} = 4.055$ for blacks and $e^{\hat\beta_2} = e^{1.2} = 3.32$ for whites.

For smokers (S = 1) the model becomes

$$\operatorname{logit}(\hat\pi) = (\hat\alpha + \hat\beta_2) + \hat\beta_1 A + (\hat\beta_3 + \hat\beta_4) R,$$

and for S = 0 it becomes

$$\operatorname{logit}(\hat\pi) = \hat\alpha + \hat\beta_1 A + \hat\beta_3 R.$$

The YR conditional odds ratio is therefore $e^{\hat\beta_3 + \hat\beta_4} = e^{0.5} = 1.65$ for S = 1 and $e^{\hat\beta_3} = e^{0.3} = 1.35$ for S = 0.

When R = 0, RS is also zero, so the coefficient of S in the fitted equation is the log odds ratio between Y and S for whites (R = 0). Similarly, the coefficient of R is the log odds ratio between Y and R for S = 0. Therefore, the p-values for R and S test, respectively, the null hypothesis of no effect of R on Y given S = 0 and of no effect of S on Y for whites.

Problem 5.18

(a) From the computer output in Table 5.1, the fitted model is $\log(\text{odds}) = \hat\alpha + \hat\beta \times \text{width}$.

i) At a width of 26.3 cm, the estimated odds are $e^{\hat\alpha + 26.3\hat\beta} = 2.07$.

ii) At a width of 27.3 cm, the estimated odds are $e^{\hat\alpha + 27.3\hat\beta} = 3.40$.

iii) The odds ratio for 27.3 cm versus 26.3 cm is therefore 3.40/2.07 = 1.64, which equals $e^{\hat\beta}$. The odds increase by 64% as width increases from 26.3 cm to 27.3 cm.

(b) The 95% confidence interval for the slope parameter $\beta$ is (0.3084, …). The instantaneous rate of change in the probability of having satellites is $\beta\pi(1-\pi)$, which equals $0.25\beta$ at $\pi = 0.5$. Therefore, the 95% CI for this rate of change when $\pi = 0.5$ is $0.25 \times (0.3084, \ldots) = (0.07, 0.17)$.
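The odds ratio in 5.18(a) does not depend on where the one-centimeter step starts: odds(x + 1)/odds(x) = $e^{\beta}$ for every width x. The Python sketch below checks this and also scales the slope's confidence limit as in part (b). The coefficients $\hat\alpha = -12.351$ and $\hat\beta = 0.4972$ are assumed stand-ins for the Table 5.1 estimates, which were lost from this transcript; they reproduce the odds 2.07 and 3.40 computed in part (a).

```python
import math

# ASSUMED stand-ins for the Table 5.1 estimates of log(odds) = alpha + beta * width.
ALPHA_HAT = -12.351
BETA_HAT = 0.4972


def odds(width_cm: float, alpha: float = ALPHA_HAT, beta: float = BETA_HAT) -> float:
    """Estimated odds that a crab of this width has satellites."""
    return math.exp(alpha + beta * width_cm)


def odds_ratio_per_cm(width_cm: float) -> float:
    """Odds at width + 1 cm divided by odds at width; equals exp(beta) for every width."""
    return odds(width_cm + 1.0) / odds(width_cm)


def rate_at_half(beta: float) -> float:
    """Slope beta * pi * (1 - pi) evaluated at pi = 0.5, i.e. 0.25 * beta."""
    return beta * 0.5 * (1.0 - 0.5)
```

Because `rate_at_half` is linear in `beta`, applying it to each endpoint of a CI for $\beta$ gives a CI for the slope at $\pi = 0.5$; `rate_at_half(0.3084)` is about 0.077, the lower limit in part (b) before rounding.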
BSTA 6651 Cat. Data Anal. Homework #2 Fall, 2011 Dr. Fan

Appendix

```sas
************************ SAS code for problem 5.2 ***************************;
DATA PROB_5_2;
  Input Ft Temp TD;
  label TD = 'Thermal Distress (1=Yes, 0=No)';
  datalines;
...
;
* NOTE: Added Obs #24 is to produce the pi_hat values for Temp = 31 deg F;
Run;

PROC LOGISTIC Data=PROB_5_2 Descending;
  Model TD = Temp / CLODDS=PL CLPARM=PL;
  Output Out=Logit2 p=pi_hat Lower=Lower Upper=Upper;
Run;   * Lower and Upper are the confidence limits for pi;

PROC PRINT Data=Logit2(where=(Temp in (31)));
Run;   * Display the pi_hat values for Temp = 31 deg F;

PROC SORT Data=Logit2;
  By Temp Ft;
Run;   * Sort data for plotting;

* Set up plotting symbols and axis definitions to improve plot appearance;
*************************************************************************;
Symbol1 value=dot h=1.2 i=spline w=2 c=blue l=1;   * Dot symbol with spline through points;
Symbol2 value=+ h=1.5 i=join w=1.5 c=red l=42;     * Plus symbol joined by a dashed line;
Axis1 label=(angle=90 f=swissb height=1.5 'Estimated Probability') value=(height=1.5);
Axis2 label=(f=swissb height=1.5 "Temperature (deg F)") value=(height=1.5);
Legend1 label=(height=1.5 "Key:") value=(height=1.5);

PROC GPLOT Data=Logit2(Where=(Temp GT 50));
  Plot (pi_hat TD) * Temp / GRID vaxis=axis1 haxis=axis2 Overlay Legend=Legend1;
Run;
Quit;
```
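For readers without SAS, the maximum-likelihood fit that PROC LOGISTIC performs for a single predictor, and the likelihood-ratio statistic used in 5.1h and 5.2d, can be sketched in plain Python. This is a minimal Newton–Raphson sketch, not SAS's implementation (for the logit link, Newton–Raphson coincides with the Fisher scoring that PROC LOGISTIC uses by default). The function names and the toy data are hypothetical, because the Table 5.12 values are not reproduced here. The response is coded so that y = 1 is the modeled event, which is what DESCENDING achieves in the SAS code above.

```python
import math
from typing import List, Tuple


def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x: List[float], y: List[int], tol: float = 1e-10,
                 max_iter: int = 50) -> Tuple[float, float]:
    """Newton-Raphson for logit(pi) = a + b*x with 0/1 responses (assumes no separation)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (u0, u1) and information matrix [[h00, h01], [h01, h11]].
        u0 = u1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            u0 += yi - p
            u1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * (da, db) = (u0, u1).
        da = (h11 * u0 - h01 * u1) / det
        db = (h00 * u1 - h01 * u0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def minus2_log_lik(x: List[float], y: List[int], a: float, b: float) -> float:
    """-2 log L, the quantity SAS reports as '-2 Log L'."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = expit(a + b * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -2.0 * total


def lr_statistic(x: List[float], y: List[int]) -> float:
    """(-2 log L, intercept only) minus (-2 log L, intercept + slope)."""
    ybar = sum(y) / len(y)
    a0 = math.log(ybar / (1.0 - ybar))  # intercept-only MLE is logit(ybar)
    a1, b1 = fit_logistic(x, y)
    return minus2_log_lik(x, y, a0, 0.0) - minus2_log_lik(x, y, a1, b1)


if __name__ == "__main__":
    # Hypothetical toy data, NOT the Challenger data.
    x_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    y_toy = [0, 0, 1, 0, 1, 0, 1, 1]
    print(fit_logistic(x_toy, y_toy), lr_statistic(x_toy, y_toy))
```

At convergence both score equations are (numerically) zero, and `lr_statistic` is compared with a chi-squared distribution on 1 df, as in 5.1h.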
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationStandard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationReview of Fundamental Mathematics
Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools
More informationThe KaleidaGraph Guide to Curve Fitting
The KaleidaGraph Guide to Curve Fitting Contents Chapter 1 Curve Fitting Overview 1.1 Purpose of Curve Fitting... 5 1.2 Types of Curve Fits... 5 Least Squares Curve Fits... 5 Nonlinear Curve Fits... 6
More informationStatistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More informationDongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationINTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationEQUATIONS and INEQUALITIES
EQUATIONS and INEQUALITIES Linear Equations and Slope 1. Slope a. Calculate the slope of a line given two points b. Calculate the slope of a line parallel to a given line. c. Calculate the slope of a line
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationInteraction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015
Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,
More informationExample: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
More informationComparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function
The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 89 106. Comparison of sales forecasting models for an innovative
More informationDistribution (Weibull) Fitting
Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions
More information