III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis


 Britton Booker
 2 years ago
 Views:
Transcription
1 III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as a function of their baseline APACHE II Score. Patients are coded as 1 or 0 depending on whether they are dead or alive in 30 days, respectively. Died 1 30 Day Mortality in Patients with Sepsis Survived APACHE II Score at Baseline We wish to predict death from baseline APACHE II score in these patients. Let π(x) be the probability that a patient with score x will die. Note that linear regression would not work well here since it could produce probabilities less than zero or greater than one. Died 1 30 Day Mortality in Patients with Sepsis Survived APACHE II Score at Baseline 1
2 b) Sigmoidal family of logistic regression curves Logistic regression fits probability functions of the following form: p( x) = exp( a+ bx)/( 1 + exp( a+ bx)) This equation describes a family of sigmoidal curves, three examples of which are given below. p( x) = exp( a+ bx)/ ( 1 + exp( a+ bx)) 1.0 π(x) α β c) Parameter values and the shape of the regression curve For negative values of x, exp ( a+ b ) Æ and hence p( x ) Æ 0/ ( 1+ 0) = 0 x For now assume that β > 0. x 0 as x Æ For very large values of x, exp( a+ bx) Æ p( x) Æ ( 1+ ) = 1 and hence 2
3 p( x) = exp( a+ bx)/ ( 1 + exp( a+ bx)) π(x) α β When x = a/ b, a+ bx = 0 and hence p( x ) = = 05. The slope of π(x) when π(x)=.5 is β/4. Thus β controls how fast π(x) rises from 0 to 1. x b g For given β, α controls were the 50% survival point is located. We wish to choose the best curve to fit the data. Data that has a sharp survival cut off point between patients who live or die should have a large value of β. Died 1 Survived x Data with a lengthy transition from survival to death should have a low value of β. Died 1 Survived x 3
4 d) The probability of death under the logistic model This probability is p( x) = exp( a+ bx)/ ( 1 + exp( a+ bx)) {3.1} Hence 1  p( x) = probability of survival 1 = + exp( a + bx)  exp( a + bx) 1 + exp( a+ bx) = 1 ( 1+ exp( a+ bx)), and the odds of death is p( x) ( 1  p( x)) = exp( a+ bx) The log odds of death equals log( p( x) ( 1  p( x)) = a+ bx {3.2} e) The logit function For any number π between 0 and 1 the logit function is defined by logit( p) = log( p/( 1 p)) Let d i = R S T 1: i 0: i th th patient dies patient lives x i be the APACHE II score of the i th patient Then the expected value of d i is Ed ( ) =p ( x) = Pr[ d= 1] i i i Thus we can rewrite the logistic regression equation {5.2} as logit( Ed ( )) =p ( x) =a+bx i i i {3.3} 4
5 2. Contrast Between Logistic and Linear Regression In linear regression, the expected value of y i given x i is E( y ) = a+ bx for i = 12,,..., n i i y i has a normal distribution with standard deviation σ. it is the random component of the model, which has a normal distribution. a+ bx i is the linear predictor. In logistic regression, the expected value of d i given x i is E(d i ) = logit(e(d i )) = α + x i β for i = 1, 2,, n d is dichotomous with probability of event p i = p[ xi] i it is the random component of the model p = p i [ xi] logit is the link function that relates the expected value of the random component to the linear predictor. 3. Maximum Likelihood Estimation In linear regression we used the method of least squares to estimate regression coefficients. In generalized linear models we use another approach called maximum likelihood estimation. The maximum likelihood estimate of a parameter is that value that maximizes the probability of the observed data. We estimate α and β by those values â and ˆb that maximize the probability of the observed data under the logistic regression model. 5
6 Bas eline APACHE II Score Number of Patients Number of Deaths Baseline APACHE II S c o re Number of Patients Number of Deaths This data is analyzed as follows. tabulate fate Death by 30 days Freq. Percent Cum alive dead Total
7 . histogram apache [fweight=freq ], discrete (start=0, width=1) Density APACHE Score at Baseline. scatter proportion apache Proportion of Deaths with Indicated Score APACHE Score at Baseline 7
8 Logit estimates Number of obs = 454 LR chi2(1) = Prob > chi2 = Log likelihood = Pseudo R2 = fate Coef. Std. Err. z P> z [95% Conf. Interval] apache _cons logit( Ed ( )) =p ( x) =a+bx ˆb = i i i â = p( x) = exp( a+ bx)/( 1 + exp( a+ bx)) exp( x) = 1 + exp( x) exp( ) p ( 20) = = exp( ) Probability of Death by 30 days APACHE Score at Baseline Proportion of Deaths with Indicated Score Pr(fate) 8
9 4. Odds Ratios and the Logistic Regression Model a) Odds ratio associated with a unit increase in x The log odds that patients with APACHE II scores of x and x + 1 will die are and respectively. logit ( p( x)) = a+ bx {3.5} logit ( p( x + 1)) = a+ b( x + 1) = a+ bx + b {3.6} subtracting {3.5} from {3.6} gives β = logit ( p( x + 1)) logit( p( x)) β = logit ( p( x + 1)) logit( p( x)) F I HG K J  F p x HG p x 1  F HG p( x + 1)) p( x) = log log 1 p( x + 1) 1 p( x) = log ( + 1)/( 1 p( x + 1)) ( )/( p( x)) I K J I K J and hence exp(β) is the odds ratio for death associated with a unit increase in x. A property of logistic regression is that this ratio remains constant for all values of x. 9
10 5. 95% Confidence Intervals for Odds Ratio Estimates In our sepsis example the parameter estimate for apache (β) was with a standard error of Therefore, the odds ratio for death associated with a unit rise in APACHE II score is exp( ) = with a 95% confidence interval of (exp( ), exp( )) = (1.09, 1.15). 6. Quality of Model fit If our model is correct then ( ) ˆ ˆ logit observed proportion =a+bx i It can be helpful to plot the observed log odds against a+bx ˆ ˆ i 10
11 Log odds of Death in 30 days APACHE Score at Baseline obs_logodds Linear prediction 7. 95% Confidence Interval for p[ x] Let 2 and 2 s denote the variance of â and ˆb. s â ˆb Let s ˆ denote the covariance between â and ˆb. âb Then it can be shown that the standard error of is seèa+b ˆ ˆx = Î s + 2xs + x s aˆ ab ˆ ˆ bˆ A 95% confidence interval for a+bx is a+b ˆ ˆx ± 1.96 seèa+b ˆ ˆx Î 11
12 APACHE Score at Baseline lb_logodds/ub_logodds Linear prediction obs_logodds A 95% confidence interval for a+bx is a+b ˆ ˆx ± 1.96 seèa+b ˆ ˆx Î ( xi) exp ( xi) / ( 1 exp( xi) ) p = a+b + a+b Hence, a 95% confidence interval for pˆ x, pˆ x, where and ( L[ ] U[ ]) p[ x] expèa+b ˆ ˆx se È ˆ ˆx Î a+b p ˆ L [ x] = Î 1+ expèa+b ˆ ˆx seèa+b ˆ ˆx Î Î exp Èa+b ˆ ˆx se È ˆ ˆx Î a+b p ˆ U [ x] = Î 1 + expèa+b ˆ ˆx seèa+b ˆ ˆx Î Î is 12
13 APACHE Score at Baseline lb_prob/ub_prob Pr(fate) Proportion Dead by 30 Days It is common to recode continuous variables into categorical variables in order to calculate odds ratios for, say the highest quartile compared to the lowest.. centile apache, centile( )  Binom. Interp.  Variable Obs Percentile Centile [95% Conf. Interval] apache generate float upper_q= apache >= 20. tabulate upper_q upper_q Freq. Percent Cum Total
14 . cc fate upper_q if apache >= 20 apache <= 10 Proportion Exposed Unexposed Total Exposed Cases Controls Total Point estimate [95% Conf. Interval] Odds ratio (exact) Attr. frac. ex (exact) Attr. frac. pop chi2(1) = Pr>chi2 = This approach discards potentially valuable information and may not be as clinically relevant as an odds ratio at two specific values. Alternately we can calculate the odds ratio for death for patients at the 75 th percentile of Apache scores compared to patients at the 25 th percentile logit ( p (20)) = a+b 20 logit ( p (10)) = a+b 10 Subtracting gives ( 20 )/ ( 1 ( 20) ) ( 10 )/ ( 1 ( 10) ) Êp p ˆ log Á =b 10 = = Ë p p Hence, the odds ratio equals exp(1.156) = 3.18 A problem with this estimate is that it is strongly dependent on the accuracy of the logistic regression model. 14
15 Hence, the odds ratio equals exp(1.156) = 3.18 With Stata we can calculate the 95% confidence interval for this odds ratio as follows:. lincom 10*apache, eform ( 1) 10 apache = fate exp(b) Std. Err. z P> z [95% Conf. Interval] (1) A problem with this estimate is that it is strongly dependent on the accuracy of the logistic regression model. Simple logistic regression generalizes to allow multiple covariates logit( Ed ( )) =a+b 1x +b 2x b x i i1 i k ik where x i1, x 12,, x ik are covariates from the i th patient α and β 1,...β k, are known parameters d i = 1: ith patient suffers event of interest 0: otherwise Multiple logistic regression can be used for many purposes. One of these is to weaken the logitlinear assumption of simple logistic regression using restricted cubic splines. 15
16 8. Restricted Cubic Splines These curves have k knots located at t1, t2,, t k. They are: Linear before t and after t. 1 Piecewise cubic polynomials between adjacent knots 3 2 (i.e. of the form ax + bx + cx + d ) Continuous and smooth. k t1 t2 t3 Given x and k knots a restricted cubic spline can be defined by y =a+ x b + x b + + x  b k 1 k 1 for suitably defined values of These covariates are functions of x and the knots but are independent of y. x1 xi = x and hence the hypothesis tests the linear hypothesis. b =b = =b k In logistic regression we use restricted cubic splines by modeling ( E( d) ) =a+ x1b 1+ x2b 2+ + x  1b  1 logit i k k Programs to calculate x1,, xk  1are available in Stata, R and other statistical software packages. 16
17 We fit a logistic regression model using a three knot restricted cubic spline model with knots at the default locations at the 10th percentile, 50th percentile, and 90th percentile.. rc_spline apache, nknots(3) number of knots = 3 value of knot 1 = 7 value of knot 2 = 14 value of knot 3 = 25. logit fate _Sapache1 _Sapache2 Logit estimates Number of obs = 454 LR chi2(2) = Prob > chi2 = Log likelihood = Pseudo R2 = fate Coef. Std. Err. z P> z [95% Conf. Interval] _Sapache _Sapache _cons Note that the coefficient for _Sapache2 is small and not significant, indicating an excellent fit for the simple logistic regression model. Giving analogous commands as for simple logistic regression gives the following plot of predicted mortality given the baseline Apache score.. drop prob logodds se lb_logodds ub_logodds ub_prob lb_prob. rename prob_rcs prob. rename logodds_rcs logodds. predict se, stdp. generate float lb_logodds= logodds1.96* se. generate float ub_logodds= logodds+1.96* se. generate float ub_prob= exp( ub_logodds)/(1+exp( ub_logodds)). generate float lb_prob= exp( lb_logodds)/(1+exp( lb_logodds)). twoway (rarea lb_prob ub_prob apache, blcolor(yellow) bfcolor(yellow)) > (scatter proportion apache) > (line prob apache, clcolor(red) clwidth(medthick)) This plot is very similar to the one for simple logistic regression except the 95% confidence band is a little wider. 17
18 Three Knot Restricted Cubic Spline Model APACHE Score at Baseline lb_prob/ub_prob Pr(fate) Proportion Dead by 30 Days Simple Logistic Regression APACHE Score at Baseline lb_prob/ub_prob Pr(fate) Proportion Dead by 30 Days 18
19 This regression gives the following table of values Percentile Apache _Sapache1 _Sapache We calculate the odds ratio of death for patients at the 75th vs. 25th percentile of apache score as follows: The logodds at the 75th percentile equals logit p 20 = a+b 20 +b ( ( )) 1 2 The logodds at the 25th percentile equals ( ( )) 1 2 logit p 10 = a+b 10 +b Subtracting the second from the first equation gives that the log odds ratio for patients at the 75th vs. 25th percentile of apache score is ( ( ) ( )) 1 2 logit p 20 / p 10 =b 10 +b Stata calculates this odds ratio to be 3.22 with a 95% confidence interval of lincom _Sapache1* * _Sapache2, eform ( 1) 10 _Sapache _Sapache2 = fate exp(b) Std. Err. z P> z [95% Conf. Interval] (1) Recall that for the simple model we had the following odds ratio and confidence interval fate exp(b) Std. Err. z P> z [95% Conf. Interval] (1) The close agreement of these results supports the use of the simple logistic regression model for these data. 19
20 9. Simple 2x2 CaseControl Studies a) Example: Esophageal Cancer and Alcohol Breslow & Day, Vol. I give the following results from the IlleetVilaine casecontrol study of esophageal cancer and alcohol. Cases were 200 men diagnosed with esophageal cancer in regional hospitals between 1/1/1972 and 4/30/1974. Controls were 775 men drawn from electoral lists in each commune. Esophageal Daily Alcohol Consumption Cancer > 80g < 80g Total Yes (Cases) No (Controls) Total b) Review of Classical CaseControl Theory Let x i = S T R1 = cases 0 = for controls m i = number of cases (i = 1) or controls (i = 0) d i = number of cases (i = 1) or controls (i = 0) who are heavy drinkers. Then the observed prevalence of heavy drinkers is d 0 /m 0 = 109/775 for controls and d 1 /m 1 = 96/200 for cases. The observed prevalence of moderate or nondrinkers is (m 0  d 0 )/m 0 = 666/775 for controls and (m 1  d 1 )/m 1 = 104/200 for cases. 20
21 The observed odds that a case or control will be a heavy drinker is ( d / m )/[( m  d )/ m ] = d / ( m d ) i i i i i i i i = 109/666 and 96/104 for controls and cases, respectively. The observed odds ratio for heavy drinking in cases relative to controls is d /( ) 1 m1  d1 y= d /( m  d ) = 96 /104 = / 666 If the cases and Since controls esophageal are a representative cancer is rare sample y from their respective also estimates underlying the populations relative risk then of 1. y is an unbiased esophageal estimate cancer of the in heavy true odds drinkers ratio for heavy drinking in relative cases relative to moderate to controls drinkers. the underlying population. 2. This true odds ratio also equals the true odds ratio for esophageal cancer in heavy drinkers relative to moderate drinkers. Casecontrol studies would be pointless if this were not true. Woolf s estimate of the standard error of the log odds ratio is selog( ˆ ) = d + y 0 m0 d + 0 d + 1 m1 d1 and the distribution of Hence, if we let y ˆ L = yˆ expè1.96se log Î ( yˆ ) and y ˆ U = yˆ expè1.96se log Î ( yˆ ) ( ˆ, ˆ ) then y y is a 95% confidence interval for y. L U log( yˆ ) is approximately normal. 21
22 10. Logistic Regression Models for 2x2 Contingency Tables Consider the logistic regression model logit ( E( di / mi)) = a+ bxi {3.9} where E( di / mi) = pi = Probability of being a heavy drinker for cases (i = 1) and controls (i = 0). Then {3.9} can be rewritten logit( pi) = log( pi / ( 1  pi)) = a+ bxi Hence log( p1/ ( 1  p1)) = a+ bx1 = a+ b and log( p0 / ( 1  p0)) = a+ bx0 = a since x 1 = 1 and x 0 = 0. Subtracting these two equations gives log( p1 / ( 1p1)) log( p0 / ( 1 p0)) = b L p1/( 1  p1) O log log( y) b p0 /( 1  p0) QP = = NM b and hence the true odds ratio y = e a) Estimating relative risks from the model coefficients Our primary interest is in β. Given an estimate b of β then y = e b b) Nuisance parameters α is called a nuisance parameter. This is one that is required by the model but is not used to calculate interesting statistics. 11. Analyzing CaseControl Data with Stata Consider the following data on esophageal cancer and heavy drinking cancer alcohol patients No < 80g 666 Yes < 80g 104 No >= 80g 109 Yes >= 80g 96 22
23 . cc cancer alcohol [freq=patients], woolf alcohol Proportion Exposed Unexposed Total Exposed Cases Controls Total Point estimate [95% Conf. Interval] Odds ratio (Woolf) Attr. frac. ex (Woolf) Attr. frac. pop chi2(1) = Pr>chi2 = * * Now calculate the same odds ratio using logistic regression * 96 /104 The estimated odds ratio is = / 666. logistic alcohol cancer [freq=patients] Logistic regression No. of obs = 975 LR chi2(1) = Prob > chi2= Log likelihood = Pseudo R2 = alcohol Odds Ratio Std. Err. z P> z [95% Conf. Interval] cancer This is the analogous logistic command for simple logistic regression. If we had entered the data as cancer heavy patients This commands fit the model logit(e(alcohol)) = α + cancer*β giving β = 1.73 = the log odds ratio of being a heavy drinker in cancer patients relative to controls. The odds ratio is exp(1.73) =
24 a) Logistic and classical estimates of the 95% CI of the OR The 95% confidence interval is (5.64exp( ), 5.64exp( )) = (4.00, 7.95). The classical limits using Woolf s method is (5.64exp(1.96 s), 5.64exp(1.96 s)) =(4.00, 7.95), where s 2 = 1/96 + 1/ / /666 = = (0.1752) 2. Hence Logistic regression is in exact agreement with classical methods in this simple case. In Stata the command cc cancer alcohol [freq=patients], woolf gives us Woolf s 95% confidence interval for the odds ratio. We will cover how to calculate confidence intervals using glm in the next chapter. 24
Lecture 16: Logistic regression diagnostics, splines and interactions. Sandy Eckel 19 May 2007
Lecture 16: Logistic regression diagnostics, splines and interactions Sandy Eckel seckel@jhsph.edu 19 May 2007 1 Logistic Regression Diagnostics Graphs to check assumptions Recall: Graphing was used to
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2way tables Adds capability studying several predictors, but Limited to
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationIntroduction. Hypothesis Testing. Hypothesis Testing. Significance Testing
Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters
More informationYiming Peng, Department of Statistics. February 12, 2013
Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationFrom the help desk: Swamy s randomcoefficients model
The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s randomcoefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) randomcoefficients
More informationIntroduction. Survival Analysis. Censoring. Plan of Talk
Survival Analysis Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 01/12/2015 Survival Analysis is concerned with the length of time before an event occurs.
More informationLecture 10: Logistical Regression II Multinomial Data. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II
Lecture 10: Logistical Regression II Multinomial Data Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Logit vs. Probit Review Use with a dichotomous dependent variable Need a link
More informationLOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationHandling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationStatistical Modelling in Stata 5: Linear Models
Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How
More informationDiscussion Section 4 ECON 139/239 2010 Summer Term II
Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase
More information13. Poisson Regression Analysis
136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationUsing Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015
Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised March 28, 2015 NOTE: The routines spost13, lrdrop1, and extremes are
More informationIntroduction to Stata
Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the midrange of how easy it is to use. Other options include SPSS,
More informationCorrelation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
More informationBRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS
BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended
More informationSize of a study. Chapter 15
Size of a study 15.1 Introduction It is important to ensure at the design stage that the proposed number of subjects to be recruited into any study will be appropriate to answer the main objective(s) of
More informationOdds ratios and logistic regression: further examples of their use and interpretation
The Stata Journal (2003) 3, Number 3, pp. 213 225 Odds ratios and logistic regression: further examples of their use and interpretation Susan M. Hailpern, MS, MPH Paul F. Visintainer, PhD School of Public
More informationFrom the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
More informationSOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS
SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationMarginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015
Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,
More informationMULTIPLE REGRESSION EXAMPLE
MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More informationApplication of Odds Ratio and Logistic Models in Epidemiology and Health Research
Application of Odds Ratio and Logistic Models in Epidemiology and Health Research By: Min Qi Wang, PhD; James M. Eddy, DEd; Eugene C. Fitzhugh, PhD Wang, M.Q., Eddy, J.M., & Fitzhugh, E.C.* (1995). Application
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationLogit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationIAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is Rsquared? Rsquared Published in Agricultural Economics 0.45 Best article of the
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationModelling Binary Outcomes 22/11/2016
Modelling Binary Outcomes 22/11/2016 Contents 1 Modelling Binary Outcomes 5 1.1 Crosstabulation.................................... 5 1.1.1 Measures of Effect............................... 6 1.1.2 Limitations
More informationSurvival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence
Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest
More informationModule 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189250103 and MRC grant G0900724
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationSample Size Calculation for Longitudinal Studies
Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG1891101A1) Introduction
More informationTesting on proportions
Testing on proportions Textbook Section 5.4 April 7, 2011 Example 1. X 1,, X n Bernolli(p). Wish to test H 0 : p p 0 H 1 : p > p 0 (1) Consider a related problem The likelihood ratio test is where c is
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationNominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
More informationUsing An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
More information11 Logistic Regression  Interpreting Parameters
11 Logistic Regression  Interpreting Parameters Let us expand on the material in the last section, trying to make sure we understand the logistic regression model and can interpret Stata output. Consider
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationStata Walkthrough 4: Regression, Prediction, and Forecasting
Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25yearold nephew, who is dating a 35yearold woman. God, I can t see them getting
More informationNonlinear Regression Functions. SW Ch 8 1/54/
Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationNumerical Summarization of Data OPRE 6301
Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting
More informationEcon 371 Problem Set #3 Answer Sheet
Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANACHAMPAIGN Linear Algebra Slide 1 of
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationQuantitative Methods for Economics Tutorial 9. Katherine Eyal
Quantitative Methods for Economics Tutorial 9 Katherine Eyal TUTORIAL 9 4 October 2010 ECO3021S Part A: Problems 1. In Problem 2 of Tutorial 7, we estimated the equation ŝleep = 3, 638.25 0.148 totwrk
More informationStatistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY
Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship
More informationBivariate Regression Analysis. The beginning of many types of regression
Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression
More informationLogit and Probit. Brad Jones 1. April 21, 2009. University of California, Davis. Bradford S. Jones, UCDavis, Dept. of Political Science
Logit and Probit Brad 1 1 Department of Political Science University of California, Davis April 21, 2009 Logit, redux Logit resolves the functional form problem (in terms of the response function in the
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationInstitut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl
from Indirect Extracting from Institut für Soziologie Eberhard Karls Universität Tübingen www.maartenbuis.nl from Indirect What is the effect of x on y? Which effect do I choose: average marginal or marginal
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationEffect estimation versus hypothesis testing
Department of Epidemiology and Public Health Unit of Biostatistics and Computational Sciences Effect estimation versus hypothesis testing PD Dr. C. Schindler Swiss Tropical and Public Health Institute
More informationParametric Survival Models
Parametric Survival Models Germán Rodríguez grodri@princeton.edu Spring, 2001; revised Spring 2005, Summer 2010 We consider briefly the analysis of survival data when one is willing to assume a parametric
More informationREGRESSION LINES IN STATA
REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression
More information1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
More informationLOGIT AND PROBIT ANALYSIS
LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y
More informationStandard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In nonlinear regression models, such as the heteroskedastic
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationEcon 371 Problem Set #3 Answer Sheet
Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationRegression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationExtending Hypothesis Testing. pvalues & confidence intervals
Extending Hypothesis Testing pvalues & confidence intervals So far: how to state a question in the form of two hypotheses (null and alternative), how to assess the data, how to answer the question by
More informationTopic 3  Survival Analysis
Topic 3  Survival Analysis 1. Topics...2 2. Learning objectives...3 3. Grouped survival data  leukemia example...4 3.1 Cohort survival data schematic...5 3.2 Tabulation of events and time at risk...6
More informationCompetingrisks regression
Competingrisks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competingrisks regression July 1516, 2010 1 / 26 Outline 1. Overview
More informationGLM III: Advanced Modeling Strategy 2005 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide
GLM III: Advanced Modeling Strategy 25 CAS Seminar on Predictive Modeling Duncan Anderson MA FIA Watson Wyatt Worldwide W W W. W A T S O N W Y A T T. C O M Agenda Introduction Testing the link function
More informationMissing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 2729 July 2015 Missing data and net survival analysis Bernard Rachet General context Populationbased,
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationNonlinear relationships Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015
Nonlinear relationships Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February, 5 Sources: Berry & Feldman s Multiple Regression in Practice 985; Pindyck and Rubinfeld
More informationMultiple Regression  Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation
Multiple Regression  Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationespecially with continuous
Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei German Stata Users meeting, Berlin, 1 June 2012 Interactions general concepts General idea of a (twoway)
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationLogistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationAugust 2012 EXAMINATIONS Solution Part I
August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two Means
Lesson : Comparison of Population Means Part c: Comparison of Two Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationModule 5 Hypotheses Tests: Comparing Two Groups
Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this
More informationThe zeroadjusted Inverse Gaussian distribution as a model for insurance claims
The zeroadjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:
More informationBasic Biostatistics for Clinical Research. Ramses F Sadek, PhD GRU Cancer Center
Basic Biostatistics for Clinical Research Ramses F Sadek, PhD GRU Cancer Center 1 1. Basic Concepts 2. Data & Their Presentation Part One 2 1. Basic Concepts Statistics Biostatistics Populations and samples
More informationFactors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
More information