Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes



Similar documents
Logit and Probit. Brad Jones 1. April 21, University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science

Nominal and ordinal logistic regression

Standard errors of marginal effects in the heteroskedastic probit model

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Regression with a Binary Dependent Variable

From the help desk: hurdle models


Missing data and net survival analysis Bernard Rachet

Pattern Analysis. Logistic Regression. 12. Mai Joachim Hornegger. Chair of Pattern Recognition Erlangen University

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Examples of Using R for Modeling Ordinal Data

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Maximum Likelihood Estimation

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

Lecture 19: Conditional Logistic Regression

Prediction for Multilevel Models

Logistic Regression (1/24/13)

Ordinal Regression. Chapter

Wes, Delaram, and Emily MA751. Exercise p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }].

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

Note on the EM Algorithm in Linear Regression Model

CREDIT SCORING MODEL APPLICATIONS:

ANALYSIS, THEORY AND DESIGN OF LOGISTIC REGRESSION CLASSIFIERS USED FOR VERY LARGE SCALE DATA MINING

From the help desk: Swamy s random-coefficients model

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Statistical Machine Learning

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation

Poisson Models for Count Data

LOGIT AND PROBIT ANALYSIS

Least Squares Estimation

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Logit Models for Binary Data

Multinomial and Ordinal Logistic Regression

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

STATISTICA Formula Guide: Logistic Regression. Table of Contents

SAS Software to Fit the Generalized Linear Model

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

Penalized Logistic Regression and Classification of Microarray Data

Generalized Linear Models

A revisit of the hierarchical insurance claims modeling

7.1 The Hazard and Survival Functions

Multivariate Logistic Regression

Introduction to mixed model and missing data issues in longitudinal studies

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March Due:-March 25, 2015.

MATHEMATICAL METHODS OF STATISTICS

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

11. Time series and dynamic linear models

VI. Introduction to Logistic Regression

The equivalence of logistic regression and maximum entropy models

Statistics in Retail Finance. Chapter 6: Behavioural models

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Probability Calculator

Machine Learning Logistic Regression

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

Course 4 Examination Questions And Illustrative Solutions. November 2000

11. Analysis of Case-control Studies Logistic Regression

On Marginal Effects in Semiparametric Censored Regression Models

General Regression Formulae ) (N-2) (1 - r 2 YX

Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Multinomial Logistic Regression

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago

Lecture 3: Linear methods for classification

Credit Scoring Modelling for Retail Banking Sector.

Master programme in Statistics

Parametric Survival Models

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

Yew May Martin Maureen Maclachlan Tom Karmel Higher Education Division, Department of Education, Training and Youth Affairs.

BayesX - Software for Bayesian Inference in Structured Additive Regression

Markov Chain Monte Carlo Simulation Made Simple

PROBABILITY AND STATISTICS. Ma To teach a knowledge of combinatorial reasoning.

XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables

Multilevel Modelling of medical data

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan , Fall 2010

Multiple Choice Models II

Supplement to Call Centers with Delay Information: Models and Insights

Analysis of ordinal data with cumulative link models estimation with the R-package ordinal

Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

1.1 Introduction, and Review of Probability Theory Random Variable, Range, Types of Random Variables CDF, PDF, Quantiles...

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

1.5 / 1 -- Communication Networks II (Görg) Transforms

Two Correlated Proportions (McNemar Test)

Analysis of Bayesian Dynamic Linear Models

Interpretation of Somers D under four simple models

Statistics in Retail Finance. Chapter 2: Statistical models of default

Review of Random Variables

Sun Li Centre for Academic Computing

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Transcription:

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on the methods described in Jun Xu J. Scott Long (forthcoming) Confidence Intervals for Predicted Outcomes in Regression Models for Categorical Outcomes The Stata Journal. These formula were incorporated into prvalue. See www.indiana.edu/~jslsoc/spost.htm for further details. 1 General Formula The delta method is a general approach for computing confidence intervals for functions of maximum likelihood estimates. The delta method takes a function that is too complex for analytically computing the variance, creates a linear approximation of that function, then computes the variance of the simpler linear function that can be used for large sample inference. We begin with a general result for maximum likelihood theory. Under stard regularity conditions, if β b is a vector of ML estimates, then ³ h ³ n bβ β d N 0,V ar bβ i. (1) Let G (β) be some function, such as predicted probabilities from a logit or ordinal logit model. The Taylor series expansion of G( b β) is G( β) b G(β)+( β b β) 0 G 0 (β)+( β b β) 0 G 00 (β )( β b β)/2 (2) G(β)+( β b β) 0 G 0 (β), where G 0 (β) G 00 (β) are matrices of first second partial derivatives with respect to β, β is some value between β b β. Then, h n G( β) b i G(β) n( β b β) 0 G 0 (β). (3) This leads to leads to G( β) b ³ N G(β), G(β) Var( b β) G(β) (Greene 2000; Agresti 2002). 0 To estimate the variance, we evaluate the partials at the ML estimates, G(β x),which β 0 β leads to ³ Var G( β) b G(b β) b 0 Var( β) b G(b β) b. (4) 1

Forexample,considerthelogitmodelwith G(β) Pr(y 1 x) Λ (x 0 β). (5) To compute the confidence interval, we need the gradient vector G(β) h Λ(x0 β) 0 Λ(x 0 β) 1 Λ(x 0 β) K i 0. (6) Since Λ is a cdf, Λ(x0 β) Λ(x0 β) k x 0 β x 0 β k λ (x 0 β) x k.then G(β) λ (x 0 β) λ (x 0 β) x 1 λ (x 0 β) x K 0. (7) To compute the confidence interval for a change in the probability as the independent variables change from x a to x b, we use the function G (β) Λ(β x a ) Λ (β x b ), (8) where G(β) [Λ(β x a) Λ(β x b )] Λ(β x a) Λ(β x b). (9) Substituting this result into equation 4, Λ(β xa ) Var(G(β)) 0 Var( β) b Λ(β x a) Λ(β xa ) 0 Var( β) b Λ(β x b) Λ(β xb ) 0 Var( β) b Λ(β x a) Λ(β xb ) + 0 Var( β) b Λ(β x b) (10). We now apply these formula to the models for which prvalue computes confidence intervals. 2 Binary Models In binary models, G(β) Pr(y 1 x) F (x 0 β) where F is the cdf for the logistic, normal, or cloglog function. The gradient is F(x 0 β) F(x0 β) k x 0 β x 0 β k f(x 0 β)x k, (11) where f is the pdf corresponding to F.Forthevectorx it follows that F(x 0 β) f(x 0 β)x. (12) 2

From equation 4, Var [Pr(y 1 x)] f(x 0 β)x 0 Var( β)xf(x b 0 β) f(x 0 β) 2 x 0 Var( β)x b. (13) The variances of Pr(y 0 x) Pr(y 0 x) are the equal since [1 F (x 0 β)] f(x 0 β)x (14) Var [Pr(y 0 x)] [ f(x 0 β)] 2 x 0 Var( b β)x. (15) 3 Ordered Logit Probit Assume that there are m 1,J outcome categories, where Pr (y m x) F (τ m x 0 β) F (τ m 1 x 0 β) for j 1,J. (16) Since we assume that τ 0 τ J, F (τ 0 x 0 β)0 F (τ J x 0 β)1. To compute the gradient, It follows that F (τ m x 0 β) F (τ m x 0 β) (τ m x 0 β) (17) k (τ m x 0 β) k f (τ m x 0 β)( x k ) (18) F (τ m x 0 β) F (τ m x 0 β) τ j (τ m x 0 β) (τ m x 0 β) τ j. (19) F (τ m x 0 β) τ j f (τ m x 0 β) if j m (20) F (τ m x 0 β) 0if j 6 m. (21) τ j Using these results with equation 16, Pr (y i m x i ) k [f (τ m x 0 β)( x k )] [f (τ m 1 x 0 β)( x k )] (22) x k f (τ m x 0 β) [f (τ m 1 x 0 β)] Pr (y i m x i ) F (τ m x 0 β) F (τ m 1 x 0 β) (23) τ j τ j τ j f (τ m x 0 β) if j m f (τ m 1 x 0 β) if j m 1 0 otherwise. 3

then For example, with three categories: With respect to τ, Pr (y 1 x) F (τ 1 x 0 β) 0 (24) Pr (y 2 x) F (τ 2 x 0 β) F (τ 1 x 0 β) (25) Pr (y 3 x) 1 F (τ 2 x 0 β), (26) Pr (y i 1 x i ) k x k [f (τ 1 x 0 β)] (27) Pr (y i 2 x i ) k x k [f (τ 2 x 0 β) f (τ 1 x 0 β)] (28) Pr (y i 3 x i ) k x k [ f (τ 2 x 0 β)]. (29) Pr (y i 1 x i ) τ 1 f (τ 1 x 0 β) (30) Pr (y i 1 x i ) τ 2 0 (31) Pr (y i 2 x i ) τ 1 f (τ 1 x 0 β) (32) Pr (y i 2 x i ) τ 2 f (τ 2 x 0 β) (33) Pr (y i 3 x i ) τ 1 0 (34) Pr (y i 3 x i ) τ 2 0. (35) To implement these procedures in Stata, we create the augmented matrices: β β 0 τ 1 τ J 1 0 (36) x 1 x 0 1 0 0 0 x 2 x 0 0 1 0 0 (37). x J 1 x 0 0 0 1 0, such that x 0 j β τ j xβ. (38) We then create the gradients described above. 4

4 Generalized Ordered Logit 1 The generalized ordered logit model is identical to the ordinal logit model except that the coefficients associated with x differ for each outcome. Since there is an intercept for each outcome, the τ s are fixed to zero β J 0 for identification. Then, Pr (y i 1 x i ) F ( x 0 β m ) (39) Pr (y i m x i ) F ( x 0 β m ) F x 0 β m 1 for m 2,J 1 (40) Pr (y i J x i ) F x 0 β m 1. (41) The gradient with respect to the β s is while no gradient for thresholds is needed. Then, 5 Multinomial Logit Assuming outcomes 1 through J, F ( x 0 β m ) m,k f ( x 0 β m )( x k ), (42) Pr (y i m x i ) m,k x k f ( x 0 β m ) f x 0 β m 1. (43) Pr(y m x) exp (xβ m) JP exp, (44) xβ j where without loss of generality we assume that β 1 0 to identify the model ( accordingly, the derivatives below do not apply to the partial with respect to β 1 ). To simplify notation, let P exp xβ j. The derivative of the probability of m with respect to βn is Using the quotient rule, Pr(y m x) n Examining each partial in turn. Pr(y m x) n j1 exp (xβ m) 1 n. (45) exp (xβ m) exp (xβ m ) 2. (46) n n exp (xβ m ) exp (xβ m) m (47) n m n exp(xβ m ) x if m n 0 if m 6 n 1 Confidence intervals for the generalized ordered logit model are not supported in prvalue. 5

P J exp xβ j JP exp xβ j j1 j1 n n (48) exp(xβ n ) x. The last equality follows since the partial of exp xβ j with respect to βn is 0 unless j n. Combining these results. If m n, Pr(y m x) exp (xβ m ) x exp (xβ m ) 2 x 2 (49) m exp (xβ m ) exp (xβ m ) 2 2 x exp (xβm ) exp (xβ m) exp (xβ m ) x [Pr(y m) Pr (y m)pr(y m)] x Pr(y m)[1 Pr (y m)] x. For m 6 n, Pr(y m x) [0 exp (xβ m )exp(xβ n ) x] 2 (50) n exp (xβ m) exp (xβ n ) x Pr(y m)pr(y n) x. For example, for two x s m 1: Pr(y m x) p m (1 p m ) x 1 p m (1 p m ) x 2 p m (1 p m ) 0 (51) m Pr(y m x) 0 p m p n x 1 p m p n x 2 p m p n. (52) n6m 6 Poisson Negative Binomial Regression In the Poisson regression model, so that μ i exp(x 0 iβ), (53) μ exp (x0 β) (54) k k exp(x0 β) x 0 β x 0 β k exp(x 0 β)x k μx k 6

Using matrices, μ exp (x0 β) exp(x0 β) x 0 β x 0 β μx. (55) The probability of a given count is so we can compute the gradient as: Pr (y x) exp ( μ) μy y!, (56) exp ( μ) μ y /y! 1 exp ( μ) μ y μ. (57) k y! μ k Since the last term was computed above, we only need to derive exp ( μ) μ y μ exp( μ) μy exp ( μ) + μy μ μ exp( μ) yμ y 1 μ y exp ( μ). (58) This leads to Using matrices, Pr (y x) 1 k y! μ exp ( μ) yμ y 1 μ y exp ( μ) x k (59) exp ( μ) yμy μ y+1 exp ( μ) x k y! yμy μ y+1 exp (μ) y! x k. Pr (y x) The negative binomial model is specified as exp ( μ) yμy μ y+1 exp ( μ) x (60) y! yμy μ y+1 exp (μ) y! x. μ exp(x 0 β + ε) (61) exp(x 0 β)exp(ε), where ε has a gamma distribution with variance α. The counts have a negative binomial distribution Pr (y i x i ) Γ(y µ ν µ yi i + ν) ν μi, (62) y i! Γ(ν) ν + μ i ν + μ i 7

where ν α 1. The derivatives of the log-likelihood are given in Stata Reference, Version 8, page 10. To simplify notation, we define τ lnα, m 1/α, p 1/ (1 + αμ), μ exp(xβ). Withψ (z) being the digamma function evaluated at z, τ p (y μ) (63) α (μ y) m ln (1 + αμ)+ψ (y + m) ψ (m). 1+αμ (64) Then by the chain rule, Pr (y x) Pr (y x) Pr(y x) 1 Pr (y x), (65) so that Similarly for τ, Pr (y x) Pr (y x) τ τ Pr (y x). (66) Pr (y x). (67) 7 References Agresti, Alan. 2002. Categorical Data Analysis. 2nd Edition. New York: Wiley. Greene, William H. 2000. Econometric Analysis, 4th Ed. New York: Prentice Hall. File: spost_deltaci-22aug2005.tex - 22Aug2005 8