18 Generalised Linear Models


Generalised linear models (GLMs) are a generalisation of ordinary least squares regression. See also Davison, Section , Green (1984) and Dobson and Barnett (2008). To motivate the GLM approach let us briefly review linear models.

18.1 An overview of linear models

Let us consider the two competing nested linear models

Restricted model: $Y_t = \beta_0 + \sum_{j=1}^{q}\beta_j x_{tj} + \varepsilon_t$

Full model: $Y_t = \beta_0 + \sum_{j=1}^{q}\beta_j x_{tj} + \sum_{j=q+1}^{p}\beta_j x_{tj} + \varepsilon_t \qquad (87)$

where $\{\varepsilon_t\}$ are iid random variables with mean zero and variance $\sigma^2$. Let us suppose that we observe $\{(Y_t, x_t)\}_{t=1}^{T}$, where $\{Y_t\}$ are normal. The classical method for testing $H_0$: restricted model against $H_A$: full model is the F-test (ANOVA). That is, let $S_R^2$ be the residual sum of squares under the null and $S_F^2$ the residual sum of squares under the alternative,

$S_F^2 = \sum_{t=1}^{T}\Big(Y_t - \sum_{j=1}^{p}\hat\beta_j^F x_{tj}\Big)^2, \qquad S_R^2 = \sum_{t=1}^{T}\Big(Y_t - \sum_{j=1}^{q}\hat\beta_j^R x_{tj}\Big)^2,$

and let $\hat\sigma_F^2 = S_F^2/(T-p)$. Then the F-statistic is

$F = \frac{(S_R^2 - S_F^2)/(p-q)}{\hat\sigma_F^2},$

and under the null $F \sim F_{p-q,\,T-p}$. Moreover, if the sample size is large, $(p-q)F \approx \chi^2_{p-q}$. We recall that the residuals of the full model are

$r_t = Y_t - \hat\beta_0 - \sum_{j=1}^{q}\hat\beta_j x_{tj} - \sum_{j=q+1}^{p}\hat\beta_j x_{tj},$

and the residual sum of squares $S_F^2$ is used to measure how well the linear model fits the data (see Chapter 8 and STAT612 notes).

The F-test and ANOVA are designed specifically for linear models. In this section the aim is to generalise model specification, estimation, testing and residuals to a larger class of models. To generalise we will be using a log-likelihood framework. To see how this fits in with linear regression, let us now see how ANOVA and the log-likelihood ratio test are related. Suppose that $\sigma^2$ is known; then the log-likelihood ratio statistic for the above hypothesis is

$\frac{1}{\sigma^2}\big(S_R^2 - S_F^2\big) \sim \chi^2_{p-q},$

where we note that since $\{\varepsilon_t\}$ is Gaussian, this is the exact distribution and not an asymptotic result. In the case that $\sigma^2$ is unknown and has to be replaced by its estimator $\hat\sigma_F^2$, we can either use the approximation

$\frac{1}{\hat\sigma_F^2}\big(S_R^2 - S_F^2\big) \approx \chi^2_{p-q}$

or the exact distribution

$\frac{(S_R^2 - S_F^2)/(p-q)}{\hat\sigma_F^2} \sim F_{p-q,\,T-p},$

which returns us to the F-statistic. On the other hand, if the variance $\sigma^2$ is unknown we can go the log-likelihood ratio route. The log-likelihood ratio test reduces to

$T\log\frac{S_R^2}{S_F^2} = T\log\Big(1 + \frac{S_R^2 - S_F^2}{S_F^2}\Big) \approx \chi^2_{p-q}.$

We recall that by using the expansion $\log(1+x) = x + O(x^2)$ we obtain

$\log\frac{S_R^2}{S_F^2} = \log\Big(1 + \frac{S_R^2 - S_F^2}{S_F^2}\Big) = \frac{S_R^2 - S_F^2}{S_F^2} + o_p(1).$

Now, dividing by $(p-q)$ and multiplying by $(T-p)$, we have

$\frac{(T-p)}{(p-q)}\log\frac{S_R^2}{S_F^2} = \frac{(T-p)}{(p-q)}\log\Big(1 + \frac{S_R^2 - S_F^2}{S_F^2}\Big) = \frac{(S_R^2 - S_F^2)/(p-q)}{\hat\sigma_F^2} + o_p(1) = F + o_p(1).$

Hence we have transformed the log-likelihood ratio test into the F-test, which we discussed at the start of this section. The ANOVA and log-likelihood methods are asymptotically equivalent. In the case that $\{\varepsilon_t\}$ are non-Gaussian, but the model is linear with iid errors, the above results also hold. However, in the case that the regressors have a nonlinear influence on the response and/or the response is not normal, we need to take an alternative approach. Throughout this section we will encounter such models. We will start by focussing on the following two problems:

(i) How to model the relationship between the response and the regressors when the response is non-Gaussian and the model is nonlinear.

(ii) How to generalise ANOVA to nonlinear models.

18.2 Motivation

Let us suppose $\{Y_t\}$ are independent random variables where it is believed that the regressors $x_t$ ($x_t$ is a $p$-dimensional vector) have an influence on $\{Y_t\}$. Let us suppose that $Y_t$ is a binary random variable taking either zero or one, with $E(Y_t) = P(Y_t = 1) = \pi_t$. How do we model the relationship between $Y_t$ and $x_t$? A simple approach is to use a linear model, i.e. let $E(Y_t) = \beta' x_t$. But a major problem with this approach is that $E(Y_t)$ is a probability, and for many values of $\beta$, $\beta' x_t$ will lie outside the unit interval, hence a linear model is not meaningful. However, we can make a nonlinear transformation which maps the linear predictor into the unit interval. For example, let

$E(Y_t) = \pi_t = \frac{\exp(\beta' x_t)}{1 + \exp(\beta' x_t)};$

this transformation lies between zero and one. Hence we could just use nonlinear regression to estimate the parameters. That is, rewrite the model as $Y_t = \frac{\exp(\beta' x_t)}{1+\exp(\beta' x_t)} + \varepsilon_t$, and use the estimator

$\hat\beta_T = \arg\min_\beta \sum_{t=1}^{T}\Big(Y_t - \frac{\exp(\beta' x_t)}{1+\exp(\beta' x_t)}\Big)^2$

as an estimator of $\beta$. However, this approach also has its drawbacks. One obvious drawback is that the variance $\mathrm{var}(\varepsilon_t) = \mathrm{var}(Y_t) = \pi_t(1-\pi_t)$; hence the variance depends on the parameters. Hence, alternative methods are required.

The GLM approach is a general framework for a wide class of distributions. We recall that in Section 3 we considered maximum likelihood estimation for iid random variables which come from the natural exponential family. Distributions in this family include the normal, binary,

binomial and Poisson, amongst others. We recall that the natural exponential family has the form

$f(y; \theta) = \exp\big(y\theta - \kappa(\theta) + c(y)\big),$

where $\kappa(\theta) = b(\eta^{-1}(\theta))$. To be a little more general we will suppose that the distribution can be written as

$f(y; \theta, \phi) = \exp\Big(\frac{y\theta - \kappa(\theta)}{\phi} + c(y, \phi)\Big), \qquad (88)$

where $\phi$ is a nuisance parameter (called the dispersion parameter; it plays the role of the variance in linear models) and $\theta$ is the parameter of interest. We recall that examples of exponential models include:

(i) The exponential distribution is already in natural exponential form with $\theta = -\lambda$ and $\phi = 1$. The log density is $\log f(y; \theta) = -\lambda y + \log\lambda$.

(ii) For the binomial distribution we let $\theta = \log(\frac{\pi}{1-\pi})$ and $\phi = 1$; since $\log(\frac{\pi}{1-\pi})$ is invertible this gives

$\log f(y; \theta) = y\theta - n\log\big(1 + \exp(\theta)\big) + \log\binom{n}{y}.$

(iii) For the normal distribution we have that

$\log f(y; \mu, \sigma^2) = -\frac{(y-\mu)^2}{2\sigma^2} - \log\sigma - \frac{1}{2}\log(2\pi) = \frac{\mu y - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2).$

Hence $\phi = \sigma^2$, $\theta = \mu$, $\kappa(\theta) = \theta^2/2$ and $c(y, \phi) = -y^2/(2\phi) - \frac{1}{2}\log(2\pi\phi)$.

(iv) The Poisson log density can be written as

$\log f(y; \mu) = y\log\mu - \mu - \log y!$

Hence $\theta = \log\mu$, $\kappa(\theta) = \exp(\theta)$ and $c(y) = -\log y!$.

(v) Other members of this family include the Gamma, Beta, Multinomial and inverse Gaussian.

Remark 18.1 (Moments) Using Lemma 3.1 (see Section 3) we have $E(Y) = \kappa'(\theta)$ and $\mathrm{var}(Y) = \phi\,\kappa''(\theta)$.

GLM is a method which generalises the methods of linear models to the exponential family (recall that the normal model is a subclass of the exponential family). In the GLM setting it is usually assumed that the response variables $\{Y_t\}$ have a density which comes from the natural exponential family, with log density

$\log f(y_t; \theta_t) = \frac{y_t\theta_t - \kappa(\theta_t)}{\phi} + c(y_t, \phi),$

where the parameter $\theta_t$ depends on the regressors. The regressors influence the response through a linear predictor $\eta_t = \beta' x_t$ and a link function, which connects $\beta' x_t$ to the mean $E(Y_t) = \mu(\theta_t) = \kappa'(\theta_t)$. More precisely, there is a monotonic link function $g(\cdot)$, which is defined such that $g(\mu(\theta_t)) = \eta_t = \beta' x_t$. To summarise:

$\kappa'(\theta_t) = g^{-1}(\eta_t) = \mu(\theta_t), \qquad \theta_t = \mu^{-1}(g^{-1}(\eta_t)) =: \theta(\eta_t), \qquad \mathrm{var}(Y_t) = \phi\,\frac{d\mu(\theta_t)}{d\theta_t}. \qquad (89)$

The log-likelihood function of $\{Y_t\}$ is

$L_T(\beta) = \sum_{t=1}^{T}\Big(\frac{Y_t\theta(\eta_t) - \kappa(\theta(\eta_t))}{\phi} + c(Y_t, \phi)\Big).$

The choice of link function is rather subjective. One of the most popular is the canonical link, which we define below.

Definition 18.1 (The canonical link function) The canonical link function is obtained by letting $\eta_t = \theta_t$. Hence $\mu_t = \kappa'(\eta_t)$ and the link satisfies $g(\kappa'(\theta_t)) = g(\kappa'(\eta_t)) = \eta_t$, i.e. $g = (\kappa')^{-1}$.

The canonical link is usually used because it makes the calculations simple. We observe that with the canonical link the log-likelihood of $\{Y_t\}$ is

$L_T(\beta) = \sum_{t=1}^{T}\Big(\frac{Y_t\beta' x_t - \kappa(\beta' x_t)}{\phi} + c(Y_t, \phi)\Big),$

which we will show is relatively simple to maximise.

Example 18.1 (The log-likelihood and canonical link function) (i) The canonical link for the exponential $f(y_t; \lambda_t) = \lambda_t\exp(-\lambda_t y_t)$ is $\theta_t = -\lambda_t = \beta' x_t$, hence $\lambda_t = -\beta' x_t$. The log-likelihood is

$\sum_{t=1}^{T}\big(Y_t\beta' x_t + \log(-\beta' x_t)\big).$

(ii) The canonical link for the binomial is $\theta_t = \beta' x_t = \log(\frac{\pi_t}{1-\pi_t})$, hence $\pi_t = \frac{\exp(\beta' x_t)}{1+\exp(\beta' x_t)}$. The log-likelihood is

$\sum_{t=1}^{T}\Big(Y_t\beta' x_t - n_t\log\big(1 + \exp(\beta' x_t)\big) + \log\binom{n_t}{Y_t}\Big).$

(iii) The canonical link for the normal is $\theta_t = \beta' x_t = \mu_t$. The log-likelihood is

$-\sum_{t=1}^{T}\frac{(Y_t - \beta' x_t)^2}{2\sigma^2} - T\log\sigma - \frac{T}{2}\log(2\pi),$

which is the usual least squares criterion. If the canonical link were not used, we would be in the nonlinear least squares setting; that is, the log-likelihood is

$-\sum_{t=1}^{T}\frac{(Y_t - g^{-1}(\beta' x_t))^2}{2\sigma^2} - T\log\sigma - \frac{T}{2}\log(2\pi).$

(iv) The canonical link for the Poisson is $\theta_t = \beta' x_t = \log\lambda_t$, hence $\lambda_t = \exp(\beta' x_t)$. The log-likelihood is

$\sum_{t=1}^{T}\big(Y_t\beta' x_t - \exp(\beta' x_t) - \log Y_t!\big).$

However, as mentioned above, the canonical link is simply used for its mathematical simplicity. There exist other links, which can often be more suitable.

Remark 18.2 (Link functions for the Binomial) We recall that the link function is defined as a monotonic function $g$, where $\eta_t = \beta' x_t = g(\mu_t)$. The choice of link is up to you. A popular choice is to let $g^{-1}$ be a well-known distribution function. The motivation for this is that for the binomial distribution $\mu_t = n_t\pi_t$ (where $\pi_t$ is the probability of a "success"). Clearly $0 \le \pi_t \le 1$, hence using a distribution function (or survival function) for $g^{-1}$ makes sense. Examples include:

(i) The logistic link; this is the canonical link function, where $\beta' x_t = g(\mu_t) = \log(\frac{\pi_t}{1-\pi_t}) = \log(\frac{\mu_t}{n_t - \mu_t})$.

(ii) The probit link, where $\pi_t = \Phi(\beta' x_t)$, $\Phi$ is the standard normal distribution function, and the link function is $\beta' x_t = g(\mu_t) = \Phi^{-1}(\mu_t/n_t)$.

(iii) The extreme value link, where the distribution function is $F(x) = 1 - \exp(-\exp(x))$. Hence in this case the link function is $\beta' x_t = g(\mu_t) = \log(-\log(1 - \mu_t/n_t))$.
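The three binomial links above are easy to write down directly. The following sketch (ours, not from the notes; it uses `math.erf` for the normal distribution function and a bisection of our own devising for $\Phi^{-1}$, so no external libraries are needed) checks that each link $g$ and its inverse mean function $g^{-1}$ are mutual inverses on $(0,1)$, taking $n_t = 1$ for simplicity.

```python
import math

def logit(p):            # canonical link: eta = log(p/(1-p))
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

def probit(p):           # eta = Phi^{-1}(p), via bisection on Phi
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    lo, hi = -10.0, 10.0
    for _ in range(100):                 # Phi is monotone, so bisection works
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inv_probit(eta):
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog(p):          # extreme-value link: eta = log(-log(1-p))
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

# Each inverse mean function maps the real line into (0, 1):
for g, ginv in [(logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)]:
    for p in (0.1, 0.5, 0.9):
        assert abs(ginv(g(p)) - p) < 1e-6
```

All three inverse links are increasing, but they differ in their tails: the complementary log-log link is asymmetric, which is why it arises from the extreme value distribution.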

18.3 Estimating the parameters in a GLM

The score function for GLM

The score function for generalised linear models has a very interesting form, which we will now derive. From now on, we will suppose that $\phi_t = \phi$ for all $t$, and that $\phi$ is known (though even in the case that it is unknown, it will not change the estimation scheme used). Much of the theory remains true without this restriction, but this makes the derivations a bit cleaner, and is enough for all the models we will encounter.

With this substitution, recall that the log-likelihood is

$L_T(\beta) = \sum_{t=1}^{T}\Big(\frac{Y_t\theta_t - \kappa(\theta_t)}{\phi} + c(Y_t, \phi)\Big) = \sum_{t=1}^{T}\ell_t(\beta),$

where

$\ell_t(\beta) = \frac{Y_t\theta_t - \kappa(\theta_t)}{\phi} + c(Y_t, \phi)$

and $\theta_t$ is a function of $\beta$ through the relationship $g(\kappa'(\theta_t)) = \eta_t = \beta' x_t$. For the MLE of $\beta$, we need to solve the likelihood equations

$\frac{\partial L_T}{\partial\beta_j} = \sum_{t=1}^{T}\frac{\partial\ell_t}{\partial\beta_j} = 0 \qquad \text{for all } j = 1, \ldots, p.$

To obtain an interesting expression for the above, recall that $\mathrm{var}(Y_t) = \phi\,\kappa''(\theta_t)$ and $\eta_t = g(\mu_t)$, and let $\kappa''(\theta_t) = V(\mu_t)$. Since $V(\mu_t) = \frac{d\mu_t}{d\theta_t}$, inverting the derivative we have $\frac{d\theta_t}{d\mu_t} = 1/V(\mu_t)$. Furthermore, since $\frac{d\eta_t}{d\mu_t} = g'(\mu_t)$, inverting the derivative we have $\frac{d\mu_t}{d\eta_t} = 1/g'(\mu_t)$. By the chain rule for differentiation, and using the above, we have

$\frac{\partial\ell_t}{\partial\beta_j} = \frac{d\ell_t}{d\eta_t}\frac{\partial\eta_t}{\partial\beta_j} = \frac{d\ell_t}{d\theta_t}\frac{d\theta_t}{d\mu_t}\frac{d\mu_t}{d\eta_t}\frac{\partial\eta_t}{\partial\beta_j} = \frac{Y_t - \kappa'(\theta_t)}{\phi}\cdot\frac{1}{V(\mu_t)}\cdot\frac{1}{g'(\mu_t)}\cdot x_{tj} = \frac{(Y_t - \mu_t)\,x_{tj}}{\phi\,V(\mu_t)\,g'(\mu_t)}. \qquad (90)$

Thus the likelihood equations we have to solve for the MLE of $\beta$ are

$\sum_{t=1}^{T}\frac{(Y_t - \mu_t)\,x_{tj}}{V(\mu_t)\,g'(\mu_t)} = \sum_{t=1}^{T}\frac{(Y_t - g^{-1}(\beta' x_t))\,x_{tj}}{V(g^{-1}(\beta' x_t))\,g'(\mu_t)} = 0, \qquad 1 \le j \le p, \qquad (91)$

(since $\mu_t = g^{-1}(\beta' x_t)$). We observe that the above set of equations can be considered as estimating functions (see Section 17), since the expectation of each equation is equal to zero. Moreover, it has a very similar structure to the normal equations arising in ordinary least squares.

Example 18.2 (i) Normal $\{Y_t\}$ with mean $\mu_t = \beta' x_t$. Here, we have $g(\mu_t) = \mu_t = \beta' x_t$, so $g'(\mu_t) = \frac{dg(\mu_t)}{d\mu_t} = 1$; also $V(\mu_t) = 1$ and $\phi = \sigma^2$, so the equations become

$\frac{1}{\sigma^2}\sum_{t=1}^{T}(Y_t - \beta' x_t)x_{tj} = 0.$

Ignoring the factor $\sigma^2$, the left-hand side is the $j$th element of the vector $X^T(Y - X\beta)$, so the equations reduce to the normal equations of least squares: $X^T(Y - X\beta) = 0$, or equivalently $X^T X\beta = X^T Y$.

(ii) Poisson $\{Y_t\}$ with log link, hence $\log\mu_t = \beta' x_t$ (so $g(\mu_t) = \log\mu_t$). This time, $g'(\mu_t) = 1/\mu_t$, $\mathrm{var}(Y_t) = V(\mu_t) = \mu_t$ and $\phi = 1$. Substituting $\mu_t = \exp(\beta' x_t)$, the equations become

$\sum_{t=1}^{T}\big(Y_t - e^{\beta' x_t}\big)x_{tj} = 0,$

which are not linear in $\beta$, and no explicit solution is possible in general.

The Newton-Raphson scheme

See also Section , and Green (1984). It is clear from the examples above that usually there does not exist a simple solution for the likelihood estimator of $\beta$. However, we can use the Newton-Raphson scheme to estimate $\beta$. We will derive an interesting expression for the iterative scheme. Other than the expression being useful for implementation, it also highlights the estimator's connection to weighted least squares.
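Before turning to Newton-Raphson, it may help to see the object it works on. The sketch below (ours; the data and function name are made up for illustration) evaluates the Poisson score $\sum_t (Y_t - e^{\beta' x_t})x_t$ from Example 18.2(ii), the vector whose root the iterative scheme will seek.

```python
import math

# Made-up design matrix (intercept + one regressor) and Poisson counts.
X = [[1.0, 0.0],
     [1.0, 1.0]]
Y = [1.0, 2.0]

def poisson_score(beta):
    """Score vector sum_t (Y_t - exp(beta'x_t)) x_t for the Poisson log link."""
    score = [0.0, 0.0]
    for xt, yt in zip(X, Y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xt)))
        for j in range(2):
            score[j] += (yt - mu) * xt[j]
    return score

# At beta = (0, 0) every mu_t = 1, so the score is nonzero:
print(poisson_score([0.0, 0.0]))  # -> [1.0, 1.0]
```

At the MLE (here $\beta_0 = 0$, $\beta_1 = \log 2$, since the two-observation model is saturated) the score vanishes, which is exactly the condition (91).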

We recall that the Newton-Raphson scheme consists of the iteration

$\beta^{(i+1)} = \beta^{(i)} - (H^{(i)})^{-1}u^{(i)},$

where the $p \times 1$ gradient vector $u^{(i)}$ is

$u^{(i)} = \Big(\frac{\partial L_T}{\partial\beta_1}, \ldots, \frac{\partial L_T}{\partial\beta_p}\Big)'\Big|_{\beta = \beta^{(i)}}$

and the $p \times p$ Hessian matrix $H^{(i)}$ is given by

$H^{(i)}_{jk} = \frac{\partial^2 L_T(\beta)}{\partial\beta_j\partial\beta_k}\Big|_{\beta = \beta^{(i)}} \qquad \text{for } 1 \le j, k \le p,$

both $u^{(i)}$ and $H^{(i)}$ being evaluated at the current estimate $\beta^{(i)}$. By using (90), the score function at the $i$th iteration is

$u^{(i)}_j = \frac{\partial L_T(\beta)}{\partial\beta_j}\Big|_{\beta = \beta^{(i)}} = \sum_{t=1}^{T}\frac{\partial\ell_t}{\partial\beta_j}\Big|_{\beta = \beta^{(i)}} = \sum_{t=1}^{T}\frac{d\ell_t}{d\eta_t}\Big|_{\beta = \beta^{(i)}}x_{tj}.$

The Hessian at the $i$th iteration is

$H^{(i)}_{jk} = \frac{\partial^2 L_T(\beta)}{\partial\beta_j\partial\beta_k}\Big|_{\beta = \beta^{(i)}} = \sum_{t=1}^{T}\frac{d^2\ell_t}{d\eta_t^2}\Big|_{\beta = \beta^{(i)}}x_{tj}x_{tk}. \qquad (92)$

Thus if we write $s^{(i)}$ for the $T \times 1$ vector with elements

$s^{(i)}_t = \frac{d\ell_t}{d\eta_t}\Big|_{\beta = \beta^{(i)}}$

and $W^{(i)}$ for the diagonal $T \times T$ matrix with

$W^{(i)}_{tt} = -\frac{d^2\ell_t}{d\eta_t^2}\Big|_{\beta = \beta^{(i)}},$

we have $u^{(i)} = X^T s^{(i)}$ and $H^{(i)} = -X^T W^{(i)}X$, and the Newton-Raphson iteration is

$\beta^{(i+1)} = \beta^{(i)} - (H^{(i)})^{-1}u^{(i)} = \beta^{(i)} + (X^T W^{(i)}X)^{-1}X^T s^{(i)}.$

Fisher scoring for GLMs

Typically, partly for reasons of tradition, we use a modification of this in fitting statistical models. The matrix $W^{(i)}$ is replaced by $\bar W^{(i)}$, another diagonal $T \times T$ matrix with

$\bar W^{(i)}_{tt} = E\big(W^{(i)}_{tt}\,\big|\,\beta^{(i)}\big) = -E\Big(\frac{d^2\ell_t}{d\eta_t^2}\,\Big|\,\beta^{(i)}\Big).$

One reason for taking the expectation is that $\bar W^{(i)}_{tt}$ is sure to have non-negative (typically positive) diagonal entries, since (see Section 1)

$-E\Big(\frac{d^2\ell_t}{d\eta_t^2}\,\Big|\,\beta^{(i)}\Big) = \mathrm{var}\Big(\frac{d\ell_t}{d\eta_t}\,\Big|\,\beta^{(i)}\Big),$

so that $\bar W^{(i)} = \mathrm{var}(s^{(i)}\,|\,\beta^{(i)})$, and the matrix is non-negative definite. Using the Fisher score function the iteration becomes

$\beta^{(i+1)} = \beta^{(i)} + (X^T\bar W^{(i)}X)^{-1}X^T s^{(i)}.$

Iteratively reweighted least squares

The iteration

$\beta^{(i+1)} = \beta^{(i)} + (X^T\bar W^{(i)}X)^{-1}X^T s^{(i)} \qquad (93)$

is formally similar to the familiar solution for least squares estimates in linear models, $\hat\beta = (X^T X)^{-1}X^T Y$, or more particularly the related weighted least squares estimate, $\hat\beta = (X^T WX)^{-1}X^T WY$. In fact, (93) can be rearranged to have exactly this form. Algebraic manipulation gives

$\beta^{(i)} = (X^T\bar W^{(i)}X)^{-1}X^T\bar W^{(i)}X\beta^{(i)}$

and

$(X^T\bar W^{(i)}X)^{-1}X^T s^{(i)} = (X^T\bar W^{(i)}X)^{-1}X^T\bar W^{(i)}(\bar W^{(i)})^{-1}s^{(i)}.$

Therefore, substituting the above into (93) gives

$\beta^{(i+1)} = (X^T\bar W^{(i)}X)^{-1}X^T\bar W^{(i)}\big(X\beta^{(i)} + (\bar W^{(i)})^{-1}s^{(i)}\big) =: (X^T\bar W^{(i)}X)^{-1}X^T\bar W^{(i)}Z^{(i)},$

where $Z^{(i)} = X\beta^{(i)} + (\bar W^{(i)})^{-1}s^{(i)}$.
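The rearranged update can be implemented in a few lines. The sketch below (ours, assuming NumPy is available; it is not a definitive implementation) specialises to Poisson regression with the canonical log link, where $\phi = 1$, $s_t = Y_t - \mu_t$ and $\bar W_{tt} = \mu_t$.

```python
import numpy as np

def irls_poisson(X, Y, n_iter=25):
    """Fisher scoring / IRLS for Poisson regression with the canonical log link.

    For this model s_t = Y_t - mu_t and Wbar_tt = mu_t, so the working
    response is z_t = eta_t + (Y_t - mu_t)/mu_t, and each step solves the
    weighted least squares system (X' Wbar X) beta = X' Wbar z.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                 # inverse link
        z = eta + (Y - mu) / mu          # working response Z^{(i)}
        W = mu                           # diagonal of Wbar^{(i)}
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Intercept-only model: the fixed point is log of the sample mean.
Y = np.array([1.0, 2.0, 3.0])
X = np.ones((3, 1))
beta_hat = irls_poisson(X, Y)
```

For the canonical link, Fisher scoring coincides with Newton-Raphson, so convergence is quadratic near the solution; in the intercept-only example the iteration settles at $\hat\beta_0 = \log\bar Y$.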

The rearranged iteration is of interest because it has the form of weighted least squares. More precisely, it has the form of a weighted least squares regression of $Z^{(i)}$ on $X$ with the diagonal weight matrix $\bar W^{(i)}$. That is, letting $z^{(i)}_t$ denote the $t$th element of the vector $Z^{(i)}$, $\beta^{(i+1)}$ minimises the following weighted least squares criterion:

$\sum_{t=1}^{T}\bar W^{(i)}_{tt}\big(z^{(i)}_t - \beta' x_t\big)^2.$

Of course, in reality $\bar W^{(i)}_{tt}$ and $z^{(i)}_t$ are functions of $\beta^{(i)}$, hence the above is often called iteratively reweighted least squares.

Estimating the dispersion parameter

Recall that in the linear model case, the variance $\sigma^2$ did not affect the estimation of $\beta$. In the general GLM case, continuing to assume that $\phi_t = \phi$, we have

$s_t = \frac{d\ell_t}{d\eta_t} = \frac{d\ell_t}{d\theta_t}\frac{d\theta_t}{d\mu_t}\frac{d\mu_t}{d\eta_t} = \frac{Y_t - \mu_t}{\phi\,V(\mu_t)\,g'(\mu_t)}$

and

$\bar W_{tt} = \mathrm{var}(s_t) = \frac{\mathrm{var}(Y_t)}{\{\phi\,V(\mu_t)\,g'(\mu_t)\}^2} = \frac{\phi\,V(\mu_t)}{\{\phi\,V(\mu_t)\,g'(\mu_t)\}^2} = \frac{1}{\phi\,V(\mu_t)\,(g'(\mu_t))^2},$

so that $1/\phi$ appears as a scale factor in $\bar W$ and $s$, but otherwise does not appear in the estimating equations or iteration for $\beta$. Hence $\phi$ does not play a role in the estimation of $\beta$. As in the normal/linear case, (a) we are less interested in $\phi$, and (b) we see that $\phi$ can be estimated separately from $\beta$. Recall that $\mathrm{var}(Y_t) = \phi V(\mu_t)$, thus $E\big((Y_t - \mu_t)^2\big) = \phi V(\mu_t)$. We can use this to suggest a simple estimator of $\phi$:

$\hat\phi = \frac{1}{T-p}\sum_{t=1}^{T}\frac{(Y_t - \hat\mu_t)^2}{V(\hat\mu_t)} = \frac{1}{T-p}\sum_{t=1}^{T}\frac{(Y_t - \hat\mu_t)^2}{\kappa''(\hat\theta_t)},$

where $\hat\mu_t = g^{-1}(\hat\beta' x_t)$ and $\hat\theta_t = \mu^{-1}(g^{-1}(\hat\beta' x_t))$. Recall that the above resembles estimators of the residual variance. Indeed, it can be argued that the distribution of $(T-p)\hat\phi/\phi$ is close to $\chi^2_{T-p}$.
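Once the fitted means and the variance function are in hand, the estimator $\hat\phi$ is a one-liner. A sketch of ours (the function name and toy numbers are purely illustrative):

```python
def dispersion_estimate(Y, mu_hat, V, p):
    """Pearson-type estimator: phi_hat = sum_t (Y_t - mu_t)^2 / V(mu_t) / (T - p)."""
    T = len(Y)
    return sum((y - m) ** 2 / V(m) for y, m in zip(Y, mu_hat)) / (T - p)

# Toy example with variance function V(mu) = mu (Poisson-type) and p = 1 parameter:
phi_hat = dispersion_estimate([1.0, 3.0], [2.0, 2.0], lambda m: m, 1)
print(phi_hat)  # (0.5 + 0.5) / (2 - 1) = 1.0
```

A value of $\hat\phi$ well above 1 for a Poisson or binomial fit is the usual symptom of overdispersion, taken up in the next section.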

Remark 18.3 We mention that a slight generalisation of the estimator $\hat\phi$ above is when the dispersion parameter satisfies $\phi_t = \phi a_t$, where $a_t$ is known. In this case, an estimator of $\phi$ is

$\hat\phi = \frac{1}{T-p}\sum_{t=1}^{T}\frac{(Y_t - \hat\mu_t)^2}{a_t V(\hat\mu_t)} = \frac{1}{T-p}\sum_{t=1}^{T}\frac{(Y_t - \hat\mu_t)^2}{a_t\,\kappa''(\hat\theta_t)}.$

18.4 Deviance, scaled deviance and residual deviance

The scaled deviance

Instead of minimising the sum of squares (which is done for linear models) we have been maximising a log-likelihood $L_T(\beta)$. Furthermore, we recall that

$S(\hat\beta) = \sum_{t=1}^{T}r_t^2 = \sum_{t=1}^{T}\Big(Y_t - \hat\beta_0 - \sum_{j=1}^{q}\hat\beta_j x_{tj} - \sum_{j=q+1}^{p}\hat\beta_j x_{tj}\Big)^2$

is a numerical summary of how well the linear model fitted; $S(\hat\beta) = 0$ means a perfect fit. In this section we will define the equivalent of residuals and squared residuals for GLMs.

What is the best we can do in fitting a GLM? Recall $\ell_t = \frac{Y_t\theta_t - \kappa(\theta_t)}{\phi} + c(Y_t, \phi)$, so

$\frac{d\ell_t}{d\theta_t} = 0 \iff \frac{Y_t - \kappa'(\theta_t)}{\phi} = 0.$

A model that achieves this equality for all $t$ is called saturated (the same terminology is used for linear models). In other words, we need one free parameter for each observation (the alternative, $g(\kappa'(\theta_t)) = \beta' x_t$, is a restriction). Denote the corresponding $\theta_t$ by $\tilde\theta_t$, i.e. the solution of $\kappa'(\tilde\theta_t) = Y_t$. Now consider the difference

$2\{\ell_t(\tilde\theta_t) - \ell_t(\theta_t)\} = \frac{2}{\phi}\{Y_t(\tilde\theta_t - \theta_t) - \kappa(\tilde\theta_t) + \kappa(\theta_t)\}.$

Maximising the likelihood is the same as minimising this quantity, which is always non-negative, and is 0 only if there is a perfect fit to the $t$th observation. This is analogous to linear models, where maximising the normal likelihood is the same as minimising the least squares criterion (which is zero when the fit is perfect). Thus $L_T(\tilde\theta) = \sum_{t=1}^{T}\ell_t(\tilde\theta_t)$ provides a baseline value for the log-likelihood.
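To make the quantity $2\{\ell_t(\tilde\theta_t) - \ell_t(\theta_t)\}$ concrete: for the Poisson with $\phi = 1$ we have $\kappa(\theta) = e^\theta$ and $\tilde\theta_t = \log Y_t$, so it equals $2\{Y_t\log(Y_t/\mu_t) - (Y_t - \mu_t)\}$. The sketch below (ours; it also computes the Pearson residual $(Y_t - \mu_t)/\sqrt{V(\mu_t)}$ for comparison) evaluates this for single observations.

```python
import math

def poisson_unit_deviance(y, mu):
    """2{l_t(theta_tilde) - l_t(theta)} for the Poisson (y > 0):
    2{y log(y/mu) - (y - mu)}."""
    return 2.0 * (y * math.log(y / mu) - (y - mu))

def poisson_pearson(y, mu):
    """(y - mu)/sqrt(V(mu)) with V(mu) = mu for the Poisson."""
    return (y - mu) / math.sqrt(mu)

# A perfect fit (mu = y) gives a zero contribution:
print(poisson_unit_deviance(2.0, 2.0))  # -> 0.0

# Otherwise the contribution is strictly positive, whichever side mu is on:
d_under = poisson_unit_deviance(1.0, 2.0)
d_over = poisson_unit_deviance(3.0, 2.0)
```

The signed square root of this unit deviance (with the sign of $Y_t - \mu_t$) is the deviance residual defined later in this section.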

Example 18.3 (The normal linear model) $\kappa(\theta_t) = \frac{1}{2}\theta_t^2$, $\kappa'(\theta_t) = \theta_t = \mu_t$, $\tilde\theta_t = Y_t$ and $\phi = \sigma^2$, so

$2\{\ell_t(\tilde\theta_t) - \ell_t(\theta_t)\} = \frac{2}{\sigma^2}\big\{Y_t(Y_t - \mu_t) - \tfrac{1}{2}Y_t^2 + \tfrac{1}{2}\mu_t^2\big\} = (Y_t - \mu_t)^2/\sigma^2.$

Hence $2\{\ell_t(\tilde\theta_t) - \ell_t(\theta_t)\}$ reduces to the classical squared residual in the normal case.

In general, let

$D_t = 2\{Y_t(\tilde\theta_t - \hat\theta_t) - \kappa(\tilde\theta_t) + \kappa(\hat\theta_t)\}.$

We call $D = \sum_{t=1}^{T}D_t$ the deviance of the model. If $\phi$ is present,

$\frac{D}{\phi} = 2\{L_T(\tilde\theta) - L_T(\hat\theta)\}$

is the scaled deviance. Thus the residual deviance plays the same role for GLMs as the residual sum of squares does for linear models.

Interpreting $D_t$

We will now show that

$D_t = 2\{Y_t(\tilde\theta_t - \hat\theta_t) - \kappa(\tilde\theta_t) + \kappa(\hat\theta_t)\} \approx \frac{(Y_t - \hat\mu_t)^2}{V(\hat\mu_t)}.$

To show the above we require expressions for $Y_t(\tilde\theta_t - \hat\theta_t)$ and $\kappa(\tilde\theta_t) - \kappa(\hat\theta_t)$. We use Taylor's theorem to expand $\kappa$ and $\kappa'$ about $\hat\theta_t$, obtaining

$\kappa(\tilde\theta_t) \approx \kappa(\hat\theta_t) + (\tilde\theta_t - \hat\theta_t)\kappa'(\hat\theta_t) + \tfrac{1}{2}(\tilde\theta_t - \hat\theta_t)^2\kappa''(\hat\theta_t) \qquad (94)$

and

$\kappa'(\tilde\theta_t) \approx \kappa'(\hat\theta_t) + (\tilde\theta_t - \hat\theta_t)\kappa''(\hat\theta_t). \qquad (95)$

But $\kappa'(\tilde\theta_t) = Y_t$, $\kappa'(\hat\theta_t) = \hat\mu_t$ and $\kappa''(\hat\theta_t) = V(\hat\mu_t)$, so (94) becomes

$\kappa(\tilde\theta_t) - \kappa(\hat\theta_t) \approx (\tilde\theta_t - \hat\theta_t)\hat\mu_t + \tfrac{1}{2}(\tilde\theta_t - \hat\theta_t)^2 V(\hat\mu_t) \qquad (96)$

and (95) becomes

$Y_t - \hat\mu_t \approx (\tilde\theta_t - \hat\theta_t)V(\hat\mu_t). \qquad (97)$

Now substituting (96) and (97) into $D_t$ gives

$D_t = 2\{Y_t(\tilde\theta_t - \hat\theta_t) - \kappa(\tilde\theta_t) + \kappa(\hat\theta_t)\} \approx 2\big\{(Y_t - \hat\mu_t)(\tilde\theta_t - \hat\theta_t) - \tfrac{1}{2}(\tilde\theta_t - \hat\theta_t)^2 V(\hat\mu_t)\big\} = (\tilde\theta_t - \hat\theta_t)^2 V(\hat\mu_t) \approx \frac{(Y_t - \hat\mu_t)^2}{V(\hat\mu_t)}.$

Recalling that $\mathrm{var}(Y_t) = \phi V(\mu_t)$ and $E(Y_t) = \mu_t$, $D_t/\phi$ is like a standardised squared residual. The signed square root of this approximation,

$\mathrm{sign}(Y_t - \hat\mu_t)\sqrt{\frac{(Y_t - \hat\mu_t)^2}{V(\hat\mu_t)}} = \frac{Y_t - \hat\mu_t}{\sqrt{V(\hat\mu_t)}},$

is called a Pearson residual. The distribution theory for this is very approximate, but a rule of thumb is that if the model fits, the scaled deviance $D/\phi$ (or in practice $D/\hat\phi$) $\approx \chi^2_{T-p}$.

Deviance residuals

The analogy with the normal example can be taken further. The square roots of the individual terms in the residual sum of squares are the residuals $Y_t - \hat\beta' x_t$. We use the square roots of the individual terms in the deviance in the same way. However, the classical residuals can be both negative and positive, and the deviance residuals should behave in a similar way. But what sign should be used? The most obvious solution is to use

$r_t = \begin{cases} -\sqrt{D_t} & \text{if } Y_t - \hat\mu_t < 0, \\ +\sqrt{D_t} & \text{if } Y_t - \hat\mu_t \ge 0. \end{cases}$

We call the quantities $\{r_t\}$ the deviance residuals.

Diagnostic plots. We recall that for linear models we would often plot the residuals against the regressors to check whether a linear model is appropriate or not. One can make useful diagnostic plots which have exactly the same form as for linear models, except that deviance residuals are used instead of ordinary residuals, and linear predictor values instead of fitted values.

Limiting distributions and standard errors of estimators

In the majority of examples we have considered in the previous sections (see, for example, Sections 1 and 4) we observed iid $\{Y_t\}$ with distribution $f(\cdot; \theta)$. We showed that

$\hat\theta_T \approx N\Big(\theta_0, \frac{1}{T\,I(\theta)}\Big),$

where $I(\theta) = -\int\frac{\partial^2\log f(x;\theta)}{\partial\theta^2}f(x;\theta)\,dx$ ($I(\theta)$ is Fisher's information). However, this result was based on the observations being iid. In the more general setting where $\{Y_t\}$ are independent but not identically distributed we have that

$\hat\beta \approx N_p\big(\beta, (I(\beta))^{-1}\big),$

where now $I(\beta)$ is a $p \times p$ matrix (of the entire sample); using equation (92) we have

$(I(\beta))_{jk} = -E\Big(\frac{\partial^2 L_T}{\partial\beta_j\partial\beta_k}\Big) = -\sum_{t=1}^{T}E\Big(\frac{d^2\ell_t}{d\eta_t^2}\Big)x_{tj}x_{tk} = (X^T\bar WX)_{jk}.$

Thus for large samples we have

$\hat\beta \approx N_p\big(\beta, (X^T\bar WX)^{-1}\big),$

where $\bar W$ is evaluated at the MLE $\hat\beta$.

The analysis of deviance

How can we test hypotheses about models, and in particular decide which explanatory variables to include? The two closely related methods we will consider below are the log-likelihood ratio test and an analogue of the analysis of variance (ANOVA), called the analysis of deviance. Let us concentrate on the simplest case, of testing a full vs. a reduced model. Partition the model matrix $X$ and the parameter vector $\beta$ as

$X = (X_1, X_2), \qquad \beta = \begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix},$

where $X_1$ is $T \times q$ and $\beta_1$ is $q \times 1$ (this is analogous to equation (87) for linear models). The full model is $\eta = X\beta = X_1\beta_1 + X_2\beta_2$ and the reduced model is $\eta = X_1\beta_1$. We wish to test

$H_0: \beta_2 = 0,$

i.e. that the reduced model is adequate for the data. Define the scaled deviances for the reduced and full models,

$\frac{D_R}{\phi} = 2\Big\{L_T(\tilde\theta) - \sup_{\beta:\,\beta_2 = 0}L_T(\beta)\Big\} \quad\text{and}\quad \frac{D_F}{\phi} = 2\Big\{L_T(\tilde\theta) - \sup_{\beta_1,\beta_2}L_T(\beta)\Big\},$

where we recall that $L_T(\tilde\theta) = \sum_{t=1}^{T}\ell_t(\tilde\theta_t)$ is the likelihood of the saturated model defined earlier in this section. Taking differences we have

$\frac{D_R - D_F}{\phi} = 2\Big\{\sup_{\beta_1,\beta_2}L_T(\beta) - \sup_{\beta:\,\beta_2 = 0}L_T(\beta)\Big\},$

which is the likelihood ratio statistic. The results in Theorem 6.1, equation (26) (the log-likelihood ratio test for a composite hypothesis) also hold for observations which are not identically distributed. Hence, using a generalised version of Theorem 6.1, we have

$\frac{D_R - D_F}{\phi} = 2\Big\{\sup_{\beta_1,\beta_2}L_T(\beta) - \sup_{\beta:\,\beta_2 = 0}L_T(\beta)\Big\} \approx \chi^2_{p-q}.$

So we can conduct a test of the adequacy of the reduced model by referring $(D_R - D_F)/\phi$ to a $\chi^2_{p-q}$ distribution, and rejecting $H_0$ if the statistic is too large (p-value too small). If $\phi$ is not present in the model, then we are good to go. If $\phi$ is present (and unknown), we estimate it with

$\hat\phi = \frac{1}{T-p}\sum_{t=1}^{T}\frac{(Y_t - \hat\mu_t)^2}{V(\hat\mu_t)}$

(see the estimator given earlier). We can then continue to use the $\chi^2_{p-q}$ distribution for $(D_R - D_F)/\hat\phi$, or we can use the statistic

$\frac{(D_R - D_F)/(p-q)}{D_F/(T-p)}$

against an $F_{p-q,\,T-p}$ distribution, as in the normal case (compare with Section 18.1).

18.5 Examples

Example 18.4 Question: Suppose that $\{Y_t\}$ are independent random variables from the canonical exponential family, whose logarithm satisfies

$\log f(y; \theta_t) = \frac{y\theta_t - \kappa(\theta_t)}{\phi} + c(y; \phi),$

where $\phi$ is the dispersion parameter. Let $E(Y_t) = \mu_t$ and let $\eta_t = \beta' x_t = \theta_t$ (hence the canonical link is used), where $x_t$ are regressors which influence $Y_t$. [14]

(a) (i) Obtain the log-likelihood of $\{(Y_t, x_t)\}_{t=1}^{T}$.

(ii) Denoting the log-likelihood of $\{(Y_t, x_t)\}_{t=1}^{T}$ by $L_T(\beta)$, show that

$\frac{\partial L_T}{\partial\beta_j} = \frac{1}{\phi}\sum_{t=1}^{T}(Y_t - \mu_t)x_{tj} \quad\text{and}\quad \frac{\partial^2 L_T}{\partial\beta_k\partial\beta_j} = -\frac{1}{\phi}\sum_{t=1}^{T}\kappa''(\theta_t)x_{tj}x_{tk}.$

(b) Let $Y_t$ have a Gamma distribution, where the log density has the form

$\log f(y_t; \mu_t) = \frac{-Y_t/\mu_t - \log\mu_t}{\nu^{-1}} + \nu\log\nu - \log\Gamma(\nu) + (\nu - 1)\log Y_t,$

with $E(Y_t) = \mu_t$, $\mathrm{var}(Y_t) = \mu_t^2/\nu$ and $\eta_t = \beta' x_t = g(\mu_t)$.

(i) What is the canonical link function for the Gamma distribution? Write down the corresponding log-likelihood of $\{(Y_t, x_t)\}_{t=1}^{T}$.

(ii) Suppose that $\eta_t = \beta' x_t = \beta_0 + \beta_1 x_{t1}$. Denote the log-likelihood by $L_T(\beta_0, \beta_1)$. What are the first and second derivatives of $L_T(\beta_0, \beta_1)$?

(iii) Evaluate the Fisher information matrix at $\beta_0$ and $\beta_1 = 0$.

(iv) Using your answers in (ii), (iii) and the MLE of $\beta_0$ with $\beta_1 = 0$, derive the score test for testing $H_0: \beta_1 = 0$ against $H_A: \beta_1 \ne 0$.

Solution

(a) (i) The general log-likelihood for $\{(Y_t, x_t)\}$ with the canonical link function is

$L_T(\beta, \phi) = \sum_{t=1}^{T}\Big(\frac{Y_t\beta' x_t - \kappa(\beta' x_t)}{\phi} + c(Y_t, \phi)\Big).$

(ii) In the differentiation use that $\kappa'(\theta_t) = \kappa'(\beta' x_t) = \mu_t$.

(b) (i) For the Gamma distribution the canonical link is $\theta_t = \eta_t = -1/\mu_t = \beta' x_t$. Thus the log-likelihood is

$L_T(\beta) = \frac{1}{\nu^{-1}}\sum_{t=1}^{T}\big(Y_t\beta' x_t - \log(-1/\beta' x_t)\big) + \sum_{t=1}^{T}c(\nu^{-1}, Y_t),$

where $c(\cdot)$ can be evaluated.

(ii) Use part (a)(ii) above to give

$\frac{\partial L_T(\beta)}{\partial\beta_j} = \nu\sum_{t=1}^{T}\Big(Y_t + \frac{1}{\beta' x_t}\Big)x_{tj}, \qquad \frac{\partial^2 L_T(\beta)}{\partial\beta_i\partial\beta_j} = -\nu\sum_{t=1}^{T}\frac{x_{ti}x_{tj}}{(\beta' x_t)^2}.$

(iii) Take the expectation of the above at a general $\beta_0$ and $\beta_1 = 0$.

(iv) Using the above information, use the Wald test, score test or log-likelihood ratio test.

Example 18.5 Question: It is a belief amongst farmers that the age of a hen has a negative influence on the number of eggs she lays and on the quality of the eggs. To investigate this, $m$ hens were randomly sampled. On a given day, the total number of eggs and the number of bad eggs that each of the hens lays is recorded. Let $N_i$ denote the total number of eggs hen $i$ lays, $Y_i$ denote the number of bad eggs the hen lays, and $x_i$ denote the age of hen $i$.

It is known that the number of eggs a hen lays follows a Poisson distribution and the quality (whether it is good or bad) of a given egg is an independent event (independent of the other eggs). Let $N_i$ be a Poisson random variable with mean $\lambda_i$, where we model $\lambda_i = \exp(\alpha_0 + \gamma_1 x_i)$, and let $\pi_i$ denote the probability that hen $i$ lays a bad egg, where we model $\pi_i$ with

$\pi_i = \frac{\exp(\beta_0 + \gamma_1 x_i)}{1 + \exp(\beta_0 + \gamma_1 x_i)}.$

Suppose that $(\alpha_0, \beta_0, \gamma_1)$ are unknown parameters.

(a) Obtain the likelihood of $\{(N_i, Y_i)\}_{i=1}^{m}$.

(b) Obtain the estimating function (score) of the likelihood and the information matrix.

(c) Obtain an iterative algorithm for estimating the unknown parameters.

(d) For given $\alpha_0, \beta_0, \gamma_1$, evaluate the average number of bad eggs a 4-year-old hen will lay in one day.

(e) Describe in detail a method for testing $H_0: \gamma_1 = 0$ against $H_A: \gamma_1 \ne 0$.

Solution

(a) Since the canonical links are being used, the log-likelihood function is

$L_m(\alpha_0, \beta_0, \gamma_1) = L_m(Y\,|\,N) + L_m(N) = \sum_{i=1}^{m}\Big(Y_i\beta' x_i - N_i\log\big(1 + \exp(\beta' x_i)\big) + \log\binom{N_i}{Y_i}\Big) + \sum_{i=1}^{m}\big(N_i\alpha' x_i - \exp(\alpha' x_i) - \log N_i!\big),$

where $\alpha = (\alpha_0, \gamma_1)'$, $\beta = (\beta_0, \gamma_1)'$ and $x_i = (1, x_i)'$.

(b) We know that if the canonical link is used, the score is

$\frac{\partial L}{\partial\beta_j} = \frac{1}{\phi}\sum_{i=1}^{m}\big(Y_i - \kappa'(\beta' x_i)\big)x_{ij} = \sum_{i=1}^{m}(Y_i - \mu_i)x_{ij}$

and the second derivative is

$\frac{\partial^2 L}{\partial\beta_j\partial\beta_k} = -\frac{1}{\phi}\sum_{i=1}^{m}\kappa''(\beta' x_i)x_{ij}x_{ik} = -\sum_{i=1}^{m}\mathrm{var}(Y_i)x_{ij}x_{ik}.$

Using the above, for this question the score is

$\frac{\partial L_m}{\partial\alpha_0} = \sum_{i=1}^{m}(N_i - \lambda_i), \qquad \frac{\partial L_m}{\partial\beta_0} = \sum_{i=1}^{m}(Y_i - N_i\pi_i), \qquad \frac{\partial L_m}{\partial\gamma_1} = \sum_{i=1}^{m}\big(N_i - \lambda_i + Y_i - N_i\pi_i\big)x_i.$

The second derivatives are

$\frac{\partial^2 L_m}{\partial\alpha_0^2} = -\sum_{i=1}^{m}\lambda_i, \qquad \frac{\partial^2 L_m}{\partial\beta_0^2} = -\sum_{i=1}^{m}N_i\pi_i(1-\pi_i), \qquad \frac{\partial^2 L_m}{\partial\gamma_1^2} = -\sum_{i=1}^{m}\big(\lambda_i + N_i\pi_i(1-\pi_i)\big)x_i^2,$

$\frac{\partial^2 L_m}{\partial\alpha_0\partial\gamma_1} = -\sum_{i=1}^{m}\lambda_i x_i, \qquad \frac{\partial^2 L_m}{\partial\beta_0\partial\gamma_1} = -\sum_{i=1}^{m}N_i\pi_i(1-\pi_i)x_i, \qquad \frac{\partial^2 L_m}{\partial\alpha_0\partial\beta_0} = 0.$

Observing that $E(N_i) = \lambda_i$, the information matrix is

$I(\theta) = \begin{pmatrix} \sum_i\lambda_i & 0 & \sum_i\lambda_i x_i \\ 0 & \sum_i\lambda_i\pi_i(1-\pi_i) & \sum_i\lambda_i\pi_i(1-\pi_i)x_i \\ \sum_i\lambda_i x_i & \sum_i\lambda_i\pi_i(1-\pi_i)x_i & \sum_i\big(\lambda_i + \lambda_i\pi_i(1-\pi_i)\big)x_i^2 \end{pmatrix}.$

(c) We can estimate $\theta = (\alpha_0, \beta_0, \gamma_1)'$ using Newton-Raphson with Fisher scoring, that is,

$\theta^{(k+1)} = \theta^{(k)} + I(\theta^{(k)})^{-1}S(\theta^{(k)}),$

where $S(\theta)$ is the score vector

$S(\theta) = \Big(\sum_{i=1}^{m}(N_i - \lambda_i),\; \sum_{i=1}^{m}(Y_i - N_i\pi_i),\; \sum_{i=1}^{m}(N_i - \lambda_i + Y_i - N_i\pi_i)x_i\Big)'.$

(d) We note that given the regressor $x_i = 4$, the average number of bad eggs will be

$E(Y_i) = E\big(E(Y_i\,|\,N_i)\big) = E(N_i\pi_i) = \lambda_i\pi_i = \exp(\alpha_0 + \gamma_1 x_i)\,\frac{\exp(\beta_0 + \gamma_1 x_i)}{1 + \exp(\beta_0 + \gamma_1 x_i)}.$

(e) Give either the log-likelihood ratio test, the score test or the Wald test.
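As a quick sketch of part (d) (ours; the parameter values below are purely illustrative, not from the question), the expected number of bad eggs $\lambda_i\pi_i$ at age $x_i = 4$ is a direct computation:

```python
import math

def expected_bad_eggs(alpha0, beta0, gamma1, x):
    """E(Y_i) = lambda_i * pi_i for the hen model at age x."""
    lam = math.exp(alpha0 + gamma1 * x)                                   # Poisson mean
    pi = math.exp(beta0 + gamma1 * x) / (1.0 + math.exp(beta0 + gamma1 * x))
    return lam * pi

# Illustrative values alpha0 = 1.0, beta0 = -1.0, gamma1 = -0.1, age 4:
val = expected_bad_eggs(1.0, -1.0, -0.1, 4.0)
```

With a negative $\gamma_1$, both $\lambda_i$ and $\pi_i$ decrease with age, so under this model an older hen lays fewer eggs but also, perhaps counter to the farmers' belief, fewer bad ones.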

19 Proportion data, Count data and Overdispersion

See Sections 10.4 and 10.5 of Davison (2002) and Agresti (1996) for a thorough theoretical account, and Simonoff (2003) for a very interesting applied perspective.

In the previous section we generalised the linear model framework to the exponential family. GLMs are often used for modelling count data; in these cases usually the binomial, Poisson or multinomial distributions are used. Hence we observe $(Y_t, x_t)$ for $1 \le t \le T$. Types of data and the corresponding distributions:

Distribution   Regressors   Response variables
Binomial       $x_t$        $Y_t$, or $(Y_t, N - Y_t) = (Y_{t1}, Y_{t2})$
Poisson        $x_t$        $Y_t$
Multinomial    $x_t$        $Y_t = (Y_{t1}, \ldots, Y_{tm})$ with $\sum_i Y_{ti} = N$

Distribution   Probabilities
Binomial       $P(Y_{t1} = k, Y_{t2} = N - k) = \binom{N}{k}\,\pi(\beta' x_t)^k\,(1 - \pi(\beta' x_t))^{N-k}$
Poisson        $P(Y_t = k) = \lambda(\beta' x_t)^k \exp(-\lambda(\beta' x_t))/k!$
Multinomial    $P(Y_{t1} = k_1, \ldots, Y_{tm} = k_m) = \binom{N}{k_1, \ldots, k_m}\,\pi_1(\beta' x_t)^{k_1}\cdots\pi_m(\beta' x_t)^{k_m}$

From the data we are able to select the distribution we want to fit, and thus estimate $\beta$. In this section we will mainly be dealing with count data where the regressors tend to be ordinal (not continuous regressors). This type of data normally comes in the form of a contingency table. One of the most common types of contingency table is the two-by-two table, and we consider this in the section below.

19.1 Two by Two Tables

Consider the following $2 \times 2$ contingency table of colour preference by gender:

        Male   Female   Total
Blue
Pink
Total

Given the above table, it is natural to ask whether there is an association between gender and colour preference. The standard method is to test for independence. However, we could also pose the question in a different way: are the proportions of females who like blue the same


LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

Computer exercise 4 Poisson Regression

Computer exercise 4 Poisson Regression Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Lecture 6: Poisson regression

Lecture 6: Poisson regression Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

Advanced statistical inference. Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu

Advanced statistical inference. Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu Advanced statistical inference Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu August 1, 2012 2 Chapter 1 Basic Inference 1.1 A review of results in statistical inference In this section, we

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

The equivalence of logistic regression and maximum entropy models

The equivalence of logistic regression and maximum entropy models The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.win-vector.com/blog/20/09/the-simplerderivation-of-logistic-regression/

More information

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance. Chapter 2: Statistical models of default Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

More information

15.1 The Structure of Generalized Linear Models

15.1 The Structure of Generalized Linear Models 15 Generalized Linear Models Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described

More information

Regression Models for Time Series Analysis

Regression Models for Time Series Analysis Regression Models for Time Series Analysis Benjamin Kedem 1 and Konstantinos Fokianos 2 1 University of Maryland, College Park, MD 2 University of Cyprus, Nicosia, Cyprus Wiley, New York, 2002 1 Cox (1975).

More information

13. Poisson Regression Analysis

13. Poisson Regression Analysis 136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

More information

Trend and Seasonal Components

Trend and Seasonal Components Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES Kivan Kaivanipour A thesis submitted for the degree of Master of Science in Engineering Physics Department

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014. University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Module 4 - Multiple Logistic Regression

Module 4 - Multiple Logistic Regression Module 4 - Multiple Logistic Regression Objectives Understand the principles and theory underlying logistic regression Understand proportions, probabilities, odds, odds ratios, logits and exponents Be

More information

(Quasi-)Newton methods

(Quasi-)Newton methods (Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Automated Biosurveillance Data from England and Wales, 1991 2011

Automated Biosurveillance Data from England and Wales, 1991 2011 Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical

More information

Detekce změn v autoregresních posloupnostech

Detekce změn v autoregresních posloupnostech Nové Hrady 2012 Outline 1 Introduction 2 3 4 Change point problem (retrospective) The data Y 1,..., Y n follow a statistical model, which may change once or several times during the observation period

More information

LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

More information

STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

Lecture Notes 1. Brief Review of Basic Probability

Lecture Notes 1. Brief Review of Basic Probability Probability Review Lecture Notes Brief Review of Basic Probability I assume you know basic probability. Chapters -3 are a review. I will assume you have read and understood Chapters -3. Here is a very

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

Profitable Strategies in Horse Race Betting Markets

Profitable Strategies in Horse Race Betting Markets University of Melbourne Department of Mathematics and Statistics Master Thesis Profitable Strategies in Horse Race Betting Markets Author: Alexander Davis Supervisor: Dr. Owen Jones October 18, 2013 Abstract:

More information

Estimating an ARMA Process

Estimating an ARMA Process Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

More information

Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

1 The Brownian bridge construction

1 The Brownian bridge construction The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Multivariate Analysis of Variance (MANOVA): I. Theory

Multivariate Analysis of Variance (MANOVA): I. Theory Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

1 Another method of estimation: least squares

1 Another method of estimation: least squares 1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

More information

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE BY P.D. ENGLAND AND R.J. VERRALL ABSTRACT This paper extends the methods introduced in England & Verrall (00), and shows how predictive

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information