Random Effects Models for Longitudinal Survey Data

Analysis of Survey Data. Edited by R. L. Chambers and C. J. Skinner. Copyright 2003 John Wiley & Sons, Ltd. ISBN:

CHAPTER 14

Random Effects Models for Longitudinal Survey Data

C. J. Skinner and D. J. Holmes

14.1 INTRODUCTION

Random effects models have a number of important uses in the analysis of longitudinal survey data. The main use, which we shall focus on in this chapter, is in the study of individual-level dynamics. Random effects models enable variation in individual responses to be decomposed into variation between the 'permanent' characteristics of individuals and temporal 'transitory' variation within individuals.

Another important use of random effects models in the analysis of longitudinal data is in allowing for the effects of time-constant unobserved covariates in regression models (e.g. Solon, 1989; Hsiao, 1986; Baltagi, 2001). Failure to allow for these unobserved covariates in regression analysis of cross-sectional survey data may lead to inconsistent estimation of regression coefficients. Consistent estimation may, however, be achievable with the use of random effects models and longitudinal data.

A 'typical' random effects model may be conceived of as follows. It is supposed that a response variable $Y$ is measured at each of a number of successive waves of the survey. The measurement for individual $i$ at wave $t$ is denoted $y_{it}$ and this value is assumed to be generated in two stages. First, 'permanent' random effects $\theta_i$ are generated from some distribution for each individual $i$. Then, at each wave, $y_{it}$ is generated from $\theta_i$. In the simplest case this generation follows the same process independently at each wave. For example, we may have

$\theta_i \sim N(\theta, \sigma_1^2), \qquad y_{it} \mid \theta_i \sim N(\theta_i, \sigma_2^2).$ (14.1)

Under this model, longitudinal data enable the 'cross-sectional' variance $\sigma_1^2 + \sigma_2^2$ of $y_{it}$ to be decomposed into the variance $\sigma_1^2$ of the 'permanent' component $\theta_i$ and the variance $\sigma_2^2$ of the 'transitory' component at each wave.

This may aid understanding of the mobility of individuals over time in terms of their place in the distribution of the response variable. An example, which we shall focus on, is where the response variable is earnings, subject to a log transformation, and a model of the form (14.1) enables us to study the degree of mobility of an individual's place in the earnings distribution (e.g. Lillard and Willis, 1978).

Rich classes of random effects models for longitudinal data have been developed for the above purposes. A number of different terms have been used to describe these models, including variance component models, error component models, mixed effects models, multilevel models and hierarchical models (Baltagi, 2001; Hsiao, 1986; Diggle, Liang and Zeger, 1994; Goldstein, 1995).

The general aim of this chapter is to consider how to take account of complex sampling designs in the fitting of such random effects models. We shall suppose that there is a known probability sampling scheme employed to select the sample of individuals followed over the waves of the survey. Two additional complications will be that there may be wave nonresponse, so that not all sampled individuals will respond at each wave, and that the target population of individuals may not be fixed over time. To provide a specific focus, we will consider data on earnings of male employees over the first five waves of the British Household Panel Survey (BHPS), that is, over the period 1991-5.
As a basic model for the log earnings $y_{it}$ of individual $i$ at wave $t = 1, \ldots, T$ we shall suppose that

$y_{it} = \beta_t + u_i + v_{it}, \quad t = 1, \ldots, T,$ (14.2)

where $u_i$ is the 'permanent' random effect referred to earlier and the $v_{it}$ are transitory random effects, whose effects on the response variable may last beyond the current wave $t$ via the first-order autoregressive (AR(1)) model:

$v_{it} = \rho v_{i,t-1} + \varepsilon_{it}, \quad t = 1, \ldots, T.$ (14.3)

Both $u_i$ and $v_{it}$ may include the effects of measurement errors (Abowd and Card, 1989). The random variables $u_i$ and $\varepsilon_{it}$ are assumed to be mutually independent with

$E(u_i) = E(\varepsilon_{it}) = 0, \quad \mathrm{var}(u_i) = \sigma_u^2, \quad \mathrm{var}(\varepsilon_{it}) = \sigma_\varepsilon^2.$

The unknown fixed parameters $\beta_t$ $(t = 1, \ldots, T)$ represent annual (inflation) effects. Lillard and Willis (1978) considered this model (amongst others) for log earnings for seven years (1967-73) of data from the US Panel Study of Income Dynamics. Letting $\sigma_v^2 = \mathrm{var}(v_{it})$ and assuming that the $v_{it}$ are stationary and that $\varepsilon_{it}$ is independent of $v_{i,t-1}$, we obtain

$\sigma_v^2 = \sigma_\varepsilon^2 / (1 - \rho^2).$ (14.4)

We refer to the above model as Model B and to the more restricted 'variance components' model in which $\rho = 0$ as Model A. See Goldstein, Healy and Rasbash (1994) for further discussion of such models.
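To make Model B concrete, the following minimal sketch simulates data from (14.2) and (14.3) and compares the empirical variances and covariances with the values implied by (14.4). It assumes normally distributed effects, a stationary start for the AR(1) process and arbitrary illustrative parameter values; the function name `simulate_model_b` is hypothetical and not part of the chapter.

```python
import numpy as np

def simulate_model_b(n, T, beta, sigma_u, sigma_eps, rho, rng):
    """Draw an n x T array of y_it = beta_t + u_i + v_it with AR(1) v_it (Model B)."""
    u = rng.normal(0.0, sigma_u, size=n)            # permanent effects u_i
    sigma_v = sigma_eps / np.sqrt(1.0 - rho ** 2)   # stationary s.d. implied by (14.4)
    v = np.empty((n, T))
    v[:, 0] = rng.normal(0.0, sigma_v, size=n)      # assumption: start in the stationary law
    for t in range(1, T):
        # AR(1) recursion (14.3)
        v[:, t] = rho * v[:, t - 1] + rng.normal(0.0, sigma_eps, size=n)
    return np.asarray(beta) + u[:, None] + v        # model (14.2)

# Illustrative (made-up) parameter values
sigma_u, sigma_eps, rho = 0.5, 0.3, 0.4
rng = np.random.default_rng(0)
y = simulate_model_b(200_000, 5, np.zeros(5), sigma_u, sigma_eps, rho, rng)

S = np.cov(y, rowvar=False)                         # empirical 5 x 5 covariance matrix
sigma_v2 = sigma_eps ** 2 / (1.0 - rho ** 2)
implied_var = sigma_u ** 2 + sigma_v2               # var(y_it)
implied_cov_lag1 = sigma_u ** 2 + rho * sigma_v2    # cov(y_it, y_i,t+1)
print(S[0, 0], implied_var)
print(S[0, 1], implied_cov_lag1)
```

Setting `rho = 0` in the same function gives Model A, in which all covariances between waves equal the permanent variance.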

We shall consider two broad approaches to fitting these models under a complex sample design. The first is a covariance structure approach, following Chamberlain (1982) and Skinner, Holt and Smith (1989, section 3.4.5, henceforth referred to as SHS), in which the observations on the $T$ waves are treated as a multivariate outcome with individuals as 'single-level' units. This approach is set out in Section 14.2. The second approach treats the data as two-level (Goldstein, 1995) with the level 1 units as the waves $t = 1, \ldots, T$ and the level 2 units as the individuals $i$. The aim is to apply the methods developed by Pfeffermann et al. (1998). This approach is set out in Section 14.3. A related approach is developed by Feder, Nathan and Pfeffermann (2000) for a model with time-varying random effects. The application of both our approaches to earnings data from the British Household Panel Survey will be considered in Section 14.4.

14.2 A COVARIANCE STRUCTURE APPROACH

Following the notation in Section 14.1, let $y_i = (y_{i1}, \ldots, y_{iT})'$ be the $T \times 1$ vector representing the profile of values of individual $i$ over the $T$ waves of the survey. Under the model defined by (14.2)-(14.4), these multivariate outcomes are independent with mean vector and covariance matrix given respectively by

$E(y_i) = \beta = (\beta_1, \ldots, \beta_T)',$ (14.5)

$\mathrm{var}(y_i) = \sigma_u^2 J_T + \sigma_v^2 V_T(\rho),$ (14.6)

where $J_T$ is the $T \times T$ matrix of ones and $V_T(\rho)$ is the $T \times T$ matrix with $(t, t')$th element given by $\rho^{(t'-t)}$ $(1 \le t \le t' \le T)$. These equations define a 'covariance structure' model in which the mean vector is unconstrained but the $k = T(T+1)/2$ distinct elements of the covariance matrix are constrained to be functions of the parameter vector $\theta = (\sigma_u^2, \sigma_\varepsilon^2, \rho)'$. Inference about these parameters may follow the approach outlined in SHS (section 3.4.5).

Assuming first no nonresponse, let the data consist of the values $y_i$ for units $i$ in a sample $s$. The usual survey estimator of the finite population covariance matrix $S$ is given by

$\hat S = \sum_s w_i (y_i - \bar y)(y_i - \bar y)' \Big/ \sum_s w_i,$ (14.7)

where

$\bar y = \sum_s w_i y_i \Big/ \sum_s w_i,$

and where $w_i$ is the survey weight for individual $i$. Let $\hat A = \mathrm{vech}(\hat S)$ denote the $k \times 1$ vector of distinct elements of $\hat S$ (the 'vector half' of $\hat S$: see Fuller, 1987, p. 382) and let $A(\theta) = \mathrm{vech}[\mathrm{var}(y_i)]$ denote the corresponding vector of elements of $\mathrm{var}(y_i)$ from (14.6).
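The quantities just defined can be sketched in a few lines of NumPy. This is an illustration only: the function names (`weighted_cov`, `vech`, `model_cov`) are hypothetical, the data are assumed to be held as an n x T array with one weight per individual, and `vech` relies on the matrix being symmetric.

```python
import numpy as np

def weighted_cov(y, w):
    """Survey-weighted covariance matrix, as in (14.7); y is n x T, w has length n."""
    w = np.asarray(w, dtype=float)
    ybar = (w[:, None] * y).sum(axis=0) / w.sum()    # weighted mean vector
    d = y - ybar
    return (w[:, None] * d).T @ d / w.sum()

def vech(S):
    """Distinct elements of a symmetric matrix, stacked column by column.

    For symmetric S, reading the upper triangle row by row gives the same
    ordering as stacking the columns of the lower triangle.
    """
    return S[np.triu_indices(S.shape[0])]

def model_cov(sigma_u2, sigma_eps2, rho, T):
    """Model-implied covariance (14.6): sigma_u^2 J_T + sigma_v^2 V_T(rho)."""
    sigma_v2 = sigma_eps2 / (1.0 - rho ** 2)          # (14.4)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return sigma_u2 * np.ones((T, T)) + sigma_v2 * rho ** lags

# Small usage example with made-up numbers
y_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
S_demo = weighted_cov(y_demo, np.ones(3))
A_hat_demo = vech(S_demo)                             # k = T(T+1)/2 = 3 elements
A_theta_demo = vech(model_cov(0.25, 0.09, 0.0, 5))    # Model A (rho = 0), k = 15
```

With equal weights `weighted_cov` reduces to the usual covariance matrix with divisor n rather than n - 1.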

Following Chamberlain (1982), Fuller (1984) and SHS (section 3.4.5), a general class of estimators of $\theta$ is obtained by minimising

$[\hat A - A(\theta)]' V^{-1} [\hat A - A(\theta)],$ (14.8)

where $V$ is a given $k \times k$ non-singular matrix. A generalised least squares (GLS) estimator $\hat\theta_{GLS}$ is obtained by taking $V$ to be a consistent estimator of the covariance matrix of $\hat A$. One choice, $V_c$, is obtained from the linearisation method (Wolter, 1985) by approximating the covariance matrix of the elements of $\hat S$ by the covariance matrix of the corresponding elements of the linear statistic

$\sum_s z_i,$ (14.9)

where $z_i = w_i [(y_i - \bar y)(y_i - \bar y)' - \hat S] / \sum_s w_i$ is treated as a fixed variable. The estimator $V_c$ may allow in the usual way for the complex design (Wolter, 1985).

Since $A(\theta)$ is a non-linear function of $\theta$, iterative minimisation of (14.8) is required. It may be noted that, for a given value of $\rho$ under Model B, $A(\theta)$ is linear in $(\sigma_u^2, \sigma_v^2)$ and so closed form expressions may be determined for the values $\hat\sigma_u^2(\rho)$ and $\hat\sigma_v^2(\rho)$ which minimise (14.8) for given $\rho$. The iterative minimisation may thus be reduced to a scalar problem. A consistent estimator of the covariance matrix of $\hat\theta_{GLS}$ is given by (Fuller, 1984)

$V_L(\hat\theta_{GLS}) = [\dot A(\hat\theta_{GLS})' V_c^{-1} \dot A(\hat\theta_{GLS})]^{-1},$ (14.10)

where $\dot A(\theta) = \partial A(\theta) / \partial \theta$.

An advantage of the GLS approach is that it provides a ready-made goodness-of-fit test as the minimised value of the criterion in (14.8), namely the Wald statistic:

$X_W^2 = [\hat A - A(\hat\theta_{GLS})]' V_c^{-1} [\hat A - A(\hat\theta_{GLS})].$ (14.11)

If the model is correct and if the sample is large enough for $V_c$ to be a good approximation to the covariance matrix of $\hat A$, then $X_W^2$ should be distributed approximately as chi-squared with $k - q$ degrees of freedom, where $q = 2$ and $3$ for Models A and B respectively.

One potential problem with the GLS estimator is that the covariance matrix estimator may be unstable if it is based on a relatively small number of degrees of freedom. This may lead to departures from the null distribution of $X_W^2$ assumed above. In this case, it may be preferable to consider alternative choices of $V$. One approach is to let $V$ be an estimator of the covariance matrix of $\hat A$ based upon the (false) assumption that observations are independent and identically distributed. Thus, if we write

$\hat A = \sum_s a_i,$ (14.12)

where $a_i = \mathrm{vech}(z_i)$ denotes the $k \times 1$ vector of distinct elements of $z_i$, then we may set $V$ equal to

$V_{iid} = n \sum_s (a_i - \bar a)(a_i - \bar a)' / (n - 1),$ (14.13)

where $\bar a = \sum_s a_i / n$ and $n$ denotes the sample size. Although $V_{iid}$ may be more stable than a variance estimator which allows for the complex design, this choice of $V$ is still correlated with $\hat A$ and, as discussed by Altonji and Segal (1996), may lead to serious bias in the estimation of $\theta$. To avoid this problem, an even simpler approach is to set $V$ equal to the identity matrix, in which case the estimator of $\theta$ obtained by minimising (14.8) may be viewed as an ordinary least squares (OLS) estimator.

In both the cases when $V = V_{iid}$ and when $V$ is the identity matrix, the resulting estimator $\hat\theta$ will still be consistent for $\theta$ but the Wald statistic $X_W^2$ will no longer follow a chi-squared distribution if the model is true. The large-sample distribution will instead be a mixture of chi-squared distributions and this may be approximated by a chi-squared distribution using one or two moment Rao-Scott approximations (SHS, Ch. 4). It is also no longer appropriate to use expression (14.10) to obtain standard errors for the elements of $\hat\theta$. Instead, as noted in SHS (Ch. 3), a consistent estimator of the covariance matrix of $\hat\theta$ is

$V(\hat\theta; V = V_0) = [\dot A(\hat\theta)' V_0^{-1} \dot A(\hat\theta)]^{-1} [\dot A(\hat\theta)' V_0^{-1} V_c V_0^{-1} \dot A(\hat\theta)] [\dot A(\hat\theta)' V_0^{-1} \dot A(\hat\theta)]^{-1},$

where $V_0$ is the specified choice of $V$ ($V_{iid}$ or the identity matrix) used to determine $\hat\theta$ and $V_c$ is a consistent estimator of the covariance matrix of $\hat A$ under the complex design. Note that this expression reduces to (14.10) when $V_0 = V_c$.

The approach considered so far in this section is based on the estimated covariance matrix $\hat S$ in (14.7) and assumes no nonresponse. This is an unrealistic assumption. The simplest way of handling nonresponse is to consider only those individuals who respond on all $T$ waves, the so-called 'attrition sample', $s_T$, at wave $T$.
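Before the treatment of nonresponse is developed further, the sandwich covariance estimator given above for a fixed weight matrix $V_0$ can be sketched as follows, together with a first-order correction of the test statistic. The chapter defers the details of the Rao-Scott approximation to SHS (Ch. 4); the correction below is an assumption about its form, namely division of the statistic by the mean of the non-zero eigenvalues of the matrix governing its large-sample distribution. The function names and the random inputs are hypothetical.

```python
import numpy as np

def sandwich_cov(A_dot, V0, Vc):
    """Covariance of the estimator fitted with weight matrix V0, when Vc
    estimates the design-based covariance matrix of A-hat."""
    V0_inv = np.linalg.inv(V0)
    bread = np.linalg.inv(A_dot.T @ V0_inv @ A_dot)
    meat = A_dot.T @ V0_inv @ Vc @ V0_inv @ A_dot
    return bread @ meat @ bread

def first_order_adjusted_stat(resid, A_dot, V0, Vc):
    """Statistic resid' V0^{-1} resid divided by a mean-eigenvalue factor
    (assumed form of the first-order correction; see the lead-in)."""
    k, q = A_dot.shape
    V0_inv = np.linalg.inv(V0)
    bread = np.linalg.inv(A_dot.T @ V0_inv @ A_dot)
    # Linearised map from the sampling error in A-hat to the fitted residual
    P = np.eye(k) - A_dot @ bread @ A_dot.T @ V0_inv
    # The non-zero eigenvalues of V0^{-1} P Vc P' number k - q; use their mean
    delta_bar = np.trace(V0_inv @ P @ Vc @ P.T) / (k - q)
    x2 = float(resid @ V0_inv @ resid)
    return x2 / delta_bar, delta_bar

# Made-up inputs: k = 6 moments, q = 2 parameters
rng = np.random.default_rng(1)
A_dot = rng.normal(size=(6, 2))            # Jacobian of A(theta), k x q
B = rng.normal(size=(6, 6))
Vc = B @ B.T + 6.0 * np.eye(6)             # a positive definite stand-in for Vc
resid = rng.normal(size=6)                 # stand-in for A-hat minus fitted A

cov_ols = sandwich_cov(A_dot, np.eye(6), Vc)          # V0 = identity (OLS)
cov_gls = sandwich_cov(A_dot, Vc, Vc)                 # V0 = Vc (GLS)
stat_gls, delta_gls = first_order_adjusted_stat(resid, A_dot, Vc, Vc)
```

When `V0` equals `Vc` the sandwich collapses to expression (14.10) and the correction factor equals one, so no adjustment is made to the GLS statistic.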
For longitudinal surveys designed for multipurpose longitudinal analyses, it is common to construct longitudinal weights $w_{it}$ at each wave $t$, which are appropriate for longitudinal analysis based upon data for the attrition sample $s_t$ of individuals who respond up to wave $t$ (Lepkowski, 1989). Thus, the simplest approach is to use only data from attrition sample $s_T$ and to replace the weights $w_i$, e.g. in (14.7), by the weights $w_{iT}$.

A more sophisticated approach, aimed at producing more efficient estimates, uses data from all attrition samples $s_1, \ldots, s_T$. A recursive approach to the estimation of the covariance matrix of $y_i$ may then be developed. Let $y_i^{(t)} = (y_{i1}, \ldots, y_{it})'$ and let $\hat S^{(t)}$ denote the estimated $t \times t$ covariance matrix of $y_i^{(t)}$. Begin the recursion by setting

$\hat S^{(1)} = \sum_{s_1} w_{i1} (y_{i1} - \bar y_1)^2 \Big/ \sum_{s_1} w_{i1},$

where $\bar y_1 = \sum_{s_1} w_{i1} y_{i1} / \sum_{s_1} w_{i1}$, as in (14.7). At the $t$th step of the recursion $(t = 2, \ldots, T)$ set the $(t-1) \times (t-1)$ submatrix of $\hat S^{(t)}$ corresponding to $y_i^{(t-1)}$ equal to $\hat S^{(t-1)}$. Let $b^{(t)}$ be the vector of weighted regression coefficients of $y_{it}$ on $y_i^{(t-1)}$ given by

$b^{(t)} = \Big[ \sum_{s_t} w_{it} (y_i^{(t-1)} - \bar y^{(t-1)})(y_i^{(t-1)} - \bar y^{(t-1)})' \Big]^{-1} \sum_{s_t} w_{it} (y_i^{(t-1)} - \bar y^{(t-1)}) y_{it},$

where

$\bar y^{(t-1)} = \sum_{s_t} w_{it} y_i^{(t-1)} \Big/ \sum_{s_t} w_{it}.$

Then set the $(t, t)$th element of $\hat S^{(t)}$, corresponding to the variance of $y_{it}$, equal to $\hat\sigma_{et}^2 + b^{(t)\prime} \hat S^{(t-1)} b^{(t)}$, where

$\hat\sigma_{et}^2 = \sum_{s_t} w_{it} (e_{it} - \bar e_t)^2 \Big/ \sum_{s_t} w_{it}, \quad e_{it} = y_{it} - y_i^{(t-1)\prime} b^{(t)}, \quad \bar e_t = \sum_{s_t} w_{it} e_{it} \Big/ \sum_{s_t} w_{it}.$

Finally, let $\hat S^{(t)}_{t,t-1}$ denote the $1 \times (t-1)$ vector of remaining elements of $\hat S^{(t)}$, corresponding to the covariances between $y_{it}$ and $y_i^{(t-1)}$, and set

$\hat S^{(t)}_{t,t-1} = b^{(t)\prime} \hat S^{(t-1)}.$

The recursive process is repeated for $t = 2, \ldots, T$. If $y_i$ is multivariate normal and there are no weights, the resulting $\hat S^{(T)}$ is a maximum likelihood estimator (Holt, Smith and Winter, 1980) for data from the set of attrition samples. In general, the estimator may be viewed as a form of pseudo-likelihood estimator (see Chapter 2). If the weights do not vary greatly, if $y_i$ is approximately multivariate normal and the observations for most individuals fall into one of the attrition samples, the estimator $\hat S^{(T)}$ may be expected to be fairly efficient. Weighting can become unwieldy if it is attempted to adjust for all possible wave nonresponse patterns in addition to the attrition samples. See, for example, Lepkowski (1989) for further discussion. For a more general discussion of inference in the presence of nonresponse see Chapter 18.
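The recursion above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: responses are held in an n x T array whose entries after an individual's last responding wave are never read, `w[i, t-1]` holds the longitudinal weight $w_{it}$, and `last_wave[i]` records the last wave at which individual i responds, so that the attrition sample $s_t$ is the set of individuals with `last_wave >= t`. All names are hypothetical.

```python
import numpy as np

def recursive_cov(y, w, last_wave):
    """Recursive estimate of the T x T covariance matrix from attrition samples."""
    n, T = y.shape
    s1 = last_wave >= 1
    w1, y1 = w[s1, 0], y[s1, 0]
    ybar1 = np.sum(w1 * y1) / np.sum(w1)
    S = np.array([[np.sum(w1 * (y1 - ybar1) ** 2) / np.sum(w1)]])   # S-hat^(1)
    for t in range(2, T + 1):
        s_t = last_wave >= t                      # attrition sample at wave t
        wt = w[s_t, t - 1]
        X = y[s_t, : t - 1]                       # y_i^(t-1)
        yt = y[s_t, t - 1]                        # y_it
        Xc = X - (wt[:, None] * X).sum(axis=0) / wt.sum()
        # Weighted regression coefficients b^(t) of y_it on y_i^(t-1)
        b = np.linalg.solve((wt[:, None] * Xc).T @ Xc, (wt[:, None] * Xc).T @ yt)
        e = yt - X @ b                            # residuals e_it
        ebar = np.sum(wt * e) / wt.sum()
        s2_e = np.sum(wt * (e - ebar) ** 2) / wt.sum()
        cov_row = b @ S                           # covariances of y_it with y_i^(t-1)
        var_t = s2_e + b @ S @ b                  # (t, t)th element
        S = np.block([[S, cov_row[:, None]],
                      [cov_row[None, :], np.array([[var_t]])]])
    return S

# Made-up data: 60 individuals, 4 waves
rng = np.random.default_rng(2)
y_demo = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 4))
w_demo = np.ones((60, 4))
S_full = recursive_cov(y_demo, w_demo, np.full(60, 4))               # no attrition
S_attr = recursive_cov(y_demo, w_demo, np.repeat([4, 3, 2, 1], 15))  # with attrition
```

With complete response and equal weights the recursion reproduces the ordinary covariance matrix with divisor n, which gives a convenient check on an implementation.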
We return in Section 14.4 to the application of the methods discussed in this section.

14.3 A MULTILEVEL MODELLING APPROACH

A second approach to handling complex survey designs in the fitting of the models defined in Section 14.1 is by adapting standard approaches, such as iterative generalised least squares (IGLS), used for fitting random effects models (Goldstein, 1995). Pfeffermann et al. (1998) have considered modifying IGLS estimation using an approach analogous to the pseudo-likelihood method (see Chapter 2) for a model of the form (14.2), where the $v_{it}$ are not serially correlated. Here we consider the extension of their approach to a longitudinal context, allowing for serial correlation. A potential advantage of this approach is that covariates may be handled more directly in the model. A potential disadvantage is that goodness-of-fit tests are not generated so directly.

In multilevel modelling terminology (Goldstein, 1995), the individuals are the level 2 units and the repeated measurements at the different waves represent level 1 units. Pfeffermann et al. (1998) allow for a two-stage sampling scheme, whereby the level 2 units $i$ are selected with inclusion probabilities $\pi_i$ and the level 1 units $t$ with inclusion probabilities $\pi_{t|i}$ conditional on level 2 unit $i$ being selected. Weights $w_i$ and $w_{t|i}$ are then constructed equal to the reciprocals of these respective probabilities, which are assumed known.

To adapt this approach to our context of longitudinal surveys subject to wave nonresponse, it seems natural to let $\pi_i$ denote the probability that individual $i$ is sampled and $\pi_{t|i}$ the probability that this individual responds at wave $t$. While we may reasonably suppose that the $\pi_i$ are known, it is not straightforward to estimate the $\pi_{t|i}$ for general patterns of wave nonresponse (as noted in the covariance structure approach of Section 14.2). We therefore restrict attention to estimation using only the data derived from the attrition samples $s_t$. As noted in Section 14.2, it is common for longitudinal weights $w_{it}$ to be available for use with these attrition samples and we shall suppose here that these approximate $(\pi_i \pi_{t|i})^{-1}$. We may then set $w_i$ equal to the design weight $\pi_i^{-1}$ and $w_{t|i}$ equal to $w_{it} / w_i$. Alternatively, given $w_{i1}, \ldots, w_{iT}$, we may set $w_i = w_{i1}$ and $w_{t|i} = w_{it} / w_{i1}$ $(t = 1, \ldots, T)$. Note, in particular, that in this case $w_{1|i} = 1$ for all $i$. This approach treats the sample selection and the response process at the first wave as a common selection process. In the approach of Pfeffermann et al. (1998), correction for bias by weighting tends to be more difficult at level 1 than at level 2, because there tends to be more non-linearity in the IGLS estimator as a function of level 1 sums than of level 2 sums. Hence setting $w_i = w_{i1}$ may be preferable to setting $w_i = \pi_i^{-1}$ because the resulting $w_{t|i}$ may be less variable and closer to one.

Having then constructed the weights $w_i$ and $w_{t|i}$, the approach of Pfeffermann et al. (1998) may be applied to fit a model of form (14.2) where the $v_{it}$ are not serially correlated. This is Model A. The basic approach is to modify the IGLS estimation procedure by weighting all sums over $i$ by the weights $w_i$ and weighting all sums over $t$ by the weights $w_{t|i}$.

Often survey weights are only available in a scaled form; for example, so that they sum to the sample size. For inference about many regression-type models, as in Parts B and C of this book, estimation procedures for the model parameters are invariant to such scaling. Although this is also true for multilevel modelling if the $w_i$ are scaled, it is not true if the weights $w_{t|i}$ are scaled. Pfeffermann et al. (1998) took advantage of this fact to choose a scaling to minimise small-sample estimation bias. In our context we consider scaling the weights $w_{t|i}$ to construct the scaled weights $w^*_{t|i}$ as

$w^*_{t|i} = t^*(i)\, w_{t|i} \Big/ \sum_{t=1}^{t^*(i)} w_{t|i},$

where $t^*(i)$ is the last wave at which individual $i$ responds $(1 \le t^*(i) \le T)$. Hence the average weight $w^*_{t|i}$ for individual $i$ across waves $1, \ldots, t^*(i)$ is equal to one.

We now consider the question of how to adapt the approach of Pfeffermann et al. (1998) to allow for possible serial correlation of the $v_{it}$ in Model B. We follow an approach similar to that in Hsiao (1986, section 3.7), which is based on observing that if we know $\rho$ then Model B may be transformed to the form of Model A by

$y_{it} - \rho y_{i,t-1} = (\beta_t - \rho \beta_{t-1}) + (1 - \rho) u_i + \varepsilon_{it}.$ (14.14)

The estimation procedure involves two steps:

Step 1. Eliminate the random effect $u_i$ by differencing the responses,

$\Delta_{it} = y_{it} - y_{i,t-1}, \quad i \in s_t, \ t = 2, \ldots, T,$

and estimate the linear regression model

$\Delta_{it} = \delta_t + \gamma \Delta_{i,t-1} + \eta_{it}$

by OLS weighted by the weights $w_{it}$ for observations $i$ in the attrition samples $s_t$ $(t = 2, \ldots, T)$, where the parameters $\delta_t$ are unconstrained. Under Model B, $\Delta_{it} = (\beta_t - \beta_{t-1}) + v_{it} - v_{i,t-1}$, so that $\mathrm{var}(\Delta_{i,t-1}) = 2\sigma_v^2 (1 - \rho)$ and $\mathrm{cov}(\Delta_{it}, \Delta_{i,t-1}) = -\sigma_v^2 (1 - \rho)^2$, and the least squares estimator $\hat\gamma$ of $\gamma$ is consistent for

$\gamma = \mathrm{cov}(\Delta_{it}, \Delta_{i,t-1}) / \mathrm{var}(\Delta_{i,t-1}) = [-(1 - \rho)\sigma_\varepsilon^2 / (1 + \rho)] / [2\sigma_\varepsilon^2 / (1 + \rho)] = -(1 - \rho)/2.$

Set $\hat\rho = 1 + 2\hat\gamma$.

Step 2. Let $\tilde y_{it} = y_{it} - \hat\rho y_{i,t-1}$ and fit the model obtained from (14.14) for the transformed data:

$\tilde y_{it} = \tilde\beta_t + \tilde u_i + \tilde\varepsilon_{it}$ (14.15)

using the approach of Pfeffermann et al. (1998) with the assumptions of Model A applying to the model in (14.15). The estimated variance of $\tilde u_i$ is then divided by $(1 - \hat\rho)^2$ to obtain the estimate $\hat\sigma_u^2$.

This two-step approach produces consistent estimators of the parameters of Model B but the resulting standard errors of $\hat\sigma_u^2$ and $\hat\sigma_\varepsilon^2$ will not allow for uncertainty in the estimation of $\rho$.

Finally, we note that Pfeffermann et al. (1998) only allowed for the sample to be clustered into level 2 units. In the application in Section 14.4 the sampling design will also lead to geographical clustering of the sample individuals into primary sampling units. The procedure for standard error estimation proposed by Pfeffermann et al. (1998) therefore needs to be extended to handle this case. We shall not, however, consider this extension here, presenting only point estimates for the multilevel modelling approach in the next section.

14.4 AN APPLICATION: EARNINGS OF MALE EMPLOYEES IN GREAT BRITAIN

In this section we apply the approaches set out earlier to fit random effects models to longitudinal data on the monthly earnings of male full-time employees in Great Britain for the period 1991-5, using data from the British Household Panel Survey (BHPS). The BHPS is a household panel survey, based on a sample of around individuals. Data were first collected in 1991 and successive waves have taken place annually (Berthoud and Gershuny, 2000).

We base our analysis on the work of Ramos (1999). Like him, we consider only men over the first five waves of the BHPS and divide the men into four age cohorts in order to control for life cycle effects. These cohorts consist of men (i) born before 1941, (ii) born between 1941 and 1950, (iii) born between 1951 and 1960 and (iv) born after 1960. The variable $y$ is taken as the logarithm of earnings, with earnings being defined as the usual monthly earnings or salary payment before tax, for a reference period determined in the survey.

We avoid the problem of zero earnings by defining the target population at wave $t$ to consist of those men in the age cohorts who have positive earnings. It is thus possible for individuals to move in and out of the target population between waves. It is clearly plausible that the earnings behaviour of those moving in and out of the target population will differ systematically from those remaining in the target population. For simplicity, we shall, however, assume that the models defined in Section 14.1 apply to all individuals when they have positive earnings.
The panel sample was selected by stratified multistage sampling, with postal sectors as primary sampling units (PSUs). We use the standard linearisation approach to variance estimation for stratified multistage samples (e.g. SHS, p. 50). The BHPS involves a stratified sample of 250 PSUs. For the purpose of variance estimation, we approximate this stratified design as being defined by 75 strata, obtained by first breaking down each of 18 regional strata into 2 or 3 'major strata', defined according to the proportion of 'heads of households' in professional/managerial positions, and then by breaking down each of these major strata into 2 'minor strata', defined according to the proportion of the population of pensionable age.

We first assess the fit of Models A and B (defined in Section 14.1) for each of the four cohorts. The results are presented in Table 14.1. We use goodness-of-fit tests based on the covariance structure approach of Section 14.2, with three choices of the matrix $V$ in (14.8):

OLS: $V = I$, the identity matrix;
GLS (iid): $V = V_{iid}$, as defined in (14.13);
GLS (complex): $V = V_c$, the linearisation estimator of the covariance matrix of $\hat A$, based upon (14.9), allowing for the complex design.

Table 14.1 Goodness-of-fit test statistics for Models A and B for four cohorts and three estimation methods.

Columns: Cohort (when born); Model A: OLS, GLS (iid), GLS (complex); Model B: OLS, GLS (iid), GLS (complex).

Before ± b 39.0 b 39.9 b 28.4 b 27.0 b 29.5 b 1951± b 43.3 b b 37.4 b 35.5 b

Notes:
1. Test statistics are weighted and are referred to the chi-squared distribution with 13 df for Model A and 12 df for Model B.
2. a: significant at 5% level; b: significant at 1% level.
3. OLS and GLS (iid) test statistics involve Rao-Scott first-order correction.

For $V = V_c$, the test statistic is given by $X_W^2$ in (14.11) with the null distribution indicated in Section 14.2. For $V = I$ or $V_{iid}$, the values of $\hat\theta_{GLS}$ and $V_c$ in (14.11) are replaced by the corresponding values of $\hat\theta$ and $V$ and a first-order Rao-Scott adjustment is applied to the test statistic (SHS, Ch. 4). The same null distributions as for $V_c$ are used. Test statistics based upon second-order Rao-Scott approximations were also calculated and led to similar results.

All of the test statistics are based on data from the attrition sample $s_5$ at wave 5, for individuals who gave full interviews at each of the five waves. Longitudinal weights $w_{i5}$ were used, which allow both for unequal sampling probabilities and for differential attrition from nonresponse over the five waves. To allow for the changing population, the expression for the estimated covariance matrix in (14.7) was modified by including only those who reported positive earnings at each wave in the estimation of the covariance between the log earnings at two waves.
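The degrees of freedom quoted in the notes to Table 14.1 follow from the count $k - q$ of Section 14.2 with $T = 5$ waves. As a worked check:

```latex
k = \frac{T(T+1)}{2} = \frac{5 \times 6}{2} = 15, \qquad
k - q = 15 - 2 = 13 \ \text{(Model A)}, \qquad
k - q = 15 - 3 = 12 \ \text{(Model B)}.
```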
The values of the test statistics in Table 14.1 are referred to a chi-squared null distribution with 13 degrees of freedom in the case of Model A and with 12 degrees of freedom in the case of Model B. The results suggest that Model A provides an adequate fit for the cohort born before 1941 but not for the other cohorts and that Model B provides an adequate fit for all cohorts, except the one consisting of those born between 1941 and 1950.

The values of the test statistics vary according to the three choices of $V$. The differences between the values of the test statistics for the GLS (iid) and GLS (complex) choices of $V$ are not large, reflecting the fact that there is a large number of degrees of freedom for estimating the covariance matrix of $\hat A$ (relative to the dimension of the matrix) and that the pairs of $V$ matrices tend not to be dramatically disproportionate. The value of the test statistic with $V$ as the identity matrix suggests a much better fit of both Models A and B for the 1951-60 cohort and a somewhat better fit for the cohort born after 1960. This may be because this test statistic tends to be sensitive to different deviations from the null hypothesis than the GLS test statistics. The 1951-60 cohort is distinctive in having less variation among the estimated variances of log earnings over the five waves and, more generally, displays the least evidence of nonstationarity. Because of the high positive correlation between the elements of $\hat A$, the test statistic with $V$ as the identity matrix may be expected to attach greater 'weight' to such departures from Model A than the GLS test statistics and this may lead to the noticeable difference in values for the 1951-60 cohort.

Strong graphical evidence against Model A for this cohort is provided by Figure 14.1. This figure plots the elements $\hat S_{tt'}$ of $\hat S$ in (14.7) against $|t - t'|$ and there is a clear tendency for the covariances to decline as the number of years between waves increases. This suggests that the insignificant value of the test statistic for Model A, with $V$ as the identity matrix, reflects lack of power.

Estimates of the parameters in Model B are presented in Table 14.2 for the three cohorts for which Model B shows no significant lack of fit in Table 14.1. Estimates are presented for the same three choices of $V$ matrix as in Table 14.1. While the estimates based on the two GLS choices of $V$ are fairly similar, the OLS estimates, with $V$ as the identity matrix, can be noticeably different, especially for the 1951-60 cohort.
The effect of the differences for the cohort born after 1960 is illustrated in Figure 14.2, in which the estimated variances and covariances from (14.7) are presented together with fitted lines, joining the variances and covariances under Model B, implied by the parameter estimates in Table 14.2. The lines for the GLS choices of $V$ are surprisingly low, unlike the OLS line, which passes through the middle of the points. Similar underfitting of the variances and covariances occurs for the other cohorts and this finding may reflect downward bias in such estimates employing sample-based $V$ matrices, as discussed, for example, by Altonji and Segal (1996) and Browne (1984). The inversion of $V$ implies that the lowest variances tend to receive most 'weight', leading to the fitted line following more the lower envelope of the points than the centre of them. The potential presence of non-negligible bias suggests that choosing $V$ as the identity matrix may be preferable here for the purpose of parameter estimation, as concluded by Altonji and Segal (1996).

Figure 14.1 Estimated variances and covariances for cohort born 1951-60. (Vertical axis: variances and covariances; horizontal axis: years apart.)

Table 14.2 Parameter estimates for Model B for three cohorts using covariance structure approach.

Cohort (when born)   Estimator       $\rho$        $\sigma_u^2$   $\sigma_\varepsilon^2$
Before 1941          OLS             0.37 (0.16)   (0.028)        (0.018)
                     GLS (iid)       0.35 (0.16)   (0.024)        (0.011)
                     GLS (complex)   0.32 (0.13)   (0.022)        (0.009)
1951-60              OLS             0.56 (0.11)   (0.021)        (0.015)
                     GLS (iid)       0.85 (0.09)   (0.047)        (0.047)
                     GLS (complex)   0.85 (0.09)   (0.044)        (0.045)
After 1960           OLS             0.49 (0.08)   (0.018)        (0.014)
                     GLS (iid)       0.41 (0.07)   (0.016)        (0.010)
                     GLS (complex)   0.40 (0.07)   (0.016)        (0.009)

Notes:
1. Standard errors in parentheses.
2. Estimates are weighted and based only on data for attrition sample at wave 5.
3. The 1941-50 cohort is excluded because of lack of fit of Model B in Table 14.1.

Figure 14.2 Estimated variances and covariances for cohort born after 1960 with values fitted under Model B. (Vertical axis: variances and covariances; horizontal axis: years apart; fitted lines: OLS, GLS (iid), GLS (complex).)

Table 14.3 shows for one cohort the effects of weighting, of the use of data from all attrition samples and of the use of the multilevel modelling approach of Section 14.3. For the covariance structure approach, the impact of weighting is similar for all three choices of the matrix $V$. The fairly modest impact of weighting is expected here, since the BHPS weights do not vary greatly and are not strongly related to earnings. The impact of using data from all attrition samples $s_1, \ldots, s_5$, not just from $s_5$, appears to be a little more marked than the impact of weighting. This may reflect the fact that the earnings behaviour of those men who leave the sample before 1995 may be different from those who remain in the sample for all five waves. In particular, this behaviour may be less stable, leading to a reduction in the estimated correlation $\rho$. Control for possible informative attrition might be attempted by including covariates in the model.

Table 14.3 Parameter estimates for Model B for cohort born after 1960.

Columns: Estimator; Parameter ($\rho$, $\sigma_u^2$, $\sigma_\varepsilon^2$).

Covariance structure approach
  Using attrition sample at wave 5 only
    Weighted: OLS; GLS (iid); GLS (complex)
    Unweighted: OLS; GLS (iid); GLS (complex)
  Using all five attrition samples (weighted): OLS; GLS (iid); GLS (complex)
Multilevel modelling approach
  Using attrition sample at wave 5 only: Weighted unscaled; Weighted scaled; Unweighted
  Using all five attrition samples: Weighted unscaled; Weighted scaled; Unweighted

14 remarks 218 RANDOM EFFECTS MODELS FOR LONGITUDINAL SURVEY DATA The results for the multilevel modelling approach in Table 14.3 are based upon the two-step method described in Section The estimated value of r is first determined and then estimates of ^s 2 u and ^s2 e are obtained by the method of Pfeffermann et al. (1998) either with or without weights and, in the former case, the weights may be scaled or not. The impact of weighting on the multilevel approach is again modest, indeed somewhat more modest than for the covariance structure approach. This may be because a common estimate of r is used here. Scaling the weights also has little effect. This may be because all the weights w tji are fairly close to one in this application and thus scaling has less of an impact than in the two-stage sampling application in Pfeffermann et al. (1998). The differences between the estimates from the covariance structure approach and the corresponding multilevel modelling approaches are not especially large in Table 14.3 relative to the standard errors in Table Nevertheless, across all four cohorts and both models, the main differences in the estimates between methods were between the three choices of V matrix for the covariance structure approach and between the covariance structure and the multilevel approaches. The impact of weighting and the scaling of weights tended to be less important CONCLUDING REMARKS concluding It is often useful to include random effects in the specification of models for longitudinal survey data. In this chapter we have considered two approaches to allowing for complex survey designs and sample attrition when fitting such models. The covariance structure approach is particularly natural with survey data. The complex survey design and attrition are allowed for when making inference about the covariance matrix of the longitudinal responses. Modelling of the structure of this matrix may then proceed in a standard way. 
The second approach is to adapt standard multilevel modelling procedures, extending the approach of Pfeffermann et al. (1998). The two approaches may be compared in a number of ways:

- The multilevel approach incorporates the different attrition samples more directly, although the possible creation of bias with unequal $w_{t|i}$ (for given $i$) with small numbers of level 1 units (i.e. small $T$), as discussed by Pfeffermann et al. (1998), may be a problem.
- The multilevel approach incorporates covariates more naturally, although the extension of the covariance structure approach to include covariates using LISREL models is well established.
- The covariance structure approach handles serial correlation more easily.
- The covariance structure approach generates goodness-of-fit tests and residuals at the level of variances and covariances. The multilevel approach generates unit level residuals.
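The scaling of the level 1 weights $w_{t|i}$ referred to above can be illustrated with a small sketch. It assumes one common scaling, in which each individual's wave-level weights are rescaled to sum to the number of waves observed for that individual. The text does not confirm that this is the scaling used in the application. The function name `scale_weights` and the example weights are hypothetical.

```python
def scale_weights(weights):
    """Rescale one individual's wave-level weights so that they sum to the
    number of waves observed (equivalently, so that they average to one).

    Assumption: this is only one possible scaling; the chapter does not
    state which scaling was applied to the BHPS weights.
    """
    n_waves = len(weights)
    total = sum(weights)
    if n_waves == 0 or total <= 0:
        raise ValueError("need at least one wave and a positive weight total")
    return [w * n_waves / total for w in weights]


# Hypothetical weights for one individual observed at three waves.
raw = [0.9, 1.1, 1.3]
scaled = scale_weights(raw)
print(scaled)
```

When the raw weights are already close to one, as reported for this application, the scaled weights differ little from the raw ones, which is consistent with the small effect of scaling noted in the text.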

Finally, our application of the covariance structure approach to the BHPS data showed evidence of bias in the estimation of the variance components when using GLS with a covariance matrix $V$ estimated from the data. This accords with the findings of Altonji and Segal (1996). This evidence suggests that it is safer to specify $V$ as the identity matrix and use Rao-Scott adjustments for testing.
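As a rough illustration of the covariance structure approach with $V$ set to the identity matrix, the sketch below fits the simple variance components model (14.1), not Model B, to a given covariance matrix by unweighted least squares. Under (14.1) the diagonal elements of the covariance matrix equal $\sigma_1^2 + \sigma_2^2$ and the off-diagonal elements equal $\sigma_1^2$. The function name and the input matrix are hypothetical; the numbers are made up for illustration and are not BHPS estimates. In practice the survey design and attrition would enter through the way the input matrix is estimated, which is not shown here.

```python
def fit_variance_components_ols(S):
    """Fit Sigma = sigma1_sq * J + sigma2_sq * I to the matrix S by
    unweighted least squares over its distinct elements (V = identity).

    S is a T x T symmetric covariance matrix given as a list of lists.
    Returns (sigma1_sq, sigma2_sq): the 'permanent' and 'transitory' variances.
    """
    T = len(S)
    if T < 2:
        raise ValueError("at least two waves are needed to separate the components")

    # One design row per distinct element (t <= t2):
    # the first column multiplies sigma1_sq, the second multiplies sigma2_sq.
    rows, targets = [], []
    for t in range(T):
        for t2 in range(t, T):
            rows.append((1.0, 1.0 if t == t2 else 0.0))
            targets.append(S[t][t2])

    # Normal equations X'X b = X'y for the two parameters, solved directly.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * y for r, y in zip(rows, targets))
    b2 = sum(r[1] * y for r, y in zip(rows, targets))
    det = a11 * a22 - a12 * a12
    sigma1_sq = (b1 * a22 - a12 * b2) / det
    sigma2_sq = (a11 * b2 - a12 * b1) / det
    return sigma1_sq, sigma2_sq


# Hypothetical covariance matrix for three waves (illustrative values only).
S = [
    [1.0, 0.6, 0.5],
    [0.6, 1.1, 0.7],
    [0.5, 0.7, 0.9],
]
sigma1_sq, sigma2_sq = fit_variance_components_ols(S)
print(sigma1_sq, sigma2_sq)  # roughly 0.6 and 0.4
```

For this model the least squares solution reduces to the mean of the off-diagonal elements for $\sigma_1^2$ and the mean of the diagonal elements minus that value for $\sigma_2^2$. A GLS fit would replace the unweighted sums with sums weighted by the inverse of an estimated $V$, which is the step the text reports as a source of bias.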

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Comparison of Estimation Methods for Complex Survey Data Analysis

Comparison of Estimation Methods for Complex Survey Data Analysis Comparison of Estimation Methods for Complex Survey Data Analysis Tihomir Asparouhov 1 Muthen & Muthen Bengt Muthen 2 UCLA 1 Tihomir Asparouhov, Muthen & Muthen, 3463 Stoner Ave. Los Angeles, CA 90066.

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Chapter 11 Introduction to Survey Sampling and Analysis Procedures

Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Visualization of Complex Survey Data: Regression Diagnostics

Visualization of Complex Survey Data: Regression Diagnostics Visualization of Complex Survey Data: Regression Diagnostics Susan Hinkins 1, Edward Mulrow, Fritz Scheuren 3 1 NORC at the University of Chicago, 11 South 5th Ave, Bozeman MT 59715 NORC at the University

More information

Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics

Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics John Pepper Assistant Professor Department of Economics University of Virginia 114 Rouss

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Examining the effects of exchange rates on Australian domestic tourism demand: A panel generalized least squares approach

Examining the effects of exchange rates on Australian domestic tourism demand: A panel generalized least squares approach 19th International Congress on Modelling and Simulation, Perth, Australia, 12 16 December 2011 http://mssanz.org.au/modsim2011 Examining the effects of exchange rates on Australian domestic tourism demand:

More information

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS Jeffrey M. Wooldridge Department of Economics Michigan State University East Lansing, MI 48824-1038

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

The Basic Two-Level Regression Model

The Basic Two-Level Regression Model 2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

More information

HLM software has been one of the leading statistical packages for hierarchical

HLM software has been one of the leading statistical packages for hierarchical Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard

More information

Chapter 19 Statistical analysis of survey data. Abstract

Chapter 19 Statistical analysis of survey data. Abstract Chapter 9 Statistical analysis of survey data James R. Chromy Research Triangle Institute Research Triangle Park, North Carolina, USA Savitri Abeyasekera The University of Reading Reading, UK Abstract

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract

More information

Example G Cost of construction of nuclear power plants

Example G Cost of construction of nuclear power plants 1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

More information

1 Short Introduction to Time Series

1 Short Introduction to Time Series ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The

More information

Multilevel modelling of complex survey data

Multilevel modelling of complex survey data J. R. Statist. Soc. A (2006) 169, Part 4, pp. 805 827 Multilevel modelling of complex survey data Sophia Rabe-Hesketh University of California, Berkeley, USA, and Institute of Education, London, UK and

More information

Centre for Central Banking Studies

Centre for Central Banking Studies Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics

More information

Multilevel Modeling of Complex Survey Data

Multilevel Modeling of Complex Survey Data Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics

More information

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions

http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp.

More information

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College. The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

More information

Panel Data Econometrics

Panel Data Econometrics Panel Data Econometrics Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans University of Orléans January 2010 De nition A longitudinal, or panel, data set is

More information

Univariate and Multivariate Methods PEARSON. Addison Wesley

Univariate and Multivariate Methods PEARSON. Addison Wesley Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

More information

Clarifying Some Issues in the Regression Analysis of Survey Data

Clarifying Some Issues in the Regression Analysis of Survey Data Survey Research Methods (2007) http://w4.ub.uni-konstanz.de/srm Vol. 1, No. 1, pp. 11-18 c European Survey Research Association Clarifying Some Issues in the Regression Analysis of Survey Data Phillip

More information

Advanced Forecasting Techniques and Models: ARIMA

Advanced Forecasting Techniques and Models: ARIMA Advanced Forecasting Techniques and Models: ARIMA Short Examples Series using Risk Simulator For more information please visit: www.realoptionsvaluation.com or contact us at: admin@realoptionsvaluation.com

More information

Approaches for Analyzing Survey Data: a Discussion

Approaches for Analyzing Survey Data: a Discussion Approaches for Analyzing Survey Data: a Discussion David Binder 1, Georgia Roberts 1 Statistics Canada 1 Abstract In recent years, an increasing number of researchers have been able to access survey microdata

More information

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

More information

Network quality control

Network quality control Network quality control Network quality control P.J.G. Teunissen Delft Institute of Earth Observation and Space systems (DEOS) Delft University of Technology VSSD iv Series on Mathematical Geodesy and

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information

Time Series Analysis

Time Series Analysis Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

ADVANCED FORECASTING MODELS USING SAS SOFTWARE

ADVANCED FORECASTING MODELS USING SAS SOFTWARE ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 gjha_eco@iari.res.in 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting

More information

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models: Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable

More information

1 Introduction to Matrices

1 Introduction to Matrices 1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

More information

ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS Jeffrey M. Wooldridge THE INSTITUTE FOR FISCAL STUDIES DEPARTMENT OF ECONOMICS, UCL cemmap working

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Technical report Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure Table of contents Introduction................................................................ 1 Data preparation

More information

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA Paper P-702 Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Individual growth models are designed for exploring longitudinal data on individuals

More information

Regression analysis of probability-linked data

Regression analysis of probability-linked data Regression analysis of probability-linked data Ray Chambers University of Wollongong James Chipperfield Australian Bureau of Statistics Walter Davis Statistics New Zealand 1 Overview 1. Probability linkage

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Time Series Analysis

Time Series Analysis JUNE 2012 Time Series Analysis CONTENT A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken at regular intervals (days, months, years),

More information

Time Series Analysis

Time Series Analysis Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

More information

Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

More information

State Space Time Series Analysis

State Space Time Series Analysis State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

More information

Introduction to time series analysis

Introduction to time series analysis Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

More information

Detekce změn v autoregresních posloupnostech

Detekce změn v autoregresních posloupnostech Nové Hrady 2012 Outline 1 Introduction 2 3 4 Change point problem (retrospective) The data Y 1,..., Y n follow a statistical model, which may change once or several times during the observation period

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Clustering in the Linear Model

Clustering in the Linear Model Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

More information

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

More information

Forecasting in supply chains

Forecasting in supply chains 1 Forecasting in supply chains Role of demand forecasting Effective transportation system or supply chain design is predicated on the availability of accurate inputs to the modeling process. One of the

More information

4. Work and retirement

4. Work and retirement 4. Work and retirement James Banks Institute for Fiscal Studies and University College London María Casanova Institute for Fiscal Studies and University College London Amongst other things, the analysis

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Longitudinal Meta-analysis

Longitudinal Meta-analysis Quality & Quantity 38: 381 389, 2004. 2004 Kluwer Academic Publishers. Printed in the Netherlands. 381 Longitudinal Meta-analysis CORA J. M. MAAS, JOOP J. HOX and GERTY J. L. M. LENSVELT-MULDERS Department

More information

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

More information

Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market

Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market Technical Efficiency Accounting for Environmental Influence in the Japanese Gas Market Sumiko Asai Otsuma Women s University 2-7-1, Karakida, Tama City, Tokyo, 26-854, Japan asai@otsuma.ac.jp Abstract:

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Comparison of Imputation Methods in the Survey of Income and Program Participation

Comparison of Imputation Methods in the Survey of Income and Program Participation Comparison of Imputation Methods in the Survey of Income and Program Participation Sarah McMillan U.S. Census Bureau, 4600 Silver Hill Rd, Washington, DC 20233 Any views expressed are those of the author

More information

3.1 Least squares in matrix form

3.1 Least squares in matrix form 118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression

More information

171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion 171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

More information

Poor identification and estimation problems in panel data models with random effects and autocorrelated errors

Poor identification and estimation problems in panel data models with random effects and autocorrelated errors Poor identification and estimation problems in panel data models with random effects and autocorrelated errors Giorgio Calzolari Laura Magazzini January 7, 009 Submitted for presentation at the 15th Conference

Statistical Machine Learning

UoC Stats 37700, Winter quarter. Lecture 4: classical linear and quadratic discriminants. Linear separation. For two classes in R^d, simple idea: separate the classes

Evaluating one-way and two-way cluster-robust covariance matrix estimates

Christopher F Baum (Boston College and DIW Berlin), Austin Nichols (Urban Institute), Mark E Schaffer (Heriot Watt University)

Joint models for classification and comparison of mortality in different countries.

Viani D. Biatat and Iain D. Currie, Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

Online Appendices to the Corporate Propensity to Save

Appendix A: Monte Carlo Experiments. In order to allay skepticism of empirical results that have been produced by unusual estimators on fairly small

TIME SERIES ANALYSIS

L.M. Bhar and V.K. Sharma, Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi-0 02. lmb@iasri.res.in. Introduction: Time series (TS) data refers to observations

Power and sample size in multilevel modeling

Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570-1573. Chichester (etc.): Wiley,

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Road map for today. To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation; 2. Issues that could

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

Notes for Summer Statistics Institute Course. Contents of Day 2. I. More Precise Definition of Simple Random Sample: connection with independent random variables; problems with small populations. II. Why Random Sampling is Important: a myth,

Longitudinal and Panel Data

Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Edward W. Frees, University of Wisconsin-Madison. Published by the Press Syndicate of the University of Cambridge, The Pitt Building,
