An extension of the factoring likelihood approach for nonmonotone missing data


 Lambert Cummings
 2 years ago
 Views:
Transcription
1 An extension of the factoring likelihood approach for nonmonotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions under ignorable nonmonotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin 1974), is extended to a more general case of nonmonotone missing data. The proposed method is equivalent to the NewtonRaphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is presented to illustrate the method. A MonteCarlo experiment compares the proposed method with the EM method. KEY WORDS. EM algorithm, GaussNewton method, Generalized least squares, Maximum likelihood estimator, Missing at random. Department of Statistics, Iowa State University, Ames, IA, 50014, U.S.A., phone: , fax: Department of Statistics, Ewha University, Seoul, , Korea, phone: , fax:
2 1 Introduction Missing data is quite common in practice. Statistical analysis of data with missing values is an important practical problem because missing data is oftentimes nonignorable. When we simply ignore missing data, the resulting estimates will have nonresponse bias if the responding part of the sample is systematically different than the nonresponding part of the sample. Also, we may lose some information observed in the partially missing data. Little and Rubin 2002) and Molenberghs and Kenward 2007) provide comprehensive overviews of the missing data problem. We consider statistical inference of data with missing values using the maximum likelihood method. Specifically, we propose a computational tool for obtaining the maximum likelihood estimator MLE) under multivariate missing data. To explain the basic idea, we use an example of bivariate normal data. Later in Section 2.3, an extension to general multivariate data is proposed. Let y i = y 1i, y 2i ) be a bivariate normal random variables distributed as y1i y 2i ) [ iid µ1 N µ 2 ) σ11 σ, 12 σ 12 σ 22 )], 1) where iid is the abbreviation of independently and identically distributed. Note that five parameters, µ 1, µ 2, σ 11, σ 12, and σ 22, are needed to identify the bivariate normal distribution. We assume that the observations are missing at random MAR) as defined in Rubin 1976) so that the relevant likelihood is the observed likelihood, or the marginal likelihood of the observed data. Under MAR, we can ignore the response mechanism when estimating the population parameters. 1
3 If the missing data pattern is monotone in the sense that the set of respondents for one variable is a subset of the respondent set of the other variable, the observed likelihood can be factored into the marginal likelihood for one variable and the conditional likelihood for the second variable conditional on the first so that the maximum likelihood estimates can be estimated separately for each likelihood. For example, assume that y 1 is fully observed with n observations and y 2 is observed with r observations. Anderson 1957) first considered maximum likelihood parameter estimation under this setup by using an alternative representation of the bivariate normal distribution as y 1i y 2i y 1i iid N µ 1, σ 11 ) 2) iid N β β 21 1 y 1i, σ 22 1 ), where β 20 1 = µ 2 β 21 1 µ 1, β 21 1 = σ 1 11 σ 12 and σ 22 1 = σ 22 β σ 11. The observed likelihood is then written as a product of marginal likelihood of a fully observed variable y 1 and the conditional likelihood of y 2 given y 1. Thus, the parameters µ 1 and σ 11 for the marginal distribution of y 1 can be estimated with n observations and the other regression parameters, β 20 1, β 21 1, and σ 22 1, can be estimated from the conditional distribution with r observations. The factoring likelihood FL) method, termed by Rubin 1974), expresses the observed likelihood as a product of the marginal likelihood and the conditional likelihood so that the maximum likelihood estimates can be obtained separately at each likelihood. Note that the FL approach consists of two steps. In the first step, the likelihood is factored, and in the second step the MLE for each likelihood is computed separately. In many cases, the MLE s are easily computed in the FL approach because the marginal and the condi 2
4 tional likelihoods are known so that we can directly use the known solutions of the likelihood equations for each likelihood. For the monotone missing data, the MLE s for the conditional distribution are independent of those for the marginal distribution. This is because the two sets of parameters  the parameters for the marginal likelihood and those for the conditional likelihood  are orthogonal Cox and Reid, 1997) and as a result the MLE s for the conditional likelihood are not affected by the MLE s for the marginal likelihood. Rubin 1974) recommended the FL approach as a general framework in the analysis of missing data with a monotone missing pattern. The main advantage of the FL is its computational simplicity. Under nonmonotone missing data patterns, however, the FL approach is not directly applicable. The EM algorithm, proposed by Dempster et al. 1977), successfully provides MLE s under a general missing pattern. Using the EM algorithm also avoids the calculation of the observed likelihood function and uses only the complete likelihood function. Despite its popularity, there are several shortcomings of using the EM algorithm. First, the computation is performed iteratively and the convergence is notoriously slow Liu and Rubin, 1994). Second, the covariance matrix of the estimated parameters is not provided directly Louis, 1982, and Meng and Rubin, 1991). The focus of this paper is to propose an alternative method that will resolve these two issues at the same time. In this paper, we consider an extension of the FL method to the nonmonotone missing data. To apply the FL method to nonmonotone missing, in addition to the two steps in the original FL approach, we need another step that combines the separate MLE s computed for each likelihood to produce 3
5 the final MLE s. The proposed method turns out to be essentially the same as the direct maximum likelihood method using the NewtonRaphson algorithm which converges much faster than the EM algorithm. Furthermore, the proposed method provides the asymptotic variancecovariance matrix of the MLE s directly as a byproduct of the computation. Using the variancecovariance expression, the asymptotic variances are compared with other estimators obtained by ignoring some part of partially observed data. A related work is Chen et al. 2008), who compared variances of estimators for regression models with missing responses and covariates. The proposed method is an extension of the preliminary work of Kim 2004) who considered the case of a bivariate missing data. In Section 2, some of the result of Kim 2004) is reviewed and extended to more general class of multivariate missing data. Efficiency comparisons based on the asymptotic variancecovariance matrix obtained from the proposed method are discussed in Section 3. The proposed method is applied to a categorical data example in Section 4. Results from a limited simulation study are presented in Section 5. Concluding remarks are made in Section 6. 2 Proposed method The proposed method can be described in the following three steps: [Step 1] Partition the original sample into several disjoint sets according to the missing pattern. [Step 2] Compute the MLE s for the identified parameters separately in each partition of the sample. 4
6 [Step 3] Combine the estimators to get a set of final estimates using a generalized least squares GLS) form. Kim 2004) discuss the procedures in detail for the bivariate case. We review the result of Kim 2004) for the bivariate normal case in Section 2.1. In Section 2.2, we consider a general class of bivariate distributions. In Section 2.3, the proposed method is extended to multivariate distributions. 2.1 Bivariate normal case To simplify the presentation, we describe the proposed method in the bivariate normal setup with nonmonotone missing pattern. The joint distribution of y = y 1, y 2 ) is parameterized by the five parameters using model 1) or 2). For the convenience of the factoring method described in Section 1, we use the parametrization in 2) and let θ = β 20 1, β 21 1, σ 22 1, µ 1, σ 11 ). In Step 1, we partition the sample into several disjoint sets according to the pattern of missingness. In the case of a nonmonotone missing pattern with two variables, we have 3 = types of respondents that contain information about the parameters. The first set H has both y 1 and y 2 observed, the second set K has y 1 observed but y 2 missing, and the third set L has y 2 observed but y 1 missing. See Table 1. Let n H, n K, n L be the sample sizes of the set H, K, L, respectively. The case of both y 1 and y 2 missing can be safely removed from the sample. In Step 2, we obtain the parameter estimators in each set: For set H, we have the five parameters η H = β 20.1, β 21.1, σ 22.1, µ 1, σ 11 ) of the conditional distribution of y 2 given y 1 and the marginal distribution of y 1, with MLE s ˆη H = ˆβ 20 1,H, ˆβ 21 1,H, ˆσ 22 1,H, ˆµ 1,H, ˆσ 11,H ). For set K, the MLE s 5
7 η θ) = β 20 1, β 21 1, σ 22 1, µ 1, σ 11, µ 1, σ 11, β β 21 1 µ 1, σ β σ 11 ) 4) Table 1. An illustration of the missing data structure under bivariate normal distribution Set y 1 y 2 Sample Size Estimable parameters H Observed Observed n H µ 1, µ 2, σ 11, σ 12, σ 22 K Observed Missing n K µ 1, σ 11 L Missing Observed n L µ 2, σ 22 ˆη K = ˆµ 1,K, ˆσ 11,K ) are obtained for η K = µ 1, σ 11 ), the parameters of the marginal distribution of y 1. For set L, the MLE s ˆη L = ˆµ 2,L, ˆσ 22,L ) are obtained for η L = µ 2, σ 22 ), where µ 2 = β β 21 1 µ 1 and σ 22 = σ β21 1σ In Step 3, we use the GLS method to combine the three estimators ˆη H, ˆη K, ˆη L to get a final estimator for the parameter θ. Let ˆη = ˆη H, ˆη K, ˆη L). Then ˆη = ˆβ20 1,H, ˆβ ) 21 1,H, ˆσ 22 1,H, ˆµ 1,H, ˆσ 11,H, ˆµ 1,K, ˆσ 11,K, ˆµ 2,L, ˆσ 22,L. 3) The expected value of this estimator is and the asymptotic covariance matrix is { Σ22.1 V = diag, 2σ2 22 1, σ 11, 2σ2 11, σ 11, 2σ2 11, σ } 22, 2σ2 22, 5) n H n H n H n H n K n K n L n L where Σ 22.1 = ) σ σ 1 11 µ 2 1 σ 1 11 σ 22 1 µ 1 σ11 1 σ 22 1 µ 1 σ11 1 σ 22 1 Note that Σ 22.1 = {E[1, y 1 )1, y 1 ) ]} 1 σ 22.1 = [ 1 µ 1 µ 1 σ 11 +µ 2 1] 1 σ Derivation for the asymptotic covariance matrix of the first five estimates in 3) is straightforward and can be found, for example, in Subsection of Little 6 ).
8 and Rubin 2002). We have a blockdiagonal structure of V in 5) because ˆµ 1K and ˆσ 11K are independent due to normality and observations between different sets are independent due to the iid assumptions. Note that the nine elements in ηθ) are related to each other because they are all functions of the five elements of vector θ. The information contained in the extra four equations has not yet been utilized in constructing estimators ˆη H, ˆη K, ˆη L. The information can be employed to construct a fully efficient estimator of θ by combining ˆη H, ˆη K, ˆη L through a GLS generalized least squares) regression of ˆη = ˆη H, ˆη K, ˆη L) on θ as follows: ˆη ηˆθ S ) = η/ θ )θ ˆθ S ) + error, where ˆθ S is an initial estimator. The expected value and variance of ˆη in 4) and 5) can be viewed as a nonlinear model of the five parameters in θ. Using a Taylor series expansion on the nonlinear model, a step of the GaussNewton method can be formulated as e η = X θ ˆθ ) S + u, 6) ) ) where e η = ˆη η ˆθS, η ˆθS is the vector 4) evaluated at ˆθ S, X = µ 1 2β 21 1 σ β β21 1 2, 7) and, approximately, u 0, V), 7
9 Table 2. Summary for bivariate normal case. y 1 y 2 Data Set Size Estimable parameters Asymptotic variance O M K n K η K = θ 1 W K = diagσ 11, 2σ11 2 ) O O H n H η H = θ 1, θ 2 ) W H = diagw K, Σ 22.1, 2σ ) M O L n L η L = µ 2, σ 22 ) W L = diagσ 22, 2σ22 2 ) O: observed, M: missing, θ 1 = µ 1, σ 11 ), θ 2 = β 20 1, β 21 1, σ 22 1 ), η = η H, η K, η L ), θ = η H, V = diagw H /n H, W K /n K, W L /n L ), Σ 22.1 = {E[1, y 1 )1, y 1 ) ]} 1 σ 22.1, X = η/ θ, µ 2 = β β 21.1 µ 1, σ 22 = σ β σ 11 and V is the covariance matrix defined in 5). The GaussNewton method for the estimation of nonlinear models can be found in Seber and Wild 1989). Relations among parameters η, θ, X, and V are summarized in Table 2. A simple initial estimator is the weighted average of available estimators from data sets, defined as ˆθ S = ˆβ 20 1,H, ˆβ 21 1,H, ˆσ 11 2,H, ˆµ 1,HK, ˆσ 11,HK ), 8) where ˆµ 1,HK = 1 p K ) ˆµ 1,H + p K ˆµ 1,K, ˆσ 11,HK = 1 p K ) ˆσ 11,H + p K ˆσ 11,K, and p K = n K / n H + n K ). This initial value is a nconsistent estimate of θ and guarantees the consistency of the onestep estimators. The procedure can be carried out iteratively until convergence. Given the current value ˆθ t), the solution of the GaussNewton method can be obtained iteratively as ˆθ t+1) = ˆθ t) + ) 1 X 1 t) ˆV t) X t) X t) { 1 ˆθt) ˆV )} t) ˆη η, 9) where X t) and ˆV t) are evaluated from X in 7) and V in 5), respectively, using the current value ˆθ t). The covariance matrix of the estimator in 9) can be estimated by C = when the iteration is stopped at the tth iteration. 1 X 1 t) ˆV t) t)) X, 10) 8
10 Remark 1 Bivariate normal monotone case. In the case of monotone missing which consists of H and K, the iteration 9) produces the estimator obtained by the factoring likelihood, given in Little and Rubin 2002). order to see this, note that 9) reduces to ˆθ t+1) = ˆθ t) + where and ) 1 { X 1 HK ˆV HKt) X HK X 1 ˆθt) HK ˆV )} HKt) ˆη HK η, 11) X HK = ˆV HKt) = diag { ˆΣ22.1t) n H , 2ˆσ2 22 1t) n H, ˆσ 11t) n H, 2ˆσ2 11t) n H,, ˆσ 11t), 2ˆσ2 11t) n K n K ˆη HK = ˆβ20 1,H, ˆβ 21 1,H, ˆσ 22 1,H, ˆµ 1,H, ˆσ 11,H, ˆµ 1K, ˆσ 11K ). Starting with an initial estimator ˆθ 0) = ˆβ20 1,H, ˆβ 21 1,H, ˆσ 22 1,H, ˆµ 1,H, ˆσ 11,H ), the estimator constructed from data set H, one step application of 9) leads to the onestep estimate ˆθ 1) = ˆβ20 1,H, ˆβ 21 1,H, ˆσ 22 1,H, ˆµ 1,HK, ˆσ 11,HK ) 12) }, In where ˆµ 1,HK and ˆσ 11,HK are defined after 8). This is the same as the MLE of Anderson 1957) from the original factoring likelihood method. When only y 2 is subject to missingness, the estimated regression coefficient for the regression of y 2 on y 1 using the set H only is fully efficient, but the estimated regression coefficient for the regression of y 1 on y 2 based on the set H only is not fully efficient. 9
11 2.2 General bivariate case We now consider the general bivariate case when the joint distribution is not necessarily normal. Assume that the joint distribution of y = y 1, y 2 ) is parameterized by θ. For set H, let η H = η H θ) be a parametrization for the joint distribution of y 1, y 2 ) such that the information matrix, say I H η H ), is easy to compute. One such parametrization is η H = η H1, η H2 ), where η H1 is the parameter vector for the conditional distribution of y 1 given y 2 and η H2 is the parameter vector for the marginal distribution of y 2. Since the parameters for the conditional distribution are orthogonal to those for the marginal distribution, the parametrization η H = η H1, η H2 ) results in a block diagonal I H η H ). For set K, let η K = η K θ) be a parametrization for the marginal distribution of y 1 such that the information matrix, say I K η K ), is easy to compute. Define η L similarly. The parametrization for the set H need not be the same as that for set K or for set L, which provides more flexibility in choosing the parametrization. Separate orthogonal parametrization in each set will lead to computational advantages over the direct maximum likelihood method. Let ˆη H be the MLE of η H constructed using sample set H. Define ˆη K and ˆη L similarly. Let V H, V K, and V L be the estimated covariance matrices of the MLE s ˆη H, ˆη K, and ˆη L, respectively, and let V = diagv H, V K, V L ). Note that V 1 = diag {I H η H ), I K η K ), I L η L )}. Because V is a function of θ, we write V = V θ). Also, define [ η X θ) = H θ, η K θ, η L θ and η θ) = η H θ), η K θ), η L θ) ). Note that ˆη = ˆη H, ˆη K, ˆη L, ) is ] 10
12 different from ηˆθ) in general. Using the above notation, the maximum likelihood estimator can be computed iteratively from 9) with X t) = Xˆθ t) ) and ˆV t) = Vˆθ t) ). We now show that the procedure in 9) produces a fully efficient estimator of θ in that it is equivalent to the NewtonRaphson procedure for ML estimation based on the observed likelihood. Let the score function of the observed likelihood be defined as S obs y; θ) = log l obs θ) / θ, where l obs θ) = i f y i ; θ) dy imis), with y imis) defined to be the missing part of y i, is the observed likelihood function of parameter θ. The NewtonRaphson method for maximum likelihood estimation can be defined as ˆθ t+1) = ˆθ t) + [I obs ˆθt) )] 1 Sobs y; ˆθ t)), 13) where I obs θ) = E [ 2 log l obs θ) / θ θ ] is the expected information matrix for θ. We show in Theorem 1 below, that the iterations 9) and 13) are identical in that, starting from ˆθ t), the two iterations produce identical values for ˆθ t+1) for all t. Therefore, our procedure gives us a fully efficient estimator of θ. Equivalence between the GaussNewton estimator and the maximum likelihood estimator will be established in a more general multivariate situation in Section 2.3. Note that evaluation of the likelihood l obs θ) and hence direct computation of the MLE through the scoring method 13) is not trivial 11
13 due to the complexities in evaluating the integral in the observed likelihood. On the other hand, our procedure is easy to implement because it involves evaluation of likelihoods corresponding to only observed parts. The following theorem establishes equivalence between the GaussNewton estimator in 9) and the maximum likelihood estimator in 13). Theorem 1 The GaussNewton estimator 9) is equivalent to the maximum likelihood estimator 13) in that, starting from ˆθ t), 9) and 13) give the same value for ˆθ t+1). Proof. See Appendix A. Note that because of the nature of the NewtonRaphson algorithm, the iteration 9) converges much faster than the usual EMalgorithm. Moreover, our procedure directly produces a simple estimator C of the variance of the MLE ˆθ, while the EMalgorithm does not give a direct estimate of the variance of ˆθ. Remark 2  Onestep estimator. Given a suitable choice of the initial estimate ˆθ S, the onestep estimator ˆθ = ˆθ S + X S ) 1 1 ˆV S X S X 1 S ˆV S e η, 14) can be a very good approximation to the maximum likelihood estimator, where X S and ˆV S are evaluated from X and V, respectively, using the initial estimator ˆθ S. The onestep NewtonRaphson estimator 13) using nconsistent initial estimates is asymptotically equivalent to the MLE Lehmann, 1983, Theorem 3.1). By Theorem 1, the onestep GaussNewton estimator 14) is also asymptotically equivalent to the MLE. 12
14 2.3 General multivariate case One advantage of the GLS GaussNewton procedure 9) is that it can easily extend to a general multivariate case having p variables y = y 1,..., y p ). Any general missing data set can be partitioned into say H 1,..., H q, mutually disjoint and exhaustive data sets such that, for each data set H j, j = 1,..., q, all the element share the same missing pattern. Therefore, each set H j can be considered to be a complete data set if only all the observed variables are considered. Let θ be a parameter vector for which the joint distribution of y is fully indexed and is identified. We choose a parameter vector η j for the joint distribution of the observed variables corresponding to H j such that the joint distribution is identified and the information matrix, say I j θ), is easy to compute. Let ˆη j be the MLE of η j computed from the data set H j, which can be easily computed because H j is complete. We have V j = varˆη j ) = I 1 j θ). Let η = η 1,..., η q) and let ˆη = ˆη 1,..., ˆη q). Then V = varˆη) = diagi 1 1 θ),..., I 1 q θ)). Letting X = η/ θ, with an initial consistent estimator ˆθ S, the following iteration ˆθ = ˆθ S + X V 1 X ) 1 X V 1 ˆη ηˆθ S )), 15) defines a one step GaussNewton procedure for ML estimation. The following theorem establishes equivalence of the proposed estimator and the MLE. The proof is a straightforward extension of that of Theorem 1 and thus is skipped for brevity. Theorem 2 The onestep estimator 15) is equivalent to the maximum likelihood estimator. 13
15 In order to implement our procedure 15), we need to specify θ, η, X, V) as well as an initial estimator ˆθ S. Specification of θ, η, X, V) depends on data distribution and missing type. In the following remarks, explicit expressions for θ, η, X, V) are given for some important cases. These remarks demonstrate that our procedure 15) can be easily implemented to multivariate normal cases and hence multiple regressions with any missing type. Remark 33variate general nonmonotone missing case. We give a detailed implementation of the GaussNewton procedure 15) for 3 variate case. Assume that y = y 1, y 2, y 3 ) is jointly normal N 3 µ, Σ), µ = µ 1, µ 2, µ 3 ), Σ = σ ij ). All possible 7 = missing cases are displayed in Table 3. Expressions for θ, η, X and V are given in Table 3. Parameters θ 1, θ 2, θ 3 correspond to the parameters of the distributions of y 1, the conditional distribution of y 2 given y 1, and the conditional distribution of y 3 given y 1, y 2 ), respectively. The parameter θ = θ 1, θ 2, θ 3) fully parameterizes the joint distribution of y 1, y 2, y 3 ). The conditional parameters can be written in the following regression equations y 1 = µ 1 + e 1, e 1 N0, σ 11 ), y 2 = β β 21.1 y 1 + e 2.1, e 2.1 N0, σ 22.1 ), y 3 = β β y 1 + β y 2 + e 3.12, e 3.12 N0, σ ), y 3 = β β 31.1 y 1 + e 3.1, e 3.1 N0, σ 33.1 ), y 3 = β β 31.2 y 2 + e 3.2, e 3.2 N0, σ 33.2 ), in which the regression errors are independent of the regressors. Construction of estimator ˆη j from data set H j is obvious from defini 14
16 tion of parameter η j. For example, ˆη 1 = ˆµ 1,1, ˆσ 11,1 ) is estimated from H 1 ; ˆη 2 = ˆθ 1,2, ˆθ 2,2) is estimated from data set H 2 where ˆθ 1,2 = ˆµ 1,2, ˆσ 11,2 ) is constructed from variable y 1 and ˆθ 2,2 = ˆβ 20.1,2, ˆβ 21.1,2, ˆσ 22.1,2 ) is constructed from the regression of y 2 on y 1 ; ˆη 7 = ˆµ 2,7, ˆσ 22,7, ˆβ 30.2,7, ˆβ 31.2,7, ˆσ 33.2,7 ) is estimated from H 7 where ˆµ 2,7, ˆσ 22,7 ) is constructed from variable y 2 and ˆβ 30.2,7, ˆβ 31.2,7, ˆσ 33.2,7 ) is constructed from regression of y 3 on y 2. We then have ˆη = ˆη 7,..., ˆη 1). A simple initial estimator ˆθ S = ˆθ 1S, ˆθ 2S, ˆθ 3S) can be constructed by averaging available estimators from data sets H 1,..., H 7 as given by ˆθ 1S = n 1ˆθ1,1 + n 2ˆθ1,2 + n 3ˆθ1,3 + n 6ˆθ1,6 )/n 1 + n 2 + n 3 + n 6 ), ˆθ 2S = n 2ˆθ2,2 + n 3ˆθ2,3 )/n 2 + n 3 ), ˆθ 3S = ˆθ 3,3. For evaluation of η and X = η/ θ, we need expressions for η with respect to θ and their derivatives. This issue is addressed in Remark 4 below. We now have all the materials for implementing 15). Observe that some elements of η 4,..., η 7 are nonlinear functions of θ. Therefore, the X matrix has elements other than 0 or 1, as occurred in the last two columns of X in 7). For monotone missing pattern, sets H 4 H 7 are empty and the X matrix consists of elements of 0 or 1, as in Remark 1. Remark 4  Evaluation of η and X. Consider the general p dimensional normal case N p µ, Σ), µ = µ 1,..., µ p ), Σ = σ ij ). As shown in Remark 3, elements of η take one of the following three forms: {θ j, j = 1,,., p}, {µ, Σ}, or { parameters, say β j0.j, β jj.j, σ jj.j) of the regression of y j on a vector, say y J, a subvector of y 1, y 2,..., y j 1 ) such that y j = β j0.j + β jj y J + e j.j, j = 2,..., p }. In order to compute η and X = η/ θ, we need expressions 15
17 Table 3. 3dimensional normal case: nonmonotone missing. y 1 y 2 y 3 Data Set Size Estimable parameters Asymptotic variance O M M H 1 n 1 η 1 = θ 1 W 1 = diagσ 11, 2σ11 2 ) O O M H 2 n 2 η 2 = θ 1, θ 2 ) W 2 = diagw 1, Σ 22.1, 2σ ) O O O H 3 n 3 η 3 = θ 1, θ 2, θ 3 ) W 3 = diagw 2, Σ 33.12, 2σ ) M M O H 4 n 4 η 4 = µ 3, σ 33 ) W 4 = diagσ 33, 2σ33 2 ) M O M H 5 n 5 η 5 = µ 2, σ 22 ) W 5 = diagσ 22, 2σ22 2 ) O M O H 6 n 6 η 6 = θ 1, β 30.1, β 31.1, σ 33.1 ) W 6 = diagw 1, Σ 33.1, 2σ ) M O O H 7 n 7 η 7 = µ 2, σ 22, β 30.2, β 31.2, σ 33.2 ) W 7 = diagσ 22, 2σ22 2, Σ 33.2, 2σ ) θ 1 = µ 1, σ 11 ), θ 2 = β 20 1, β 21 1, σ 22 1 ), θ 3 = β 30 12, β 31 12, β 32 12, σ ), µ = µ 1, µ 2, µ 3 ) and Σ = σ ij ) are computed from θ 1, θ 2, θ 3 using the recursion in Remark 4 below, β 30.1 = µ 3 β 31.1 µ 1, β 31.1 = σ 13 /σ 11, σ 33.1 = σ 33 β σ 11, β 30.2 = µ 3 β 31.2 µ 2, β 31.2 = σ 23 /σ 22, σ 33.2 = σ 33 β σ 22, Σ 22.1 = {E[1, y 1 )1, y 1 ) ]} 1 σ 22.1, Σ = {E[1, y 1, y 2 )1, y 1, y 2 ) ]} 1 σ 33.12, Σ 33.1 = {E[1, y 1 )1, y 1 ) ]} 1 σ 33.1, Σ 33.2 = {E[1, y 2 )1, y 2 ) ]} 1 σ 33.2, η = η 7, η 6,..., η 1 ), θ = η 3, V = diagw 7 /n 7, W 6 /n 6,..., W 1 /n 1 ), X = η/ θ. for the parameters µ, Σ and β j0.j, β jj.j, σ jj.j) in terms of the conditional parameter θ = θ 1,..., θ p). Using a regression expression y j+1 = β j+1) j + β j+1) j y β j+1)j 12...j y j + e j+1) 12...j, we get, for j = 0, 1, 2,..., p 1, µ j+1 = Ey j+1 ) = β j+1) j + β j+1) j µ β j+1)j 12...j µ j, σ i,j+1 = covy i, y j+1 ) = β j+1) j σ i β j+1)j 12...j σ ij, i = 1, 2,..., j, and j j σ j+1,j+1 = vary j+1 ) = σ j+1)j+1) 12...j + β j+1)k 12...j β j+1)l 12...j σ kl. k=1 l=1 Note that, in the above three equations, µ j+1, σ 1j+1), σ 2j+1),..., σ j+1)j+1) ) is expressed in terms of the conditional parameter θ j+1 = β j+1) j, β j+1) j,..., β j+1)j 12...j, σ j+1)j+1) 12...j ) and the marginal parameter µ j, σ 1j,..., σ jj ), the latter of which is a function of 16
18 θ j, θ j 1,..., θ 1. Therefore, recursive evaluation of these three equations for j = 0, 1,..., p 1 with initial values β 10 = µ 1 and σ 11 = σ 11 gives the required expression for the marginal parameters µ j, σ 1j,..., σ j 1)j, σ jj ), j = 1,..., p in terms of the conditional parameters θ j, θ j 1,..., θ 1. 1, Partial derivatives are recursively computed as follows: for j = 0, 1,..., p j l=1 β j+1)l 12...j µ l /θ t if t = 1,..., j, µ j+1 / θ t = 1, µ 1,..., µ j, 0) if t = j if t = j + 2, j + 3,..., p, σ i,j+1 / θ t = β j+1) j σ i1 / θ t β j+1)j 12...j σ ij / θ t, i = 1, 2,..., j, t = 1, 2,..., j, σ j+1,j+1 / θ t = σ j+1)j+1) 12...j + j k=1 j l=1 β j+1)k 12...jβ j+1)l 12...j σ kl / θ t, t = 1, 2,..., j, σ i,j+1 / θ j+1 = 0, σ i1,..., σ ij, 0), i = 1,..., j, σ j+1,j+1 / θ j+1 = 0, 2 j k=1 β j+1)k 12...jσ k1, 2 j k=1 β j+1)k 12...jσ k2,,..., 2 j k=1 β j+1)k 12...jσ kj, 1), σ i,j+1 / θ t = 0, i = 1, 2,..., j + 1, t = j + 2, j + 3,..., p. Given these expressions for µ, Σ and their partial derivatives in terms of θ, it is straightforward to compute β j0.j, β jj.j, σ jj.j) and the corresponding derivatives because the regression parameters are simple functions of µ and Σ. Remark 5  A 4variate nonmonotone case. In Table 4, expressions for θ, η, X and V are given for a 4dimensional case with a specific missing pattern. 17
19 Table 4. A 4dimensional normal case  nonmonotone missing. y 1 y 2 y 3 y 4 Data Set Size Estimable parameters Asymptotic variance O M M M H 1 n 1 η 1 = µ 1, σ 11 ) W 1 = diagσ 11, 2σ11 2 ) M O M M H 2 n 2 η 2 = µ 2, σ 22 ) W 2 = diagσ 22, 2σ22 2 ) M M O M H 3 n 3 η 3 = µ 3, σ 33 ) W 3 = diagσ 33, 2σ33 2 ) M M M O H 4 n 4 η 4 = µ 4, σ 44 ) W 4 = diagσ 44, 2σ44 2 ) O O O O H 5 n 5 η 5 = θ 1, θ 2, θ 3, θ 4 ) W 5, see below θ 1 = µ 1, σ 11 ), θ 2 = β 20 1, β 21 1, σ 22 1 ), θ 3 = β 30 12, β 31 12, β 32 12, σ ), θ 4 = β , β , β , β , σ ) µ = µ 1,..., µ 4 ) and Σ = σ ij ) are computed from θ 1,..., θ 4 using the recursion in Remark 4, W 5 = diagσ 11, 2σ11 2, Σ 22.1, 2σ22.1 2, Σ 33.12, 2σ , Σ , 2σ ) Σ 22.1 = {E[1, y 1 )1, y 1 ) ]} 1 σ 22.1, Σ = {E[1, y 1, y 2 )1, y 1, y 2 ) ]} 1 σ 33.12, Σ = {E[1, y 1, y 2, y 3 )1, y 1, y 2, y 3 ) ]} 1 σ , η = η 5, η 4,..., η 1 ), θ = η 5, V = diagw 5 /n 5, W 4 /n 4,..., W 1 /n 1 ), X = η/ θ 3 Efficiency comparison We compare efficiencies of estimators constructed from different combinations of data sets H, K, L for the bivariate normal case. Under the nonmonotone missing pattern, we can compute the following four types of the estimates. 1. ˆθ H : the maximum likelihood estimator using the samples in H set. 2. ˆθ HK : the maximum likelihood estimator using the samples in H K. 3. ˆθ HL : the maximum likelihood estimator using the samples in H L. 4. ˆθ HKL : the maximum likelihood estimator using the whole sample. By Theorem 1, the GaussNewton estimator 9) is asymptotically equal to ˆθ HKL. Write X = [X H X K X L] where X H is the left 5 5 submatrix of X, X K is the 5 2 submatrix in the middle of X, and X L is the 5 2 submatrix in the right side of X. Similarly, we can decompose ˆη = ˆη H, ˆη K, ˆη L) and V = diag {VH, V K, V L }. 18
20 Using the arguments in the proof of Theorem 1, the asymptotic variance of ˆθ 1 H is X 1 H H) X. Similarly, we have ) 1 V ar ˆθHL = X 1 H X H + X 1 L ˆV L L) X, ) 1 V ar ˆθHK = X 1 H X H + X 1 K ˆV K K) X, and ) 1 V ar ˆθHKL = X 1 H X H + X 1 K ˆV K X K + X 1 L ˆV L L) X. Using the matrix algebra such as ) 1 X 1 H X H + X 1 L ˆV L X L = X 1 H X H ) 1 X L X H ) 1 1 ˆV H X H [ ˆV L + X L X 1 H X H we can derive expressions for the variances of the estimators. ) ] X L X L X 1 H H) X, For estimates of the slope parameter, the asymptotic variances are V ar ˆβ 21.1,HK ) = σ 22.1 σ 11 n H = V ar ˆβ 21.1,H ) V ar ˆβ 21.1,HL ) = σ 22.1 { 1 2pL ρ 2 1 ρ 2)} σ 11 n H ) V ar ˆβ21 1,HKL = σ { p } Lρ 2 1 ρ 2 ), σ 11 n H 1 p L p K ρ 4 where ρ 2 = σ 2 12/ σ 11 σ 22 ), p K = n K / n H + n K ) and p L = n L / n H + n L ). See Appendix B for derivations of these variances and other variances below. Thus, we have V ar ˆβ 21.1,H ) = V ar ˆβ 21.1,HK ) V ar ˆβ 21.1,HL ) V ar ˆβ 21.1,HKL ). 16) Here strict inequalities generally hold except for special trivial cases. Note that the asymptotic variance of ˆβ 21 1,HK is the same as the variance of ˆβ 21 1,H, 19
21 which implies that there is no gain of efficiency by adding set K missing y 2 ) to H. On the other hand, by comparing V ar ˆβ ) 21 1,HL ) with V ar ˆβ21 1,H, we observe an efficiency gain by adding a set L missing y 1 ) to H. This analysis is similar to the results from Little 1992) who summarized statistical results for regression with missing X s whose data sets are H and L. Little 1992) did not include the cases of missing y 2 s because data set K) with missing y 2 does not contain additional information in estimating the regression parameter. It is interesting to observe that even though adding K the data set with missing y 2 ) to H does not improve efficiency of regression parameter estimate, i.e., V ar ˆβ 21.1,H ) = V ar ˆβ 21.1,HK ), adding K to H, L) does improve the efficiency, i.e., V ar ˆβ 21.1,HL ) > V ar ˆβ 21.1,HKL ). Using these expressions, we can investigate variance reduction of ˆβ 21.1,HKL over ˆβ 21.1,HK. For example, we can say that relative efficiency of ˆβ 21.1,HKL over ˆβ 21.1,HK is lager for larger values of p K, p L, or ρ. If p L = p K = 0.5 and ρ = 0.5, the relative efficiency value is If p L = p K = 0.9 and ρ = 0.9, the relative efficiency value is For the other parameters of the conditional distribution, relationships similar to 16) hold. For the marginal parameters, we have V arˆµ 1,HK ) = σ 11 n H 1 p K ), V arˆµ 1,HL ) = σ 11 1 pl ρ 2), n H { V arˆµ 1,HKL ) = σ 11 n H 1 p K ) p Lρ 2 1 p K ) 2 1 p L p K ρ 2 ) }, 20
22 and V arˆσ 11,HK ) = 2σ2 11 n H 1 p K ), V arˆσ 11,HL ) = 2σ pl ρ 4), n H { V arˆσ 11,HKL ) = 2σ2 11 n H 1 p K ) p Lρ 4 1 p K ) 2 1 p L p K ρ 4 Note that efficiency of the marginal parameters µ 1, σ 11 ) of y 1 improves if additional data for y 2 with y 1 missing are provided. In particular, if n K = n L, then V arˆµ 1,H ) V arˆµ 1,HL ) V arˆµ 1,HK ) V arˆµ 1,HKL ) }. and V arˆσ 11,H ) V arˆσ 11,HL ) V arˆσ 11,HK ) V arˆσ 11,HKL ). 4 A Numerical Example For a numerical example, we consider the data set adapted from Bishop, Fienberg and Holland 1975, Table 1.42). Table 5 gives the data for a 2 3 table of three categorical variable Y 1 =Clinic, Y 2 =Parential care, Y 3 =survival) with one supplemental margin for Y 2 and Y 3 and another supplemental margin for Y 1 and Y 3. In this setup, Y i are all dichotomous, taking either 0 or 1, and 8 parameters can be defined as π ijk = P ry 1 = i, Y 2 = j, Y 3 = k), i = 0, 1; j = 0, 1; k = 0, 1. For the orthogonal parametrization, we use η H = ) π 1 11, π 1 10, π 1 01, π 1 00, π +1 1, π +1 0, π
23 Table 5. A 2 3 table with supplemental margins Set y 1 y 2 y 3 Count H K L
24 where π i jk = P r y 1 = i y 2 = j, y 3 = k), π +j k = P r y 2 = j y 3 = k), π ++k = P r y 3 = k). We also set θ θ 1, θ 2, θ 3, θ 4, θ 5, θ 6, θ 7 ) = η H. Note that the validity of the proposed method does not depend on the choice of the parametrization. A suitable parametrization will make the computation of the information matrix simple. From the data in Table 5, we can obtain 13 observations for 7 parameters. The observation vector can be written ˆη = ˆη H, ˆη K, ˆη L), where ˆη H = 293/316, 4/6, 176/373, 3/20, 316/689, 6/26, 689/715) ˆη K = ˆπ 1 +1,K, ˆπ 1 +0,K, ˆπ ++1,K ) = 100/182, 5/11, 182/193) ˆη L = ˆπ +1 1,L, ˆπ +1 0,L, ˆπ ++1,L ) = 90/240, 5/15, 240/255) with the expectations where η H = θ, η θ) = η H, η K, η L), η K = π 1 11 π π 1 01 π +0 1, π 1 10 π π 1 00 π +0 0, π ++1 ) = θ 1 θ 5 + θ 3 1 θ 5 ), θ 2 θ 6 + θ 4 1 θ 6 ), θ 7 ) and η L = π +1 1, π +1 0, π ++1 ) = θ5, θ 6, θ 7 ), and the variancecovariance matrix where V = diag {V H /n H, V K /n K, V L /n L } V H = diag {θ 1 1 θ 1 ), θ 2 1 θ 2 ),..., θ 7 1 θ 7 )}, V K = diag { ) ) π π1 +1, π π1 +0, π++1 1 π 1++ ) } V L = diag {θ 5 1 θ 5 ), θ 6 1 θ 6 ), θ 7 1 θ 7 )}. 23
25 The GaussNewton method as in 9) can be used to solve the nonlinear model of three parameters, where the initial estimator of θ is ˆθ S = 293/316, 4/6, 176/373, 3/20, 406/929, 11/41, 1111/1163) and the X matrix is X = θ θ θ θ θ 1 θ θ 2 θ The resulting onestep estimates are ˆθ 1 = 0.923, 0.678, 0.454, 0.168, 0.426, 0.272, 0.955).. Standard errors of the estimated values are computed from 10) and are ˆV 1/2 ˆθ 1 ) = diag , , , , , , ). On the other hand, the standard errors of the initial parameter estimates are ˆV 1/2 ˆθ S ) = diag , , , , , , ). Note that there is no efficiency gain for ˆπ ++1 because y 3 is fully observed throughout the sample. 5 Simulation Study To test our theory with finite sample sizes, we perform a limited simulation. For the simulated data set, we generate B = 10, 000 samples of size n from 24
26 Table 6. Monte Carlo variance of the point estimators under two difference estimation schemes based on samples of 10,000 trials. Sample size Parameter EM estimation GN estimation True variance µ µ σ σ σ µ µ σ σ σ the population x y 1 iid N1, 1) iid N 0, 1) y 2 = y 1 + e where e N 0, ). We use two levels of sample sizes, n = 100 and n = 500. Variable x is always observed and variables y 1 and y 2 are subject to missingness. The response probability for y 1 follows a logistic regression model such that logit {P ry 1 is observed)} = x, the response probability for y 1 follows a logistic regression model such that logit {P ry 1 is observed)} = 0.7x, and that the two responses are independent. The onestep GaussNewton GN) estimation and the EM estimation are compared. The estimates from the EM algorithm are computed after 10 iterations with the same initial values as for the onestep GN estimator. 25
27 Table 7. Monte Carlo result for the estimated variance of the proposed method based on samples of 10,000 trials. Sample size Parameter Mean Est. Var. Rel. Bias tstatistic µ µ σ σ σ µ µ σ σ σ The means and variances of the point estimators and the mean of the estimated variances are calculated. Because the point estimators are all unbiased in the simulation, their simulation means are not listed here. Table 6 displays the Monte Carlo variances of the point estimators in each estimation method. The theoretical asymptotic variances of the MLE are also computed and presented in the last column of Table 6. The simulation results in Table 6 are generally consistent with the theoretical results. The simulation variances are slightly larger than the theoretical variances because the estimators were not computed until convergence. Table 7 displays the mean, relative bias, and the tstatistic of the estimated variance of the onestep GN method. The relative bias is the Monte Carlo bias of the variance estimator divided by the Monte Carlo variance, where the variance is given in Table 6. The tstatistic for testing the hypothesis of zero bias is the Monte Carlo estimated bias divided by the Monte Carlo standard error of the estimated bias. The tvalues as well as the values 26
28 of relative biases state that estimated variances of our estimators computed using 10) are close to their theoretical values. The simulation results in Table 6 show that the two procedures have similar performance in terms of point estimation. The efficiency is slightly better for the onestep GN method because the EM algorithm was terminated after 10 iterations. The efficiency improvement is larger for the variance parameters than for the mean parameters, which suggests that convergence of the EM algorithm for the mean parameters is faster than convergence for the variance parameters. Also, as can be seen in Table 7, the onestep GN method provides consistent variance estimates for all parameters. The performance is better for a larger sample size because the consistency of the variance estimator is justified from the asymptotic theory. 6 Concluding remarks We have proposed a GaussNewton algorithm for obtaining the maximum likelihood estimator under a general nonmonotone missing data scenario. The proposed method is shown to be algebraically equivalent to the Newton Raphson method but avoids the burden of obtaining the observed likelihood. Instead, the MLEs separately computed from each partition of the marginal likelihoods and the full likelihoods are combined in a natural way. The way we combine the information takes the form of GLS estimation and thus can be easily implemented using the existing software. The estimated covariance matrix is computed automatically and shows good finite sample performance in the simulation study. The proposed method is not restricted to the multivariate normal distribution. It can be applied to any parametric multivariate 27
29 distribution as long as the computation for the marginal likelihood and the full likelihood are relatively easier than that of the observed likelihood. The proposed method assumes an ignorable response mechanism. A more realistic situation would be the case when the probability of y 2 missing depends on the value of y 1. In this case, the assumption of missing at random no longer holds and we have to take the response mechanism into account. Further investigation in this direction is a topic for a future research. Appendix A. Proof of Theorem 1 Note that the observed loglikelihood can be written as a sum of the loglikelihood in each set: log l obs θ) = log l H θ) + log l K θ) + log l L θ), A.1) where l H = i H f y; θ) is the likelihood function defined in set H, and l K and l L are defined similarly. Under MAR, l H is the likelihood for the joint distribution of y 1 and y 2, l K is the likelihood for the marginal distribution of y 1, and l L is the likelihood for the marginal distribution of y 2. By A.1), the score function for the likelihood can be written as S obs y; θ) = S H y; θ) + S K y; θ) + S L y; θ) A.2) and the expected information matrix also satisfies the additive decomposition I obs θ) = I H θ) + I K θ) + I L θ), A.3) where I H θ) = E [ 2 log l H θ) / θ θ ], and I K θ) and I L θ) are defined similarly. 28
30 The equation in A.3) can be written as ) ) η I obs θ) = H ηh I H η θ H ) θ + ) ) ηl ηl + I L η θ L ) θ ) ηk I K η θ K ) ) ηk θ = X ˆV 1 X, A.4) where X = η H / θ, η K / θ, η L / θ) and ˆV 1 = diag {I H η H ), I K η K ), I L η L )}. Now, consider the score function in A.2). Using the chain rule, the score function can be written as ) ) ) η S obs y; θ) = H η S H y; η θ H ) + K η S K y; η θ K ) + L S L y; η θ L ). A.5) Let ˆη H be the MLE of the likelihood l H. Taking a Taylor expansion of S H y; η H ) about ˆη H leads to S H y; η H ). = S H y; ˆη H ) I H ˆη H ) η H ˆη H ), where I H η H ) = 2 log l H η H ) / η H η H. Using S H y; ˆη H ) = 0 and the convergence of the observed information matrix to the expected information matrix, we have S H y; η H ). = I H ˆη H ) η H ˆη H ). Similar results hold for the sets K and L. Thus, A.5) becomes ) ). η S obs y; θ) = H η I K ˆη θ H ) ˆη H η H ) + K I K ˆη θ K ) ˆη K η K ) ) η + L I L ˆη θ L ) ˆη L η L ) = X ˆV 1 ˆη η). A.6) Therefore, inserting A.4) and A.6) into 13), we have 9). 29
31 B. Computations for variance formula Using ) 1 X 1 H X H + X 1 K ˆV K X K = X 1 H X H ) 1 X K it can be easily shown that X H ) 1 1 ˆV H X H [ ˆV K + X K X 1 H X H ) { 1 X 1 H X H + X 1 K ˆV K X Σbb K = diag, 2σ σ 11,, n H n H n H + n K Now, to use the formula ) 1 X 1 H X H + X 1 L ˆV L X L = X 1 H X H ) 1 X L X H ) 1 1 ˆV H X H [ ˆV L + X L X 1 H X H ) ] X K X K X 1 H H) X, 2σ 2 11 n H + n K }. ) ] X L X L X 1 H H) X, note that ˆV L + X L X H ) { } 1 1 ˆV H X H X σ22 L = diag, 2σ2 22, n HL n HL where n 1 HL = n 1 H + n 1 L, and ) 1 X L X 1 H X 1 H = n H σ σ β 21.1 σ 22.1 µ 1 2β 21.1 σ σ σ12 2 ). Thus, we have ) 1 X 1 H X H + X 1 L ˆV L X L = ˆV H n HL n 2 H σ 22.1 /σ 22 β 21.1 σ 22.1 /σ β 21.1 σ 22.1 /σ σ /σ 2 22 σ 12 /σ σ 2 12/σ 2 22 σ β 21.1 σ β 21.1 σ σ σ σ 2 12, 30
32 which present the variances of ˆθ HK. where To compute the variances of ˆθ HKL, use ) 1 X 1 HK ˆV HK X HK + X 1 L ˆV L X L = X 1 HK ˆV HK X HK ) 1 X L X HK ) 1 1 ˆV HK X HK [ ˆV L + X L X 1 HK ˆV HK X HK ) 1 X 1 HK ˆV HK X HK = X 1 H X H + X 1 1. K ˆV K K) X Writing D ˆV ) 1 L + X L X 1 HK ˆV HK X HK X L { σ22 = diag 1 pl p K ρ 2), 2σ pl p K ρ 4) }, n HL n HL ) ] X L X L X 1 HK ˆV HK HK) X, where n 1 HL = n 1 H + n 1 L, p L = n L /n H + n L ), and p K = n K /n H + n K ), and X L = 1 n H X HK ) 1 1 ˆV HK X HK σ σ 12 1 p K ) 0 2β 21.1 σ 22.1 µ 1 2β 21.1 σ σ σ p K ) the variance of ˆθ HKL can be obtained by X HK 1 n 2 H ) 1 1 ˆV HK X HK + X 1 L ˆV L X L = σ β 21.1 σ β 21.1 σ σ σ 12 1 p K ) 0 0 2σ p K ) X HK D 1 ), ) 1 1 ˆV HK X HK σ β 21.1 σ β 21.1 σ σ σ 12 1 p K ) 0 0 2σ p K ) 31
33 References Anderson, T.W. 1957). Maximum likelihood estimates for the multivariate normal distribution when some observation are missing, Journal of the American Statistical Association 52, Chen, Q., Ibrahim, J. G., Chen, MH, and Senchaudhuri, P. 2008). Theory and inference for regression models with missing responses and covariates, Journal of Multivariate Analysis 99, Cox, D.R. and Reid, N. 1987). Parameter orthogonality and approximate conditional inference with discussion). Journal of Royal Statistical Society: Series B 49, Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977). Maximum likelihood from incomplete data via the EM algorithm with discussion), Journal of Royal Statistical Society, Series B 39, Kim, J.K. 2004). Extension of Factoring Likelihood Approach To Non Monotone Missing Data, Journal of the Korean Statistical Society 2004), 33, Lehmann, E.L. 1983). Theory of Point Estimation. Wiley, New York. Little, R.J.L. 1982). Models for nonresponse in sample surveys, Journal of the American Statistical Association 77, Little, R.J.L. 1992). Regression with missing X s: A review, Journal of the American Statistical Association 87,
34 Little, R.J.L. and Rubin, D.B. 2002). data. Wiley, New York. Statistical Analysis with missing Liu, C. and Rubin, D. B. 1994). The ECME Algorithm: A Simple Extension of EM and ECM with Faster Monotone Convergence, Biometrika 81, Louis, T. A. 1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B 44, Meng, X.L. and Rubin, D.B. 1991). Using EM to obtain asymptotic variancecovariance matrices: The SEM algorithm. Journal of the American Statistical Association 86, Molenberghs, G. and Kenward, M. 2007). Missing Data in Clinical Studies. Wiley, New York. Rubin, D.B. 1974). Characterizing the estimation of parameters in incomplete data problems, Journal of the American Statistical Association 69, Rubin, D.B. 1976). Inference and missing data, Biometrika 63, Seber, G.A.F. and Wild, C.J. 1989). Nonlinear Regression. Wiley, New York. 33
Parametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More informationStatistical Analysis with Missing Data
Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES
More informationMultivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #47/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
More informationHandling attrition and nonresponse in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 6372 Handling attrition and nonresponse in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
More informationi=1 In practice, the natural logarithm of the likelihood function, called the loglikelihood function and denoted by
Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a pdimensional parameter
More informationOverview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models
Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear
More informationHandling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
More informationOLS in Matrix Form. Let y be an n 1 vector of observations on the dependent variable.
OLS in Matrix Form 1 The True Model Let X be an n k matrix where we have observations on k independent variables for n observations Since our model will usually contain a constant term, one of the columns
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationRegression Estimation  Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation  Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. b 0, b 1 Q = n (Y i (b 0 + b 1 X i )) 2 i=1 Minimize this by maximizing
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationNote on the EM Algorithm in Linear Regression Model
International Mathematical Forum 4 2009 no. 38 18831889 Note on the M Algorithm in Linear Regression Model JiXia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University
More informationMissing Data Dr Eleni Matechou
1 Statistical Methods Principles Missing Data Dr Eleni Matechou matechou@stats.ox.ac.uk References: R.J.A. Little and D.B. Rubin 2nd edition Statistical Analysis with Missing Data J.L. Schafer and J.W.
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationIntroduction to latent variable models
Introduction to latent variable models lecture 1 Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia, IT bart@stat.unipg.it Outline [2/24] Latent variables and their
More informationLogistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit nonresponse. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationFactorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
More informationCHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.
CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In
More informationMultiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
More informationChapter 4: Vector Autoregressive Models
Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationA Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under TwoLevel Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under TwoLevel Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
More informationFactor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
More informationDepartment of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 14730278 On Testing for Diagonality of Large Dimensional
More informationOn Small Sample Properties of Permutation Tests: A Significance Test for Regression Models
On Small Sample Properties of Permutation Tests: A Significance Test for Regression Models Hisashi Tanizaki Graduate School of Economics Kobe University (tanizaki@kobeu.ac.p) ABSTRACT In this paper we
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationMicroeconometrics Blundell Lecture 1 Overview and Binary Response Models
Microeconometrics Blundell Lecture 1 Overview and Binary Response Models Richard Blundell http://www.ucl.ac.uk/~uctp39a/ University College London FebruaryMarch 2016 Blundell (University College London)
More informationStatistics in Geophysics: Linear Regression II
Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics LudwigMaximiliansUniversity Munich, Germany Winter Term 2013/14 1/28 Model definition Suppose we have the following
More informationA REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA
123 Kwantitatieve Methoden (1999), 62, 123138. A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA Joop J. Hox 1 ABSTRACT. When we deal with a large data set with missing data, we have to undertake
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK  michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationReview of the Methods for Handling Missing Data in. Longitudinal Data Analysis
Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 113 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationLife Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP  Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationItem Imputation Without Specifying Scale Structure
Original Article Item Imputation Without Specifying Scale Structure Stef van Buuren TNO Quality of Life, Leiden, The Netherlands University of Utrecht, The Netherlands Abstract. Imputation of incomplete
More informationLOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationAn EM algorithm for the estimation of a ne statespace systems with or without known inputs
An EM algorithm for the estimation of a ne statespace systems with or without known inputs Alexander W Blocker January 008 Abstract We derive an EM algorithm for the estimation of a ne Gaussian statespace
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationMehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics
INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree
More informationEstimation and Inference in Cointegration Models Economics 582
Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there
More informationMissing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random
[Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sageereference.com/survey/article_n298.html] Missing Data An important indicator
More informationDealing with Missing Data
Dealing with Missing Data Roch Giorgi email: roch.giorgi@univamu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January
More informationPrinciple Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
More informationTABLE OF CONTENTS ALLISON 1 1. INTRODUCTION... 3
ALLISON 1 TABLE OF CONTENTS 1. INTRODUCTION... 3 2. ASSUMPTIONS... 6 MISSING COMPLETELY AT RANDOM (MCAR)... 6 MISSING AT RANDOM (MAR)... 7 IGNORABLE... 8 NONIGNORABLE... 8 3. CONVENTIONAL METHODS... 10
More informationReject Inference in Credit Scoring. JieMen Mok
Reject Inference in Credit Scoring JieMen Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
More information(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationIt is important to bear in mind that one of the first three subscripts is redundant since k = i j +3.
IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More information1 Short Introduction to Time Series
ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The
More informationEconometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England
Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND
More informationVisualization of missing values using the Rpackage VIM
Institut f. Statistik u. Wahrscheinlichkeitstheorie 040 Wien, Wiedner Hauptstr. 80/07 AUSTRIA http://www.statistik.tuwien.ac.at Visualization of missing values using the Rpackage VIM M. Templ and P.
More informationHypothesis Testing in the Classical Regression Model
LECTURE 5 Hypothesis Testing in the Classical Regression Model The Normal Distribution and the Sampling Distributions It is often appropriate to assume that the elements of the disturbance vector ε within
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationMATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationBootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
More informationSolution of Linear Systems
Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start
More information4.7 Confidence and Prediction Intervals
4.7 Confidence and Prediction Intervals Instead of conducting tests we could find confidence intervals for a regression coefficient, or a set of regression coefficient, or for the mean of the response
More information1 Limiting distribution for a Markov chain
Copyright c 2009 by Karl Sigman Limiting distribution for a Markov chain In these Lecture Notes, we shall study the limiting behavior of Markov chains as time n In particular, under suitable easytocheck
More informationDefinition The covariance of X and Y, denoted by cov(x, Y ) is defined by. cov(x, Y ) = E(X µ 1 )(Y µ 2 ).
Correlation Regression Bivariate Normal Suppose that X and Y are r.v. s with joint density f(x y) and suppose that the means of X and Y are respectively µ 1 µ 2 and the variances are 1 2. Definition The
More informationWhat s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationTime Series Analysis III
Lecture 12: Time Series Analysis III MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Time Series Analysis III 1 Outline Time Series Analysis III 1 Time Series Analysis III MIT 18.S096 Time Series Analysis
More informationIntroduction to mixed model and missing data issues in longitudinal studies
Introduction to mixed model and missing data issues in longitudinal studies Hélène JacqminGadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models
More informationMaximum likelihood estimation of mean reverting processes
Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. jcpollo@onwardinc.com Abstract Mean reverting processes are frequently used models in real options. For
More informationProblem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VAaffiliated statisticians;
More informationJoint models for classification and comparison of mortality in different countries.
Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute
More informationFrom the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
More informationSYSTEMS OF REGRESSION EQUATIONS
SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations
More informationImputing Missing Data using SAS
ABSTRACT Paper 32952015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
More informationEstimating an ARMA Process
Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators
More informationJournal of Statistical Software
JSS Journal of Statistical Software August 26, Volume 16, Issue 8. http://www.jstatsoft.org/ Some Algorithms for the Conditional Mean Vector and Covariance Matrix John F. Monahan North Carolina State University
More informationC: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)}
C: LEVEL 800 {MASTERS OF ECONOMICS( ECONOMETRICS)} 1. EES 800: Econometrics I Simple linear regression and correlation analysis. Specification and estimation of a regression model. Interpretation of regression
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationRandom Vectors and the Variance Covariance Matrix
Random Vectors and the Variance Covariance Matrix Definition 1. A random vector X is a vector (X 1, X 2,..., X p ) of jointly distributed random variables. As is customary in linear algebra, we will write
More informationThe Characteristic Polynomial
Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationMultiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract
Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationA General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
More informationThe Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
More informationNONLIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND LCURVES
NONLIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND LCURVES Kivan Kaivanipour A thesis submitted for the degree of Master of Science in Engineering Physics Department
More informationAnalysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
More informationTesting Linearity against Nonlinearity and Detecting Common Nonlinear Components for Industrial Production of Sweden and Finland
Testing Linearity against Nonlinearity and Detecting Common Nonlinear Components for Industrial Production of Sweden and Finland Feng Li Supervisor: Changli He Master thesis in statistics School of Economics
More informationInner Product Spaces and Orthogonality
Inner Product Spaces and Orthogonality week 34 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,
More informationHow To Use A Monte Carlo Study To Decide On Sample Size and Determine Power
How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 3919971 Fax: (310) 3918971
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationRow Echelon Form and Reduced Row Echelon Form
These notes closely follow the presentation of the material given in David C Lay s textbook Linear Algebra and its Applications (3rd edition) These notes are intended primarily for inclass presentation
More informationRegression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
More informationAnalyzing Structural Equation Models With Missing Data
Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationSensitivity Analysis in Multiple Imputation for Missing Data
Paper SAS2702014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes
More informationMaximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation
Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation Abstract This article presents an overview of the logistic regression model for dependent variables having two or
More information