Note on the EM Algorithm in Linear Regression Model




International Mathematical Forum, 4, 2009, no. 38, 1883-1889

Ji-Xia Wang and Yu Miao

College of Mathematics and Information Science, Henan Normal University, Henan Province 453007, China
leabird@163.com, yumiao728@yahoo.com.cn

Abstract

The linear regression model has been used extensively in information processing and data analysis. In the present paper we consider the linear model with missing data. Using the EM (Expectation-Maximization) algorithm, the asymptotic variances and the standard errors of the maximum likelihood estimators (MLEs) of the unknown parameters are established.

Mathematics Subject Classification: 93C05; 93C41

Keywords: Conditional expectation; maximum likelihood estimator; EM algorithm; Newton-Raphson iteration

1 Introduction

As a typical statistical model, the linear regression model has been widely used in information processing and data analysis. Several statistical methods exist for learning and modeling it, e.g. the expectation-maximization (EM) algorithm [2] for maximum likelihood and the self-organizing network with hyper-ellipsoidal clustering [5]. Generally, the parameters of a linear regression model can be estimated via the EM algorithm under the maximum likelihood framework, since the EM algorithm enjoys good convergence behavior in many situations. In many applications, however, the data sets contain missing observations [9], which cause problems when the missingness is related to the values of the missing items [8]; for instance, Little and Rubin [4] showed that this can bias some estimators and reduce their efficiency. A new algorithm for estimating the unknown parameters is therefore proposed, based on the likelihood function. Baker and Laird [1] used the EM algorithm to obtain maximum likelihood estimates of the unknown parameters in models with incomplete data, and Ibrahim and Lipsitz [3] established Bayesian methods for estimation in generalized linear models.

In the present paper we discuss the linear regression model with missing data and propose a method for estimating the parameters, using Newton-Raphson iteration to solve the score equation. Moreover, the standard errors of these estimators are calculated from the observed Fisher information matrix.

2 Linear regression model with missing data

Suppose that y_1, y_2, ..., y_n are independent, identically distributed normal random variables with unit variances. Let X_i = (X_{1i}, X_{2i})^T be a 2 x 1 random vector of covariates, where X_{1i} and X_{2i} are independently observed and follow normal distributions with means \mu_1, \mu_2 and variances \sigma_1^2, \sigma_2^2, respectively. For notational convenience let X_i = (1, X_{1i}, X_{2i})^T, and let \beta = (\beta_0, \beta_1, \beta_2)^T denote the regression coefficients. It is supposed that

    p(y_i | X_i, \beta) = \frac{1}{\sqrt{2\pi}} \exp\Big\{ -\frac{(y_i - X_i^T \beta)^2}{2} \Big\}.    (1)

We assume that X_{1i} is completely observed and X_{2i} is partially missing for every i; our objective is to estimate \beta, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2 and their standard errors from the data with missing values. Missing-value indicators are introduced as in [6]:

    r_i = 0 if y_i is observed, 1 if y_i is missing;    s_i = 0 if x_{2i} is observed, 1 if x_{2i} is missing,    (2)

with probabilities p(r_i = 1) = \psi_i and p(s_i = 1) = \phi_i. Following [8], for every i = 1, 2, ..., n the missing-data mechanism is defined as

    logit(\psi_i) = \log \frac{\psi_i}{1 - \psi_i} = \delta_1 X_{1i} + \delta_2 X_{2i} + y_i \omega    (3)

and

    logit(\phi_i) = \log \frac{\phi_i}{1 - \phi_i} = \alpha_1 X_{1i} + \alpha_2 X_{2i} + y_i \tau,    (4)

where \delta = (\delta_1, \delta_2)^T, \alpha = (\alpha_1, \alpha_2)^T, \omega and \tau are parameters determining the missing mechanism. Then the conditional probability functions of r_i and s_i are derived from Eqs. (2)-(4) as

    p(r_i | X_i, y_i, \delta, \omega) = \frac{\exp\{ r_i (X_i^T \delta + y_i \omega) \}}{1 + \exp\{ X_i^T \delta + y_i \omega \}},

    p(s_i | X_i, y_i, \alpha, \tau) = \frac{\exp\{ s_i (X_i^T \alpha + y_i \tau) \}}{1 + \exp\{ X_i^T \alpha + y_i \tau \}},

where, following Eqs. (3) and (4), X_i^T \delta abbreviates \delta_1 X_{1i} + \delta_2 X_{2i} and X_i^T \alpha abbreviates \alpha_1 X_{1i} + \alpha_2 X_{2i}.
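To make the setup concrete, here is a minimal simulation sketch of the model in Eqs. (1)-(4). All parameter values, the sample size and the seed are illustrative choices, not taken from the paper; note that, because r_i and s_i depend on y_i and x_{2i} themselves, the mechanism is non-ignorable.

# Simulate data from the model of Eqs. (1)-(4).
# All numeric values below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 500
mu1, mu2, sigma1, sigma2 = 0.0, 1.0, 1.0, 1.5
beta = np.array([0.5, 1.0, -0.8])          # (beta0, beta1, beta2)
delta, omega = np.array([0.3, -0.2]), 0.4  # mechanism for r_i (y_i missing)
alpha, tau = np.array([-0.5, 0.2]), 0.3    # mechanism for s_i (x_{2i} missing)

x1 = rng.normal(mu1, sigma1, n)
x2 = rng.normal(mu2, sigma2, n)
X = np.column_stack([np.ones(n), x1, x2])  # X_i = (1, X_{1i}, X_{2i})^T
y = X @ beta + rng.normal(0.0, 1.0, n)     # unit error variance, Eq. (1)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

psi = logistic(delta[0] * x1 + delta[1] * x2 + omega * y)  # Eq. (3)
phi = logistic(alpha[0] * x1 + alpha[1] * x2 + tau * y)    # Eq. (4)
r = rng.binomial(1, psi)   # r_i = 1 -> y_i missing
s = rng.binomial(1, phi)   # s_i = 1 -> x_{2i} missing

y_obs = np.where(r == 0, y, np.nan)
x2_obs = np.where(s == 0, x2, np.nan)
print("missing y:", r.sum(), "missing x2:", s.sum())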

Now we derive the joint probability function of (y_i, x_{2i}, r_i, s_i) given x_{1i} as

    p(y_i, x_{2i}, r_i, s_i | x_{1i}) = p(r_i | X_i, y_i, \delta, \omega) \, p(s_i | X_i, y_i, \alpha, \tau) \, p(y_i | X_i, \beta) \, p(x_{2i} | X_{1i})
        = \frac{\exp\{ r_i (X_i^T \delta + y_i \omega) \}}{1 + \exp\{ X_i^T \delta + y_i \omega \}}
          \cdot \frac{\exp\{ s_i (X_i^T \alpha + y_i \tau) \}}{1 + \exp\{ X_i^T \alpha + y_i \tau \}}
          \cdot \frac{1}{\sqrt{2\pi}} \exp\Big\{ -\frac{(y_i - X_i^T \beta)^2}{2} \Big\}
          \cdot (2\pi\sigma_2^2)^{-1/2} \exp\Big\{ -\frac{(x_{2i} - \mu_2)^2}{2\sigma_2^2} \Big\}.    (5)

Therefore we can write down the complete-data log-likelihood l(\theta) = \log L(\theta | y, X, r, s) as

    l(\theta) = \sum_{i=1}^{n} \big[ r_i (X_i^T \delta + y_i \omega) - \log( 1 + \exp\{ X_i^T \delta + y_i \omega \} ) \big]
              + \sum_{i=1}^{n} \big[ s_i (X_i^T \alpha + y_i \tau) - \log( 1 + \exp\{ X_i^T \alpha + y_i \tau \} ) \big]
              - \sum_{i=1}^{n} \frac{(y_i - X_i^T \beta)^2}{2} - \frac{n}{2} \log(2\pi)
              - \frac{n}{2} \log(2\pi\sigma_2^2) - \sum_{i=1}^{n} \frac{(x_{2i} - \mu_2)^2}{2\sigma_2^2},

where \theta = (\beta, \delta, \omega, \alpha, \tau, \mu_2, \sigma_2^2) is the parameter vector for which the EM algorithm is developed (\mu_1 and \sigma_1^2 can be estimated directly from the fully observed X_{1i}). The complete-data log-likelihood specifies a model for the joint characterization of the observed data and the associated missing-data mechanism.

3 E-step of the EM algorithm

The MLE of \theta is a point which maximizes the observed-data likelihood function L(\theta | (y, X)_{obs}, r, s), where (y, X)_{obs} denotes the observed components of (y, X). Let \theta^{(r)} be the r-th iteration estimate of \theta, and define the conditional expectation of l(\theta), with respect to the conditional distribution of the missing data (y, X)_{mis} given the observed data (y_i, X_i, r_i, s_i) and the value \theta^{(r)}, as

    Q(\theta | \theta^{(r)}) = E\big[ l(\theta) \,\big|\, (y, X)_{obs}, r, s, \theta^{(r)} \big].    (6)

The EM algorithm is composed of E-step and M-step iterations. For the expectation of the complete-data log-likelihood in the E-step we distinguish four possible cases: the response y_i is missing, the covariate x_{2i} is missing, both are missing, and no values are missing.
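Before turning to the decomposition of Q over these four cases, note that l(\theta) above translates directly into code. A minimal sketch, assuming fully known (y, x2, r, s); the function and argument names are hypothetical:

# Complete-data log-likelihood l(theta) as displayed above.
import numpy as np

def complete_loglik(beta, delta, omega, alpha, tau, mu2, sigma2sq,
                    y, x1, x2, r, s):
    """l(theta) for fully known (y, x2, r, s); X_i = (1, x1_i, x2_i)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    eta_r = delta[0] * x1 + delta[1] * x2 + omega * y   # logit of psi_i
    eta_s = alpha[0] * x1 + alpha[1] * x2 + tau * y     # logit of phi_i
    n = y.size
    ll = np.sum(r * eta_r - np.logaddexp(0.0, eta_r))   # r_i (logistic) term
    ll += np.sum(s * eta_s - np.logaddexp(0.0, eta_s))  # s_i (logistic) term
    ll += -0.5 * np.sum((y - X @ beta) ** 2) - 0.5 * n * np.log(2 * np.pi)
    ll += -0.5 * n * np.log(2 * np.pi * sigma2sq) \
          - np.sum((x2 - mu2) ** 2) / (2 * sigma2sq)
    return ll

(np.logaddexp(0, eta) computes log(1 + e^eta) without overflow.)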

The expected log-likelihood function can then be written as

    Q(\theta | \theta^{(r)}) = \sum_{i=1}^{n_1} l(\theta)
        + \sum_{i=n_1+1}^{n_2} \int l(\theta) \, p\big( y_{i,mis} \,\big|\, X_i, r_i, s_i, \theta^{(r)} \big) \, dy_{i,mis}
        + \sum_{i=n_2+1}^{n_3} \int l(\theta) \, p\big( x_{2i,mis} \,\big|\, X_{i,obs}, y_i, r_i, s_i, \theta^{(r)} \big) \, dx_{2i,mis}
        + \sum_{i=n_3+1}^{n} \iint l(\theta) \, p\big( y_{i,mis}, x_{2i,mis} \,\big|\, X_{i,obs}, r_i, s_i, \theta^{(r)} \big) \, dy_{i,mis} \, dx_{2i,mis},

where n_1, n_2, n_3 are the corresponding sample sizes (the units being grouped by missingness pattern), y_{i,mis} denotes the missing component of y_i, x_{2i,mis} denotes the missing component of x_{2i}, and X_{i,obs} is the observed component of X_i. The conditional probabilities p(y_{i,mis} | X_i, r_i, s_i), p(x_{2i,mis} | X_{i,obs}, y_i, r_i, s_i) and p(y_{i,mis}, x_{2i,mis} | X_{i,obs}, r_i, s_i) of the missing data given the observed data act as weights in Q(\theta | \theta^{(r)}). By Eqs. (5) and (6), the weights have the following form:

    p\big( y_{i,mis}, x_{2i,mis} \,\big|\, X_{i,obs}, r_i, s_i, \theta^{(r)} \big)
        = \frac{ p(y_i | X_i, \theta^{(r)}) \, p(x_{2i} | x_{1i}) \, p(r_i | y_i, X_i, \theta^{(r)}) \, p(s_i | y_i, X_i, \theta^{(r)}) }
               { \iint p(y_i | X_i, \theta^{(r)}) \, p(x_{2i} | x_{1i}) \, p(r_i | y_i, X_i, \theta^{(r)}) \, p(s_i | y_i, X_i, \theta^{(r)}) \, dy_i \, dx_{2i} }
        \propto p\big( y_i, x_{2i}, r_i, s_i \,\big|\, x_{1i}, \theta^{(r)} \big),

    p\big( x_{2i,mis} \,\big|\, X_{i,obs}, y_i, r_i, s_i, \theta^{(r)} \big)
        = \frac{ p(x_{2i} | x_{1i}, \theta^{(r)}) \, p(s_i | y_i, X_i, \theta^{(r)}) }
               { \int p(x_{2i} | x_{1i}, \theta^{(r)}) \, p(s_i | y_i, X_i, \theta^{(r)}) \, dx_{2i} }
        \propto \frac{\exp\{ s_i (X_i^T \alpha + y_i \tau) \}}{1 + \exp\{ X_i^T \alpha + y_i \tau \}} \, (2\pi\sigma_2^2)^{-1/2} \exp\Big\{ -\frac{(x_{2i} - \mu_2)^2}{2\sigma_2^2} \Big\},

and

    p\big( y_{i,mis} \,\big|\, X_i, r_i, s_i, \theta^{(r)} \big)
        = \frac{ p(y_i | X_i, \theta^{(r)}) \, p(r_i | y_i, X_i, \theta^{(r)}) }
               { \int p(y_i | X_i, \theta^{(r)}) \, p(r_i | y_i, X_i, \theta^{(r)}) \, dy_i }.

The conditional expectation Q(\theta | \theta^{(r)}) is then calculated by a Metropolis-Hastings (MH) algorithm [7].
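Since the weight densities are known only up to their normalizing integrals, a random-walk Metropolis-Hastings step is a natural way to draw from them. The sketch below targets the density for x_{2i,mis} displayed above; the step size, chain length and starting value are arbitrary choices, not prescribed by the paper:

# Random-walk MH draws from p(x2 | x1, y, s) \propto p(x2 | x1) p(s | y, X).
import numpy as np

def log_target(x2, x1, y, s, alpha, tau, mu2, sigma2sq):
    # log of the unnormalized conditional density for a missing x_{2i}
    eta = alpha[0] * x1 + alpha[1] * x2 + tau * y
    return s * eta - np.logaddexp(0.0, eta) - (x2 - mu2) ** 2 / (2 * sigma2sq)

def mh_draws(x1, y, s, alpha, tau, mu2, sigma2sq,
             n_draws=200, step=1.0, rng=None):
    rng = rng or np.random.default_rng()
    x2 = mu2                                  # start the chain at the prior mean
    draws = np.empty(n_draws)
    for t in range(n_draws):
        prop = x2 + step * rng.normal()       # symmetric random-walk proposal
        log_acc = (log_target(prop, x1, y, s, alpha, tau, mu2, sigma2sq)
                   - log_target(x2, x1, y, s, alpha, tau, mu2, sigma2sq))
        if np.log(rng.uniform()) < log_acc:
            x2 = prop
        draws[t] = x2
    return draws   # Monte Carlo sample used to approximate the E-step integrals

The integrals in Q(\theta | \theta^{(r)}) are then approximated by averaging l(\theta) over such draws (after discarding a burn-in, in practice).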

4 M-step of the EM algorithm and convergence

Now we need to find a value of \theta, say \theta^{(r+1)}, at which Q(\theta | \theta^{(r)}) attains its maximum. The Newton-Raphson method is used to solve the score equation. The parameter update \theta^{(r+1)} in the M-step at the (r+1)-st EM iteration, using one Newton-Raphson iteration, takes the following form (for \beta, for example):

    \beta^{(r+1)} = \beta^{(r)} - \Big( \frac{\partial^2 Q(\theta | \theta^{(r)})}{\partial \beta \, \partial \beta^T} \Big)^{-1}_{\beta = \beta^{(r)}} \Big( \frac{\partial Q(\theta | \theta^{(r)})}{\partial \beta} \Big)_{\beta = \beta^{(r)}}.

The derivatives with respect to \beta used in the iteration are

    \frac{\partial Q(\theta | \theta^{(r)})}{\partial \beta}
        = \sum_{i=1}^{n_1} X_i (y_i - X_i^T \beta)
        + \sum_{i=n_1+1}^{n_2} E\big[ X_i (y_i - X_i^T \beta) \,\big|\, X_i, \theta^{(r)} \big]
        + \sum_{i=n_2+1}^{n_3} E\big[ X_i (y_i - X_i^T \beta) \,\big|\, X_{obs}, y_i, \theta^{(r)} \big]
        + \sum_{i=n_3+1}^{n} E\big[ X_i (y_i - X_i^T \beta) \,\big|\, X_{obs}, \theta^{(r)} \big]

and

    \frac{\partial^2 Q(\theta | \theta^{(r)})}{\partial \beta \, \partial \beta^T}
        = - \sum_{i=1}^{n_1} X_i X_i^T
        - \sum_{i=n_1+1}^{n_2} E\big[ X_i X_i^T \,\big|\, X_i, \theta^{(r)} \big]
        - \sum_{i=n_2+1}^{n_3} E\big[ X_i X_i^T \,\big|\, X_{obs}, y_i, \theta^{(r)} \big]
        - \sum_{i=n_3+1}^{n} E\big[ X_i X_i^T \,\big|\, X_{obs}, \theta^{(r)} \big].

The derivatives for the other components of \theta used in the iteration are given in [6]. The (r+1)-st estimates of \mu_2, \sigma_2^2 are obtained by solving the score equations

    \frac{\partial Q(\theta | \theta^{(r)})}{\partial \mu_2} = \sum_{i=1}^{n} E( x_{2i} | x_{1i}, y_i, r_i, s_i ) - n\mu_2 = 0,

    \frac{\partial Q(\theta | \theta^{(r)})}{\partial \sigma_2^2} = \sum_{i=1}^{n} E\big( (x_{2i} - \mu_2)^2 \,\big|\, x_{1i}, y_i, r_i, s_i \big) - n\sigma_2^2 = 0.

Therefore we can take

    \mu_2^{(r+1)} = \frac{1}{n} \sum_{i=1}^{n} E( x_{2i} | x_{1i}, y_i, r_i, s_i ),

    \sigma_2^{2\,(r+1)} = \frac{1}{n} \sum_{i=1}^{n} E\big( (x_{2i} - \mu_2)^2 \,\big|\, x_{1i}, y_i, r_i, s_i \big),

which are approximated by sample averages over the simulated and given observations.
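In a Monte Carlo implementation, the conditional expectations in these derivatives are replaced by averages over the E-step draws. A sketch of one Newton-Raphson step for \beta and the closed-form updates for \mu_2, \sigma_2^2, under the assumption that M imputed complete data sets are available (observed entries repeated across imputations; the array shapes and names are hypothetical):

# One Newton-Raphson M-step update for beta, plus the mu2/sigma2^2 updates,
# with conditional expectations approximated over M imputed data sets.
import numpy as np

def newton_step_beta(beta, X_imp, y_imp):
    """X_imp: (M, n, p) imputed design matrices; y_imp: (M, n) imputed y."""
    M = X_imp.shape[0]
    score = np.zeros_like(beta)
    hess = np.zeros((beta.size, beta.size))
    for m in range(M):
        Xm, ym = X_imp[m], y_imp[m]
        resid = ym - Xm @ beta
        score += Xm.T @ resid / M      # approximates dQ/dbeta
        hess -= Xm.T @ Xm / M          # approximates d2Q/dbeta dbeta^T
    return beta - np.linalg.solve(hess, score)

def update_mu2_sigma2(x2_imp):
    """x2_imp: (M, n) imputed x_{2i}; sample-average versions of the
    score-equation solutions for mu2 and sigma2^2."""
    mu2 = x2_imp.mean()
    sigma2sq = ((x2_imp - mu2) ** 2).mean()
    return mu2, sigma2sq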

The sequence {Q(\theta^{(r+1)} | \theta^{(r)})} often exhibits an increasing trend and then fluctuates around a limiting value once r becomes large enough; likewise, the sequence {\theta^{(r)}} fluctuates around the MLE \hat\theta when r is sufficiently large. To monitor the convergence of the EM algorithm we can plot {Q(\theta^{(r+1)} | \theta^{(r)})} as well as {\theta^{(r)}} against the iteration number. We terminate the algorithm when the sequence {Q(\theta^{(r+1)} | \theta^{(r)})} becomes stationary; otherwise we continue, increasing the Monte Carlo precision in the E-step, provided the calculation remains computationally feasible.

5 Standard errors of estimates

It is well known that, under some regularity conditions, the distribution of the maximum likelihood estimator \hat\theta tends asymptotically to a multivariate normal distribution MVN(\theta, V(\theta)). The expected Fisher information matrix I(\hat\theta), whose inverse gives the asymptotic variance matrix V(\hat\theta), is approximated by the observed information matrix J_{\hat\theta}(Y):

    V(\hat\theta)^{-1} = I(\hat\theta) \approx J_{\hat\theta}(Y) = - \Big[ \frac{\partial^2 \log L(\theta)}{\partial \theta \, \partial \theta^T} \Big]_{\theta = \hat\theta}.

By using the following relation, obtained in [9],

    observed information = complete information - missing information,

we have

    I(\hat\theta) \approx J_{\hat\theta}(Y) = \Big[ - \frac{\partial^2 Q(\theta | \theta^{(r)})}{\partial \theta \, \partial \theta^T} - \mathrm{Var}\Big( \frac{\partial l(\theta)}{\partial \theta} \Big) \Big]_{\theta = \hat\theta},

where Var(.) is the conditional variance given (y, X)_{obs}, r, s and \theta^{(r)}; the first term is the conditional expectation of the complete-data information and the second is the missing information. The details are provided in [6].
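The relation above suggests a direct Monte Carlo approximation: average the complete-data information over the imputed data sets and subtract the sample covariance of the complete-data scores. A generic sketch; score_fn and info_fn are placeholders standing in for the derivatives of this paper's l(\theta):

# Monte Carlo version of "observed = complete - missing" information.
import numpy as np

def observed_information(theta_hat, score_fn, info_fn, imputations):
    """score_fn(theta, data) -> complete-data score vector;
    info_fn(theta, data) -> complete-data information matrix (-Hessian);
    imputations -> list of complete data sets drawn in the E-step."""
    scores = np.array([score_fn(theta_hat, d) for d in imputations])
    infos = np.array([info_fn(theta_hat, d) for d in imputations])
    # averaged complete information minus the covariance of the scores
    return infos.mean(axis=0) - np.cov(scores, rowvar=False)

def standard_errors(J):
    # asymptotic standard errors from the inverse observed information
    return np.sqrt(np.diag(np.linalg.inv(J)))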

ACKNOWLEDGEMENTS. The authors acknowledge the financial support of the Foundation for Distinguished Young Scholars of Henan Province (084100510013).

References

[1] S. G. Baker and N. M. Laird, Regression analysis for categorical variables with outcome subject to nonignorable nonresponse, J. Am. Stat. Assoc., 83 (1988), 62-69.

[2] A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. B, 39 (1977), 1-38.

[3] J. G. Ibrahim and S. R. Lipsitz, Missing covariates in generalized linear models when the missing data mechanism is non-ignorable, J. Royal Stat. Soc. B, 61 (1999), 173-190.

[4] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, New York: Wiley, 2002.

[5] J. Mao and A. K. Jain, A self-organizing network for hyperellipsoidal clustering, IEEE Trans. Neural Networks, 7 (1996), no. 1, 16-29.

[6] J. S. Park, G. Q. Qian and Y. Jun, Monte Carlo EM algorithm in logistic linear models involving non-ignorable missing data, Appl. Math. Comput., 197 (2008), 440-450.

[7] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Berlin: Springer, 1999.

[8] M. M. Rueda, S. Gonzalez and A. Arcos, Indirect methods of imputation of missing data based on available units, Appl. Math. Comput., 164 (2005), 249-261.

[9] Y. G. Smirlis and D. K. Despotis, Data envelopment analysis with missing values: An interval DEA approach, Appl. Math. Comput., 177 (2006), 1-10.

Received: February 2009