On the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture with Unequal Variance

Similar documents
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Comparing large datasets structures through unsupervised learning

Portfolio Optimization within Mixture of Distributions

A hidden Markov model for criminal behaviour classification

Time Series Analysis

The Expectation Maximization Algorithm A short tutorial

Statistics Graduate Courses

Generalized Linear Mixed Models via Monte Carlo Likelihood Approximation Short Title: Monte Carlo Likelihood Approximation

Bootstrapping Big Data

Item selection by latent class-based methods: an application to nursing homes evaluation

Least Squares Estimation

Hypothesis Testing for Beginners

Multivariate Normal Distribution

Likelihood Approaches for Trial Designs in Early Phase Oncology

Generating Random Samples from the Generalized Pareto Mixture Model

An extension of the factoring likelihood approach for non-monotone missing data

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Highly Efficient Incremental Estimation of Gaussian Mixture Models for Online Data Stream Clustering

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Java Modules for Time Series Analysis

Revenue Management with Correlated Demand Forecasting

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

The Variability of P-Values. Summary

H INVESTIGATING THE BENEFITS OF INFORMATION MANAGEMENT SYSTEMS FOR HAZARD MANAGEMENT

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Nonparametric adaptive age replacement with a one-cycle criterion

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

Detection of changes in variance using binary segmentation and optimal partitioning

Orthogonal Distance Regression

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.

Treatment of Incomplete Data in the Field of Operational Risk: The Effects on Parameter Estimates, EL and UL Figures

APPLIED MISSING DATA ANALYSIS

How To Check For Differences In The One Way Anova

Extreme Value Modeling for Detection and Attribution of Climate Extremes

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 12: June 22, Abstract. Review session.

Monte Carlo Simulation

A deeper analysis indicates, at least in qualitative

MATHEMATICAL METHODS OF STATISTICS

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Financial TIme Series Analysis: Part II

Statistical Machine Translation: IBM Models 1 and 2

Diagnosis of Students Online Learning Portfolios

Monte Carlo Methods in Finance

Independent t- Test (Comparing Two Means)

Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory

How to get accurate sample size and power with nquery Advisor R

Model-Based Cluster Analysis for Web Users Sessions

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Note on the EM Algorithm in Linear Regression Model

Mixtures of Robust Probabilistic Principal Component Analyzers

Comparison of Estimation Methods for Complex Survey Data Analysis

Monte Carlo testing with Big Data

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma

Understanding the Impact of Weights Constraints in Portfolio Theory

Gerry Hobbs, Department of Statistics, West Virginia University

Support Vector Machines with Clustering for Training with Very Large Datasets

Sampling and Subsampling for Cluster Analysis in Data Mining: With Applications to Sky Survey Data

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Parametric fractional imputation for missing data analysis

A study on the bi-aspect procedure with location and scale parameters

Sample Size and Power in Clinical Trials

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Bandwidth Selection for Nonparametric Distribution Estimation

The Probit Link Function in Generalized Linear Models for Data Mining Applications

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Dirichlet Processes A gentle tutorial

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

Linear Models for Classification

Simple Linear Regression Inference

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Bayesian Statistics in One Hour. Patrick Lam

171:290 Model Selection Lecture II: The Akaike Information Criterion

SAS Certificate Applied Statistics and SAS Programming

Inference on the parameters of the Weibull distribution using records

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

Impact of Remote Control Failure on Power System Restoration Time

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

The CUSUM algorithm a small review. Pierre Granjon

Monte Carlo analysis used for Contingency estimating.

Performance Optimization of I-4 I 4 Gasoline Engine with Variable Valve Timing Using WAVE/iSIGHT

More details on the inputs, functionality, and output can be found below.

Robust Inferences from Random Clustered Samples: Applications Using Data from the Panel Survey of Income Dynamics

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models

Time Series Analysis III

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course

A comparison of various clustering methods and algorithms in data mining

Fairfield Public Schools

Nonlinear Regression:

The Sample Overlap Problem for Systematic Sampling

An Introduction to Basic Statistics and Probability

1 Teaching notes on GMM 1.

Transcription:

On the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture with Unequal Variance by Z.D. Feng Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1124 Columbia Street, MP72 Seattle, W A 9814 C.E. McCulloch Biometrics Unit and Statistics Center Cornell University Ithaca, NY 14853 BU-111-MA August 199 Revised February 1994

ON THE LIKELIHOOD RATIO TEST STATISTIC FOR THE NUMBER OF COMPONENTS IN A NORMAL MIXTURE WITH UNEQUAL VARIANCE Z.D. Feng Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1124 Columbia Street, MP72 Seattle, WA 9814 C.E. McCulloch Biometrics Unit and Statistics Center Cornell University Ithaca, New York, 14853 February 1994 Abstract An important but difficult problem in practice is to determine the number of components in a mixture normal model with unequal variances. When the likelihood ratio test statistic -2log).. is used, it is unbounded above and fails to satisfy standard regularity conditions. A restricted maximization procedure must therefore be used which makes the procedure ad hoc. A consequence of this may explain the discrepancies among the simulation results of previous investigations. Key words: Finite normal mixture; EM algorithm; Likelihood ratio test; Bootstrap; Singularity.

1 INTRODUCTION Let X I,, Xn be a random sample of size n from a distribution F(x, } with density or mass function f(x, B). A finite mixture density has the form k f(x, B) = L 7rjfj(x, '1/Jj) j=i where '1/Jj is the m-dimensonal parameter vector for component probability density function fj, 7rj is the mixing probability for the component j with restrictions I:j=I 7rj = 1 and 7rj ~, and B = (1r1,,'Irk, '1/JI., '1/Jk) with dimension p = k- 1 + km. The number of components, k, may be known or unknown. For testing Ho : k 1 component mixture versus HI : k component mixture, where k 1 < k, the likelihood ratio statistic, -2log>.., does not have the usual chi-squared asymptotic distribution because: 1) under Ho, the true parameter Oo is on the boundary of the parameter space, 2) the distribution under Ho are not identifiable. Wolfe (1971) performed a small scale simulation to investigate the limiting distribution of -2log>.. and suggested that -~(n - 1 - m - ~)log>.. has (1) approximate limiting distribution X~m(k-k') where m is the dimension of '1/Jj and n is the sample SIZe. The testing of Ho: f(x,'i/j) = N(JL,G"2) HI: f(x,'i/j) = 7rN(JLI,G"i) + (1-7r)N(JLz,G"~) (2) (3) using the generalized likelihood ratio test has been found to have additional difficulties. In searching for the maximum likelihood estimator under HI, if we let ili equal any observation in the sample and let ur approach z~ro, then the likelihood is unbounded above and therefore the global maximum does not exist. This was first noticed by Kiefer and Wolfowitz (1956). Based on simulation, McLachlan (1987) suggested that x~ seemed to fit better as an approximation to the distribution of -2log>.. than x~ as suggested by Wolfe (1971). Hathaway (1985) suggested using the restriction mini,j(g"i/g"j) ~ c > to increase the chance of reliable convergence and claimed that by choosing a suitable c (satisfied by the true parameter) then there exists a consistent global maximum. McLachlan and Basford (1988) suggested restricting each component to have at least two observations to avoid the same difficult. However, their approach is only justifiable in estimation problems when we know that two components exist but it is not justified in the 1

testing problem. Both ideas essentially avoid putting a probability mass at a point. While we do not object to these approaches, there seems to be no mention of the dependency of the distribution of the test statistic on the choice of the criterion to avoid the above difficulty. We illustrate this point by a simulation. 2 Simulation Results and Conclusions We used 5 simulations with sample size 1 under the null hypothesis, N(O, 4). Figures 1, 2, and 3 are the simulated cumulative distributions of -2log).. using the EM algorithm (Dempster, Laird and Rubin, 1977), to compute the maximum likelihood estimator from incomplete data using three different criteria. The EM algorithm has been found to be useful to find the maximum likelihood estimator in mixture models (Redner and Walker, 1984). This algorithm was programmed in GAUSS, a mathematical programming language on the IBM PC. Since there are multiple maxima and a single starting point might converge to a local maxima, a grid of27 different starting points are used and the biggest maximum is chosen. We used grid of1r1 = (.1,.3,.5),fll = (-1,, 1),crf = (1,2,4), and fix (fl2, cr~) = (1, 4) as the starting points. Our experience indicates that using 27 starting points is computationally feasible and greatly improves the chance of finding the largest of the local maxima. Further increasing the number of starting points does not lead to much improvement. We compared three different criteria using a simple but equivalent one to the approach of Hathaway (1985): min(crf, cr~) 2: 1-6, 1-1 and 1-2 respectively. The simulation indicates that when the restriction is the most stringent (in the case of min(crf, cr ) 2: 1-6 ), the simulated distribution of -2log).. lies on the left of the cumulative distribution of xg. It is actually between x~ and xg. When the restriction is less stringent (as in the case of min(crf, cr~) 2: 1-1), it lies quite evenly between xg and X~ In the least stringent case of min(crf, o-~) 2: 1-2, it is closer to X~ In the upper tail which represents the cases where the maximum is located near the sigularity point, i.e., one of the data points, it becomes much bigger than the upper tail of X~ Brooks and Morgan (1993) used a maximization techique based on simulated annealing in the normal mixture problem and reported results which also differ from the McLachlan distributions. (FIGURES 1, 2, 3 ABOUT HERE) The above results indicate that when a restriction is imposed to avoid the unboundedness of the likelihood, the estimation process become ad hoc and therefore the associated asymptotic distribution of -2log).. also depends on the criterion of the restriction. This might explain the discrepancies 2

of the simulated asymptotic distribution of -2log:>... This also implies that the bootstrap procedure is a good candidate for this situation. Any reasonable criterion can be chosen to get the 'maximum likelihood estimator' and then bootstrap samples can be generated using the same criterion to compute -2log:>... Since the bootstrap distribution mimics the underlying distribution of -2log).. computed using the criterion, we do not need to worry about the correctness of the asymptotic distribution of -2log:>... This does, however, raise the interesting issue of power comparisions. For example, using the criteria of 1-6, 1-1, and 1-2 on the variance estimates and an alternative where (1fl,J.Ll,JL2,uf,u~) was (.5, 1.5, -1.5, 1,2.5), the values of the power were respectively as.68,. 7 and.5 for the likelihood ratio test with size.5 from 5 simulations. The configuration here is chosen to provide (JL, a- 2) = (, 4) under null hypothesis. The simulated vlaues of power corresponding to size.1 are.82,. 78, and. 7 respectively. To sum up, the comparison of different authors' results on the distribution of the likelihood ratio statistic for testing the number of components in a normal mixture with unequal variances is not very meaningful without explicit identification of the criteria used for computing. ACKNOWLEDGEMENT The authors would like to thank the referees and editor Morgan for their helpful comments. REFERENCES Brooks, S.P. and Morgan, B.J. T. (1993). Optimisation using simulated annealing. Manuscript, Institute of Mathematics and Statistics, University of Kent. Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38. Edlefsen, L.E., and Jones, S.D. (1988). GAUSS version 2.. Aptch System, Inc. Kent, WA. Hathaway, R.J. (1985). A contrained formulation of maximum-likelihood estimation for normal mixture distributions. Annals of Statistics, 9, 795-8. McLachlan, G.J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. A class of statistics with asymptotically normal distribution. Applied Statistics, 36, 318-324. McLachlan, G.J. and Basford, K.E. (1988). Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York. Redner, RA. and Walker, H.F. (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195-235. 3

Wolfe, J.H. (1971). A Monte Carlo study of the sampling distribution of the likelihood ratio for mixtures of multinomal distributions. Technical Bulletin STB 72-2, U.S. Naval Personnel and Training Research Laboratory, San Diego. 4

Figure 1. Simulated cumulative distribution function of the likelihood ratio statistic and x~ and xg distributions in a test of Ho : N(p,, a- 2 ) vs H 1 : 1rN(p, 1, a-f)+ (1-1r)N(p,2, a-~) when Ho is true. The likelihood ratio statistic is based on the maximum likelihood estimator. (sample size 1, 5 replications, variance ~ w-6). Figure 2. Simulated cumulative distribution function of the likelihood ratio statistic and xg and x~ distributions in a test of Ho : N(p,, u 2 ) vs H 1 : 1r N (1-tb u?} + (1-1r )N(p,2, ui) when H o is true. The likelihood ratio statistic is based on the maximum likelihood estimator. (sample size 1, 5 replications, variance ~ w-1). Figure 3. Simulated cumulative distribution function of the likelihood ratio statistic and xg and x~ distributions in a test of Ho : N(p,, u 2) vs H 1 : 1r N (1-tb uf} +(I - 1r )N(p,2, ui) when H o is true. The likelihood ratio statistic is based on the maximum likelihood estimator. (sample size 1, 5 replications, variance ~ w-2). 1

cumulative distribution..1.2.3.4.5.6.7.8.9 1. 1.1,, -' W") ~

cumulative distribution..1.2.3.4.5.6.7.8.9 1. 1.1 A" (!) ::Y Q_ \ rt- -,. N............ "',. ~ <J) >< t-' >< J (b t-'... ::T. "1 ~... N J I

cumulative distribution..1.2.3.4.5.6.7.8.9 1. 1.1 ~ U1 N A"" (l).\ "i l ::J N U1 Q_....-+ (.,J l ><... ><............ 1... ::r (.,J ()1.. '1 ~.....f>. X" CD... l}