Mode-Finding of Gaussian Mixtures


1 Mode-Finding of Gaussian Mixtures
Seppo Pulkkinen
University of Turku
January 13, 2012

2 Outline
1. Introduction: Gaussian Mixtures; Kernel Density Estimation
2. Practical Applications: Visual Tracking
3. Overview of the Algorithm: Continuation via the Gaussian Convolution; Trust Region-Based Predictor-Corrector Method; Choice of Initial Values
4. Numerical Results
5. Software Implementation

3 Basic Definitions and Motivation: Gaussian Mixtures
A d-dimensional Gaussian mixture is a weighted sum of the probability densities of n random variables $X_i \sim N_d(\mu_i, \Sigma_i)$ with mean $\mu_i \in \mathbb{R}^d$ and covariance matrix $\Sigma_i \in \mathbb{R}^{d \times d}_+$. That is,
$$p(x) = \sum_{i=1}^{n} w_i\, g(x; \mu_i, \Sigma_i), \qquad \sum_{i=1}^{n} w_i = 1,$$
where
$$g(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
denotes the normal density and $w_i > 0$ are weighting coefficients.
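
As a minimal sketch of how the definition above can be evaluated directly, the following pure-Python code uses a Cholesky factorization $\Sigma = LL^T$, so that $|\Sigma|^{1/2} = \prod_j L_{jj}$ and the quadratic form equals $\|z\|^2$ with $Lz = x - \mu$. The function names `cholesky`, `normal_pdf` and `gm_pdf` are illustrative only (not taken from the authors' software), and every $\Sigma_i$ is assumed to be symmetric positive definite.

```python
import math


def cholesky(S):
    """Return lower-triangular L with S = L L^T (S symmetric positive definite)."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def normal_pdf(x, mu, S):
    """Multivariate normal density g(x; mu, S)."""
    d = len(x)
    L = cholesky(S)
    # Forward substitution: solve L z = x - mu, so z^T z = (x - mu)^T S^{-1} (x - mu).
    z = []
    for i in range(d):
        z.append((x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    sqrt_det = math.prod(L[i][i] for i in range(d))  # |S|^{1/2}
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (d / 2) * sqrt_det)


def gm_pdf(x, weights, means, covs):
    """Gaussian mixture p(x) = sum_i w_i g(x; mu_i, Sigma_i)."""
    return sum(w * normal_pdf(x, m, S) for w, m, S in zip(weights, means, covs))


# Two bivariate components, one with a non-diagonal covariance matrix.
print(gm_pdf([0.5, 0.0], [0.4, 0.6], [[0.0, 0.0], [1.0, 1.0]],
             [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0], [1.0, 2.0]]]))
```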

4 Basic Definitions and Motivation: Kernel Density Estimation
A kernel density estimate (KDE) is constructed from a finite set of samples $y_1, \dots, y_n$ drawn from some unknown distribution. A KDE is an equally weighted sum of kernel functions centered at the samples:
$$\hat p_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(\|x - y_i\|).$$
The kernel function $K_h : \mathbb{R}_+ \to \mathbb{R}_+$ satisfies the condition
$$\int_{\mathbb{R}^d} K_h(\|x\|)\, dx = 1,$$
and $h > 0$ is the kernel bandwidth.
- The choice of the kernel bandwidth $h$ is critical for the accuracy of the estimate.
- The required number of samples is large and increases rapidly with dimension. However, kernel density estimation is a viable approach in low dimensions.
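
To illustrate why the bandwidth matters, here is a small 1-D sketch with a Gaussian kernel. The sample values, the grid and the helper names are made up for illustration, and counting strict local maxima on a grid is only a crude proxy for the number of modes.

```python
import math


def gaussian_kernel_1d(r, h):
    """Gaussian kernel K_h(r) for d = 1; it integrates to 1 over the real line."""
    return math.exp(-r * r / (2 * h * h)) / (math.sqrt(2 * math.pi) * h)


def kde(x, samples, h, kernel=gaussian_kernel_1d):
    """hat p_h(x) = (1/n) * sum_i K_h(|x - y_i|)."""
    return sum(kernel(abs(x - y), h) for y in samples) / len(samples)


def count_grid_modes(samples, h, lo, hi, steps=2000):
    """Crude mode count: strict local maxima of the KDE on a uniform grid."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ps = [kde(x, samples, h) for x in xs]
    return sum(1 for k in range(1, steps) if ps[k - 1] < ps[k] > ps[k + 1])


samples = [0.0, 0.3, 1.1, 2.0, 2.5, 4.5, 5.0]
for h in (0.1, 0.5, 6.0):
    print(f"h = {h}: {count_grid_modes(samples, h, -5.0, 10.0)} mode(s) on the grid")
```

With these samples, the smallest bandwidth gives one mode per sample, while a bandwidth larger than the spread of the samples yields a single mode.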

5 Basic Definitions and Motivation: Density Estimation with Gaussian Kernels
The KDE induced by the Gaussian kernel
$$K_h(r) = \frac{1}{(2\pi)^{d/2} h^d} \exp\left(-\frac{r^2}{2h^2}\right)$$
is a special case of a Gaussian mixture: it has $n$ components with weights $w_i = 1/n$, means $\mu_i = y_i$ and covariance matrices $\Sigma_i = h^2 I$.
[Figures: a Gaussian mixture, its KDE and histogram; a Gaussian KDE and its basis functions.]
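
The correspondence can be checked numerically: a Gaussian KDE evaluated directly coincides with an isotropic Gaussian mixture with $w_i = 1/n$, $\mu_i = y_i$ and $\sigma_i = h$. A sketch in pure Python, with hypothetical function names and data:

```python
import math


def isotropic_gm_pdf(x, weights, means, sigmas):
    """Isotropic Gaussian mixture with Sigma_i = sigma_i^2 I in d dimensions."""
    d = len(x)
    total = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        sq = sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
        total += w / s ** d * math.exp(-sq / (2 * s * s))
    return total / (2 * math.pi) ** (d / 2)


def gaussian_kde(x, samples, h):
    """hat p_h(x) = (1/n) sum_i K_h(||x - y_i||) with the Gaussian kernel K_h."""
    d = len(x)
    norm = (2 * math.pi) ** (d / 2) * h ** d
    total = 0.0
    for y in samples:
        r2 = sum((xj - yj) ** 2 for xj, yj in zip(x, y))
        total += math.exp(-r2 / (2 * h * h)) / norm
    return total / len(samples)


samples = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]]
h = 0.7
n = len(samples)
x = [0.2, 0.4]
# Same density: weights 1/n, means y_i, common standard deviation h.
print(gaussian_kde(x, samples, h), isotropic_gm_pdf(x, [1 / n] * n, samples, [h] * n))
```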

6 Basic Definitions and Motivation: Mode-Finding of Gaussian Mixtures and KDEs
Brute-force algorithm for locating all modes of a GM: start a local optimization method at each centroid $\mu_i$.
- This is not computationally feasible when the number of mixture components is large (e.g. when seeking the modes of a KDE).
Numerical results imply that:
- For homoscedastic GMs ($\Sigma_i = \sigma^2 I$ for all $i$): the algorithm locates all modes (except in some pathological cases).
- For isotropic GMs ($\Sigma_i = \sigma_i^2 I$): the algorithm usually works, but it is possible to construct an example where it fails to locate all modes.
- For nonisotropic GMs: the algorithm generally fails to locate all modes.
M. Á. Carreira-Perpiñán (2000). Mode-finding for mixtures of Gaussian distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(11).
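
A minimal 1-D sketch of the brute-force strategy follows. The local method here is gradient ascent with Armijo backtracking, chosen only for simplicity (it is not necessarily the local method used in the cited study), and distinct end points are merged with an assumed tolerance.

```python
import math


def gm1d(x, w, mu, s):
    """1-D Gaussian mixture density."""
    return sum(wi * math.exp(-(x - mi) ** 2 / (2 * si * si)) / (math.sqrt(2 * math.pi) * si)
               for wi, mi, si in zip(w, mu, s))


def gm1d_grad(x, w, mu, s):
    """Derivative p'(x) = sum_i w_i g_i(x) (mu_i - x) / s_i^2."""
    return sum(wi * (mi - x) / (si * si)
               * math.exp(-(x - mi) ** 2 / (2 * si * si)) / (math.sqrt(2 * math.pi) * si)
               for wi, mi, si in zip(w, mu, s))


def local_ascent(x, w, mu, s, tol=1e-10, max_iter=10000):
    """Gradient ascent with Armijo backtracking (one possible local method)."""
    for _ in range(max_iter):
        g = gm1d_grad(x, w, mu, s)
        if abs(g) < tol:
            break
        fx, t = gm1d(x, w, mu, s), 1.0
        # Halve the step until the sufficient-increase condition holds.
        while gm1d(x + t * g, w, mu, s) < fx + 1e-4 * t * g * g and t > 1e-16:
            t *= 0.5
        x += t * g
    return x


def brute_force_modes(w, mu, s, merge_tol=1e-4):
    """Start the local method at every centroid and keep the distinct end points."""
    modes = []
    for m in mu:
        x = local_ascent(m, w, mu, s)
        if all(abs(x - y) > merge_tol for y in modes):
            modes.append(x)
    return sorted(modes)


print(brute_force_modes([1 / 3] * 3, [0.0, 5.0, 10.0], [1.0] * 3))
print(brute_force_modes([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]))
```

The cost is one local optimization per component, which is what makes this approach impractical for KDEs with many samples.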

7 Basic Definitions and Motivation: Finding the Global Mode of a Gaussian Mixture or KDE
In some applications, we only need to find the global mode.
- A Gaussian mixture, and especially a KDE, can be highly multimodal with a large number of nonrelevant modes.
- Exhaustive mode-finding is computationally expensive if the number of Gaussians is large.
[Figures: a bivariate Gaussian mixture; KDE of the Gaussian mixture.]
For seeking the global mode, we need an algorithm that
- does not get trapped in irrelevant local maxima, and
- converges rapidly to the global mode.

8 Practical Applications: Real-Time Object Tracking
[Figures: original image; Gaussian mixture model.]
Assume that a Gaussian mixture model (or KDE) gives the probability density of the specified target being at a given location. The target can then be located by finding the global mode of the density. Exhaustive mode-finding might not be feasible, since we are dealing with a real-time application.

9 Mode-Finding Algorithm: Global Mode-Finding Algorithm
We have developed an algorithm for finding global modes (maxima) of isotropic Gaussian mixtures and KDEs. The isotropic Gaussian mixture (i.e. a GM with $\Sigma_i = \sigma_i^2 I$) is
$$p(x) = \frac{1}{(2\pi)^{d/2}} \sum_{i=1}^{n} \frac{w_i}{\sigma_i^d} \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma_i^2}\right).$$
The algorithm is more robust and efficient than, for instance, the classical mean-shift algorithm for mode-finding.
S. Pulkkinen, M. M. Mäkelä and N. Karmitsa (2012). A continuation approach to mode-finding of multivariate Gaussian mixtures and kernel density estimates. Journal of Global Optimization, to appear.
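
Newton-type mode-finding needs the gradient and Hessian of the isotropic mixture. Differentiating the formula above gives $\nabla p(x) = \sum_i c_i e_i(x)(\mu_i - x)/\sigma_i^2$ and $\nabla^2 p(x) = \sum_i c_i e_i(x)\,[(x-\mu_i)(x-\mu_i)^T/\sigma_i^4 - I/\sigma_i^2]$, where $c_i = w_i/((2\pi)^{d/2}\sigma_i^d)$ and $e_i(x) = \exp(-\|x-\mu_i\|^2/(2\sigma_i^2))$. The following sketch evaluates both; it is an illustration with a hypothetical function name and test mixture, not the authors' Fortran routines.

```python
import math


def isotropic_gm_derivatives(x, weights, means, sigmas):
    """Return p(x), grad p(x) and the Hessian of an isotropic Gaussian mixture."""
    d = len(x)
    p = 0.0
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for w, mu, s in zip(weights, means, sigmas):
        diff = [xj - mj for xj, mj in zip(x, mu)]
        # c_i * e_i(x) for this component
        e = w / ((2 * math.pi) ** (d / 2) * s ** d) * math.exp(-sum(t * t for t in diff) / (2 * s * s))
        p += e
        for a in range(d):
            grad[a] -= e * diff[a] / s ** 2
            for b in range(d):
                delta = 1.0 if a == b else 0.0
                hess[a][b] += e * (diff[a] * diff[b] / s ** 4 - delta / s ** 2)
    return p, grad, hess


w, mus, sig = [0.3, 0.7], [[0.0, 0.0], [2.0, 1.0]], [0.8, 1.5]
x = [0.7, -0.3]
p, g, H = isotropic_gm_derivatives(x, w, mus, sig)
print(p, g, H)
```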

10 Mode-Finding Algorithm: Previous Approaches (The Mean-Shift Algorithm)
The Gaussian mean shift (Cheng, Comaniciu et al.) is obtained from the condition $\nabla p(x) = 0$. The update formula is a fixed-point iteration:
$$x_{k+1} = \frac{\sum_{i=1}^{n} \frac{w_i}{\sigma_i^{d+2}} \exp\left(-\frac{\|x_k - \mu_i\|^2}{2\sigma_i^2}\right) \mu_i}{\sum_{i=1}^{n} \frac{w_i}{\sigma_i^{d+2}} \exp\left(-\frac{\|x_k - \mu_i\|^2}{2\sigma_i^2}\right)}.$$
This method has only linear convergence, and the result depends on the choice of the starting point. In principle, a mean-shift iteration could be started from a large number of starting points, but this might not be computationally feasible for many practical applications.
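
A sketch of the update above in pure Python (the function name and test mixture are illustrative). For two equally weighted unit-variance components at 0 and 1 in one dimension, the update reduces to $x_{k+1} = 1/(1 + e^{-(x_k - 1/2)})$, whose derivative at the mode $x = 1/2$ is $1/4$; the error therefore shrinks by a factor of about 4 per iteration, which is the linear convergence mentioned above.

```python
import math


def mean_shift(x0, weights, means, sigmas, tol=1e-12, max_iter=1000):
    """Gaussian mean-shift for an isotropic GM; returns the list of iterates."""
    d = len(x0)
    xs = [list(x0)]
    for _ in range(max_iter):
        x = xs[-1]
        num = [0.0] * d
        den = 0.0
        for w, mu, s in zip(weights, means, sigmas):
            sq = sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
            # Weight w_i / s_i^(d+2) * exp(-||x - mu_i||^2 / (2 s_i^2)).
            a = w / s ** (d + 2) * math.exp(-sq / (2 * s * s))
            den += a
            for j in range(d):
                num[j] += a * mu[j]
        x_new = [nj / den for nj in num]
        xs.append(x_new)
        if math.dist(x_new, x) < tol:
            break
    return xs


xs = mean_shift([0.0], [0.5, 0.5], [[0.0], [1.0]], [1.0, 1.0])
errors = [abs(x[0] - 0.5) for x in xs]
print(len(xs) - 1, "iterations; error ratios:", [errors[k + 1] / errors[k] for k in range(5)])
```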

11 Mode-Finding Algorithm: The Continuation Principle
Use a parametrized transformation $p_\gamma$ such that
- $\lim_{\gamma \to 0} p_\gamma = p$, and
- $p_{\gamma_0}$ has a unique maximizer for some $\gamma_0 > 0$.
Gradually transform the original Gaussian mixture $p$ into $p_{\gamma_0}$. Then follow the maximizers of the transformed mixtures back to the original mixture as $\gamma \to 0$.
This continuation approach effectively skips undesired local maxima.
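
The principle can be sketched in one dimension as follows. This is a deliberately simplified variant, not the authors' method: it assumes the closed-form Gaussian convolution (each component variance $\sigma_i^2$ becomes $\sigma_i^2 + \gamma^2/2$), a fixed uniformly decreasing schedule for $\gamma$ with no predictor step, a safeguarded Newton ascent at each level, and the arbitrary values $\gamma_0 = 10$ and 50 levels. In the example, a plain local search started at 0 stays at the minor mode, whereas the continuation ends at the global mode near 6.

```python
import math


def p_gamma_derivs(x, gamma, w, mu, s):
    """p_gamma(x) and its first two derivatives for a 1-D mixture.

    Assumes the closed form: N(mu_i, s_i^2) becomes N(mu_i, s_i^2 + gamma^2 / 2).
    """
    p = d1 = d2 = 0.0
    for wi, mi, si in zip(w, mu, s):
        v = si * si + 0.5 * gamma * gamma
        u = (x - mi) / v
        g = wi * math.exp(-0.5 * (x - mi) ** 2 / v) / math.sqrt(2 * math.pi * v)
        p += g
        d1 -= u * g
        d2 += (u * u - 1.0 / v) * g
    return p, d1, d2


def local_max(x, gamma, w, mu, s, radius=1.0, tol=1e-10, max_iter=500):
    """Safeguarded Newton ascent: Newton step where p'' < 0, else an uphill step;
    steps are clipped to |step| <= radius, which is halved after a failed step."""
    for _ in range(max_iter):
        p, d1, d2 = p_gamma_derivs(x, gamma, w, mu, s)
        if abs(d1) < tol:
            break
        step = -d1 / d2 if d2 < 0 else math.copysign(radius, d1)
        step = max(-radius, min(radius, step))
        if p_gamma_derivs(x + step, gamma, w, mu, s)[0] >= p:
            x += step
        else:
            radius *= 0.5
    return x


def continuation_mode(w, mu, s, gamma0=10.0, levels=50):
    """Follow the maximizer of p_gamma from gamma0 down to gamma = 0."""
    x = sum(wi * mi for wi, mi in zip(w, mu))  # start from the mixture mean
    for k in range(levels + 1):
        gamma = gamma0 * (1 - k / levels)  # fixed, uniformly decreasing schedule
        x = local_max(x, gamma, w, mu, s)  # warm start from the previous level
    return x


w, mu, s = [0.3, 0.7], [0.0, 6.0], [1.0, 1.0]
print(local_max(0.0, 0.0, w, mu, s))  # plain local search from 0
print(continuation_mode(w, mu, s))    # continuation from gamma0 = 10
```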

12 Mode-Finding Algorithm: The Gaussian Convolution
Definition. The Gaussian convolution of a Gaussian mixture $p$ is
$$p_\gamma(x) = C_\gamma \int_{\mathbb{R}^d} p(y) \exp\left(-\frac{\|y - x\|^2}{\gamma^2}\right) dy,$$
where $C_\gamma = \left(\frac{1}{\sqrt{\pi}\,\gamma}\right)^d$ is a normalization factor.
- Larger values of $\gamma$ give a smoother mixture.
- The transformation produces the original mixture as $\gamma \to 0$.
- The Gaussian convolution of a Gaussian mixture has a closed-form expression: for an isotropic mixture, $p_\gamma$ is again an isotropic Gaussian mixture in which each $\sigma_i^2$ is replaced by $\sigma_i^2 + \gamma^2/2$.
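
For one-dimensional components $N(\mu_i, \sigma_i^2)$, the kernel $C_\gamma \exp(-(y-x)^2/\gamma^2)$ is a normal density with variance $\gamma^2/2$, so the convolution replaces each $\sigma_i^2$ by $\sigma_i^2 + \gamma^2/2$. The sketch below compares this closed form against a midpoint-rule evaluation of the defining integral; the helper names, integration range and number of nodes are assumptions chosen for illustration.

```python
import math


def gm1d_pdf(x, w, mu, var):
    """1-D Gaussian mixture with component variances var_i."""
    return sum(wi * math.exp(-0.5 * (x - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
               for wi, mi, vi in zip(w, mu, var))


def convolved_closed_form(x, gamma, w, mu, s):
    """p_gamma(x): each variance s_i^2 is replaced by s_i^2 + gamma^2 / 2."""
    return gm1d_pdf(x, w, mu, [si * si + 0.5 * gamma * gamma for si in s])


def convolved_quadrature(x, gamma, w, mu, s, lo=-40.0, hi=40.0, n=20000):
    """C_gamma * integral of p(y) exp(-(y - x)^2 / gamma^2) dy by the midpoint rule,
    with C_gamma = 1 / (sqrt(pi) * gamma) for d = 1 (requires gamma > 0)."""
    c = 1.0 / (math.sqrt(math.pi) * gamma)
    step = (hi - lo) / n
    var = [si * si for si in s]
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * step
        total += gm1d_pdf(y, w, mu, var) * math.exp(-((y - x) / gamma) ** 2)
    return c * total * step


w, mu, s = [0.3, 0.7], [0.0, 6.0], [1.0, 1.0]
for gamma in (0.5, 2.0, 5.0):
    print(gamma, convolved_closed_form(3.0, gamma, w, mu, s),
          convolved_quadrature(3.0, gamma, w, mu, s))
```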

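13 Mode-Finding Algorithm: Differential Equation Formulation
Define the homotopy mapping
$$h(x, \gamma) = p_\gamma(x), \qquad h(x, 0) = p(x).$$
Impose the optimality conditions
$$\nabla_x h(x(\gamma), \gamma) = 0, \qquad \nabla_x^2 h(x(\gamma), \gamma) \text{ is negative definite}, \qquad \gamma \in [0, \gamma_0],\ \gamma_0 > 0.$$
Differentiating the first condition with respect to $\gamma$ gives $\nabla_x^2 h(x(\gamma), \gamma)\, x'(\gamma) + \frac{\partial}{\partial \gamma} \nabla_x h(x(\gamma), \gamma) = 0$. This leads to the initial value problem
$$x'(\gamma) = -\nabla_x^2 h(x(\gamma), \gamma)^{-1}\, \frac{\partial}{\partial \gamma} \nabla_x h(x(\gamma), \gamma), \quad \gamma \in\, ]0, \gamma_0], \qquad x(\gamma_0) = x_0$$
for some $\gamma_0 > 0$ and $x_0 \in \mathbb{R}^d$ satisfying the above optimality conditions.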
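
In one dimension the IVP reads $x'(\gamma) = -\partial_\gamma p_\gamma'(x)/p_\gamma''(x)$. For the closed-form convolution, each component variance is $v_i = \sigma_i^2 + \gamma^2/2$, and a Gaussian satisfies $\partial g/\partial v = \tfrac12 \partial^2 g/\partial x^2$, so $\partial_\gamma p_\gamma' = \tfrac{\gamma}{2} p_\gamma'''$. The following sketch, an illustration under these 1-D assumptions with a hypothetical test mixture, evaluates the tangent and compares it with a central difference of maximizers computed by Newton's method.

```python
import math


def p_gamma_derivs(x, gamma, w, mu, s):
    """p_gamma(x) and its first three x-derivatives for a 1-D mixture
    (closed form: variance s_i^2 becomes v_i = s_i^2 + gamma^2 / 2)."""
    p = d1 = d2 = d3 = 0.0
    for wi, mi, si in zip(w, mu, s):
        v = si * si + 0.5 * gamma * gamma
        u = (x - mi) / v
        g = wi * math.exp(-0.5 * (x - mi) ** 2 / v) / math.sqrt(2 * math.pi * v)
        p += g
        d1 -= u * g
        d2 += (u * u - 1.0 / v) * g
        d3 += (3.0 * u / v - u ** 3) * g
    return p, d1, d2, d3


def tangent(x, gamma, w, mu, s):
    """x'(gamma) = -(d/dgamma p_gamma'(x)) / p_gamma''(x), using the heat-equation
    identity d/dgamma p_gamma'(x) = (gamma / 2) p_gamma'''(x)."""
    _, _, d2, d3 = p_gamma_derivs(x, gamma, w, mu, s)
    return -(0.5 * gamma * d3) / d2


def newton_max(x, gamma, w, mu, s, iters=50):
    """Plain Newton iteration; adequate here because we start near a maximizer."""
    for _ in range(iters):
        _, d1, d2, _ = p_gamma_derivs(x, gamma, w, mu, s)
        x -= d1 / d2
    return x


w, mu, s = [0.3, 0.7], [0.0, 6.0], [1.0, 1.0]
gamma, delta = 3.0, 1e-4
x = newton_max(5.0, gamma, w, mu, s)
t = tangent(x, gamma, w, mu, s)
fd = (newton_max(x, gamma + delta, w, mu, s) - newton_max(x, gamma - delta, w, mu, s)) / (2 * delta)
print(x, t, fd)
```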

14 Implementation and Numerical Results: Predictor-Corrector Method
The initial value problem is a first-order ODE, so in principle a standard IVP solver can be applied. However, its solution curve is highly nonlinear. The proposed approach therefore alternates two steps:
- Linearize the problem, i.e. take a predictor step along the tangent of the solution curve.
- Apply a corrector method to return to the solution curve.
As the corrector, the proposed algorithm uses a robust trust region Newton method with superlinear convergence.
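
A compact 1-D sketch of a predictor-corrector loop follows. It is a simplified stand-in for the proposed method: the $\gamma$ grid is fixed and uniform instead of adaptive, the predictor is an Euler step along the tangent $x'(\gamma)$, and the corrector is a safeguarded Newton ascent with step clipping rather than a full trust region Newton method. The function names, the test mixture and $\gamma_0 = 10$ are assumptions; the helpers are repeated so that the block runs on its own.

```python
import math


def p_gamma_derivs(x, gamma, w, mu, s):
    """p_gamma(x) and its first three x-derivatives for a 1-D mixture
    (closed form: variance s_i^2 becomes s_i^2 + gamma^2 / 2)."""
    p = d1 = d2 = d3 = 0.0
    for wi, mi, si in zip(w, mu, s):
        v = si * si + 0.5 * gamma * gamma
        u = (x - mi) / v
        g = wi * math.exp(-0.5 * (x - mi) ** 2 / v) / math.sqrt(2 * math.pi * v)
        p += g
        d1 -= u * g
        d2 += (u * u - 1.0 / v) * g
        d3 += (3.0 * u / v - u ** 3) * g
    return p, d1, d2, d3


def tangent(x, gamma, w, mu, s):
    """x'(gamma) = -(gamma / 2) p_gamma'''(x) / p_gamma''(x) in 1-D."""
    _, _, d2, d3 = p_gamma_derivs(x, gamma, w, mu, s)
    return -0.5 * gamma * d3 / d2


def corrector(x, gamma, w, mu, s, radius=1.0, tol=1e-10, max_iter=500):
    """Safeguarded Newton ascent; returns (point, number of iterations)."""
    for it in range(max_iter):
        p, d1, d2, _ = p_gamma_derivs(x, gamma, w, mu, s)
        if abs(d1) < tol:
            return x, it
        # Newton step where p'' < 0, otherwise an uphill step of length radius.
        step = -d1 / d2 if d2 < 0 else math.copysign(radius, d1)
        step = max(-radius, min(radius, step))
        if p_gamma_derivs(x + step, gamma, w, mu, s)[0] >= p:
            x += step
        else:
            radius *= 0.5  # reject the step and shrink the allowed step length
    return x, max_iter


def predictor_corrector(w, mu, s, gamma0=10.0, levels=20, use_predictor=True):
    """Trace the maximizer from gamma0 to 0 on a fixed, uniform gamma grid."""
    mean = sum(wi * mi for wi, mi in zip(w, mu))
    x, _ = corrector(mean, gamma0, w, mu, s)
    total_iters = 0
    for k in range(1, levels + 1):
        gamma_prev = gamma0 * (1 - (k - 1) / levels)
        gamma = gamma0 * (1 - k / levels)
        if use_predictor:
            # Euler predictor: step along the tangent of the solution curve.
            x += (gamma - gamma_prev) * tangent(x, gamma_prev, w, mu, s)
        x, iters = corrector(x, gamma, w, mu, s)
        total_iters += iters
    return x, total_iters


w, mu, s = [0.3, 0.7], [0.0, 6.0], [1.0, 1.0]
print(predictor_corrector(w, mu, s))
print(predictor_corrector(w, mu, s, use_predictor=False))
```

Both runs should end at the global mode near 6; the returned iteration totals can be compared to see how much corrector work the predictor saves on a given mixture.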

15 Implementation and Numerical Results: Choice of Starting Point
- The convolved Gaussian mixture $p_\gamma$ is strictly concave in a given d-ball if $\gamma$ is sufficiently large.
- The unique maximizer of $p_\gamma$ converges to the mean $\sum_{i=1}^{n} w_i \mu_i$ of the Gaussian mixture as $\gamma \to \infty$.
- It can be shown that all stationary points of an isotropic GM lie within the convex hull of the means $\mu_i$.
Consequently, by choosing $\gamma_0$ so large that $p_{\gamma_0}$ is strictly concave in a ball containing this convex hull, the starting point $x_0$ for the continuation is uniquely determined.
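
The sketch below illustrates the starting-point argument in 1-D. The helper `sufficient_gamma0` is a hypothetical, simple sufficient bound, not necessarily the authors' condition: since $p_\gamma''(x) = \sum_i (g_i/v_i)\,((x-\mu_i)^2/v_i - 1)$ with $v_i = \sigma_i^2 + \gamma^2/2$, the convolved mixture is strictly concave on the hull $[\min_i \mu_i, \max_i \mu_i]$ as soon as $v_i > D^2$ for all $i$, where $D$ is the width of the hull. On the hull, $p_\gamma'$ is nonnegative at the left end and nonpositive at the right end, so bisection finds the unique maximizer there.

```python
import math


def p_gamma_derivs(x, gamma, w, mu, s):
    """p_gamma, p_gamma' and p_gamma'' for a 1-D mixture whose component
    variances become v_i = s_i^2 + gamma^2 / 2 after the Gaussian convolution."""
    p = d1 = d2 = 0.0
    for wi, mi, si in zip(w, mu, s):
        v = si * si + 0.5 * gamma * gamma
        g = wi * math.exp(-0.5 * (x - mi) ** 2 / v) / math.sqrt(2 * math.pi * v)
        p += g
        d1 += g * (mi - x) / v
        d2 += g / v * ((x - mi) ** 2 / v - 1.0)
    return p, d1, d2


def sufficient_gamma0(mu, s):
    """Hypothetical bound: makes s_i^2 + gamma^2 / 2 > D^2 for every i."""
    D = max(mu) - min(mu)
    return math.sqrt(2.0 * max(0.0, D * D - min(s) ** 2)) + 1e-6


def hull_maximizer(gamma, w, mu, s, iters=200):
    """Bisection for the root of p_gamma' on the hull [min mu, max mu]."""
    lo, hi = min(mu), max(mu)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_gamma_derivs(mid, gamma, w, mu, s)[1] > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w, mu, s = [0.3, 0.7], [0.0, 6.0], [1.0, 1.0]
mean = sum(wi * mi for wi, mi in zip(w, mu))
gamma0 = sufficient_gamma0(mu, s)
for gamma in (gamma0, 100.0, 1000.0):
    print(f"gamma = {gamma:8.3f}: maximizer {hull_maximizer(gamma, w, mu, s):.6f}, mixture mean {mean:.6f}")
```

As $\gamma$ grows, the printed maximizer approaches the mixture mean.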

16 Implementation and Numerical Results: Summary of the Algorithm
Key components of the proposed algorithm:
1. The Gaussian convolution for smoothing the Gaussian mixture.
2. A differential equation describing how the maximizers of the transformed mixtures move as $\gamma$ varies.
3. Conditions for concavity of the transformed mixture. As a result, the starting point can be uniquely determined.
4. A robust trust region-based predictor-corrector method:
- a steplength adaptation strategy;
- a Newton-based corrector with trust regions.

17 Implementation and Numerical Results: Numerical Results
The algorithm was applied to kernel density estimates obtained by sampling from Gaussian mixtures, which were considered as the true densities. We chose the optimal kernel bandwidth. The results imply a high success rate, considering that the number of modes in the KDEs is large due to sampling artifacts.
Table columns (two groups side by side): $p$ | Modes in $p$ | $\hat p_h$ | Modes in $\hat p_h$ | Success
yes 1 37 no 2 35 yes no 3 40 yes 3 40 no 1 24 yes 1 23 no 2 21 yes yes 3 19 yes 3 23 yes 1 36 yes 1 35 no 2 25 yes no 3 31 yes 3 36 no 1 37 yes 1 26 yes 2 27 no yes 3 35 yes 3 27 yes 1 23 yes 1 57 yes 2 26 yes yes 3 24 yes 3 49 yes

18 Implementation and Numerical Results: Implementation
[Figure: software architecture diagram with the labels Python + NumPy + matplotlib, high-level interfaces, FORTRAN + OpenMP, f2py, evaluation, visualization, testing, mode-finding.]
The core routines are implemented in Fortran 95:
- parallelization via OpenMP;
- optimized routines for evaluating a Gaussian mixture and its derivatives;
- mode-finding via the trust region-based Newton method and the continuation method.
Additional Python interfaces use PyLab via f2py.
Future versions will include other methods related to Gaussian mixture analysis:
- finding all modes, not only the global one (clustering);
- finding saddle points (principal curve estimation).

19 Conclusions: Conclusions and Future Research
The proposed algorithm:
- converges to the global mode or to some significant mode of a Gaussian mixture or KDE;
- is insensitive to the choice of starting point;
- is computationally highly efficient.
Topics of future research:
- Mode-finding of anisotropic Gaussian mixtures?
- Finding all modes or saddle points of a Gaussian KDE is a fundamental problem in data analysis. In KDE-based clustering, the modes of the KDE represent clusters. The saddle points of a KDE represent cluster boundaries, and they lie on the principal curves of a dataset.
- Develop an algorithm for exhaustive mode- and saddle point-finding of Gaussian mixtures and KDEs?

20 Other Applications and Problems Related to Gaussian Mixtures: Finding All Modes (Connectedness of Critical Curves)
A point $x \in \mathbb{R}^d$ is on a critical curve of $p$ if and only if
$$\nabla^2 p(x)\, \nabla p(x) = \lambda\, \nabla p(x) \quad \text{for some } \lambda \in \mathbb{R}.$$
- Critical curves connect critical points (i.e. points where $\nabla p(x) = 0$).
- In principle, all modes of a Gaussian mixture can be found by following critical curves.
- Implementing a numerical algorithm for tracing a critical curve is nontrivial.
Open problem: under what conditions is the graph of critical curves connected?
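
A point can be tested against the critical-curve condition by measuring how far $\nabla^2 p(x)\nabla p(x)$ is from being parallel to $\nabla p(x)$. The sketch below uses hypothetical helper names, assumes an isotropic mixture and $\nabla p(x) \neq 0$, and takes the least-squares $\lambda = \nabla p^T \nabla^2 p\, \nabla p / \|\nabla p\|^2$. For two components centered on the horizontal axis, points on that axis (other than critical points) satisfy the condition by symmetry, whereas a generic off-axis point does not.

```python
import math


def grad_hess(x, weights, means, sigmas):
    """Gradient and Hessian of an isotropic Gaussian mixture in d dimensions."""
    d = len(x)
    g = [0.0] * d
    H = [[0.0] * d for _ in range(d)]
    for w, mu, s in zip(weights, means, sigmas):
        diff = [xj - mj for xj, mj in zip(x, mu)]
        e = w / ((2 * math.pi) ** (d / 2) * s ** d) * math.exp(-sum(t * t for t in diff) / (2 * s * s))
        for a in range(d):
            g[a] -= e * diff[a] / s ** 2
            for b in range(d):
                delta = 1.0 if a == b else 0.0
                H[a][b] += e * (diff[a] * diff[b] / s ** 4 - delta / s ** 2)
    return g, H


def critical_curve_residual(x, weights, means, sigmas):
    """|| H g - lambda g || with lambda = g^T H g / g^T g; zero exactly when
    grad p(x) is an eigenvector of the Hessian."""
    g, H = grad_hess(x, weights, means, sigmas)
    d = len(g)
    Hg = [sum(H[a][b] * g[b] for b in range(d)) for a in range(d)]
    lam = sum(hg * gi for hg, gi in zip(Hg, g)) / sum(gi * gi for gi in g)
    return math.sqrt(sum((hg - lam * gi) ** 2 for hg, gi in zip(Hg, g)))


w, means, sig = [0.5, 0.5], [[0.0, 0.0], [3.0, 0.0]], [1.0, 1.0]
print(critical_curve_residual([1.0, 0.0], w, means, sig))  # on the symmetry axis
print(critical_curve_residual([1.0, 0.7], w, means, sig))  # off the axis
```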

21 Other Applications and Problems Related to Gaussian Mixtures: Kernel Density Clustering
[Figures: true clusters; kernel density clusters.]
Construct a KDE from the data samples. Each mode of the KDE is interpreted as a cluster. A data point $x_i$ belongs to a given cluster if a local iteration (e.g. mean-shift or Newton) started from $x_i$ converges to the mode corresponding to that cluster.
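
A minimal 1-D sketch of this clustering rule, using the Gaussian mean-shift iteration as the local method; the data, bandwidth, merge tolerance and function names are made-up values for illustration.

```python
import math


def mean_shift_1d(x, samples, h, tol=1e-10, max_iter=1000):
    """Gaussian mean-shift on a 1-D Gaussian KDE with bandwidth h."""
    for _ in range(max_iter):
        ws = [math.exp(-((x - y) / h) ** 2 / 2) for y in samples]
        x_new = sum(wi * y for wi, y in zip(ws, samples)) / sum(ws)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def kde_clusters(samples, h, merge_tol=1e-4):
    """Label each sample by the KDE mode its mean-shift iteration converges to."""
    modes, labels = [], []
    for y in samples:
        m = mean_shift_1d(y, samples, h)
        for k, mode in enumerate(modes):
            if abs(m - mode) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


data = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3]
modes, labels = kde_clusters(data, 0.5)
print(modes, labels)
```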

22 Other Applications and Problems Related to Gaussian Mixtures: Kernel Density Principal Curves (Ozertem et al., 2008)
[Figures: true principal curve; kernel density principal curves.]
A principal curve passes through the middle of the data. A point $x \in \mathbb{R}^d$ is on the principal curve of a probability density if and only if
$$\nabla^2 p(x)\, \nabla p(x) = \lambda\, \nabla p(x),$$
where $\lambda$ is the largest negative eigenvalue of $\nabla^2 p(x)$.