Nonlinear Blind Source Separation and Independent Component Analysis
Prof. Juha Karhunen
Helsinki University of Technology, Neural Networks Research Centre, Espoo, Finland
Part I: Linear Independent Component Analysis and Blind Source Separation
Motivation for independent component analysis (ICA) and blind source separation (BSS)

Let us start with an example: three people are speaking simultaneously in a room that has three microphones. Denote the microphone signals by x_1(t), x_2(t), and x_3(t). Each is a weighted sum of the speech signals, which we denote by s_1(t), s_2(t), and s_3(t):

x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)    (1)
x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)    (2)
x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)    (3)

Cocktail-party problem: estimate the original speech signals using only the recorded signals.
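As a concrete illustration of the mixing model (1)-(3), here is a minimal Python sketch, assuming synthetic nongaussian sources in place of real speech; the mixing matrix A below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(3500)

# Three synthetic nongaussian "speech-like" sources s_1(t), s_2(t), s_3(t)
S = np.vstack([
    np.sign(np.sin(0.07 * t)),     # square wave
    np.sin(0.03 * t) ** 3,         # distorted sinusoid
    rng.laplace(size=t.size),      # super-gaussian noise
])

# An arbitrary invertible mixing matrix A = (a_ij), chosen for illustration
A = np.array([[0.8, 0.3, 0.5],
              [0.4, 0.9, 0.2],
              [0.3, 0.5, 0.7]])

# Observed microphone signals x_i(t) of equations (1)-(3): x = A s
X = A @ S
```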
Figure 1: The original speech waveforms.
Figure 2: The observed microphone signals.
The problem: find the sources s_1(t), s_2(t), and s_3(t) from the observed signals x_1(t), x_2(t), and x_3(t).

As the weights a_{ij} are different, we may assume that the matrix A = (a_{ij}), although unknown, is invertible. Thus there exists another set of weights w_{ij} such that

s_1(t) = w_{11} x_1(t) + w_{12} x_2(t) + w_{13} x_3(t)    (4)
s_2(t) = w_{21} x_1(t) + w_{22} x_2(t) + w_{23} x_3(t)
s_3(t) = w_{31} x_1(t) + w_{32} x_2(t) + w_{33} x_3(t)

It turns out that this blind source separation (BSS) problem can be solved using independent component analysis (ICA). In ICA, it suffices to assume that the sources s_j(t) are nongaussian and statistically independent.
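To illustrate equations (4): if A were known, the weights w_{ij} would simply be the entries of W = A^{-1}. A small sketch, reusing the illustrative A from above (in practice A is unknown, and ICA must estimate W blindly):

```python
import numpy as np

A = np.array([[0.8, 0.3, 0.5],
              [0.4, 0.9, 0.2],
              [0.3, 0.5, 0.7]])

rng = np.random.default_rng(0)
S = rng.laplace(size=(3, 3500))    # stand-in nongaussian sources
X = A @ S                          # observed mixtures

W = np.linalg.inv(A)               # the weights w_ij of equations (4)
S_rec = W @ X                      # recovers the sources exactly
assert np.allclose(S_rec, S)
```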
Figure 3: The estimates of the speech waveforms obtained by ICA.
Definition of Independent Component Analysis

The ICA model is a statistical latent variable model

x_i = a_{i1} s_1 + a_{i2} s_2 + ... + a_{in} s_n,   for all i = 1, ..., n    (5)

where the a_{ij}, i, j = 1, ..., n, are some real coefficients. This is the basic linear ICA model, which can be extended in many ways.

In the basic ICA model, we assume that each mixture x_i as well as each independent component s_j is a random variable.

Using vector-matrix formulation: let

x = (x_1, ..., x_n)^T,   s = (s_1, ..., s_n)^T,   A = (a_{ij})    (6)

Then the basic ICA model is

x = As    (7)
If the columns of A are denoted a_j, the model can also be written as

x = Σ_{j=1}^{n} a_j s_j    (8)

There are some basic assumptions or restrictions in the model.
1. The independent components are assumed statistically independent.
2. The independent components must have nongaussian distributions.
   - In the basic ICA, we need not know them.
3. In the basic ICA, the unknown mixing matrix A is square.
   - In other words, the number of independent components is equal to the number of observed mixtures.
   - This assumption can be relaxed by allowing more or fewer mixtures than independent components.
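As a brief aside on assumption 2, excess kurtosis E[s^4]/(E[s^2])^2 − 3 is one simple measure of nongaussianity (it vanishes for a Gaussian); this measure is not used explicitly on these slides, but a minimal check on synthetic data looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(s):
    """Excess kurtosis of a sample; approximately 0 for a Gaussian."""
    s = s - s.mean()
    return np.mean(s**4) / np.mean(s**2)**2 - 3.0

n = 100_000
print(excess_kurtosis(rng.normal(size=n)))    # ~ 0    (gaussian)
print(excess_kurtosis(rng.laplace(size=n)))   # ~ +3   (super-gaussian)
print(excess_kurtosis(rng.uniform(size=n)))   # ~ -1.2 (sub-gaussian)
```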
Indeterminacies in the basic ICA model: scaling, sign, and order of the independent components. That is, only the waveforms of the independent components can be recovered without further information.

Methods for linear ICA

Independent components are usually estimated by trying to find an inverse separating matrix B. The output vector is

y = Bx    (9)

and its components should be statistically independent. Ideally, B = A^{-1}.

Even though the ICA model x = As is linear and simple, the problem is difficult because of its blind nature. Higher-order statistics are needed for ICA. Using second-order statistics (covariances) provides uncorrelated components only. There exist infinitely many such uncorrelated solutions; most of them are quite different from the ICA solution. However, prewhitening the data vectors x so that their components become uncorrelated is a useful preprocessing step. After that, the separating matrix B becomes orthogonal.

Many methods for linear ICA now exist; the most popular of them are:

The natural gradient algorithm

ΔB = µ[I − g(y)y^T]B    (10)
- Here g(y) is a suitable nonlinearity applied componentwise to the output vector y.
- A simple adaptive neural algorithm, well justified theoretically.

Fixed-point (FastICA) algorithms:
- Fast batch algorithms applicable to large-scale problems.

(A minimal sketch of the natural gradient update (10) is given at the end of this slide.)

For more information, see our new 500-page textbook/monograph: A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley, 2001.

Linear blind source separation (BSS)

In linear blind source separation (BSS), one tries to separate the original source signals from their linear mixtures. Assuming that the sources are independent and the mixing model is linear, x = As, one can apply linear ICA methods directly to BSS.
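Here is the promised minimal sketch of the natural gradient update (10), combined with the prewhitening step discussed above. Assumptions not fixed by the slides: g(y) = tanh(y), a common choice for super-gaussian sources such as speech, and illustrative values for the step size mu and the iteration count:

```python
import numpy as np

def natural_gradient_ica(X, mu=0.01, n_iter=200):
    """Estimate a separating matrix for mixtures X (n x T) using the
    natural gradient update B <- B + mu*(I - g(y) y^T) B, g = tanh."""
    X = X - X.mean(axis=1, keepdims=True)
    n, T = X.shape

    # Prewhitening: z = V x has uncorrelated, unit-variance components
    d, E = np.linalg.eigh(np.cov(X))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = V @ X

    B = np.eye(n)        # separating matrix for the whitened data
    I = np.eye(n)
    for _ in range(n_iter):
        Y = B @ Z
        # Sample average over the T columns replaces the expectation
        B += mu * (I - np.tanh(Y) @ Y.T / T) @ B
    return B @ V         # total separating matrix acting on the raw X

# Usage: mixtures of super-gaussian (Laplacian) sources
rng = np.random.default_rng(0)
S = rng.laplace(size=(3, 5000))
A = rng.normal(size=(3, 3))
W = natural_gradient_ica(A @ S)
Y = W @ (A @ S)   # recovers S up to scaling, sign, and order
```

Note how the indeterminacies of the model show up here: the rows of W match the rows of A^{-1} only up to permutation, sign, and scaling.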
Another major group of linear BSS methods utilizes the time structure of the sources. Second-order temporal statistics are then sufficient for achieving blind separation. The sources can even be Gaussian, provided that they have different autocorrelation sequences.

ICA neglects possible temporal structure of the sources or independent components, treating them as random variables. On the other hand, it works even for temporally uncorrelated sources. Ideally, both spatial independence and temporal structure should be taken into account in estimation.
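As an illustration of this second-order approach, here is a minimal sketch in the style of the AMUSE algorithm (one representative method; the slides do not name a specific one). It assumes the sources have distinct autocorrelations at the chosen lag tau:

```python
import numpy as np

def amuse(X, tau=1):
    """Second-order blind separation (AMUSE-style) of mixtures X (n x T).
    Assumes sources with distinct autocorrelations at lag tau."""
    X = X - X.mean(axis=1, keepdims=True)

    # Prewhiten: z = V x has identity covariance
    d, E = np.linalg.eigh(np.cov(X))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = V @ X

    # Symmetrized covariance at lag tau; its eigenvectors give the
    # rotation to the sources when their lagged autocorrelations differ
    T = Z.shape[1]
    C = Z[:, tau:] @ Z[:, :-tau].T / (T - tau)
    C = (C + C.T) / 2
    _, U = np.linalg.eigh(C)
    return U.T @ V   # separating matrix: s_hat = (U^T V) x
```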
Practical applications of ICA

- The cocktail party problem: separation of voices, music, or sounds.
- Sensor array processing, e.g. radar.
- Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI.
- Telecommunications: e.g. multiuser detection in CDMA.
- Financial and other time series.
- Noise removal from signals and images.
- Feature extraction for images and signals.
- Projection pursuit: finding interesting projections of the data for visualizing it in two dimensions.
Figure 4: Basis functions in ICA of natural images. These basis functions can be considered as the independent features of images. Every image window is a linear sum of these windows.
Figure 5: 12 magnetic brain (MEG) signals containing various artifacts: ocular and muscle activity (saccades, blinking, biting), the cardiac cycle, and magnetic disturbances. (Panel labels: MEG channels, VEOG, HEOG, ECG; time scale 10 s.)
Figure 6: An example of multipath propagation in an urban environment. (Axes: magnitude vs. time delay.)
Extensions of basic linear ICA

- Noisy ICA; estimation of the mixing matrix and independent components requires more sophisticated methods.
- Overcomplete bases: the number of independent components is larger than the number of mixtures.
- Taking into account the temporal structure in the data.
- ICA and BSS for nonlinear mixture models.
- Separation of convolutive mixtures containing time delays.
- Separation of correlated or non-independent sources.
- Nonstationary sources, time-dependent mixing matrices.
- Semi-blind problems: some prior information on the source signals and/or mixtures is available.