AMICA: An Adaptive Mixture of Independent Component Analyzers with Shared Components

Jason A. Palmer, Ken Kreutz-Delgado, and Scott Makeig

Abstract
We derive an asymptotic Newton algorithm for Quasi Maximum Likelihood estimation of the ICA mixture model, using the ordinary gradient and Hessian. The probabilistic mixture framework can accommodate non-stationary environments and arbitrary source densities. We prove asymptotic stability when the source models match the true sources. An application to EEG segmentation is given.

Index Terms
Independent Component Analysis, Bayesian linear model, mixture model, Newton method, EEG

I. INTRODUCTION

A. Related Work

The Gaussian linear model approach is described in [1]–[3]. Non-Gaussian sources in the form of Gaussian scale mixtures, in particular Student's t distribution, were developed in [4]–[6]. A mixture of Gaussians source model was employed in [7]–[11]. Similar approaches were proposed in [12], [13]. These models generally include noise and involve computationally intensive optimization algorithms. The focus in these models is generally on variational methods of automatically determining the number of mixtures in a mixture model during the optimization procedure. There is also overlap between the variational technique used in these methods and the Gaussian scale mixture approach to representing non-Gaussian densities. A model similar to that proposed here was presented in [14]. The main distinguishing features of the proposed model are:

J. A. Palmer and S. Makeig are with the Swartz Center for Computational Neuroscience, La Jolla, CA, {jason,scott}@sccn.ucsd.edu. K. Kreutz-Delgado is with the ECE Department, Univ. of California San Diego, La Jolla, CA, kreutz@ece.ucsd.edu.
1) Mixtures of Gaussian scale mixture sources provide more flexibility than the Gaussian mixture models of [7], [11], or the fixed density models used in [14]. Accurate source density modeling is important to take advantage of Newton convergence for the true source model, as well as to distinguish between partially overlapping ICA models by posterior likelihood.
2) Implementation of the Amari Newton method described in [15], greatly improving the convergence, particularly in the multiple model case, in which prewhitening is not possible (in general a different whitening matrix will be required for each unknown model).
3) The second derivative source density quantities are converted to first derivative quantities using integration-by-parts-related properties of the score function and Fisher Information Matrix. Again, accurate modeling of the source densities makes this conversion possible, and makes it robust in the presence of other (interfering) models.

The proposed model is readily extendable to MAP estimation, or Variational Bayes or Ensemble Learning approaches, which put conjugate hyperpriors on the parameters. We are interested primarily in the large sample case, so we do not pursue these extensions here. The probabilistic framework can also be extended to incorporate Markov dependence of the state parameters in the ICA and source mixtures. We have also extended the model to include mixtures of linear processes [16], where blind deconvolution is treated in a manner similar to [17]–[20], as well as complex ICA [21] and dependent sources [21]–[23]. In all of these contexts the adaptive source densities, asymptotic Newton method, and mixture model features can all be maintained.

II. ICA MIXTURE MODEL

In the standard linear model, observations $x(t) \in \mathbb{R}^m$, $t = 1, \ldots, N$, are modeled as linear combinations of a set of basis vectors $A \triangleq [a_1 \cdots a_n]$ with random and independent coefficients $s_i(t)$, $i = 1, \ldots, n$,
$$x(t) = A\, s(t)$$
We assume for simplicity the noiseless case, or that the data has been pre-processed, e.g. by PCA, filtering, etc., to remove noise. The data is assumed however to be non-stationary, so that different linear models may be in effect at different times. Thus for each observation $x(t)$, there is an index $h_t \in \{1, \ldots, M\}$, with corresponding complete basis set $A_h$ with center $c_h$, and a random vector of zero mean, independent sources $s(t) \sim q_h(s)$, where,
$$q_h(s) = \prod_{i=1}^{n} q_{hi}(s_i)$$
such that,
$$x(t) = A_h\, s(t) + c_h$$
with $h = h_t$. We shall assume that only one model is active at each time, and that model $h$ is active with probability $\gamma_h$. For simplicity we assume temporal independence of the model indices $h_t$, $t = 1, \ldots, N$. Since the model is conditionally linear, the conditional density of the observations is given by,
$$p\big(x(t) \mid h\big) = |\det W_h|\; q_h\big(W_h\,(x(t) - c_h)\big)$$
where $W_h \triangleq A_h^{-1}$. The sources are taken to be mixtures of (generally non-Gaussian) Gaussian Scale Mixtures (GSMs), as in [24],
$$q_{hi}\big(s_i(t)\big) = \sum_{j=1}^{m} \alpha_{hij}\, \sqrt{\beta_{hij}}\; q_{hij}\Big(\sqrt{\beta_{hij}}\,\big(s_i(t) - \mu_{hij}\big);\, \rho_{hij}\Big)$$
where each $q_{hij}$ is a GSM parameterized by $\rho_{hij}$. Thus the density of the observations $X \triangleq \{x(t),\ t = 1, \ldots, N\}$ is given by,
$$p(X; \Theta) = \prod_{t=1}^{N} \sum_{h=1}^{M} \gamma_h\, p\big(x(t) \mid h\big), \qquad \gamma_h \ge 0, \quad \sum_{h=1}^{M} \gamma_h = 1$$
The parameters to be estimated are,
$$\Theta = \big\{ W_h,\ c_h,\ \gamma_h,\ \alpha_{hij},\ \mu_{hij},\ \beta_{hij},\ \rho_{hij} \big\}$$
$h = 1, \ldots, M$, $i = 1, \ldots, n$, and $j = 1, \ldots, m$.

A. Invariances in the model

Besides the accepted invariance to permutation of the component indices, invariance or redundancy in the model also exists in two other respects. The first concerns the model centers, $c_h$, and the source density location parameters $\mu_{hij}$. Specifically, we have $p(x; \Theta) = p(x; \Theta')$,
$$\Theta = \{\ldots, c_h, \mu_{hij}, \ldots\}, \qquad \Theta' = \{\ldots, c_h', \mu_{hij}', \ldots\}$$
if
$$c_h' = c_h + \Delta c_h, \qquad \mu_{hij}' = \mu_{hij} - [W_h\, \Delta c_h]_i, \quad j = 1, \ldots, m$$
for any $\Delta c_h$. Putting $c_h = E\{x(t) \mid h\}$, we make the sources $s(t)$ zero mean given the model. The zero mean assumption is used in the calculation of the expected Hessian for the Newton algorithm.
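For concreteness, the conditional density defined above can be evaluated numerically as in the following minimal NumPy sketch, which uses a generalized Gaussian $q_{hij}(y; \rho) \propto \exp(-|y|^{\rho})$ for each scale mixture component. The function names and parameter layout are illustrative assumptions rather than the reference implementation.

```python
import numpy as np
from scipy.special import gammaln

def log_gen_gaussian(y, rho):
    """Log density of a standardized generalized Gaussian,
    q(y; rho) = rho / (2 * Gamma(1/rho)) * exp(-|y|**rho)."""
    return np.log(rho / 2.0) - gammaln(1.0 / rho) - np.abs(y) ** rho

def log_source_density(s, alpha, mu, beta, rho):
    """log q_hi(s): mixture over j of alpha_j * sqrt(beta_j) * q_j(sqrt(beta_j)*(s - mu_j); rho_j).
    alpha, mu, beta, rho are length-m arrays of the source mixture parameters."""
    y = np.sqrt(beta) * (s - mu)                          # standardized argument per component
    log_terms = np.log(alpha) + 0.5 * np.log(beta) + log_gen_gaussian(y, rho)
    mx = log_terms.max()
    return mx + np.log(np.exp(log_terms - mx).sum())      # log-sum-exp over the m components

def log_cond_density(x, W, c, src_params):
    """log p(x | h) = log|det W_h| + sum_i log q_hi([W_h (x - c_h)]_i).
    src_params[i] is the tuple (alpha, mu, beta, rho) for source i of this model."""
    y = W @ (x - c)
    return (np.linalg.slogdet(W)[1]
            + sum(log_source_density(y[i], *src_params[i]) for i in range(len(y))))
```

Weighting the exponentials of these per-model terms by $\gamma_h$ and summing over $h$ gives the observation density $p(x(t); \Theta)$ defined above.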
There is also scale redundancy in the row norms of $W_h$ and the scale parameters of the source densities. Specifically, $p(x; \Theta) = p(x; \Theta')$, where
$$\Theta = \{W_h, \mu_{hij}, \beta_{hij}, \ldots\}, \qquad \Theta' = \{W_h', \mu_{hij}', \beta_{hij}', \ldots\}$$
if for any $\tau_{hi} > 0$,
$$[W_h']_{i:} = [W_h]_{i:}/\tau_{hi}, \qquad \mu_{hij}' = \mu_{hij}/\tau_{hi}, \qquad \beta_{hij}' = \beta_{hij}\,\tau_{hi}^2, \quad j = 1, \ldots, m$$
where $[W_h]_{i:}$ is the $i$th row of $W_h$. We use this redundancy to enforce at each iteration that the rows of $W_h$ are unit norm by putting $\tau_{hi} = \|[W_h]_{i:}\|$. These reparameterizations constitute the only updates for the model centers $c_h$. The centers are redundant parameters given the source means, and are used only to maintain zero posterior source mean given the model.

III. MAXIMUM LIKELIHOOD

In this section we assume that the model $h$ is given and suppress the $h$ subscript. Given i.i.d. data $X = \{x_1, \ldots, x_N\}$, we consider the ML estimate of $W = A^{-1}$. For the density of $X$, we have,
$$p(X) = \prod_{t=1}^{N} |\det W|\; p_s(W x_t)$$
Let $y_t = W x_t$ be the estimate of the sources $s_t$, and let $q_i(y_i)$ be the density model for the $i$th source, with $q(y_t) = \prod_i q_i(y_{it})$. We define $f_i(y_{it}) \triangleq -\log q_i(y_{it})$ and $f(y_t) \triangleq \sum_i f_i(y_{it})$. For the negative log likelihood of the data then (which is to be minimized), we have,
$$L(W) = -N \log|\det W| + \sum_{t=1}^{N} f(y_t) \tag{1}$$
The gradient of this function is proportional to,
$$-W^{-T} + \frac{1}{N}\sum_{t=1}^{N} \nabla f(y_t)\, x_t^T \tag{2}$$
Note that if we multiply the negative of (2) by $W^T W$ on the right, we get the update direction,
$$\Delta W = \Big(I - \frac{1}{N}\sum_{t=1}^{N} g_t\, y_t^T\Big)\, W \tag{3}$$
where $g_t \triangleq \nabla f(y_t)$. This transformation is in fact a positive definite linear transformation of the matrix gradient. Specifically, using the standard matrix inner product in $\mathbb{R}^{n \times n}$, $\langle A, B\rangle = \mathrm{tr}(AB^T)$, we have for nonzero $V$,
$$\langle V,\, V W^T W\rangle = \langle V W^T,\, V W^T\rangle > 0 \tag{4}$$
when $W$ is full rank. The direction (3) is known as the natural gradient [25].

A. Hessian

Denote the gradient (2) by $G$ with elements $g_{ij}$, each a function of $W$. Taking the derivative of (2), we find,
$$\frac{\partial g_{ij}}{\partial w_{kl}} = [W^{-1}]_{li}\,[W^{-1}]_{jk} + \delta_{ik}\, \big\langle f_i''\big([W x_t]_i\big)\, x_{jt}\, x_{lt} \big\rangle_N$$
where $\delta_{ik}$ is the Kronecker delta symbol, and $\langle\,\cdot\,\rangle_N$ denotes the empirical average $\frac{1}{N}\sum_{t=1}^{N}$. To see how this linear Hessian operator transforms an argument $B$, let $C = \mathcal{H}(B)$ be the transformed matrix. Then we calculate,
$$c_{ij} = \sum_{k,l} [W^{-1}]_{li}\,[W^{-1}]_{jk}\, b_{kl} + \big\langle f_i''(y_{it})\, x_{jt}\, b_i^T x_t \big\rangle_N$$
where $b_i^T$ denotes the $i$th row of $B$. The first term of $c_{ij}$ can be written,
$$[W^{-T}]_{i:}\,[B^T W^{-T}]_{:j} = [W^{-T} B^T W^{-T}]_{ij}$$
Writing the second term in matrix form as well, we have for the linear transformation $C = \mathcal{H}(B)$,
$$C = W^{-T} B^T W^{-T} + \big\langle \mathrm{diag}\big(f''(y_t)\big)\, B\, x_t\, x_t^T \big\rangle_N \tag{5}$$
where $\mathrm{diag}(f''(y_t))$ is the diagonal matrix with diagonal elements $f_i''(y_{it})$. This equation can be simplified as follows. First, let us rewrite the transformation (5) in terms of the source estimates $y$. We first write,
$$C = \Big( (BW^{-1})^T + \big\langle \mathrm{diag}\big(f''(y_t)\big)\, BW^{-1}\, y_t\, y_t^T \big\rangle_N \Big)\, W^{-T}$$
Now if we define $\tilde{C} \triangleq C W^T$ and $\tilde{B} \triangleq B W^{-1}$, then we have,
$$\tilde{C} = \tilde{B}^T + \big\langle \mathrm{diag}\big(f''(y_t)\big)\, \tilde{B}\, y_t\, y_t^T \big\rangle_N \tag{6}$$
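As a point of comparison for the Newton direction developed below, the plain natural gradient update (3) can be implemented in a few lines. The sketch below assumes a fixed elementwise score function (e.g. $f'(y) = \tanh(y)$ for a super-Gaussian model); the function name and the learning rate are illustrative choices only.

```python
import numpy as np

def natural_gradient_step(W, X, score, lrate=0.1):
    """One natural gradient step for a single noiseless ICA model, eq. (3):
    Delta W = (I - <g_t y_t^T>_N) W, with g_t = f'(y_t) the model score.

    W     : (n, n) current unmixing matrix
    X     : (n, N) zero-mean data, one column per sample
    score : elementwise score function f'(y), e.g. np.tanh
    """
    n, N = X.shape
    Y = W @ X                        # source estimates y_t for all samples
    G = score(Y)                     # g_t for all samples
    Phi = (G @ Y.T) / N              # empirical average <g_t y_t^T>_N
    dW = (np.eye(n) - Phi) @ W       # natural gradient direction, eq. (3)
    return W + lrate * dW
```

The asymptotic Newton direction derived in the following subsections reuses the same statistic $\Phi$ of (15), but rescales its entries by the source statistics $\eta_i$, $\kappa_i$, $\sigma_i^2$ as in (16)–(18).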
At the optimum, we may assume that the source density models $q_i(y_i)$ are equivalent to the true source densities $p_i(s_i)$. Writing (6) in component form and letting $N$ go to infinity, we find for the diagonal elements,
$$[\tilde{C}]_{ii} = [\tilde{B}]_{ii} + E\Big\{ f_i''(y_i) \sum_k [\tilde{B}]_{ik}\, y_k\, y_i \Big\} = (1 + \eta_i)\,[\tilde{B}]_{ii} \tag{7}$$
where we define $\eta_i \triangleq E\{f_i''(y_i)\, y_i^2\}$. The cross terms drop out since the expected value of $f_i''(y_i)\, y_i\, y_k$ is zero for $k \ne i$ by the independence and zero mean assumption on the sources. Now we note, as in [15], [26], [27], that the off-diagonal elements of the equation (6) can be paired as follows,
$$[\tilde{C}]_{ij} = [\tilde{B}]_{ji} + E\Big\{ f_i''(y_i) \sum_k [\tilde{B}]_{ik}\, y_k\, y_j \Big\} = [\tilde{B}]_{ji} + \kappa_i \sigma_j^2\, [\tilde{B}]_{ij}$$
$$[\tilde{C}]_{ji} = [\tilde{B}]_{ij} + E\Big\{ f_j''(y_j) \sum_k [\tilde{B}]_{jk}\, y_k\, y_i \Big\} = [\tilde{B}]_{ij} + \kappa_j \sigma_i^2\, [\tilde{B}]_{ji}$$
where we define $\kappa_i \triangleq E\{f_i''(y_i)\}$ and $\sigma_i^2 \triangleq E\{y_i^2\}$. Again the cross terms drop out due to the expectation of independent zero mean random variables. Putting these equations in matrix form, we have,
$$\begin{bmatrix} [\tilde{C}]_{ij} \\ [\tilde{C}]_{ji} \end{bmatrix} = \begin{bmatrix} \kappa_i \sigma_j^2 & 1 \\ 1 & \kappa_j \sigma_i^2 \end{bmatrix} \begin{bmatrix} [\tilde{B}]_{ij} \\ [\tilde{B}]_{ji} \end{bmatrix} \tag{8}$$
If we denote the linear transformation defined by equations (7) and (8) by $\tilde{C} = \tilde{\mathcal{H}}(\tilde{B})$, then we have,
$$C = \mathcal{H}(B) = \tilde{\mathcal{H}}\big(B W^{-1}\big)\, W^{-T} \tag{9}$$
Thus by an argument similar to (4), we see that $\mathcal{H}$ is asymptotically positive definite if and only if $\tilde{\mathcal{H}}$ is asymptotically positive definite and $W$ is full rank. The conditions for positive definiteness of $\tilde{\mathcal{H}}$ can be found by inspection of equations (7) and (8). With the definitions,
$$\eta_i \triangleq E\{y_i^2\, f_i''(y_i)\}, \qquad \kappa_i \triangleq E\{f_i''(y_i)\}, \qquad \sigma_i^2 \triangleq E\{y_i^2\}$$
the conditions can be stated [15] as,
1) $1 + \eta_i > 0,\ \forall i$
2) $\kappa_i > 0,\ \forall i$, and,
3) $\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1 > 0,\ \forall i \ne j$

B. Asymptotic stability

Using integration by parts, it can be shown that the stability conditions are always satisfied when $f(y) = -\log p(y)$, i.e. $q(y)$ matches the true source density $p(y)$. Specifically, we have the following.
Theorem 1: If $f_i(y_i) \triangleq -\log q_i(y_i) = -\log p_i(y_i)$, with $\int p_i(y)\,dy = 1$, $i = 1, \ldots, n$, i.e. the source density models match the true source densities, and $p_i(y)$ is twice differentiable with $E\{f_i''(y)\}$ and $E\{y_i^2\}$ finite, $i = 1, \ldots, n$, and at most one source is Gaussian, then the stability conditions hold.

Proof: For the first condition, we use integration by parts to evaluate,
$$E\{y^2 f''(y)\} = \int y^2 f''(y)\, p(y)\, dy$$
with $u = y^2 p(y)$ and $dv = f''(y)\,dy$. Using the fact that $v = f'(y) = -p'(y)/p(y)$, we get,
$$\Big[-y^2 p'(y)\Big]_{-\infty}^{\infty} - \int f'(y)\big(2y - y^2 f'(y)\big)\, p(y)\, dy \tag{10}$$
The first term in (10) is zero if $p'(y) = o(1/y^2)$ as $y \to \pm\infty$. This must be the case for integrable $p(y)$, since otherwise we would have $|p'(y)| \geq C/y^2$, and $p(y) = O(1/y)$ and non-integrable. Then, since $\int p(y)\,dy = 1$, we have,
$$1 + E\{y^2 f''(y)\} = \int \big( y^2 f'(y)^2 - 2y f'(y) + 1 \big)\, p(y)\, dy = E\big\{\big(y f'(y) - 1\big)^2\big\} \ge 0$$
where equality holds only if $p(y) \propto 1/y$, so strict inequality must hold for integrable $p(y)$. For the second condition, $E\{f''(y)\} > 0$: using integration by parts with $u = p(y)$, $dv = f''(y)\,dy$, and the fact that $p'(y)$ must tend to 0 as $y \to \pm\infty$ for integrable $p(y)$, we get,
$$E\{f''(y)\} = \int f'(y)^2\, p(y)\, dy = E\{f'(y)^2\} > 0$$
Finally, for the third condition, we have,
$$E\{y^2\}\, E\{f''(y)\} = E\{y^2\}\, E\{f'(y)^2\} \ge \big(E\{y f'(y)\}\big)^2 = 1$$
by the Cauchy-Schwarz inequality, with equality only for $f'(y) \propto y$, i.e. $p(y)$ Gaussian. Thus,
$$E\{y_i^2\}\, E\{f_i''(y_i)\}\; E\{y_j^2\}\, E\{f_j''(y_j)\} > 1$$
whenever at least one of $y_i$ and $y_j$ is non-Gaussian.
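The three conditions are also easy to check empirically for a given source model and a batch of source estimates. A minimal sketch is given below; the function name and the log-cosh example model are assumptions for illustration, not part of the paper.

```python
import numpy as np

def stability_conditions(Y, f2):
    """Empirically evaluate eta_i, kappa_i, sigma_i^2 and the three stability
    conditions for source estimates Y (n x N) under a model whose second
    derivative is f2(y) = f''(y). Returns (eta, kappa, sigma2, all_satisfied)."""
    eta    = np.mean(f2(Y) * Y**2, axis=1)     # eta_i     = E{ y_i^2 f_i''(y_i) }
    kappa  = np.mean(f2(Y), axis=1)            # kappa_i   = E{ f_i''(y_i) }
    sigma2 = np.mean(Y**2, axis=1)             # sigma_i^2 = E{ y_i^2 }
    cond1 = np.all(1.0 + eta > 0)
    cond2 = np.all(kappa > 0)
    ks = kappa * sigma2
    off = ~np.eye(len(ks), dtype=bool)
    cond3 = np.all((np.outer(ks, ks) - 1.0)[off] > 0)   # kappa_i kappa_j sigma_i^2 sigma_j^2 > 1
    return eta, kappa, sigma2, bool(cond1 and cond2 and cond3)

# Example (assumed, not from the paper): log cosh model, f''(y) = 1 - tanh(y)^2,
# checked on Laplacian draws, a mismatched but super-Gaussian case:
# Y = np.random.laplace(size=(4, 100000))
# print(stability_conditions(Y, lambda y: 1.0 - np.tanh(y)**2)[-1])
```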
C. Newton method

The inverse of the Hessian operator, from (9), will be given by,
$$B = \mathcal{H}^{-1}(C) = \tilde{\mathcal{H}}^{-1}\big(C W^T\big)\, W \tag{11}$$
The calculation of $\tilde{B} = \tilde{\mathcal{H}}^{-1}(\tilde{C})$ is easily carried out by inverting the transformations (7) and (8),
$$[\tilde{B}]_{ii} = \frac{[\tilde{C}]_{ii}}{1 + \eta_i}, \quad i = 1, \ldots, n \tag{12}$$
$$[\tilde{B}]_{ij} = \frac{\kappa_j \sigma_i^2\, [\tilde{C}]_{ij} - [\tilde{C}]_{ji}}{\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1}, \quad i \ne j \tag{13}$$
The Newton direction is given by taking $C = G$, the gradient (2),
$$\Delta W = -\tilde{\mathcal{H}}^{-1}\big(G W^T\big)\, W \tag{14}$$
Let,
$$\Phi \triangleq \frac{1}{N}\sum_{t=1}^{N} g_t\, y_t^T \tag{15}$$
We have $G W^T = \Phi - I$. If we let $\tilde{B} = -\tilde{\mathcal{H}}^{-1}\big(G W^T\big) = \tilde{\mathcal{H}}^{-1}\big(I - \Phi\big)$, then,
$$[\tilde{B}]_{ii} = \frac{1 - [\Phi]_{ii}}{1 + \eta_i}, \quad i = 1, \ldots, n \tag{16}$$
$$[\tilde{B}]_{ij} = \frac{[\Phi]_{ji} - \kappa_j \sigma_i^2\, [\Phi]_{ij}}{\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1}, \quad i \ne j \tag{17}$$
and the Newton direction is,
$$\Delta W = \tilde{B}\, W \tag{18}$$

IV. CRAMER-RAO LOWER BOUND

By the theorem on asymptotic efficiency of Maximum Likelihood estimation, we have that the asymptotic and minimum error covariance of the estimate of $W$ is given by the inverse Fisher Information matrix. Also, since the asymptotic distribution is Gaussian, we can determine the asymptotic distribution of linear transformations of $\hat{W}$. In particular, we consider the asymptotic distribution of $C \triangleq \hat{W} A$, where $A = W^{-1}$ is the true (unknown) mixing matrix. Remarkably, we find that this asymptotic distribution does not depend on $A$, but only on the source density statistics. Specifically, the inverse Fisher information matrix is $\tilde{\mathcal{H}}^{-1}$, and the asymptotic minimum error covariance in the estimate is $N^{-1}\tilde{\mathcal{H}}^{-1}$. The minimum error covariance matrix for the pair of off-diagonals $\hat{c}_{ij}$ and $\hat{c}_{ji}$ with a sample size of $N$ is given by,
$$\frac{1}{N}\; \frac{1}{\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1} \begin{bmatrix} \kappa_j \sigma_i^2 & -1 \\ -1 & \kappa_i \sigma_j^2 \end{bmatrix}$$
In particular, the asymptotic distribution of $\hat{C}$ is Normal with mean $I$, and variance,
$$E\{\hat{c}_{ij}^{\,2}\} \approx \frac{1}{N}\, \frac{\kappa_j \sigma_i^2}{\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1}, \qquad E\{(\hat{c}_{ii} - 1)^2\} \approx \frac{1}{N}\, \frac{1}{1 + \eta_i}$$
For the Generalized Gaussian density, we have,
$$\kappa_i = \frac{\rho_i^2\, \Gamma(2 - 1/\rho_i)}{\Gamma(1/\rho_i)}, \qquad \sigma_i^2 = \frac{\Gamma(3/\rho_i)}{\Gamma(1/\rho_i)}, \qquad \eta_i = \rho_i - 1$$

V. EM PARAMETER UPDATES

We define $h_t$ to be the random variable denoting the index of the model chosen at time $t$, producing the observation $x(t)$, and define the random variables $v_{ht}$,
$$v_{ht} = \begin{cases} 1, & h_t = h \\ 0, & \text{otherwise} \end{cases}$$
We define $j_{hit}$ to be the random variable indicating the source density mixture component index that is chosen at time $t$ (independently of $h_t$) for the $i$th source of the $h$th model, and we define the random variables $u_{htij}$ by,
$$u_{htij} = \begin{cases} 1, & j_{hit} = j \text{ and } h_t = h \\ 0, & \text{otherwise} \end{cases}$$
We employ the EM algorithm by writing the density of $X$ as a marginal sum over the complete data, which includes $U$ and $V$,
$$p(X; \Theta) = \sum_{V} \prod_{t=1}^{N} \prod_{h=1}^{M} P_{ht}^{\,v_{ht}} = \sum_{V} \prod_{t=1}^{N} \exp\Big( \sum_{h=1}^{M} v_{ht} L_{ht} \Big) = \sum_{U,V} \prod_{t=1}^{N} \exp\bigg( \sum_{h=1}^{M} v_{ht}\big(\log\gamma_h + \log|\det W_h|\big) + \sum_{h=1}^{M}\sum_{i=1}^{n}\sum_{j=1}^{m} u_{htij}\, Q_{htij} \bigg)$$
where we make the definitions,
$$b_{ht} \triangleq W_h\,(x_t - c_h), \qquad y_{htij} \triangleq \sqrt{\beta_{k_{hi}j}}\,\big([b_{ht}]_i - \mu_{k_{hi}j}\big)$$
$$\exp(Q_{htij}) \triangleq \alpha_{k_{hi}j}\,\sqrt{\beta_{k_{hi}j}}\; q_{k_{hi}j}\big(y_{htij}\big), \qquad P_{ht} \triangleq \gamma_h\, |\det W_h| \prod_{i=1}^{n} \sum_{j=1}^{m} \exp(Q_{htij}) \triangleq \exp(L_{ht})$$
Here $k_{hi}$ denotes the shared mixture component associated with source $i$ of model $h$, whose parameters $\alpha_{kj}, \mu_{kj}, \beta_{kj}, \rho_{kj}$ may be common to several models. The posterior expectation $\hat{v}_{ht}$ is given by,
$$\hat{v}_{ht} = E\{v_{ht} \mid x_t; \Theta\} = P\big[v_{ht} = 1 \mid x_t; \Theta\big] = \frac{P_{ht}}{\sum_{h'=1}^{M} P_{h't}} = \frac{\exp(L_{ht})}{\sum_{h'=1}^{M} \exp(L_{h't})} \tag{19}$$
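In practice the per-model log likelihoods $L_{ht}$ can be large in magnitude, so (19) is best computed in the log domain. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def model_responsibilities(logL):
    """Posterior model probabilities v_hat[h, t] of eq. (19), given
    logL[h, t] = L_ht (log gamma_h + log|det W_h| + the summed source mixture terms)."""
    mx = logL.max(axis=0, keepdims=True)        # per-sample max over models h
    p = np.exp(logL - mx)                       # shift to avoid overflow/underflow
    return p / p.sum(axis=0, keepdims=True)     # normalize over models h
```

The within-model mixture responsibilities $\hat{z}_{htij}$ of eq. (21) below are computed the same way, normalizing $\exp(Q_{htij})$ over $j$.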
For the $\hat{u}_{htij}$, we have,
$$\hat{u}_{htij} = P\big[u_{htij} = 1 \mid x_t; \Theta\big] = P\big[u_{htij} = 1 \mid v_{ht} = 1, x_t; \Theta\big]\; P\big[v_{ht} = 1 \mid x_t; \Theta\big] = \hat{z}_{htij}\, \hat{v}_{ht} \tag{20}$$
where $\hat{z}_{htij} \triangleq E\{u_{htij} \mid v_{ht} = 1, x_t; \Theta\}$:
$$\hat{z}_{htij} = \frac{\exp(Q_{htij})}{\sum_{j'=1}^{m} \exp(Q_{htij'})} \tag{21}$$
The function to be maximized in the EM algorithm is then,
$$\sum_{t=1}^{N}\sum_{h=1}^{M}\bigg[ \hat{v}_{ht}\big(\log\gamma_h + \log|\det W_h|\big) + \sum_{i=1}^{n}\sum_{j=1}^{m} \hat{u}_{htij}\Big(\log\alpha_{k_{hi}j} + \tfrac{1}{2}\log\beta_{k_{hi}j} - f_{k_{hi}j}\big(y_{htij}\big)\Big)\bigg]$$
where $f_{kj} \triangleq -\log q_{kj}$. Maximizing with respect to $\gamma_h$ subject to $\gamma_h \ge 0$, $\sum_h \gamma_h = 1$, we get,
$$\gamma_h^{+1} = \frac{1}{N}\sum_{t=1}^{N} \hat{v}_{ht} \tag{22}$$
We can then rearrange the (normalized) likelihood as,
$$\sum_{h=1}^{M} \gamma_h^{+1}\log|\det W_h| + \frac{1}{N}\sum_{k=1}^{n}\,\sum_{h:\, i_k^h > 0}\,\sum_{t=1}^{N}\sum_{j=1}^{m} \hat{u}_{ht i_k^h j}\Big(\log\alpha_{kj} + \tfrac{1}{2}\log\beta_{kj} - f_{kj}\big(y_{ht i_k^h j}\big)\Big)$$
where $i_k^h$ is the index of the source of model $h$ associated with shared component $k$ ($i_k^h = 0$ if model $h$ does not contain component $k$). Maximizing with respect to $\alpha_{kj}$ subject to $\alpha_{kj} \ge 0$, $\sum_j \alpha_{kj} = 1$, we get,
$$\alpha_{kj}^{+1} = \frac{\sum_{h:\, i_k^h > 0}\sum_{t=1}^{N} \hat{v}_{ht}\, \hat{z}_{ht i_k^h j}}{\sum_{h:\, i_k^h > 0}\sum_{t=1}^{N} \hat{v}_{ht}}$$
We define the following expectations conditioned on the model $h$, in which $G_t$ are arbitrary functions of $x_t$,
$$E_v\{G_t \mid h\} \triangleq \frac{\sum_t \hat{v}_{ht}\, G_t}{\sum_t \hat{v}_{ht}}, \qquad E_u\{G_t \mid h, j\} \triangleq \frac{\sum_t \hat{u}_{htij}\, G_t}{\sum_t \hat{u}_{htij}}$$
We also define the following, conditioned on the component $k$, in which $G_{ht}$ are arbitrary functions of $x_t$ and the parameters of an arbitrary model $h$,
$$E_v\{G_{ht} \mid k\} \triangleq \frac{\sum_{h:\, i_k^h > 0}\sum_t \hat{v}_{ht}\, G_{ht}}{\sum_{h:\, i_k^h > 0}\sum_t \hat{v}_{ht}}, \qquad E_u\{G_{ht} \mid k, j\} \triangleq \frac{\sum_{h:\, i_k^h > 0}\sum_t \hat{u}_{ht i_k^h j}\, G_{ht}}{\sum_{h:\, i_k^h > 0}\sum_t \hat{u}_{ht i_k^h j}}$$
Now we can rearrange the likelihood as,
$$\sum_{h=1}^{M} \gamma_h^{+1}\log|\det W_h| + \sum_{k=1}^{n}\sum_{j=1}^{m} \zeta_k^{+1}\, \alpha_{kj}^{+1}\Big( \tfrac{1}{2}\log\beta_{kj} - E_u\Big\{ f_{kj}\Big(\sqrt{\beta_{kj}}\,\big([b_{ht}]_{i_k^h} - \mu_{kj}\big)\Big) \,\Big|\, k, j \Big\} \Big) \tag{23}$$
where we define $\zeta_k = \sum_{h:\, i_k^h > 0} \gamma_h$ to be the total probability of a context containing component $k$,
$$\zeta_k^{+1} \triangleq \sum_{h:\, i_k^h > 0} \gamma_h^{+1} \tag{24}$$
If the source mixture component densities are strongly super-Gaussian, then we can maximize the surrogate likelihood,
$$\sum_{h=1}^{M} \gamma_h^{+1}\log|\det W_h| + \sum_{k=1}^{n}\sum_{j=1}^{m} \zeta_k^{+1}\, \alpha_{kj}^{+1}\Big( \tfrac{1}{2}\log\beta_{kj} - \tfrac{1}{2}\beta_{kj}\, E_u\Big\{ \xi_{ht i_k^h j}\,\big([b_{ht}]_{i_k^h} - \mu_{kj}\big)^2 \,\Big|\, k, j \Big\} \Big)$$
where,
$$\xi_{ht i_k^h j} \triangleq \frac{f_{kj}'\big(y_{ht i_k^h j}\big)}{y_{ht i_k^h j}}$$
The location and scale parameter updates are then given by,
$$\mu_{kj}^{+1} = \frac{E_u\big\{ \xi_{ht i_k^h j}\, [b_{ht}]_{i_k^h} \,\big|\, k, j \big\}}{E_u\big\{ \xi_{ht i_k^h j} \,\big|\, k, j \big\}} = \mu_{kj} + \frac{1}{\sqrt{\beta_{kj}}}\, \frac{E_u\big\{ f_{kj}'\big(y_{ht i_k^h j}\big) \,\big|\, k, j \big\}}{E_u\big\{ f_{kj}'\big(y_{ht i_k^h j}\big)/y_{ht i_k^h j} \,\big|\, k, j \big\}} \tag{25}$$
and,
$$\beta_{kj}^{+1} = \frac{1}{E_u\big\{ \xi_{ht i_k^h j}\,\big([b_{ht}]_{i_k^h} - \mu_{kj}\big)^2 \,\big|\, k, j \big\}} = \frac{\beta_{kj}}{E_u\big\{ f_{kj}'\big(y_{ht i_k^h j}\big)\, y_{ht i_k^h j} \,\big|\, k, j \big\}} \tag{26}$$
The Generalized Gaussian shape parameters are updated by,
$$\frac{1}{\rho_{kj}^{+1}} = \frac{\rho_{kj}}{\Psi(1 + 1/\rho_{kj})}\, E_u\Big\{ |y_{ht i_k^h j}|^{\rho_{kj}} \log |y_{ht i_k^h j}|^{\rho_{kj}} \,\Big|\, k, j \Big\} \tag{27}$$
or, if $\rho_{kj} > 2$,
$$\rho_{kj}^{+1} = \frac{\Psi(1 + 1/\rho_{kj})/\rho_{kj}}{E_u\Big\{ |y_{ht i_k^h j}|^{\rho_{kj}} \log |y_{ht i_k^h j}|^{\rho_{kj}} \,\Big|\, k, j \Big\}} \tag{28}$$

A. ICA mixture model Newton updates

Since $F$ is an additive function of the $W_h$, the Newton updates can be considered separately. The cost function for $W_h$ is,
$$-\log|\det W_h| + E_v\Big\{ \sum_{i=1}^{n}\sum_{j=1}^{m} \hat{z}_{htij}\, f_{k_{hi}j}\big(y_{htij}\big) \,\Big|\, h \Big\}$$
The gradient of this function is,
$$-W_h^{-T} + E_v\big\{ g_{ht}\,(x_t - c_h)^T \,\big|\, h \big\} \tag{29}$$
where $g_{ht}$ is defined by,
$$[g_{ht}]_i \triangleq \sum_{j=1}^{m} \hat{z}_{htij}\, \sqrt{\beta_{k_{hi}j}}\; f_{k_{hi}j}'\big(y_{htij}\big) \tag{30}$$
Denote the matrix gradient (29) by $G_h$. Taking the derivative of $[G_h]_{i\nu}$ with respect to $[W_h]_{k\lambda}$, we get,
$$\frac{\partial [G_h]_{i\nu}}{\partial [W_h]_{k\lambda}} = [W_h^{-1}]_{\lambda i}\,[W_h^{-1}]_{\nu k} + \delta_{ik} \sum_{j=1}^{m} \beta_{k_{hi}j}\, E_v\big\{ \hat{z}_{htij}\, f_{k_{hi}j}''\big(y_{htij}\big)\,(x_{\nu t} - [c_h]_\nu)(x_{\lambda t} - [c_h]_\lambda) \,\big|\, h \big\}$$
For the linear transformation $C = \mathcal{H}_h(B)$, we have,
$$C = W_h^{-T} B^T W_h^{-T} + E_v\big\{ D_{ht}\, B\, (x_t - c_h)(x_t - c_h)^T \,\big|\, h \big\} \tag{31}$$
where $D_{ht}$ is the diagonal matrix with diagonal elements,
$$[D_{ht}]_{ii} = \sum_{j=1}^{m} \hat{z}_{htij}\, \beta_{k_{hi}j}\, f_{k_{hi}j}''\big(y_{htij}\big)$$
To simplify the calculation of the asymptotic value of the Hessian, we rewrite the second term on the right hand side of (31) as,
$$E_v\big\{ D_{ht}\, B W_h^{-1}\, b_{ht}\, b_{ht}^T \,\big|\, h \big\}\, W_h^{-T} \tag{32}$$
If we define $\tilde{C} \triangleq C W_h^T$ and $\tilde{B} \triangleq B W_h^{-1}$, then we have,
$$\tilde{C} = \tilde{B}^T + E_v\big\{ D_{ht}\, \tilde{B}\, b_{ht}\, b_{ht}^T \,\big|\, h \big\} \tag{33}$$
Now we write the $i$th row of the second term in (33) as,
$$\sum_{j=1}^{m} \beta_{k_{hi}j}\, E_v\big\{ \hat{z}_{htij}\, f_{k_{hi}j}''\big(y_{htij}\big)\, [\tilde{B}\, b_{ht}]_i\, b_{ht}^T \,\big|\, h \big\}$$
Since $b_{ht}$ is zero mean given model $h$, the Hessian matrix reduces to a $2 \times 2$ block diagonal form as in the single model case. In the multiple model case we get,
$$\eta_{hi} = \sum_{j=1}^{m} \bar{\alpha}_{hij}^{+1}\, \beta_{k_{hi}j}\, E_u\big\{ f_{k_{hi}j}''\big(y_{htij}\big)\,[b_{ht}]_i^2 \,\big|\, h, j \big\}$$
$$\kappa_{hi} = \sum_{j=1}^{m} \bar{\alpha}_{hij}^{+1}\, \beta_{k_{hi}j}\, E_u\big\{ f_{k_{hi}j}''\big(y_{htij}\big) \,\big|\, h, j \big\}, \qquad \sigma_{hi}^2 = E_v\big\{ [b_{ht}]_i^2 \,\big|\, h \big\}$$
where $\bar{\alpha}_{hij}^{+1} = E_v\{\hat{z}_{htij} \mid h\}$ (sum weighted only by model likelihood). If we define,
$$\eta_{hij} \triangleq E_u\big\{ f_{k_{hi}j}''\big(y_{htij}\big)\, y_{htij}^2 \,\big|\, h, j \big\}, \qquad \kappa_{hij} \triangleq E_u\big\{ f_{k_{hi}j}''\big(y_{htij}\big) \,\big|\, h, j \big\} \tag{34}$$
or, using integration by parts to rewrite the integrals,
$$\lambda_{hij} \triangleq 1 + \eta_{hij} = E_u\big\{ \big( f_{k_{hi}j}'(y_{htij})\, y_{htij} - 1 \big)^2 \,\big|\, h, j \big\}, \qquad \kappa_{hij} = E_u\big\{ f_{k_{hi}j}'(y_{htij})^2 \,\big|\, h, j \big\}$$
then the expressions can be simplified to the following,
$$\lambda_{hi} = \sum_{j=1}^{m} \bar{\alpha}_{hij}^{+1}\big( \lambda_{hij} + \beta_{k_{hi}j}\, \kappa_{hij}\, \mu_{k_{hi}j}^2 \big), \qquad \kappa_{hi} = \sum_{j=1}^{m} \bar{\alpha}_{hij}^{+1}\, \beta_{k_{hi}j}\, \kappa_{hij}, \qquad \sigma_{hi}^2 = E_v\big\{ [b_{ht}]_i^2 \,\big|\, h \big\}$$
Define,
$$\Phi_h \triangleq E_v\big\{ g_{ht}\, b_{ht}^T \,\big|\, h \big\} \tag{35}$$
We have $G_h W_h^T = \Phi_h - I$. If we let,
$$\tilde{B} = -\tilde{\mathcal{H}}_h^{-1}\big(G_h W_h^T\big) = \tilde{\mathcal{H}}_h^{-1}\big(I - \Phi_h\big)$$
then we have,
$$[\tilde{B}]_{ii} = \frac{1 - [\Phi_h]_{ii}}{\lambda_{hi}}, \quad i = 1, \ldots, n \tag{36}$$
$$[\tilde{B}]_{ij} = \frac{[\Phi_h]_{ji} - \kappa_{hj}\,\sigma_{hi}^2\,[\Phi_h]_{ij}}{\kappa_{hi}\,\kappa_{hj}\,\sigma_{hi}^2\,\sigma_{hj}^2 - 1}, \quad i \ne j \tag{37}$$
Then,
$$\Delta W_h = \tilde{B}\, W_h \tag{38}$$
The log likelihood of $\Theta$ given $X$ is calculated as,
$$L\big(\Theta \mid X\big) = \sum_{t=1}^{N} \log\bigg( \sum_{h=1}^{M} \exp(L_{ht}) \bigg) \tag{39}$$

VI. EXPERIMENTS

VII. CONCLUSION

REFERENCES

[1] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 61, no. 3, pp. 611–622, 1999.
[2] M. E. Tipping and C. M. Bishop, "Mixtures of probabilistic principal component analyzers," Neural Computation, vol. 11, pp. 443–482, 1999.
[3] S. Roweis and Z. Ghahramani, "A unifying review of linear Gaussian models," Neural Computation, vol. 11, no. 5, pp. 305–345, 1999.
[4] D. J. C. MacKay, "Comparison of approximate methods for handling hyperparameters," Neural Computation, vol. 11, no. 5, pp. 1035–1068, 1999.
[5] M. E. Tipping, "Sparse Bayesian learning and the Relevance Vector Machine," Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[6] M. E. Tipping and N. D. Lawrence, "Variational inference for Student's t models: Robust Bayesian interpolation and generalised component analysis," Neurocomputing, vol. 69, pp. 123–141, 2005.
[7] H. Attias, "Independent factor analysis," Neural Computation, vol. 11, pp. 803–851, 1999.
[8] H. Attias, "A variational Bayesian framework for graphical models," in Advances in Neural Information Processing Systems 12, MIT Press, 2000.
[9] Z. Ghahramani and M. J. Beal, "Variational inference for Bayesian mixtures of factor analysers," in Advances in Neural Information Processing Systems 12, MIT Press, 2000.
[10] H. Lappalainen, "Ensemble learning for independent component analysis," in Proceedings of the First International Workshop on Independent Component Analysis, 1999.
[11] R. A. Choudrey and S. J. Roberts, "Variational mixture of Bayesian independent component analysers," Neural Computation, vol. 15, no. 1, pp. 213–252, 2002.
[12] J. W. Miskin, Ensemble Learning for Independent Component Analysis, Ph.D. thesis, University of Cambridge, 2000.
[13] K. Chan, T.-W. Lee, and T. J. Sejnowski, "Variational learning of clusters of undercomplete nonsymmetric independent components," Journal of Machine Learning Research, vol. 3, pp. 99–114, 2002.
[14] T.-W. Lee, M. S. Lewicki, and T. J. Sejnowski, "ICA mixture models for unsupervised classification of non-Gaussian classes and automatic context switching in blind signal separation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1078–1089, 2000.
[15] S.-I. Amari, T.-P. Chen, and A. Cichocki, "Stability analysis of learning algorithms for blind source separation," Neural Networks, vol. 10, no. 8, pp. 1345–1351, 1997.
[16] J. A. Palmer, Variational and Scale Mixture Representations of Non-Gaussian Densities for Estimation in the Bayesian Linear Model, Ph.D. thesis, University of California San Diego, 2006. Available at http://sccn.ucsd.edu/~jason.
[17] H. Attias and C. E. Schreiner, "Blind source separation and deconvolution: The dynamic component analysis algorithm," Neural Computation, vol. 10, pp. 1373–1424, 1998.
[18] S. C. Douglas, A. Cichocki, and S. Amari, "Multichannel blind separation and deconvolution of sources with arbitrary distributions," in Proc. IEEE Workshop on Neural Networks for Signal Processing, Amelia Island Plantation, FL, 1997, pp. 436–445.
[19] D. T. Pham, "Mutual information approach to blind separation of stationary sources," IEEE Trans. Information Theory, vol. 48, no. 7, pp. 1935–1946, 2002.
[20] A. M. Bronstein, M. M. Bronstein, and M. Zibulevsky, "Relative optimization for blind deconvolution," IEEE Transactions on Signal Processing, vol. 53, no. 6, pp. 2018–2026, 2005.
[21] T. Kim, H. Attias, S.-Y. Lee, and T.-W. Lee, "Blind source separation exploiting higher-order frequency dependencies," IEEE Transactions on Speech and Audio Processing, vol. 15, no. 1, 2007.
[22] A. Hyvärinen, P. O. Hoyer, and M. Inki, "Topographic independent component analysis," Neural Computation, vol. 13, no. 7, pp. 1527–1558, 2001.
[23] T. Eltoft, T. Kim, and T.-W. Lee, "Multivariate scale mixture of Gaussians modeling," in Proceedings of the 6th International Conference on Independent Component Analysis, J. Rosca et al., Eds., Lecture Notes in Computer Science, pp. 799–806, Springer-Verlag, 2006.
[24] J. A. Palmer, K. Kreutz-Delgado, and S. Makeig, "Super-Gaussian mixture source model for ICA," in Proceedings of the 6th International Conference on Independent Component Analysis, J. Rosca et al., Eds., Lecture Notes in Computer Science, Springer-Verlag, 2006.
[25] S.-I. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.
[26] J.-F. Cardoso and B. H. Laheld, "Equivariant adaptive source separation," IEEE Trans. Signal Processing, vol. 44, no. 12, pp. 3017–3030, 1996.
[27] D. T. Pham and P. Garat, "Blind separation of mixture of independent sources through a quasi-maximum likelihood approach," IEEE Trans. Signal Processing, vol. 45, no. 7, pp. 1712–1725, 1997.