Function factorization using warped Gaussian processes

Mikkel N. Schmidt mns@imm.dtu.dk
University of Cambridge, Department of Engineering, Trumpington Street, Cambridge, CB2 1PZ, UK

Abstract

We introduce a new approach to non-linear regression called function factorization, which is suitable for problems where an output variable can reasonably be modeled by a number of multiplicative interaction terms between non-linear functions of the inputs. The idea is to approximate a complicated function on a high-dimensional space by the sum of products of simpler functions on lower-dimensional subspaces. Function factorization can be seen as a generalization of matrix and tensor factorization methods, in which the data are approximated by the sum of outer products of vectors. We present a nonparametric Bayesian approach to function factorization where the priors over the factorizing functions are warped Gaussian processes, and we do inference using Hamiltonian Markov chain Monte Carlo. We demonstrate the superior predictive performance of the method on a food science data set compared to Gaussian process regression and tensor factorization using PARAFAC and GEMANOVA models.

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

1. Introduction

In many regression problems, the output variable can only be reasonably explained by interactions between the input variables. An example, which we shall return to in our experiments, is measurements of the color of meat under different storage conditions. Color is an important quality that affects the consumer's choice, and it is thus important to understand and model how the color is influenced by different explanatory factors such as storage time, temperature, oxygen content in the atmosphere, and exposure to light. It is reasonable to assume that the color of meat does not vary linearly with each of the explanatory variables, but that it depends on interactions of the explanatory variables in a non-linear fashion.
In this paper we present a new approach to non-linear regression that is suitable for problems where the output variable can be reasonably explained by non-linear interactions of the inputs. The goal in regression analysis is to infer a mapping function, $y: \mathcal{X} \rightarrow \mathbb{R}$, based on $N$ observed input-output pairs, $\mathcal{D} = \{x_n, y_n\}_{n=1}^{N}$, where $x_n \in \mathcal{X}$ are inputs and $y_n \in \mathbb{R}$ are outputs. The Bayesian approach to the regression problem is to formulate a prior distribution over mapping functions and combine this with the data using a suitable observation model to infer the posterior distribution over the mapping function. The posterior can then, for example, be used to make inference about the value of the output, $y^*$, at a previously unseen position in input space, $x^*$, by computing the predictive distribution, which involves integrating over the posterior.

The main idea in function factorization is to approximate a complicated function, $y(x)$, on a high-dimensional space, $\mathcal{X}$, by the sum of products of a number of simpler functions, $f_{i,k}(x_i)$, on lower-dimensional subspaces, $\mathcal{X}_i$,

$$ y(x) \approx \sum_{k=1}^{K} \prod_{i=1}^{I} f_{i,k}(x_i). \tag{1} $$

We refer to this as a $K$-component $I$-factor function factorization model. In the model, we assume that the inputs, $x \in \mathcal{X}$, can naturally be divided into $I$ inputs that lie in subspaces of $\mathcal{X}$, $x_1 \in \mathcal{X}_1, \ldots, x_I \in \mathcal{X}_I$, and that the output is reasonably modeled by a number of multiplicative interaction terms between non-linear functions of the inputs. The subspaces need not be chosen to be disjoint; for example, two subspaces could be identical, which makes it possible for the model to capture non-stationary modulation effects. Bayesian inference in function factorization models requires the specification of a likelihood function as well as priors over the factorizing functions. These priors could, for example, be chosen by selecting a suitably flexible parameterized family of functions and assuming prior distributions over the parameters. Another approach, which we pursue in this paper, is to assume a non-parametric distribution over the functions. Specifically, we choose warped Gaussian process priors.

2. Relation to other methods

Function factorization with warped Gaussian process priors (FF-WGP) generalizes a number of other well-known machine learning techniques, including matrix and tensor factorization, linear regression, and warped Gaussian process regression. In the following, we give an overview of the relation to these methods.

2.1. Relation to matrix and tensor factorization

Function factorization generalizes matrix and tensor factorization models, as illustrated in Figure 1. In matrix factorization, a data matrix, $Y$, is approximated by the product of two matrices, $F_1$ and $F_2$,

$$ Y \approx F_1^{\top} F_2. \tag{2} $$

To make the relation to function factorization explicit, we can rewrite this as

$$ y_{n_1,n_2} \approx \sum_{k=1}^{K} \prod_{i=1}^{2} f_{i,k,n_i}, \tag{3} $$

where $y_{n_1,n_2}$ is element $(n_1,n_2)$ of $Y$ and $f_{i,k,n}$ is element $(k,n)$ of $F_i$. Comparing this with Eq. (1), we note that the main difference between matrix factorization and function factorization is that in the former the goal is to learn a set of parameters, $f_{i,k,n_i}$, whereas in the latter a set of functions, $f_{i,k}(x_i)$, are learned. In matrix factorization, data points lie on a regular grid in the joint space of row and column indices, $(n_1, n_2)$, and are approximated by the sum of outer products of two factors, which are vectors defined on the row and column indices, respectively. In function factorization, data points lie in the space $\mathcal{X}$ and are approximated by the sum of products of functions defined on subspaces of $\mathcal{X}$. In function factorization, the priors can be chosen, for example, to have support over the non-negative numbers or to have a sub- or super-Gaussian density, which leads to generalizations of certain Bayesian formulations of non-negative matrix factorization (NMF) and independent component analysis (ICA).
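As a concrete illustration of Eq. (1) and of the matrix special case in Eqs. (2)–(3), the minimal Python sketch below evaluates a K-component I-factor model from a nested list of factor functions. The helper names (`function_factorization`, `lookup`) and the small matrices are hypothetical and chosen only for illustration: when each factor is simply a lookup into a row of $F_i$, the model reproduces the entries of $F_1^{\top} F_2$, and with continuous factors the same code evaluates a genuine function factorization.

```python
import math
from typing import Callable, List, Sequence

Factor = Callable[[float], float]


def function_factorization(factors: Sequence[Sequence[Factor]], x: Sequence[float]) -> float:
    """Evaluate Eq. (1): sum over components k of the product over factors i of f_{i,k}(x_i).

    factors[k][i] plays the role of f_{i,k}; x[i] is the sub-input x_i.
    """
    total = 0.0
    for component in factors:
        term = 1.0
        for f_ik, x_i in zip(component, x):
            term *= f_ik(x_i)
        total += term
    return total


def lookup(row: Sequence[float]) -> Factor:
    """Turn a row of F_i into a 'function' on integer indices, as in Eq. (3)."""
    return lambda n: row[int(n)]


# Hypothetical K = 2 factor matrices: F1 is K x N1 (N1 = 3), F2 is K x N2 (N2 = 2).
F1 = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.0]]
F2 = [[1.0, -1.0], [2.0, 4.0]]

factors = [[lookup(F1[k]), lookup(F2[k])] for k in range(2)]
# Y[n1][n2] equals the (n1, n2) entry of F1^T F2, i.e. Eq. (2).
Y: List[List[float]] = [[function_factorization(factors, (n1, n2)) for n2 in range(2)]
                        for n1 in range(3)]

# With continuous factors the same code evaluates a function factorization.
value = function_factorization([[math.cos, math.cos]], (0.0, math.pi))
```

The only difference between the two uses is what a "factor" is: a vector indexed by a row or column number, or a function of a continuous sub-input.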
Similar analogies exist between function factorization and higher-order decompositions of tensors, such as the parallel factor analysis (PARAFAC) model. Schmidt and Laurberg's (2008) method for non-negative matrix factorization with Gaussian process priors can be seen as a $K$-component two-factor function factorization model, where the data are required to be in the form of a matrix and inference is done by computing a maximum a posteriori estimate.

Figure 1. Illustration of the relation between matrix factorization and function factorization (panels: "Matrix factorization" and "Function factorization"). In matrix factorization, data lie on a regular grid and are approximated by a product-sum of vectors. In function factorization, data lie in $\mathcal{X}$ and are approximated by a product-sum of functions over subspaces of $\mathcal{X}$.

2.2. Relation to linear regression

Function factorization can also be seen as a generalization of linear regression. To show this, we start from the basic linear regression equation, where the outputs are modeled by a linear combination of the inputs,

$$ y \approx \sum_{k} x_k \beta_k, \tag{4} $$

where $x_k$ denotes the $k$th element of $x$ and $\beta_k$ are regression coefficients. Since this model is linear and additive in the inputs, it does not model non-linear effects or interactions between the inputs. To overcome these issues, linear regression can be performed on non-linear interaction terms,

$$ y \approx \sum_{k=1}^{K} f_k(x)\, \beta_k, \tag{5} $$

where $f_k$ are non-linear functions of all the input variables. To show the relation to function factorization, we choose $f_k$ as a product of non-linear transformations of the inputs,

$$ y \approx \sum_{k=1}^{K} \Big( \prod_{i=1}^{I} f_{i,k}(x_i) \Big) \beta_k. \tag{6} $$

This expression is very similar to Eq. (1); however, in linear regression the objective is to learn the regression coefficients, $\beta_k$, for fixed non-linear transformations of the inputs, whereas in function factorization the aim is to learn the non-linear functions themselves. Function factorization generalizes this formulation of linear regression, since the regression coefficients can, without loss of generality, be incorporated into the non-linear functions.

2.3. Relation to warped Gaussian processes

Function factorization generalizes warped Gaussian process (WGP) regression when WGP priors are assumed over the factorizing functions. When $K = I = 1$, FF-WGP collapses to WGP regression; however, in models with multiple factors the two methods differ in the assumptions that are made about the data. A simple illustration is given in Figure 2 for a two-dimensional toy data problem generated as the product of two cosine functions. Fifteen data points were chosen from the function, and regression analysis was performed using a GP and a one-component two-factor function factorization with GP priors. In regions of the function that are far away from any of the observed input points, the GP tends to its zero mean prior. The function factorization method, on the other hand, assumes that the data have a multi-linear structure, and since that assumption is correct in this case, the data are modeled accurately even in regions with no observations. In many data sets, the assumption that the outputs are well modeled by multiplicative interaction terms is reasonable, and when the assumption holds, function factorization can provide better results than methods based on Gaussian process regression.
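The claim that the coefficients $\beta_k$ can be absorbed into the factors is easy to check numerically. The Python sketch below, with hypothetical helper names and arbitrarily chosen fixed transformations, multiplies each $\beta_k$ into the first factor of component $k$ and confirms that Eq. (6) and Eq. (1) then give the same value.

```python
import math
from typing import Callable, List, Sequence

Factor = Callable[[float], float]


def factorization_form(factors: Sequence[Sequence[Factor]], x: Sequence[float]) -> float:
    """Eq. (1): sum over k of the product over i of f_{i,k}(x_i)."""
    return sum(math.prod(f(xi) for f, xi in zip(comp, x)) for comp in factors)


def regression_form(fixed: Sequence[Sequence[Factor]], beta: Sequence[float],
                    x: Sequence[float]) -> float:
    """Eq. (6): fixed transformations weighted by regression coefficients beta_k."""
    return sum(b * math.prod(f(xi) for f, xi in zip(comp, x)) for comp, b in zip(fixed, beta))


def absorb(fixed: Sequence[Sequence[Factor]], beta: Sequence[float]) -> List[List[Factor]]:
    """Fold beta_k into the first factor of component k (default args bind the loop values)."""
    return [[lambda v, f=comp[0], b=b: b * f(v)] + list(comp[1:]) for comp, b in zip(fixed, beta)]


# Arbitrary fixed non-linear transformations for a 2-component, 2-factor model.
fixed = [[math.sin, math.exp], [math.cos, lambda v: v ** 2]]
beta = [2.0, -0.5]
x = (0.3, 1.2)

lhs = regression_form(fixed, beta, x)
rhs = factorization_form(absorb(fixed, beta), x)
```

The two values agree up to floating-point rounding, which is the sense in which Eq. (1) contains Eq. (6); the difference lies in what is learned, not in what can be represented.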
Adams and Stegle's (2008) Gaussian process product model, in which a function is modeled as the point-wise product of two Gaussian processes for the purpose of capturing non-stationarities, can be seen as a one-component two-factor function factorization model with Gaussian process priors, where the two factorizing functions are both defined over the entire space, $\mathcal{X}$.

Figure 2. A simple toy example that illustrates important differences between Gaussian process regression and function factorization (panels: "Data", "GPR", "FF-GP"). Left: A two-dimensional toy data set is constructed as the multiplication of two cosines, and 15 data points are chosen at the indicated positions. Middle: Gaussian process regression on the data points fits the data well in the region close to the observations, and far away from the observations it tends to its zero mean prior. Right: A one-component function factorization on the same data fits the data well in the entire region, due to the correct assumption that the data come from a product of two functions.

3. Function factorization using warped Gaussian processes

In the following, we describe a non-parametric Bayesian approach to function factorization using warped Gaussian process (WGP) priors over the factorizing functions. First, we give a summary of the WGP, and then we describe function factorization using WGP priors. Finally, we present an inference procedure based on Hamiltonian Markov chain Monte Carlo (HMCMC).

3.1. Warped Gaussian processes

A Gaussian process (GP)¹ is a flexible and practical method for specifying a non-parametric distribution over a function, $g(x)$. It is fully characterized by its mean function, $m(x) = \mathbb{E}[g(x)]$, and its covariance function, $c(x,x') = \mathbb{E}\big[(g(x) - m(x))(g(x') - m(x'))\big]$. We use the notation $g(x) \sim \mathcal{GP}\big(m(x), c(x,x')\big)$ to denote a random function drawn from a GP. The GP is limited in the sense that it assumes that any finite subset of values of the function $g(x)$ follows a joint multivariate Gaussian distribution.
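To make the GP definition concrete, the sketch below draws one sample of $g$ at a finite set of inputs. It assumes a squared-exponential covariance $c(x,x') = \exp(-(x-x')^2/(2\ell^2))$, which is an illustrative choice rather than one fixed by the text, and uses the standard construction $g = m + Cz$, where $C$ is the Cholesky factor of the covariance matrix and $z$ is a vector of independent standard Gaussians. A small diagonal jitter is added for numerical stability; the function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def sq_exp_cov(xs: Sequence[float], length_scale: float = 1.0, jitter: float = 1e-9) -> Matrix:
    """Covariance matrix with entries exp(-(x - x')^2 / (2 l^2)), plus jitter on the diagonal."""
    return [[math.exp(-0.5 * ((a - b) / length_scale) ** 2) + (jitter if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]


def cholesky(K: Matrix) -> Matrix:
    """Lower-triangular C with C C^T = K (K must be symmetric positive definite)."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][m] * C[j][m] for m in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def sample_gp(xs: Sequence[float], mean: Callable[[float], float], rng: random.Random,
              length_scale: float = 1.0) -> List[float]:
    """One draw of g at the inputs xs: g = m + C z with z ~ N(0, I)."""
    C = cholesky(sq_exp_cov(xs, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [mean(x) + sum(C[i][j] * z[j] for j in range(i + 1)) for i, x in enumerate(xs)]


xs = [0.0, 0.5, 1.0, 2.0]
draw = sample_gp(xs, mean=lambda x: 0.0, rng=random.Random(0))
```

Because $\operatorname{Cov}(Cz) = CC^{\top}$, the draw has exactly the covariance matrix built from $c(x,x')$; the same construction reappears in the change of variables of Section 3.3.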
The idea behind the warped Gaussian process (WGP) (Snelson et al., 2004) is to overcome this limitation by mapping the GP through a non-linear warp function, $h(g)$, parameterized by $\theta_h$,

$$ y(x) = h\big(g(x)\big), \qquad g(x) \sim \mathcal{GP}\big(m(x), c(x,x')\big), \tag{7} $$

and jointly learning the parameters of the GP and the warp function. Snelson et al. (2004) learn the parameters of the WGP by maximum likelihood, but they note that priors can be included to learn maximum a posteriori estimates, or Markov chain Monte Carlo can be used to integrate out the parameters.

¹ See Rasmussen and Williams (2006) for a comprehensive introduction to Gaussian processes in machine learning.

3.2. The FF-WGP model

Let $\mu(x)$ denote a function factorization model,

$$ \mu(x) = \sum_{k=1}^{K} \prod_{i=1}^{I} f_{i,k}(x_i), \tag{8} $$

where $f_{i,k}(x_i)$ are modeled by zero-mean WGPs. In the most general formulation, each factor in each component has distinct warp and covariance functions,

$$ f_{i,k}(x_i) = h_{i,k}\big(g_{i,k}(x_i)\big), \tag{9} $$

$$ g_{i,k}(x_i) \sim \mathcal{GP}\big(0, c_{i,k}(x_i, x_i')\big). \tag{10} $$

We then model the outputs, $y_n$, as independent and identically distributed given $\mu_n = \mu(x_n)$, using some likelihood function, $p(y_n | \mu_n)$, parameterized by $\theta_y$. A graphical model of the FF-WGP is shown in Figure 3. The unknowns in the model are the parameters of the likelihood function, $\theta_y$, the warp functions, $\theta_h$, and the covariance functions, $\theta_c$, as well as the latent variables, $g_{i,k}^{n} = g_{i,k}(x_{n,i})$.

Figure 3. Graphical model for the function factorization model with warped Gaussian process priors (nodes $x_1, \ldots, x_N$; $g_1, \ldots, g_N$; $\mu_1, \ldots, \mu_N$; $y_1, \ldots, y_N$; parameters $\theta_h$, $\theta_c$, $\theta_y$; plates over $i$ and $k$). Squares represent observed variables and circles denote unobserved variables and parameters. The bold line indicates that the nodes $g_{i,k}^{1}, \ldots, g_{i,k}^{N}$ are fully connected.

3.3. Change of parameters

The latent variables, $g_{i,k}^{n}$, have a multivariate Gaussian distribution a priori and are thus highly correlated. We have found empirically that MCMC inference in the model is more efficient when we perform a change of variables such that the latent variables are uncorrelated a priori. We do this by defining a new latent variable, $z_{i,k}^{n}$, related to $g_{i,k}^{n}$ by

$$ g_{i,k}^{n} = \sum_{n'=1}^{N} C_{i,k}^{n,n'} z_{i,k}^{n'}, \tag{11} $$

where $C_{i,k}$ holds the Cholesky decomposition of the covariance matrix of the $(i,k)$th Gaussian process,

$$ \sum_{n''=1}^{N} C_{i,k}^{n,n''} C_{i,k}^{n',n''} = c_{i,k}^{n,n'} = c_{i,k}(x_{n,i}, x_{n',i}). \tag{12} $$

With this change of variables, the model can be written as

$$ \mu_n = \sum_{k=1}^{K} \prod_{i=1}^{I} h_{i,k}\Big( \sum_{n'=1}^{N} C_{i,k}^{n,n'} z_{i,k}^{n'} \Big), \tag{13} $$

and the prior over the new latent variables is i.i.d. standard Gaussian, $z_{i,k}^{n} \sim \mathcal{N}(0, 1)$.
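The change of variables in Eqs. (11)–(13) can be sketched in a few lines of Python. This is an illustrative sketch with several assumptions that the text does not fix: all factors use a squared-exponential covariance with unit length-scale, the warp is $h(g) = \exp(g)$ (one simple way to obtain positive factors), and, as a variation on the per-data-point indexing of Eq. (11), latent variables are indexed by the distinct values each sub-input takes, which keeps the covariance matrix non-singular when several data points share a coordinate, as they do on a grid. Function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular C with C C^T = K."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][m] * C[j][m] for m in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def whitened_to_latent(z: Sequence[float], points: Sequence[float],
                       length_scale: float = 1.0, jitter: float = 1e-9) -> List[float]:
    """Eq. (11): g = C z, where C C^T is the covariance matrix over `points` (Eq. (12))."""
    K = [[math.exp(-0.5 * ((a - b) / length_scale) ** 2) + (jitter if i == j else 0.0)
          for j, b in enumerate(points)] for i, a in enumerate(points)]
    C = cholesky(K)
    return [sum(C[i][j] * z[j] for j in range(i + 1)) for i in range(len(points))]


def ff_mean(X: Sequence[Tuple[float, ...]], z: Sequence[Sequence[Sequence[float]]],
            warp: Callable[[float], float] = math.exp) -> List[float]:
    """Eq. (13): mu_n = sum_k prod_i h(g_{i,k}(x_{n,i})).

    X[n] is the n-th input split into I sub-inputs; z[k][i] holds one whitened
    latent value per distinct value of sub-input i (sorted ascending).
    """
    n_factors = len(X[0])
    grids = [sorted({x[i] for x in X}) for i in range(n_factors)]
    mu = [0.0] * len(X)
    for z_k in z:
        # Lookup tables from each distinct sub-input value to its latent value g_{i,k}.
        tables: List[Dict[float, float]] = [
            dict(zip(grids[i], whitened_to_latent(z_k[i], grids[i]))) for i in range(n_factors)
        ]
        for n, x in enumerate(X):
            mu[n] += math.prod(warp(tables[i][x[i]]) for i in range(n_factors))
    return mu


# Three data points with I = 2 sub-inputs; sub-input 1 repeats the value 1.0.
X = [(0.0, 1.0), (0.5, 1.0), (1.0, 2.0)]
z = [[[0.0, 0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # K = 2 components, all-zero latents
mu = ff_mean(X, z)
```

With all whitened latents at zero, every factor equals $h(0) = 1$, so each $\mu_n$ equals the number of components. The design point is that the sampler only ever moves $z$, whose prior is uncorrelated, while the correlations live in the fixed map $z \mapsto Cz$.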
We emphasize that this change of variables does not change the model, only its parameterization.

3.4. Posterior and predictive distribution

The joint posterior distribution of the latent variables and the parameters conditioned on the data is given by

$$ p(z, \theta | \mathcal{D}) \propto p(\theta) \prod_{n=1}^{N} p(y_n | \mu_n) \prod_{i=1}^{I} \prod_{k=1}^{K} p(z_{i,k}), \tag{14} $$

where $z = \{z_{i,k}\}$ denotes all latent variables, $\theta = \{\theta_y, \theta_h, \theta_c\}$ denotes all parameters, and $\mu_n$ is defined in Eq. (13). To infer the value of the output $y^*$ at a previously unseen point $x^*$, we must evaluate the predictive distribution, which requires integrating over the posterior distribution of the latent variables and parameters,

$$ p(y^* | x^*, \mathcal{D}) = \iint p(y^* | x^*, z, \theta)\, p(z, \theta | \mathcal{D})\, dz\, d\theta. \tag{15} $$

Since this integral is analytically intractable, we approximate it by Monte Carlo sampling; i.e., we draw $M$ samples, $\{z^m, \theta^m\}_{m=1}^{M}$, from the posterior distribution in Eq. (14) and approximate Eq. (15) by the sum

$$ p(y^* | x^*, \mathcal{D}) \approx \frac{1}{M} \sum_{m=1}^{M} p(y^* | x^*, z^m, \theta^m). \tag{16} $$

This expression can be evaluated directly if, on every subspace $\mathcal{X}_i$, the input point $x^*$ coincides with at least one data point $x_n$, because then all required latent variables are instantiated in the posterior sample. In the toy example in Figure 2, for example, this means that Eq. (16) can be evaluated directly on the 8-by-8 grid formed by the 15 input points. To evaluate the predictive distribution outside these points, we instead draw samples from the predictive distribution.

3.5. Inference using Hamiltonian Markov chain Monte Carlo

Since we cannot directly draw samples from the posterior distribution in Eq. (14), we use a Markov chain Monte Carlo sampling procedure. Hamiltonian Markov chain Monte Carlo (HMCMC) (Duane et al., 1987) is an attractive method for this problem, because it avoids the random walk behavior that can make other MCMC methods, such as Gibbs sampling and Metropolis-Hastings, converge slowly. HMCMC requires the derivatives of the logarithm of the posterior distribution with respect to all variables and parameters, which can be computed as we show in the following. We define $L$, which equals the negative log posterior up to an additive constant,

$$ L = -\log p(\theta) + \sum_{i=1}^{I} \sum_{k=1}^{K} \sum_{n=1}^{N} \frac{1}{2} \big( z_{i,k}^{n} \big)^2 - \sum_{n=1}^{N} \log p(y_n | \mu_n). \tag{17} $$

We now need to compute the derivatives of $L$ with respect to the latent variables and all parameters of the model. The derivative with respect to the latent variables requires the derivatives of the log likelihood and of the warp functions,

$$ \frac{\partial L}{\partial z_{i,k}^{n}} = -\sum_{n'=1}^{N} \frac{\partial \log p(y_{n'} | \mu_{n'})}{\partial \mu_{n'}} \frac{\partial h_{i,k}(g_{i,k}^{n'})}{\partial g_{i,k}^{n'}} \Big( \prod_{i' \neq i} h_{i',k}(g_{i',k}^{n'}) \Big) C_{i,k}^{n',n} + z_{i,k}^{n}. \tag{18} $$

The derivative with respect to the parameters of the likelihood function is straightforward and has two terms, the derivative of the prior and the derivative of the log likelihood,

$$ \frac{\partial L}{\partial \theta_y} = -\frac{\partial \log p(\theta)}{\partial \theta_y} - \sum_{n=1}^{N} \frac{\partial \log p(y_n | \mu_n)}{\partial \theta_y}. \tag{19} $$

Similarly, the derivative with respect to the parameters of the warp functions has two terms, the derivative of the prior and the derivative of the log likelihood, where we use the chain rule,

$$ \frac{\partial L}{\partial \theta_h} = -\frac{\partial \log p(\theta)}{\partial \theta_h} - \sum_{k=1}^{K} \sum_{i=1}^{I} \sum_{n=1}^{N} \frac{\partial \log p(y_n | \mu_n)}{\partial \mu_n} \Big( \prod_{i' \neq i} h_{i',k}(g_{i',k}^{n}) \Big) \frac{\partial h_{i,k}(g_{i,k}^{n})}{\partial \theta_h}. \tag{20} $$
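Derivatives such as Eq. (18) are easy to get wrong, so a finite-difference check is a useful safeguard. The Python sketch below implements $L$ from Eq. (17) and the gradient of Eq. (18) for a tiny model, under assumptions chosen for brevity and not taken from the text: a flat prior $p(\theta)$, the Gaussian likelihood with log variance $v$ that Section 4.1 uses, the warp $h(g) = \exp(g)$ so that $h' = h$, and one Cholesky factor per subspace shared by all components. It then compares the analytic gradient with central differences. All names and data values are hypothetical.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [k][i][n]


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular C with C C^T = K."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(C[i][m] * C[j][m] for m in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def sq_exp_chol(points: List[float], jitter: float = 1e-6) -> List[List[float]]:
    K = [[math.exp(-0.5 * (a - b) ** 2) + (jitter if i == j else 0.0)
          for j, b in enumerate(points)] for i, a in enumerate(points)]
    return cholesky(K)


def forward(z: Tensor3, C: List[List[List[float]]]):
    """Eqs. (11) and (13) with h = exp: returns latents g[k][i][n] and means mu[n]."""
    n_comp, n_fac, n_pts = len(z), len(z[0]), len(z[0][0])
    g = [[[sum(C[i][n][m] * z[k][i][m] for m in range(n + 1)) for n in range(n_pts)]
          for i in range(n_fac)] for k in range(n_comp)]
    mu = [sum(math.prod(math.exp(g[k][i][n]) for i in range(n_fac)) for k in range(n_comp))
          for n in range(n_pts)]
    return g, mu


def neg_log_post(z: Tensor3, C, y: List[float], v: float) -> float:
    """Eq. (17) with flat p(theta) and a Gaussian likelihood with variance exp(v)."""
    _, mu = forward(z, C)
    prior = 0.5 * sum(val * val for zk in z for zi in zk for val in zi)
    var = math.exp(v)
    loglik = sum(-0.5 * math.log(2 * math.pi * var) - (yn - mn) ** 2 / (2 * var)
                 for yn, mn in zip(y, mu))
    return prior - loglik


def grad_z(z: Tensor3, C, y: List[float], v: float) -> Tensor3:
    """Eq. (18): z itself minus the likelihood term propagated through h and C."""
    g, mu = forward(z, C)
    n_comp, n_fac, n_pts = len(z), len(z[0]), len(y)
    r = [(y[n] - mu[n]) / math.exp(v) for n in range(n_pts)]  # d log p(y_n | mu_n) / d mu_n
    grad = [[list(zi) for zi in zk] for zk in z]
    for k in range(n_comp):
        for i in range(n_fac):
            for n in range(n_pts):
                # h'(g_{i,k}^n) times the product of the other factors; h' = exp here.
                dmu_dg = math.exp(g[k][i][n]) * math.prod(
                    math.exp(g[k][j][n]) for j in range(n_fac) if j != i)
                for m in range(n + 1):  # C[i][n][m] is zero for m > n
                    grad[k][i][m] -= r[n] * dmu_dg * C[i][n][m]
    return grad


# Tiny example: N = 3 points, I = 2 factors, K = 2 components.
X = [(0.0, 0.2), (0.7, 1.1), (1.5, 1.9)]
C = [sq_exp_chol([x[i] for x in X]) for i in range(2)]
y, v = [1.0, 2.5, 0.5], 0.0
z = [[[0.3 * math.sin(1 + k + 2 * i + 3 * n) for n in range(3)] for i in range(2)] for k in range(2)]

analytic = grad_z(z, C, y, v)
eps, max_err = 1e-6, 0.0
for k in range(2):
    for i in range(2):
        for n in range(3):
            zp = [[list(zi) for zi in zk] for zk in z]
            zm = [[list(zi) for zi in zk] for zk in z]
            zp[k][i][n] += eps
            zm[k][i][n] -= eps
            numeric = (neg_log_post(zp, C, y, v) - neg_log_post(zm, C, y, v)) / (2 * eps)
            max_err = max(max_err, abs(numeric - analytic[k][i][n]))
```

A small `max_err` indicates that the implemented gradient matches Eq. (17); the same pattern can be reused to check Eqs. (19)–(22) once their parameters are added.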
The derivative with respect to the parameters of the covariance function is a bit more involved, since it requires the computation of the derivative of a Cholesky decomposition. We begin with the derivative of $L$ with respect to the Cholesky decomposition itself,

$$ \frac{\partial L}{\partial C_{i,k}^{n,n'}} = -\frac{\partial \log p(y_n | \mu_n)}{\partial \mu_n} \Big( \prod_{i' \neq i} h_{i',k}(g_{i',k}^{n}) \Big) \frac{\partial h_{i,k}(g_{i,k}^{n})}{\partial g_{i,k}^{n}}\, z_{i,k}^{n'}. \tag{21} $$

Using this, we can compute the backward derivative (Smith, 1995) of the Cholesky decomposition, $F_{i,k}^{n,n'}$, and finally the desired derivative can be evaluated as

$$ \frac{\partial L}{\partial \theta_c} = \sum_{n=1}^{N} \sum_{n'=1}^{N} F_{i,k}^{n,n'} \frac{\partial c_{i,k}^{n,n'}}{\partial \theta_c}. \tag{22} $$

Computing the derivative of the Cholesky decomposition is approximately as expensive as computing the Cholesky decomposition itself, which is the most expensive computation in the WGP method. For that reason, the backward derivative is attractive when we need the derivative with respect to several parameters of a covariance function, since it need only be computed once.

4. Experiments

We evaluated the proposed model on a food science data set (Bro & Jakobsen, 2002) that consists of measurements of the color of fresh beef as it changes during storage under different conditions. The storage conditions are determined by the values of five independent variables, summarized in Table 1. In a reduced factorial experimental design, measurements were taken at a subset of the possible combinations of values of the independent variables, such that the data can be represented as a five-dimensional tensor with 60% of the values missing. A detailed description of the experimental design and the data is given by Bro and Jakobsen (2002), who analyze the data using generalized multiplicative analysis of variance (GEMANOVA). We note that, because of the factorial experimental design, the input data points lie on a regular grid, which
is a requirement for analyzing the data with tensor factorization methods suitably tailored to handle missing data (Tomasi & Bro, 2002). The function factorization method does not require the data to lie on a grid, and its applicability thus extends beyond multi-way array data.

Figure 4. Warp function that transforms a standard Gaussian distribution into an exponential distribution with log scale $\lambda$ (axes $g$ and $h(g)$; curves for $\lambda = 0$, $1$, and $2$).

4.1. Model choice

In our experiments we use a Gaussian likelihood,

$$ p(y_n | \mu_n) = \frac{1}{\sqrt{2\pi \exp(v)}} \exp\Big( -\frac{(y_n - \mu_n)^2}{2 \exp(v)} \Big), \tag{23} $$

parameterized by the log variance, $\theta_y = \{v\}$, which ensures that the variance is always positive. This likelihood function allows us to directly compare our results with those of Bro and Jakobsen (2002), who use least-squares PARAFAC and GEMANOVA models to analyze the same data.

The purpose of the warp functions is to map the Gaussian outputs of the GPs into another desired distribution. In the present data we expect the factorizing functions to be non-negative, because the output variable, which measures the red color of the meat, is inherently non-negative. We use a particular warp function, illustrated in Figure 4,

$$ h_{i,k}(g) = -\exp(\lambda_i) \log\Big( \frac{1}{2} - \frac{1}{2} \operatorname{erf}\Big( \frac{g}{\sqrt{2}} \Big) \Big), \tag{24} $$

parameterized by log scale parameters $\theta_h = \{\lambda_i\}$. This warp function was suggested by Schmidt and Laurberg (2008) and has the property that it transforms a standard Gaussian variable into an exponentially distributed variable. For the first four independent variables, shown in Table 1, we choose a Gaussian covariance function,

$$ c_{i,k}(x, x') = \exp\big( -\exp(-l_i)\, (x - x')^2 \big), \tag{25} $$

parameterized by log length-scale parameters $\theta_c = \{l_i\}$. We choose the covariance function for the fifth independent variable, the muscle number, as a delta function,

$$ c_{i,k}(x, x') = \delta(x - x'), \tag{26} $$

such that this factor is effectively modeled as an exponentially distributed random variable.
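The defining property of the warp in Eq. (24) can be verified numerically: if $g$ is standard Gaussian, then $u = \Phi(g)$ is uniform, $\tfrac{1}{2} - \tfrac{1}{2}\operatorname{erf}(g/\sqrt{2}) = 1 - u$, and $h(g) = -\exp(\lambda)\log(1-u)$ is the inverse CDF of an exponential distribution with scale $\exp(\lambda)$. The Python sketch below writes the $\tfrac{1}{2} - \tfrac{1}{2}\operatorname{erf}$ term with `math.erfc`, which is algebraically identical but avoids cancellation for large $g$, and also implements the covariance functions of Eqs. (25) and (26), with $l$ taken as the log length-scale. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def exp_warp(g: float, lam: float) -> float:
    """Eq. (24): h(g) = -exp(lam) * log(1/2 - 1/2 erf(g / sqrt(2))).

    1/2 - 1/2 erf(t) equals 1/2 erfc(t); erfc stays accurate when erf(t) is close to 1.
    """
    return -math.exp(lam) * math.log(0.5 * math.erfc(g / math.sqrt(2.0)))


def gaussian_cov(x: float, x2: float, log_length: float) -> float:
    """Eq. (25): exp(-exp(-l) (x - x')^2) with l the log length-scale parameter."""
    return math.exp(-math.exp(-log_length) * (x - x2) ** 2)


def delta_cov(x: float, x2: float) -> float:
    """Eq. (26): independent values for each level of a categorical input."""
    return 1.0 if x == x2 else 0.0


# Mapping standard-Gaussian quantiles through h gives exponential quantiles.
lam = 0.5
std_normal = NormalDist()
quantile_pairs = [(exp_warp(std_normal.inv_cdf(u), lam), -math.exp(lam) * math.log(1.0 - u))
                  for u in (0.1, 0.5, 0.9)]
```

For example, $h(0) = \exp(\lambda)\log 2$, the median of an exponential distribution with scale $\exp(\lambda)$, which is why the warped factors in this model are non-negative and right-skewed.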
For simplicity, we choose a non-informative, flat, improper prior over all parameters, $p(\theta) \propto 1$, but we note that it is straightforward to choose proper priors over the parameters in a hierarchical fashion and include any hyper-parameters in the HMCMC inference procedure.

4.2. Cross-validation

We divided the data set into ten subsets and performed ten-fold cross-validation. We included three different FF-WGP models in our experiments: $K \in \{1, 2, 3\}$. In each run, we ran the HMCMC sampler for 5000 iterations using 20 leapfrog steps in each iteration and discarded the first half of the samples to allow the sampler to burn in. Plots of the samples of the parameters indicated that the sampling procedure had stabilized after a few hundred iterations. As an example, the log variance parameter, $v$, is shown as a function of the iteration number in Figure 5 for one of the HMCMC runs. We then computed the posterior mean estimate of the held-out data. For comparison, we fitted PARAFAC models with $K \in \{1, 2, 3\}$ components using the N-way toolbox (Andersson & Bro, 2000). We also fitted a standard Gaussian process with Gaussian covariance by maximum likelihood and computed the posterior mean of the held-out data. For all of the models, we then computed the root mean squared error of the held-out data.

4.3. Results

The results of the experiments are given in Table 2. A one-component PARAFAC model yielded a relatively high cross-validation error, whereas a two-component PARAFAC yielded a relatively low error. A three-component PARAFAC led to over-fitting, i.e., it fitted the training data better but gave a higher cross-validation error. The two-component GEMANOVA model is suggested by Bro and Jakobsen (2002) as a good trade-off between model complexity and predictive quality, and corresponds to a restricted PARAFAC model with some of
the factors fixed. In the experiments of Bro and Jakobsen (2002), the GEMANOVA model yields a cross-validation error comparable to our results for a two-component PARAFAC model, but uses fewer parameters. Our result for the one-component FF-WGP model was similar to that of the one-component PARAFAC, which again suggests that one multiplicative component is not enough to adequately describe the data. The two- and three-component FF-WGP models yielded better predictions and did not over-fit, since all parameters are integrated out using MCMC. Gaussian process regression performed slightly worse than the factorization-based models, possibly because it does not exploit the factorial structure of the problem. Figure 6 shows the learned factor pertaining to storage time in the one-component PARAFAC and FF-WGP models. In the PARAFAC model, the value of the factor can only be estimated at the five input positions at which data points were available. The function factorization approach, on the other hand, gives a full posterior distribution over a function, which can be evaluated anywhere. The two methods agree that storage time negatively influences the color of beef in a near-linear manner.

Figure 5. Log variance parameter, $v$, as a function of iteration number in one run of the HMCMC sampler for a two-component FF-WGP model (horizontal axis: iteration, 0 to 5000).

Table 1. Independent variables that affect the color of beef as it changes during storage.

Independent variable           Unit   Values
Storage time                   Days   0, 3, 7, 8, 10
Temperature                    °C     2, 5, 8
Oxygen content in head-space   %      40, 60, 80
Exposure time to light         %      0, 50, 100
Muscle number                         1, 2, 3, 4, 5, 6

Figure 6. Relative effect of storage time: the factor pertaining to storage time in one-component PARAFAC and FF-WGP models. The PARAFAC model estimates the value of the factor at the five points at which data were recorded. The FF-WGP outputs a posterior distribution over functions; the plot shows the posterior mean and standard deviation (horizontal axis: days, 0 to 10; legend: FF-WGP, PARAFAC).
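Section 4.2 runs the HMCMC sampler for 5000 iterations with 20 leapfrog steps per iteration and discards the first half of the samples. A minimal, generic Python sketch of such a sampler is shown below. It assumes unit-mass momenta and a fixed step size (neither is specified in the text), takes $L$ and its gradient as callables, and is demonstrated on the toy target $L(z) = \tfrac{1}{2}\lVert z \rVert^2$, i.e., only the prior term of Eq. (17), with fewer iterations than in the experiments. The names `leapfrog`, `hmc`, `L_toy`, and `grad_toy` are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def leapfrog(z: Vec, p: Vec, grad_L: Callable[[Vec], Vec], step: float,
             n_steps: int) -> Tuple[Vec, Vec]:
    """Simulate Hamiltonian dynamics for H(z, p) = L(z) + |p|^2 / 2 with n_steps leapfrog steps."""
    p = [pi - 0.5 * step * gi for pi, gi in zip(p, grad_L(z))]  # initial half step in momentum
    for s in range(n_steps):
        z = [zi + step * pi for zi, pi in zip(z, p)]  # full step in position
        if s < n_steps - 1:
            p = [pi - step * gi for pi, gi in zip(p, grad_L(z))]  # full step in momentum
    p = [pi - 0.5 * step * gi for pi, gi in zip(p, grad_L(z))]  # final half step in momentum
    return z, p


def hmc(L: Callable[[Vec], float], grad_L: Callable[[Vec], Vec], z0: Vec, n_iter: int,
        n_leapfrog: int = 20, step: float = 0.1, seed: int = 0) -> Tuple[List[Vec], float]:
    """Hamiltonian Monte Carlo; returns the second half of the chain and the acceptance rate."""
    rng = random.Random(seed)
    z = list(z0)
    chain: List[Vec] = []
    accepted = 0
    for _ in range(n_iter):
        p0 = [rng.gauss(0.0, 1.0) for _ in z]  # fresh unit-mass momentum
        z_new, p_new = leapfrog(z, p0, grad_L, step, n_leapfrog)
        h_old = L(z) + 0.5 * sum(pi * pi for pi in p0)
        h_new = L(z_new) + 0.5 * sum(pi * pi for pi in p_new)
        # Metropolis correction for the discretization error of the leapfrog integrator.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            z = z_new
            accepted += 1
        chain.append(list(z))
    return chain[n_iter // 2:], accepted / n_iter  # discard the first half as burn-in


def L_toy(z: Vec) -> float:
    """Toy target: only the prior term of Eq. (17), i.e. a standard Gaussian."""
    return 0.5 * sum(zi * zi for zi in z)


def grad_toy(z: Vec) -> Vec:
    return list(z)


samples, acc_rate = hmc(L_toy, grad_toy, z0=[3.0, -3.0], n_iter=2000)
mean0 = sum(s[0] for s in samples) / len(samples)
```

In the FF-WGP setting, `z` would presumably be the concatenation of the whitened latent variables and the parameters, with `L` given by Eq. (17) and `grad_L` assembled from Eqs. (18)–(22); the step size would then need tuning, since a value that works for this toy target is not guaranteed to work there.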
5. Conclusions

We have presented a new approach to non-linear regression called function factorization. The method is based on approximating a function in a high-dimensional space as the sum of products of functions on subspaces. Using warped Gaussian processes as non-parametric priors, we have presented a Bayesian inference procedure based on Hamiltonian Markov chain Monte Carlo sampling. Factorization-based methods such as the PARAFAC model, which model data as a product of factors, can lead to intuitive and interpretable results when the factors have a physical meaning. Non-parametric Bayesian regression methods, such as the warped Gaussian process, make fewer assumptions about the structure of the data but lack the same interpretability. The function factorization method presented here combines the idea of a factorized model with the flexibility of non-parametric Bayesian regression. On a food science data set, we have shown that the function factorization method using warped Gaussian process priors leads to superior performance, in terms of cross-validated root mean squared error on a prediction task, compared with tensor factorization and Gaussian process regression. We have also demonstrated that the function factorization method provides full posterior distributions over the factorizing functions, which can improve the interpretability of the model.

Table 2. Ten-fold cross-validation root mean squared error results on the beef color data set for parallel factor analysis (PARAFAC), generalized multiplicative analysis of variance (GEMANOVA), function factorization using warped Gaussian processes (FF-WGP), and Gaussian process regression (GPR).

Model      Components   Cross-val. RMSE   Note
PARAFAC    1            2.95
PARAFAC    2            1.71              2
PARAFAC    3            2.36
GEMANOVA   2            1.75              3
FF-WGP     1            2.94
FF-WGP     2            1.50
FF-WGP     3            1.45
GPR        n/a          1.80

² Our results differ from Bro and Jakobsen (2002), who note that the model is degenerate but report a leave-one-out cross-validation RMSE of 1.50.
³ Leave-one-out cross-validation result from Bro and Jakobsen (2002) with one multiplicative component plus one main component.

References

Adams, R., & Stegle, O. (2008). Gaussian process product models for nonparametric nonstationarity. International Conference on Machine Learning (ICML) (pp. 1–8).
Andersson, C., & Bro, R. (2000). The N-way toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 52, 1–4.
Bro, R., & Jakobsen, M. (2002). Exploring complex interactions in designed data using GEMANOVA. Color changes in fresh beef during storage. Journal of Chemometrics, 16, 294–304.
Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195, 216–222.
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.
Schmidt, M., & Laurberg, H. (2008). Nonnegative matrix factorization with Gaussian process priors. Computational Intelligence and Neuroscience, 2008. doi:10.1155/2008/361705.
Smith, S. P. (1995). Differentiation of the Cholesky algorithm. Journal of Computational and Graphical Statistics, 4, 134–147.
Snelson, E., Rasmussen, C. E., & Ghahramani, Z. (2004). Warped Gaussian processes. Advances in Neural Information Processing Systems (NIPS) (pp. 337–344).
Tomasi, G., & Bro, R. (2002). PARAFAC and missing values. Chemometrics and Intelligent Laboratory Systems, 75, 163–180. doi:10.1016/j.chemolab.2004.07.003.