Learning Curves for Gaussian Processes via Numerical Cubature Integration

Simo Särkkä
Department of Biomedical Engineering and Computational Science, Aalto University, Finland
simo.sarkka@tkk.fi

Abstract. This paper is concerned with the estimation of learning curves for Gaussian process regression with multidimensional numerical integration. We propose an approach where the recursion equations for the generalization error are approximately solved using numerical cubature integration methods. The advantage of the approach is that the eigenfunction expansion of the covariance function does not need to be known. The accuracy of the proposed method is compared to eigenfunction expansion based approximations to the learning curve.

Keywords: Gaussian process regression, learning curve, numerical cubature

1 Introduction

Gaussian process (GP) regression [1, 2] refers to a Bayesian machine learning approach where, instead of using a fixed-form parametric model such as an MLP neural network [3], one postulates a Gaussian process prior over the model functions. Learning in Gaussian process regression means computing the posterior Gaussian process, which is conditioned on the observed measurements. The prediction of unobserved values amounts to computing predictive distributions and their statistics.

This paper is concerned with the approximate computation of learning curves for Gaussian process regression. By learning curve we mean the average generalization error ε as a function of the number of training samples n. A common way to compute approximations to learning curves is to express the approximate average learning curve or its bounds in terms of the eigenvalues of the covariance function [4-8]. Upper and lower bounds for one-dimensional covariance functions, in terms of spectral densities and eigenvalues, have been presented in [9]. One possible approach is to express the lower bound for the learning curve in terms of the equivalent kernel [10], which leads to results similar to the classical error bounds for Gaussian processes (see, e.g., [11, 12]). Statistical physics based approximations to GP learning curves have been considered in [13, 14].

In this paper we follow the ideas presented in [5, 8], but instead of using the eigenfunction expansion, we approximate the integrals over the training and test inputs with multidimensional numerical integration. The advantage of the approach is that the learning curve can be evaluated without knowledge of the eigenfunctions and eigenvalues of the covariance function. For the numerical integration, we specifically consider multidimensional generalizations of Gauss-Hermite quadratures, that is, Gauss-Hermite cubatures, for the computation of the multidimensional integrals. Such numerical cubature rules have also recently gained attention in the context of nonlinear Kalman filtering and smoothing [15-18].

2 Recursion for the Learning Curve

Consider the following Gaussian process regression model:

    f(x) \sim \mathcal{GP}(0, C(x, x'))
    y_k = f(x_k) + r_k,    (1)

where y_k, k = 1, 2, ..., n, are the measurements, r_k \sim N(0, s^2) is the IID measurement error sequence, and the input is x \in \mathbb{R}^d. That is, the unknown function f(x) is modeled as a zero mean Gaussian process with the given covariance function C(x, x'). Here we assume that both the function f(x) and the measurements y_k are scalar valued, but the extension to the vector case is straightforward. We also assume that the prior Gaussian process has zero mean for notational convenience.

Given n measurements y = (y_1, ..., y_n) at input positions x_{1:n} = (x_1, ..., x_n), the posterior mean and covariance functions of f are given as [1, 2]:

    m_n(x) = C(x, x_{1:n}) \, [C(x_{1:n}, x_{1:n}) + s^2 I]^{-1} \, y
    C_n(x, x') = C(x, x') - C(x, x_{1:n}) \, [C(x_{1:n}, x_{1:n}) + s^2 I]^{-1} \, C^T(x', x_{1:n}).    (2)

For the purposes of estimating the learning curves, we assume that the input positions x_k in the training set are random and form an IID process x_1, ..., x_n such that x_k ~ p(x). If we assume that the test inputs have the distribution x ~ p*(x), we obtain the following well-known expression for the average generalization error of the Gaussian process:

    ε = E\left[ C(x, x) - C(x, x_{1:n}) \, [C(x_{1:n}, x_{1:n}) + s^2 I]^{-1} \, C^T(x, x_{1:n}) \right],    (3)

where the expectation is taken over both the training and test input positions, x_1, ..., x_n ~ p(·) and x ~ p*(·), respectively. Note that after this averaging the error is no longer a function of the measurements y_1, ..., y_n, nor of the input positions.
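To make the batch solution (2) concrete, here is a minimal sketch in Python/NumPy (illustrative only and not from the paper; the SE covariance, the function names, and the default noise variance s2 are assumptions):

```python
import numpy as np

def se_cov(X1, X2, ell=1.0):
    """Squared exponential covariance C(x, x') = exp(-||x - x'||^2 / (2 ell^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(X, y, Xs, s2=1e-3, ell=1.0):
    """Posterior mean and covariance of equation (2) at the test inputs Xs."""
    K = se_cov(X, X, ell) + s2 * np.eye(len(X))  # C(x_{1:n}, x_{1:n}) + s^2 I
    Ks = se_cov(Xs, X, ell)                      # C(x, x_{1:n})
    mean = Ks @ np.linalg.solve(K, y)
    cov = se_cov(Xs, Xs, ell) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov
```

Averaging the diagonal of cov over many random draws of the training and test inputs gives a Monte Carlo estimate of the generalization error (3).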

This Gaussian process regression solution (2) can also be written equivalently in the following recursive form.

Initialization: At the initial step we have

    m^{(0)}(x) = 0
    C^{(0)}(x, x') = C(x, x').    (4)

Update: At each measurement we perform the following update step:

    m^{(k+1)}(x) = m^{(k)}(x) + \frac{C^{(k)}(x, x_k)}{C^{(k)}(x_k, x_k) + s^2} \, (y_k - m^{(k)}(x_k))
    C^{(k+1)}(x, x') = C^{(k)}(x, x') - \frac{C^{(k)}(x, x_k) \, C^{(k)}(x', x_k)}{C^{(k)}(x_k, x_k) + s^2}.    (5)

The result at step k = n is then exactly the same as that given by equations (2). This recursion can be seen as a special case of the update step of the infinite-dimensional distributed parameter Kalman filter (see, e.g., [19, 20]) with a trivial dynamic model.

Using these recursion equations, we can now write down the formal recursion formula for the covariance function averaged over the training inputs:

    \hat{C}^{(k+1)}(x, x') = C^{(k)}(x, x') - \int_{\mathbb{R}^d} \frac{C^{(k)}(x, x_k) \, C^{(k)}(x', x_k)}{C^{(k)}(x_k, x_k) + s^2} \, p(x_k) \, dx_k.    (6)

In this article, we follow [8], ignore the dependence on the inputs before the previous step, and approximate this as

    \hat{C}^{(k+1)}(x, x') = \hat{C}^{(k)}(x, x') - \int_{\mathbb{R}^d} \frac{\hat{C}^{(k)}(x, x_k) \, \hat{C}^{(k)}(x', x_k)}{\hat{C}^{(k)}(x_k, x_k) + s^2} \, p(x_k) \, dx_k.    (7)

The approximation to the average generalization error is then given as

    ε = \int_{\mathbb{R}^d} \hat{C}^{(n)}(x, x) \, p^*(x) \, dx.    (8)
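The covariance part of the update (5) is a rank-one downdate, which is easy to check numerically. The following sketch assumes, for illustration, that the covariance is tracked on a finite grid of evaluation points that includes the training inputs (se_cov is the illustrative helper defined above):

```python
import numpy as np

def sequential_cov(C0, train_idx, s2=1e-3):
    """Covariance update of equation (5): one rank-1 downdate per training point.

    C0        -- prior covariance matrix on a fixed grid of evaluation points
    train_idx -- grid indices of the training inputs x_1, ..., x_n
    """
    C = C0.copy()
    for k in train_idx:
        ck = C[:, k]  # C^(k)(x, x_k) for every grid point x
        C = C - np.outer(ck, ck) / (C[k, k] + s2)
    return C
```

After processing all n training points, the result agrees with the batch posterior covariance of equation (2) restricted to the grid.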

3 Eigenfunction Expansion Approximation of the Recursion

As done in [8], we can use the eigenfunction expansion method for solving the approximate average generalization error as follows. By Mercer's theorem, the input-averaged kernel \hat{C}^{(k)}(x, x') has the eigenfunction expansion

    \hat{C}^{(k)}(x, x') = \sum_{i=1}^{\infty} \lambda_i^{(k)} \, \phi_i(x) \, \phi_i(x'),    (9)

where \phi_i(x) and \lambda_i^{(k)} are the orthonormal eigenfunctions and the eigenvalues of the kernel, such that

    \lambda_i^{(k)} \, \phi_i(x) = \int_{\mathbb{R}^d} \hat{C}^{(k)}(x, x') \, \phi_i(x') \, p(x') \, dx'.    (10)

Substituting the series into the recursion (7) now gives

    \hat{C}^{(k+1)}(x, x') = \sum_i \lambda_i^{(k)} \phi_i(x) \phi_i(x') - \int_{\mathbb{R}^d} \frac{\left[\sum_i \lambda_i^{(k)} \phi_i(x) \phi_i(x_k)\right] \left[\sum_j \lambda_j^{(k)} \phi_j(x') \phi_j(x_k)\right]}{\sum_i \lambda_i^{(k)} \phi_i(x_k) \phi_i(x_k) + s^2} \, p(x_k) \, dx_k.    (11)

If we approximate the latter integral by taking expectations separately in the denominator and the numerator, then by the orthonormality properties of the eigenfunctions this reduces to

    \hat{C}^{(k+1)}(x, x') = \sum_i \left( \lambda_i^{(k)} - \frac{\big[\lambda_i^{(k)}\big]^2}{\sum_j \lambda_j^{(k)} + s^2} \right) \phi_i(x) \phi_i(x'),    (12)

which implies that \hat{C}^{(k+1)}(x, x') also has an eigenfunction expansion in terms of the same eigenfunctions. If we denote the coefficients as \lambda_i^{(k+1)}, then the approximate recursion equation for the coefficients is given as

    \lambda_i^{(k+1)} = \lambda_i^{(k)} - \frac{\big[\lambda_i^{(k)}\big]^2}{\sum_j \lambda_j^{(k)} + s^2}.    (13)

If we have p*(x) = p(x), then the approximation (8) to the average generalization error reduces to [8]

    ε_D = \sum_i \left( \lambda_i^{(n-1)} - \frac{\big[\lambda_i^{(n-1)}\big]^2}{\sum_j \lambda_j^{(n-1)} + s^2} \right) = \sum_i \lambda_i^{(n)}.    (14)

We could then proceed to use further approximations by considering n as continuous, which would lead to the UC and LC approximations of [8]. The upper continuous (UC) approximation has the form

    ε_{UC} = s^2 \sum_i \frac{\lambda_i}{n' \lambda_i + s^2},    (15)

where λ_i are the eigenvalues of the prior covariance function, and the effective number of training examples n' is the solution to the self-consistency equation

    n' + \sum_i \ln\left(1 + s^{-2} n' \lambda_i\right) = n.    (16)

The lower continuous (LC) approximation is the solution to the self-consistency equation

    ε_{LC} = s^2 \sum_i \frac{\lambda_i}{n' \lambda_i + s^2},    (17)

where n' = s^2 n / [s^2 + ε_{LC}].
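For reference in the comparisons below, ε_D is straightforward to compute once a truncated set of prior eigenvalues is available. A minimal sketch, assuming lam holds the leading eigenvalues λ_i of the prior covariance function (the truncation and the names are illustrative, not from the paper):

```python
import numpy as np

def eps_D(lam, n, s2=1e-3):
    """Discrete learning curve approximation via the coefficient recursion (13).

    lam -- truncated vector of the prior eigenvalues lambda_i
    n   -- maximum number of training samples
    Returns eps_D of equation (14) for training set sizes 1, ..., n.
    """
    lam = np.asarray(lam, dtype=float).copy()
    curve = []
    for _ in range(n):
        lam = lam - lam ** 2 / (lam.sum() + s2)  # one step of recursion (13)
        curve.append(lam.sum())                  # eps_D = sum_i lambda_i^(k)
    return np.array(curve)
```

The truncation level has to be chosen large enough that the discarded eigenvalues are negligible compared with the noise variance s².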

4 Numerical Cubature Approximation of the Recursion

Cubature integration refers to methods for the approximate computation of integrals of the form

    E[g(x)] = \int_{\mathbb{R}^d} g(x) \, p(x) \, dx,    (18)

where p(x) is some fixed weight function. In particular, cubature integration methods here primarily refer to multidimensional generalizations of Gaussian quadratures, that is, to approximations of the form

    E[g(x)] \approx \sum_i W^{(i)} \, g(x^{(i)}),    (19)

where the weights W^{(i)} and the evaluation points x^{(i)} are (known) functionals of the weight function p(x). In particular, when p(x) is a multidimensional Gaussian distribution, we can use multidimensional Gauss-Hermite cubatures or more efficient spherical cubature rules (see, e.g., [21, 16, 17]). However, because here we need quite high order rules, and the construction of such efficient higher order spherical rules is a complicated task, we have used simpler Cartesian product based Gauss-Hermite cubature rules.

We can now use a multidimensional cubature approximation to the integral in Equation (7), which leads to the following:

    \hat{C}^{(k+1)}(x, x') = \hat{C}^{(k)}(x, x') - \sum_i W^{(i)} \, \frac{\hat{C}^{(k)}(x, x^{(i)}) \, \hat{C}^{(k)}(x', x^{(i)})}{\hat{C}^{(k)}(x^{(i)}, x^{(i)}) + s^2},    (20)

where the weights W^{(i)} and sigma points x^{(i)} correspond to integration over the training set distribution p(x). For arbitrary x and x' we may thus run the recursion (7), apply the above approximation at each step, and obtain an approximation to \hat{C}^{(n)}(x, x'). Analogously, we can form an approximation to the average generalization error in Equation (8) as follows:

    \int_{\mathbb{R}^d} \hat{C}^{(n)}(x, x) \, p^*(x) \, dx \approx \sum_j W_*^{(j)} \, \hat{C}^{(n)}(x_*^{(j)}, x_*^{(j)}),    (21)

where the weights W_*^{(j)} and sigma points x_*^{(j)} correspond to integration over the test set distribution p*(x). The latter integral can now be computed by evaluating the quadrature based approximation (20) at the quadrature points of the latter integral, that is, at x = x' = x_*^{(j)}. Note that this procedure might slightly underestimate the generalization error, because the sigma points for the training and test sets are in the same positions. It would be possible to use different sigma points for the training and test sets, but then the computation would be slightly more complicated.
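For the zero mean unit Gaussian input distribution used in the experiments below, the Cartesian product Gauss-Hermite rule can be assembled from the one-dimensional rule. A minimal sketch (NumPy's hermgauss targets the weight e^{-x^2}, so a change of variables to N(0, 1) is needed; the function name is an illustrative choice):

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_cubature(order, d):
    """Product-rule sigma points and weights of eq. (19) for p(x) = N(0, I_d)."""
    x1, w1 = hermgauss(order)   # 1d nodes and weights for the weight exp(-x^2)
    x1 = np.sqrt(2.0) * x1      # change of variables: exact for N(0, 1)
    w1 = w1 / np.sqrt(np.pi)    # normalized weights, sum to one
    X = np.array(list(product(x1, repeat=d)))                  # order**d points
    W = np.array([np.prod(w) for w in product(w1, repeat=d)])
    return X, W
```

The number of points grows as order**d, which is why product rules of high order are only feasible in low input dimensions.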

We can now compute a simple approximation to the learning curve by assuming that p*(x) = p(x) and by using the same cubature rule for the training and test sets. This leads to a single set of sigma points x^{(i)} = x_*^{(i)} and weights W^{(i)} = W_*^{(i)}. The algorithm can be implemented as follows:

- Initialize the elements of the matrix P^{(0)} as

      P^{(0)}_{ii'} = C(x^{(i)}, x^{(i')}).    (22)

- For n = 1, ..., N compute

      P^{(n)} = P^{(n-1)} - \sum_i W^{(i)} \, \frac{P^{(n-1)}_{:,i} \, P^{(n-1)}_{i,:}}{P^{(n-1)}_{ii} + s^2},    (23)

  where P_{:,i} denotes the ith column of P and P_{i,:} denotes the ith row.

- The approximate learning curve is given as

      ε_C(n) = \sum_i W^{(i)} \, P^{(n)}_{ii}.    (24)
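A compact sketch of the whole procedure (22)-(24), reusing the illustrative se_cov and gauss_hermite_cubature helpers defined in the earlier sketches; this is a reconstruction under those assumptions, not the author's reference implementation:

```python
import numpy as np

def cubature_learning_curve(cov, order, d, N, s2=1e-3):
    """Approximate learning curve eps_C(n), n = 1, ..., N, via (22)-(24)."""
    X, W = gauss_hermite_cubature(order, d)  # sigma points for p(x) = N(0, I_d)
    P = cov(X, X)                            # initialization (22)
    curve = []
    for _ in range(N):
        # averaged update (23): every rank-1 term uses the previous P
        P = P - sum(W[i] * np.outer(P[:, i], P[i, :]) / (P[i, i] + s2)
                    for i in range(len(W)))
        curve.append(W @ np.diag(P))         # eps_C(n) = sum_i W_i P_ii, eq. (24)
    return np.array(curve)
```

For example, the 1d SE setting of the next section would correspond to a call like cubature_learning_curve(se_cov, order=60, d=1, N=100).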

5 Numerical Comparison

We tested the approximations and bounds presented in this article using 1d and 2d squared exponential (SE) covariance functions σ² exp(−‖x − x'‖²/(2ℓ²)) and Matérn covariance functions σ² (1 + √3 ‖x − x'‖/ℓ) exp(−√3 ‖x − x'‖/ℓ). For the SE covariance we used the parameter values ℓ = 1, σ² = 10⁻³; the parameters for the Matérn covariance were selected to be ℓ = 1, σ² = 0.1. The training and test inputs were assumed to have a zero mean unit Gaussian distribution, for which the weights W^{(i)} and evaluation points x^{(i)} can be obtained using existing methods. In addition to the approximations ε_D defined in Equation (14), ε_UC in (15), ε_LC in (17), and the proposed approximation ε_C in (24), we also compared to the following well-known Opper-Vivarelli (OV) bound [5]:

    ε_{OV} = s^2 \sum_i \frac{\lambda_i}{n \lambda_i + s^2}.    (25)

For the SE covariance functions we used the known closed form formulas for the eigenvalues; in the 1d Matérn case we computed the eigenvalues numerically. In the 2d Matérn case the eigenvalues were not available, because the eigenvalue problem became too large to be solved with the required numerical accuracy. We used a 60th order Gauss-Hermite quadrature for the 1d ε_C calculations and a 20th order Gauss-Hermite product-rule cubature for the 2d ε_C calculations. For all cases, we also computed an approximation to the true generalization error curve ε_MC using a Monte Carlo method with 100 independent training sets for each training set size 1-100; the generalization error was estimated with test sets of size 100, drawn independently for each MC sample.

The learning curves computed using the different approximations are shown in Figure 1. As can be seen in the figure, in the 1d and 2d SE cases the proposed approximation ε_C overestimates the error, but it is still much better than ε_OV, and its relative accuracy is close to that of the other methods. In the 1d and 2d Matérn cases the proposed approximation is very accurate. The overall performance of the proposed method is very good, given that it does not need the eigenvalues of the covariance function at all.

[Figure 1: four log-scale panels of generalization error ε versus the number of training samples (0-100), comparing ε_MC, ε_C, ε_OV, ε_D, ε_UC, and ε_LC; panels: (a) 1d SE, (b) 1d Matérn, (c) 2d SE, (d) 2d Matérn.]

Fig. 1. Learning curves for squared exponential (SE) and Matérn covariance functions with input dimensions 1 and 2.

6 Conclusion

In this article we have presented a new cubature integration based method for the approximate computation of learning curves in Gaussian process regression. The advantage of the method is that, unlike most of the alternative methods, it does not require the availability of the eigenvalues of the covariance function. The accuracy of the method was numerically compared to previously proposed eigenfunction expansion based methods, and the proposed approach seems to give good approximations to the learning curves, especially in the case of the Matérn covariance function.

Acknowledgments. The author is grateful to the Centre of Excellence in Computational Complex Systems Research of the Academy of Finland for financial support, and also wishes to thank Aki Vehtari for helpful comments on the manuscript.

References

1. O'Hagan, A.: Curve fitting and optimal design for prediction (with discussion). Journal of the Royal Statistical Society B 40(1), 1-42 (1978)
2. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006)
3. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
4. Opper, M.: Regression with Gaussian processes: Average case performance. In: Theoretical Aspects of Neural Computation. Springer-Verlag (1997)
5. Opper, M., Vivarelli, F.: General bounds on Bayes errors for regression with Gaussian processes. In: NIPS 11, pp. 302-308. The MIT Press (1999)
6. Sollich, P.: Approximate learning curves for Gaussian processes. In: Proceedings of ICANN'99, pp. 437-442 (1999)
7. Sollich, P.: Learning curves for Gaussian processes. In: NIPS 11. The MIT Press (1999)
8. Sollich, P., Halees, A.: Learning curves for Gaussian process regression: Approximations and bounds. Neural Computation 14, 1393-1428 (2002)
9. Williams, C.K.I., Vivarelli, F.: Upper and lower bounds on the learning curve for Gaussian processes. Machine Learning 40, 77-102 (2000)
10. Sollich, P., Williams, C.: Using the equivalent kernel to understand Gaussian process regression. In: NIPS 17, pp. 1313-1320. The MIT Press (2005)
11. Van Trees, H.L.: Detection, Estimation, and Modulation Theory, Part I. John Wiley & Sons, New York (1968)
12. Papoulis, A.: Probability, Random Variables, and Stochastic Processes. McGraw-Hill (1984)
13. Malzahn, D., Opper, M.: Learning curves for Gaussian process regression: A framework for good approximations. In: NIPS 13. The MIT Press (2001)
14. Malzahn, D., Opper, M.: A variational approach to learning curves. In: NIPS 14. The MIT Press (2002)
15. Ito, K., Xiong, K.: Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control 45, 910-927 (2000)
16. Wu, Y., Hu, D., Wu, M., Hu, X.: A numerical-integration perspective on Gaussian filters. IEEE Transactions on Signal Processing 54, 2910-2921 (2006)
17. Arasaratnam, I., Haykin, S.: Cubature Kalman filters. IEEE Transactions on Automatic Control 54, 1254-1269 (2009)
18. Särkkä, S., Hartikainen, J.: On Gaussian optimal smoothing of non-linear state space models. IEEE Transactions on Automatic Control 55, 1938-1941 (2010)
19. Curtain, R.: A survey of infinite-dimensional filtering. SIAM Review 17(3), 395-411 (1975)
20. Ray, W.H., Lainiotis, D.G.: Distributed Parameter Systems. Dekker (1978)
21. Cools, R.: Constructing cubature formulae: The science behind the art. In: Acta Numerica, Vol. 6, pp. 1-54. Cambridge University Press (1997)