Kernel Mean Estimation and Stein Effect


Krikamol Muandet, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
Kenji Fukumizu, The Institute of Statistical Mathematics, Tokyo, Japan
Bharath Sriperumbudur, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
Bernhard Schölkopf, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany

Proceedings of the 31st International Conference on Machine Learning, Beijing, China. JMLR: W&CP. Copyright by the author(s).

Abstract

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be improved due to a well-known phenomenon in statistics called Stein's phenomenon. After consideration, our theoretical analysis reveals the existence of a wide class of estimators that are better than the standard one. Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. Empirical evaluations on several applications clearly demonstrate that the proposed estimators outperform the standard kernel mean estimator.

1. Introduction

This paper aims to improve the estimation of the mean function in a reproducing kernel Hilbert space (RKHS) from a finite sample. A kernel mean of a probability distribution $P$ over a measurable space $\mathcal{X}$ is defined by

$$\mu_P := \int_{\mathcal{X}} k(x,\cdot)\, dP(x) \in \mathcal{H}, \tag{1}$$

where $\mathcal{H}$ is an RKHS associated with a reproducing kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Conditions ensuring that this expectation exists are given in Smola et al. (2007). Unfortunately, it is not practical to compute $\mu_P$ directly because the distribution $P$ is usually unknown. Instead, given an i.i.d. sample $x_1, x_2, \ldots, x_n$ from $P$, we can easily compute the empirical kernel mean by the average

$$\hat{\mu}_P := \frac{1}{n} \sum_{i=1}^{n} k(x_i,\cdot). \tag{2}$$
The estimate $\hat{\mu}_P$ is the most commonly used estimate of the true kernel mean. Our primary interest here is to investigate whether one can improve upon this standard estimator.

The kernel mean has recently gained attention in the machine learning community, thanks to the introduction of Hilbert space embedding for distributions (Berlinet and Agnan; Smola et al., 2007). Representing the distribution as a mean function in the RKHS has several advantages: 1) the representation with an appropriate choice of kernel $k$ has been shown to preserve all information about the distribution (Fukumizu et al.; Sriperumbudur et al.); 2) basic operations on the distribution can be carried out by means of inner products in the RKHS, e.g., $E_P[f(x)] = \langle f, \mu_P \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$; 3) no intermediate density estimation is required, e.g., when testing for homogeneity from finite samples. As a result, many algorithms have benefited from the kernel mean representation, namely, maximum mean discrepancy (MMD) (Gretton et al., 2007), kernel dependency measure (Gretton et al.), kernel two-sample test (Gretton et al.), Hilbert space embedding of HMMs (Song et al.), and kernel Bayes rule

(Fukumizu et al.). Their performances rely directly on the quality of the empirical estimate $\hat{\mu}_P$.

However, it is of great importance, especially for our readers who are not familiar with kernel methods, to realize a more fundamental role of the kernel mean. It basically serves as a foundation to most kernel-based learning algorithms. For instance, nonlinear component analyses, such as kernel PCA, kernel FDA, and kernel CCA, rely heavily on mean functions and covariance operators in the RKHS (Schölkopf et al., 1998). The kernel k-means algorithm performs clustering in feature space using mean functions as the representatives of the clusters (Dhillon et al.). Moreover, it also serves as a basis in the early development of algorithms for classification and anomaly detection (Shawe-Taylor and Cristianini). All of those employ (2) as the estimate of the true mean function. Thus, the fact that substantial improvement can be gained when estimating (1) may in fact raise a widespread suspicion on the traditional way of learning with kernels.

We show in this work that the standard estimator (2) is, in a certain sense, not optimal, i.e., there exist better estimators (more below). In addition, we propose shrinkage estimators that outperform the standard one. At first glance, it was definitely counter-intuitive and surprising for us, and will undoubtedly also be for some of our readers, that the empirical kernel mean could be improved, and, given the simplicity of the proposed estimators, that this has remained unnoticed until now. One of the reasons may be that there is a common belief that the estimator $\hat{\mu}_P$ already gives a good estimate of $\mu_P$, and, as the sample size goes to infinity, the estimation error disappears (Shawe-Taylor and Cristianini). As a result, no need is felt to improve the kernel mean estimation. However, given a finite sample, substantial improvement is in fact possible and several factors may come into play, as will be seen later in this work.
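To make the standard estimator (2) concrete, the following is a minimal sketch, not taken from the paper, that evaluates the empirical kernel mean and checks the inner-product property $\langle f, \hat{\mu}_P \rangle = \frac{1}{n}\sum_i f(x_i)$ for a function $f$ in the span of kernel sections. The Gaussian RBF kernel, the bandwidth, the sample, and the helper names `rbf` and `empirical_kernel_mean` are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def empirical_kernel_mean(x, t, sigma=1.0):
    """Evaluate mu_hat(t) = (1/n) * sum_i k(x_i, t) at every row of t."""
    return rbf(t, x, sigma).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))   # i.i.d. sample standing in for P (assumed Gaussian)
z = rng.normal(size=(4, 2))    # centres of an illustrative test function f
a = rng.normal(size=4)         # f = sum_j a_j k(z_j, .)

# <f, mu_hat> computed through the reproducing property ...
inner_product = float(a @ empirical_kernel_mean(x, z))
# ... agrees with the plain sample average of f over the data.
sample_average = float((rbf(x, z) @ a).mean())
```

The two quantities coincide up to floating-point error, which is the finite-sample counterpart of $E_P[f(x)] = \langle f, \mu_P \rangle_{\mathcal{H}}$.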
This work was partly inspired by Stein's seminal work in 1955, which showed that a maximum likelihood estimator (MLE), i.e., the standard empirical mean, for the mean of the multivariate Gaussian distribution $N(\theta, \sigma^2 I)$ is inadmissible (Stein, 1955). That is, there exists an estimator that always achieves smaller total mean squared error regardless of the true $\theta$, when the dimension is at least 3. Perhaps the best known estimator of such kind is the James-Stein estimator (James and Stein, 1961). Interestingly, the James-Stein estimator is itself inadmissible, and there exists a wide class of estimators that outperform the MLE, see e.g., Berger (1976).

However, our work differs fundamentally from Stein's seminal works and those along this line in two aspects. First, our setting is non-parametric in the sense that we do not assume any parametric form of the distribution, whereas most traditional works focus on some specific distributions, e.g., the Gaussian distribution. Second, our setting involves a non-linear feature map into a high-dimensional space, if not infinite. As a result, higher moments of the distribution may come into play. Thus, one cannot adopt Stein's setting straightforwardly. A direct generalization of the James-Stein estimator to an infinite-dimensional Hilbert space has already been considered (Berger and Wolpert, 1983; Mandelbaum and Shepp, 1987; Privault and Réveillac). In those works, $\theta$, which is the parameter to be estimated, is assumed to be the mean of a Gaussian measure on the Hilbert space from which samples are drawn. In our case, on the other hand, the samples are drawn from $P$ and not from the Gaussian distribution whose mean is $\mu_P$.

The contribution of this paper can be summarized as follows. First, we show that the standard kernel mean estimator can be improved by providing an alternative estimator that achieves smaller risk (§2). The theoretical analysis reveals the existence of a wide class of estimators that are better than the standard.
To this end, we propose in §3 a kernel mean shrinkage estimator (KMSE), which is based on a novel motivation for regularization through the notion of shrinkage. Moreover, we propose an efficient leave-one-out cross-validation procedure to select the shrinkage parameter, which is novel in the context of kernel mean estimation. Lastly, we demonstrate the benefit of the proposed estimators in several applications (§4).

2. Motivation: Shrinkage Estimators

For an arbitrary distribution $P$, denote by $\mu$ and $\hat{\mu}$ the true kernel mean and its empirical estimate (2) from the i.i.d. sample $x_1, x_2, \ldots, x_n \sim P$ (we remove the subscript for ease of notation). The most natural loss function considered in this work is $\ell(\mu, \hat{\mu}) = \|\mu - \hat{\mu}\|^2_{\mathcal{H}}$. An estimator $\hat{\mu}$ is a mapping which is measurable w.r.t. the Borel $\sigma$-algebra of $\mathcal{H}$ and is evaluated by its risk function $R(\mu, \hat{\mu}) = E_P[\ell(\mu, \hat{\mu})]$, where $E_P$ indicates expectation over the choice of an i.i.d. sample of size $n$ from $P$.

Let us consider an alternative kernel mean estimator:

$$\hat{\mu}_\alpha := \alpha f^* + (1 - \alpha)\hat{\mu},$$

where $0 \le \alpha < 1$ and $f^* \in \mathcal{H}$. It is essentially a shrinkage estimator that shrinks the standard estimator toward a function $f^*$ by an amount specified by $\alpha$. If $\alpha = 0$, $\hat{\mu}_\alpha$ reduces to the standard estimator $\hat{\mu}$. The following theorem asserts that the risk of the shrinkage estimator $\hat{\mu}_\alpha$ is smaller than that of the standard estimator $\hat{\mu}$ given an appropriate choice of $\alpha$, regardless of the function $f^*$ (more below).

Theorem 1. For all distributions $P$ and the kernel $k$, there exists $\alpha > 0$ for which $R(\mu, \hat{\mu}_\alpha) < R(\mu, \hat{\mu})$.

Proof. The risk of the standard kernel mean estimator satisfies

$$E\|\hat{\mu} - \mu\|^2 = \frac{1}{n}\left(E[k(x,x)] - E[k(x,\tilde{x})]\right) =: \Delta$$

where $\tilde{x}$ is an independent copy of $x$. Let us define the risk of the proposed shrinkage estimator by $\Delta_\alpha := E\|\hat{\mu}_\alpha - \mu\|^2$, where $\alpha$ is a non-negative shrinkage parameter. We can then write this in terms of the standard risk as

$$\Delta_\alpha = \Delta - 2\alpha E\langle \hat{\mu} - \mu,\ \hat{\mu} - \mu + \mu - f^* \rangle + \alpha^2 E\|f^*\|^2 - 2\alpha^2 E[f^*(x)] + \alpha^2 E\|\hat{\mu}\|^2.$$

It follows from the reproducing property of $\mathcal{H}$ that $E[f^*(x)] = \langle f^*, \mu \rangle$. Moreover, using the fact that $E\|\hat{\mu}\|^2 = E\|\hat{\mu} - \mu + \mu\|^2 = \Delta + E[k(x,\tilde{x})]$, we can simplify the shrinkage risk to

$$\Delta_\alpha = \alpha^2\left(\Delta + \|f^* - \mu\|^2\right) - 2\alpha\Delta + \Delta.$$

Thus, we have $\Delta_\alpha - \Delta = \alpha^2\left(\Delta + \|f^* - \mu\|^2\right) - 2\alpha\Delta$, which is non-positive where

$$\alpha \in \left[0,\ \frac{2\Delta}{\Delta + \|f^* - \mu\|^2}\right] \tag{3}$$

and minimized at $\alpha^* = \Delta / (\Delta + \|f^* - \mu\|^2)$.

As we can see in (3), there is a range of $\alpha$ for which a non-positive $\Delta_\alpha - \Delta$, i.e., $R(\mu, \hat{\mu}_\alpha) \le R(\mu, \hat{\mu})$, is guaranteed. However, Theorem 1 relies on the important assumption that the true kernel mean of the distribution $P$ is required to estimate $\alpha$. In spite of this, the theorem has an important implication suggesting that the shrinkage estimator $\hat{\mu}_\alpha$ can improve upon $\hat{\mu}$ if $\alpha$ is chosen appropriately. Later, we will exploit this result in order to construct more practical estimators.

Remark 1. The following observations follow immediately from Theorem 1:

- The shrinkage estimator always improves upon the standard one regardless of the direction of shrinkage, as specified by $f^*$. In other words, there exists a wide class of kernel mean estimators that are better than the standard one.
- The value of $\alpha$ also depends on the choice of $f^*$. The further $f^*$ is from $\mu$, the smaller $\alpha$ becomes. Thus, the shrinkage gets smaller if $f^*$ is chosen such that it is far from the true kernel mean. This effect is akin to the James-Stein estimator.
- The improvement can be viewed as a bias-variance trade-off: the shrinkage estimator reduces variance substantially at the expense of a little bias.

Remark 1 sheds light on how one can practically construct the shrinkage estimator: we can choose $f^*$ arbitrarily as long as the parameter $\alpha$ is chosen appropriately. Moreover, further improvement can be gained by incorporating prior knowledge as to the location of $\mu_P$, which can be straightforwardly integrated into the framework via $f^*$ (Berger and Wolpert, 1983).
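The risk identity in the proof of Theorem 1 can be checked numerically. The sketch below is an illustration under assumptions that are not in the paper: a linear kernel (so that the kernel mean is simply the mean vector and the RKHS norm is the Euclidean norm), Gaussian data with identity covariance, and $f^* = 0$. The function name `shrinkage_risk` and all parameter values are hypothetical.

```python
import numpy as np

def shrinkage_risk(alpha, delta, dist_sq):
    """Closed-form risk of alpha*f + (1 - alpha)*mu_hat.

    delta   : risk of the standard estimator, E||mu_hat - mu||^2
    dist_sq : squared distance ||f - mu||^2 between the target f and the true mean
    """
    return alpha**2 * (delta + dist_sq) - 2.0 * alpha * delta + delta

rng = np.random.default_rng(0)
d, n, trials = 5, 20, 20000
theta = np.full(d, 0.5)            # true mean; with a linear kernel, mu = theta
delta = d / n                      # Delta = trace(Cov)/n for x ~ N(theta, I_d)
dist_sq = float(theta @ theta)     # ||f - mu||^2 with f = 0
alpha_star = delta / (delta + dist_sq)

# Monte Carlo estimate of both risks from the same simulated samples.
x = rng.normal(loc=theta, size=(trials, n, d))
mu_hat = x.mean(axis=1)
mc_standard = float(np.mean(np.sum((mu_hat - theta) ** 2, axis=1)))
mc_shrunk = float(np.mean(np.sum(((1.0 - alpha_star) * mu_hat - theta) ** 2, axis=1)))
```

In this setting the simulated risk of the shrunk estimator falls below that of the plain average, and the closed form returns exactly $\Delta$ at both ends of the interval (3), i.e., at $\alpha = 0$ and $\alpha = 2\alpha^*$.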
Inspired by the James-Stein estimator, we focus on $f^* = 0$. We will investigate the effect of different priors $f^*$ in future works.

3. Kernel Mean Shrinkage Estimator

In this section we give a novel formulation of the kernel mean estimator that allows us to estimate the shrinkage parameter efficiently. In the following, let $\phi : \mathcal{X} \to \mathcal{H}$ be a feature map associated with the kernel $k$ and $\langle \cdot, \cdot \rangle$ be an inner product in the RKHS $\mathcal{H}$ such that $k(x, x') = \langle \phi(x), \phi(x') \rangle$. Unless stated otherwise, $\|\cdot\|$ denotes the RKHS norm. The kernel mean $\mu_P$ and its empirical estimate $\hat{\mu}_P$ can be obtained as minimizers of the loss functionals

$$\mathcal{E}(g) := E_{x \sim P}\|\phi(x) - g\|^2, \qquad \hat{\mathcal{E}}(g) := \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2,$$

respectively. We will call the estimator minimizing the loss functional $\hat{\mathcal{E}}(g)$ a kernel mean estimator (KME). Note that the loss $\mathcal{E}(g)$ is different from the one considered in §2, i.e., $\ell(\mu, g) = \|\mu - g\|^2 = \|E[\phi(x)] - g\|^2$. Nevertheless, we have $\ell(\mu, g) = E_{xx'}k(x,x') - 2E_x g(x) + \|g\|^2$. Since $\mathcal{E}(g) = E_x k(x,x) - 2E_x g(x) + \|g\|^2$, the loss $\ell(\mu, g)$ differs from $\mathcal{E}(g)$ only by $E_x k(x,x) - E_{xx'}k(x,x')$, which is not a function of $g$. We introduce the new form here because it will give a more tractable cross-validation computation (§3.1). In spite of this, the resulting estimators are always evaluated w.r.t. the loss in §2 (cf. §4.1).

From the formulation above, it is natural to ask if minimizing the regularized version of $\hat{\mathcal{E}}(g)$ will give a better estimator. On the one hand, one can argue that, unlike in classical risk minimization, we do not really need a regularizer here. The standard estimator (2) is known to be, in a certain sense, optimal and can be estimated reliably (Shawe-Taylor and Cristianini). Moreover, the original formulation of $\hat{\mathcal{E}}(g)$ is a well-posed problem. On the other hand, since regularization may be viewed as shrinking the solution toward zero, it can actually improve the kernel mean estimation, as suggested by Theorem 1 (cf. the discussion at the end of §2).

Consequently, we minimize a modified loss functional

$$\hat{\mathcal{E}}_\lambda(g) := \hat{\mathcal{E}}(g) + \lambda\Omega(\|g\|) = \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2 + \lambda\Omega(\|g\|), \tag{4}$$

where $\Omega(\cdot)$ denotes a monotonically-increasing regularization functional and $\lambda$ is a non-negative regularization parameter.
In what follows, we refer to the shrinkage estimator $\hat{\mu}_\lambda$ minimizing $\hat{\mathcal{E}}_\lambda(g)$ as a kernel mean shrinkage estimator (KMSE). The parameters $\alpha$ and $\lambda$ play a similar role as a shrinkage parameter. They specify an amount by which the standard estimator $\hat{\mu}$ is shrunk toward $f^* = 0$. Thus, the terms shrinkage parameter and regularization parameter will be used interchangeably.

It follows from the representer theorem that $g$ lies in a subspace spanned by the data, i.e., $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$ for some $\beta \in \mathbb{R}^n$. By considering $\Omega(\|g\|) = \|g\|^2$, we can rewrite (4) as

$$\frac{1}{n}\sum_{i=1}^{n}\Big\|\phi(x_i) - \sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 + \lambda\Big\|\sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 = \beta^\top K\beta - 2\beta^\top K\mathbf{1}_n + \lambda\beta^\top K\beta + c, \tag{5}$$

where $c$ is a constant term, $K$ is a Gram matrix such that $K_{ij} = k(x_i, x_j)$, and $\mathbf{1}_n = [1/n, 1/n, \ldots, 1/n]^\top$. Taking a derivative of (5) w.r.t. $\beta$ and setting it to zero yields $\beta = (1/(1+\lambda))\mathbf{1}_n$. By setting $\alpha = \lambda/(1+\lambda)$ the shrinkage estimate can be written as $\hat{\mu}_\lambda = (1-\alpha)\hat{\mu}$. Since $0 < \alpha < 1$, the estimator $\hat{\mu}_\lambda$ corresponds to a shrinkage estimator discussed in §2 when $f^* = 0$. We call this estimator a simple kernel mean shrinkage estimator (S-KMSE).

Using the expansion $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$, we may consider the case when the regularization functional is written in terms of $\beta$, e.g., $\beta^\top\beta$. This leads to a particularly interesting kernel mean estimator. In this case, the optimal weight vector is given by $\beta = (K + \lambda I)^{-1}K\mathbf{1}_n$ and the shrinkage estimate can be written accordingly as $\hat{\mu}_\lambda = \sum_{j=1}^{n}\beta_j\phi(x_j) = \Phi(K + \lambda I)^{-1}K\mathbf{1}_n$ where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)]$. Unlike the S-KMSE, this estimator shrinks the usual estimate differently in each coordinate (cf. Theorem 2). Hence, we will call it a flexible kernel mean shrinkage estimator (F-KMSE).

The following theorem characterizes the F-KMSE as a shrinkage estimator.

Theorem 2. The F-KMSE can be written as $\hat{\mu}_\lambda = \sum_{i=1}^{n}\frac{\gamma_i}{\gamma_i + \lambda}\langle\hat{\mu}, v_i\rangle v_i$, where $\{\gamma_i, v_i\}$ are eigenvalue and eigenvector pairs of the empirical covariance operator $\hat{C}_{xx}$ in $\mathcal{H}$.

In words, the effect of the F-KMSE is to reduce high frequency components of the expansion of $\hat{\mu}$, by expanding this in terms of the kernel PCA basis and shrinking the coefficients of the high order eigenfunctions, e.g., see Rasmussen and Williams. Note that the covariance operator $\hat{C}_{xx}$ itself does not depend on $\lambda$.

As we can see, the solution to the regularized version is indeed of the form of shrinkage estimators when $f^* = 0$. That is, both S-KMSE and F-KMSE shrink the standard kernel mean estimate towards zero.
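The two weight vectors derived above are straightforward to compute from a Gram matrix. The following is a minimal sketch under stated assumptions: the Gaussian RBF kernel, the bandwidth, the value of the regularization parameter, and the helper names (`rbf_gram`, `skmse_weights`, `fkmse_weights`) are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def rbf_gram(x, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kme_weights(n):
    """Standard estimator: uniform weights 1/n."""
    return np.full(n, 1.0 / n)

def skmse_weights(n, lam):
    """S-KMSE: uniform weights scaled by 1/(1 + lam), i.e. by 1 - alpha."""
    return kme_weights(n) / (1.0 + lam)

def fkmse_weights(K, lam):
    """F-KMSE: beta = (K + lam I)^{-1} K 1_n, solved without forming the inverse."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), K @ kme_weights(n))

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 3))
K = rbf_gram(x, sigma=1.0)
lam = 0.5
beta_s = skmse_weights(30, lam)
beta_f = fkmse_weights(K, lam)

# Spectral form of the F-KMSE weights: each eigen-direction of K is shrunk
# by the factor gamma / (gamma + lam), mirroring Theorem 2.
gamma, U = np.linalg.eigh(K)
beta_f_spectral = U @ ((gamma / (gamma + lam)) * (U.T @ kme_weights(30)))
```

The squared RKHS norm of any estimate with weights $\beta$ is $\beta^\top K\beta$, so comparing `beta_f @ K @ beta_f` with the same quantity for the uniform weights shows the shrinkage toward zero directly.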
The difference is that the S-KMSE shrinks equally in all coordinates, whereas the F-KMSE also constrains the amount of shrinkage by the information contained in each coordinate. Moreover, the squared RKHS norm can be decomposed as a sum of squared losses weighted by the eigenvalues $\gamma_i$ (cf. Mandelbaum and Shepp (1987, appendix)). By the same reasoning as Stein's result in the finite-dimensional case, one would suspect that an improvement of shrinkage estimators in $\mathcal{H}$ should also depend on how fast the eigenvalues of $k$ decay. That is, one would expect greater improvement if the values of $\gamma_i$ decay very slowly. For example, the Gaussian RBF kernel with larger bandwidth gives smaller improvement when compared to one with smaller bandwidth. Similarly, we should expect to see more improvement when applying a Laplacian kernel than when using a Gaussian RBF kernel.

In some applications of kernel mean embedding, one may want to interpret the weight $\beta$ as a probability vector (Nishiyama et al.). However, the weight vector $\beta$ output by our estimators is in general not normalized. In fact, all elements will be smaller than $1/n$ as a result of shrinkage. However, one may impose a constraint that $\beta$ must sum to one and resort to quadratic programming (Song et al.). Unfortunately, this approach has the undesirable effect of sparsity, which is unlikely to improve upon the standard estimator. Post-normalizing the weights often deteriorates the estimation performance.

To the best of our knowledge, no previous attempt has been made to improve the kernel mean estimation. However, we discuss some closely related works here. For example, instead of the loss functional $\hat{\mathcal{E}}(g)$, Kim and Scott consider a robust loss function such as Huber's loss to reduce the effect of outliers. The authors consider kernel density estimators, which differ fundamentally from kernel mean estimators. They need to reduce the kernel bandwidth with increasing sample size for the estimators to be consistent. A regularized version of MMD was adopted by Danafar et al. in the context of kernel-based hypothesis testing.
The resulting formulation resembles our S-KMSE. Furthermore, the F-KMSE is of a similar form as the conditional mean embedding used in Grünewälder et al., which can be viewed more generally as a regression problem in RKHS with smooth operators (Grünewälder et al.).

3.1. Choosing Shrinkage Parameter

As discussed above, the amount of shrinkage plays an important role in our estimators. In this work we propose to select the shrinkage parameter by an automatic leave-one-out cross-validation.

For a given shrinkage parameter $\lambda$, let us consider the observation $x_i$ as being a new observation by omitting it from the dataset. Denote by $\hat{\mu}_\lambda^{(-i)} = \sum_{j \neq i}\beta_j^{(-i)}\phi(x_j)$ the kernel mean estimated from the remaining data, using the value $\lambda$ as a shrinkage parameter, so that $\beta^{(-i)}$ is the minimizer of $\hat{\mathcal{E}}_\lambda^{(-i)}(g)$. We will measure the quality of $\hat{\mu}_\lambda^{(-i)}$ by how well it approximates $\phi(x_i)$. The overall quality of the

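estimate is quantified by the cross-validation score

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\big\|\phi(x_i) - \hat{\mu}_\lambda^{(-i)}\big\|^2_{\mathcal{H}}. \tag{6}$$

By simple algebra, it is not difficult to show that the optimal shrinkage parameter of the S-KMSE can be calculated analytically, as stated by the following theorem.

Theorem 3. Let $\rho := \frac{1}{n^2}\sum_{i,j=1}^{n}k(x_i, x_j)$ and $\varrho := \frac{1}{n}\sum_{i=1}^{n}k(x_i, x_i)$. The shrinkage parameter $\lambda^* = (\varrho - \rho)/((n-1)\rho + \varrho/n - \varrho)$ of the S-KMSE is the minimizer of $\mathrm{LOOCV}(\cdot)$. Equivalently, with $\alpha = \lambda/(1+\lambda)$, the minimizer is $\alpha^* = (\varrho - \rho)/((n-2)\rho + \varrho/n)$.

On the other hand, finding the optimal $\lambda$ for the F-KMSE is relatively more involved. Evaluating the score (6) naïvely requires one to solve for $\hat{\mu}_\lambda^{(-i)}$ explicitly for every $i$. Fortunately, we can simplify the score such that it can be evaluated efficiently, as stated in the following theorem.

Theorem 4. The LOOCV score of the F-KMSE satisfies

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}(K\beta_\lambda - K_{\cdot i})^\top C_\lambda (K\beta_\lambda - K_{\cdot i}),$$

where $\beta_\lambda$ is the weight vector calculated from the full dataset with the shrinkage parameter $\lambda$, $K_{\cdot i}$ is the $i$th column of $K$, and $C_\lambda = \big(K - \frac{1}{n}K(K+\lambda I)^{-1}K\big)^{-1}K\big(K - \frac{1}{n}K(K+\lambda I)^{-1}K\big)^{-1}$.

Proof of Theorem 4. For fixed $\lambda$ and $i$, let $\hat{\mu}_\lambda^{(-i)}$ be the leave-one-out kernel mean estimate of the F-KMSE and let $A := (K + \lambda I)^{-1}$. Then, we can write an expression for the deleted residual as

$$\Delta_\lambda^{(-i)} := \hat{\mu}_\lambda^{(-i)} - \phi(x_i) = \hat{\mu}_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\sum_{l=1}^{n}A_{jl}\big\langle\phi(x_l),\ \hat{\mu}_\lambda^{(-i)} - \phi(x_i)\big\rangle\phi(x_j).$$

Since $\Delta_\lambda^{(-i)}$ lies in a subspace spanned by the sample $\phi(x_1), \ldots, \phi(x_n)$, we have $\Delta_\lambda^{(-i)} = \sum_{k=1}^{n}\xi_k\phi(x_k)$ for some $\xi \in \mathbb{R}^n$. Substituting $\Delta_\lambda^{(-i)}$ back yields

$$\sum_{k=1}^{n}\xi_k\phi(x_k) = \hat{\mu}_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\{AK\xi\}_j\phi(x_j).$$

By taking the inner product on both sides w.r.t. the sample $\phi(x_1), \ldots, \phi(x_n)$ and solving for $\xi$, we have $\xi = \big(K - \frac{1}{n}KAK\big)^{-1}(K\beta_\lambda - K_{\cdot i})$. Consequently, the leave-one-out score of the sample $x_i$ can be computed by

$$\big\|\Delta_\lambda^{(-i)}\big\|^2 = \xi^\top K\xi = (K\beta_\lambda - K_{\cdot i})^\top\big(K - \tfrac{1}{n}KAK\big)^{-1}K\big(K - \tfrac{1}{n}KAK\big)^{-1}(K\beta_\lambda - K_{\cdot i}) = (K\beta_\lambda - K_{\cdot i})^\top C_\lambda(K\beta_\lambda - K_{\cdot i}).$$

Averaging $\|\Delta_\lambda^{(-i)}\|^2$ over all samples gives $\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\|\Delta_\lambda^{(-i)}\|^2 = \frac{1}{n}\sum_{i=1}^{n}(K\beta_\lambda - K_{\cdot i})^\top C_\lambda(K\beta_\lambda - K_{\cdot i})$, as required.

It is interesting to see that the leave-one-out cross-validation score in Theorem 4 depends only on the non-leave-one-out solution $\beta_\lambda$, which can be obtained as a by-product of the algorithm.

Computational complexity. The S-KMSE requires $O(n^2)$ operations to select the shrinkage parameter. For the F-KMSE, there are two steps in cross-validation.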
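Before the two steps are detailed, the leave-one-out quantities above can be sketched in code. This is an illustration under assumptions, not the authors' implementation: the S-KMSE part uses the $\alpha$ parameterization $\hat{\mu}_\alpha = (1-\alpha)\hat{\mu}$ with $\alpha = \lambda/(1+\lambda)$ and the closed form $\alpha^* = (\varrho - \rho)/((n-2)\rho + \varrho/n)$, which follows from minimizing (6) directly; the F-KMSE part evaluates the score of Theorem 4 with the $1/n$ factor that appears in its proof, once with explicit matrices and once through an eigendecomposition of $K$. A positive definite Gram matrix is assumed, and all function names are hypothetical.

```python
import numpy as np

def skmse_loocv(K, alpha):
    """Brute-force LOOCV score (6) for the S-KMSE (1 - alpha) * mu_hat."""
    n = K.shape[0]
    w = (1.0 - alpha) / (n - 1)          # weight on each retained point
    score = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        score += K[i, i] - 2.0 * w * K[i, keep].sum() + w**2 * K[np.ix_(keep, keep)].sum()
    return score / n

def skmse_alpha_star(K):
    """Closed-form minimizer of skmse_loocv over alpha."""
    n = K.shape[0]
    rho = K.sum() / n**2
    varrho = np.trace(K) / n
    return (varrho - rho) / ((n - 2) * rho + varrho / n)

def fkmse_loocv_direct(K, lam):
    """Score of Theorem 4 with explicit n-by-n inverses."""
    n = K.shape[0]
    A = np.linalg.inv(K + lam * np.eye(n))
    beta = A @ K @ np.full(n, 1.0 / n)
    M_inv = np.linalg.inv(K - K @ A @ K / n)
    C = M_inv @ K @ M_inv
    R = (K @ beta)[:, None] - K          # column i holds K beta - K[:, i]
    return float(np.einsum('ki,kl,li->', R, C, R)) / n

def fkmse_loocv_eig(K, lam):
    """Same score via K = U diag(d) U^T; only diagonal inverses are needed."""
    n = K.shape[0]
    d, U = np.linalg.eigh(K)
    beta = U @ ((d / (d + lam)) * (U.T @ np.full(n, 1.0 / n)))
    c = d / (d - d**2 / (n * (d + lam))) ** 2   # diagonal of C in the eigenbasis
    P = U.T @ ((K @ beta)[:, None] - K)
    return float(np.sum(c[:, None] * P**2)) / n

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))
sq = np.sum(x**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * x @ x.T) / 2.0)   # RBF, sigma = 1
alpha_hat = skmse_alpha_star(K)
```

Because the eigendecomposition is computed once, `fkmse_loocv_eig` can be re-evaluated cheaply over a grid of `lam` values or handed to a scalar optimizer.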
First, we need to compute $(K + \lambda I)^{-1}$ repeatedly for different values of $\lambda$. Assume that we know the eigendecomposition $K = UDU^\top$ where $D$ is diagonal with $d_{ii} \ge 0$ and $UU^\top = I$. It follows that $(K + \lambda I)^{-1} = U(D + \lambda I)^{-1}U^\top$. Consequently, solving for $\beta_\lambda$ takes $O(n^2)$ operations. Since the eigendecomposition requires $O(n^3)$ operations, finding $\beta_\lambda$ for many values of $\lambda$ is essentially free. A low-rank approximation can also be adopted to reduce the computational cost further.

Second, we need to compute the cross-validation score (6). As shown in Theorem 4, we can compute it using only $\beta_\lambda$ obtained from the previous step. The calculation of $C_\lambda$ can be simplified further via the eigendecomposition of $K$ as $C_\lambda = U\big(D - \frac{1}{n}D(D+\lambda I)^{-1}D\big)^{-1}D\big(D - \frac{1}{n}D(D+\lambda I)^{-1}D\big)^{-1}U^\top$. Since it only involves the inverse of diagonal matrices, the inversion can be evaluated in $O(n)$ operations. The overall computational complexity of the cross-validation requires only $O(n^3)$ operations, as opposed to the naïve approach that requires $O(n^4)$ operations. When performed as a by-product of the algorithm, the computational cost of the cross-validation procedure becomes negligible as the dataset becomes larger. In practice, we use the fminsearch and fminbnd routines of the MATLAB optimization toolbox to find the best shrinkage parameter.

3.2. Covariance Operators

The covariance operator from $\mathcal{H}_X$ to $\mathcal{H}_Y$ can be viewed as a mean function in a product space $\mathcal{H}_X \otimes \mathcal{H}_Y$. Hence, we can also construct a shrinkage estimator of the covariance operator in RKHS. Let $(\mathcal{H}_X, k_X)$ and $(\mathcal{H}_Y, k_Y)$ be the RKHSs of functions on measurable spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively, with p.d. kernels $k_X$ and $k_Y$ (with feature maps $\phi$ and $\varphi$). We will consider a random vector $(X, Y) : \Omega \to \mathcal{X} \times \mathcal{Y}$ with distribution $P_{XY}$, with $P_X$ and $P_Y$ as marginal distributions. Under some conditions, there exists a unique cross-covariance operator $\Sigma_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ such that

$$\langle g, \Sigma_{YX}f\rangle_{\mathcal{H}_Y} = E_{XY}\big[(f(X) - E_X[f(X)])(g(Y) - E_Y[g(Y)])\big] = \mathrm{Cov}(f(X), g(Y))$$

holds for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$ (Fukumizu et al.). If $X$ equals $Y$, we get the self-adjoint operator $\Sigma_{XX}$ called the covariance operator.
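Given an i.i.d. sample from $P_{XY}$ written as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can write the empirical cross-covariance operator as $\hat{\Sigma}_{YX} := \frac{1}{n}\sum_{i=1}^{n}\phi(x_i) \otimes \varphi(y_i) - \hat{\mu}_X \otimes \hat{\mu}_Y$, where $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)$ and $\hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n}\varphi(y_i)$. Let $\tilde{\phi}$ and $\tilde{\varphi}$ be the centered feature maps of $\phi$ and $\varphi$, respectively. Then, it can be rewritten as $\hat{\Sigma}_{YX} := \frac{1}{n}\sum_{i=1}^{n}\tilde{\phi}(x_i) \otimes \tilde{\varphi}(y_i) \in \mathcal{H}_X \otimes \mathcal{H}_Y$. It follows from the inner product property in the product space that

$$\big\langle\tilde{\phi}(x) \otimes \tilde{\varphi}(y),\ \tilde{\phi}(x') \otimes \tilde{\varphi}(y')\big\rangle_{\mathcal{H}_X \otimes \mathcal{H}_Y} = \big\langle\tilde{\phi}(x), \tilde{\phi}(x')\big\rangle_{\mathcal{H}_X}\big\langle\tilde{\varphi}(y), \tilde{\varphi}(y')\big\rangle_{\mathcal{H}_Y} = \tilde{k}_X(x, x')\,\tilde{k}_Y(y, y').$$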
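The product-space identity above can be verified in a finite-dimensional case. The sketch below assumes linear kernels on both domains, so that the tensor-product features can be formed explicitly; the helper `center_gram` and the random data are illustrative and not part of the paper.

```python
import numpy as np

def center_gram(K):
    """Gram matrix of the centered feature map: H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

rng = np.random.default_rng(0)
n = 6
X = rng.normal(size=(n, 2))   # inputs x_i, linear kernel on X (assumption)
Y = rng.normal(size=(n, 3))   # inputs y_i, linear kernel on Y (assumption)

# Explicit centered features and their tensor products, flattened to vectors.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
T = np.einsum('ia,ib->iab', Xc, Yc).reshape(n, -1)

# Gram matrix of the tensor-product features ...
G_tensor = T @ T.T
# ... equals the elementwise (Hadamard) product of the centered Gram matrices.
G_product = center_gram(X @ X.T) * center_gram(Y @ Y.T)

# The empirical cross-covariance is the plain mean of the tensor-product features.
cov_from_features = T.mean(axis=0).reshape(2, 3)
cov_direct = X.T @ Y / n - np.outer(X.mean(axis=0), Y.mean(axis=0))
```

In this toy setting the Hadamard product of the two centered Gram matrices plays the role of the Gram matrix of the product kernel, so any weight-based kernel mean estimator can be applied to it unchanged.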

Consequently, we can obtain the shrinkage estimators for the covariance operator by plugging the kernel $k((x, y), (x', y')) = \tilde{k}_X(x, x')\,\tilde{k}_Y(y, y')$ into our KMSEs. We will call this estimator a covariance-operator shrinkage estimator (COSE). The same trick can be easily generalized to tensors of higher order, which have been previously used, for example, in Song et al.

[Figure 1, panels (a) LIN, (b) POLY2, (c) POLY3, (d) RBF: The average loss of KME (left), S-KMSE (middle), and F-KMSE (right) estimators with different values of the shrinkage parameter. Inside boxes correspond to estimators. We repeat the experiments over different distributions with fixed $n$ and $d$.]

4. Experiments

We focus on the comparison between our shrinkage estimators and the standard estimator of the kernel mean using both synthetic datasets and real-world datasets.

4.1. Synthetic Data

Given the true data-generating distribution $P$, we evaluate different estimators using the loss function $\ell(\beta) := \big\|\sum_{i=1}^{n}\beta_i k(x_i, \cdot) - E_P[k(x, \cdot)]\big\|^2_{\mathcal{H}}$, where $\beta$ is the weight vector associated with different estimators. To allow for an exact calculation of $\ell(\beta)$, we consider the case when $P$ is a mixture-of-Gaussians distribution and $k$ is one of the following kernel functions: 1) linear kernel $k(x, x') = x^\top x'$; 2) polynomial degree-2 kernel $k(x, x') = (x^\top x' + 1)^2$; 3) polynomial degree-3 kernel $k(x, x') = (x^\top x' + 1)^3$; and 4) Gaussian RBF kernel $k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$. We will refer to them as LIN, POLY2, POLY3, and RBF, respectively.

Experimental protocol. Data are generated from a $d$-dimensional mixture of Gaussians, $x \sim \sum_{i=1}^{4}\pi_i N(\theta_i, \Sigma_i) + \varepsilon$, where the entries $\theta_{ij}$ of the component means are drawn from a uniform distribution $U(a, b)$, the covariances $\Sigma_i$ from a Wishart distribution $W(\Sigma, df)$ with scale proportional to $I_d$ and 7 degrees of freedom, and the noise $\varepsilon$ from a zero-mean Gaussian with covariance proportional to $I_d$. The mixing proportions $\pi$ are fixed. The choice of parameters here is quite arbitrary; we have experimented using various parameter settings and the results are similar to those presented here.
For the Gaussian RBF kernel, we set the bandwidth parameter to the square-root of the median Euclidean distance between samples in the dataset (i.e., $\sigma^2 = \mathrm{median}\{\|x_i - x_j\|^2\}$ throughout).

Figure 1 shows the average loss of different estimators using different kernels as we increase the value of the shrinkage parameter. Here we scale the shrinkage parameter by the minimum non-zero eigenvalue $\gamma$ of the kernel matrix $K$. In general, we find that S-KMSE and F-KMSE tend to outperform KME. However, as $\lambda$ becomes large, there are some cases where shrinkage deteriorates the estimation performance, e.g., see the LIN kernel and some outliers in the figures. This suggests that it is very important to choose the parameter $\lambda$ appropriately (cf. the discussion in §2).

[Figure 2, panels for LIN, POLY2, POLY3, and RBF, each plotted against sample size (fixed $d$) and against dimension (fixed $n$); vertical axis: average loss. The average loss over different distributions of KME, S-KMSE, and F-KMSE with varying sample size ($n$) and dimension ($d$). The shrinkage parameter $\lambda$ is chosen by LOOCV.]

Similarly, Figure 2 depicts the average loss as we vary the sample size and dimension of the data. In this case, the shrinkage parameter is chosen by the proposed leave-one-out cross-validation score. As we can see, both S-KMSE

and F-KMSE outperform the standard KME. The S-KMSE performs slightly better than the F-KMSE. Moreover, the improvement is more substantial in the large $d$, small $n$ paradigm. In the worst cases, the S-KMSE and F-KMSE perform as well as the KME.

Lastly, it is instructive to note that the improvement varies with the choice of kernel $k$. Briefly, the choice of kernel reflects the dimensionality of the feature space $\mathcal{H}$. One would expect more improvement in a high-dimensional space, e.g., with the RBF kernel, than in a low-dimensional one, e.g., with the linear kernel (cf. the discussion at the end of §3). This phenomenon can be observed in both Figure 1 and Figure 2.

[Table 1: Average negative log-likelihood of the model $Q$ on test points over randomizations. The boldface represents the result whose difference from the baseline, i.e., KME, is statistically significant. Columns: KME, S-KMSE, and F-KMSE for each of LIN, POLY2, POLY3, and RBF. Rows (datasets): ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass.]

4.2. Real Data

We consider three benchmark applications: density estimation via kernel mean matching (Song et al.), kernel PCA using shrinkage mean and covariance operator (Schölkopf et al., 1998), and discriminative learning on distributions (Muandet and Schölkopf; Muandet et al.). For the first two tasks we employ datasets from the UCI repositories. We use only real-valued features, each of which is normalized to have zero mean and unit variance.

Density estimation. We perform density estimation via kernel mean matching (Song et al.). That is, we fit the density $Q = \sum_{j=1}^{m}\pi_j N(\theta_j, \sigma_j^2 I)$ to each dataset by minimizing $\|\hat{\mu} - \mu_Q\|^2_{\mathcal{H}}$ s.t. $\sum_{j=1}^{m}\pi_j = 1$. The kernel mean $\hat{\mu}$ is obtained from the samples using different estimators, whereas $\mu_Q$ is the kernel mean embedding of the density $Q$. Unlike the experiments in Song et al., our goal is to compare different estimators of $\mu_P$ where $P$ is the true data distribution. That is, we replace $\hat{\mu}$ with a version obtained via shrinkage.
A better estimate of $\mu_P$ should lead to better density estimation, as measured by the negative log-likelihood of $Q$ on the test set. We hold out a fixed fraction of each dataset as a test set and use the same number of mixture components $m$ for each dataset. The model is initialized by running the k-means algorithm from several random initializations and returning the best. We repeat the experiments multiple times and perform the paired sign test on the results at a fixed significance level. (The paired sign test is a nonparametric test that can be used to examine whether two paired samples have the same distribution. In our case, we compare S-KMSE and F-KMSE against KME.)

The average negative log-likelihood of the model $Q$, optimized via different estimators, is reported in Table 1. Clearly, both S-KMSE and F-KMSE consistently achieve smaller negative log-likelihood when compared to KME. There are however a few cases in which KME outperforms the proposed estimators, especially when the dataset is relatively large, e.g., satimage and abalone. We suspect that in those cases the standard KME already provides an accurate estimate of the kernel mean. To get a better estimate, more effort is required to optimize for the shrinkage parameter. Moreover, the improvement across different kernels is consistent with results on the synthetic datasets.

Kernel PCA. In this experiment, we perform KPCA using different estimates of the mean and covariance operators. We compare the reconstruction error $E_{\mathrm{proj}}(z) = \|\phi(z) - P\phi(z)\|^2$ on test samples, where $P$ is the projection constructed from the leading principal components. We use a Gaussian RBF kernel for all datasets. We compare five different scenarios: 1) standard KPCA; 2) shrinkage centering with S-KMSE; 3) shrinkage centering with F-KMSE; 4) KPCA with S-COSE; and 5) KPCA with F-COSE. To perform KPCA on the shrinkage covariance operator, we solve the generalized eigenvalue problem $K_c B K_c V = K_c V D$ where $B = \mathrm{diag}(\beta)$ and $K_c$ is the centered Gram matrix. The weight vector $\beta$ is obtained from shrinkage estimators using the kernel matrix $K_c \circ K_c$ where $\circ$ denotes the Hadamard product. We hold out a fixed fraction of each dataset as a test set.

[Figure 3, one group of bars (KME, S-KMSE, F-KMSE, S-COSE, F-COSE) per dataset: ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass; vertical axis: reconstruction error. The average reconstruction error of KPCA on hold-out test samples over repetitions. The KME represents the standard approach, whereas S-KMSE and F-KMSE use shrinkage means to perform centering. The S-COSE and F-COSE directly use the shrinkage estimate of the covariance operator.]

Figure 3 illustrates the results of KPCA. Clearly, the S-COSE and F-COSE consistently outperform all other estimators. Although we observe an improvement of S-KMSE and F-KMSE over KME, it is very small compared to that of S-COSE and F-COSE. This makes sense intuitively, since changing the mean point or shifting data does not change the covariance structure considerably, so it will not significantly affect the reconstruction error.

[Table 2: The classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators to construct the kernel on distributions. Columns: SMM and OCSMM for the linear kernel, SMM and OCSMM for the non-linear kernel. Rows: KME, S-KMSE, F-KMSE.]

Discriminative learning on distributions. A positive semi-definite kernel between distributions can be defined via their kernel mean embeddings. That is, given a training sample $(\hat{P}_1, y_1), \ldots, (\hat{P}_m, y_m) \in \mathcal{P} \times \{-1, +1\}$ where $\hat{P}_i := \frac{1}{n}\sum_{k=1}^{n}\delta_{x_k^i}$ and $x_k^i \sim P_i$, the linear kernel between two distributions is approximated by

$$\langle\hat{\mu}_{P_i}, \hat{\mu}_{P_j}\rangle = \Big\langle\sum_{k=1}^{n}\beta_k^i\phi(x_k^i),\ \sum_{l=1}^{n}\beta_l^j\phi(x_l^j)\Big\rangle = \sum_{k,l=1}^{n}\beta_k^i\beta_l^j k(x_k^i, x_l^j).$$

The weight vectors $\beta^i$ and $\beta^j$ come from the kernel mean estimates of $\mu_{P_i}$ and $\mu_{P_j}$, respectively. The non-linear kernel can then be defined accordingly, e.g., $\kappa(P_i, P_j) = \exp(-\|\hat{\mu}_{P_i} - \hat{\mu}_{P_j}\|^2_{\mathcal{H}} / 2\sigma^2)$. Our goal in this experiment is to investigate if the shrinkage estimate of the kernel mean improves the performance of discriminative learning on distributions. To this end, we conduct experiments on natural scene categorization using the support measure machine (SMM) (Muandet et al.) and group anomaly detection on a high-energy physics dataset using the one-class SMM (OCSMM) (Muandet and Schölkopf).
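The kernels between distributions described above reduce to weighted sums over Gram matrices. The sketch below is illustrative only: it assumes a Gaussian RBF embedding kernel, uses the $2\sigma^2$ scaling in both the embedding and the outer kernel as an assumption, and takes the weight vectors as given (for example uniform weights for KME, or any shrinkage weights). The function names are hypothetical.

```python
import numpy as np

def rbf(a, b, sigma):
    """Embedding kernel: Gaussian RBF between the rows of a and the rows of b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def linear_dist_kernel(xi, bi, xj, bj, sigma):
    """<mu_i, mu_j> = sum_{k,l} bi_k bj_l k(xi_k, xj_l)."""
    return float(bi @ rbf(xi, xj, sigma) @ bj)

def rbf_dist_kernel(xi, bi, xj, bj, sigma, sigma_outer):
    """exp(-||mu_i - mu_j||^2 / (2 sigma_outer^2)), with the norm expanded
    into three linear distribution kernels."""
    d2 = (linear_dist_kernel(xi, bi, xi, bi, sigma)
          - 2.0 * linear_dist_kernel(xi, bi, xj, bj, sigma)
          + linear_dist_kernel(xj, bj, xj, bj, sigma))
    return float(np.exp(-max(d2, 0.0) / (2.0 * sigma_outer**2)))

rng = np.random.default_rng(0)
x1 = rng.normal(loc=0.0, size=(20, 2))   # sample from a first distribution
x2 = rng.normal(loc=1.0, size=(25, 2))   # sample from a second distribution
b1 = np.full(20, 1.0 / 20)               # KME weights; shrinkage weights plug in the same way
b2 = np.full(25, 1.0 / 25) / (1.0 + 0.1) # e.g. S-KMSE weights with lambda = 0.1

k_lin = linear_dist_kernel(x1, b1, x2, b2, sigma=1.0)
k_rbf = rbf_dist_kernel(x1, b1, x2, b2, sigma=1.0, sigma_outer=1.0)
```

Swapping the estimator only changes the weight vectors, which is why the choice of kernel mean estimator can be varied without touching the downstream learning algorithm.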
We use both linear and non-linear kernels where the Gaussian RBF kernel is employed as an embedding kernel (Muandet et al.). All hyper-parameters are chosen by cross-validation. For our unsupervised problem, we repeat the experiments using several parameter settings and report the best results.

Table 2 reports the classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators. Both shrinkage estimators consistently lead to better performance on both SMM and OCSMM when compared to KME.

To summarize, we find sufficient evidence to conclude that both S-KMSE and F-KMSE outperform the standard KME. The performance of S-KMSE and F-KMSE is very competitive. The difference depends on the dataset and the kernel function.

5. Conclusions

To conclude, we show that the commonly used kernel mean estimator can be improved. Our theoretical result suggests that there exists a wide class of kernel mean estimators that are better than the standard one. To demonstrate this, we focus on two efficient shrinkage estimators, namely, simple and flexible kernel mean shrinkage estimators. Empirical study clearly shows that the proposed estimators outperform the standard one in various scenarios. Most importantly, the shrinkage estimates not only provide more accurate estimation, but also lead to superior performance on real-world applications.

Acknowledgments

The authors wish to thank David Hogg and Ross Fedely for reading the first draft and the anonymous reviewers who gave valuable suggestions that have helped to improve the manuscript.


More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information