Kernel Mean Estimation and Stein Effect


Krikamol Muandet, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
Kenji Fukumizu, The Institute of Statistical Mathematics, Tokyo, Japan
Bharath Sriperumbudur, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
Bernhard Schölkopf, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be improved due to a well-known phenomenon in statistics called Stein's phenomenon. Our theoretical analysis reveals the existence of a wide class of estimators that are better than the standard one. Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. Empirical evaluations on several applications clearly demonstrate that the proposed estimators outperform the standard kernel mean estimator.

1. Introduction

This paper aims to improve the estimation of the mean function in a reproducing kernel Hilbert space (RKHS) from a finite sample. A kernel mean of a probability distribution $P$ over a measurable space $\mathcal{X}$ is defined by

$$\mu_P := \int_{\mathcal{X}} k(x,\cdot)\, dP(x) \in \mathcal{H}, \qquad (1)$$

where $\mathcal{H}$ is an RKHS associated with a reproducing kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Conditions ensuring that this expectation exists are given in Smola et al. (2007). Unfortunately, it is not practical to compute $\mu_P$ directly because the distribution $P$ is usually unknown. Instead, given an i.i.d. sample $x_1, x_2, \ldots, x_n$ from $P$, we can easily compute the empirical kernel mean by the average

$$\hat{\mu}_P := \frac{1}{n}\sum_{i=1}^{n} k(x_i,\cdot). \qquad (2)$$
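To make the empirical kernel mean concrete, the following is a minimal NumPy sketch that evaluates the average of kernel functions at a query point. The Gaussian RBF kernel with unit bandwidth, the standard Gaussian sample, and the helper names `rbf_kernel` and `empirical_kernel_mean` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF Gram matrix with entries exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def empirical_kernel_mean(X, query, sigma=1.0):
    """Evaluate mu_hat(t) = (1/n) sum_i k(x_i, t) at each row t of `query`."""
    return rbf_kernel(query, X, sigma).mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # i.i.d. sample from P (assumed standard Gaussian)
query = np.zeros((1, 2))           # evaluate the mean function at the origin
mu_hat_at_origin = empirical_kernel_mean(X, query)[0]
```

For this particular choice of $P$ and kernel, the true kernel mean at the origin is $\mathbb{E}[\exp(-\|x\|^2/2)] = 1/2$, so `mu_hat_at_origin` should be close to 0.5, with the gap reflecting the finite-sample error that the paper sets out to reduce.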
The estimate $\hat{\mu}_P$ is the most commonly used estimate of the true kernel mean. Our primary interest here is to investigate whether one can improve upon this standard estimator.

The kernel mean has recently gained attention in the machine learning community, thanks to the introduction of Hilbert space embedding for distributions (Berlinet and Agnan; Smola et al., 2007). Representing the distribution as a mean function in the RKHS has several advantages: 1) the representation with appropriate choice of kernel $k$ has been shown to preserve all information about the distribution (Fukumizu et al.; Sriperumbudur et al.); 2) basic operations on the distribution can be carried out by means of inner products in RKHS, e.g., $\mathbb{E}_P[f(x)] = \langle f, \mu_P\rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$; 3) no intermediate density estimation is required, e.g., when testing for homogeneity from finite samples. As a result, many algorithms have benefited from the kernel mean representation, namely, maximum mean discrepancy (MMD) (Gretton et al., 2007), kernel dependency measure (Gretton et al.), kernel two-sample test (Gretton et al.), Hilbert space embedding of HMMs (Song et al.), and kernel Bayes rule
(Fukumizu et al.). Their performances rely directly on the quality of the empirical estimate $\hat{\mu}_P$.

However, it is of great importance, especially for our readers who are not familiar with kernel methods, to realize a more fundamental role of the kernel mean. It basically serves as a foundation to most kernel-based learning algorithms. For instance, nonlinear component analyses, such as kernel PCA, kernel FDA, and kernel CCA, rely heavily on mean functions and covariance operators in RKHS (Schölkopf et al., 1998). The kernel k-means algorithm performs clustering in feature space using mean functions as the representatives of the clusters (Dhillon et al.). Moreover, it also serves as a basis in early development of algorithms for classification and anomaly detection (Shawe-Taylor and Cristianini). All of those employ (2) as the estimate of the true mean function. Thus, the fact that substantial improvement can be gained when estimating (1) may in fact raise a widespread suspicion on the traditional way of learning with kernels.

We show in this work that the standard estimator (2) is, in a certain sense, not optimal, i.e., there exist better estimators (more below). In addition, we propose shrinkage estimators that outperform the standard one. At first glance, it was definitely counterintuitive and surprising for us, and will undoubtedly also be for some of our readers, that the empirical kernel mean could be improved, and, given the simplicity of the proposed estimators, that this has remained unnoticed until now. One of the reasons may be that there is a common belief that the estimator $\hat{\mu}_P$ already gives a good estimate of $\mu_P$, and, as sample size goes to infinity, the estimation error disappears (Shawe-Taylor and Cristianini). As a result, no need is felt to improve the kernel mean estimation. However, given a finite sample, substantial improvement is in fact possible and several factors may come into play, as will be seen later in this work.
This work was partly inspired by Stein's seminal work in 1955, which showed that a maximum likelihood estimator (MLE), i.e., the standard empirical mean, for the mean of the multivariate Gaussian distribution $\mathcal{N}(\theta, \sigma^2 I)$ is inadmissible (Stein, 1955). That is, there exists an estimator that always achieves smaller total mean squared error regardless of the true $\theta$, when the dimension is at least 3. Perhaps the best known estimator of such kind is the James-Stein estimator (James and Stein, 1961). Interestingly, the James-Stein estimator is itself inadmissible, and there exists a wide class of estimators that outperform the MLE, see e.g., Berger (1976).

However, our work differs fundamentally from Stein's seminal works and those along this line in two aspects. First, our setting is non-parametric in the sense that we do not assume any parametric form of the distribution, whereas most traditional works focus on some specific distributions, e.g., the Gaussian distribution. Second, our setting involves a nonlinear feature map into a high-dimensional, if not infinite-dimensional, space. As a result, higher moments of the distribution may come into play. Thus, one cannot adopt Stein's setting straightforwardly. A direct generalization of the James-Stein estimator to infinite-dimensional Hilbert space has already been considered (Berger and Wolpert, 1983; Mandelbaum and Shepp, 1987; Privault and Réveillac). In those works, $\theta$, which is the parameter to be estimated, is assumed to be the mean of a Gaussian measure on the Hilbert space from which samples are drawn. In our case, on the other hand, the samples are drawn from $P$ and not from the Gaussian distribution whose mean is $\mu_P$.

The contribution of this paper can be summarized as follows: First, we show that the standard kernel mean estimator can be improved by providing an alternative estimator that achieves smaller risk (Section 2). The theoretical analysis reveals the existence of a wide class of estimators that are better than the standard.
To this end, we propose in Section 3 a kernel mean shrinkage estimator (KMSE), which is based on a novel motivation for regularization through the notion of shrinkage. Moreover, we propose an efficient leave-one-out cross-validation procedure to select the shrinkage parameter, which is novel in the context of kernel mean estimation. Lastly, we demonstrate the benefit of the proposed estimators in several applications (Section 4).

2. Motivation: Shrinkage Estimators

For an arbitrary distribution $P$, denote by $\mu$ and $\hat{\mu}$ the true kernel mean and its empirical estimate (2) from the i.i.d. sample $x_1, x_2, \ldots, x_n \sim P$ (we remove the subscript for ease of notation). The most natural loss function considered in this work is $\ell(\mu, \hat{\mu}) = \|\mu - \hat{\mu}\|^2_{\mathcal{H}}$. An estimator $\hat{\mu}$ is a mapping which is measurable w.r.t. the Borel $\sigma$-algebra of $\mathcal{H}$ and is evaluated by its risk function $R(\mu, \hat{\mu}) = \mathbb{E}_P[\ell(\mu, \hat{\mu})]$, where $\mathbb{E}_P$ indicates expectation over the choice of i.i.d. sample of size $n$ from $P$.

Let us consider an alternative kernel mean estimator: $\hat{\mu}_\alpha := \alpha f^* + (1-\alpha)\hat{\mu}$ where $0 \le \alpha < 1$ and $f^* \in \mathcal{H}$. It is essentially a shrinkage estimator that shrinks the standard estimator toward a function $f^*$ by an amount specified by $\alpha$. If $\alpha = 0$, $\hat{\mu}_\alpha$ reduces to the standard estimator $\hat{\mu}$. The following theorem asserts that the risk of the shrinkage estimator $\hat{\mu}_\alpha$ is smaller than that of the standard estimator $\hat{\mu}$ given an appropriate choice of $\alpha$, regardless of the function $f^*$ (more below).

Theorem 1. For all distributions $P$ and the kernel $k$, there exists $\alpha > 0$ for which $R(\mu, \hat{\mu}_\alpha) < R(\mu, \hat{\mu})$.

Proof. The risk of the standard kernel mean estimator satisfies

$$\mathbb{E}\|\hat{\mu} - \mu\|^2 = \frac{1}{n}\big(\mathbb{E}[k(x,x)] - \mathbb{E}[k(x,\tilde{x})]\big) =: \Delta$$
where $\tilde{x}$ is an independent copy of $x$. Let us define the risk of the proposed shrinkage estimator by $\Delta_\alpha := \mathbb{E}\|\hat{\mu}_\alpha - \mu\|^2$, where $\alpha$ is a non-negative shrinkage parameter. We can then write this in terms of the standard risk as

$$\Delta_\alpha = \Delta - 2\alpha\,\mathbb{E}\langle \hat{\mu} - \mu,\ \hat{\mu} - \mu + \mu - f^*\rangle + \alpha^2\,\mathbb{E}\|f^*\|^2 - 2\alpha^2\,\mathbb{E}[f^*(x)] + \alpha^2\,\mathbb{E}\|\hat{\mu}\|^2.$$

It follows from the reproducing property of $\mathcal{H}$ that $\mathbb{E}[f^*(x)] = \langle f^*, \mu\rangle$. Moreover, using the fact that $\mathbb{E}\|\hat{\mu}\|^2 = \mathbb{E}\|\hat{\mu} - \mu + \mu\|^2 = \Delta + \mathbb{E}[k(x,\tilde{x})]$, we can simplify the shrinkage risk to

$$\Delta_\alpha = \alpha^2\big(\Delta + \|f^* - \mu\|^2\big) - 2\alpha\Delta + \Delta.$$

Thus, we have $\Delta_\alpha - \Delta = \alpha^2(\Delta + \|f^* - \mu\|^2) - 2\alpha\Delta$, which is non-positive where

$$\alpha \in \left[0,\ \frac{2\Delta}{\Delta + \|f^* - \mu\|^2}\right] \qquad (3)$$

and minimized at $\alpha^* = \Delta / (\Delta + \|f^* - \mu\|^2)$.

As we can see in (3), there is a range of $\alpha$ for which a non-positive $\Delta_\alpha - \Delta$, i.e., $R(\mu, \hat{\mu}_\alpha) \le R(\mu, \hat{\mu})$, is guaranteed. Note, however, that choosing $\alpha$ in this range requires the true kernel mean of the distribution $P$, which is unknown in practice. In spite of this, the theorem has an important implication: the shrinkage estimator $\hat{\mu}_\alpha$ can improve upon $\hat{\mu}$ if $\alpha$ is chosen appropriately. Later, we will exploit this result in order to construct more practical estimators.

Remark 1. The following observations follow immediately from Theorem 1:
(i) The shrinkage estimator always improves upon the standard one regardless of the direction of shrinkage, as specified by $f^*$. In other words, there exists a wide class of kernel mean estimators that are better than the standard one.
(ii) The value of $\alpha$ also depends on the choice of $f^*$. The further $f^*$ is from $\mu$, the smaller $\alpha$ becomes. Thus, the shrinkage gets smaller if $f^*$ is chosen such that it is far from the true kernel mean. This effect is akin to the James-Stein estimator.
(iii) The improvement can be viewed as a bias-variance trade-off: the shrinkage estimator reduces variance substantially at the expense of a little bias.

Remark 1 sheds light on how one can practically construct the shrinkage estimator: we can choose $f^*$ arbitrarily as long as the parameter $\alpha$ is chosen appropriately. Moreover, further improvement can be gained by incorporating prior knowledge as to the location of $\mu_P$, which can be straightforwardly integrated into the framework via $f^*$ (Berger and Wolpert, 1983).
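Theorem 1 can be checked numerically in a setting where every quantity is available in closed form. The sketch below is an illustration under stated assumptions, not the authors' experiment: it uses the linear kernel (so the kernel mean is the ordinary mean vector and the RKHS norm is Euclidean), a Gaussian $P = \mathcal{N}(\theta, I_d)$, the shrinkage target $f^* = 0$, and the oracle value of $\alpha^*$ from the proof, which relies on the true distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 5, 10, 20000
theta = np.ones(d)                 # true kernel mean under the linear kernel

# Oracle quantities from the proof (they use the true distribution).
delta = d / n                                   # Delta = trace(I_d) / n
alpha_star = delta / (delta + theta @ theta)    # minimizer for f* = 0

# One sample of size n per trial; loc broadcasts over the last axis.
samples = rng.normal(loc=theta, size=(trials, n, d))
mu_hat = samples.mean(axis=1)                   # standard estimator
mu_shrunk = (1.0 - alpha_star) * mu_hat         # shrinkage toward f* = 0

# Monte Carlo estimates of the two risks.
risk_standard = np.mean(np.sum((mu_hat - theta) ** 2, axis=1))
risk_shrunk = np.mean(np.sum((mu_shrunk - theta) ** 2, axis=1))

# Risk predicted by the proof at alpha = alpha_star.
predicted_risk_shrunk = delta - delta**2 / (delta + theta @ theta)
```

In this configuration the proof predicts a risk of 0.5 for the standard estimator and about 0.455 for the shrunk one, and the simulated values should agree with these up to Monte Carlo error.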
Inspired by the James-Stein estimator, we focus on $f^* = 0$. We will investigate the effect of different priors $f^*$ in future work.

3. Kernel Mean Shrinkage Estimator

In this section we give a novel formulation of the kernel mean estimator that allows us to estimate the shrinkage parameter efficiently. In the following, let $\phi : \mathcal{X} \to \mathcal{H}$ be a feature map associated with the kernel $k$ and $\langle\cdot,\cdot\rangle$ be an inner product in the RKHS $\mathcal{H}$ such that $k(x,x') = \langle\phi(x),\phi(x')\rangle$. Unless stated otherwise, $\|\cdot\|$ denotes the RKHS norm. The kernel mean $\mu_P$ and its empirical estimate $\hat{\mu}_P$ can be obtained as minimizers of the loss functionals

$$\mathcal{E}(g) := \mathbb{E}_{x\sim P}\|\phi(x) - g\|^2, \qquad \hat{\mathcal{E}}(g) := \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2,$$

respectively. We will call the estimator minimizing the loss functional $\hat{\mathcal{E}}(g)$ a kernel mean estimator (KME).

Note that the loss $\mathcal{E}(g)$ is different from the one considered in Section 2, i.e., $\ell(\mu, g) = \|\mu - g\|^2 = \|\mathbb{E}[\phi(x)] - g\|^2$. Nevertheless, we have $\ell(\mu, g) = \mathbb{E}_{xx'}k(x,x') - 2\mathbb{E}_x g(x) + \|g\|^2$. Since $\mathcal{E}(g) = \mathbb{E}_x k(x,x) - 2\mathbb{E}_x g(x) + \|g\|^2$, the loss $\ell(\mu, g)$ differs from $\mathcal{E}(g)$ only by $\mathbb{E}_x k(x,x) - \mathbb{E}_{xx'}k(x,x')$, which is not a function of $g$. We introduce the new form here because it will give a more tractable cross-validation computation (Section 3.1). In spite of this, the resulting estimators are always evaluated w.r.t. the loss in Section 2 (cf. Section 4.1).

From the formulation above, it is natural to ask if minimizing the regularized version of $\hat{\mathcal{E}}(g)$ will give a better estimator. On the one hand, one can argue that, unlike in classical risk minimization, we do not really need a regularizer here. The standard estimator (2) is known to be, in a certain sense, optimal and can be estimated reliably (Shawe-Taylor and Cristianini). Moreover, the original formulation of $\hat{\mathcal{E}}(g)$ is a well-posed problem. On the other hand, since regularization may be viewed as shrinking the solution toward zero, it can actually improve the kernel mean estimation, as suggested by Theorem 1 (cf. discussions at the end of Section 2).

Consequently, we minimize a modified loss functional

$$\hat{\mathcal{E}}_\lambda(g) := \hat{\mathcal{E}}(g) + \lambda\,\Omega(\|g\|) = \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2 + \lambda\,\Omega(\|g\|), \qquad (4)$$

where $\Omega(\cdot)$ denotes a monotonically increasing regularization functional and $\lambda$ is a non-negative regularization parameter.
In what follows, we refer to the shrinkage estimator $\hat{\mu}_\lambda$ minimizing $\hat{\mathcal{E}}_\lambda(g)$ as a kernel mean shrinkage estimator (KMSE). The parameters $\alpha$ and $\lambda$ play a similar role as a shrinkage parameter. They specify an amount by which the standard estimator $\hat{\mu}$ is shrunk toward $f^* = 0$. Thus, the terms shrinkage parameter and regularization parameter will be used interchangeably.
It follows from the representer theorem that $g$ lies in a subspace spanned by the data, i.e., $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$ for some $\beta \in \mathbb{R}^n$. By considering $\Omega(\|g\|) = \|g\|^2$, we can rewrite (4) as

$$\frac{1}{n}\sum_{i=1}^{n}\Big\|\phi(x_i) - \sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 + \lambda\Big\|\sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 = \beta^\top K\beta - 2\beta^\top K\mathbf{1}_n + \lambda\beta^\top K\beta + c, \qquad (5)$$

where $c$ is a constant term, $K$ is a Gram matrix such that $K_{ij} = k(x_i, x_j)$, and $\mathbf{1}_n = [1/n, 1/n, \ldots, 1/n]^\top$. Taking a derivative of (5) w.r.t. $\beta$ and setting it to zero yields $\beta = (1/(1+\lambda))\mathbf{1}_n$. By setting $\alpha = \lambda/(1+\lambda)$ the shrinkage estimate can be written as $\hat{\mu}_\lambda = (1-\alpha)\hat{\mu}$. Since $0 < \alpha < 1$, the estimator $\hat{\mu}_\lambda$ corresponds to a shrinkage estimator discussed in Section 2 when $f^* = 0$. We call this estimator a simple kernel mean shrinkage estimator (SKMSE).

Using the expansion $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$, we may also consider the case where the regularization functional is written in terms of $\beta$, e.g., $\beta^\top\beta$. This leads to a particularly interesting kernel mean estimator. In this case, the optimal weight vector is given by $\beta = (K + \lambda I)^{-1}K\mathbf{1}_n$ and the shrinkage estimate can be written accordingly as $\hat{\mu}_\lambda = \sum_{j=1}^{n}\beta_j\phi(x_j) = \Phi(K + \lambda I)^{-1}K\mathbf{1}_n$ where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)]$. Unlike the SKMSE, this estimator shrinks the usual estimate differently in each coordinate (cf. Theorem 2). Hence, we will call it a flexible kernel mean shrinkage estimator (FKMSE).

The following theorem characterizes the FKMSE as a shrinkage estimator.

Theorem 2. The FKMSE can be written as

$$\hat{\mu}_\lambda = \sum_{i=1}^{n}\frac{\gamma_i}{\gamma_i + \lambda}\langle\hat{\mu}, v_i\rangle v_i,$$

where $\{\gamma_i, v_i\}$ are eigenvalue and eigenvector pairs of the empirical covariance operator $\hat{C}_{xx}$ in $\mathcal{H}$.

In words, the effect of the FKMSE is to reduce high frequency components of the expansion of $\hat{\mu}$, by expanding this in terms of the kernel PCA basis and shrinking the coefficients of the high order eigenfunctions, e.g., see Rasmussen and Williams. Note that the covariance operator $\hat{C}_{xx}$ itself does not depend on $\lambda$.

As we can see, the solution to the regularized version is indeed of the form of shrinkage estimators when $f^* = 0$. That is, both SKMSE and FKMSE shrink the standard kernel mean estimate towards zero.
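The two closed-form weight vectors above translate directly into a few lines of linear algebra. The following NumPy sketch is illustrative only: the function names, the Gaussian RBF Gram matrix with unit bandwidth, and the value of `lam` are assumptions. It also computes the FKMSE weights through the eigendecomposition $K = UDU^\top$, which makes the per-direction shrinkage factor $d_i/(d_i + \lambda)$ of the Gram matrix explicit.

```python
import numpy as np

def skmse_weights(n, lam):
    """SKMSE: beta = 1_n / (1 + lam), with 1_n = [1/n, ..., 1/n]."""
    return np.full(n, 1.0 / n) / (1.0 + lam)

def fkmse_weights(K, lam):
    """FKMSE: beta = (K + lam I)^{-1} K 1_n, computed with a linear solve."""
    n = K.shape[0]
    ones_n = np.full(n, 1.0 / n)
    return np.linalg.solve(K + lam * np.eye(n), K @ ones_n)

def fkmse_weights_spectral(K, lam):
    """Same weights via K = U D U^T: direction i is scaled by d_i / (d_i + lam)."""
    n = K.shape[0]
    d, U = np.linalg.eigh(K)
    ones_n = np.full(n, 1.0 / n)
    return U @ ((d / (d + lam)) * (U.T @ ones_n))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)              # Gaussian RBF Gram matrix, sigma = 1 (arbitrary)
lam = 0.1                          # arbitrary shrinkage parameter for illustration

beta_simple = skmse_weights(X.shape[0], lam)
beta_direct = fkmse_weights(K, lam)
beta_spectral = fkmse_weights_spectral(K, lam)
```

The direct and spectral FKMSE weights agree up to numerical precision. The spectral form is the more convenient one when the weights are needed for many values of the shrinkage parameter, since the eigendecomposition is computed once.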
The difference is that the SKMSE shrinks equally in all coordinates, whereas the FKMSE also constrains the amount of shrinkage by the information contained in each coordinate. Moreover, the squared RKHS norm can be decomposed as a sum of squared losses weighted by the eigenvalues $\gamma_i$ (cf. Mandelbaum and Shepp (1987, appendix)). By the same reasoning as Stein's result in the finite-dimensional case, one would suspect that the improvement of shrinkage estimators in $\mathcal{H}$ should also depend on how fast the eigenvalues of $k$ decay. That is, one would expect greater improvement if the values of $\gamma_i$ decay very slowly. For example, the Gaussian RBF kernel with larger bandwidth gives smaller improvement when compared to one with smaller bandwidth. Similarly, we should expect to see more improvement when applying a Laplacian kernel than when using a Gaussian RBF kernel.

In some applications of kernel mean embedding, one may want to interpret the weight $\beta$ as a probability vector (Nishiyama et al.). However, the weight vector $\beta$ output by our estimators is in general not normalized. In fact, all elements will be smaller than $1/n$ as a result of shrinkage. One may impose a constraint that $\beta$ must sum to one and resort to quadratic programming (Song et al.). Unfortunately, this approach has the undesirable effect of sparsity, which is unlikely to improve upon the standard estimator. Post-normalizing the weights often deteriorates the estimation performance.

To the best of our knowledge, no previous attempt has been made to improve the kernel mean estimation. However, we discuss some closely related works here. For example, instead of the loss functional $\hat{\mathcal{E}}(g)$, Kim and Scott consider a robust loss function such as Huber's loss to reduce the effect of outliers. The authors consider kernel density estimators, which differ fundamentally from kernel mean estimators. They need to reduce the kernel bandwidth with increasing sample size for the estimators to be consistent. A regularized version of MMD was adopted by Danafar et al. in the context of kernel-based hypothesis testing.
The resulting formulation resembles our SKMSE. Furthermore, the FKMSE is of a similar form as the conditional mean embedding used in Grünewälder et al., which can be viewed more generally as a regression problem in RKHS with smooth operators (Grünewälder et al.).

3.1. Choosing Shrinkage Parameter

As discussed in Section 2, the amount of shrinkage plays an important role in our estimators. In this work we propose to select the shrinkage parameter $\lambda$ by an automatic leave-one-out cross-validation.

For a given shrinkage parameter $\lambda$, let us consider the observation $x_i$ as being a new observation by omitting it from the dataset. Denote by $\hat{\mu}^{(-i)}_\lambda = \sum_{j\neq i}\beta^{(-i)}_j\phi(x_j)$ the kernel mean estimated from the remaining data, using the value $\lambda$ as a shrinkage parameter, so that $\beta^{(-i)}$ is the minimizer of $\hat{\mathcal{E}}^{(-i)}_\lambda(g)$. We will measure the quality of $\hat{\mu}^{(-i)}_\lambda$ by how well it approximates $\phi(x_i)$. The overall quality of the
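estimate is quantified by the cross-validation score

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\big\|\phi(x_i) - \hat{\mu}^{(-i)}_\lambda\big\|^2_{\mathcal{H}}. \qquad (6)$$

By simple algebra, it is not difficult to show that the optimal shrinkage parameter of the SKMSE can be calculated analytically, as stated by the following theorem.

Theorem 3. Let $\rho := \frac{1}{n^2}\sum_{i,j=1}^{n}k(x_i,x_j)$ and $\varrho := \frac{1}{n}\sum_{i=1}^{n}k(x_i,x_i)$. The shrinkage parameter $\lambda^* = (\varrho - \rho)/((n-1)\rho + \varrho/(n-1))$ of the SKMSE is the minimizer of $\mathrm{LOOCV}(\lambda)$.

On the other hand, finding the optimal $\lambda$ for the FKMSE is relatively more involved. Evaluating the score (6) naively requires one to solve for $\hat{\mu}^{(-i)}_\lambda$ explicitly for every $i$. Fortunately, we can simplify the score such that it can be evaluated efficiently, as stated in the following theorem.

Theorem 4. The LOOCV score of the FKMSE satisfies

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}(\beta^\top K - K_{\cdot i})^\top C_\lambda\,(\beta^\top K - K_{\cdot i}),$$

where $\beta$ is the weight vector calculated from the full dataset with the shrinkage parameter $\lambda$ and $C_\lambda = (K - \frac{1}{n}K(K+\lambda I)^{-1}K)^{-1}K(K - \frac{1}{n}K(K+\lambda I)^{-1}K)^{-1}$.

Proof of Theorem 4. For fixed $\lambda$ and $i$, let $\hat{\mu}^{(-i)}_\lambda$ be the leave-one-out kernel mean estimate of the FKMSE and let $A := (K + \lambda I)^{-1}$. Then, we can write an expression for the deleted residual as

$$\Delta^{(-i)}_\lambda := \hat{\mu}^{(-i)}_\lambda - \phi(x_i) = \hat{\mu}_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\sum_{l=1}^{n}A_{jl}\,\phi(x_l)\,\big\langle\hat{\mu}^{(-i)}_\lambda - \phi(x_i), \phi(x_j)\big\rangle.$$

Since $\Delta^{(-i)}_\lambda$ lies in a subspace spanned by the sample $\phi(x_1), \ldots, \phi(x_n)$, we have $\Delta^{(-i)}_\lambda = \sum_{k=1}^{n}\xi_k\phi(x_k)$ for some $\xi \in \mathbb{R}^n$. Substituting $\Delta^{(-i)}_\lambda$ back yields

$$\sum_{k=1}^{n}\xi_k\phi(x_k) = \hat{\mu}_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\{AK\xi\}_j\,\phi(x_j).$$

By taking the inner product on both sides w.r.t. the sample $\phi(x_1), \ldots, \phi(x_n)$ and solving for $\xi$, we have $\xi = (K - \frac{1}{n}KAK)^{-1}(\beta^\top K - K_{\cdot i})$, where $K_{\cdot i}$ is the $i$th column of $K$. Consequently, the leave-one-out score of the sample $x_i$ can be computed by

$$\|\Delta^{(-i)}_\lambda\|^2 = \xi^\top K\xi = (\beta^\top K - K_{\cdot i})^\top\big(K - \tfrac{1}{n}KAK\big)^{-1}K\big(K - \tfrac{1}{n}KAK\big)^{-1}(\beta^\top K - K_{\cdot i}) = (\beta^\top K - K_{\cdot i})^\top C_\lambda\,(\beta^\top K - K_{\cdot i}).$$

Averaging $\|\Delta^{(-i)}_\lambda\|^2$ over all samples gives

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\|\Delta^{(-i)}_\lambda\|^2 = \frac{1}{n}\sum_{i=1}^{n}(\beta^\top K - K_{\cdot i})^\top C_\lambda\,(\beta^\top K - K_{\cdot i}),$$

as required.

It is interesting to see that the leave-one-out cross-validation score in Theorem 4 depends only on the non-leave-one-out solution $\beta_\lambda$, which can be obtained as a by-product of the algorithm.

Computational complexity. The SKMSE requires $O(n^2)$ operations to select the shrinkage parameter. For the FKMSE, there are two steps in cross-validation.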
First, we need to compute $(K + \lambda I)^{-1}$ repeatedly for different values of $\lambda$. Assume that we know the eigendecomposition $K = UDU^\top$ where $D$ is diagonal with $d_{ii} \ge 0$ and $UU^\top = I$. It follows that $(K + \lambda I)^{-1} = U(D + \lambda I)^{-1}U^\top$. Consequently, solving for $\beta_\lambda$ takes $O(n^2)$ operations. Since the eigendecomposition requires $O(n^3)$ operations, finding $\beta_\lambda$ for many values of $\lambda$ is essentially free. A low-rank approximation can also be adopted to reduce the computational cost further.

Second, we need to compute the cross-validation score (6). As shown in Theorem 4, we can compute it using only $\beta_\lambda$ obtained from the previous step. The calculation of $C_\lambda$ can be simplified further via the eigendecomposition of $K$ as $C_\lambda = U(D - \frac{1}{n}D(D + \lambda I)^{-1}D)^{-1}D(D - \frac{1}{n}D(D + \lambda I)^{-1}D)^{-1}U^\top$. Since it only involves the inverse of diagonal matrices, the inversion can be evaluated in $O(n)$ operations. The overall computational complexity of the cross-validation requires only $O(n^2)$ operations, as opposed to the naive approach that requires $O(n^4)$ operations. When performed as a by-product of the algorithm, the computational cost of the cross-validation procedure becomes negligible as the dataset becomes larger. In practice, we use the fminsearch and fminbnd routines of the MATLAB optimization toolbox to find the best shrinkage parameter.

3.2. Covariance Operators

The covariance operator from $\mathcal{H}_X$ to $\mathcal{H}_Y$ can be viewed as a mean function in a product space $\mathcal{H}_X \otimes \mathcal{H}_Y$. Hence, we can also construct a shrinkage estimator of the covariance operator in RKHS. Let $(\mathcal{H}_X, k_X)$ and $(\mathcal{H}_Y, k_Y)$ be the RKHSs of functions on measurable spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively, with p.d. kernels $k_X$ and $k_Y$ (with feature maps $\phi$ and $\varphi$). We will consider a random vector $(X, Y) : \Omega \to \mathcal{X} \times \mathcal{Y}$ with distribution $P_{XY}$, with $P_X$ and $P_Y$ as marginal distributions. Under some conditions, there exists a unique cross-covariance operator $\Sigma_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ such that

$$\langle g, \Sigma_{YX} f\rangle_{\mathcal{H}_Y} = \mathbb{E}_{XY}\big[(f(X) - \mathbb{E}_X[f(X)])(g(Y) - \mathbb{E}_Y[g(Y)])\big] = \mathrm{Cov}(f(X), g(Y))$$

holds for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$ (Fukumizu et al.). If $X$ equals $Y$, we get the self-adjoint operator $\Sigma_{XX}$ called the covariance operator.
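Given an i.i.d. sample from $P_{XY}$ written as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can write the empirical cross-covariance operator as

$$\hat{\Sigma}_{YX} := \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\otimes\varphi(y_i) - \hat{\mu}_X\otimes\hat{\mu}_Y,$$

where $\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)$ and $\hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n}\varphi(y_i)$. Let $\tilde{\phi}$ and $\tilde{\varphi}$ be the centered feature maps of $\phi$ and $\varphi$, respectively. Then, it can be rewritten as

$$\hat{\Sigma}_{YX} := \frac{1}{n}\sum_{i=1}^{n}\tilde{\phi}(x_i)\otimes\tilde{\varphi}(y_i) \in \mathcal{H}_X\otimes\mathcal{H}_Y.$$

It follows from the inner product property in product space that

$$\langle\tilde{\phi}(x)\otimes\tilde{\varphi}(y), \tilde{\phi}(x')\otimes\tilde{\varphi}(y')\rangle_{\mathcal{H}_X\otimes\mathcal{H}_Y} = \langle\tilde{\phi}(x), \tilde{\phi}(x')\rangle_{\mathcal{H}_X}\langle\tilde{\varphi}(y), \tilde{\varphi}(y')\rangle_{\mathcal{H}_Y} = \tilde{k}_X(x,x')\,\tilde{k}_Y(y,y').$$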
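The product-space identity above implies that the Gram matrix of the product features is the elementwise product of the two centered Gram matrices. The following NumPy sketch illustrates this under stated assumptions: the helper names `center_gram` and `product_gram` are hypothetical, and linear kernels are used only so that the result can be compared with the ordinary empirical cross-covariance matrix.

```python
import numpy as np

def center_gram(K):
    """Gram matrix of centered features: H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def product_gram(Kx, Ky):
    """Gram matrix of product features: elementwise product of centered Grams."""
    return center_gram(Kx) * center_gram(Ky)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
Y = X @ np.array([[1.0], [-0.5]]) + 0.1 * rng.normal(size=(20, 1))

Kx, Ky = X @ X.T, Y @ Y.T          # linear kernels, so the check below is explicit
K_prod = product_gram(Kx, Ky)

# With linear kernels, (1/n^2) * sum(K_prod) equals the squared Frobenius norm
# of the ordinary empirical cross-covariance matrix.
n = X.shape[0]
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
cross_cov = Yc.T @ Xc / n
hs_norm_sq = K_prod.sum() / n**2
```

Any weight vector computed from `K_prod` (for example with the SKMSE or FKMSE formulas of Section 3) then reweights the product features in the same way that kernel mean weights reweight $\phi(x_i)$.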
Then, we can obtain the shrinkage estimators for the covariance operator by plugging the kernel $k((x,y),(x',y')) = \tilde{k}_X(x,x')\,\tilde{k}_Y(y,y')$ into our KMSEs. We will call this estimator a covariance-operator shrinkage estimator (COSE). The same trick can be easily generalized to tensors of higher order, which have been previously used, for example, in Song et al.

Figure 1. The average loss of KME (left), SKMSE (middle), and FKMSE (right) estimators with different values of the shrinkage parameter, scaled by the eigenvalue $\gamma$. Panels: (a) LIN, (b) POLY2, (c) POLY3, (d) RBF. The experiments are repeated over different distributions at a fixed sample size $n$ and dimension $d$.

4. Experiments

We focus on the comparison between our shrinkage estimators and the standard estimator of the kernel mean using both synthetic datasets and real-world datasets.

4.1. Synthetic Data

Given the true data-generating distribution $P$, we evaluate different estimators using the loss function

$$\ell(\beta) := \Big\|\sum_{i=1}^{n}\beta_i k(x_i,\cdot) - \mathbb{E}_P[k(x,\cdot)]\Big\|^2_{\mathcal{H}},$$

where $\beta$ is the weight vector associated with different estimators. To allow for an exact calculation of $\ell(\beta)$, we consider the case where $P$ is a mixture-of-Gaussians distribution and $k$ is one of the following kernel functions: 1) linear kernel $k(x,x') = x^\top x'$; 2) polynomial degree-2 kernel $k(x,x') = (x^\top x' + 1)^2$; 3) polynomial degree-3 kernel $k(x,x') = (x^\top x' + 1)^3$; and 4) Gaussian RBF kernel $k(x,x') = \exp(-\|x - x'\|^2/2\sigma^2)$. We will refer to them as LIN, POLY2, POLY3, and RBF, respectively.

Experimental protocol. Data are generated from a $d$-dimensional mixture of four Gaussians with additive isotropic Gaussian noise, $x \sim \sum_{i=1}^{4}\pi_i\,\mathcal{N}(\theta_i, \Sigma_i) + \varepsilon$, where the entries $\theta_{ij}$ are drawn from a uniform distribution $\mathcal{U}(a,b)$ and the covariances $\Sigma_i$ from a Wishart distribution $\mathcal{W}(\Sigma_0, \mathrm{df})$ with scale proportional to $I_d$ and 7 degrees of freedom. The mixing proportions $\pi$ are fixed. The choice of parameters here is quite arbitrary; we have experimented using various parameter settings and the results are similar to those presented here. For the Gaussian RBF kernel, we set the bandwidth parameter to the square root of the median squared Euclidean distance between samples in the dataset (i.e., $\sigma^2 = \mathrm{median}\{\|x_i - x_j\|^2\}$ throughout).
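The bandwidth rule for the Gaussian RBF kernel can be sketched in a few lines. This is a minimal illustration, assuming that the median is taken over distinct pairs $i < j$ of squared Euclidean distances (the text does not say whether the zero diagonal is included); the function name `median_heuristic_sigma2` is hypothetical.

```python
import numpy as np

def median_heuristic_sigma2(X):
    """sigma^2 = median of squared Euclidean distances over distinct pairs."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(X.shape[0], k=1)   # each unordered pair once, no diagonal
    return np.median(sq[upper])

def rbf_gram(X, sigma2):
    """Gaussian RBF Gram matrix exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))       # placeholder data, not the paper's mixture
sigma2 = median_heuristic_sigma2(X)
K = rbf_gram(X, sigma2)
```

Tying the bandwidth to the scale of the data in this way keeps the Gram matrix from degenerating to either the identity or the all-ones matrix.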
Figure 1 shows the average loss of different estimators using different kernels as we increase the value of the shrinkage parameter. Here we scale the shrinkage parameter by the minimum non-zero eigenvalue $\gamma$ of the kernel matrix $K$. In general, we find that SKMSE and FKMSE tend to outperform KME. However, as $\lambda$ becomes large, there are some cases where shrinkage deteriorates the estimation performance, e.g., see the LIN kernel and some outliers in the figures. This suggests that it is very important to choose the parameter $\lambda$ appropriately (cf. the discussion in Section 2).

Similarly, Figure 2 depicts the average loss as we vary the sample size and dimension of the data. In this case, the shrinkage parameter is chosen by the proposed leave-one-out cross-validation score. As we can see, both SKMSE and FKMSE outperform the standard KME. The SKMSE performs slightly better than the FKMSE. Moreover, the improvement is more substantial in the "large $d$, small $n$" paradigm. In the worst cases, the SKMSE and FKMSE perform as well as the KME.

Lastly, it is instructive to note that the improvement varies with the choice of kernel $k$. Briefly, the choice of kernel reflects the dimensionality of the feature space $\mathcal{H}$. One would expect more improvement in a high-dimensional space, e.g., RBF kernel, than in a low-dimensional one, e.g., linear kernel (cf. discussions at the end of Section 3). This phenomenon can be observed in both Figures 1 and 2.

Figure 2. The average loss over different distributions of KME, SKMSE, and FKMSE with varying sample size ($n$) and dimension ($d$). The shrinkage parameter $\lambda$ is chosen by LOOCV. Panels: LIN, POLY2, POLY3, and RBF, each plotted against sample size (top) and dimension (bottom).

Table 1. Average negative log-likelihood of the model $Q$ on test points over multiple randomizations. Boldface marks results whose difference from the baseline, i.e., KME, is statistically significant. Columns: KME, SKMSE, and FKMSE under each of the LIN, POLY2, POLY3, and RBF kernels. Rows (datasets): ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass.

4.2. Real Data

We consider three benchmark applications: density estimation via kernel mean matching (Song et al.), kernel PCA using shrinkage mean and covariance operator (Schölkopf et al., 1998), and discriminative learning on distributions (Muandet and Schölkopf; Muandet et al.). For the first two tasks we employ datasets from the UCI repositories. We use only real-valued features, each of which is normalized to have zero mean and unit variance.

Density estimation. We perform density estimation via kernel mean matching (Song et al.). That is, we fit the density $Q = \sum_{j=1}^{m}\pi_j\,\mathcal{N}(\theta_j, \sigma_j^2 I)$ to each dataset by minimizing $\|\hat{\mu} - \mu_Q\|^2_{\mathcal{H}}$ s.t. $\sum_{j=1}^{m}\pi_j = 1$. The kernel mean $\hat{\mu}$ is obtained from the samples using different estimators, whereas $\mu_Q$ is the kernel mean embedding of the density $Q$. Unlike the experiments in Song et al., our goal is to compare different estimators of $\mu_P$ where $P$ is the true data distribution. That is, we replace $\hat{\mu}$ with a version obtained via shrinkage.
A better estimate of $\mu_P$ should lead to better density estimation, as measured by the negative log-likelihood of $Q$ on the test set. We hold out a fixed fraction of each dataset as a test set and use the same number of mixture components $m$ for every dataset. The model is initialized by running several random initializations of the k-means algorithm and returning the best. We repeat the experiments multiple times and perform the paired sign test on the results at a fixed significance level. (The paired sign test is a nonparametric test that can be used to examine whether two paired samples have the same distribution. In our case, we compare SKMSE and FKMSE against KME.)

The average negative log-likelihood of the model $Q$, optimized via different estimators, is reported in Table 1. Both SKMSE and FKMSE achieve smaller negative log-likelihood than KME on most datasets. There are, however, a few cases in which KME outperforms the proposed estimators, especially when the dataset is relatively large, e.g., satimage and abalone. We suspect that in those cases the standard KME already provides an accurate estimate of the kernel mean. To get a better estimate, more effort is required to optimize the shrinkage parameter. Moreover, the improvement across different kernels is consistent with results on the synthetic datasets.

Kernel PCA. In this experiment, we perform KPCA using different estimates of the mean and covariance operators. We compare the reconstruction error $E_{\mathrm{proj}}(z) = \|\phi(z) - P\phi(z)\|^2$ on test samples, where $P$ is the projection constructed from the leading principal components. We use a Gaussian RBF kernel for all datasets. We compare five different scenarios: 1) standard KPCA; 2) shrinkage centering with SKMSE; 3) shrinkage centering with FKMSE; 4) KPCA with S-COSE; and 5) KPCA with F-COSE. To perform KPCA on the shrinkage covariance operator, we solve the generalized eigenvalue problem $K_c B K_c V = K_c V D$ where $B = \mathrm{diag}(\beta)$ and $K_c$ is the centered Gram matrix. The weight vector $\beta$ is obtained from shrinkage estimators using the kernel matrix $K_c \circ K_c$, where $\circ$ denotes the Hadamard product. We hold out a fixed fraction of each dataset as a test set.
Figure 3. The average reconstruction error of KPCA on hold-out test samples over multiple repetitions, shown per dataset (ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass) for KME, SKMSE, FKMSE, S-COSE, and F-COSE. The KME represents the standard approach, whereas SKMSE and FKMSE use shrinkage means to perform centering. The S-COSE and F-COSE directly use the shrinkage estimate of the covariance operator.

Figure 3 illustrates the results of KPCA. Clearly, the S-COSE and F-COSE consistently outperform all other estimators. Although we observe an improvement of SKMSE and FKMSE over KME, it is very small compared to that of S-COSE and F-COSE. This makes sense intuitively, since changing the mean point or shifting data does not change the covariance structure considerably, so it will not significantly affect the reconstruction error.

Table 2. The classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators to construct the kernel on distributions. Columns: SMM and OCSMM under the linear and nonlinear kernels. Rows (estimators): KME, SKMSE, FKMSE.

Discriminative learning on distributions. A positive semi-definite kernel between distributions can be defined via their kernel mean embeddings. That is, given a training sample $(\hat{P}_1, y_1), \ldots, (\hat{P}_m, y_m) \in \mathcal{P}\times\{-1,+1\}$ where $\hat{P}_i := \frac{1}{n}\sum_{k=1}^{n}\delta_{x^i_k}$ and $x^i_k \sim P_i$, the linear kernel between two distributions is approximated by

$$\langle\hat{\mu}_{P_i}, \hat{\mu}_{P_j}\rangle = \Big\langle\sum_{k=1}^{n}\beta^i_k\phi(x^i_k), \sum_{l=1}^{n}\beta^j_l\phi(x^j_l)\Big\rangle = \sum_{k,l=1}^{n}\beta^i_k\beta^j_l\,k(x^i_k, x^j_l).$$

The weight vectors $\beta^i$ and $\beta^j$ come from the kernel mean estimates of $\mu_{P_i}$ and $\mu_{P_j}$, respectively. The nonlinear kernel can then be defined accordingly, e.g., $\kappa(P_i, P_j) = \exp(-\|\hat{\mu}_{P_i} - \hat{\mu}_{P_j}\|^2_{\mathcal{H}}/2\sigma^2)$. Our goal in this experiment is to investigate if the shrinkage estimate of the kernel mean improves the performance of discriminative learning on distributions. To this end, we conduct experiments on natural scene categorization using the support measure machine (SMM) (Muandet et al.) and group anomaly detection on a high-energy physics dataset using the one-class SMM (OCSMM) (Muandet and Schölkopf).
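The two kernels on distributions described above reduce to weighted sums over a cross Gram matrix. The sketch below is an illustration under stated assumptions: a Gaussian RBF embedding kernel, SKMSE weights with an arbitrary `lam`, and hypothetical function names `embedding_inner` and `embedding_rbf`. It is not the SMM or OCSMM implementation.

```python
import numpy as np

def rbf_gram(X, Y, sigma2):
    """Cross Gram matrix with entries exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma2))

def embedding_inner(Xi, bi, Xj, bj, sigma2):
    """<mu_i, mu_j> = sum_{k,l} bi[k] * bj[l] * k(Xi[k], Xj[l])."""
    return bi @ rbf_gram(Xi, Xj, sigma2) @ bj

def embedding_rbf(Xi, bi, Xj, bj, sigma2_embed, sigma2_level2):
    """kappa = exp(-||mu_i - mu_j||^2 / (2 sigma^2)), expanded via inner products."""
    sq_dist = (
        embedding_inner(Xi, bi, Xi, bi, sigma2_embed)
        - 2.0 * embedding_inner(Xi, bi, Xj, bj, sigma2_embed)
        + embedding_inner(Xj, bj, Xj, bj, sigma2_embed)
    )
    # Guard against tiny negative values from floating-point cancellation.
    return np.exp(-max(sq_dist, 0.0) / (2.0 * sigma2_level2))

rng = np.random.default_rng(0)
Xi = rng.normal(size=(40, 2))              # sample from P_i
Xj = rng.normal(loc=1.0, size=(50, 2))     # sample from P_j
b_i, b_j = np.full(40, 1 / 40), np.full(50, 1 / 50)   # standard KME weights
lam = 0.1                                  # arbitrary SKMSE shrinkage parameter

lin_standard = embedding_inner(Xi, b_i, Xj, b_j, 1.0)
lin_shrunk = embedding_inner(Xi, b_i / (1 + lam), Xj, b_j / (1 + lam), 1.0)
kappa_same = embedding_rbf(Xi, b_i, Xi, b_i, 1.0, 1.0)
kappa_diff = embedding_rbf(Xi, b_i, Xj, b_j, 1.0, 1.0)
```

With SKMSE weights and a common `lam`, the linear kernel is simply rescaled by $1/(1+\lambda)^2$. The FKMSE weights, or a per-distribution choice of the shrinkage parameter, change the relative geometry of the embeddings rather than only their scale.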
We use both linear and nonlinear kernels, where the Gaussian RBF kernel is employed as an embedding kernel (Muandet et al.). All hyper-parameters are chosen by cross-validation. For our unsupervised problem, we repeat the experiments using several parameter settings and report the best results. Table 2 reports the classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators. Both shrinkage estimators consistently lead to better performance on both SMM and OCSMM when compared to KME.

To summarize, we find sufficient evidence to conclude that both SKMSE and FKMSE outperform the standard KME. The performance of SKMSE and FKMSE is very competitive. The difference depends on the dataset and the kernel function.

5. Conclusions

To conclude, we show that the commonly used kernel mean estimator can be improved. Our theoretical result suggests that there exists a wide class of kernel mean estimators that are better than the standard one. To demonstrate this, we focus on two efficient shrinkage estimators, namely, simple and flexible kernel mean shrinkage estimators. Empirical study clearly shows that the proposed estimators outperform the standard one in various scenarios. Most importantly, the shrinkage estimates not only provide more accurate estimation, but also lead to superior performance on real-world applications.

Acknowledgments

The authors wish to thank David Hogg and Ross Fedely for reading the first draft, and the anonymous reviewers whose valuable suggestions helped to improve the manuscript.