Kernel Mean Estimation and Stein Effect


Krikamol Muandet, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
Kenji Fukumizu, The Institute of Statistical Mathematics, Tokyo, Japan
Bharath Sriperumbudur, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
Bernhard Schölkopf, Empirical Inference Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

Abstract

A mean function in a reproducing kernel Hilbert space (RKHS), or a kernel mean, is an important part of many algorithms ranging from kernel principal component analysis to Hilbert-space embedding of distributions. Given a finite sample, an empirical average is the standard estimate for the true kernel mean. We show that this estimator can be improved due to a well-known phenomenon in statistics called Stein's phenomenon. After consideration, our theoretical analysis reveals the existence of a wide class of estimators that are better than the standard one. Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean. Empirical evaluations on several applications clearly demonstrate that the proposed estimators outperform the standard kernel mean estimator.

1. Introduction

This paper aims to improve the estimation of the mean function in a reproducing kernel Hilbert space (RKHS) from a finite sample. A kernel mean of a probability distribution $\mathbb{P}$ over a measurable space $\mathcal{X}$ is defined by

$$\mu_{\mathbb{P}} \triangleq \int_{\mathcal{X}} k(x,\cdot)\, d\mathbb{P}(x) \in \mathcal{H}, \tag{1}$$

where $\mathcal{H}$ is an RKHS associated with a reproducing kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Conditions ensuring that this expectation exists are given in Smola et al. (2007). Unfortunately, it is not practical to compute $\mu_{\mathbb{P}}$ directly because the distribution $\mathbb{P}$ is usually unknown. Instead, given an i.i.d. sample $x_1, x_2, \ldots, x_n$ from $\mathbb{P}$, we can easily compute the empirical kernel mean by the average

$$\hat\mu_{\mathbb{P}} \triangleq \frac{1}{n}\sum_{i=1}^{n} k(x_i,\cdot). \tag{2}$$
The estimate $\hat\mu_{\mathbb{P}}$ is the most commonly used estimate of the true kernel mean. Our primary interest here is to investigate whether one can improve upon this standard estimator.

The kernel mean has recently gained attention in the machine learning community, thanks to the introduction of Hilbert space embedding for distributions (Berlinet and Agnan; Smola et al., 2007). Representing the distribution as a mean function in the RKHS has several advantages: 1) the representation with appropriate choice of kernel $k$ has been shown to preserve all information about the distribution (Fukumizu et al.; Sriperumbudur et al.); 2) basic operations on the distribution can be carried out by means of inner products in RKHS, e.g., $\mathbb{E}_{\mathbb{P}}[f(x)] = \langle f, \mu_{\mathbb{P}}\rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$; 3) no intermediate density estimation is required, e.g., when testing for homogeneity from finite samples. As a result, many algorithms have benefited from the kernel mean representation, namely, maximum mean discrepancy (MMD) (Gretton et al., 2007), kernel dependency measure (Gretton et al.), kernel two-sample test (Gretton et al.), Hilbert space embedding of HMMs (Song et al.), and kernel Bayes rule

(Fukumizu et al.). Their performances rely directly on the quality of the empirical estimate $\hat\mu_{\mathbb{P}}$.

However, it is of great importance, especially for our readers who are not familiar with kernel methods, to realize a more fundamental role of the kernel mean. It basically serves as a foundation to most kernel-based learning algorithms. For instance, nonlinear component analyses, such as kernel PCA, kernel FDA, and kernel CCA, rely heavily on mean functions and covariance operators in RKHS (Schölkopf et al., 1998). The kernel k-means algorithm performs clustering in feature space using mean functions as the representatives of the clusters (Dhillon et al.). Moreover, it also serves as a basis in early development of algorithms for classification and anomaly detection (Shawe-Taylor and Cristianini). All of those employ (2) as the estimate of the true mean function. Thus, the fact that substantial improvement can be gained when estimating (1) may in fact raise a widespread suspicion on the traditional way of learning with kernels.

We show in this work that the standard estimator (2) is, in a certain sense, not optimal, i.e., there exist better estimators (more below). In addition, we propose shrinkage estimators that outperform the standard one. At first glance, it was definitely counter-intuitive and surprising for us, and will undoubtedly also be for some of our readers, that the empirical kernel mean could be improved, and, given the simplicity of the proposed estimators, that this has remained unnoticed until now. One of the reasons may be that there is a common belief that the estimator $\hat\mu_{\mathbb{P}}$ already gives a good estimate of $\mu_{\mathbb{P}}$, and, as sample size goes to infinity, the estimation error disappears (Shawe-Taylor and Cristianini). As a result, no need is felt to improve the kernel mean estimation. However, given a finite sample, substantial improvement is in fact possible and several factors may come into play, as will be seen later in this work.
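As a concrete reference point for the standard estimator discussed above, the following minimal sketch evaluates the empirical kernel mean (the plain average of $k(x_i,\cdot)$) at a few query points and computes an inner product between two empirical kernel means. The Gaussian RBF kernel, the bandwidth `sigma`, the random data, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def empirical_kernel_mean(X, query, sigma=1.0):
    """Evaluate mu_hat(t) = (1/n) * sum_i k(x_i, t) at each row t of `query`."""
    return rbf_kernel(query, X, sigma).mean(axis=1)

def inner_product_of_means(X, Y, sigma=1.0):
    """<mu_hat_X, mu_hat_Y>_H = (1/(n*m)) * sum_{i,j} k(x_i, y_j)."""
    return rbf_kernel(X, Y, sigma).mean()

# Illustrative data only (hypothetical sample from a standard normal).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
values = empirical_kernel_mean(X, X[:3])   # mu_hat evaluated at three sample points
norm_sq = inner_product_of_means(X, X)     # squared RKHS norm of mu_hat
```

Everything an algorithm needs from the empirical kernel mean reduces to averages of kernel evaluations like these, which is why the quality of the weights attached to each $k(x_i,\cdot)$ matters.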
This work was partly inspired by Stein's seminal work in 1955, which showed that a maximum likelihood estimator (MLE), i.e., the standard empirical mean, for the mean of the multivariate Gaussian distribution $\mathcal{N}(\theta, \sigma^2 I)$ is inadmissible (Stein, 1955). That is, there exists an estimator that always achieves smaller total mean squared error regardless of the true $\theta$, when the dimension is at least 3. Perhaps the best known estimator of such kind is the James-Stein estimator (James and Stein, 1961). Interestingly, the James-Stein estimator is itself inadmissible, and there exists a wide class of estimators that outperform the MLE, see e.g., Berger (1976).

However, our work differs fundamentally from Stein's seminal works and those along this line in two aspects. First, our setting is non-parametric in the sense that we do not assume any parametric form of the distribution, whereas most traditional works focus on some specific distributions, e.g., the Gaussian distribution. Second, our setting involves a non-linear feature map into a high-dimensional space, if not infinite. As a result, higher moments of the distribution may come into play. Thus, one cannot adopt Stein's setting straightforwardly. A direct generalization of the James-Stein estimator to infinite-dimensional Hilbert space has already been considered (Berger and Wolpert, 1983; Mandelbaum and Shepp, 1987; Privault and Réveillac). In those works, $\theta$, which is the parameter to be estimated, is assumed to be the mean of a Gaussian measure on the Hilbert space from which samples are drawn. In our case, on the other hand, the samples are drawn from $\mathbb{P}$ and not from the Gaussian distribution whose mean is $\mu_{\mathbb{P}}$.

The contribution of this paper can be summarized as follows: First, we show that the standard kernel mean estimator can be improved by providing an alternative estimator that achieves smaller risk (§2). The theoretical analysis reveals the existence of a wide class of estimators that are better than the standard.
To this end, we propose in §3 a kernel mean shrinkage estimator (KMSE), which is based on a novel motivation for regularization through the notion of shrinkage. Moreover, we propose an efficient leave-one-out cross-validation procedure to select the shrinkage parameter, which is novel in the context of kernel mean estimation. Lastly, we demonstrate the benefit of the proposed estimators in several applications (§4).

2. Motivation: Shrinkage Estimators

For an arbitrary distribution $\mathbb{P}$, denote by $\mu$ and $\hat\mu$ the true kernel mean and its empirical estimate (2) from the i.i.d. sample $x_1, x_2, \ldots, x_n \sim \mathbb{P}$ (we remove the subscript for ease of notation). The most natural loss function considered in this work is $\ell(\mu, \hat\mu) = \|\mu - \hat\mu\|^2_{\mathcal{H}}$. An estimator $\hat\mu$ is a mapping which is measurable w.r.t. the Borel $\sigma$-algebra of $\mathcal{H}$ and is evaluated by its risk function $R(\mu, \hat\mu) = \mathbb{E}_{\mathbb{P}}[\ell(\mu, \hat\mu)]$, where $\mathbb{E}_{\mathbb{P}}$ indicates expectation over the choice of i.i.d. sample of size $n$ from $\mathbb{P}$.

Let us consider an alternative kernel mean estimator: $\hat\mu_\alpha \triangleq \alpha f + (1-\alpha)\hat\mu$ where $0 \le \alpha < 1$ and $f \in \mathcal{H}$. It is essentially a shrinkage estimator that shrinks the standard estimator toward a function $f$ by an amount specified by $\alpha$. If $\alpha = 0$, $\hat\mu_\alpha$ reduces to the standard estimator $\hat\mu$. The following theorem asserts that the risk of the shrinkage estimator $\hat\mu_\alpha$ is smaller than that of the standard estimator $\hat\mu$ given an appropriate choice of $\alpha$, regardless of the function $f$ (more below).

Theorem 1. For all distributions $\mathbb{P}$ and the kernel $k$, there exists $\alpha > 0$ for which $R(\mu, \hat\mu_\alpha) < R(\mu, \hat\mu)$.

Proof. The risk of the standard kernel mean estimator satisfies

$$\mathbb{E}\|\hat\mu - \mu\|^2 = \frac{1}{n}\left(\mathbb{E}[k(x,x)] - \mathbb{E}[k(x,\tilde{x})]\right) =: \Delta,$$

where $\tilde{x}$ is an independent copy of $x$. Let us define the risk of the proposed shrinkage estimator by $\Delta_\alpha := \mathbb{E}\|\hat\mu_\alpha - \mu\|^2$ where $\alpha$ is a non-negative shrinkage parameter. We can then write this in terms of the standard risk as

$$\Delta_\alpha = \Delta - 2\alpha\,\mathbb{E}\langle \hat\mu - \mu,\ \hat\mu - \mu + \mu - f\rangle + \alpha^2\,\mathbb{E}\|f\|^2 - 2\alpha^2\,\mathbb{E}[f(x)] + \alpha^2\,\mathbb{E}\|\hat\mu\|^2.$$

It follows from the reproducing property of $\mathcal{H}$ that $\mathbb{E}[f(x)] = \langle f, \mu\rangle$. Moreover, using the fact that $\mathbb{E}\|\hat\mu\|^2 = \mathbb{E}\|\hat\mu - \mu + \mu\|^2 = \Delta + \mathbb{E}[k(x,\tilde{x})]$, we can simplify the shrinkage risk by

$$\Delta_\alpha = \alpha^2\left(\Delta + \|f - \mu\|^2\right) - 2\alpha\Delta + \Delta.$$

Thus, we have $\Delta_\alpha - \Delta = \alpha^2(\Delta + \|f - \mu\|^2) - 2\alpha\Delta$, which is non-positive where

$$\alpha \in \left[0,\ \frac{2\Delta}{\Delta + \|f - \mu\|^2}\right] \tag{3}$$

and minimized at $\alpha^* = \Delta / (\Delta + \|f - \mu\|^2)$.

As we can see in (3), there is a range of $\alpha$ for which a non-positive $\Delta_\alpha - \Delta$, i.e., $R(\mu, \hat\mu_\alpha) \le R(\mu, \hat\mu)$, is guaranteed. However, Theorem 1 relies on the important assumption that the true kernel mean of the distribution $\mathbb{P}$ is required to estimate $\alpha$. In spite of this, the theorem has an important implication suggesting that the shrinkage estimator $\hat\mu_\alpha$ can improve upon $\hat\mu$ if $\alpha$ is chosen appropriately. Later, we will exploit this result in order to construct more practical estimators.

Remark 1. The following observations follow immediately from Theorem 1:
- The shrinkage estimator always improves upon the standard one regardless of the direction of shrinkage, as specified by $f$. In other words, there exists a wide class of kernel mean estimators that are better than the standard one.
- The value of $\alpha$ also depends on the choice of $f$. The further $f$ is from $\mu$, the smaller $\alpha$ becomes. Thus, the shrinkage gets smaller if $f$ is chosen such that it is far from the true kernel mean. This effect is akin to the James-Stein estimator.
- The improvement can be viewed as a bias-variance trade-off: the shrinkage estimator reduces variance substantially at the expense of a little bias.

Remark 1 sheds light on how one can practically construct the shrinkage estimator: we can choose $f$ arbitrarily as long as the parameter $\alpha$ is chosen appropriately. Moreover, further improvement can be gained by incorporating prior knowledge as to the location of $\mu_{\mathbb{P}}$, which can be straightforwardly integrated into the framework via $f$ (Berger and Wolpert, 1983).
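The risk formula in the proof of Theorem 1 can be checked numerically in a toy setting where the true kernel mean is known. The sketch below assumes a linear kernel and a spherical Gaussian distribution, so that the kernel mean is simply the mean vector $\theta$ and $\Delta = d/n$. The distribution, the dimensions, the number of Monte Carlo trials, and the function names are all hypothetical choices for illustration, not the paper's experimental setup.

```python
import numpy as np

def shrinkage_risk(alpha, delta, dist_sq):
    """Risk of alpha*f + (1-alpha)*mu_hat: alpha^2 (delta + ||f-mu||^2) - 2 alpha delta + delta."""
    return alpha**2 * (delta + dist_sq) - 2.0 * alpha * delta + delta

def optimal_alpha(delta, dist_sq):
    """Minimizer of the shrinkage risk: delta / (delta + ||f - mu||^2)."""
    return delta / (delta + dist_sq)

# Hypothetical toy setting: linear kernel, P = N(theta, I_d), shrinkage target f = 0.
rng = np.random.default_rng(0)
d, n, trials = 5, 10, 20000
theta = np.ones(d)
delta = d / n                      # (E||x||^2 - ||theta||^2) / n = trace(I_d) / n
dist_sq = float(theta @ theta)     # ||f - mu||^2 with f = 0
alpha_star = optimal_alpha(delta, dist_sq)

# Monte Carlo estimate of both risks from repeated samples of size n.
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n, d))
mu_hat = samples.mean(axis=1)
risk_standard = np.mean(np.sum((mu_hat - theta) ** 2, axis=1))
risk_shrunk = np.mean(np.sum(((1.0 - alpha_star) * mu_hat - theta) ** 2, axis=1))
```

Under these assumptions the simulated risk of the shrunk estimator should sit close to `shrinkage_risk(alpha_star, delta, dist_sq)` and below the simulated risk of the plain average. Note that `alpha_star` uses the true mean, which is exactly the oracle knowledge the text says is unavailable in practice.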
Inspired by the James-Stein estimator, we focus on $f = 0$. We will investigate the effect of different prior $f$ in future works.

3. Kernel Mean Shrinkage Estimator

In this section we give a novel formulation of the kernel mean estimator that allows us to estimate the shrinkage parameter efficiently. In the following, let $\phi : \mathcal{X} \to \mathcal{H}$ be a feature map associated with the kernel $k$ and $\langle\cdot,\cdot\rangle$ be an inner product in the RKHS $\mathcal{H}$ such that $k(x,x') = \langle\phi(x), \phi(x')\rangle$. Unless stated otherwise, $\|\cdot\|$ denotes the RKHS norm. The kernel mean $\mu_{\mathbb{P}}$ and its empirical estimate $\hat\mu_{\mathbb{P}}$ can be obtained as a minimizer of the loss functionals

$$\mathcal{E}(g) \triangleq \mathbb{E}_{x\sim\mathbb{P}}\|\phi(x) - g\|^2, \qquad \hat{\mathcal{E}}(g) \triangleq \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2,$$

respectively. We will call the estimator minimizing the loss functional $\hat{\mathcal{E}}(g)$ a kernel mean estimator (KME).

Note that the loss $\mathcal{E}(g)$ is different from the one considered in §2, i.e., $\ell(\mu, g) = \|\mu - g\|^2 = \|\mathbb{E}[\phi(x)] - g\|^2$. Nevertheless, we have $\ell(\mu, g) = \mathbb{E}_{xx'}k(x,x') - 2\mathbb{E}_x g(x) + \|g\|^2$. Since $\mathcal{E}(g) = \mathbb{E}_x k(x,x) - 2\mathbb{E}_x g(x) + \|g\|^2$, the loss $\ell(\mu, g)$ differs from $\mathcal{E}(g)$ only by $\mathbb{E}_x k(x,x) - \mathbb{E}_{xx'}k(x,x')$, which is not a function of $g$. We introduce the new form here because it will give a more tractable cross-validation computation (§3.1). In spite of this, the resulting estimators are always evaluated w.r.t. the loss in §2 (cf. §4.1).

From the formulation above, it is natural to ask if minimizing the regularized version of $\hat{\mathcal{E}}(g)$ will give a better estimator. On the one hand, one can argue that, unlike in the classical risk minimization, we do not really need a regularizer here. The standard estimator (2) is known to be, in a certain sense, optimal and can be estimated reliably (Shawe-Taylor and Cristianini). Moreover, the original formulation of $\hat{\mathcal{E}}(g)$ is a well-posed problem. On the other hand, since regularization may be viewed as shrinking the solution toward zero, it can actually improve the kernel mean estimation, as suggested by Theorem 1 (cf. discussions at the end of §2).

Consequently, we minimize a modified loss functional

$$\hat{\mathcal{E}}_\lambda(g) \triangleq \hat{\mathcal{E}}(g) + \lambda\,\Omega(\|g\|) = \frac{1}{n}\sum_{i=1}^{n}\|\phi(x_i) - g\|^2 + \lambda\,\Omega(\|g\|), \tag{4}$$

where $\Omega(\cdot)$ denotes a monotonically-increasing regularization functional and $\lambda$ is a non-negative regularization parameter.
In what follows, we refer to the shrinkage estimator $\hat\mu_\lambda$ minimizing $\hat{\mathcal{E}}_\lambda(g)$ as a kernel mean shrinkage estimator (KMSE). The parameters $\alpha$ and $\lambda$ play a similar role as a shrinkage parameter. They specify an amount by which the standard estimator $\hat\mu$ is shrunk toward $f = 0$. Thus, the terms shrinkage parameter and regularization parameter will be used interchangeably.

It follows from the representer theorem that $g$ lies in a subspace spanned by the data, i.e., $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$ for some $\beta \in \mathbb{R}^n$. By considering $\Omega(\|g\|) = \|g\|^2$, we can rewrite (4) as

$$\frac{1}{n}\sum_{i=1}^{n}\Big\|\phi(x_i) - \sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 + \lambda\Big\|\sum_{j=1}^{n}\beta_j\phi(x_j)\Big\|^2 = \beta^\top K\beta - 2\beta^\top K\mathbf{1}_n + \lambda\beta^\top K\beta + c, \tag{5}$$

where $c$ is a constant term, $K$ is a Gram matrix such that $K_{ij} = k(x_i, x_j)$, and $\mathbf{1}_n = [1/n, 1/n, \ldots, 1/n]^\top$. Taking a derivative of (5) w.r.t. $\beta$ and setting it to zero yield $\beta = (1/(1+\lambda))\mathbf{1}_n$. By setting $\alpha = \lambda/(1+\lambda)$ the shrinkage estimate can be written as $\hat\mu_\lambda = (1-\alpha)\hat\mu$. Since $0 < \alpha < 1$, the estimator $\hat\mu_\lambda$ corresponds to a shrinkage estimator discussed in §2 when $f = 0$. We call this estimator a simple kernel mean shrinkage estimator (S-KMSE).

Using the expansion $g = \sum_{j=1}^{n}\beta_j\phi(x_j)$, we may consider when the regularization functional is written in terms of $\beta$, e.g., $\beta^\top\beta$. This leads to a particularly interesting kernel mean estimator. In this case, the optimal weight vector is given by $\beta = (K + \lambda I)^{-1}K\mathbf{1}_n$ and the shrinkage estimate can be written accordingly as $\hat\mu_\lambda = \sum_{j=1}^{n}\beta_j\phi(x_j) = \Phi(K + \lambda I)^{-1}K\mathbf{1}_n$ where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)]$. Unlike the S-KMSE, this estimator shrinks the usual estimate differently in each coordinate (cf. Theorem 2). Hence, we will call it a flexible kernel mean shrinkage estimator (F-KMSE).

The following theorem characterizes the F-KMSE as a shrinkage estimator.

Theorem 2. The F-KMSE can be written as

$$\hat\mu_\lambda = \sum_{i=1}^{n}\frac{\gamma_i}{\gamma_i + \lambda}\langle\hat\mu, v_i\rangle v_i,$$

where $\{\gamma_i, v_i\}$ are eigenvalue and eigenvector pairs of the empirical covariance operator $\hat{C}_{xx}$ in $\mathcal{H}$.

In words, the effect of F-KMSE is to reduce high frequency components of the expansion of $\hat\mu$, by expanding this in terms of the kernel PCA basis and shrinking the coefficients of the high order eigenfunctions, e.g., see Rasmussen and Williams. Note that the covariance operator $\hat{C}_{xx}$ itself does not depend on $\lambda$.

As we can see, the solution to the regularized version is indeed of the form of shrinkage estimators when $f = 0$. That is, both S-KMSE and F-KMSE shrink the standard kernel mean estimate towards zero.
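The two weight vectors derived above can be computed directly from a Gram matrix. The following is a minimal sketch under stated assumptions: the Gaussian RBF Gram matrix with unit bandwidth, the random data, the value of `lam`, and the helper names are illustrative choices and not part of the original text.

```python
import numpy as np

def kme_weights(n):
    """Standard estimator: uniform weights 1_n = [1/n, ..., 1/n]."""
    return np.full(n, 1.0 / n)

def s_kmse_weights(n, lam):
    """S-KMSE: beta = (1 / (1 + lam)) * 1_n."""
    return np.full(n, 1.0 / (n * (1.0 + lam)))

def f_kmse_weights(K, lam):
    """F-KMSE: beta = (K + lam * I)^{-1} K 1_n, computed with a linear solve."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), K @ kme_weights(n))

def rkhs_norm_sq(beta, K):
    """Squared RKHS norm of sum_j beta_j phi(x_j), i.e. beta^T K beta."""
    return float(beta @ K @ beta)

# Illustrative data and kernel (hypothetical): RBF Gram matrix with sigma = 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
sq = np.sum(X**2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)

lam = 0.1
b_kme = kme_weights(30)
b_s = s_kmse_weights(30, lam)
b_f = f_kmse_weights(K, lam)
```

In this sketch the S-KMSE weights are a uniform rescaling of the standard weights, while the F-KMSE weights generally differ from point to point. Both estimates have a smaller RKHS norm than the standard one, which is the sense in which they shrink toward zero.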
The difference is that the S-KMSE shrinks equally in all coordinates, whereas the F-KMSE also constrains the amount of shrinkage by the information contained in each coordinate. Moreover, the squared RKHS norm can be decomposed as a sum of squared loss weighted by the eigenvalues $\gamma_i$ (cf. Mandelbaum and Shepp (1987, appendix)). By the same reasoning as Stein's result in the finite-dimensional case, one would suspect that an improvement of shrinkage estimators in $\mathcal{H}$ should also depend on how fast the eigenvalues of $k$ decay. That is, one would expect greater improvement if the values of $\gamma_i$ decay very slowly. For example, the Gaussian RBF kernel with larger bandwidth gives smaller improvement when compared to one with smaller bandwidth. Similarly, we should expect to see more improvement when applying a Laplacian kernel than when using a Gaussian RBF kernel.

In some applications of kernel mean embedding, one may want to interpret the weight $\beta$ as a probability vector (Nishiyama et al.). However, the weight vector $\beta$ output by our estimators is in general not normalized. In fact, all elements will be smaller than $1/n$ as a result of shrinkage. However, one may impose a constraint that $\beta$ must sum to one and resort to quadratic programming (Song et al.). Unfortunately, this approach has the undesirable effect of sparsity, which is unlikely to improve upon the standard estimator. Post-normalizing the weights often deteriorates the estimation performance.

To the best of our knowledge, no previous attempt has been made to improve the kernel mean estimation. However, we discuss some closely related works here. For example, instead of the loss functional $\hat{\mathcal{E}}(g)$, Kim and Scott consider a robust loss function such as Huber's loss to reduce the effect of outliers. The authors consider kernel density estimators, which differ fundamentally from kernel mean estimators. They need to reduce the kernel bandwidth with increasing sample size for the estimators to be consistent. A regularized version of MMD was adopted by Danafar et al. in the context of kernel-based hypothesis testing.
The resulting formulation resembles our S-KMSE. Furthermore, the F-KMSE is of a similar form as the conditional mean embedding used in Grünewälder et al., which can be viewed more generally as a regression problem in RKHS with smooth operators (Grünewälder et al.).

3.1. Choosing Shrinkage Parameter

As discussed above, the amount of shrinkage plays an important role in our estimators. In this work we propose to select the shrinkage parameter by an automatic leave-one-out cross-validation.

For a given shrinkage parameter $\lambda$, let us consider the observation $x_i$ as being a new observation by omitting it from the dataset. Denote by $\hat\mu_\lambda^{(-i)} = \sum_{j\neq i}\beta_j^{(-i)}\phi(x_j)$ the kernel mean estimated from the remaining data, using the value $\lambda$ as a shrinkage parameter, so that $\beta^{(-i)}$ is the minimizer of $\hat{\mathcal{E}}_\lambda^{(-i)}(g)$. We will measure the quality of $\hat\mu_\lambda^{(-i)}$ by how well it approximates $\phi(x_i)$. The overall quality of the

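estimate is quantified by the cross-validation score

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\big\|\phi(x_i) - \hat\mu_\lambda^{(-i)}\big\|^2_{\mathcal{H}}. \tag{6}$$

By simple algebra, it is not difficult to show that the optimal shrinkage parameter of S-KMSE can be calculated analytically, as stated by the following theorem.

Theorem 3. Let $\rho \triangleq \frac{1}{n^2}\sum_{i,j=1}^{n}k(x_i, x_j)$ and $\varrho \triangleq \frac{1}{n}\sum_{i=1}^{n}k(x_i, x_i)$. The shrinkage parameter $\lambda^* = (\varrho - \rho)/((n-1)\rho + \varrho/n)$ of the S-KMSE is the minimizer of $\mathrm{LOOCV}(\lambda)$.

On the other hand, finding the optimal $\lambda$ for the F-KMSE is relatively more involved. Evaluating the score (6) naively requires one to solve for $\hat\mu_\lambda^{(-i)}$ explicitly for every $i$. Fortunately, we can simplify the score such that it can be evaluated efficiently, as stated in the following theorem.

Theorem 4. The LOOCV score of F-KMSE satisfies

$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}(\beta^\top K - K_{\cdot i})^\top C_\lambda (\beta^\top K - K_{\cdot i}),$$

where $\beta$ is the weight vector calculated from the full dataset with the shrinkage parameter $\lambda$ and $C_\lambda = \big(K - \tfrac{1}{n}K(K+\lambda I)^{-1}K\big)^{-1}K\big(K - \tfrac{1}{n}K(K+\lambda I)^{-1}K\big)^{-1}$.

Proof of Theorem 4. For fixed $\lambda$ and $i$, let $\hat\mu_\lambda^{(-i)}$ be the leave-one-out kernel mean estimate of F-KMSE and let $A \triangleq (K + \lambda I)^{-1}$. Then, we can write an expression for the deleted residual as

$$\Delta_\lambda^{(-i)} := \hat\mu_\lambda^{(-i)} - \phi(x_i) = \hat\mu_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\sum_{l=1}^{n}A_{jl}\big\langle\phi(x_l),\ \hat\mu_\lambda^{(-i)} - \phi(x_i)\big\rangle\phi(x_j).$$

Since $\Delta_\lambda^{(-i)}$ lies in a subspace spanned by the sample $\phi(x_1), \ldots, \phi(x_n)$, we have $\Delta_\lambda^{(-i)} = \sum_{k=1}^{n}\xi_k\phi(x_k)$ for some $\xi \in \mathbb{R}^n$. Substituting $\Delta_\lambda^{(-i)}$ back yields

$$\sum_{k=1}^{n}\xi_k\phi(x_k) = \hat\mu_\lambda - \phi(x_i) + \frac{1}{n}\sum_{j=1}^{n}\{AK\xi\}_j\phi(x_j).$$

By taking the inner product on both sides w.r.t. the sample $\phi(x_1), \ldots, \phi(x_n)$ and solving for $\xi$, we have $\xi = \big(K - \tfrac{1}{n}KAK\big)^{-1}(\beta^\top K - K_{\cdot i})$ where $K_{\cdot i}$ is the $i$th column of $K$. Consequently, the leave-one-out score of the sample $x_i$ can be computed by

$$\big\|\Delta_\lambda^{(-i)}\big\|^2 = \xi^\top K\xi = (\beta^\top K - K_{\cdot i})^\top\big(K - \tfrac{1}{n}KAK\big)^{-1}K\big(K - \tfrac{1}{n}KAK\big)^{-1}(\beta^\top K - K_{\cdot i}) = (\beta^\top K - K_{\cdot i})^\top C_\lambda(\beta^\top K - K_{\cdot i}).$$

Averaging $\|\Delta_\lambda^{(-i)}\|^2$ over all samples gives $\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^{n}\|\Delta_\lambda^{(-i)}\|^2 = \frac{1}{n}\sum_{i=1}^{n}(\beta^\top K - K_{\cdot i})^\top C_\lambda(\beta^\top K - K_{\cdot i})$, as required.

It is interesting to see that the leave-one-out cross-validation score in Theorem 4 depends only on the non-leave-one-out solution $\beta_\lambda$, which can be obtained as a by-product of the algorithm.

Computational complexity. The S-KMSE requires $O(n^2)$ operations to select the shrinkage parameter. For the F-KMSE, there are two steps in cross-validation.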
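For reference, the leave-one-out score defined above can also be evaluated by brute force, refitting the weights once per held-out point. The sketch below is that naive baseline, not the efficient formula of Theorem 4. The refitting convention (uniform weights $1/(n-1)$ over the remaining points), the RBF Gram matrix, the grid of candidate values, and all names are assumptions made for illustration.

```python
import numpy as np

def loocv_naive(K, lam, estimator="f"):
    """Naive LOOCV: (1/n) * sum_i ||phi(x_i) - mu_hat^{(-i)}||^2, refitting for every i."""
    n = K.shape[0]
    uniform = np.full(n - 1, 1.0 / (n - 1))   # assumed uniform weights on the n-1 kept points
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        K_rest = K[np.ix_(keep, keep)]
        if estimator == "s":
            beta = uniform / (1.0 + lam)                      # S-KMSE weights
        else:
            beta = np.linalg.solve(K_rest + lam * np.eye(n - 1),
                                   K_rest @ uniform)          # F-KMSE weights
        k_i = K[keep, i]
        # ||phi(x_i) - sum_j beta_j phi(x_j)||^2 expanded with kernel evaluations.
        total += K[i, i] - 2.0 * beta @ k_i + beta @ K_rest @ beta
    return total / n

# Illustrative data (hypothetical): RBF Gram matrix with sigma = 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
sq = np.sum(X**2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)

grid = np.logspace(-3, 1, 9)                  # candidate shrinkage parameters
scores_f = [loocv_naive(K, lam, "f") for lam in grid]
scores_s = [loocv_naive(K, lam, "s") for lam in grid]
best_lam_f = grid[int(np.argmin(scores_f))]
best_lam_s = grid[int(np.argmin(scores_s))]
```

Each F-KMSE evaluation here solves $n$ linear systems of size $n-1$, which is the cost that motivates looking for a formula that reuses the full-data solution.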
First, we need to compute $(K + \lambda I)^{-1}$ repeatedly for different values of $\lambda$. Assume that we know the eigendecomposition $K = UDU^\top$ where $D$ is diagonal with $d_{ii} \ge 0$ and $UU^\top = I$. It follows that $(K + \lambda I)^{-1} = U(D + \lambda I)^{-1}U^\top$. Consequently, solving for $\beta_\lambda$ takes $O(n^2)$ operations. Since eigendecomposition requires $O(n^3)$ operations, finding $\beta_\lambda$ for many $\lambda$'s is essentially free. A low-rank approximation can also be adopted to reduce the computational cost further.

Second, we need to compute the cross-validation score (6). As shown in Theorem 4, we can compute it using only $\beta_\lambda$ obtained from the previous step. The calculation of $C_\lambda$ can be simplified further via the eigendecomposition of $K$ as $C_\lambda = U\big(D - \tfrac{1}{n}D(D + \lambda I)^{-1}D\big)^{-1}D\big(D - \tfrac{1}{n}D(D + \lambda I)^{-1}D\big)^{-1}U^\top$. Since it only involves the inverse of diagonal matrices, the inversion can be evaluated in $O(n)$ operations. The overall computational complexity of the cross-validation requires only $O(n^3)$ operations, as opposed to the naive approach that requires $O(n^4)$ operations. When performed as a by-product of the algorithm, the computational cost of the cross-validation procedure becomes negligible as the dataset becomes larger. In practice, we use the fminsearch and fminbnd routines of the MATLAB optimization toolbox to find the best shrinkage parameter.

3.2. Covariance Operators

The covariance operator from $\mathcal{H}_X$ to $\mathcal{H}_Y$ can be viewed as a mean function in a product space $\mathcal{H}_X \otimes \mathcal{H}_Y$. Hence, we can also construct a shrinkage estimator of the covariance operator in RKHS. Let $(\mathcal{H}_X, k_X)$ and $(\mathcal{H}_Y, k_Y)$ be the RKHS of functions on measurable spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively, with p.d. kernels $k_X$ and $k_Y$ (with feature maps $\phi$ and $\varphi$). We will consider a random vector $(X, Y) : \Omega \to \mathcal{X} \times \mathcal{Y}$ with distribution $\mathbb{P}_{XY}$, with $\mathbb{P}_X$ and $\mathbb{P}_Y$ as marginal distributions. Under some conditions, there exists a unique cross-covariance operator $\Sigma_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ such that

$$\langle g, \Sigma_{YX}f\rangle_{\mathcal{H}_Y} = \mathbb{E}_{XY}\big[(f(X) - \mathbb{E}_X[f(X)])(g(Y) - \mathbb{E}_Y[g(Y)])\big] = \mathrm{Cov}(f(X), g(Y))$$

holds for all $f \in \mathcal{H}_X$ and $g \in \mathcal{H}_Y$ (Fukumizu et al.). If $X$ equals $Y$, we get the self-adjoint operator $\Sigma_{XX}$ called the covariance operator.
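Given an i.i.d. sample from $\mathbb{P}_{XY}$ written as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can write the empirical cross-covariance operator as $\hat\Sigma_{YX} := \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\otimes\varphi(y_i) - \hat\mu_X\otimes\hat\mu_Y$ where $\hat\mu_X = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)$ and $\hat\mu_Y = \frac{1}{n}\sum_{i=1}^{n}\varphi(y_i)$. Let $\tilde\phi$ and $\tilde\varphi$ be the centered feature maps of $\phi$ and $\varphi$, respectively. Then, it can be rewritten as $\hat\Sigma_{YX} := \frac{1}{n}\sum_{i=1}^{n}\tilde\phi(x_i)\otimes\tilde\varphi(y_i) \in \mathcal{H}_X \otimes \mathcal{H}_Y$.

It follows from the inner product property in product space that

$$\big\langle\tilde\phi(x)\otimes\tilde\varphi(y),\ \tilde\phi(x')\otimes\tilde\varphi(y')\big\rangle_{\mathcal{H}_X\otimes\mathcal{H}_Y} = \big\langle\tilde\phi(x), \tilde\phi(x')\big\rangle_{\mathcal{H}_X}\big\langle\tilde\varphi(y), \tilde\varphi(y')\big\rangle_{\mathcal{H}_Y} = \tilde{k}_X(x, x')\,\tilde{k}_Y(y, y').$$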

Then, we can obtain the shrinkage estimators for the covariance operator by plugging the kernel $k((x, y), (x', y')) = \tilde{k}_X(x, x')\,\tilde{k}_Y(y, y')$ in our KMSEs. We will call this estimator a covariance-operator shrinkage estimator (COSE). The same trick can be easily generalized to tensors of higher order, which have been previously used, for example, in Song et al.

[Figure 1: panels (a) LIN, (b) POLY2, (c) POLY3, (d) RBF. The average loss of KME (left), S-KMSE (middle), and F-KMSE (right) estimators with different values of shrinkage parameter, scaled by $\gamma$. Inside boxes correspond to estimators. We repeat the experiments over […] different distributions with $n$ = […] and $d$ = […].]

4. Experiments

We focus on the comparison between our shrinkage estimators and the standard estimator of the kernel mean using both synthetic datasets and real-world datasets.

4.1. Synthetic Data

Given the true data-generating distribution $\mathbb{P}$, we evaluate different estimators using the loss function

$$\ell(\beta) \triangleq \Big\|\sum_{i=1}^{n}\beta_i k(x_i, \cdot) - \mathbb{E}_{\mathbb{P}}[k(x, \cdot)]\Big\|^2_{\mathcal{H}},$$

where $\beta$ is the weight vector associated with different estimators. To allow for an exact calculation of $\ell(\beta)$, we consider when $\mathbb{P}$ is a mixture-of-Gaussians distribution and $k$ is one of the following kernel functions: 1) linear kernel $k(x, x') = x^\top x'$; 2) polynomial degree-2 kernel $k(x, x') = (x^\top x' + 1)^2$; 3) polynomial degree-3 kernel $k(x, x') = (x^\top x' + 1)^3$; and 4) Gaussian RBF kernel $k(x, x') = \exp(-\|x - x'\|^2/2\sigma^2)$. We will refer to them as LIN, POLY2, POLY3, and RBF, respectively.

Experimental protocol. Data are generated from a $d$-dimensional mixture of Gaussians:

$$x \sim \sum_{i=1}^{4}\pi_i\,\mathcal{N}(\theta_i, \Sigma_i) + \varepsilon,\quad \theta_{ij} \sim \mathcal{U}([\ldots], [\ldots]),\quad \Sigma_i \sim \mathcal{W}([\ldots]\, I_d, 7),\quad \varepsilon \sim \mathcal{N}([\ldots], [\ldots]\, I_d),$$

where $\mathcal{U}(a, b)$ and $\mathcal{W}(\Sigma, df)$ represent the uniform distribution and Wishart distribution, respectively. We set $\pi$ = [[…], […], […], […]]. The choice of parameters here is quite arbitrary; we have experimented using various parameter settings and the results are similar to those presented here.
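For the linear kernel the loss $\ell(\beta)$ has a particularly simple exact form, because the RKHS norm coincides with the Euclidean norm of the representing vectors and the true kernel mean is the mixture mean. The sketch below illustrates that calculation only for the LIN case. The mixture weights, component means, identity covariances, the omission of the additive noise term, and the shrinkage value are hypothetical stand-ins, not the parameter values used in the paper.

```python
import numpy as np

def sample_mixture(rng, n, pi, thetas, covs):
    """Draw n points from a Gaussian mixture with weights pi, means thetas, covariances covs."""
    comps = rng.choice(len(pi), size=n, p=pi)
    return np.stack([rng.multivariate_normal(thetas[c], covs[c]) for c in comps])

def linear_kernel_loss(beta, X, pi, thetas):
    """Exact loss for k(x, x') = x^T x': || X^T beta - sum_i pi_i theta_i ||^2."""
    mixture_mean = pi @ thetas
    return float(np.sum((X.T @ beta - mixture_mean) ** 2))

# Hypothetical mixture parameters, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
d, n = 10, 20
pi = np.array([0.25, 0.25, 0.25, 0.25])
thetas = rng.uniform(-1.0, 1.0, size=(4, d))
covs = np.stack([np.eye(d) for _ in range(4)])
X = sample_mixture(rng, n, pi, thetas, covs)

lam = 0.5                                              # arbitrary shrinkage parameter
loss_kme = linear_kernel_loss(np.full(n, 1.0 / n), X, pi, thetas)
loss_s = linear_kernel_loss(np.full(n, 1.0 / (n * (1.0 + lam))), X, pi, thetas)
```

For the polynomial and RBF kernels the expectation under a Gaussian mixture is also available in closed form, but the expressions are longer and are not attempted here.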
For the Gaussian RBF kernel, we set the bandwidth parameter to the square-root of the median Euclidean distance between samples in the dataset (i.e., $\sigma^2 = \mathrm{median}\{\|x_i - x_j\|^2\}$ throughout).

Figure 1 shows the average loss of different estimators using different kernels as we increase the value of the shrinkage parameter. Here we scale the shrinkage parameter by the minimum non-zero eigenvalue $\gamma$ of the kernel matrix $K$. In general, we find S-KMSE and F-KMSE tend to outperform KME. However, as $\lambda$ becomes large, there are some cases where shrinkage deteriorates the estimation performance, e.g., see the LIN kernel and some outliers in the figures. This suggests that it is very important to choose the parameter $\lambda$ appropriately (cf. the discussion in §3.1).

[Figure 2: panels LIN, POLY2, POLY3, RBF, each plotting average loss against sample size (fixed $d$) and against dimension (fixed $n$) for KME, S-KMSE, and F-KMSE. The average loss over […] different distributions of KME, S-KMSE, and F-KMSE with varying sample size ($n$) and dimension ($d$). The shrinkage parameter $\lambda$ is chosen by LOOCV.]

Similarly, Figure 2 depicts the average loss as we vary the sample size and dimension of the data. In this case, the shrinkage parameter is chosen by the proposed leave-one-out cross-validation score. As we can see, both S-KMSE

and F-KMSE outperform the standard KME. The S-KMSE performs slightly better than the F-KMSE. Moreover, the improvement is more substantial in the "large $d$, small $n$" paradigm. In the worst cases, the S-KMSE and F-KMSE perform as well as the KME. Lastly, it is instructive to note that the improvement varies with the choice of kernel $k$. Briefly, the choice of kernel reflects the dimensionality of the feature space $\mathcal{H}$. One would expect more improvement in a high-dimensional space, e.g., RBF kernel, than in a low-dimensional one, e.g., linear kernel (cf. discussions at the end of §3). This phenomenon can be observed in both Figure 1 and 2.

[Table 1. Average negative log-likelihood of the model Q on test points over […] randomizations. The boldface represents the result whose difference from the baseline, i.e., KME, is statistically significant. Columns: KME, S-KMSE, and F-KMSE under each of the LIN, POLY2, POLY3, and RBF kernels. Rows (datasets): ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass.]

4.2. Real Data

We consider three benchmark applications: density estimation via kernel mean matching (Song et al.), kernel PCA using shrinkage mean and covariance operator (Schölkopf et al., 1998), and discriminative learning on distributions (Muandet and Schölkopf; Muandet et al.). For the first two tasks we employ datasets from the UCI repositories. We use only real-valued features, each of which is normalized to have zero mean and unit variance.

Density estimation. We perform density estimation via kernel mean matching (Song et al.). That is, we fit the density $Q = \sum_{j=1}^{m}\pi_j\,\mathcal{N}(\theta_j, \sigma_j^2 I)$ to each dataset by minimizing $\|\hat\mu - \mu_Q\|^2_{\mathcal{H}}$ s.t. $\sum_{j=1}^{m}\pi_j = 1$. The kernel mean $\hat\mu$ is obtained from the samples using different estimators, whereas $\mu_Q$ is the kernel mean embedding of the density $Q$. Unlike experiments in Song et al., our goal is to compare different estimators of $\mu_{\mathbb{P}}$ where $\mathbb{P}$ is the true data distribution. That is, we replace $\hat\mu$ with a version obtained via shrinkage.
A better estimate of $\mu_{\mathbb{P}}$ should lead to better density estimation, as measured by the negative log-likelihood of $Q$ on the test set. We use […]% of the dataset as a test set. We set $m$ = […] for each dataset. The model is initialized by running […] random initializations using the k-means algorithm and returning the best. We repeat the experiments […] times and perform the paired sign test on the results at the […]% significance level.

The average negative log-likelihood of the model $Q$, optimized via different estimators, is reported in Table 1. Clearly, both S-KMSE and F-KMSE consistently achieve smaller negative log-likelihood when compared to KME. There are, however, a few cases in which KME outperforms the proposed estimators, especially when the dataset is relatively large, e.g., satimage and abalone. We suspect that in those cases the standard KME already provides an accurate estimate of the kernel mean. To get a better estimate, more effort is required to optimize for the shrinkage parameter. Moreover, the improvement across different kernels is consistent with results on the synthetic datasets.

Kernel PCA. In this experiment, we perform the KPCA using different estimates of the mean and covariance operators. We compare the reconstruction error $\mathcal{E}_{\mathrm{proj}}(z) = \|\phi(z) - P\phi(z)\|^2$ on test samples where $P$ is the projection constructed from the first […] principal components. We use a Gaussian RBF kernel for all datasets. We compare different scenarios: 1) standard KPCA; 2) shrinkage centering with S-KMSE; 3) shrinkage centering with F-KMSE; 4) KPCA with S-COSE; and 5) KPCA with F-COSE. To perform KPCA on the shrinkage covariance operator, we solve the generalized eigenvalue problem $K_c B K_c V = K_c V D$ where $B = \mathrm{diag}(\beta)$ and $K_c$ is the centered Gram matrix. The weight vector $\beta$ is obtained from shrinkage estimators using the kernel matrix $K_c \circ K_c$ where $\circ$ denotes the Hadamard product. We use […]% of the dataset as a test set.

The paired sign test is a nonparametric test that can be used to examine whether two paired samples have the same distribution. In our case, we compare S-KMSE and F-KMSE against KME.
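A paired sign test of the kind described above can be sketched in a few lines. This is a generic two-sided version that discards ties and uses an exact binomial tail. It is an assumption about the form of the test, since the text does not spell out those details, and the function name and the toy inputs are purely illustrative.

```python
from math import comb

def paired_sign_test(a, b):
    """Two-sided paired sign test p-value for paired samples a and b (ties are discarded)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # Exact lower tail of Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, j) for j in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)

# Toy inputs only (not results from the paper): the first method wins in all six pairs.
p_all_wins = paired_sign_test([1, 2, 3, 4, 5, 6], [0, 0, 0, 0, 0, 0])
# One win each way gives no evidence of a difference.
p_split = paired_sign_test([1, 2], [2, 1])
```

With per-repetition scores of two estimators as `a` and `b`, a difference would be called significant when the returned p-value falls below the chosen significance level.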

[Figure 3: reconstruction error of KME, S-KMSE, F-KMSE, S-COSE, and F-COSE on the datasets ionosphere, sonar, australian, specft, wdbc, wine, satimage, segment, vehicle, svmguide, vowel, housing, bodyfat, abalone, glass. The average reconstruction error of KPCA on hold-out test samples over […] repetitions. The KME represents the standard approach, whereas S-KMSE and F-KMSE use shrinkage means to perform centering. The S-COSE and F-COSE directly use the shrinkage estimate of the covariance operator.]

Figure 3 illustrates the results of KPCA. Clearly, the S-COSE and F-COSE consistently outperform all other estimators. Although we observe an improvement of S-KMSE and F-KMSE over KME, it is very small compared to that of S-COSE and F-COSE. This makes sense intuitively, since changing the mean point or shifting data does not change the covariance structure considerably, so it will not significantly affect the reconstruction error.

[Table 2. The classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators to construct the kernel on distributions. Columns: SMM and OCSMM under the linear kernel, SMM and OCSMM under the non-linear kernel. Rows (estimators): KME, S-KMSE, F-KMSE.]

Discriminative learning on distributions. A positive semi-definite kernel between distributions can be defined via their kernel mean embeddings. That is, given a training sample $(\hat{\mathbb{P}}_1, y_1), \ldots, (\hat{\mathbb{P}}_m, y_m) \in \mathcal{P} \times \{-1, +1\}$ where $\hat{\mathbb{P}}_i := \frac{1}{n}\sum_{k=1}^{n}\delta_{x_k^i}$ and $x_k^i \sim \mathbb{P}_i$, the linear kernel between two distributions is approximated by

$$\langle\hat\mu_{\mathbb{P}_i}, \hat\mu_{\mathbb{P}_j}\rangle = \Big\langle\sum_{k=1}^{n}\beta_k^i\phi(x_k^i),\ \sum_{l=1}^{n}\beta_l^j\phi(x_l^j)\Big\rangle = \sum_{k,l=1}^{n}\beta_k^i\beta_l^j\,k(x_k^i, x_l^j).$$

The weight vectors $\beta^i$ and $\beta^j$ come from the kernel mean estimates of $\mu_{\mathbb{P}_i}$ and $\mu_{\mathbb{P}_j}$, respectively. The non-linear kernel can then be defined accordingly, e.g., $\kappa(\mathbb{P}_i, \mathbb{P}_j) = \exp(-\|\hat\mu_{\mathbb{P}_i} - \hat\mu_{\mathbb{P}_j}\|^2_{\mathcal{H}}/2\sigma^2)$. Our goal in this experiment is to investigate if the shrinkage estimate of the kernel mean improves the performance of the discriminative learning on distributions. To this end, we conduct experiments on natural scene categorization using the support measure machine (SMM) (Muandet et al.) and group anomaly detection on a high-energy physics dataset using the one-class SMM (OCSMM) (Muandet and Schölkopf).
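The kernels between distributions defined above reduce to weighted sums of kernel evaluations between two samples. The following minimal sketch computes both the linear and the non-linear version for two weighted samples. The Gaussian RBF embedding kernel, the two bandwidths, the toy samples, and the function names are assumptions for illustration; the weight vectors can be the uniform weights or any shrinkage weights.

```python
import numpy as np

def rbf_gram(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def linear_level2_kernel(Xi, bi, Xj, bj, sigma):
    """<mu_hat_i, mu_hat_j> = sum_{k,l} bi_k * bj_l * k(x_k^i, x_l^j)."""
    return float(bi @ rbf_gram(Xi, Xj, sigma) @ bj)

def rbf_level2_kernel(Xi, bi, Xj, bj, embed_sigma, level2_sigma):
    """exp(-||mu_hat_i - mu_hat_j||^2 / (2 * level2_sigma^2)) on top of the embeddings."""
    dist_sq = (linear_level2_kernel(Xi, bi, Xi, bi, embed_sigma)
               - 2.0 * linear_level2_kernel(Xi, bi, Xj, bj, embed_sigma)
               + linear_level2_kernel(Xj, bj, Xj, bj, embed_sigma))
    return float(np.exp(-max(dist_sq, 0.0) / (2.0 * level2_sigma**2)))

# Toy samples from two hypothetical distributions, with S-KMSE-style weights.
rng = np.random.default_rng(0)
Xa = rng.normal(loc=0.0, size=(15, 2))
Xb = rng.normal(loc=1.0, size=(25, 2))
lam = 0.1
ba = np.full(15, 1.0 / (15 * (1.0 + lam)))
bb = np.full(25, 1.0 / (25 * (1.0 + lam)))

k_ab = linear_level2_kernel(Xa, ba, Xb, bb, 1.0)
kappa_ab = rbf_level2_kernel(Xa, ba, Xb, bb, 1.0, 1.0)
```

Swapping `ba` and `bb` for uniform or F-KMSE weights changes only the weight vectors, so the same two functions cover every estimator being compared.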
We use both linear and non-linear kernels where the Gaussian RBF kernel is employed as an embedding kernel (Muandet et al.). All hyper-parameters are chosen by […]-fold cross-validation. For our unsupervised problem, we repeat the experiments using several parameter settings and report the best results.

Table 2 reports the classification accuracy of SMM and the area under ROC curve (AUC) of OCSMM using different kernel mean estimators. Both shrinkage estimators consistently lead to better performance on both SMM and OCSMM when compared to KME.

To summarize, we find sufficient evidence to conclude that both S-KMSE and F-KMSE outperform the standard KME. The performance of S-KMSE and F-KMSE is very competitive. The difference depends on the dataset and the kernel function.

5. Conclusions

To conclude, we show that the commonly used kernel mean estimator can be improved. Our theoretical result suggests that there exists a wide class of kernel mean estimators that are better than the standard one. To demonstrate this, we focus on two efficient shrinkage estimators, namely, simple and flexible kernel mean shrinkage estimators. Empirical study clearly shows that the proposed estimators outperform the standard one in various scenarios. Most importantly, the shrinkage estimates not only provide more accurate estimation, but also lead to superior performance on real-world applications.

Acknowledgments

The authors wish to thank David Hogg and Ross Fedely for reading the first draft and the anonymous reviewers who gave valuable suggestions that have helped to improve the manuscript.

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

7. Sample Covariance and Correlation

7. Sample Covariance and Correlation 1 of 8 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 7. Sample Covariace ad Correlatio The Bivariate Model Suppose agai that we have a basic radom experimet, ad that X ad Y

More information

NPTEL STRUCTURAL RELIABILITY

NPTEL STRUCTURAL RELIABILITY NPTEL Course O STRUCTURAL RELIABILITY Module # 0 Lecture 1 Course Format: Web Istructor: Dr. Aruasis Chakraborty Departmet of Civil Egieerig Idia Istitute of Techology Guwahati 1. Lecture 01: Basic Statistics

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Research Article Sign Data Derivative Recovery

Research Article Sign Data Derivative Recovery Iteratioal Scholarly Research Network ISRN Applied Mathematics Volume 0, Article ID 63070, 7 pages doi:0.540/0/63070 Research Article Sig Data Derivative Recovery L. M. Housto, G. A. Glass, ad A. D. Dymikov

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

3. Covariance and Correlation

3. Covariance and Correlation Virtual Laboratories > 3. Expected Value > 1 2 3 4 5 6 3. Covariace ad Correlatio Recall that by takig the expected value of various trasformatios of a radom variable, we ca measure may iterestig characteristics

More information

Gregory Carey, 1998 Linear Transformations & Composites - 1. Linear Transformations and Linear Composites

Gregory Carey, 1998 Linear Transformations & Composites - 1. Linear Transformations and Linear Composites Gregory Carey, 1998 Liear Trasformatios & Composites - 1 Liear Trasformatios ad Liear Composites I Liear Trasformatios of Variables Meas ad Stadard Deviatios of Liear Trasformatios A liear trasformatio

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Maximum Likelihood Estimators.

Maximum Likelihood Estimators. Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

More information

Coordinating Principal Component Analyzers

Coordinating Principal Component Analyzers Coordiatig Pricipal Compoet Aalyzers J.J. Verbeek ad N. Vlassis ad B. Kröse Iformatics Istitute, Uiversity of Amsterdam Kruislaa 403, 1098 SJ Amsterdam, The Netherlads Abstract. Mixtures of Pricipal Compoet

More information

Definition. Definition. 7-2 Estimating a Population Proportion. Definition. Definition

Definition. Definition. 7-2 Estimating a Population Proportion. Definition. Definition 7- stimatig a Populatio Proportio I this sectio we preset methods for usig a sample proportio to estimate the value of a populatio proportio. The sample proportio is the best poit estimate of the populatio

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

9.8: THE POWER OF A TEST

9.8: THE POWER OF A TEST 9.8: The Power of a Test CD9-1 9.8: THE POWER OF A TEST I the iitial discussio of statistical hypothesis testig, the two types of risks that are take whe decisios are made about populatio parameters based

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

13 Fast Fourier Transform (FFT)

13 Fast Fourier Transform (FFT) 13 Fast Fourier Trasform FFT) The fast Fourier trasform FFT) is a algorithm for the efficiet implemetatio of the discrete Fourier trasform. We begi our discussio oce more with the cotiuous Fourier trasform.

More information

Chapter 5: Inner Product Spaces

Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Chapter Gaussian Elimination

Chapter Gaussian Elimination Chapter 04.06 Gaussia Elimiatio After readig this chapter, you should be able to:. solve a set of simultaeous liear equatios usig Naïve Gauss elimiatio,. lear the pitfalls of the Naïve Gauss elimiatio

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

Lesson 12. Sequences and Series

Lesson 12. Sequences and Series Retur to List of Lessos Lesso. Sequeces ad Series A ifiite sequece { a, a, a,... a,...} ca be thought of as a list of umbers writte i defiite order ad certai patter. It is usually deoted by { a } =, or

More information

A Gentle Introduction to Algorithms: Part II

A Gentle Introduction to Algorithms: Part II A Getle Itroductio to Algorithms: Part II Cotets of Part I:. Merge: (to merge two sorted lists ito a sigle sorted list.) 2. Bubble Sort 3. Merge Sort: 4. The Big-O, Big-Θ, Big-Ω otatios: asymptotic bouds

More information

Hypothesis Tests Applied to Means

Hypothesis Tests Applied to Means The Samplig Distributio of the Mea Hypothesis Tests Applied to Meas Recall that the samplig distributio of the mea is the distributio of sample meas that would be obtaied from a particular populatio (with

More information

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015

Divide and Conquer. Maximum/minimum. Integer Multiplication. CS125 Lecture 4 Fall 2015 CS125 Lecture 4 Fall 2015 Divide ad Coquer We have see oe geeral paradigm for fidig algorithms: the greedy approach. We ow cosider aother geeral paradigm, kow as divide ad coquer. We have already see a

More information

Irreducible polynomials with consecutive zero coefficients

Irreducible polynomials with consecutive zero coefficients Irreducible polyomials with cosecutive zero coefficiets Theodoulos Garefalakis Departmet of Mathematics, Uiversity of Crete, 71409 Heraklio, Greece Abstract Let q be a prime power. We cosider the problem

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

Entropy of bi-capacities

Entropy of bi-capacities Etropy of bi-capacities Iva Kojadiovic LINA CNRS FRE 2729 Site école polytechique de l uiv. de Nates Rue Christia Pauc 44306 Nates, Frace iva.kojadiovic@uiv-ates.fr Jea-Luc Marichal Applied Mathematics

More information

Economics 140A Confidence Intervals and Hypothesis Testing

Economics 140A Confidence Intervals and Hypothesis Testing Ecoomics 140A Cofidece Itervals ad Hypothesis Testig Obtaiig a estimate of a parameter is ot the al purpose of statistical iferece because it is highly ulikely that the populatio value of a parameter is

More information

Normal Distribution.

Normal Distribution. Normal Distributio www.icrf.l Normal distributio I probability theory, the ormal or Gaussia distributio, is a cotiuous probability distributio that is ofte used as a first approimatio to describe realvalued

More information

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: A SIGNIFICANCE TEST FOR REGRESSION MODELS*

ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: A SIGNIFICANCE TEST FOR REGRESSION MODELS* Kobe Uiversity Ecoomic Review 52 (2006) 27 ON SMALL SAMPLE PROPERTIES OF PERMUTATION TESTS: A SIGNIFICANCE TEST FOR REGRESSION MODELS* By HISASHI TANIZAKI I this paper, we cosider a oparametric permutatio

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

Confidence Intervals for One Mean with Tolerance Probability

Confidence Intervals for One Mean with Tolerance Probability Chapter 421 Cofidece Itervals for Oe Mea with Tolerace Probability Itroductio This procedure calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) with

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Section 7-3 Estimating a Population. Requirements

Section 7-3 Estimating a Population. Requirements Sectio 7-3 Estimatig a Populatio Mea: σ Kow Key Cocept This sectio presets methods for usig sample data to fid a poit estimate ad cofidece iterval estimate of a populatio mea. A key requiremet i this sectio

More information

Convexity, Inequalities, and Norms

Convexity, Inequalities, and Norms Covexity, Iequalities, ad Norms Covex Fuctios You are probably familiar with the otio of cocavity of fuctios. Give a twicedifferetiable fuctio ϕ: R R, We say that ϕ is covex (or cocave up) if ϕ (x) 0 for

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Lecture Notes CMSC 251

Lecture Notes CMSC 251 We have this messy summatio to solve though First observe that the value remais costat throughout the sum, ad so we ca pull it out frot Also ote that we ca write 3 i / i ad (3/) i T () = log 3 (log ) 1

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

ORDERS OF GROWTH KEITH CONRAD

ORDERS OF GROWTH KEITH CONRAD ORDERS OF GROWTH KEITH CONRAD Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really wat to uderstad their behavior It also helps you better grasp topics i calculus

More information

3. Continuous Random Variables

3. Continuous Random Variables Statistics ad probability: 3-1 3. Cotiuous Radom Variables A cotiuous radom variable is a radom variable which ca take values measured o a cotiuous scale e.g. weights, stregths, times or legths. For ay

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis Chapter 8 Key Ideas Hypothesis (Null ad Alterative), Hypothesis Test, Test Statistic, P-value Type I Error, Type II Error, Sigificace Level, Power Sectio 8-1: Overview Cofidece Itervals (Chapter 7) are

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 007 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a uow mea µ = E(X) of a distributio by

More information

Methods of Evaluating Estimators

Methods of Evaluating Estimators Math 541: Statistical Theory II Istructor: Sogfeg Zheg Methods of Evaluatig Estimators Let X 1, X 2,, X be i.i.d. radom variables, i.e., a radom sample from f(x θ), where θ is ukow. A estimator of θ is

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

The second difference is the sequence of differences of the first difference sequence, 2

The second difference is the sequence of differences of the first difference sequence, 2 Differece Equatios I differetial equatios, you look for a fuctio that satisfies ad equatio ivolvig derivatives. I differece equatios, istead of a fuctio of a cotiuous variable (such as time), we look for

More information

Alternatives To Pearson s and Spearman s Correlation Coefficients

Alternatives To Pearson s and Spearman s Correlation Coefficients Alteratives To Pearso s ad Spearma s Correlatio Coefficiets Floreti Smaradache Chair of Math & Scieces Departmet Uiversity of New Mexico Gallup, NM 8730, USA Abstract. This article presets several alteratives

More information

BASIC STATISTICS. Discrete. Mass Probability Function: P(X=x i ) Only one finite set of values is considered {x 1, x 2,...} Prob. t = 1.

BASIC STATISTICS. Discrete. Mass Probability Function: P(X=x i ) Only one finite set of values is considered {x 1, x 2,...} Prob. t = 1. BASIC STATISTICS 1.) Basic Cocepts: Statistics: is a sciece that aalyzes iformatio variables (for istace, populatio age, height of a basketball team, the temperatures of summer moths, etc.) ad attempts

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff,

NEW HIGH PERFORMANCE COMPUTATIONAL METHODS FOR MORTGAGES AND ANNUITIES. Yuri Shestopaloff, NEW HIGH PERFORMNCE COMPUTTIONL METHODS FOR MORTGGES ND NNUITIES Yuri Shestopaloff, Geerally, mortgage ad auity equatios do ot have aalytical solutios for ukow iterest rate, which has to be foud usig umerical

More information

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1) BASIC STATISTICS. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS.. Radom Sample. The radom variables X,X 2,..., X are called a radom sample of size from the populatio f(x if X,X 2,..., X are mutually idepedet

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas: Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries

More information

THE problem of fitting a circle to a collection of points

THE problem of fitting a circle to a collection of points IEEE TRANACTION ON INTRUMENTATION AND MEAUREMENT, VOL. XX, NO. Y, MONTH 000 A Few Methods for Fittig Circles to Data Dale Umbach, Kerry N. Joes Abstract Five methods are discussed to fit circles to data.

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 8 GENE H GOLUB 1 Positive Defiite Matrices A matrix A is positive defiite if x Ax > 0 for all ozero x A positive defiite matrix has real ad positive

More information

Function factorization using warped Gaussian processes

Function factorization using warped Gaussian processes Fuctio factorizatio usig warped Gaussia processes Mikkel N. Schmidt ms@imm.dtu.dk Uiversity of Cambridge, Departmet of Egieerig, Trumpigto Street, Cambridge, CB2 PZ, UK Abstract We itroduce a ew approach

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

B1. Fourier Analysis of Discrete Time Signals

B1. Fourier Analysis of Discrete Time Signals B. Fourier Aalysis of Discrete Time Sigals Objectives Itroduce discrete time periodic sigals Defie the Discrete Fourier Series (DFS) expasio of periodic sigals Defie the Discrete Fourier Trasform (DFT)

More information

We have seen that the physically observable properties of a quantum system are represented

We have seen that the physically observable properties of a quantum system are represented Chapter 14 Probability, Expectatio Value ad Ucertaity We have see that the physically observable properties of a quatum system are represeted by Hermitea operators (also referred to as observables ) such

More information

5: Introduction to Estimation

5: Introduction to Estimation 5: Itroductio to Estimatio Cotets Acroyms ad symbols... 1 Statistical iferece... Estimatig µ with cofidece... 3 Samplig distributio of the mea... 3 Cofidece Iterval for μ whe σ is kow before had... 4 Sample

More information

Lecture 7: Borel Sets and Lebesgue Measure

Lecture 7: Borel Sets and Lebesgue Measure EE50: Probability Foudatios for Electrical Egieers July-November 205 Lecture 7: Borel Sets ad Lebesgue Measure Lecturer: Dr. Krisha Jagaatha Scribes: Ravi Kolla, Aseem Sharma, Vishakh Hegde I this lecture,

More information

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection The aalysis of the Courot oligopoly model cosiderig the subjective motive i the strategy selectio Shigehito Furuyama Teruhisa Nakai Departmet of Systems Maagemet Egieerig Faculty of Egieerig Kasai Uiversity

More information

Estimating the Mean and Variance of a Normal Distribution

Estimating the Mean and Variance of a Normal Distribution Estimatig the Mea ad Variace of a Normal Distributio Learig Objectives After completig this module, the studet will be able to eplai the value of repeatig eperimets eplai the role of the law of large umbers

More information

Section IV.5: Recurrence Relations from Algorithms

Section IV.5: Recurrence Relations from Algorithms Sectio IV.5: Recurrece Relatios from Algorithms Give a recursive algorithm with iput size, we wish to fid a Θ (best big O) estimate for its ru time T() either by obtaiig a explicit formula for T() or by

More information

A gentle introduction to Expectation Maximization

A gentle introduction to Expectation Maximization A getle itroductio to Expectatio Maximizatio Mark Johso Brow Uiversity November 2009 1 / 15 Outlie What is Expectatio Maximizatio? Mixture models ad clusterig EM for setece topic modelig 2 / 15 Why Expectatio

More information

Grade 7. Strand: Number Specific Learning Outcomes It is expected that students will:

Grade 7. Strand: Number Specific Learning Outcomes It is expected that students will: Strad: Number Specific Learig Outcomes It is expected that studets will: 7.N.1. Determie ad explai why a umber is divisible by 2, 3, 4, 5, 6, 8, 9, or 10, ad why a umber caot be divided by 0. [C, R] [C]

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Cluster Validity Measurement Techniques

Cluster Validity Measurement Techniques Cluster Validity Measuremet Techiques Ferec Kovács, Csaba Legáy, Attila Babos Departmet of Automatio ad Applied Iformatics Budapest Uiversity of Techology ad Ecoomics Goldma György tér 3, H- Budapest,

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

Standard Errors and Confidence Intervals

Standard Errors and Confidence Intervals Stadard Errors ad Cofidece Itervals Itroductio I the documet Data Descriptio, Populatios ad the Normal Distributio a sample had bee obtaied from the populatio of heights of 5-year-old boys. If we assume

More information

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS Probability i the Egieerig ad Iformatioal Scieces, 20, 2006, 667 686+ Prited i the U+S+A+ TIGHT BOUNDS ON EXPECTED ORDER STATISTICS DIMITRIS BERTSIMAS Sloa School of Maagemet ad Operatios Research Ceter

More information

Solving Divide-and-Conquer Recurrences

Solving Divide-and-Conquer Recurrences Solvig Divide-ad-Coquer Recurreces Victor Adamchik A divide-ad-coquer algorithm cosists of three steps: dividig a problem ito smaller subproblems solvig (recursively) each subproblem the combiig solutios

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

Swaps: Constant maturity swaps (CMS) and constant maturity. Treasury (CMT) swaps

Swaps: Constant maturity swaps (CMS) and constant maturity. Treasury (CMT) swaps Swaps: Costat maturity swaps (CMS) ad costat maturity reasury (CM) swaps A Costat Maturity Swap (CMS) swap is a swap where oe of the legs pays (respectively receives) a swap rate of a fixed maturity, while

More information

Regularized Distance Metric Learning: Theory and Algorithm

Regularized Distance Metric Learning: Theory and Algorithm Regularized Distace Metric Learig: Theory ad Algorithm Rog Ji 1 Shiju Wag 2 Yag Zhou 1 1 Dept. of Computer Sciece & Egieerig, Michiga State Uiversity, East Lasig, MI 48824 2 Radiology ad Imagig Scieces,

More information

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker Parameter estimatio for oliear models: Numerical approaches to solvig the iverse problem Lecture 11 04/01/2008 Sve Zeker Review: Trasformatio of radom variables Cosider probability distributio of a radom

More information
