Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis


Journal of Machine Learning Research 8 (2007)    Submitted 3/06; Revised 12/06; Published 5/07

Masashi Sugiyama    SUGI@CS.TITECH.AC.JP
Department of Computer Science
Tokyo Institute of Technology
O-okayama, Meguro-ku, Tokyo, Japan

Editor: Sam Roweis

Abstract

Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.

Keywords: dimensionality reduction, supervised learning, Fisher discriminant analysis, locality preserving projection, affinity matrix

1. Introduction

The goal of dimensionality reduction is to embed high-dimensional data samples in a low-dimensional space so that most of the intrinsic information contained in the data is preserved (e.g., Roweis and Saul, 2000; Tenenbaum et al., 2000; Hinton and Salakhutdinov, 2006).
Once dimensionality reduction is carried out appropriately, the compact representation of the data can be used for various succeeding tasks such as visualization and classification.

In this paper, we consider the supervised dimensionality reduction problem, that is, samples are accompanied with class labels. Fisher discriminant analysis (FDA) (Fisher, 1936; Fukunaga, 1990) is a popular method for linear supervised dimensionality reduction (see Footnote 1). FDA seeks an embedding transformation such that the between-class scatter is maximized and the within-class scatter is minimized. FDA is a traditional but useful method for dimensionality reduction. However, it tends to give undesired results if samples in a class form several separate clusters (i.e., multimodal) (see, e.g., Fukunaga, 1990).

Footnote: An efficient MATLAB implementation of local Fisher discriminant analysis is available from the author's website: sugi/software/lfda/.

Footnote 1: FDA may refer to the classification method which first projects data samples onto a one-dimensional subspace and then classifies the samples by thresholding (Fisher, 1936; Duda et al., 2001). The one-dimensional embedding space used here is obtained as the maximizer of the so-called Fisher criterion. This Fisher criterion can be used for dimensionality reduction onto a space with dimension more than one in multi-class problems (Fukunaga, 1990). With some abuse of terminology, we refer to the dimensionality reduction method based on the Fisher criterion as FDA (see Section 2.2 for detail).

© 2007 Masashi Sugiyama.

Within-class multimodality can be observed in many practical applications. For example, in disease diagnosis, the distribution of medical checkup samples of sick patients could be multimodal since there may be several different causes even for a single disease. In a traditional task of handwritten digit recognition, within-class multimodality appears if digits are classified into, for example, even and odd numbers. More generally, solving multi-class classification problems by a set of two-class one-versus-rest problems naturally induces within-class multimodality. For this reason, there is a widespread need for reducing the dimensionality of multimodal data.

In order to reduce the dimensionality of multimodal data appropriately, it is important to preserve the local structure of the data. Locality-preserving projection (LPP) (He and Niyogi, 2004) meets this requirement; LPP seeks an embedding transformation such that nearby data pairs in the original space are kept close in the embedding space. Thus LPP can reduce the dimensionality of multimodal data without losing the local structure. However, LPP is an unsupervised dimensionality reduction method and does not take the label information into account. Therefore, it does not necessarily work appropriately in supervised dimensionality reduction scenarios.

In this paper, we propose a new dimensionality reduction method called local Fisher discriminant analysis (LFDA). LFDA effectively combines the ideas of FDA and LPP, that is, LFDA maximizes between-class separability and preserves within-class local structure at the same time. Thus LFDA is useful for dimensionality reduction of multimodal labeled data.

The original FDA provides a meaningful result only when the dimensionality of the embedding space is smaller than the number of classes because of the rank deficiency of the between-class scatter matrix (Fukunaga, 1990). This is an essential limitation of FDA in dimensionality reduction. On the other hand, the proposed LFDA does not generally suffer from this problem and can be employed for dimensionality reduction into a space of arbitrary dimension. Furthermore, LFDA inherits an excellent property from FDA: it has an analytic form of the embedding matrix and the solution can be easily computed just by solving a generalized eigenvalue problem. This is an advantage over recently proposed supervised dimensionality reduction methods (e.g., Goldberger et al., 2005; Globerson and Roweis, 2006). Furthermore, LFDA can be naturally extended to non-linear dimensionality reduction scenarios by applying the kernel trick (Schölkopf and Smola, 2002).

The rest of this paper is organized as follows. In Section 2, we formulate the linear dimensionality reduction problem, briefly review FDA and LPP, and illustrate how they typically behave. In Section 3, we define LFDA and show its fundamental properties. In Section 4, we discuss the relation between LFDA and other methods. In Section 5, we numerically evaluate the performance of LFDA and existing methods in visualization and classification tasks using benchmark data sets. Finally, we give concluding remarks and future prospects in Section 6.

2. Linear Dimensionality Reduction

In this section, we formulate the problem of linear dimensionality reduction and review existing methods.

2.1 Formulation

Let $x_i \in \mathbb{R}^d$ $(i = 1, 2, \ldots, n)$ be $d$-dimensional samples and $y_i \in \{1, 2, \ldots, c\}$ be associated class labels, where $n$ is the number of samples and $c$ is the number of classes. Let $n_l$ be the number of samples in class $l$:

\[ \sum_{l=1}^{c} n_l = n. \]

Let $X$ be the matrix of all samples:

\[ X \equiv (x_1 | x_2 | \cdots | x_n). \]

Let $z_i \in \mathbb{R}^r$ $(1 \le r \le d)$ be low-dimensional representations of $x_i$, where $r$ is the reduced dimension (i.e., the dimension of the embedding space). In practice we consider $d$ to be large and $r$ to be small, but the formulation is not limited to such cases.

For the moment, we focus on linear dimensionality reduction, that is, using a $d \times r$ transformation matrix $T$, the embedded samples $z_i$ are given by

\[ z_i = T^\top x_i, \]

where $^\top$ denotes the transpose of a matrix or vector. In Section 3.4, we extend our discussion to the non-linear dimensionality reduction scenarios where the mapping from $x_i$ to $z_i$ is non-linear.

2.2 Fisher Discriminant Analysis for Dimensionality Reduction

One of the most popular dimensionality reduction techniques is Fisher discriminant analysis (FDA) (Fisher, 1936; Fukunaga, 1990; Duda et al., 2001). Here we briefly describe the definition of FDA.

Let $S^{(w)}$ and $S^{(b)}$ be the within-class scatter matrix and the between-class scatter matrix:

\[ S^{(w)} \equiv \sum_{l=1}^{c} \sum_{i: y_i = l} (x_i - \mu_l)(x_i - \mu_l)^\top, \tag{1} \]

\[ S^{(b)} \equiv \sum_{l=1}^{c} n_l (\mu_l - \mu)(\mu_l - \mu)^\top, \tag{2} \]

where $\sum_{i: y_i = l}$ denotes the summation over $i$ such that $y_i = l$, $\mu_l$ is the mean of the samples in class $l$, and $\mu$ is the mean of all samples:

\[ \mu_l \equiv \frac{1}{n_l} \sum_{i: y_i = l} x_i, \qquad \mu \equiv \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1}{n} \sum_{l=1}^{c} n_l \mu_l. \]

We assume that $S^{(w)}$ has full rank. The FDA transformation matrix $T_{\mathrm{FDA}}$ is defined as follows (see Footnote 2):

\[ T_{\mathrm{FDA}} \equiv \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \operatorname{tr}\left( (T^\top S^{(w)} T)^{-1} T^\top S^{(b)} T \right) \right]. \tag{3} \]

Footnote 2: The following definition is also used in the literature (e.g., Fukunaga, 1990) and yields the same solution:

\[ T_{\mathrm{FDA}} = \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \frac{\det(T^\top S^{(b)} T)}{\det(T^\top S^{(w)} T)}, \]

where $\det(\cdot)$ denotes the determinant of a matrix.
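As an illustration of Eqs. (1) to (3), the following is a minimal NumPy sketch that builds the two scatter matrices and evaluates the Fisher criterion for a given transformation matrix. The function names `fda_scatter` and `fisher_criterion` and the synthetic two-class data are assumptions made for this example; they are not part of the original paper or its MATLAB code.

```python
import numpy as np


def fda_scatter(X, y):
    """Within- and between-class scatter matrices of Eqs. (1) and (2).

    X has shape (d, n) with one sample per column, as in the text;
    y has shape (n,) and holds the class labels.
    """
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)            # mean of all samples
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        X_l = X[:, y == label]                    # samples of class l
        n_l = X_l.shape[1]
        mu_l = X_l.mean(axis=1, keepdims=True)    # class mean
        centered = X_l - mu_l
        S_w += centered @ centered.T
        S_b += n_l * (mu_l - mu) @ (mu_l - mu).T
    return S_w, S_b


def fisher_criterion(T, S_w, S_b):
    """Value of tr[(T^T S_w T)^{-1} T^T S_b T], the objective in Eq. (3)."""
    return np.trace(np.linalg.solve(T.T @ S_w @ T, T.T @ S_b @ T))


# Hypothetical toy data: two Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 1.0, (3, 40)), rng.normal(3.0, 1.0, (3, 60))])
y = np.array([1] * 40 + [2] * 60)

S_w, S_b = fda_scatter(X, y)
T = np.eye(3)[:, :2]                              # keep the first two coordinates
value = fisher_criterion(T, S_w, S_b)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical choice of this sketch; it requires $T^\top S^{(w)} T$ to be invertible, which is the same assumption the criterion itself makes.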

Eq. (3) means that FDA seeks a transformation matrix $T$ such that the between-class scatter is maximized while the within-class scatter is minimized. In the above formulation, we implicitly assumed that $T^\top S^{(w)} T$ is invertible. This implies that the above optimization is subject to

\[ \operatorname{rank}(T) = r. \]

Let $\{\varphi_k\}_{k=1}^{d}$ be the generalized eigenvectors associated with the generalized eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ of the following generalized eigenvalue problem:

\[ S^{(b)} \varphi = \lambda S^{(w)} \varphi. \]

Then a solution $T_{\mathrm{FDA}}$ of the above maximization problem is analytically given by

\[ T_{\mathrm{FDA}} = (\varphi_1 | \varphi_2 | \cdots | \varphi_r). \]

Note that the solution is not unique and the following simple constraint is sometimes imposed additionally (Fukunaga, 1990):

\[ T_{\mathrm{FDA}}^\top S^{(w)} T_{\mathrm{FDA}} = I_r, \]

where $I_r$ is the identity matrix on $\mathbb{R}^r$. This constraint makes the within-class scatter in the embedding space sphered.

The between-class scatter matrix $S^{(b)}$ has at most rank $c - 1$ (Fukunaga, 1990). This implies that the multiplicity of $\lambda = 0$ is at least $d - c + 1$. Therefore, FDA can find at most $c - 1$ meaningful features; the remaining features found by FDA are arbitrary. This is an essential limitation of FDA for dimensionality reduction and is very restrictive in practice.

2.3 Locality-Preserving Projection

Another dimensionality reduction technique that is relevant to the current setting is locality-preserving projection (LPP) (He and Niyogi, 2004). Here we review LPP.

Let $A$ be an affinity matrix, that is, the $n$-dimensional matrix with the $(i, j)$-th element $A_{i,j}$ being the affinity between $x_i$ and $x_j$. We assume that $A_{i,j} \in [0, 1]$; $A_{i,j}$ is large if $x_i$ and $x_j$ are close and $A_{i,j}$ is small if $x_i$ and $x_j$ are far apart. There are several different manners of defining $A$. We briefly describe typical definitions in Appendix D.

The LPP transformation matrix $T_{\mathrm{LPP}}$ is defined as follows (see Footnote 3):

\[ T_{\mathrm{LPP}} \equiv \operatorname*{argmin}_{T \in \mathbb{R}^{d \times r}} \left( \frac{1}{2} \sum_{i,j=1}^{n} A_{i,j} \, \| T^\top x_i - T^\top x_j \|^2 \right) \quad \text{subject to} \quad T^\top X D X^\top T = I_r, \tag{4} \]

where $D$ is the $n$-dimensional diagonal matrix with $i$-th diagonal element being

\[ D_{i,i} \equiv \sum_{j=1}^{n} A_{i,j}. \]

Footnote 3: The matrix $D$ in the constraint (4) is motivated by a geometric argument (Belkin and Niyogi, 2003). However, it is sometimes dropped for the sake of simplicity (Ham et al., 2004).

Eq. (4) implies that LPP looks for a transformation matrix $T$ such that nearby data pairs in the original space $\mathbb{R}^d$ are kept close in the embedding space. The constraint in Eq. (4) is imposed for avoiding degeneracy.

Let $\{\psi_k\}_{k=1}^{d}$ be the generalized eigenvectors associated with the generalized eigenvalues $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_d$ of the following generalized eigenvalue problem:

\[ X L X^\top \psi = \gamma X D X^\top \psi, \]

where

\[ L \equiv D - A. \]

$L$ is called the graph-Laplacian matrix in spectral graph theory (Chung, 1997), where $A$ is seen as the adjacency matrix of a graph. He and Niyogi (2004) showed that a solution of Eq. (4) is given by

\[ T_{\mathrm{LPP}} = (\psi_d | \psi_{d-1} | \cdots | \psi_{d-r+1}). \]

2.4 Typical Behavior of FDA and LPP

Dimensionality reduction results obtained by FDA and LPP are illustrated in Figure 1 (LFDA will be defined and explained in Section 3): two-dimensional two-class data samples are embedded into a one-dimensional space. In LPP, the affinity matrix $A$ is determined by the local scaling method (Zelnik-Manor and Perona, 2005, see also Appendix D.4).

For the simplest data set depicted in Figure 1(a), both FDA and LPP nicely separate the samples in different classes from each other. For the data set depicted in Figure 1(b), FDA still works well, but LPP mixes samples in different classes into a single cluster. This is caused by the unsupervised nature of LPP. On the other hand, for the data set depicted in Figure 1(c), LPP works well but FDA collapses the samples in different classes into a single cluster. The reason for the failure of FDA is that the levels of the between-class scatter and the within-class scatter are not evaluated in an intuitively natural way because one of the classes consists of two separate clusters (see also Fukunaga, 1990).

3. Local Fisher Discriminant Analysis

As illustrated in Figure 1, FDA can perform poorly if samples in a class form several separate clusters (i.e., multimodal). In other words, the undesired behavior of FDA is caused by the globality when evaluating the within-class scatter and the between-class scatter (e.g., Figure 1(c)). On the other hand, because of the unsupervised nature of LPP, it can overlap samples in different classes if they are close in the original high-dimensional space $\mathbb{R}^d$ (e.g., Figure 1(b)).

To overcome these problems, we propose combining the ideas of FDA and LPP; more specifically, we evaluate the levels of the between-class scatter and the within-class scatter in a local manner. This allows us to attain between-class separation and within-class local structure preservation at the same time. We call our new method local Fisher discriminant analysis (LFDA).

3.1 Reformulating FDA

In order to introduce LFDA, let us first reformulate FDA in a pairwise manner.

[Figure 1 panels: (a) Toy data set 1, (b) Toy data set 2, (c) Toy data set 3. Each panel shows the embedding directions found by FDA, LPP and LFDA.]

Figure 1: Examples of dimensionality reduction by FDA, LPP and LFDA. Two-dimensional two-class samples are embedded into a one-dimensional space. The line in the figure denotes the one-dimensional embedding space (onto which the data samples are projected) obtained by each method.

Lemma 1. $S^{(w)}$ and $S^{(b)}$ defined by Eqs. (1) and (2) can be expressed as

\[ S^{(w)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(w)}_{i,j} (x_i - x_j)(x_i - x_j)^\top, \tag{5} \]

\[ S^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(b)}_{i,j} (x_i - x_j)(x_i - x_j)^\top, \tag{6} \]

where

\[ W^{(w)}_{i,j} \equiv \begin{cases} 1/n_l & \text{if } y_i = y_j = l, \\ 0 & \text{if } y_i \ne y_j, \end{cases} \tag{7} \]

\[ W^{(b)}_{i,j} \equiv \begin{cases} 1/n - 1/n_l & \text{if } y_i = y_j = l, \\ 1/n & \text{if } y_i \ne y_j. \end{cases} \tag{8} \]

A proof of Lemma 1 is given in Appendix A. Note that $1/n - 1/n_l$ in Eq. (8) is negative while $1/n_l$ and $1/n$ in Eqs. (7) and (8) are positive. This implies that if the data pairs in the same class are made close, the within-class scatter matrix $S^{(w)}$ gets small and the between-class scatter matrix $S^{(b)}$ gets large. On the other hand, if the data pairs in different classes are separated from each other, the between-class scatter matrix $S^{(b)}$ gets large. Therefore, we may interpret FDA as keeping the sample pairs in the same class close and the sample pairs in different classes apart. A more formal discussion on the above interpretation is given in Appendix B.

3.2 Definition and Typical Behavior of LFDA

Based on the above pairwise expression, let us define the local within-class scatter matrix $\tilde{S}^{(w)}$ and the local between-class scatter matrix $\tilde{S}^{(b)}$ as follows:

\[ \tilde{S}^{(w)} \equiv \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}^{(w)}_{i,j} (x_i - x_j)(x_i - x_j)^\top, \tag{9} \]

\[ \tilde{S}^{(b)} \equiv \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}^{(b)}_{i,j} (x_i - x_j)(x_i - x_j)^\top, \]

where

\[ \tilde{W}^{(w)}_{i,j} \equiv \begin{cases} A_{i,j}/n_l & \text{if } y_i = y_j = l, \\ 0 & \text{if } y_i \ne y_j, \end{cases} \tag{10} \]

\[ \tilde{W}^{(b)}_{i,j} \equiv \begin{cases} A_{i,j}(1/n - 1/n_l) & \text{if } y_i = y_j = l, \\ 1/n & \text{if } y_i \ne y_j. \end{cases} \tag{11} \]

Namely, according to the affinity $A_{i,j}$, we weight the values for the sample pairs in the same class. This means that far apart sample pairs in the same class have less influence on $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$. Note that we do not weight the values for the sample pairs in different classes since we want to separate them from each other irrespective of the affinity in the original space. From here on, we denote the local counterparts of matrices by symbols with tilde.
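The pairwise expressions above can be checked numerically. The sketch below builds the weight matrices of Eqs. (7), (8), (10) and (11), evaluates the pairwise sums through the identity $\frac{1}{2}\sum_{i,j} W_{i,j}(x_i - x_j)(x_i - x_j)^\top = X(\operatorname{diag}(W 1_n) - W)X^\top$ (valid for symmetric $W$), and compares the result with the direct definitions in Eqs. (1) and (2). The helper names, the random data, and the fixed-bandwidth Gaussian affinity are assumptions of this example; the paper itself uses the local scaling affinity of Appendix D.4.

```python
import numpy as np


def pairwise_scatter(X, W):
    """(1/2) * sum_{i,j} W[i, j] (x_i - x_j)(x_i - x_j)^T for a symmetric W."""
    return X @ (np.diag(W.sum(axis=1)) - W) @ X.T


def lfda_weights(y, A=None):
    """Weights of Eqs. (10) and (11); A = None means A_ij = 1, i.e. Eqs. (7) and (8)."""
    n = y.shape[0]
    if A is None:
        A = np.ones((n, n))
    same = y[:, None] == y[None, :]
    counts = {label: np.sum(y == label) for label in np.unique(y)}
    n_l = np.array([counts[label] for label in y], dtype=float)[:, None]
    W_w = np.where(same, A / n_l, 0.0)
    W_b = np.where(same, A * (1.0 / n - 1.0 / n_l), 1.0 / n)
    return W_w, W_b


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 30))                      # d = 2, n = 30 (toy data)
y = np.repeat([0, 1, 2], 10)

# Lemma 1: with A_ij = 1 the pairwise form reproduces Eqs. (1) and (2).
W_w, W_b = lfda_weights(y)
S_w = pairwise_scatter(X, W_w)
S_b = pairwise_scatter(X, W_b)

mu = X.mean(axis=1, keepdims=True)
S_w_direct = np.zeros((2, 2))
S_b_direct = np.zeros((2, 2))
for label in np.unique(y):
    X_l = X[:, y == label]
    mu_l = X_l.mean(axis=1, keepdims=True)
    S_w_direct += (X_l - mu_l) @ (X_l - mu_l).T
    S_b_direct += X_l.shape[1] * (mu_l - mu) @ (mu_l - mu).T
lemma_holds = np.allclose(S_w, S_w_direct) and np.allclose(S_b, S_b_direct)

# Local version with an illustrative Gaussian affinity (unit bandwidth is an assumption).
sq_dist = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
A = np.exp(-sq_dist)
W_w_loc, W_b_loc = lfda_weights(y, A)
S_w_loc = pairwise_scatter(X, W_w_loc)
S_b_loc = pairwise_scatter(X, W_b_loc)
```

Because every affinity value lies in $[0, 1]$, the local within-class scatter can only shrink relative to the ordinary one, which matches the remark that far apart same-class pairs have less influence.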

We define the LFDA transformation matrix $T_{\mathrm{LFDA}}$ as

\[ T_{\mathrm{LFDA}} \equiv \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \operatorname{tr}\left( (T^\top \tilde{S}^{(w)} T)^{-1} T^\top \tilde{S}^{(b)} T \right) \right]. \tag{12} \]

That is, we look for a transformation matrix $T$ such that nearby data pairs in the same class are made close and the data pairs in different classes are separated from each other; far apart data pairs in the same class are not forced to be close.

Eq. (12) is of the same form as Eq. (3). Therefore, we can similarly compute an analytic form of $T_{\mathrm{LFDA}}$ by solving a generalized eigenvalue problem of $\tilde{S}^{(b)}$ and $\tilde{S}^{(w)}$. An efficient implementation of LFDA is summarized as pseudo code in Figure 2 (see Appendix C for detail).

Toy examples of dimensionality reduction by LFDA are illustrated in Figure 1. We used the local scaling method for computing the affinity matrix $A$ (see Appendix D.4). Note that we perform the nearest neighbor search in the local scaling method in a classwise manner since we do not need the affinity values for the sample pairs in different classes (see Eqs. 10 and 11). This contributes substantially to reducing the computational cost (see Appendix C). Figure 1 shows that LFDA gives desirable results for all three data sets, that is, LFDA can compensate for the drawbacks of FDA and LPP by effectively combining the ideas of FDA and LPP.

If the affinity value $A_{i,j}$ is set to 1 for all sample pairs (i.e., all pairs are equally close to each other), $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ agree with $S^{(w)}$ and $S^{(b)}$, respectively, and LFDA is reduced to the original FDA. Therefore, LFDA may be regarded as a natural localized variant of FDA.

3.3 Properties of LFDA

Here we discuss fundamental properties of LFDA.

First, we give an interpretation of LFDA in terms of the pointwise scatter. $\tilde{S}^{(w)}$ can be expressed as

\[ \tilde{S}^{(w)} = \frac{1}{2} \sum_{i=1}^{n} \frac{1}{n_{y_i}} \tilde{P}^{(w)}_i, \]

where $n_{y_i}$ is the number of samples in the class to which the sample $x_i$ belongs and $\tilde{P}^{(w)}_i$ is the pointwise local within-class scatter matrix around $x_i$:

\[ \tilde{P}^{(w)}_i \equiv \sum_{j: y_j = y_i} A_{i,j} (x_j - x_i)(x_j - x_i)^\top. \]

Therefore, minimizing $\tilde{S}^{(w)}$ corresponds to minimizing the weighted sum of the pointwise local within-class scatter matrices over all samples. $\tilde{S}^{(b)}$ can also be expressed in a similar way as

\[ \tilde{S}^{(b)} = \frac{1}{2} \sum_{i=1}^{n} \left( \frac{1}{n} - \frac{1}{n_{y_i}} \right) \tilde{P}^{(w)}_i + \frac{1}{2n} \sum_{i=1}^{n} P^{(b)}_i, \tag{13} \]

where $P^{(b)}_i$ is the pointwise between-class scatter matrix around $x_i$:

\[ P^{(b)}_i \equiv \sum_{j: y_j \ne y_i} (x_j - x_i)(x_j - x_i)^\top. \]
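The pointwise decomposition of the local between-class scatter follows directly from the pairwise weights in Eq. (11). The short derivation below is a sketch that assumes the pointwise matrices $\tilde{P}^{(w)}_i$ and $P^{(b)}_i$ are defined as plain sums, without a factor of $1/2$; if the factor is instead absorbed into the pointwise matrices, the same steps go through with the $1/2$ moved inside.

```latex
\begin{align*}
\tilde{S}^{(b)}
  &= \frac{1}{2} \sum_{i,j=1}^{n} \tilde{W}^{(b)}_{i,j}\,(x_i - x_j)(x_i - x_j)^\top \\
  &= \frac{1}{2} \sum_{i=1}^{n} \Bigl( \frac{1}{n} - \frac{1}{n_{y_i}} \Bigr)
       \sum_{j:\, y_j = y_i} A_{i,j}\,(x_j - x_i)(x_j - x_i)^\top
     + \frac{1}{2n} \sum_{i=1}^{n} \sum_{j:\, y_j \neq y_i} (x_j - x_i)(x_j - x_i)^\top \\
  &= \frac{1}{2} \sum_{i=1}^{n} \Bigl( \frac{1}{n} - \frac{1}{n_{y_i}} \Bigr) \tilde{P}^{(w)}_i
     + \frac{1}{2n} \sum_{i=1}^{n} P^{(b)}_i .
\end{align*}
```

The second line splits the double sum into same-class pairs, which carry the weight $A_{i,j}(1/n - 1/n_{y_i})$, and different-class pairs, which carry the constant weight $1/n$. Since $1/n - 1/n_{y_i}$ is negative, the first term enters with a negative sign, so a small pointwise local within-class scatter increases $\tilde{S}^{(b)}$.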

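Input: Labeled samples $\{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{1, 2, \ldots, c\}\}_{i=1}^{n}$
       Dimensionality of embedding space $r$ $(1 \le r \le d)$
Output: $d \times r$ transformation matrix $T_{\mathrm{LFDA}}$

 1: $\tilde{S}^{(b)} \leftarrow 0_{d \times d}$;
 2: $\tilde{S}^{(w)} \leftarrow 0_{d \times d}$;
 3: for $l = 1, 2, \ldots, c$   % Compute scatter matrices in a classwise manner
 4:   $\{\underline{x}_i\}_{i=1}^{n_l} \leftarrow \{x_j\}_{j: y_j = l}$;
 5:   for $i = 1, 2, \ldots, n_l$   % Determine local scaling
 6:     $\underline{x}_i^{(7)} \leftarrow$ 7th nearest neighbor of $\underline{x}_i$ among $\{\underline{x}_j\}_{j=1}^{n_l}$;
 7:     $\underline{\sigma}_i \leftarrow \| \underline{x}_i - \underline{x}_i^{(7)} \|$;
 8:   end
 9:   for $i, j = 1, 2, \ldots, n_l$   % Define affinity matrix
10:     $\underline{A}_{i,j} \leftarrow \exp(-\| \underline{x}_i - \underline{x}_j \|^2 / (\underline{\sigma}_i \underline{\sigma}_j))$;
11:   end
12:   $\underline{X} \leftarrow (\underline{x}_1 | \underline{x}_2 | \cdots | \underline{x}_{n_l})$;
13:   $\underline{G} \leftarrow \underline{X}\,\mathrm{diag}(\underline{A} 1_{n_l})\,\underline{X}^\top - \underline{X}\,\underline{A}\,\underline{X}^\top$;
14:   $\tilde{S}^{(b)} \leftarrow \tilde{S}^{(b)} + \underline{G}/n + (1 - n_l/n)\,\underline{X}\,\underline{X}^\top + \underline{X} 1_{n_l} (\underline{X} 1_{n_l})^\top / n$;
15:   $\tilde{S}^{(w)} \leftarrow \tilde{S}^{(w)} + \underline{G}/n_l$;
16: end
17: $\tilde{S}^{(b)} \leftarrow \tilde{S}^{(b)} - X 1_n (X 1_n)^\top / n - \tilde{S}^{(w)}$;
18: $\{\tilde{\lambda}_k, \tilde{\varphi}_k\}_{k=1}^{r} \leftarrow$ generalized eigenvalues and normalized eigenvectors of $\tilde{S}^{(b)} \tilde{\varphi} = \tilde{\lambda} \tilde{S}^{(w)} \tilde{\varphi}$;   % $\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_d$
19: $T_{\mathrm{LFDA}} = (\sqrt{\tilde{\lambda}_1}\,\tilde{\varphi}_1 | \sqrt{\tilde{\lambda}_2}\,\tilde{\varphi}_2 | \cdots | \sqrt{\tilde{\lambda}_r}\,\tilde{\varphi}_r)$;

Figure 2: Efficient implementation of LFDA (see Appendix C for detail). The affinity matrix is computed by the local scaling method (see Appendix D.4). Matrices and vectors denoted with underline are classwise counterparts of the original ones. $0_{d \times d}$ denotes the $d \times d$ matrix with zeros, $1_{n_l}$ denotes the $n_l$-dimensional vector with ones, and $\mathrm{diag}(\underline{A} 1_{n_l})$ denotes the diagonal matrix with diagonal elements $\underline{A} 1_{n_l}$. The generalized eigenvectors in line 18 are normalized by Eq. (14), which is often automatically carried out by an eigensolver. The weighting scheme of the eigenvectors in line 19 is explained in Section 3.3. A possible bottleneck of the above implementation is the nearest neighbor search in line 6. This could be alleviated by incorporating prior knowledge of the data structure or by approximation (see Saul and Roweis, 2003, and references therein). Another possible bottleneck is the computation of $\underline{X}\,\underline{A}\,\underline{X}^\top$ in line 13, which could be eased by sparsely defining the affinity matrix (see Appendix D). A MATLAB implementation is available from sugi/software/lfda/.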
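The pseudo code of Figure 2 can be transcribed fairly directly into NumPy. The following is a minimal sketch, not the author's MATLAB implementation: the function names are hypothetical, the generalized eigenvalue problem is solved through a Cholesky factor of $\tilde{S}^{(w)}$ (which assumes that matrix is positive definite), the point itself is not counted when looking up the 7th nearest neighbor, and the toy data only imitate the situation of Figure 1(c).

```python
import numpy as np


def local_scaling_affinity(X, k=7):
    """A_ij = exp(-||x_i - x_j||^2 / (sigma_i sigma_j)) for the columns of X (lines 5-11)."""
    sq_dist = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    k_eff = min(k, X.shape[1] - 1)                         # guard for very small classes
    sigma = np.sqrt(np.sort(sq_dist, axis=1)[:, k_eff])    # column 0 is the point itself
    scale = np.maximum(np.outer(sigma, sigma), 1e-12)      # avoid division by zero
    return np.exp(-sq_dist / scale)


def generalized_eigh(S_b, S_w):
    """Solve S_b v = lam S_w v; eigenvalues descending, eigenvectors with V^T S_w V = I."""
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    M = L_inv @ S_b @ L_inv.T
    lam, U = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(lam)[::-1]
    return lam[order], L_inv.T @ U[:, order]


def lfda(X, y, r, k=7):
    """LFDA transformation matrix (d x r) following lines 1-19 of Figure 2."""
    d, n = X.shape
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):                             # classwise loop, line 3
        X_l = X[:, y == label]
        n_l = X_l.shape[1]
        A = local_scaling_affinity(X_l, k)
        G = X_l @ np.diag(A.sum(axis=1)) @ X_l.T - X_l @ A @ X_l.T   # line 13
        s_l = X_l.sum(axis=1, keepdims=True)               # X_l 1_{n_l}
        S_b += G / n + (1.0 - n_l / n) * (X_l @ X_l.T) + s_l @ s_l.T / n   # line 14
        S_w += G / n_l                                     # line 15
    s = X.sum(axis=1, keepdims=True)
    S_b = S_b - s @ s.T / n - S_w                          # line 17
    lam, V = generalized_eigh(S_b, S_w)                    # line 18
    return V[:, :r] * np.sqrt(np.maximum(lam[:r], 0.0))    # line 19


# Toy data: class 0 has two clusters on either side of class 1.
rng = np.random.default_rng(0)


def cluster(center_x, m):
    return np.vstack([rng.normal(center_x, 0.7, m), rng.normal(0.0, 0.7, m)])


X = np.hstack([cluster(-4.0, 30), cluster(4.0, 30), cluster(0.0, 30)])
y = np.array([0] * 60 + [1] * 30)
T = lfda(X, y, r=1)
z = (T.T @ X).ravel()                                      # one-dimensional embedding
```

The dense $n_l \times n_l$ affinity matrix and the three-dimensional difference array keep the sketch short but scale quadratically in memory; a sparse affinity, as suggested in the caption, would be the natural refinement for larger data sets.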

Note that $P^{(b)}_i$ does not include the localization factor $A_{i,j}$. Eq. (13) implies that maximizing $\tilde{S}^{(b)}$ corresponds to minimizing the weighted sum of the pointwise local within-class scatter matrices and maximizing the sum of the pointwise between-class scatter matrices.

Next, we discuss the issue of eigenvalue multiplicity in LFDA. The original FDA allows us to extract at most $c - 1$ meaningful features since the between-class scatter matrix $S^{(b)}$ has rank at most $c - 1$ (Fukunaga, 1990). On the other hand, the local between-class scatter matrix $\tilde{S}^{(b)}$ generally has a much higher rank with less eigenvalue multiplicity, thanks to the localization factor $A_{i,j}$ included in $\tilde{W}^{(b)}$ (see Eq. 11). In the simulations shown in Section 5, $\tilde{S}^{(b)}$ is always full rank for various data sets. Therefore, the proposed LFDA can be practically employed for dimensionality reduction into a space of any dimension. This is a significant improvement over the original FDA.

Finally, we discuss the invariance property of LFDA. The value of the LFDA criterion (12) is invariant under linear transformations, that is, for any $r$-dimensional invertible matrix $H$, $T_{\mathrm{LFDA}} H$ is also a solution of Eq. (12). Therefore, the solution $T_{\mathrm{LFDA}}$ is not unique: the range of the transformation $H^\top T_{\mathrm{LFDA}}^\top$ is uniquely determined, but the distance metric (Goldberger et al., 2005; Globerson and Roweis, 2006; Weinberger et al., 2006) in the embedding space can be arbitrary because of the arbitrariness of the matrix $H$.

In practice, we propose determining the LFDA transformation matrix $T_{\mathrm{LFDA}}$ as follows. First, we rescale the generalized eigenvectors $\{\tilde{\varphi}_k\}_{k=1}^{d}$ so that

\[ \tilde{\varphi}_k^\top \tilde{S}^{(w)} \tilde{\varphi}_{k'} = \begin{cases} 1 & \text{if } k = k', \\ 0 & \text{if } k \ne k'. \end{cases} \tag{14} \]

Note that this rescaling is often automatically carried out by an eigensolver. Then we weight each generalized eigenvector by the square root of its associated generalized eigenvalue, that is,

\[ T_{\mathrm{LFDA}} = (\sqrt{\tilde{\lambda}_1}\,\tilde{\varphi}_1 | \sqrt{\tilde{\lambda}_2}\,\tilde{\varphi}_2 | \cdots | \sqrt{\tilde{\lambda}_r}\,\tilde{\varphi}_r), \tag{15} \]

where $\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_d$. This weighting scheme weakens the influence of minor eigenvectors and is shown to work well in experiments (see Section 5).
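The invariance property and the normalization of Eqs. (14) and (15) can be illustrated numerically. In the sketch below, `S_w` and `S_b` are random symmetric stand-ins for $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ (an assumption made to keep the example self-contained), and the generalized eigenvectors are obtained through a Cholesky factor of `S_w`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 5, 2

# Stand-ins for the local scatter matrices: S_w positive definite, S_b symmetric PSD.
B = rng.normal(size=(d, d))
C = rng.normal(size=(d, d))
S_w = B @ B.T + d * np.eye(d)
S_b = C @ C.T


def criterion(T):
    """tr[(T^T S_w T)^{-1} T^T S_b T], the objective of Eq. (12)."""
    return np.trace(np.linalg.solve(T.T @ S_w @ T, T.T @ S_b @ T))


# Generalized eigenvectors of S_b v = lam S_w v, normalized as in Eq. (14).
L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
M = L_inv @ S_b @ L_inv.T
lam, U = np.linalg.eigh((M + M.T) / 2)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], L_inv.T @ U[:, order]

T_plain = Phi[:, :r]                              # leading r eigenvectors
T_weighted = Phi[:, :r] * np.sqrt(lam[:r])        # weighting of Eq. (15)
H = rng.normal(size=(r, r)) + 3.0 * np.eye(r)     # an invertible r x r matrix

values = [criterion(T_plain), criterion(T_plain @ H), criterion(T_weighted)]
```

All three entries of `values` agree up to rounding, since the weighting of Eq. (15) is itself a right-multiplication by an invertible diagonal matrix. The weighting therefore changes only the metric of the embedding space, not the value of the criterion.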
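3.4 Kernel LFDA for Non-Linear Dimensionality Reduction

Here we show how LFDA can be extended to non-linear dimensionality reduction scenarios.

As detailed in Appendix C, the generalized eigenvalue problem that needs to be solved in LFDA can be expressed as

\[ X \tilde{L}^{(b)} X^\top \tilde{\varphi} = \tilde{\lambda} X \tilde{L}^{(w)} X^\top \tilde{\varphi}, \tag{16} \]

where $\tilde{L}^{(b)} = \tilde{L}^{(m)} - \tilde{L}^{(w)}$, and $\tilde{L}^{(m)}$ and $\tilde{L}^{(w)}$ are defined by Eqs. (33) and (35), respectively. Since $X^\top \tilde{\varphi}$ in Eq. (16) belongs to the range of $X^\top$, it can be expressed by using some vector $\tilde{\alpha} \in \mathbb{R}^n$ as

\[ X^\top \tilde{\varphi} = X^\top X \tilde{\alpha} = K \tilde{\alpha}, \]

where $K$ is the $n$-dimensional matrix with the $(i, j)$-th element being

\[ K_{i,j} \equiv x_i^\top x_j. \]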

Then multiplying Eq. (16) by $X^\top$ from the left-hand side yields

\[ K \tilde{L}^{(b)} K \tilde{\alpha} = \tilde{\lambda} K \tilde{L}^{(w)} K \tilde{\alpha}. \tag{17} \]

This implies that $\{x_i\}_{i=1}^{n}$ appear only in terms of their inner products. Therefore, we can obtain a non-linear variant of LFDA by the kernel trick (Vapnik, 1998; Schölkopf et al., 1998), which is explained below.

Let us consider a non-linear mapping $\phi(x)$ from $\mathbb{R}^d$ to a reproducing kernel Hilbert space $\mathcal{H}$ (Aronszajn, 1950). Let $K(x, x')$ be the reproducing kernel of $\mathcal{H}$. A typical choice of the kernel function would be the Gaussian kernel:

\[ K(x, x') = \exp\left( -\frac{\| x - x' \|^2}{2\sigma^2} \right), \]

with $\sigma > 0$. For other choices, see, for example, Wahba (1990), Vapnik (1998), and Schölkopf and Smola (2002). Because of the reproducing property of $K(x, x')$, $K$ is now the kernel matrix, that is, the $(i, j)$-th element is given by

\[ K_{i,j} = \langle \phi(x_i), \phi(x_j) \rangle = K(x_i, x_j), \]

where $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathcal{H}$.

It can be confirmed that $\tilde{L}^{(w)}$ is always degenerate (since $\tilde{L}^{(w)} (1, 1, \ldots, 1)^\top$ always vanishes; see Eq. 35 for detail). Therefore, $K \tilde{L}^{(w)} K$ is always degenerate and we cannot directly solve the generalized eigenvalue problem (17). To cope with this problem, we propose regularizing $K \tilde{L}^{(w)} K$ and solving the following generalized eigenvalue problem instead (cf. Friedman, 1989):

\[ K \tilde{L}^{(b)} K \tilde{\alpha} = \tilde{\lambda} (K \tilde{L}^{(w)} K + \varepsilon I_n) \tilde{\alpha}, \tag{18} \]

where $\varepsilon$ is a small constant. Let $\{\tilde{\alpha}_k\}_{k=1}^{n}$ be the generalized eigenvectors associated with the generalized eigenvalues $\tilde{\lambda}_1 \ge \tilde{\lambda}_2 \ge \cdots \ge \tilde{\lambda}_n$ of Eq. (18). Then the embedded image of $\phi(x')$ in $\mathcal{H}$ is given by

\[ (\sqrt{\tilde{\lambda}_1}\,\tilde{\alpha}_1 | \sqrt{\tilde{\lambda}_2}\,\tilde{\alpha}_2 | \cdots | \sqrt{\tilde{\lambda}_r}\,\tilde{\alpha}_r)^\top \begin{pmatrix} K(x_1, x') \\ K(x_2, x') \\ \vdots \\ K(x_n, x') \end{pmatrix}. \]

We call this kernelized variant of LFDA kernel LFDA (KLFDA).

Recently, kernel functions for non-vectorial structured data such as strings, trees, and graphs have been proposed (see, e.g., Lodhi et al., 2002; Duffy and Collins, 2002; Kashima and Koyanagi, 2002; Kondor and Lafferty, 2002; Kashima et al., 2003; Gärtner et al., 2003; Gärtner, 2003). Since KLFDA uses the samples only via the kernel function $K(x, x')$, it allows us to reduce the dimensionality of such non-vectorial data.

4. Comparison with Related Methods

In this section, we discuss the relation between the proposed LFDA and other methods.

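4.1 Dimensionality Reduction Using Local Discriminant Information

A discriminant adaptive nearest neighbor (DANN) classifier (Hastie and Tibshirani, 1996a) employs an adapted distance metric at each test point for classification. Based on a similar idea, they also proposed a global supervised dimensionality reduction method using local discriminant information (LDI) in the same paper. We refer to this supervised dimensionality reduction method as LDI. The main idea of LDI is to localize FDA, which is very similar to the proposed LFDA. Here we discuss the relation between LDI and LFDA.

In LDI, the data samples $\{x_i\}_{i=1}^{n}$ are first sphered according to the within-class scatter matrix $S^{(w)}$, that is, for $i = 1, 2, \ldots, n$,

\[ \bar{x}_i \equiv (S^{(w)})^{-\frac{1}{2}} x_i. \]

Let $\bar{A}_{i,j}$ be the weight of sample $\bar{x}_j$ around $\bar{x}_i$ defined by

\[ \bar{A}_{i,j} \equiv \begin{cases} \left[ 1 - \left( \dfrac{\| \bar{x}_i - \bar{x}_j \|}{\| \bar{x}_i - \bar{x}_i^{(K)} \|} \right)^3 \right]^3 & \text{if } \| \bar{x}_i - \bar{x}_j \| < \| \bar{x}_i - \bar{x}_i^{(K)} \|, \\ 0 & \text{otherwise,} \end{cases} \]

where $\bar{x}_i^{(K)}$ is the $K$-th nearest neighbor of $\bar{x}_i$ in the sphered space. Note that $0 \le \bar{A}_{i,j} \le 1$ and $\bar{A}_{i,j}$ is non-increasing as $\| \bar{x}_i - \bar{x}_j \|$ increases. Thus it has the same meaning as our affinity matrix. $K$ is suggested to be determined by $K = \max(n/5, 50)$.

Let $\bar{\mu}^{[i]}_l$ be the local weighted mean of the sphered samples in class $l$ around $\bar{x}_i$, and let $\bar{\mu}^{[i]}$ be the local weighted mean of the sphered samples around $\bar{x}_i$:

\[ \bar{\mu}^{[i]}_l \equiv \frac{1}{\bar{n}^{[i]}_l} \sum_{j: y_j = l} \bar{A}_{i,j} \bar{x}_j, \qquad \bar{\mu}^{[i]} \equiv \frac{1}{\bar{n}^{[i]}} \sum_{j=1}^{n} \bar{A}_{i,j} \bar{x}_j = \frac{1}{\bar{n}^{[i]}} \sum_{l=1}^{c} \bar{n}^{[i]}_l \bar{\mu}^{[i]}_l, \]

where

\[ \bar{n}^{[i]}_l \equiv \sum_{j: y_j = l} \bar{A}_{i,j}, \qquad \bar{n}^{[i]} \equiv \sum_{j=1}^{n} \bar{A}_{i,j}. \]

Let $\bar{S}^{(b)}$ be the average between sum-of-squares matrix defined as

\[ \bar{S}^{(b)} \equiv \sum_{i=1}^{n} \frac{1}{\bar{n}^{[i]}} \sum_{l=1}^{c} \bar{n}^{[i]}_l (\bar{\mu}^{[i]}_l - \bar{\mu}^{[i]})(\bar{\mu}^{[i]}_l - \bar{\mu}^{[i]})^\top. \]

The LDI transformation matrix $\bar{T}_{\mathrm{LDI}}$ is defined as

\[ \bar{T}_{\mathrm{LDI}} \equiv \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left[ \operatorname{tr}\left( T^\top \bar{S}^{(b)} T \right) \right] \quad \text{subject to} \quad T^\top T = I_r. \]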

$\bar{T}_{\mathrm{LDI}}$ is a transformation matrix for sphered samples; the LDI transformation matrix $T_{\mathrm{LDI}}$ for non-sphered samples is given by

\[ T_{\mathrm{LDI}} = (S^{(w)})^{-\frac{1}{2}} \bar{T}_{\mathrm{LDI}}. \]

Similar to FDA (and LFDA), $T_{\mathrm{LDI}}$ can be efficiently computed by solving a generalized eigenvalue problem.

The average between sum-of-squares matrix $\bar{S}^{(b)}$ is conceptually very similar to the local between-class scatter matrix $\tilde{S}^{(b)}$ in LFDA. Indeed, as proved in Appendix E, we can express $\bar{S}^{(b)}$ in a pairwise manner as

\[ \bar{S}^{(b)} = \frac{1}{2} \sum_{i,j=1}^{n} \bar{W}^{(b)}_{i,j} (\bar{x}_i - \bar{x}_j)(\bar{x}_i - \bar{x}_j)^\top, \tag{19} \]

where

\[ \bar{W}^{(b)}_{i,j} \equiv \begin{cases} \displaystyle \sum_{k=1}^{n} \frac{1}{\bar{n}^{[k]}} \left( \frac{1}{\bar{n}^{[k]}} - \frac{1}{\bar{n}^{[k]}_l} \right) \bar{A}_{i,k} \bar{A}_{j,k} & \text{if } y_i = y_j = l, \\[2ex] \displaystyle \sum_{k=1}^{n} \frac{1}{(\bar{n}^{[k]})^2} \bar{A}_{i,k} \bar{A}_{j,k} & \text{if } y_i \ne y_j. \end{cases} \tag{20} \]

However, there exist critical differences between LDI and LFDA. A significant difference is that the values for the sample pairs in different classes are also localized in LDI (see Eq. 20), while they are kept unlocalized in LFDA (see Eq. 11). This implies that far apart sample pairs in different classes could be made close in LDI, which is not desirable in supervised dimensionality reduction. Furthermore, the computation of $\bar{S}^{(b)}$ is slightly less efficient than that of $\tilde{S}^{(b)}$ since $\bar{W}^{(b)}$ includes the summation over $k$.

Another important difference between LDI and LFDA is that the within-class scatter matrix $S^{(w)}$ is not localized in LDI. However, as we showed in Section 3.1, the within-class scatter matrix $S^{(w)}$ also accounts for collapsing the within-class multimodal structure (i.e., far apart sample pairs in the same class are made close). This phenomenon is experimentally confirmed in Section 5.

4.2 Mixture Discriminant Analysis

FDA can be interpreted as maximum likelihood estimation of Gaussian distributions with common covariance and different means for each class. Based on this view, Hastie and Tibshirani (1996b) proposed mixture discriminant analysis (MDA), which extends FDA to maximum likelihood estimation of Gaussian mixture distributions.

A maximum likelihood solution is obtained by an EM-type algorithm (cf. Dempster et al., 1977). However, this is an iterative algorithm and gives only a locally optimal solution. Therefore, the computation of MDA is rather slow and there is no guarantee that the global solution can be obtained. Furthermore, the number of mixture components (clusters) in each class as well as the initial location of cluster centers should be determined by users. For cluster centers, using standard techniques such as k-means clustering (MacQueen, 1967; Everitt et al., 2001) or learning vector quantization (Kohonen, 1989) is recommended. However, they are also iterative algorithms and have no guarantee that the global solution can be obtained. Furthermore, there seems to be no systematic method for determining the number of clusters.

On the other hand, the proposed LFDA contains no tuning parameters (given that the affinity matrix is determined by the local scaling method, see Appendix D.4) and the global solution can be obtained analytically. However, it still lacks a probabilistic interpretation, which currently remains an open issue.

4.3 Neighborhood Component Analysis

Goldberger et al. (2005) proposed a supervised dimensionality reduction method called neighborhood component analysis (NCA). The NCA transformation matrix $T_{\mathrm{NCA}}$ is defined as follows:

\[ T_{\mathrm{NCA}} \equiv \operatorname*{argmax}_{T \in \mathbb{R}^{d \times r}} \left( \sum_{i=1}^{n} \sum_{j: y_j = y_i} p_{i,j}(T T^\top) \right), \tag{21} \]

where

\[ p_{i,j}(U) \equiv \begin{cases} \dfrac{\exp\{ -(x_i - x_j)^\top U (x_i - x_j) \}}{\sum_{k \ne i} \exp\{ -(x_i - x_k)^\top U (x_i - x_k) \}} & \text{if } i \ne j, \\ 0 & \text{if } i = j. \end{cases} \tag{22} \]

The above definition corresponds to maximizing the expected number of correctly classified samples by a stochastic variant of nearest neighbor classifiers. Therefore, NCA seeks a transformation matrix $T$ such that the between-class separability is maximized. Eqs. (21) and (22) imply that nearby data pairs in the same class are made close, which is similar to the proposed LFDA. Indeed, the simulation results in Section 5.2 show that NCA tends to preserve the multimodal structure of the data very well.

However, a crucial weakness of NCA is optimization: the optimization problem (21) is non-convex. Therefore, there is no guarantee that the globally optimal solution can be obtained. Goldberger et al. (2005) proposed using a gradient ascent method for optimization:

\[ T \leftarrow T + \varepsilon \nabla J_{\mathrm{NCA}}(T), \tag{23} \]

where $\varepsilon$ $(> 0)$ is the step size and the gradient $\nabla J_{\mathrm{NCA}}(T)$ is given by

\[ \nabla J_{\mathrm{NCA}}(T) = 2 \sum_{i=1}^{n} \left( \Big\{ \sum_{j: y_j = y_i} p_{i,j}(T T^\top) \Big\} \Big\{ \sum_{j=1}^{n} p_{i,j}(T T^\top)(x_i - x_j)(x_i - x_j)^\top \Big\} - \sum_{j: y_j = y_i} p_{i,j}(T T^\top)(x_i - x_j)(x_i - x_j)^\top \right) T. \]

The gradient ascent iteration (23) is computationally rather inefficient. Also, the choice of the step size $\varepsilon$ is troublesome. If the step size is small enough, the convergence to one of the local optima is guaranteed but such a choice makes the convergence very slow; on the other hand, if the step size is too large, gradient flows oscillate and proper convergence properties may not be guaranteed anymore. Furthermore, the choice of the termination condition in the iterative algorithm is often cumbersome in practice.

Because of the non-convexity of the optimization problem, the quality of the obtained solution depends on the initialization of the matrix $T$. A useful heuristic to alleviate the local optimum problem is to employ the FDA (or LFDA) result as an initial matrix for optimization (Goldberger et al., 2005). In the experiments in Section 5, using the LFDA result as an initial matrix appears to be better than the random initialization. However, the local optima problem still remains even with the above heuristic.
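To make the gradient ascent iteration (23) concrete, the following NumPy sketch evaluates the NCA objective of Eq. (21) and its gradient, and then runs a fixed number of ascent steps. This is not the MATLAB implementation referred to in the paper: the function names, the toy data, the random initialization, the step size and the iteration count are all assumptions chosen for illustration, and the gradient is written as a $d \times r$ matrix so that it can be added to $T$ directly.

```python
import numpy as np


def nca_probabilities(X, T):
    """Matrix of p_ij(T T^T) from Eq. (22) for samples stored as columns of X."""
    Z = T.T @ X                                          # embedded samples, r x n
    sq_dist = np.sum((Z[:, :, None] - Z[:, None, :]) ** 2, axis=0)
    np.fill_diagonal(sq_dist, np.inf)                    # enforces p_ii = 0
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)          # numerical stabilization
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)


def nca_objective_and_gradient(X, y, T):
    """Objective of Eq. (21) and its gradient with respect to T (shape d x r)."""
    P = nca_probabilities(X, T)
    same = y[:, None] == y[None, :]
    P_same = P * same                                    # p_ij restricted to same-class pairs
    p_i = P_same.sum(axis=1)                             # sum_{j: y_j = y_i} p_ij
    # Pair weights: p_i * p_ij for all j, minus p_ij for same-class j.
    W = p_i[:, None] * P - P_same
    W_sym = W + W.T                                      # outer products are symmetric in (i, j)
    M = X @ (np.diag(W_sym.sum(axis=1)) - W_sym) @ X.T   # sum_ij W_ij (x_i-x_j)(x_i-x_j)^T
    return p_i.sum(), 2.0 * M @ T


# Toy data and a hand-picked step size (both are assumptions of this sketch).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(-1.0, 1.0, (3, 15)), rng.normal(1.0, 1.0, (3, 15))])
y = np.repeat([0, 1], 15)
T = 0.5 * rng.normal(size=(3, 2))
eps = 1e-4

history = []
for _ in range(200):
    J, grad = nca_objective_and_gradient(X, y, T)
    history.append(J)
    T = T + eps * grad                                   # gradient ascent step, Eq. (23)
```

The fixed step size and iteration count sidestep exactly the two choices the text describes as troublesome; a practical implementation would need a line search or a convergence test in their place.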

15 LOCAL FISHER DISCRIMINANT ANALYSIS Whe a dimesioality reductio techique is applied to classificatio tasks, we ofte wat to embed the data samples ito spaces with several differet dimesios the best dimesioality is later chose by, for example, cross-validatio (Stoe, 1974; Wahba, 1990). I such a sceario, NCA requires to optimize the trasformatio matrix idividually for each dimesioality r of the embeddig space. O the other had, LFDA eeds to compute the trasformatio matrix oly oce for the largest r; its sub-matrices become the optimal solutios for smaller dimesios. Therefore, LFDA is computatioally more efficiet tha NCA i this sceario. A simple MATLAB implemetatio of NCA is available. 4 We use this software i Sectio Maximally Collapsig Metric Learig I order to overcome the computatioal problem of NCA, Globerso ad Roweis (2006) proposed a alterative method called maximally collapsig metric learig (MCML). Let p i, j be the ideal value of p i, j(u) defied by Eq. (22): where p i, j is ormalized so that p i, j { 1 if yi = y j, 0 if y i y j, p i, j = 1. j i p i, j ca be attaied if all samples i the same class collapse ito a sigle poit while samples i other classes are mapped to other locatios. I reality, however, ay U may ot be able to attai p i, j (U) = p i, j exactly; istead the optimal approximatio to p i, j uder the Kullback-Leibler divergece (Kullback ad Leibler, 1951) is obtaied. This is formally defied as U MCML argmi U R d d ( ) p p i, j i, j log p i, j (U) subject to U PSD(r), (24) where PSD(r) is the set of all positive semidefiite matrices of rak r (i.e., r eigevalues are positive ad others are zero). Oce U MCML is obtaied, the MCML trasformatio matrix T MCML is computed by T MCML = (φ 1 φ 2 φ r ), (25) where {φ k } r k=1 are the eigevectors associated with the positive eigevalues η 1 η 2 η r > 0 of the followig eigevalue problem: U MCML φ = ηφ. Oe of the motivatios of MCML is to alleviate the difficulty of optimizatio i NCA. 
However, MCML still has a weakness in optimization: the optimization problem (24) is convex only when $r = d$, that is, when the dimensionality is not reduced and only the distance metric of the original space is changed. This means that if $r < d$ (which is our primary focus in this paper), we may not be able to obtain the globally optimal solution.

4. Implementation available at fowlkes/software/nca/.

Globerson and Roweis (2006) proposed the following heuristic algorithm to approximate $T_{\mathrm{MCML}}$. First, the optimization problem (24) with $r = d$ is solved:

$$\widehat{U}_{\mathrm{MCML}} \equiv \operatorname*{argmin}_{U \in \mathbb{R}^{d \times d}} \left( \sum_{i,j} p^*_{i,j} \log \frac{p^*_{i,j}}{p_{i,j}(U)} \right) \quad \text{subject to } U \in \mathrm{PSD}(d). \qquad (26)$$

Although Eq. (26) is convex, an analytic form of the unique optimal solution $\widehat{U}_{\mathrm{MCML}}$ is not yet known. Globerson and Roweis (2006) proposed using the following alternating iterative procedure for obtaining $\widehat{U}_{\mathrm{MCML}}$:

$$U \longleftarrow U - \varepsilon \nabla J_{\mathrm{MCML}}(U), \qquad (27)$$

$$U \longleftarrow \sum_{k=1}^{d} \max(0, \eta_k)\, \phi_k \phi_k^\top, \qquad (28)$$

where $\varepsilon$ ($> 0$) is the step size, $\eta_k$ and $\phi_k$ are the eigenvalues and eigenvectors of $U$, and the gradient $\nabla J_{\mathrm{MCML}}(U)$ is given by

$$\nabla J_{\mathrm{MCML}}(U) = \sum_{i,j} \left( p^*_{i,j} - p_{i,j}(U) \right) (x_i - x_j)(x_i - x_j)^\top.$$

Then the eigenvalue decomposition of $\widehat{U}_{\mathrm{MCML}}$ is carried out, and the eigenvalues $\widehat{\eta}_1 \geq \widehat{\eta}_2 \geq \cdots \geq \widehat{\eta}_d$ and associated eigenvectors $\{\widehat{\phi}_k\}_{k=1}^{d}$ are obtained:

$$\widehat{U}_{\mathrm{MCML}} \widehat{\phi} = \widehat{\eta}\, \widehat{\phi}.$$

Finally, $\{\phi_k\}_{k=1}^{r}$ in Eq. (25) are replaced by $\{\widehat{\phi}_k\}_{k=1}^{r}$, which yields

$$T_{\mathrm{MCML}} \approx (\widehat{\phi}_1 \mid \widehat{\phi}_2 \mid \cdots \mid \widehat{\phi}_r). \qquad (29)$$

This approximation is shown to be practically useful (Globerson and Roweis, 2006), although there seems to be no theoretical analysis of it.

MCML may have an advantage over NCA in computation: there exists the analytic approximation (29), which can be computed efficiently using the solution of another convex optimization problem (26). However, MCML still relies on the gradient-based alternating iterative algorithm of Eqs. (27) and (28) to solve the convex optimization problem (26), which is computationally very expensive since the eigenvalue decomposition of a $d$-dimensional matrix must be carried out in each iteration (see Eq. 28). Furthermore, the difficulty of appropriately choosing the step size and the termination condition in the iterative procedure still remains.

Since MCML requires all the samples in the same class to collapse into a single point, it is not necessarily useful in dimensionality reduction of multimodal data samples.
Furthermore, the MCML results can be significantly influenced by outliers since the outliers are also required to collapse into the same single point together with other samples. This phenomenon is illustrated in Figure 3, where a single outlier significantly changes the MCML result.

Globerson and Roweis (2006) showed that the sufficient statistics of the MCML algorithm are pointwise scatter matrices (cf. Section 3.3). Since LFDA also has an interpretation in terms of pointwise scatter matrices, there may be a link between LFDA and MCML, which needs to be investigated in future work.
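The alternating procedure of Eqs. (27) and (28), followed by the eigenvector truncation of Eq. (29), can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation of Globerson and Roweis (2006): it assumes the softmax form of $p_{i,j}(U)$ for Eq. (22), uses a fixed step size and a fixed iteration count chosen arbitrarily, and all function names and the random toy data are hypothetical.

```python
import numpy as np


def ideal_probs(y):
    """p*_{i,j}: uniform over other samples of the same class (needs >= 2 per class)."""
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    return same / same.sum(axis=1, keepdims=True)


def model_probs(X, U):
    """ASSUMED form of Eq. (22): softmax over j != i of -(x_i - x_j)^T U (x_i - x_j)."""
    diff = X[:, None, :] - X[None, :, :]                # shape (n, n, d)
    dist = np.einsum("ijk,kl,ijl->ij", diff, U, diff)   # pairwise quadratic forms
    np.fill_diagonal(dist, np.inf)                      # exclude j == i
    w = np.exp(-(dist - dist.min(axis=1, keepdims=True)))  # shift rows for stability
    return w / w.sum(axis=1, keepdims=True)


def gradient(X, P_star, P):
    """Gradient of the objective: sum_ij (p*_ij - p_ij(U)) (x_i - x_j)(x_i - x_j)^T."""
    diff = X[:, None, :] - X[None, :, :]
    return np.einsum("ij,ijk,ijl->kl", P_star - P, diff, diff)


def project_psd(U):
    """Eq. (28): clip negative eigenvalues of the symmetric matrix U to zero."""
    eta, phi = np.linalg.eigh((U + U.T) / 2.0)
    projected = (phi * np.maximum(eta, 0.0)) @ phi.T
    return (projected + projected.T) / 2.0              # remove rounding asymmetry


def mcml_full_rank(X, y, step=1e-3, n_iter=100):
    """Alternate the gradient step (27) and the projection (28) for r = d."""
    U = np.eye(X.shape[1])
    P_star = ideal_probs(y)
    for _ in range(n_iter):
        U = U - step * gradient(X, P_star, model_probs(X, U))
        U = project_psd(U)
    return U


def leading_eigenvectors(U, r):
    """Eq. (29): keep the eigenvectors of the r largest eigenvalues as columns."""
    eta, phi = np.linalg.eigh(U)
    order = np.argsort(eta)[::-1][:r]
    return phi[:, order]


# Hypothetical toy data: two Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 3)), rng.normal(2.0, 1.0, (10, 3))])
y = np.repeat([0, 1], 10)

U_hat = mcml_full_rank(X, y)
T_hat = leading_eigenvectors(U_hat, r=2)
print(U_hat.shape, T_hat.shape)
```

The sketch makes the cost structure discussed above visible: every pass through the loop calls `np.linalg.eigh` on a $d \times d$ matrix, and both `step` and `n_iter` have to be chosen by hand.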

[Figure 3 panels: (a) Toy data set 1; (b) Toy data set 1′, with the outlier indicated. Each panel shows the LFDA and MCML results.]

Figure 3: Toy examples of dimensionality reduction. The toy data set 1 is equivalent to the one used in Figure 1(a). The data set 1′ includes a single outlier.

4.5 Remark on Rank Constraint

The optimization problem of MCML (see Eq. 24) is not generally convex since the rank constraint is non-convex (Boyd and Vandenberghe, 2004). The non-convexity induced by the rank constraint seems to be a universal problem in dimensionality reduction. NCA eliminates the rank constraint by decomposing $U$ into $T T^\top$ (see Eqs. 21 and 22). However, even with this decomposition, the optimization problem is still non-convex. On the other hand, FDA, LDI, and LFDA cast the optimization problem in the form of the Rayleigh quotient. This is computationally very advantageous since it allows us to determine the range of the embedding space analytically. However, we cannot determine the distance metric in the embedding space since the Rayleigh quotient is invariant under linear transformations. For this reason, an additional criterion is needed to determine the distance metric (see also Section 3.3).

5. Numerical Examples

In this section, we numerically evaluate the performance of LFDA and existing methods.

5.1 Exploratory Data Analysis

Here we use the Thyroid disease data set available from the UCI machine learning repository (Blake and Merz, 1998) and illustrate how LFDA can be used for exploratory data analysis. The original data consists of a 5-dimensional input vector $x$ of the following laboratory tests.

1. T3-resin uptake test.
2. Total serum thyroxin as measured by the isotopic displacement method.
3. Total serum triiodothyronine as measured by radioimmuno assay.
4. Basal thyroid-stimulating hormone (TSH) as measured by radioimmuno assay.
5. Maximal absolute difference of TSH value after injection of 200 micro grams of thyrotropin-releasing hormone as compared to the basal value.

The task is to predict whether patients' thyroids are euthyroidism, hypothyroidism, or hyperthyroidism (Coomans et al., 1983), that is, whether patients' thyroids are normal, hypo-functioning, or hyper-functioning (Blake and Merz, 1998). The diagnosis (the class label) is based on a complete medical record, including anamnesis, scan, etc. Here we merge the hypothyroidism class and the hyperthyroidism class into a single class and create binary labeled data (whether thyroids are normal or not). Our goal is to predict whether patients' thyroids are normal, hypo-functioning, or hyper-functioning from the binary labeled data samples.

Figure 4 depicts the histograms of the first feature values obtained by FDA and LFDA; the top row corresponds to the sick patients and the bottom row corresponds to the healthy patients. This shows that both FDA and LFDA separate the patients with normal thyroids from sick patients reasonably well. In addition to between-class separability, LFDA clearly preserves the multimodal structure among sick patients (i.e., hypo-functioning and hyper-functioning), which is lost by ordinary FDA. Another interesting finding from the figure is that the first feature values obtained by LFDA have a strong negative correlation with the functioning level of thyroids; this could be used for predicting the functioning level of thyroids.

[Figure 4 panels: (a) FDA; (b) LFDA. Top row: Hyperthyroidism and Hypothyroidism; bottom row: Euthyroidism. Horizontal axis: First Feature.]

Figure 4: Histograms of the first feature values obtained by FDA and LFDA for the Thyroid disease data set. The top row corresponds to the sick patients and the bottom row corresponds to the healthy patients.

Table 1: Two-class data sets used for visualization experiments ($r = 2$).

Data set              d     Merged class            Other class
Letter recognition    16    A & C                   B
Iris                  4     Setosa & Virginica      Versicolour

5.2 Data Visualization

Here we apply the proposed and existing dimensionality reduction methods to benchmark data sets and investigate how they behave in data visualization tasks. We use the Letter recognition data set and the Iris data set available from the UCI machine learning repository (Blake and Merz, 1998). Table 1 describes the specifications of the data sets. Each data set contains three types of samples, each plotted with its own symbol. We merged the first two types (A and C for Letter recognition; Setosa and Virginica for Iris) into a single class and created two-class problems. We test LFDA, FDA, LPP, LDI, NCA, and MCML and evaluate the between-class separability (i.e., the two merged types are well separated from the third type) and the within-class multimodality preservation capability (i.e., each of the two merged types is well grouped). For LPP and LFDA, we determined the affinity matrix by the local scaling method (see Appendix D.4). For NCA, we used the LFDA result as an initial matrix since this initialization scheme appears to work better than random initialization. FDA allows us to extract only one meaningful feature in two-class classification problems (see Section 2.2), so we choose the second feature randomly here.

Figures 5 and 6 depict the samples embedded in the two-dimensional space found by each method. The horizontal axis is the first feature found by each method, while the vertical axis is the second feature.

First, we compare the embedding results of LFDA with those of FDA and LPP. For the Letter recognition data set (see the top row of Figure 5), LFDA nicely separates samples in different classes from each other, and at the same time, it clearly preserves within-class multimodality. FDA separates A and C from B well, but within-class multimodality is lost, that is, A and C are mixed. LPP gives two separate clusters of samples, but samples in different classes are mixed in one of the clusters.
For the Iris data set (see the top row of Figure 6), LFDA simultaneously achieves between-class separation and within-class multimodality preservation. On the other hand, FDA tends to mix samples in different classes, which would be caused by within-class multimodality. LPP also works well for this data set because the three clusters are well separated from each other in the original high-dimensional space. Overall, LFDA is found to be more appropriate for embedding labeled multimodal data samples than FDA and LPP, implying that our primary goal has been successfully achieved.

Next, we compare the results of LFDA with those of LDI, NCA, and MCML. For the Letter recognition data set (see Figure 5), LFDA, LDI, NCA, and MCML separate the samples in different classes from each other very well. However, LDI and MCML collapse A and C into a single cluster, while LFDA and NCA preserve the multimodal structure clearly. The NCA result is almost identical to the LFDA result (i.e., the initial value of the NCA iteration), but the result may vary if the initial value for the gradient ascent algorithm is changed. For the Iris data set (see Figure 6), LFDA, LDI, and NCA work excellently in both between-class separation and within-class multimodality preservation. On the other hand, MCML mixes the samples in different classes. Overall, LDI works fairly well, but the within-class multimodal structure is sometimes lost since LDI only partially takes within-class multimodality into account (see Section 4.1). NCA also works very well, which implies that the heuristic of using the LFDA result as an initial value is useful. However, NCA does not provide significant performance improvement over LFDA in the above simulations. The MCML results have similar tendencies to FDA.

Based on the above simulation results, we conclude that LFDA is a promising method in the visualization of multimodal labeled data.

[Figure 5 panels: LFDA, FDA, LPP, LDI, NCA, MCML. Legend: A, C, B.]

Figure 5: Visualization of the Letter recognition data set.

[Figure 6 panels: LFDA, FDA, LPP, LDI, NCA, MCML. Legend: Setosa, Virginica, Versicolour.]

Figure 6: Visualization of the Iris data set.

5.3 Classification

Here we apply the proposed and existing dimensionality reduction techniques to classification tasks, and objectively evaluate the effectiveness of LFDA.

There are several measures for quantitatively evaluating separability of data samples in different classes (e.g., Fukunaga, 1990; Globerson et al., 2005). Here we use a simple one: the misclassification rate by a one-nearest-neighbor classifier. As explained in Section 3.3, the LFDA criterion is invariant under linear transformations, while the misclassification rate by a one-nearest-neighbor classifier depends on the distance metric. This means that the following simulation results are highly dependent on the normalization scheme (15).

We employ the IDA data sets,5 which are standard binary classification data sets originally used in Rätsch et al. (2001). In addition, we use two binary classification data sets created from the USPS handwritten digit data set. The first task (USPS-eo) is to separate even numbers from odd numbers and the second task (USPS-sl) is to separate small numbers ('0' to '4') from large numbers ('5' to '9'). For training and testing, 100 samples are randomly chosen for each digit. Table 2 summarizes the specifications of the data sets. The ringnorm, twonorm, and waveform data sets contain features with only noise. The thyroid, waveform, USPS-eo, and USPS-sl data sets contain intrinsic within-class multimodal structures since they are converted from multi-class problems by merging some of the classes. The banana data set is also multimodal.

We test LFDA, LDI, NCA, MCML, LPP, and principal component analysis (PCA). Note that LPP and PCA are unsupervised dimensionality reduction methods, while the others are supervised methods. NCA is not tested for the diabetes, flare-solar, image, splice, USPS-eo, and USPS-sl data sets, and MCML is not tested for the flare-solar and USPS-eo data sets, since the execution time is too long.

Figure 7 depicts the mean misclassification rate by a one-nearest-neighbor classifier as functions of the dimensionality $r$ of the reduced space. The error bars are omitted for clear visibility. Instead, we plotted the results of the following significance test: for each dimensionality $r$, the mean misclassification rates of the best method and comparable ones based on the t-test (Henkel, 1979) at the significance level 5% are marked.

Table 2: List of binary classification data sets. Marked data sets contain intrinsic within-class multimodal structures.

[Table 2 columns: Data name, Input dimensionality, # of training samples, # of test samples, # of realizations. Rows: banana, breast-cancer, diabetes, flare-solar, german, heart, image, ringnorm, splice, thyroid, titanic, twonorm, waveform, USPS-eo, USPS-sl. The numerical entries and markers are not preserved here.]

Table 3: Means and standard deviations of the misclassification rate when the embedding dimensionality is chosen by cross validation. For each data set, the best method and comparable ones based on the t-test at the significance level 5% are marked. Marked data sets contain the intrinsic within-class multimodal structure. Entries shown as "..." and the markers are not preserved here.

Data set                   LFDA          LDI    NCA           MCML          LPP    PCA
banana                     13.7 ± ...    ...    ...           ...           ...    ... ± 0.8
breast-cancer              34.7 ± ...    ...    ...           ...           ...    ... ± 5.0
diabetes                   32.0 ± ...    ...    not tested    ...           ...    ... ± 3.0
flare-solar                39.2 ± ...    ...    not tested    not tested    ...    ... ± 5.1
german                     29.9 ± ...    ...    ...           ...           ...    ... ± 2.4
heart                      21.9 ± ...    ...    ...           ...           ...    ... ± 3.5
image                      3.2 ± ...     ...    not tested    ...           ...    ... ± 0.5
ringnorm                   21.1 ± ...    ...    ...           ...           ...    ... ± 1.4
splice                     16.9 ± ...    ...    not tested    ...           ...    ... ± 1.3
thyroid                    4.6 ± ...     ...    ...           ...           ...    ... ± 2.6
titanic                    33.1 ± ...    ...    ...           ...           ...    ... ± 12.0
twonorm                    3.5 ± ...     ...    ...           ...           ...    ... ± 0.6
waveform                   12.5 ± ...    ...    ...           ...           ...    ... ± 1.2
USPS-eo                    9.0 ± ...     ...    not tested    not tested    ...    ... ± 0.7
USPS-sl                    ...           ...    not tested    ...           ...    ... ± 0.8
Computation time (ratio)   ...           ...    ...           ...           ...    ...

5. Data sets available at
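The evaluation measure used in this section, and its dependence on the distance metric, can be illustrated with a short sketch. The helper names (`one_nn_error`, `rescale`) and the three toy points are hypothetical; the example only shows that an invertible linear rescaling of the embedded features, which would leave a Rayleigh-quotient criterion unchanged, can alter the one-nearest-neighbor misclassification rate.

```python
def one_nn_error(train_x, train_y, test_x, test_y):
    """Misclassification rate of a one-nearest-neighbor classifier (Euclidean distance)."""
    errors = 0
    for x, y in zip(test_x, test_y):
        nearest = min(
            range(len(train_x)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, train_x[k])),
        )
        errors += train_y[nearest] != y
    return errors / len(test_x)


def rescale(points, factors):
    """Apply the same diagonal linear transformation to every point."""
    return [[v * f for v, f in zip(p, factors)] for p in points]


# Hypothetical embedded samples: two training points and one test point.
train_x = [[0.0, 0.0], [1.0, 2.0]]
train_y = [0, 1]
test_x = [[1.0, 0.0]]
test_y = [0]

err_original = one_nn_error(train_x, train_y, test_x, test_y)

# Shrinking the second feature changes which training point is nearest.
factors = [1.0, 0.1]
err_rescaled = one_nn_error(
    rescale(train_x, factors), train_y, rescale(test_x, factors), test_y
)
print(err_original, err_rescaled)
```

In this toy case the test point is classified correctly before the rescaling and incorrectly after it, which is why the choice of normalization matters for the results reported here.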
The results show that LFDA works quite well, but overall there is no single best method that consistently outperforms the others.

Table 3 describes the mean and standard deviation of the misclassification rate of each method when the embedding dimensionality $r$ is chosen by 5-fold cross validation (Stone, 1974; Wahba, 1990); for the USPS-eo and USPS-sl data sets, we used 20-fold cross validation since this was more accurate. For each data set, the best method and comparable ones based on the t-test at the significance level 5% are marked. The table shows that overall LFDA has excellent

[Figure 7 panels: banana, breast-cancer, diabetes, flare-solar, german, heart, image, ringnorm, splice, thyroid, titanic, twonorm, waveform, USPS-eo, USPS-sl. Horizontal axis: Reduced Dimension r; vertical axis: Mean Misclassification Rate. Legend: LFDA, LDI, NCA, MCML, LPP, PCA.]

Figure 7: Mean misclassification rates by a one-nearest-neighbor method as functions of the dimensionality of the embedding space. For each dimension, the best method and comparable ones based on the t-test at the significance level 5% are marked.


More information

Finding the circle that best fits a set of points

Finding the circle that best fits a set of points Fidig the circle that best fits a set of poits L. MAISONOBE October 5 th 007 Cotets 1 Itroductio Solvig the problem.1 Priciples............................... Iitializatio.............................

More information

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows: Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network

More information

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology Adoptio Date: 4 March 2004 Effective Date: 1 Jue 2004 Retroactive Applicatio: No Public Commet Period: Aug Nov 2002 INVESTMENT PERFORMANCE COUNCIL (IPC) Preface Guidace Statemet o Calculatio Methodology

More information

Totally Corrective Boosting Algorithms that Maximize the Margin

Totally Corrective Boosting Algorithms that Maximize the Margin Mafred K. Warmuth mafred@cse.ucsc.edu Ju Liao liaoju@cse.ucsc.edu Uiversity of Califoria at Sata Cruz, Sata Cruz, CA 95064, USA Guar Rätsch Guar.Raetsch@tuebige.mpg.de Friedrich Miescher Laboratory of

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

Review: Classification Outline

Review: Classification Outline Data Miig CS 341, Sprig 2007 Decisio Trees Neural etworks Review: Lecture 6: Classificatio issues, regressio, bayesia classificatio Pretice Hall 2 Data Miig Core Techiques Classificatio Clusterig Associatio

More information

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. Powers of a matrix We begi with a propositio which illustrates the usefuless of the diagoalizatio. Recall that a square matrix A is diogaalizable if

More information

CS103X: Discrete Structures Homework 4 Solutions

CS103X: Discrete Structures Homework 4 Solutions CS103X: Discrete Structures Homewor 4 Solutios Due February 22, 2008 Exercise 1 10 poits. Silico Valley questios: a How may possible six-figure salaries i whole dollar amouts are there that cotai at least

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

Capacity of Wireless Networks with Heterogeneous Traffic

Capacity of Wireless Networks with Heterogeneous Traffic Capacity of Wireless Networks with Heterogeeous Traffic Migyue Ji, Zheg Wag, Hamid R. Sadjadpour, J.J. Garcia-Lua-Aceves Departmet of Electrical Egieerig ad Computer Egieerig Uiversity of Califoria, Sata

More information

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number. GCSE STATISTICS You should kow: 1) How to draw a frequecy diagram: e.g. NUMBER TALLY FREQUENCY 1 3 5 ) How to draw a bar chart, a pictogram, ad a pie chart. 3) How to use averages: a) Mea - add up all

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is 0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

Theorems About Power Series

Theorems About Power Series Physics 6A Witer 20 Theorems About Power Series Cosider a power series, f(x) = a x, () where the a are real coefficiets ad x is a real variable. There exists a real o-egative umber R, called the radius

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Ekkehart Schlicht: Economic Surplus and Derived Demand

Ekkehart Schlicht: Economic Surplus and Derived Demand Ekkehart Schlicht: Ecoomic Surplus ad Derived Demad Muich Discussio Paper No. 2006-17 Departmet of Ecoomics Uiversity of Muich Volkswirtschaftliche Fakultät Ludwig-Maximilias-Uiversität Müche Olie at http://epub.ub.ui-mueche.de/940/

More information

Cutting-Plane Training of Structural SVMs

Cutting-Plane Training of Structural SVMs Cuttig-Plae Traiig of Structural SVMs Thorste Joachims, Thomas Filey, ad Chu-Nam Joh Yu Abstract Discrimiative traiig approaches like structural SVMs have show much promise for buildig highly complex ad

More information

Lecture 4: Cheeger s Inequality

Lecture 4: Cheeger s Inequality Spectral Graph Theory ad Applicatios WS 0/0 Lecture 4: Cheeger s Iequality Lecturer: Thomas Sauerwald & He Su Statemet of Cheeger s Iequality I this lecture we assume for simplicity that G is a d-regular

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

Tradigms of Astundithi and Toyota

Tradigms of Astundithi and Toyota Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

(VCP-310) 1-800-418-6789

(VCP-310) 1-800-418-6789 Maual VMware Lesso 1: Uderstadig the VMware Product Lie I this lesso, you will first lear what virtualizatio is. Next, you ll explore the products offered by VMware that provide virtualizatio services.

More information

Data-Enhanced Predictive Modeling for Sales Targeting

Data-Enhanced Predictive Modeling for Sales Targeting Data-Ehaced Predictive Modelig for Sales Targetig Saharo Rosset Richard D. Lawrece Abstract We describe ad aalyze the idea of data-ehaced predictive modelig (DEM). The term ehaced here refers to the case

More information

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized?

5.4 Amortization. Question 1: How do you find the present value of an annuity? Question 2: How is a loan amortized? 5.4 Amortizatio Questio 1: How do you fid the preset value of a auity? Questio 2: How is a loa amortized? Questio 3: How do you make a amortizatio table? Oe of the most commo fiacial istrumets a perso

More information

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS Jia Huag 1, Joel L. Horowitz 2 ad Fegrog Wei 3 1 Uiversity of Iowa, 2 Northwester Uiversity ad 3 Uiversity of West Georgia Abstract We cosider a oparametric

More information

Partial Di erential Equations

Partial Di erential Equations Partial Di eretial Equatios Partial Di eretial Equatios Much of moder sciece, egieerig, ad mathematics is based o the study of partial di eretial equatios, where a partial di eretial equatio is a equatio

More information

Factoring x n 1: cyclotomic and Aurifeuillian polynomials Paul Garrett <garrett@math.umn.edu>

Factoring x n 1: cyclotomic and Aurifeuillian polynomials Paul Garrett <garrett@math.umn.edu> (March 16, 004) Factorig x 1: cyclotomic ad Aurifeuillia polyomials Paul Garrett Polyomials of the form x 1, x 3 1, x 4 1 have at least oe systematic factorizatio x 1 = (x 1)(x 1

More information

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction THE ARITHMETIC OF INTEGERS - multiplicatio, expoetiatio, divisio, additio, ad subtractio What to do ad what ot to do. THE INTEGERS Recall that a iteger is oe of the whole umbers, which may be either positive,

More information

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation HP 1C Statistics - average ad stadard deviatio Average ad stadard deviatio cocepts HP1C average ad stadard deviatio Practice calculatig averages ad stadard deviatios with oe or two variables HP 1C Statistics

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

Convention Paper 6764

Convention Paper 6764 Audio Egieerig Society Covetio Paper 6764 Preseted at the 10th Covetio 006 May 0 3 Paris, Frace This covetio paper has bee reproduced from the author's advace mauscript, without editig, correctios, or

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

Building Blocks Problem Related to Harmonic Series

Building Blocks Problem Related to Harmonic Series TMME, vol3, o, p.76 Buildig Blocks Problem Related to Harmoic Series Yutaka Nishiyama Osaka Uiversity of Ecoomics, Japa Abstract: I this discussio I give a eplaatio of the divergece ad covergece of ifiite

More information

Research Article Sign Data Derivative Recovery

Research Article Sign Data Derivative Recovery Iteratioal Scholarly Research Network ISRN Applied Mathematics Volume 0, Article ID 63070, 7 pages doi:0.540/0/63070 Research Article Sig Data Derivative Recovery L. M. Housto, G. A. Glass, ad A. D. Dymikov

More information

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the. Cofidece Itervals A cofidece iterval is a iterval whose purpose is to estimate a parameter (a umber that could, i theory, be calculated from the populatio, if measuremets were available for the whole populatio).

More information

Probabilistic Engineering Mechanics. Do Rosenblatt and Nataf isoprobabilistic transformations really differ?

Probabilistic Engineering Mechanics. Do Rosenblatt and Nataf isoprobabilistic transformations really differ? Probabilistic Egieerig Mechaics 4 (009) 577 584 Cotets lists available at ScieceDirect Probabilistic Egieerig Mechaics joural homepage: wwwelseviercom/locate/probegmech Do Roseblatt ad Nataf isoprobabilistic

More information

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 11 04/01/2008. Sven Zenker Parameter estimatio for oliear models: Numerical approaches to solvig the iverse problem Lecture 11 04/01/2008 Sve Zeker Review: Trasformatio of radom variables Cosider probability distributio of a radom

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information