Multiclass sparse logistic regression for classification of multiple cancer types using gene expression data

Computational Statistics & Data Analysis 51 (2006)

Yongdai Kim a,*, Sunghoon Kwon a, Seuck Heun Song b
a Seoul National University, Korea
b Korea University, Korea

Received 22 March 2006; received in revised form 23 May 2006; accepted 5 June 2006
Available online 30 June 2006

* Corresponding author. E-mail address: ydkim@stats.snu.ac.kr (Y. Kim).
© 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.csda

Abstract

Monitoring gene expression profiles is a novel approach to cancer diagnosis. Several studies have shown that sparse logistic regression is a useful classification method for gene expression data. Not only does it give a sparse solution with high accuracy, it also provides the user with explicit probabilities of classification in addition to the class information. However, its optimal extension to more than two classes is not obvious. In this paper, we propose a multiclass extension of sparse logistic regression. Analysis of five publicly available gene expression data sets shows that the proposed method outperforms the standard multinomial logistic model in prediction accuracy as well as gene selectivity.

Keywords: Classification; Gene expression data; Multinomial logit model; One-against-all; Sparse logistic regression

1. Introduction

Constructing a classification rule for tissue samples based on gene expression profiles has received much attention recently due to emerging microarray technology. A new challenge is that the number of genes (i.e. the dimension of inputs) is much larger than the number of tissue samples, in which case standard classification methods either are not applicable or perform badly. Also, identifying a small subset of informative genes, called marker genes, which discriminate types of tumors or tumor versus normal tissues, has become an important subject. Hence, good learning algorithms for gene expression data should provide a classification rule which not only yields high accuracy but also has the ability to identify marker genes.

In related literature, Guyon et al. (2002) proposed a recursive feature elimination technique with support vector machines, Li et al. (2002) introduced two Bayesian approaches with the technique of automatic relevance determination, and Shevade and Keerthi (2003) and Roth (2002) applied sparse logistic regression, to name just a few.

Among these tools, sparse logistic regression is a useful classification method for gene expression data. It gives a sparse solution with high accuracy, and it also provides the user with explicit probabilities of classification in addition to the class information. However, its optimal extension to more than two classes is not obvious. A standard multiclass extension of sparse logistic regression might be sparse multinomial logistic (SML) regression (Krishnapuram et al., 2004), which is a sparse version of the multinomial logit model, a popular multiclass formulation in statistics (see, for example, Agresti, 1990). SML, however, has a problem in gene selection: the estimates of the regression coefficients depend on the choice of the baseline class (see Section 2 for the definition), and so do the selected genes. Hence, some important genes may be dropped from the final model, which in turn degrades the prediction accuracy. Empirical results in Section 4 confirm this observation.

In this paper, we propose a new multiclass extension of sparse logistic regression called sparse one-against-all logistic (SOVAL) regression, whose main idea is to reduce a multiclass problem to multiple binary problems and to construct a classifier using the reduced binary problems simultaneously. By analyzing five real gene expression data sets, we show that SOVAL outperforms SML in prediction accuracy as well as gene selectivity.

The paper is organized as follows. In Section 2, SOVAL as well as SML are presented. A computational algorithm based on the gradient LASSO algorithm of Kim et al. (2005) is given in Section 3. Results of numerical experiments are presented in Section 4 and concluding remarks follow in Section 5.

2. Models

Let $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ be input-output pairs of a given data set, where $x_i \in \mathbb{R}^p$ is the gene expression level and $y_i \in \{1, 2, \ldots, J\}$ is the type of cancer of the $i$th tissue sample. Here, $n$ is the number of tissues, $p$ the number of genes and $J$ the number of classes (i.e. tumor types). We first present SML and then propose SOVAL.

2.1. SML regression

SML starts with the multinomial logit model

\[ \Pr(y_i = j \mid x_i) = \frac{\exp(f_j(x_i))}{\sum_{m=1}^{J} \exp(f_m(x_i))} \]

for $j = 1, \ldots, J$, where $f_j(x_i) = \beta_0^{(j)} + \beta_1^{(j)} x_{i1} + \cdots + \beta_p^{(j)} x_{ip}$. For identifiability, we let $\beta_k^{(J)} = 0$ for $k = 0, 1, \ldots, p$. Let $\beta_0 = (\beta_0^{(1)}, \ldots, \beta_0^{(J-1)})$, $\beta_j = (\beta_1^{(j)}, \ldots, \beta_p^{(j)})$ and $\beta = (\beta_1, \ldots, \beta_{J-1})$. For the sparse model, we estimate $\beta_0$ and $\beta$ by maximizing the log-likelihood

\[ L_1(\beta_0, \beta) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J} I(y_i = j)\, f_j(x_i) - \log\left( \sum_{m=1}^{J} \exp(f_m(x_i)) \right) \right] \qquad (1) \]

subject to the constraint $\sum_{j=1}^{J-1} \sum_{k=1}^{p} |\beta_k^{(j)}| \le \lambda$. Here, $\lambda > 0$ is a regularization parameter, which should be selected in advance using cross validation or any other method.

Once the regression coefficients $\beta_0$ and $\beta$ are estimated, the classifier is constructed as follows. Let $c(i \mid j)$ be the cost of classifying an observation to the $i$th class when the true class is $j$. Then, a new tissue sample with gene expression $x$ is classified into class $C(x)$, where

\[ C(x) = \arg\min_{i} \sum_{j=1}^{J} c(i \mid j) \Pr(y = j \mid x). \]

If the misclassification costs $c(i \mid j)$, $i \ne j$, are all equal and correct classification costs nothing, which is the most frequent case in practice, $C(x)$ becomes $\arg\max_j \Pr(y = j \mid x)$.
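The model and the decision rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `sml_probabilities`, `classify` and `satisfies_l1_constraint` are hypothetical, classes are coded 0 to J-1 with the last one as the baseline, and the coefficients are taken as given rather than fitted.

```python
import numpy as np

def sml_probabilities(X, beta0, beta):
    """Class probabilities of the multinomial logit model.

    X     : (n, p) expression matrix
    beta0 : (J-1,) intercepts of the non-baseline classes
    beta  : (J-1, p) slopes of the non-baseline classes
    The baseline (last) class has all coefficients fixed at 0.
    """
    scores = beta0 + X @ beta.T                               # f_j(x_i), j < J
    scores = np.hstack([scores, np.zeros((X.shape[0], 1))])   # f_J(x_i) = 0
    scores = scores - scores.max(axis=1, keepdims=True)       # avoid overflow
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)

def classify(prob, cost=None):
    """Minimum expected cost rule; cost[i, j] = c(i | j).

    With cost=None the rule reduces to arg max_j Pr(y = j | x).
    """
    if cost is None:
        return prob.argmax(axis=1)
    expected = prob @ cost.T      # expected[s, i] = sum_j c(i | j) * prob[s, j]
    return expected.argmin(axis=1)

def satisfies_l1_constraint(beta, lam):
    """Check sum_j sum_k |beta_k^(j)| <= lambda (intercepts are excluded)."""
    return np.abs(beta).sum() <= lam
```

Passing the 0-1 cost matrix `1 - np.eye(J)` to `classify` gives the same labels as the default arg max rule, which mirrors the remark on equal costs.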

The importance of the $k$th gene for classification of tumor types is measured by $\rho_k$, where

\[ \rho_k = \sum_{j=1}^{J-1} |\beta_k^{(j)}|. \]

The larger $\rho_k$ is, the more important the $k$th gene is for classifying the tumor type, and so genes with sufficiently large $\rho_k$ can be considered as marker genes. Using $\rho_k$, we can reformulate SML as

\[ f_j(x_i) = \theta_0^{(j)} + \rho_1 \theta_1^{(j)} x_{i1} + \cdots + \rho_p \theta_p^{(j)} x_{ip} \]

with $\sum_{j=1}^{J-1} |\theta_k^{(j)}| = 1$ and $\rho_k \ge 0$ for $k = 1, \ldots, p$, and $\sum_{k=1}^{p} \rho_k \le \lambda$. Hence, SML can be considered as a garrote-type estimate (Breiman, 1995) for $\rho_k$, and so we expect the solution for $\rho_k$ to be sparse.

In SML, we set $\beta_k^{(J)} = 0$ for $k = 1, \ldots, p$ for identifiability of the model, and the regression coefficient $\beta_k^{(j)}$, $j \ne J$, can be interpreted as the log odds ratio of the $j$th group versus the $J$th group for the $k$th gene. In this sense, we call the $J$th class the baseline class. This convention has the problem that the estimates depend on the choice of the baseline class. As an example, consider the following simple situation. Let $p = 1$, $J = 3$ and $\lambda = 1$. Suppose $x_1$ is binary (i.e. $x_1 \in \{0, 1\}$). Let $\mathrm{Odd}(k, j)$ be the odds ratio of the $k$th group versus the $j$th group. That is,

\[ \mathrm{Odd}(k, j) = \frac{\sum_{i=1}^{n} I(y_i = k, x_{i1} = 1) \, \sum_{i=1}^{n} I(y_i = j, x_{i1} = 0)}{\sum_{i=1}^{n} I(y_i = k, x_{i1} = 0) \, \sum_{i=1}^{n} I(y_i = j, x_{i1} = 1)}. \]

Suppose $\log \mathrm{Odd}(1, 3) = 0.5$ and $\log \mathrm{Odd}(2, 3) = -0.5$. Then, the estimates of the regression coefficients from SML become $\beta_1^{(1)} = 0.5$ and $\beta_1^{(2)} = -0.5$ if we choose the third class as the baseline class. Now, suppose we change the baseline class to the second class. Then, since $\log \mathrm{Odd}(1, 2) = 1.0$ and $\log \mathrm{Odd}(3, 2) = 0.5$, in order for the class probabilities to remain the same, the estimates of $\beta_1^{(1)}$ and $\beta_1^{(3)}$ would have to be 1.0 and 0.5, respectively, which is impossible since it violates the constraint (i.e. $|\beta_1^{(1)}| + |\beta_1^{(3)}| > 1$). Hence, there is a danger that some important genes may be dropped from the final model due to the choice of the baseline class, which results in poor prediction accuracy. Empirical results in Section 4 confirm this observation.
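The arithmetic of this example can be checked with a short script. It is a sketch under the example's own assumptions (one binary gene, three classes, λ = 1); the helper names are hypothetical and classes are indexed from 0, so index 2 is the third class.

```python
import numpy as np

LAMBDA = 1.0
# Log odds ratios of the single gene for classes 1, 2, 3 versus class 3.
LOG_ODDS_VS_3 = np.array([0.5, -0.5, 0.0])

def coefficients_for_baseline(log_odds, baseline):
    """Coefficients that keep the class probabilities unchanged
    when `baseline` (0-based) is used as the reference class."""
    return log_odds - log_odds[baseline]

def l1_cost(log_odds, baseline):
    """Left-hand side of the L1 constraint for the chosen baseline."""
    return np.abs(coefficients_for_baseline(log_odds, baseline)).sum()

for baseline in (2, 1):  # third class first, then second class
    cost = l1_cost(LOG_ODDS_VS_3, baseline)
    print(f"baseline class {baseline + 1}: L1 norm = {cost}, "
          f"feasible = {cost <= LAMBDA}")
```

With the third class as baseline the L1 norm is 1.0, which satisfies the constraint; with the second class it is 1.5, which does not, so the same probabilities cannot be reproduced.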
Instead of choosing the baseline class, there are other ways to resolve the identification problem. An example is to let

\[ \sum_{j=1}^{J} \beta_k^{(j)} = 0 \qquad (2) \]

for all $k$. This constraint, however, makes the computation harder. A main technical difficulty of sparse logistic regression is that computation is relatively demanding. This is mainly because the objective function to be optimized is not differentiable due to the $L_1$ constraint, and hence special optimization techniques are required. To the authors' knowledge, there is no special optimization algorithm for sparse logistic regression which can deal with the constraint (2), in particular for a large number of genes.

2.2. Sparse one-against-all logistic regression

For given $y_i$, the standard one-against-all (OVA) approach makes $J$ binary outputs $y_i^{(1)}, \ldots, y_i^{(J)}$ via $y_i^{(j)} = I(y_i = j)$, and assumes

\[ \Pr(y_i^{(j)} = 1 \mid x_i) = \frac{\exp(f_j(x_i))}{1 + \exp(f_j(x_i))} \]

for $j = 1, \ldots, J$, where $f_j(x) = \beta_0^{(j)} + \beta_1^{(j)} x_1 + \cdots + \beta_p^{(j)} x_p$. Let $\beta_0 = (\beta_0^{(1)}, \ldots, \beta_0^{(J)})$, $\beta_j = (\beta_1^{(j)}, \ldots, \beta_p^{(j)})$ and $\beta = (\beta_1, \ldots, \beta_J)$. Then, it estimates $\beta_0$ and $\beta$ by estimating $\beta_0^{(j)}$ and $\beta_j$ for $j = 1, \ldots, J$ via maximizing the log-likelihood of $y^{(j)}$ given by

\[ \sum_{i=1}^{n} \left[ y_i^{(j)} f_j(x_i) - \log\left( \exp(f_j(x_i)) + 1 \right) \right] \]

subject to $\sum_{k=1}^{p} |\beta_k^{(j)}| \le \lambda_j$. There are multiple regularization parameters $\lambda_1, \ldots, \lambda_J$, which should be selected simultaneously in advance using cross validation or any other method. Note that selecting multiple regularization parameters is computationally very hard, since the computational complexity grows exponentially with the number of regularization parameters.

To resolve this problem, SOVAL estimates $\beta_0$ and $\beta$ by maximizing the following (pseudo) log-likelihood

\[ L_2(\beta_0, \beta) = \sum_{i=1}^{n} \sum_{j=1}^{J} \left[ y_i^{(j)} f_j(x_i) - \log\left( \exp(f_j(x_i)) + 1 \right) \right] \qquad (3) \]

subject to $\sum_{k=1}^{p} \sum_{j=1}^{J} |\beta_k^{(j)}| \le \lambda$. Note that there is a single regularization parameter $\lambda$. Moreover, SOVAL is as flexible as the standard OVA approach in the sense that if the optimal model is constructed using the standard OVA approach with the regularization parameters $\lambda_1, \ldots, \lambda_J$, the same model can be constructed using SOVAL with the regularization parameter $\lambda = \sum_{j=1}^{J} \lambda_j$.

Once the regression coefficients are estimated, the class probabilities are estimated by

\[ \Pr(y = j \mid x) = \frac{1}{C(x)} \Pr(y^{(j)} = 1 \mid x), \quad \text{where } C(x) = \sum_{m=1}^{J} \Pr(y^{(m)} = 1 \mid x). \]

The corresponding classifier can be constructed similarly to the SML case. Also, the gene importance measure is defined similarly (that is, $\rho_k = \sum_{j=1}^{J} |\beta_k^{(j)}|$).

3. A computational algorithm

We first present a general version of the gradient LASSO algorithm developed by Kim et al. (2005), and explain how to modify it for SOVAL as well as SML.

Let $z \in \mathbb{R}^q$ and let $L(z)$ be a convex function defined on $\mathbb{R}^q$. The objective of the gradient LASSO is to find the minimizer of $L(z)$ over $z \in D$, where $D$ is the subset of $\mathbb{R}^q$ defined by

\[ D = \left\{ z \in \mathbb{R}^q : \sum_{k=1}^{q} |z_k| \le 1 \right\}. \]
Let $e_k$ be the vector in $\mathbb{R}^q$ with the $k$th component equal to 1 and the others 0. Fig. 1 gives the gradient LASSO algorithm for this problem. The hardest part of the gradient LASSO is obtaining $\hat{\alpha}$ and $\hat{\delta}$ in steps (a) and (b) of Fig. 1, but this can be done using standard optimization techniques such as the Newton-Raphson algorithm. That is, the gradient LASSO algorithm does not require any special non-linear optimization algorithms. Also, Kim et al. (2005) proved that the convergence rate of the gradient LASSO is $1/m$, where $m$ is the number of iterations, under some regularity conditions. A surprising result is that this convergence rate does not depend on the dimension of inputs, which is very large for gene expression data. This feature makes the gradient LASSO algorithm well suited for analyzing gene expression data.

Fig. 1. Gradient LASSO algorithm.

In SOVAL as well as SML, the intercept term $\beta_0$ is not constrained, and hence the gradient LASSO algorithm cannot be applied directly. For this, we propose to estimate the intercept term $\beta_0$ by letting $\beta = 0$ and maximizing the log-likelihood functions $L_1$ and $L_2$ with respect to $\beta_0$ only. For SML, $\beta_0$ becomes

\[ \beta_0^{(j)} = \log \frac{\bar{y}^{(j)}}{\bar{y}^{(J)}} \quad \text{for } j = 1, \ldots, J-1, \]

where $\bar{y}^{(j)} = \sum_{i=1}^{n} I(y_i = j) / n$. Similarly, for SOVAL, we have

\[ \beta_0^{(j)} = \log \frac{\bar{y}^{(j)}}{1 - \bar{y}^{(j)}} \quad \text{for } j = 1, \ldots, J. \]

The gradient LASSO algorithm can be modified for the two multiclass sparse logistic regressions by letting $z = \beta / \lambda$ and replacing $L$ by either $-L_1$ or $-L_2$, since the algorithm minimizes its objective.

Remark. The gradient LASSO algorithm presented here is a simpler version of the original gradient LASSO algorithm of Kim et al. (2005). In fact, using a more complicated version of the gradient LASSO algorithm, we can estimate $\beta_0$ and $\beta$ simultaneously. However, the algorithm for this is much more involved, and the results from estimating $\beta_0$ and $\beta$ sequentially, as is done here, are not much different from those obtained by estimating $\beta_0$ and $\beta$ simultaneously.
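The closed-form intercepts above, together with the SOVAL pseudo log-likelihood (3) and the normalized class probabilities of Section 2.2, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code and not the gradient LASSO itself: all function names are hypothetical, labels are assumed to be coded 0 to J-1, and every class is assumed to occur at least once so that the logarithms are finite.

```python
import numpy as np

def class_frequencies(y, J):
    """ybar[j] = fraction of samples with label j (labels coded 0..J-1)."""
    return np.bincount(y, minlength=J) / len(y)

def sml_intercepts(y, J):
    """SML intercepts with beta = 0; the last class is the baseline."""
    ybar = class_frequencies(y, J)
    return np.log(ybar[:-1] / ybar[-1])

def soval_intercepts(y, J):
    """SOVAL intercepts with beta = 0: one logit per class."""
    ybar = class_frequencies(y, J)
    return np.log(ybar / (1.0 - ybar))

def soval_pseudo_loglik(X, y, beta0, beta):
    """Pseudo log-likelihood L_2 of Eq. (3); beta has shape (J, p)."""
    J = beta.shape[0]
    Y = np.eye(J)[y]                    # one-against-all indicators y_i^(j)
    F = beta0 + X @ beta.T              # f_j(x_i)
    return np.sum(Y * F - np.logaddexp(0.0, F))   # log(1 + exp(F)), stable

def soval_probabilities(X, beta0, beta):
    """Binary OVA probabilities rescaled to sum to one over the classes."""
    F = beta0 + X @ beta.T
    ova = 1.0 / (1.0 + np.exp(-F))      # Pr(y^(j) = 1 | x)
    return ova / ova.sum(axis=1, keepdims=True)
```

As a sanity check on the formulas, with all slopes set to 0 and the intercepts from `soval_intercepts`, `soval_probabilities` returns the class frequencies for every sample.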

4. Numerical experiments

We compare the two multiclass extensions of sparse logistic regression on five publicly available data sets.

4.1. Data description

Leukemia: The data set for this project is the gene expression data from leukemia patients used in Golub et al. (1999). This data set comes from a study of gene expression in two types of acute leukemia, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). There are two key subclasses of ALL, those arising from T-cells and those arising from B-cells. This data set is composed of 38 samples classified as ALL T-cell, ALL B-cell or AML in the training set and an independent test set of 34 samples. The training set contains 8 ALL T-cell, 19 ALL B-cell and 11 AML samples. The independent test set consists of 1 ALL T-cell, 19 ALL B-cell and 14 AML samples. Each sample contains 7129 gene expression values obtained from Affymetrix oligonucleotide microarrays. In this paper, we combine the training and test samples and analyze them together. This data set can be downloaded at …

Lymphoma: This data set is available at … and contains gene expression levels of the 3 most prevalent adult lymphoid malignancies: 42 samples of diffuse large B-cell lymphoma (DLBCL, class 0), 9 observations of follicular lymphoma (FL, class 1), and 11 cases of chronic lymphocytic leukemia (CLL, class 2). The total sample size is n = 62, and the expression levels of p = 4026 well-measured genes, preferentially expressed in lymphoid cells or with known immunological or oncological importance, are documented. More information on these data can be found in Alizadeh et al. (2000). We imputed missing values and standardized the data as described in Dudoit et al. (2002).

Small, round blue-cell tumors: This data set on the small, round blue-cell tumors (SRBCTs) of childhood includes 63 samples classified as neuroblastoma, rhabdomyosarcoma, non-Hodgkin lymphoma and the Ewing family of tumors. Gene-expression data from the cDNA microarray experiment contains 6567 genes.
For data preprocessing, we followed the protocol detailed in the supplementary information to Khan et al. (2001). This data set can be downloaded at …

Brain cancer: This data set, presented in Pomeroy et al. (2002), contains n = 42 microarray gene expression profiles from five different tumors of the central nervous system, that is, 10 medulloblastomas, 10 malignant gliomas, 10 atypical teratoid/rhabdoid tumors (AT/RTs), 8 primitive neuro-ectodermal tumors (PNETs) and 4 human cerebella. The raw data were generated using the Affymetrix technology and are publicly available at … For data preprocessing, we followed the protocol described in the supplementary information to Pomeroy et al. (2002). After thresholding, filtering, applying a logarithmic transformation and standardizing each expression profile to zero mean and unit variance, a data set comprising p = 5597 genes remained.

NCI60: NCI60 is a data set of gene expression profiles of 60 National Cancer Institute (NCI) cell lines. These 60 human tumor cell lines are derived from patients with leukemia, melanoma, lung, colon, central nervous system, ovarian, renal, breast and prostate cancers. The data set comprises gene-expression levels of p = 7129 genes for n = 60 human tumor cell lines which can be divided into 8 classes: eight breast, six CNS, seven colon, six leukemia, eight melanoma, nine non-small-cell lung carcinoma, six ovarian and eight renal tumors. A more detailed description of the data can be found in Staunton et al. (2001). This data set can be downloaded at …

4.2. Prediction accuracy

We evaluated the prediction accuracy of the two sparse multiclass logistic regression models using random partitions. That is, we divided the data set at random so that 70% of the data became training samples and the other 30% test samples. We repeated this procedure 1 times and report the averaged misclassification errors. For selecting $\lambda$, we used five-fold cross validation. We used a number of preprocessing steps, as was done by Guyon et al. (2002), that included: taking the logarithm of all values, normalizing sample vectors, normalizing feature vectors, and passing the results through a squashing function of the type $f(x) = c \arctan(x/c)$ to diminish the importance of outliers.

Along with the prediction errors, we investigated the effect of prescreening of genes on the prediction accuracy. One of the standard approaches for analyzing gene expression data is to pick out relevant genes using simple prescreening measures to reduce computational costs as well as to improve prediction accuracy (see, for example, Golub et al., 1999; Dudoit et al., 2002). Since multiclass problems are the concern of this paper, we used the F-ratio of the between-class sum of squares to the within-class sum of squares for each gene, following Dudoit et al. (2002). For gene $l$, the F-ratio is defined as

\[ \frac{\mathrm{BSS}(l)}{\mathrm{WSS}(l)} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{J} I(y_i = j) \left( \bar{x}_l^{(j)} - \bar{x}_l \right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{J} I(y_i = j) \left( x_{il} - \bar{x}_l^{(j)} \right)^2}, \]

where $\bar{x}_l^{(j)}$ indicates the average expression level of gene $l$ for class $j$ samples, and $\bar{x}_l$ is the overall mean expression level of gene $l$ in the training set. We use this F-ratio for its simplicity; other variants of the F-ratio exist.

Table 1. Average test errors of SML and SOVAL, by data set (number of classes: Leukemia 3, Lymphoma 3, Small, round blue-cell 4, Brain 5, NCI60 8) and by number of covariates retained after prescreening (five subset sizes and the full gene set).

Table 1 and Fig. 2 report the test errors for different gene subset sizes obtained by prescreening with the F-ratio. First, they show that SOVAL is more accurate than SML in most cases. In some cases, the improvements are larger than 5%. Second, we can see from Tables 1 and 2 that the prescreening affects the accuracy significantly. The optimum test errors are achieved around p = 1 or p = 5 (except for the small, round blue-cell data set, where the optimum error is achieved when p = 1). From this finding, we may conclude that the purpose of prescreening is not to select relevant genes but to eliminate irrelevant genes. This result somewhat contrasts with that of Dudoit et al. (2002), where finding small numbers of relevant genes by prescreening affects prediction accuracy significantly in some cases. A reason for this difference would be that we use sparse methods while Dudoit et al. (2002) do not. For non-sparse methods, the classifier depends on all genes used as inputs, and so prescreening would be important.
However, sparse methods automatically select genes while they construct a classifier, and so prescreening is not necessary. Moreover, the prescreening may drop some informative genes at an early stage, and the resulting model would be suboptimal. In this view, for sparse methods, efficient computational algorithms for dealing with high-dimensional inputs without prescreening are necessary, and our algorithm is such an algorithm.

Fig. 2. Average test errors versus number of covariates (panels: Leukemia, Lymphoma, Small, round blue-cell, Brain, NCI60).

4.3. Performance of gene selection

Table 2 presents the average number of genes selected by the two sparse methods. It shows that SML tends to yield sparser models than SOVAL, in particular when the number of classes is large. Together with the error rates in Table 1, we can conclude that SML fails to detect some important genes, which results in higher error rates.

To confirm our conclusion, we did the following experiment. The effectiveness of gene identification was tested on miniature data sets synthesized from the original data. The miniature data sets of 100 genes were constructed as follows. First, using the F-ratio as a measure of marginal association between each gene and the tumor type, we ranked the genes and selected the top 20 genes as variables truly associated with the class. As irrelevant variables, we included the bottom 80 genes, with the class labels corresponding to each covariate vector of 80 genes randomly mixed together, so that they were genuinely unrelated to the class while the potential correlations between those genes were left intact. Ten replicates of synthetic training data were obtained by 10-fold cross validation from these miniature data sets, keeping the class proportions in each sample the same as those in the original data. See Lin (2005) and Jung and Jang (2006) for similar experiments.

We applied the two sparse multiclass logistic regression models to these 10 replicates, and the optimal regularization parameters were selected with the 10 test data sets constructed from the 10-fold cross validation. Fig. 3 shows boxplots of the number of selected genes, and of the number of selected genes among the 20 informative genes, from the 10 replicates of the miniature data sets obtained by 10-fold cross validation of the original data sets. It shows that SOVAL includes more informative genes than SML when the number of classes is large.
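The F-ratio prescreening of Section 4.2 and the miniature data set construction described above can be sketched as follows. This is a hedged NumPy illustration rather than the authors' code: the function names are hypothetical, and "randomly mixed together" is interpreted here as applying one common random permutation to the sample rows of the whole 80-gene block, which breaks the link to the class labels while preserving gene-gene correlations.

```python
import numpy as np

def f_ratio(X, y):
    """BSS(l) / WSS(l) for every gene; X is (n, p), y holds class labels.

    A gene that is constant within every class has WSS = 0 and is not
    handled specially in this sketch.
    """
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        centroid = Xc.mean(axis=0)
        bss += Xc.shape[0] * (centroid - overall) ** 2
        wss += ((Xc - centroid) ** 2).sum(axis=0)
    return bss / wss

def miniature_data(X, y, n_top=20, n_bottom=80, seed=0):
    """Top-ranked genes kept as they are, bottom-ranked genes decoupled from y."""
    rng = np.random.default_rng(seed)
    order = np.argsort(f_ratio(X, y))[::-1]     # highest F-ratio first
    top, bottom = order[:n_top], order[-n_bottom:]
    perm = rng.permutation(X.shape[0])          # one shuffle shared by all rows
    noise = X[perm][:, bottom]                  # keeps correlations among the genes
    return np.hstack([X[:, top], noise]), top, bottom
```

The returned matrix has `n_top + n_bottom` columns; its first `n_top` columns are the informative genes, so a selected-gene index below `n_top` counts as a correct detection in an experiment of the kind summarized in Fig. 3.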

Table 2. The averaged numbers of genes selected by SML and SOVAL, by data set (number of classes: Leukemia 3, Lymphoma 3, Small, round blue-cell 4, Brain 5, NCI60 8) and by number of covariates retained after prescreening (five subset sizes and the full gene set).

Fig. 3. The boxplots of: (a) the total number of genes selected and (b) the number of genes selected among the top 20 informative genes (data sets: Leukemia, Lymphoma, Small, round blue-cell, Brain, NCI60).

Fig. 4. The plots of the gene importance versus the gene rank by F-ratio (panels: Leukemia, Lymphoma, Small, round blue-cell, Brain, NCI60).

Fig. 5. The boxplots of the expression levels of the two genes having the highest F-ratio and the highest importance, according to the class labels in the NCI60 data set (panels: the gene with the highest F-ratio; the gene with the highest importance).

Finally, we compared genes selected by SOVAL and genes selected by the marginal F-ratio. Fig. 4 shows the plots where the x-axis displays the gene ranks obtained by the marginal F-ratio and the y-axis is the gene importance measured by SOVAL. The results are striking, in particular when the number of classes is large (i.e. in the Brain and NCI60 data sets). There are many genes that have low ranks by the marginal F-ratio and yet large importance. To understand why this happens, we select two genes from the NCI60 data set, one which has the largest F-ratio and the other which has the largest importance. The rank of the F-ratio of the gene with the highest importance is 132, and the importance of the gene with the highest F-ratio is 0. That is, these two genes have significantly different F-ratio and gene importance values. Fig. 5 presents the boxplots of the gene expression levels of these two genes according to the class labels. First of all, the distributions of the expression levels of the two genes are similar. They have large positive expression levels in the seventh class and negative expression levels in the other classes. An exception is the third class, where the gene with the highest F-ratio has expression levels around 0 while the gene with the highest importance has negative expression levels. This difference partially explains why the ranks from the F-ratio and from gene importance are quite different. The F-ratio measures the variation of the mean expression levels of the classes, and so the gene with the highest F-ratio has additional variation due to the third class compared to the gene with the highest importance. In contrast, SOVAL basically measures the difference of the means from one class to the other classes. For the seventh class, this difference is larger for the gene with the highest importance than for the gene with the highest F-ratio. So, we conclude that if we want to detect genes which affect all the classes, the F-ratio would be more appropriate. However, if we want to detect genes which affect a certain class, sparse logistic regression would be more appealing.

5. Concluding remarks

In this paper, we proposed a multiclass extension of sparse logistic regression, called SOVAL, compared it with SML, and developed an efficient computational algorithm suitable for gene expression data. The numerical experiments showed that SOVAL outperforms SML in many aspects. The former: (i) gives better accuracy; (ii) has higher power of detecting important genes; and (iii) does not require the choice of a baseline class.

The main idea of SOVAL is somewhat related to Scott's method of estimating a mixture model (Scott, 2001, 2004). Scott's method relaxes a constraint on the density function and focuses on a particular component rather than all components. SOVAL also relaxes a constraint, namely that the sum of the probabilities of the classes is 1, and implicitly finds genes important for a specific class rather than all classes. This similarity would partially explain the good prediction performance of SOVAL. We leave this conjecture as future work.

We have seen that the genes selected by SOVAL are quite different from those selected by the marginal F-ratio. This is partly because SOVAL measures the classification power of genes for a specific class while the marginal F-ratio measures the overall effect of genes on all classes. Hence, if one wants to detect genes which affect a specific class, SOVAL is more suitable. In this view, SOVAL can be considered as a new way of detecting relevant genes and can be used as a preprocessing procedure for more complicated non-linear classification methods such as the support vector machine or boosting.
For this purpose, however, efficient computational algorithms are required since we should work with large numbers of genes without prescreening, and the algorithm proposed in this paper can serve this purpose.

Acknowledgments

The first and second authors were supported in part by KOSEF through the Statistical Research Center for Complex Systems at Seoul National University. The third author was supported in part by KOSEF (R …).

References

Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., Tran, T., Yu, X., et al., 2000. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403.
Breiman, L., 1995. Better subset regression using the nonnegative garrote. Technometrics 37 (4).
Dudoit, S., Fridlyand, J., Speed, T., 2002. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Statist. Assoc. 97.
Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E., 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Mach. Learn. 46.
Jung, S.H., Jang, W., 2006. How accurately can we control the FDR in analyzing microarray data? Bioinformatics, to appear. oxfordjournals.org/cgi/reprint/btl161?
Khan, J., Wei, J., Ringner, M., Saal, L., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C., Peterson, C., Meltzer, P., 2001. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Med. 7.
Kim, J., Kim, Y., Kim, Y., 2005. A gradient descent algorithm for generalized LASSO. Technical Report, Department of Statistics, Seoul National University, Korea.
Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A., 2004. Learning sparse classifier: multi-class formulation, fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27.
Li, Y., Campbell, C., Tipping, M., 2002. Bayesian automatic relevance determination algorithms for classifying gene expression data. Bioinformatics 18.
Lin, D.Y., 2005. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics 21.
Pomeroy, S., Tamayo, P., Gaasenbeek, M., Sturla, L., Angelo, M., McLaughlin, M., Kim, J., Goumnerova, L., Black, P., Lau, C., et al., 2002. Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature 415.
Roth, V., 2002. The generalized LASSO: a wrapper approach to gene selection for microarray data. Technical Report, University of Bonn, Computer Science III.
Scott, D.W., 2001. Parametric statistical modeling by minimum integrated square error. Technometrics 43.
Scott, D.W., 2004. Partial mixture estimation and outlier detection in data and regression. In: Theory and Applications of Recent Robust Methods. Birkhäuser, Basel.
Shevade, K., Keerthi, S., 2003. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 19.
Staunton, J., Slonim, D., Coller, H., Tamayo, P., Angelo, M., Park, J., Scherf, U., Lee, J., Reinhold, W., Weinstein, J., Mesirov, J., Lander, E., Golub, T., 2001. Chemosensitivity prediction by transcriptional profiling. Proc. Nat. Acad. Sci. 98 (19).


Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM BARRIOT Jean-Perre, SARRAILH Mchel BGI/CNES 18.av.E.Beln 31401 TOULOUSE Cedex 4 (France) Emal: jean-perre.barrot@cnes.fr 1/Introducton The

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Binomial Link Functions. Lori Murray, Phil Munz

Binomial Link Functions. Lori Murray, Phil Munz Bnomal Lnk Functons Lor Murray, Phl Munz Bnomal Lnk Functons Logt Lnk functon: ( p) p ln 1 p Probt Lnk functon: ( p) 1 ( p) Complentary Log Log functon: ( p) ln( ln(1 p)) Motvatng Example A researcher

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

SIMPLE LINEAR CORRELATION

SIMPLE LINEAR CORRELATION SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

Gender differences in revealed risk taking: evidence from mutual fund investors

Gender differences in revealed risk taking: evidence from mutual fund investors Economcs Letters 76 (2002) 151 158 www.elsever.com/ locate/ econbase Gender dfferences n revealed rsk takng: evdence from mutual fund nvestors a b c, * Peggy D. Dwyer, James H. Glkeson, John A. Lst a Unversty

More information

Statistical algorithms in Review Manager 5

Statistical algorithms in Review Manager 5 Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Fisher Markets and Convex Programs

Fisher Markets and Convex Programs Fsher Markets and Convex Programs Nkhl R. Devanur 1 Introducton Convex programmng dualty s usually stated n ts most general form, wth convex objectve functons and convex constrants. (The book by Boyd and

More information

Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Survval analyss methods n Insurance Applcatons n car nsurance contracts Abder OULIDI 1 Jean-Mare MARION 2 Hervé GANACHAUD 3 Abstract In ths wor, we are nterested n survval models and ther applcatons on

More information

Support vector domain description

Support vector domain description Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIIOUS AFFILIATION AND PARTICIPATION Danny Cohen-Zada Department of Economcs, Ben-uron Unversty, Beer-Sheva 84105, Israel Wllam Sander Department of Economcs, DePaul

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Transition Matrix Models of Consumer Credit Ratings

Transition Matrix Models of Consumer Credit Ratings Transton Matrx Models of Consumer Credt Ratngs Abstract Although the corporate credt rsk lterature has many studes modellng the change n the credt rsk of corporate bonds over tme, there s far less analyss

More information

An artificial Neural Network approach to monitor and diagnose multi-attribute quality control processes. S. T. A. Niaki*

An artificial Neural Network approach to monitor and diagnose multi-attribute quality control processes. S. T. A. Niaki* Journal of Industral Engneerng Internatonal July 008, Vol. 4, No. 7, 04 Islamc Azad Unversty, South Tehran Branch An artfcal Neural Network approach to montor and dagnose multattrbute qualty control processes

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

Hallucinating Multiple Occluded CCTV Face Images of Different Resolutions

Hallucinating Multiple Occluded CCTV Face Images of Different Resolutions In Proc. IEEE Internatonal Conference on Advanced Vdeo and Sgnal based Survellance (AVSS 05), September 2005 Hallucnatng Multple Occluded CCTV Face Images of Dfferent Resolutons Ku Ja Shaogang Gong Computer

More information

On fourth order simultaneously zero-finding method for multiple roots of complex polynomial equations 1

On fourth order simultaneously zero-finding method for multiple roots of complex polynomial equations 1 General Mathematcs Vol. 6, No. 3 (2008), 9 3 On fourth order smultaneously zero-fndng method for multple roots of complex polynomal euatons Nazr Ahmad Mr and Khald Ayub Abstract In ths paper, we present

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

New Approaches to Support Vector Ordinal Regression

New Approaches to Support Vector Ordinal Regression New Approaches to Support Vector Ordnal Regresson We Chu chuwe@gatsby.ucl.ac.uk Gatsby Computatonal Neuroscence Unt, Unversty College London, London, WCN 3AR, UK S. Sathya Keerth selvarak@yahoo-nc.com

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Fast Fuzzy Clustering of Web Page Collections

Fast Fuzzy Clustering of Web Page Collections Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Detecting Credit Card Fraud using Periodic Features

Detecting Credit Card Fraud using Periodic Features Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg,

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journal of Systems and Software 82 (2009) 241 252 Contents lsts avalable at ScenceDrect The Journal of Systems and Software journal homepage: www. elsever. com/ locate/ jss A study of project selecton

More information

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks

Learning the Best K-th Channel for QoS Provisioning in Cognitive Networks 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Microarray data normalization and transformation

Microarray data normalization and transformation revew Mcroarray data normalzaton and transformaton John Quackenbush do:38/ng3 Nature Publshng Group http://wwwnaturecom/naturegenetcs Underlyng every mcroarray experment s an expermental queston that one

More information

ONE of the most crucial problems that every image

ONE of the most crucial problems that every image IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 10, OCTOBER 2014 4413 Maxmum Margn Projecton Subspace Learnng for Vsual Data Analyss Symeon Nktds, Anastasos Tefas, Member, IEEE, and Ioanns Ptas, Fellow,

More information

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble 1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In

More information

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification IDC IDC A Herarchcal Anomaly Network Intruson Detecton System usng Neural Network Classfcaton ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES ECE Department, New Jersey Inst. of Tech.,

More information

Learning to Classify Ordinal Data: The Data Replication Method

Learning to Classify Ordinal Data: The Data Replication Method Journal of Machne Learnng Research 8 (7) 393-49 Submtted /6; Revsed 9/6; Publshed 7/7 Learnng to Classfy Ordnal Data: The Data Replcaton Method Jame S. Cardoso INESC Porto, Faculdade de Engenhara, Unversdade

More information

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS

MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS MARKET SHARE CONSTRAINTS AND THE LOSS FUNCTION IN CHOICE BASED CONJOINT ANALYSIS Tmothy J. Glbrde Assstant Professor of Marketng 315 Mendoza College of Busness Unversty of Notre Dame Notre Dame, IN 46556

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the

More information

Improved SVM in Cloud Computing Information Mining

Improved SVM in Cloud Computing Information Mining Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Predicting Software Development Project Outcomes *

Predicting Software Development Project Outcomes * Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models

ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models ActveClean: Interactve Data Cleanng Whle Learnng Convex Loss Models Sanjay Krshnan, Jannan Wang, Eugene Wu, Mchael J. Frankln, Ken Goldberg UC Berkeley, Columba Unversty {sanjaykrshnan, jnwang, frankln,

More information

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns A study on the ablty of Support Vector Regresson and Neural Networks to Forecast Basc Tme Seres Patterns Sven F. Crone, Jose Guajardo 2, and Rchard Weber 2 Lancaster Unversty, Department of Management

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

Support Vector Machine Model for Currency Crisis Discrimination. Arindam Chaudhuri 1. Abstract

Support Vector Machine Model for Currency Crisis Discrimination. Arindam Chaudhuri 1. Abstract Support Vector Machne Model for Currency Crss Dscrmnaton Arndam Chaudhur Abstract Support Vector Machne (SVM) s powerful classfcaton technque based on the dea of structural rsk mnmzaton. Use of kernel

More information