Feature Selection via Correlation Coefficient Clustering

JOURNAL OF SOFTWARE, VOL. 5, NO. 12, DECEMBER 2010

Hui-Huang Hsu
Department of Computer Science and Information Engineering, Tamkang University, Taipei, 25137, Taiwan

Cheng-Wei Hsieh
Department of Computer Science and Information Engineering, Tamkang University, Taipei, 25137, Taiwan

Abstract: Feature selection is a fundamental problem in machine learning and data mining. Choosing the most problem-related features from a set of collected features is essential. In this paper, a novel method that uses correlation coefficient clustering to remove similar/redundant features is proposed. The collected features are grouped into clusters by measuring their correlation coefficient values. The most class-dependent feature in each cluster is retained while the others in the same cluster are removed. Thus, the most class-related and mutually unrelated features are identified. The proposed method was applied to two datasets: the disordered protein dataset and the Arrhythmia (ARR) dataset. The experimental results show that the method is superior to other feature selection methods in speed and/or accuracy. Detailed discussions are given in the paper.

Index Terms: Feature Selection, Clustering, Correlation Coefficient, Support Vector Machines (SVMs), Machine Learning, Classification

I. INTRODUCTION

Feature selection aims to select the most problem-related features and to remove unnecessary features [1]. Unnecessary features include both noisy and redundant features. If a feature cannot help improve the classification accuracy, it is useless and unnecessary. A noisy feature, in particular, harms the classification results: if the classification result improves after some features are removed, those features may be noisy. An important question is how to find these noisy features. The wrapper-mode feature selection model can help [2]. However, it is usually very time-consuming, because it places a learning machine at the core of the feature selection process [3][4].
Features that lower the overall accuracy of the learning machine are removed from the original feature set. The procedure is repeated until the classification accuracy cannot be improved further. It requires complicated computation and takes a lot of time.

In this paper we focus on reducing repeated or redundant features. The targeted features may not be exactly the same, but they are closely related. Similar features fed to the classifier not only increase the computation time but can also decrease its classification capability. Several measures are helpful in finding redundant features. For example, mutual information, the correlation coefficient, and chi-square can be used to find the dependency between two features. However, for a large number of features, this pairwise dependency information is not enough to find groups of features that are close to each other. Hence, clustering analysis is applied here. It is a very useful technique for dividing a feature set into subsets within which features are closely related to each other. If we can separate the collected features into such groups, we need to keep only one feature in each group because the members are almost the same. We can therefore greatly reduce the number of features by removing the redundant ones.

Clustering analysis usually uses the Euclidean distance as the similarity measurement. However, measurements of dependency, such as those based on information theory, can be more helpful in finding the relationship between two variables than simply measuring their distance in space. In this research, the correlation coefficient is used for clustering analysis instead of the Euclidean distance. The correlation coefficient of two random variables is a quantity that measures the mutual dependency of the two variables. When two features are mutually dependent, their values vary together in almost the same way. For a classification problem, we need to keep only one of them since they share almost the same characteristics.
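
To make the contrast between distance and dependency concrete, the following is a minimal sketch, not taken from the paper's software tool. The feature names and values (`height_cm`, `height_m`, `unrelated`) are made up for illustration. It shows that a feature and a rescaled copy of itself can be far apart in Euclidean distance while their correlation coefficient is essentially 1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    var_x = sum(a * a for a in x) - n * mean_x ** 2
    var_y = sum(b * b for b in y) - n * mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical features measured on five samples (illustrative values only).
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
height_m = [v / 100.0 for v in height_cm]    # same information, different unit
unrelated = [2.0, 5.0, 1.0, 5.0, 2.0]        # chosen to be uncorrelated with height_cm

r_dup = pearson(height_cm, height_m)
r_unrelated = pearson(height_cm, unrelated)
d_dup = euclidean(height_cm, height_m)
d_unrelated = euclidean(height_cm, unrelated)

print("correlation with rescaled copy:", r_dup)
print("correlation with unrelated feature:", r_unrelated)
print("distance to rescaled copy:", d_dup)
print("distance to unrelated feature:", d_unrelated)
```

In this toy data the Euclidean distance actually places the redundant copy farther away than the unrelated feature, whereas the correlation coefficient identifies the copy as redundant. This is the behavior that motivates clustering on correlation rather than on distance.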
Among hundreds or even thousands of collected features, there must be features that are very similar to each other, and we can treat these as the same kind of feature. We certainly do not need to use all features of the same kind for classification. After clustering analysis identifies all the different kinds of features, we can remove a great number of redundant features. Removing them can improve both the computational speed and the classification accuracy.

A novel feature selection algorithm based on the above-mentioned correlation coefficient clustering is proposed in this paper. Support vector machines (SVMs) [5] are used as the classifier for testing the feature selection results on two datasets: disordered protein data and Arrhythmia (ARR) data. Details are given in the subsequent sections.

The rest of this paper is organized as follows. Section II introduces related work. Section III presents the proposed clustering feature selection mechanism. Section IV describes the SVM learning model and the datasets. Section V shows experimental results and discussions. Finally, Section VI draws a brief conclusion.

II. RELATED WORK

Feature selection methods have been applied to classification problems in order to select a reduced feature set that makes the classifier faster and more accurate. Roughly speaking, feature selection models operate in two different modes: filters and wrappers [2]. Filters measure the information of features [6][7] (e.g., information gain) to decide the feature selection result. This kind of model works fast, but the classification result is not always satisfactory. Because filters contain no technique for controlling the error rate, their results are not always stable. Wrappers, on the other hand, incorporate a learning model. Wrappers perform feature selection through two main steps: feature searching and classification error rate measurement. The feature searching procedure selects features from the original feature set and feeds them to the classification procedure to test their prediction error rate. Wrappers work slowly because both main steps are very time-consuming. Moreover, the complex calculation makes it difficult to apply wrappers to applications with a large number of features.

In our previous research, we combined filters and wrappers to handle applications with a large number of features [8]. First, we used fast filter models with two information measurements: information gain and F-score. These two models can filter out many features that are not closely related to the problem. As mentioned above, a filter might not provide a satisfactory classification result.
Hence, we performed wrapper-mode feature selection to improve the classifier's prediction result. The hybrid mechanism was applied to the protein disordered region prediction problem, which is to find the unstructured regions of proteins. The learning model used was the support vector machine. In the experimental results, 350 features were selected from the original 440 features and the prediction accuracy was 82.72%.

One way to solve the problem of redundant or repeated features is to use some kind of feature dependency measurement, such as mutual information (MI), the correlation coefficient, or chi-square. A mutual information feature selection mechanism was proposed by Huang et al. [9]. They used a filter approach to perform the feature selection. In their view, two types of input features are unnecessary: features completely irrelevant to the output classes, and features that are redundant given other input features. Feature selection can be done by computing the mutual information both between features and classes and between pairs of features. The concept comes from information theory, which analyzes the relationship between features and classes in order to remove the redundant features and the features most irrelevant to the class.

Another feature selection method based on a feature dependency measurement was proposed by Peng et al. [10]. They also used mutual information to perform feature selection. Their original feature selection concept is based on the features' max dependency (MaxDep) [11], which measures the statistical dependency of a feature set on the target class. MaxDep selects m features that jointly have the largest dependency on the target class. The final selected features have the maximal dependency values, calculated from some similarity measurement such as the correlation coefficient or mutual information. However, MaxDep is very hard to estimate because its multivariate dependency measurement is taken in a high-dimensional space. Both feature searching and information measuring are quite time-consuming.
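
For reference, the two quantities discussed above are commonly written as follows. This is a sketch in assumed notation: the symbols p, S, c, and x_i are not defined in this paper and follow the usual statement of the Max-Dependency criterion.

```latex
% Mutual information between two random variables x and y
I(x; y) = \iint p(x, y)\, \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy

% Max-Dependency: choose the subset S of m features whose joint
% dependency on the target class c is largest
\max_{S} D(S, c), \qquad D = I(\{x_i,\ i = 1, \dots, m\};\, c)
```

Evaluating D requires the joint density of all m selected features together with the class, which is why its estimation becomes difficult as m grows.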
In order to improve on MaxDep, Peng et al. designed a two-stage feature selection algorithm that combines the minimal-redundancy-maximal-relevance criterion (mRMR) with other, more sophisticated feature selectors. It selects the feature with the maximal class-related value that also has minimal redundancy with all the already selected features. It then performs optimal first-order incremental selection to improve the classification result. By using a wrapper-type feature selection model (e.g., forward/backward floating search), they obtain the final compact feature set with the highest classification accuracy. Their results confirm that mRMR leads to promising improvement in feature selection and classification accuracy.

Among the feature dependency measurement techniques, the correlation coefficient also plays an important role, though it has not been used as often as mutual information. By definition, the correlation coefficient provides a quantitative measurement of the strength of a linear relationship between two sequences of observations. Hence, for most tests of the relationship between variables, calculating correlation coefficients is the first step in determining whether they are linearly dependent. Mutual information, on the other hand, is based on the measurement of knowledge: it tests how much knowledge one can gain about a certain variable by knowing the value of another variable. Mutual information helps narrow the range of the probability density function of a random variable x if the variable y is known. Therefore, if we only want to test the dependency between two variables instead of testing the knowledge gain, it is preferable to use the correlation coefficient. In the next section, we introduce the correlation coefficient based feature selection model, which can find redundant features by testing pairwise feature dependency.

III. CORRELATION COEFFICIENT CLUSTERING FOR FEATURE SELECTION

Finding related feature groups is not an easy task.
Pairwise similarity measurements over the whole feature set are hard to carry out because of the huge amount of calculation involved. Besides, the result of pairwise measurements cannot be used directly to identify groups of multiple similar features. Thus we propose to use clustering analysis to group the most related features together. This divides the feature set into groups of multiple features. The Euclidean distance is the most used similarity measurement in clustering analysis. However, it does not fit our feature selection goal. Therefore, we replace the distance measurement with the correlation coefficient in clustering.

Next, selecting a feature within each feature cluster is also an important step of feature selection. One representative feature needs to be picked from each feature cluster. In previous research, little attention was paid to this problem. Researchers assumed that since the features in the same cluster are almost the same, any of them can be chosen and the classification results would be about the same. But differences do exist among those similar features. Here we propose to choose the feature most related to the class in each feature cluster. The feature that has the highest correlation coefficient value with the class label is picked.

The following subsections introduce the clustering mechanism, the correlation coefficient, and the proposed correlation coefficient clustering algorithm for feature selection.

A. Clustering

Clustering is one of the most widely used techniques for exploratory data analysis. It can also be considered the most important unsupervised learning problem. In practice, clustering analysis finds a structure in a collection of unlabeled data. It separates the original dataset into smaller datasets called clusters. Data in each cluster are close to each other. Fig. 1 demonstrates such a separation of data.

Figure 1. Separation of data via clustering (Cluster 1, Cluster 2, Cluster 3).

Clustering algorithms can be classified as hierarchical clustering, overlapping clustering, exclusive clustering, and probabilistic clustering [12].
In our research, we only consider exclusive clustering, which means that each node in Fig. 1 can belong to only one cluster. There are many clustering algorithms. Among them, K-means is the classical one. K-means clustering separates n observations into k clusters, and each observation belongs to the cluster with the nearest mean. Usually the Euclidean distance is used as the distance metric to calculate the relationship between observations. K-means clustering works in the following steps.

1. Randomly select k nodes as the means from n observations, where k ≤ n.
2. Calculate the Euclidean distance from each node to all the means, and assign the remaining (n-k) observations to their respective nearest mean.
3. Re-calculate the means of all clusters m_1, m_2, ..., m_k.
4. Repeat Steps 2 and 3 until the content of each cluster is fixed.

Finally, each cluster represents a collection different from the other clusters. With this kind of clustering model, the observations can easily be separated according to the Euclidean distance measurement. In terms of time complexity, this is much better than measuring the distance between every pair of observations. However, the Euclidean distance can only measure the spatial distance between observations. The dependency between observations cannot be revealed. Hence, in this paper, we apply the correlation coefficient in clustering to measure the dependency among all observations.

B. Correlation Coefficient

In statistics, the correlation coefficient indicates the strength and direction of a relationship between two random variables. The most common use refers to a linear relationship. In general statistical usage, correlation or co-relation refers to the departure of two random variables from independence. Equation (1) shows the calculation of the correlation coefficient between two variables x and y, where n is the total number of observations and \bar{x} and \bar{y} are the means of x and y.

$$ r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}\;\sqrt{\sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2}} \qquad (1) $$

Two variables have strong dependency when their correlation coefficient value is close to 1 or -1. When the value is 0, the two variables are not linearly related at all. In our research, strong dependency is what we are looking for, no matter whether it is positive or negative. Therefore, in the measurement procedure, the absolute value of the correlation coefficient r is used.

C. Correlation Coefficient Clustering Algorithm

In this study, we combine the correlation coefficient with clustering analysis for feature selection. Instead of using the Euclidean distance, we choose the correlation coefficient as the similarity measurement, as discussed in the previous subsection. Clustering analysis separates the whole feature set into different groups. Closely related features are put together after the first clustering steps. The features are divided into different kinds of groups according to their dependency, and each kind of group represents a part of the feature space.

For the final goal of feature selection, we must choose the most relevant and non-redundant features from the original feature set to reduce the number of features. In this approach, only one feature is needed from each kind/cluster of features. The reason is that features in the same cluster are very close to each other, and we do not need to use more than one feature of the same kind to perform the classification task. Fig. 2 shows the concept of the proposed feature selection model. In the clustering procedure, we use the correlation coefficient as the similarity measurement to check the dependency among features.

Figure 2. The process of correlation coefficient clustering feature selection. The whole feature set a_1, a_2, a_3, a_4, ..., a_n is grouped into clusters c_1, c_2, ..., c_k by correlation coefficient clustering of similar features; one feature m_1, m_2, ..., m_k is then selected from each of the k clusters, and the redundant features are removed.

The remaining features are the result of feature selection. A problem comes up here regarding how to pick the representative feature for each feature cluster. That is, which feature in a cluster should we keep? We propose to pick the most class-dependent feature in each cluster as the representative one. The correlation coefficient can also be used to decide the class-feature dependency. The most class-dependent features from all clusters can help improve the overall classification accuracy. The pseudocode of the proposed correlation coefficient clustering feature selection algorithm is as follows.

    Randomly select k nodes m(m_1, ..., m_k) from n observations a(a_1, ..., a_n);
    WHILE originally selected k nodes m(m_1, ..., m_k) != newly selected k nodes m'(m'_1, ..., m'_k)
        FOR i = 1 to n (observations)
            FOR j = 1 to k (nodes)
                r_ij = Correlation_Coefficient(a_i, m_j);
                IF r_ij = MAX(r_i1, r_i2, ..., r_ik)
                    a_i belongs to m_j's cluster;
                END IF
            END FOR
        END FOR
        FOR p = 1 to k (clusters C_1, ..., C_k)
            FOR q = 1 to t (C_p's length; cluster p's contents s_1, ..., s_t)
                r_q = Correlation_Coefficient(s_q, Class labels);
                IF r_q = MAX(r_1, r_2, ..., r_t)
                    m'_p = s_q;
                END IF
            END FOR
        END FOR
    END WHILE
    RETURN m'(m'_1, ..., m'_k);

Next, we make a brief comparison of the proposed method with mRMR. First, mRMR only chooses the most informative features, i.e., the most class-related features.
Since the m best features are not necessarily the best m features [13], the result of mRMR might ignore features that are not so closely related to the class label but can complement other features to improve the classification result. In the proposed method, no such features would be missed. Secondly, the Min-Redundancy step of mRMR only randomly keeps one of the Max-Relevance features. The proposed method, on the other hand, retains the most class-related feature in each feature cluster by calculating the correlation coefficients between the features and the class. Other features in the same cluster are then removed.

IV. LEARNING MODEL AND DATASETS

A machine learning method is needed when we apply the proposed feature selection to classification problems. The support vector machine (SVM) was chosen for the experiments in this research because of its advantages in using kernels for nonlinear problems and in optimizing the separating margins. Furthermore, it avoids the local minima problem during the training process. The datasets used for the experiments in this research are also introduced in this section.

A. Support Vector Machine

The SVM is based on support vector (SV) learning. That means the SVM does not compare the prediction target with all the existing training nodes. Instead, the SVM selects a group of nodes as its SVs and uses these SVs to judge the label of the classification target. In the testing stage, the SVM model uses the SVs to do the classification. These SVs are located near the hyperplanes that produce the maximum margin of class separation. Fig. 3 demonstrates the maximum margin between two classes that are separated by the hyperplane in the SVM model. H_1 and H_2 are the boundaries, and the nodes located near these two lines are the support vectors.

Figure 3. The SVM finds the maximum margin and uses the SVs to predict the prediction targets. Boundaries H_1 and H_2 are located on these SVs.
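
Before turning to the datasets, the selection step that produces the SVM's input (the pseudocode of Section III-C) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' C# tool: the function name `cc_feature_selection`, the optional `init` and `seed` arguments, the `max_iter` guard, and the toy data are all hypothetical additions. The absolute value of the correlation coefficient is used, as stated in Section III-B.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient; constant sequences count as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) - n * mx * my
    vx = sum(a * a for a in x) - n * mx * mx
    vy = sum(b * b for b in y) - n * my * my
    if vx <= 0 or vy <= 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cc_feature_selection(features, labels, k, init=None, seed=0, max_iter=100):
    """Return sorted indices of one representative feature per cluster.

    features: list of feature columns (one list of values per feature).
    init: optional starting representatives (assumption, for reproducibility);
          when omitted, k features are drawn at random as in the pseudocode.
    """
    rng = random.Random(seed)
    reps = sorted(init) if init is not None else sorted(rng.sample(range(len(features)), k))
    # Class dependency of every feature, computed once.
    relevance = [abs(pearson(f, labels)) for f in features]
    for _ in range(max_iter):  # guard: the pseudocode itself has no iteration limit
        # Assign each feature to the representative it is most correlated with.
        clusters = {r: [] for r in reps}
        for i, f in enumerate(features):
            best = max(reps, key=lambda r: abs(pearson(f, features[r])))
            clusters[best].append(i)
        # In each non-empty cluster keep the most class-dependent feature.
        new_reps = sorted(max(members, key=lambda i: relevance[i])
                          for members in clusters.values() if members)
        if new_reps == reps:
            break
        reps = new_reps
    return reps

# Toy data (made up): three pairs of near-duplicate features, eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [1, 2, 1, 2, 5, 6, 5, 6],       # f0: class-related
    [3, 5, 3, 6, 11, 13, 10, 13],   # f1: near copy of f0
    [1, 2, 3, 4, 1, 2, 3, 4],       # f2
    [2, 4, 6, 8, 2, 4, 6, 9],       # f3: near copy of f2
    [5, 1, 5, 1, 5, 1, 5, 1],       # f4
    [9, 2, 10, 2, 9, 3, 10, 2],     # f5: near copy of f4
]
selected = cc_feature_selection(features, labels, k=3, init=[0, 2, 4])
print(selected)
```

With this data each near-duplicate pair forms one cluster and one member of each pair is kept. As with K-means, the outcome can depend on the initial representatives, so a random start may group the features differently.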

B. Datasets

Protein disordered region prediction is the first problem tried in this research. In proteomics, a protein's function is strongly related to its structure. While some parts of a protein have a fixed, definite structure, such as α-helix, β-sheet, or coil, other parts are not associated with well-defined conformations. Previously, these so-called disordered regions were not thought to have a specific function of their own. However, recent studies suggest that some disordered regions may have important signaling or regulatory functions. In addition, some critical diseases are strongly related to these disordered regions. Thus, protein disordered region prediction is an important problem. However, the most relevant features in this problem are yet to be determined [14][15].

Our ordered and disordered sequences were collected from the PDB [16] and DisProt [17] databases. The proteins in DisProt all contain disordered regions. The protein sequences collected from PDB contain mostly ordered regions. The data selected from DisProt are taken as positive training data, and the negative training data are derived from PDB_Select_25 [18], which is a non-redundant dataset of the Protein Data Bank (PDB). Finally, 119 protein sequences with 440 features were collected, with a total of 1676 residues. The 440 features were determined from related research [8].

In order to compare the proposed method with MaxDep and mRMR, the Arrhythmia (ARR) dataset from the UCI machine learning archive [19] was also used. The aim of this dataset is to distinguish between the absence and presence of cardiac arrhythmia and to classify a datum into one of 16 classes. However, we consider only two states: normal and abnormal. Class 1 refers to normal, Classes 2 to 15 refer to different abnormal classes of arrhythmia, and Class 16 refers to the other unclassified ones. In this dataset, there are 452 instances with 279 features. Among the features, 206 are linear-valued and the rest are nominal.

V. EXPERIMENTAL RESULTS

A software tool has been implemented for the proposed feature selection method (Fig. 4). C#.NET in MS Visual Studio was used to develop the tool. The user can determine the number of clusters, the similarity measurement, and the clustering method in our tool. To determine the number of clusters in the experiments, several methods were tried, namely, the gap statistic [20], the Calinski-Harabasz index [21], the Krzanowski-Lai index [22], and the Hartigan statistic [23]. Most of them compare the between-cluster sums of squares with the within-cluster sums of squares to detect the distribution of data. From this distribution, the number of clusters can be estimated. There are two main problems. First, these methods can only give estimates, which are sometimes not very precise. Secondly, in our experiment, the number of clusters is also the final number of remaining features. According to past research, a classifier might not perform well with only the main class-related features. Sometimes it is necessary to include some additional features to improve the classifier's discrimination ability. Therefore, in our experiments, although we had the estimated number of clusters from these models, we still tried several different numbers of clusters.

Figure 4. The interface of the feature selection software tool.

For the SVM learning machine in this experiment, we use the RBF kernel. The experimental results of protein disordered region prediction with the proposed method are listed in TABLE 1. There are 440 features in the original dataset. The best result via five-fold cross-validation is 86.30% with only 200 features. It is much better than the result produced by our previous work with a hybrid feature selection model [8]. The best result in [8] was 82.72% with 350 features. The number of features is further reduced by 34% of the original set ((350-200)/440) and the classification accuracy is raised by 3.58 percentage points. This demonstrates the usefulness of the proposed feature selection method.

TABLE 1. FIVE-FOLD CROSS-VALIDATION RESULTS ON DISORDERED PROTEIN DATA
Feature number | Accuracy (5-fold cross-validation, %)
[table values not recoverable from the source]

Next, we compare the proposed method with mRMR and MaxDep [10] on the ARR dataset. Fig. 5 shows that the proposed method is better than MaxDep and comparable to mRMR in classification accuracy. The number of selected features ranges from 5 to 55 (from the original 279 features). The proposed method provides a better and more stable result than MaxDep. In the feature searching procedure, MaxDep has to search through the whole feature set with different combinations. This procedure also takes time. The proposed method did not perform better than mRMR. The reason is that mRMR incorporates the wrapper mode in the second stage of its feature selection procedure. The wrapper mode works as a post-modification step that can further improve the classification accuracy by repeatedly using a learning machine. This repeated process is very time-consuming. Our method, on the other hand, uses clustering analysis only once. It is more like a filter-mode feature selection procedure that does not require very complex calculations.

Figure 5. Ten-fold cross-validation accuracy comparison among MaxDep, mRMR, and correlation coefficient clustering feature selection on Arrhythmia data (learning machine: SVM).

From the experimental results, we can observe that the proposed method greatly reduces the number of features on both datasets. The advantage of the proposed method is that it executes much faster than wrapper-mode feature selection methods while maintaining comparable classification accuracy. Clustering analysis is very helpful in finding maximal dependency among features. Each cluster can represent a different kind of feature.

VI. CONCLUSION

In this paper, a novel feature selection method is proposed. The key characteristic of the method is to apply clustering analysis in grouping the collected features. Only one representative feature is needed from each feature group. This can greatly reduce the total number of features. In the method, the correlation coefficient is used to find similar features with maximum dependency. It is also used to identify the most class-dependent feature as the representative feature in each feature cluster. Filter-mode feature selection methods focus only on identifying the most class-related features without considering redundancy among these features. Also, some features that are actually helpful to the overall classification performance are viewed as not so class-related and are removed just because their measures are low. Feature selection methods that involve the wrapper mode, on the other hand, require a lot of computation. The proposed method has advantages over both filter-mode and wrapper-mode methods.
This method does not yet consider the removal of noisy features, which can be harmful to the overall performance. One simple way to identify possible noisy features is to look for representative features that have a low correlation coefficient value with the class. A representative feature with a near-zero correlation coefficient value should definitely be removed. However, experiments are needed to carefully examine the threshold setting. This is one future direction of this research.

REFERENCES

[1] C. Deisy, B. Subbulakshmi, S. Baskar, and N. Ramaraj, "Efficient Dimensionality Reduction Approaches for Feature Selection," International Conference on Computational Intelligence and Multimedia Applications, 2007. [doi: /ICCIMA ]
[2] R. Kohavi and G. John, "Wrappers for Feature Subset Selection," Artificial Intelligence, vol. 97. [doi: /S (97)00043-X]
[3] J. R. Quinlan, "Discovering Rules from Large Collections of Examples: A Case Study," in D. Michie, ed., Expert Systems in the Microelectronic Age, Edinburgh, Scotland: Edinburgh University Press, 1979.
[4] Y. Liu, Y. F. Yin, J. J. Gao, and C. G. Tan, "Wrapper Feature Selection Optimized SVM Model for Demand Forecasting," The International Conference on Young Computer Scientists, 2008. [doi: /ICYCS ]
[5] LIBSVM - A Library for Support Vector Machines (last accessed Nov 3, 2009).
[6] A. Al-Ani, "A dependency-based search strategy for feature selection," Expert Systems with Applications: An International Journal, vol. 36, 2009.
[7] B. Bonev, F. Escolano, and M. Angel-Cazorla, "A Novel Information Theory Method for Filter Feature Selection," MICAI 2007: Advances in Artificial Intelligence, Springer Berlin / Heidelberg, 2007.
[8] H.-H. Hsu, C.-W. Hsieh, and M.-D. Lu, "A Hybrid Feature Selection Mechanism," in Proc. Eighth International Conference on Intelligent Systems Design and Applications (ISDA 2008), Kaohsiung, Taiwan, Nov. 26-28, 2008. [doi: /ISDA ]
[9] J. J. Huang, Y. Z. Cai, and X. M. Xu, "A Filter Approach to Feature Selection Based on Mutual Information," Cognitive Informatics, vol. 1, 2006. [doi: /COGINF ]
[10] H. C. Peng, F. H. Long, and C. Ding, "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, 2005. [doi: /TPAMI ]
[11] C. Ding and H. C. Peng, "Minimum Redundancy Feature Selection from Microarray Gene Expression Data," Proc. Second IEEE Computational Systems Bioinformatics Conf., 2003. [doi: /CSB ]
[12] M. Matteucci, "A Tutorial on Clustering Algorithms," l/ (last accessed Feb 3, 2010).
[13] T. M. Cover, "The Best Two Independent Measurements Are Not the Two Best," IEEE Trans. Systems, Man, and Cybernetics, vol. 4.
[14] C. Bracken, L. M. Iakoucheva, P. R. Romero, and A. K. Dunker, "Combining prediction, computation and experiment for the characterization of protein disorder," Curr. Opin. Struct. Biol., vol. 14, 2004.
[15] K. Peng, P. Radivojac, S. Vucetic, A. K. Dunker, and Z. Obradovic, "Length-dependent prediction of protein intrinsic disorder," BMC Bioinformatics, vol. 7, pp. 208, 2006. [doi: / ]
[16] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig et al., "The Protein Data Bank," Nucleic Acids Research, vol. 28, pp. 235-242, 2000. [doi: /s ]
[17] S. Vucetic, Z. Obradovic, V. Vacic, P. Radivojac, K. Peng, L. M. Iakoucheva et al., "DisProt: A Database of Protein Disorder," Bioinformatics, vol. 21, 2005. [doi: /bioinformatics/bth476]
[18] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic local alignment search tool," J. Mol. Biol., vol. 215. [doi: /jmb ]
[19] UCI machine learning repository (last accessed Feb 3, 2010).
[20] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society, Series B 63, 2001.
[21] R. B. Calinski and J. A. Harabasz, "A dendrite method for cluster analysis," Communications in Statistics, vol. 3, pp. 1-27.
[22] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley.
[23] J. A. Hartigan, Clustering Algorithms. Wiley.

Hui-Huang Hsu is an Associate Professor in the Department of Computer Science and Information Engineering at Tamkang University, Taipei, Taiwan. He received both his PhD and MS degrees from the Department of Electrical and Computer Engineering at the University of Florida, USA, in 1994 and 1991, respectively. He has published over 80 refereed papers and book chapters, and has participated in many international academic activities. His current research interests are in the areas of machine learning, data mining, bio-medical informatics, ambient intelligence, and multimedia processing. He is a senior member of the IEEE.

Cheng-Wei Hsieh received his master's degree in Computer Science and Information Engineering at National Central University. His MS degree is from the Department of Computer Science & Information Engineering at Tamkang University, Taipei, Taiwan. He is a PhD candidate in the Department of Computer Science & Information Engineering at Tamkang University, Taipei, Taiwan. His major research interests include applications in bioinformatics and machine learning.


More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

SIMPLE LINEAR CORRELATION

SIMPLE LINEAR CORRELATION SIMPLE LINEAR CORRELATION Smple lnear correlaton s a measure of the degree to whch two varables vary together, or a measure of the ntensty of the assocaton between two varables. Correlaton often s abused.

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

v a 1 b 1 i, a 2 b 2 i,..., a n b n i. SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are

More information

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification IDC IDC A Herarchcal Anomaly Network Intruson Detecton System usng Neural Network Classfcaton ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES ECE Department, New Jersey Inst. of Tech.,

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms

Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms Internatonal Journal of Appled Informaton Systems (IJAIS) ISSN : 2249-0868 Foundaton of Computer Scence FCS, New York, USA Volume 7 No.7, August 2014 www.jas.org Cluster Analyss of Data Ponts usng Parttonng

More information

Data Visualization by Pairwise Distortion Minimization

Data Visualization by Pairwise Distortion Minimization Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton

More information

Design and Development of a Security Evaluation Platform Based on International Standards

Design and Development of a Security Evaluation Platform Based on International Standards Internatonal Journal of Informatcs Socety, VOL.5, NO.2 (203) 7-80 7 Desgn and Development of a Securty Evaluaton Platform Based on Internatonal Standards Yuj Takahash and Yoshm Teshgawara Graduate School

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

A novel Method for Data Mining and Classification based on

A novel Method for Data Mining and Classification based on A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: lhan-gege@126.com Abstract Data mnng has been attached great

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Machine Learning and Software Quality Prediction: As an Expert System

Machine Learning and Software Quality Prediction: As an Expert System I.J. Informaton Engneerng and Electronc Busness, 2014, 2, 9-27 Publshed Onlne Aprl 2014 n MECS (http://www.mecs-press.org/) DOI: 10.5815/jeeb.2014.02.02 Machne Learnng and Software Qualty Predcton: As

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo.

RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL. Yaoqi FENG 1, Hanping QIU 1. China Academy of Space Technology (CAST) yaoqi.feng@yahoo. ICSV4 Carns Australa 9- July, 007 RESEARCH ON DUAL-SHAKER SINE VIBRATION CONTROL Yaoq FENG, Hanpng QIU Dynamc Test Laboratory, BISEE Chna Academy of Space Technology (CAST) yaoq.feng@yahoo.com Abstract

More information

Semantic Link Analysis for Finding Answer Experts *

Semantic Link Analysis for Finding Answer Experts * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG

More information

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77-866 Logcal Development Of Vogel s Approxmaton Method (LD- An Approach To Fnd Basc Feasble Soluton Of Transportaton

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

Detecting Credit Card Fraud using Periodic Features

Detecting Credit Card Fraud using Periodic Features Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg,

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting

Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting Propertes of Indoor Receved Sgnal Strength for WLAN Locaton Fngerprntng Kamol Kaemarungs and Prashant Krshnamurthy Telecommuncatons Program, School of Informaton Scences, Unversty of Pttsburgh E-mal: kakst2,prashk@ptt.edu

More information

Implementations of Web-based Recommender Systems Using Hybrid Methods

Implementations of Web-based Recommender Systems Using Hybrid Methods Internatonal Journal of Computer Scence & Applcatons Vol. 3 Issue 3, pp 52-64 2006 Technomathematcs Research Foundaton Implementatons of Web-based Recommender Systems Usng Hybrd Methods Janusz Sobeck Insttute

More information
