Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets
|
|
- Cornelius Simon
- 8 years ago
- Views:
Transcription
1 Improved Mnng of Software Complexty Data on Evolutonary Fltered Tranng Sets VILI PODGORELEC Insttute of Informatcs, FERI Unversty of Marbor Smetanova ulca 17, SI-2000 Marbor SLOVENIA Abstract: - Wth the evoluton of nformaton technology and software systems, software relablty has become one of the most mportant topcs of software engneerng. As the dependency of socety on software systems ncrease, so ncreases also the mportance of effcent software fault predcton. In ths paper we present a new approach to mprovng the classfcaton of faulty software modules. The proposed approach s based on flterng tranng sets wth the ntroducton of data outlers dentfcaton and removal method. The method uses an ensemble of evolutonary nduced decson trees to dentfy the outlers. We argue that a classfer traned by a fltered dataset captures a more general knowledge model and should therefore perform better also on unseen cases. The proposed method s appled on a real-world software relablty analyss dataset and the obtaned results are dscussed. Key-Words: - data mnng, classfcaton, evolutonary decson trees, flterng tranng sets, software fault predcton, search-based software engneerng 1 Introducton The early and accurate dentfcaton of potentally dangerous (faulty) software modules s of vtal mportance for better software relablty. Relablty s one of the most mportant aspects of software systems of any knd (nformaton systems, embedded systems, etc.). The sze and complexty of software s growng dramatcally durng last decades. The demand for hghly complex software systems s ncreasng more rapdly than the ablty to desgn, mplement and mantan them. When the requrements for and dependences of computers ncrease, the possblty of crses from falures also ncreases. The mpact of these falures ranges from nconvenence to economc damages to loss of lves therefore t s clear that software relablty s becomng a major concern not only for software engneers and computer scentsts, but also for the socety as a whole. Therefore, the employment of an effcent fault predctve technque to foresee dangerous software modules s essental. Wth the creaton of large emprcal databases of software projects, as a result of stmulated research on estmaton models, metrcs and methods for measurng and mprovng processes and products, ntellgent mnng of these datasets can largely add to the mprovement of software relablty [1]. In order to help the software engneers n predctng the faulty software modules, computerzed data mnng and decson support tools can be used whch are able to help software engneers to process a huge amount of data avalable from prevous software projects and suggest the probable predcton based on the values of several mportant attrbutes. Black-box classfcaton methods (neural networks for example) are not very approprate for ths knd of task, because the software experts want to evaluate and valdate the decson makng process nduced by those tools, before there s enough trust to use the tools n practce. On the other hand, the evaluaton of the nduced classfcaton rules produced by the computerzed tools by a software expert can be an mportant source of new knowledge on the assocatons of the avalable attrbutes and new laws of software relablty engneerng. In order to acheve ths goal, the classfcaton process should be easly understandable, nterpretable and straghtforward. One of the most popular and proven-useful approach are decson trees. However, t has been shown that decson trees are a weak classfer, prone to produce very dfferent solutons based on an even small change n nput (tranng) data. Therefore, naccuraces and nose n tranng data can easly lead to an naccurate result. ISSN: Issue 11, Volume 6, November 2009
2 The dea that we present n ths paper s to construct an outler predcton method that flters out so-called data outlers,.e. data tems that fall outsde the boundares that enclose most other data tems n the data set [1, 2]. When a fltered dataset s used to tran a classfer (a decson tree) t should produce a better and more relable classfcaton result. Furthermore, as a consequence of ncreased homogenety of the data, the results should be more general and thus smpler, less complex and easer to apply n general. Smlar method has already been appled to data mnng n medcne wth some success [3]. Our proposton for the predcton of outlers s the followng: f a sngle (known) data case s classfed dfferently by dfferent accurate classfers, t potentally contans contradctory nformaton (Fgure 1). Although ths contradcton s not necessarly an error n data, a usual decson tree classfer s not able to correctly construct a general model based on such data. Therefore, our proposton s that the general knowledge model bult by a decson tree (or some other nducton method n that matter) would be more effcent when a classfer would be traned wthout such msleadng data. 1.1 The am and scope of the paper The man objectve of ths paper s to nvestgate whether t s possble to mprove the classfcaton performance of several machne learnng algorthms when the outlers are fltered out of the tranng set. For the outler predcton method a set of evolutonary nduced decson trees s used. The proposed approach s appled to a software engneerng problem of predctng potentally dangerous (contanng errors and faults) software modules. The data mnng results (both quanttatve classfcaton results and qualtatve assessment of nduced models) of dfferent machne learnng algorthms are presented and dscussed. n data set 2 Flterng the dataset by outler predcton method The basc dea of outler predcton s to defne a crteron or crtera, upon whch for each data case from a dataset can be determned whether t belongs to the majorty of the cases or not. If the specfc data case regardng the defned crtera does not belong to the majorty, then t s called an outler (regardng specfc crtera). How the crtera are defned determnes the outler predcton method. In general, defnng outler crtera s not trval, whereas the dentfcaton of outlers based on these crtera s. In our prevous work we presented how to use evolutonary algorthm for the nducton of decson trees [4, 5, 6]. One of the greatest advantage of evolutonary constructon of decson trees, besde the proven effcency n classfyng, s the ablty to produce several comparably accurate classfers for the same dataset. Havng ths n mnd, t s possble to get a decson for the same data case based on dfferent classfers (usng dfferent attrbutes and/or dfferent relatons). class 1 classfers class 2 Fg 1. When a sngle data record s classfed dfferently by dfferent classfers, t potentally contans contradctory nformaton. In our method, we frst defne an approach to the dentfcaton of outlers. For ths purpose the algorthm for the constructon of decson trees ISSN: Issue 11, Volume 6, November 2009
3 gentrees s used (descrbed n the next secton). Wth gentrees a set of classfers (decson trees) are nduced. For each tranng object (data case from the tranng set) classfcaton cc (x) s calculated for all decson classes (Eq. 1) the resultng value represents the number of classfers that classfed object x wth the decson class. Then classfcaton confuson score CCS(x) s calculated for each tranng object x (Eq. 2) the result represents the confuson score of a set of classfers when classfyng object x; f all DTs gve the same classfcaton, then the result s 0; hgher numbers represent less homogeneous objects for classfcaton possble outlers. The hgher s the number CCS(x), more probably les the object x outsde the majorty area. Based on the CCS (Eq. 2) t s determned whch objects should be fltered out from the dataset for the classfcaton process. For ths purpose a tolerance threshold tt s defned for a dataset; f CCS(x)>tt then object x s fltered out. survval of the fttest, frst lad down by Charles Darwn, so by smulatng ths process, genetc algorthms are able to evolve solutons to real-world problems, f they have been sutably encoded [9]. They are often capable of fndng optmal solutons even n the most complex of search spaces or at least they offer sgnfcant benefts over other search and optmsaton technques. The basc mechansm of evolutonary algorthms s presented n Fgure 2. cc ( x) ;, num _ DTs j 1 1; class ( DT 0; otherwse ( 1... num _ classes ) j, x) (1) CCS 2 num _ classes cc ( x) ( x) 1 (2) 1 cc max ( x) Fg 2. The basc mechansm of evolutonary algorthms: selecton, crossover and mutaton. 3 Backend technologes: evolutonary algorthms and decson trees The central pont of our approach, as has been sad above, s the use of evolutonary nduced decson trees whch are beng used to construct a set of classfers for dentfyng outlers n a tranng set. Both backend technologes enablng our approach are brefly presented frst, followng by the descrpton of our algorthm for nducng decson trees. 3.1 Evolutonary algorthms Genetc algorthms are adaptve heurstc search methods whch may be used to solve all knds of complex search and optmsaton problems [7, 8]. They are based on the evolutonary deas of natural selecton and genetc processes of bologcal organsms. Lke natural populatons evolve accordng to the prncples of natural selecton and A typcal genetc algorthm operates on a set of solutons (populaton) wthn the search space. The search space represents all the possble solutons whch can be obtaned for the gven problem, and s usually very complex or even nfnte. Every pont of the search space s one of the possble solutons and therefore the am of the genetc algorthm s to fnd an optmal pont or at least come as close to t as possble. The genetc algorthm conssts of three genetc operators: selecton, crossover (recombnaton), and mutaton. Selecton s the survval of the fttest ndvduals wthn the genetc algorthm wth the am of gvng the preference to the best ones. For ths purpose all solutons have to be evaluated, whch s done wth the use of the evaluaton functon. Selecton determnes ndvduals to be used for the second genetc operator - crossover or recombnaton, where from two good ndvduals a new, even better one s constructed. The crossover process s repeated untl the whole new populaton s completed wth the offsprng produced by the ISSN: Issue 11, Volume 6, November 2009
4 crossover. All constructed ndvduals have to preserve the feasblty regardng the gven problem; n ths manner t s mportant to coordnate nternal representaton of ndvduals wth genetc operators. The last genetc operator s mutaton, whch s an occasonal (low probablty) alteraton of an ndvdual that helps to fnd an optmal soluton to the gven problem faster and more relably. DTs have been already used n a large amount of dfferent applcatons: n medcne [5], fnance [11], envronmental studes [12], etc. Ther greatest advantage s comparatvely hgh classfcaton accuracy whle preservng the transparent knowledge model whch can be easly nterpreted and valdated by doman experts. Ths has made DTs one of the most popular classfers. 3.2 Decson trees Inductve nference s the process of movng from concrete examples to general models, where the goal s to learn how to classfy objects by analysng a set of nstances (already solved cases) whose classes are known. Instances are typcally represented as attrbute-value vectors. Tranng nput conssts of a set of such vectors, each belongng to a known class, and the output conssts of a mappng from attrbute values to classes. Ths mappng should accurately classfy both the gven nstances and other unseen nstances. A decson tree (DT) [5, 9] s a formalsm for expressng such mappngs and conssts of tests or attrbute nodes lnked to two or more sub-trees and leafs or decson nodes labeled wth a class whch means the decson. A test node computes some outcome based on the attrbute values of an nstance, where each possble outcome s assocated wth one of the sub-trees. An nstance s classfed by startng at the root node of the tree. If ths node s a test, the outcome for the nstance s determned and the process contnues usng the approprate subtree. When a leaf s eventually encountered, ts label gves the predcted class of the nstance. An example of a DT s presented n Fgure 3. Fg 3. An example of a decson tree. 3.3 Evolutonary nducton of decson trees: the gentrees algorthm Evolutonary algorthms (EAs) are generally used for very complex optmzaton tasks [8], for whch no effcent determnstc or heurstc method s developed. Constructon of DTs s a complex task, but an exact method exsts, that n general works effcently and relably [5, 9]. At a frst glance there s no reason to use EAs. Nevertheless, there are some objectve reasons that justfy our evolutonary approach. Frst, EAs provde a very general concept that can be used n all knds of decson makng problems. Because of ther robustness they can be used also on ncomplete, nosy data (whch often happens because of measurement errors, unavalablty of proper nstruments, etc.). Ths s not very successfully solvable by tradtonal technques of DT constructon. Furthermore, EAs use evolutonary prncples to evolve solutons, therefore solutons can be found that can be easly overlooked otherwse. Another mportant advantage of EAs s the possblty of optmzng the decson tree's topology and adaptng class ntervals for numerc attrbutes, wthn the evoluton process. And the most mportant advantage of the evolutonary approach n our case s that not only one but several equally qualtatve solutons are obtaned for the same dataset. In ths way the same decson can be made based on dfferent attrbutes and/or combnaton of the attrbutes. By dfferent settngs of parameters n EA runs, searchng can be drected to dfferent stuatons that gve us dfferent solutons for our fnal decson forest. Of course there are also some drawbacks of our method regardng the heurstc nducton, the most obvous one beng the hgher nducton tme complexty. When defnng the nternal representaton of ndvduals wthn the genetc populaton, together wth the approprate genetc operators that wll work upon the populaton, t s mportant to assure the feasblty of all solutons durng the whole evoluton process. We decded to present an ndvdual drectly as a DT. In ths manner all ntermedate solutons are feasble, no nformaton s ISSN: Issue 11, Volume 6, November 2009
5 lost because of the converson between nternal representaton and the DT, the ftness functon (FF) can be straghtforward, etc. The problem wth drect codng of soluton may brng some problems n defnng of genetc operators. As DTs may be seen as a knd of smple computer programs (wth attrbute nodes beng condtonal clauses and decson nodes beng assgnments) we decded to defne genetc operators smlar to those used n genetc programmng where ndvduals are computer program trees [10]. The nducton of DT-lke classfers wth the use of evolutonary algorthms has been already used n several cases [5, 14, 15], also appled to software engneerng [6] Inducton of DTs for the ntal populaton Frst step of the genetc algorthm s the nducton of the ntal populaton. A random decson tree s constructed based on the followng algorthm: 1. nput: number of attrbute nodes V that wll be n the tree 2. select an attrbute A from the set of all possble K attrbutes and defne t a root node tn 3. n accordance wth the selected attrbute's A type (dscrete or contnuous) defne a test for ths node tn: a. for contnuous attrbutes n a form of tn(a ) <, where tn(a ) s the attrbute value for a data object and s a splt constant b. for dscrete attrbutes two dsjunctve sets of all possble attrbute values are randomly defned 4. connect empty leaves to both new branches from node tn 5. randomly select an empty leaf node tn (note below) 6. randomly select an attrbute A from the set of all possble K attrbutes (note below) 7. replace the selected leaf node tn wth the attrbute A and go to step 3 8. fnsh when V attrbute nodes has been created In the above algorthm: - the probablty of selectng an empty leaf s decreased wth the depth of the leaf n a growng tree, and - the probablty of choosng an attrbute depends on a number of prevous uses of that attrbute n a tree n ths manner unused attrbutes have better chances to be selected. A smplfed pseudo-code of the above algorthm s presented at Fgure 4. // ntal decson trees Select number of nodes N repeat Select an attrbute X Defne f t (X)< or f t (X ) V Connect empty leaves to node Randomly select an empty leaf node untl N nodes are created Fg 4. The basc algorthm for constructng random DTs. For each empty leaf the followng algorthm determnes the approprate decson class: let S be the tranng set of all tranng objects N wth M possble decson classes 1,.., M and N s the number of objects wthn S of a class. Let S tn be the sample set at node tn (an empty leaf for whch we are tryng to select a decson class) wth N tn objects; N tn s the number of objects wthn S tn of a decson class. Now we can defne a functon that measures a potental percentage of correctly classfed objects of a class : F tn, ) N N tn ( (3) Decson tn for the leaf node tn s then marked as the decson, for whch F(tn,) s maxmal. The rankng of an ndvdual DT wthn a populaton s based on the FF: FF M V w ( 1 acc ) c( tn ) wu nu (4) 1 1 where M s the number of decson classes, V s the number of attrbute nodes n a tree, acc s the accuracy of classfcaton of objects of a specfc decson class, w s the mportance weght for classfyng the objects of the decson class, c(tn ) s the cost of usng the attrbute n a node tn, nu s number of unused decson (leaf) nodes,.e. where no object from the tranng set fall nto, and w u s the weght of the presence of unused decson nodes n a tree. Accordng to FF the best trees (the most ft ones) have the lowest functon values the am of the evolutonary process s to mnmze the value of FF for the best tree. A near optmal DT would: 1) classfy all tranng objects wth accuracy n ISSN: Issue 11, Volume 6, November 2009
6 accordance wth mportance weghts w (some decson classes may be more mportant than the others), 2) have very lttle unused decson nodes (there s no evaluaton possble for ths knd of decson nodes regardng the tranng set), and 3) consst of low-cost attrbute nodes (n ths manner the desrable/undesrable attrbutes can be prortzed). qualtatve solutons are obtaned regardng the chosen FF. The evoluton stops when an optmal or at least an acceptable soluton s found or f the ftness score of the best ndvdual does not change for a predefned (large) number of generatons. A smplfed pseudo-code algorthm for the evolutonary process of mprovng the qualty of a DT (n accordance wth FF) s presented at Fgure Crossover and mutaton Crossover works on two selected ndvduals as an exchange of two randomly selected sub-trees. In ths manner a randomly selected tranng object s selected frst whch s used to determne paths (by fndng a decson through the tree) n both selected trees. Then an attrbute node s randomly selected on a path n the frst tree and an attrbute s randomly selected on a path n the second tree. Fnally, the sub-tree from a selected attrbute node n the frst tree s replaced wth the sub-tree from a selected attrbute node n the second tree and n ths manner an offsprng s created whch s put nto a new populaton. Mutaton conssts of several parts: 1) one randomly selected attrbute node s replaced wth another attrbute, randomly chosen from the set of all attrbutes; 2) a test n a randomly selected attrbute node s changed,.e. the splt constant s mutated; 3) a randomly selected decson (leaf) node s replaced by an attrbute node; 4) a randomly selected attrbute node s replaced by a decson node. // evolvng DTs repeat tournament selecton crossover mutaton evaluate all DTs wth FF untl termnaton crtera reached Fg 5. The basc algorthm for evolvng DTs. 4 Applcaton of the method The descrbed tranng set flterng method has been appled to a real-world software relablty analyss dataset, composed at the Unversty of Udne, Italy, from the software development project of a hosptal nformaton system the whole medcal software system conssts of 904 modules n C programmng language representng more than lnes of code. Frst the modules have been dentfed ether as OK or DANGEROUS by applyng the model developed by Pghn [11] the modules that contan less than 5 errors were set as OK and the others as DANGEROUS. A set of 168 attrbutes, contanng varous software complexty measures, has been determned for each software module. From all 904 modules 804 have been randomly selected for the tranng set, and the remanng 100 modules have been selected for the testng set. A set of classfers were nduced wth gentrees based on the tranng set. Then outlers were dentfed usng the class confuson score metrcs (Eq. 2), whch were removed from the orgnal tranng set n order to get a fltered tranng set. Fnally, some well-known classfcaton algorthms were used on both orgnal and fltered tranng set n order to compare classfcaton results. The followng classfcaton algorthms have been used: AREX [12], ID3, C4.5 [13, 14], Naïve-Bayes (N-B), nstance-based classfer (IB,.e. k-nearest neghbors), and logstc regresson (LogReg) [15, 16]. All the results are the averages of 10-fold crossvaldaton. Wth the combnaton of presented crossover that works as a constructve operator towards local optmums, and mutaton that works as a destructve operator n order to keep the needed genetc dversty, the searchng for the soluton tends to be drected toward the globally optmal soluton. In ths manner the optmal soluton s the most approprate DT regardng our specfc needs (expressed n the form of FF). As the evoluton repeats, more 4.1 A Dataset The used software complexty measures dataset (SCM dataset) has been carefully composed from a well-prepared protocol [17]. The startng pont for the analyss was the defnton and measurement of a set of expermental attrbutes connected to the structure of software products after the code phase. Such parameters may, for example, be the total number of lnes of code and lnes of comments, the ISSN: Issue 11, Volume 6, November 2009
7 occurrence of varous types of nstructons, the operators and the types of data used. For each software module, moreover, the fault sgnals up to the moment when measurement started were consdered. By faults all the malfunctons encountered durng the nternal test phase and after the release of the software are meant. In our expermental envronment the code phase ncluded a prelmnary test of modules. After ths phase the modules went on to the real test sesson, n whch faults were sgnaled and measurement started. What had to be dealt wth was whether the chosen set of parameters would be suffcently large to dentfy the structure of a program. Accordngly t has been started wth a very large set of parameters. These parameters were affected by the persstence of multcollnearty. It was necessary to reduce the total number of parameters to a smaller set of ndependent parameters. Ths was acheved by statstcal procedures elmnatng those whch were heavly dependent on other parameters n explanng the presence of faults n a code, or whch were completely rrelevant. A subset of 168 structural parameters was defned whch the factoral analyss dentfed as beng reasonably free from multcollnearty, plus the dependent varable, the number of faults. The basc propertes of the SCM dataset are presented n Table 1. Table 1. The basc propertes of the SCM dataset. number of cases 904 number of attrbutes nomnal 1 - contnuous 167 number of decson classes 2 decson classes dstrbuton 76.4%; 23.6% 4.2 Quanttatve results As descrbed above, after flterng out the outlers by the proposed outler predcton method, the reduced data set has been used by some well-known classfcaton methods. When usng the same tranng set and the same parameter settng, the algorthms produce the same determnstc results (for example C4.5 algorthm uses entropy measures for the nducton of decson trees, etc). In ths manner, the dfference n acheved classfcaton effectveness on a testng set can be objectvely compared between the nduced classfers traned by ether the orgnal, non-fltered tranng set or the fltered tranng set. The classfcaton results are presented n tables 2 and 3 and on fgures 4 and 5. For the SCM dataset fve classfers were nduced and the tolerance threshold selected tt=0.2; f none or only one classfer (out of fve) msclassfed an object, then the object was not dentfed as an outler. Altogether 29 objects (from 804 n the orgnal tranng set) were removed from the tranng set (3.6% removal). Table 2. Average classfcaton accuraces on the SCM tranng dataset. classfcaton algorthm accuracy on the tranng set [%] orgnal set fltered set AREX C ID IB Naïve-Bayes LogReg Table 3. Average classfcaton accuraces on the SCM testng dataset. classfcaton accuracy on the tranng set [%] algorthm orgnal set fltered set AREX C ID IB Naïve-Bayes LogReg ,00 100,00 95,00 90,00 85,00 80,00 75,00 70,00 Accuracy on tranng set for SCM dataset. AREX C4.5 ID3 IB Nave- Bayes orgnal fltered LogReg Fg. 4. Classfcaton results on the SCM tranng dataset. ISSN: Issue 11, Volume 6, November 2009
8 85,00 Accuracy on testng set for SCM dataset. rules, whereas the classfers nduced on the orgnal tranng set are much more demandng to nterpret. 82,00 79,00 76,00 73,00 70,00 AREX C4.5 ID3 IB Nave- Bayes orgnal fltered LogReg Fg. 5. Classfcaton results on the SCM testng dataset All the results presented are based on the 10-fold cross valdaton for each algorthm on a tranng set and a testng set. The classfcaton results of the method AREX are based on 10 ndependent evolutonary runs for each fold. 4.3 Qualtatve results The possblty of accurate predctons of the potentally dangerous software modules based on the software complexty attrbutes s very mportant for software engneers n order to mprove the software relablty. The proposed classfcaton method proves to be effectve n performng ths task. However, the knowledge (.e. combnatons of the used attrbutes) used to make the predctons would be of an mmense mportance n order to decrease the possblty of the problems to arse n the frst place. Therefore, the bult knowledge models (lke decson trees or decson rules) should be studed to fnd ths knowledge. An nterestng phenomenon that arose wth the flterng of the dentfed outlers from the orgnal tranng set s the fact, that the bult knowledge models based upon the fltered learnng set were much less complex than those bult on the orgnal, non-fltered learnng set. Ths means that less attrbutes were used to predct the faulty modules and the overall models were smpler, smaller and less complex easer to nterpret. For the comparson on Fgure 6 there are two decson trees bult on the orgnal, non-fltered learnng set, and on Fgure 7 there are a few decson trees bult on the fltered learnng set; the accuracy of all the decson models are pretty much the same. We can see that n the case of fltered tranng set, the resultng classfers are almost as smple as smple alpha --[< ] OK --[>= ] sgnf_of_comments --[< ] StrCtrl_lnes --[< ] OK --[>= ] DANGEROUS --[>= ] selecton_nstr --[<3.799] DANGEROUS --[>=3.799] formal_functon_params --[<19.162] DANGEROUS --[>=19.162] sgnf_of_comments --[< ] OK --[>= ] DANGEROUS words_of_comments --[< ] break --[<12.282] functon_calls_to_funcs --[<11.985] vect_functon_args --[<25.088] const_wth_#defne --[<41.584] formal_func_pars --[<8.134] OK --[>=8.134] DANGEROUS --[>=41.584] DANGEROUS --[>=25.088] fletype --[c,r,h] OK --[s] DANGEROUS --[>=11.985] StrCtrl_lnes --[<10.611] DANGEROUS --[>=10.611] OK --[>=12.282] DANGEROUS --[>= ] DANGEROUS Fg. 6. Two of the nduced decson trees for predctng dangerous software modules on the orgnal, non-fltered learnng set. tot_of_comments --[< ] break --[< ] OK --[>= ] DANGER --[>= ] DANGER declared_vect --[< ] solo_comment_lnes --[< ] OK --[>= ] DANGER --[>= ] DANGER StrCtrl_lnes --[< ] sgnf_of_comments --[< ] OK --[>= ] DANGER --[>= ] DANGER ISSN: Issue 11, Volume 6, November 2009
9 case --[< ] words_of_comments --[< ] OK --[>= ] DANGER --[>= ] DANGER Fg. 7. A few of the nduced decson trees for predctng dangerous software modules on the fltered learnng set. 4.4 Dscusson The classfcaton results on the SCM dataset show that all the decson tree based classfers (AREX, ID3, C4.5) mproved on the tranng set (as expected) and also mproved consderably on the testng set. Whereas the mprovement was expected on the tranng set, as the outlers are more dffcult to place n the general model, the mprovement on the testng set was only hoped for. The Naïve-Bayes classfer mproved on the tranng set, but scored the same on the testng set. The IB and LogReg classfers scored practcally the same results on the tranng set and show some mprovements on the testng set. From the above results t can be concluded that the flterng of the tranng dataset (by removng the outlers) has only a mnor effect on IB classfers; as they search for the most smlar objects n the dataset practcally the outlers have no effect on the search. The decson tree classfers beneft from flterng the removal of outlers helps to reduce the uncertanty that s ntroduced by outlers, even as the percentage of the removed objects s rather small (3.6%). As DTs are very weak classfers, where even a change of a sngle tranng case can result n a completely dfferent soluton, the removal of dentfed outlers (whch are selected based on ther complexty of beng correctly classfed) substantally mproves the classfcaton results. The Naïve-Bayes and logstc regresson classfers do not beneft consderably from flterng; however, the classfcaton results are not worse ether. We must know that statstcs, upon whch these classfers are based, normally works towards generalzaton, neglectng mnor (boundary) cases. For ths reason, n our opnon, the removal of outlers from the tranng set does not nfluence the results consderably. Generally speakng for all the used classfcaton methods: when usng the fltered tranng set, the quanttatve classfcaton results on the testng set are at least as good as when traned on the orgnal tranng set, and n many cases better. 5 Concluson A fault predctve technque to foresee dangerous software modules has been presented n the paper. It s based on a new outler predcton and removal technque, that s used to flter an orgnal tranng set, whch s then used to tran varous classfers. Our proposton was that the fltered tranng datasets used to tran the classfers can mprove the classfcaton results regardng the orgnal, nonfltered datasets. The obtaned results show an mprovement of the classfcaton performance wth the decson tree classfers (whch are weak classfers), where outlers have rather negatve effect on the tranng process. The proposed approach has only a mnor effect on the nstance based, Naïve-Bayes and logstc regresson classfers n these cases the outlers do not represent a bg ssue as statstcal methods already work prmarly n a generalzaton manner, neglectng mnor (boundary) cases. However, all the classfcaton results are at least as good as wth the orgnal, non-fltered dataset. Ths fact speaks n favor of usng the proposed outler predcton and removal method when nducng a fault predctve knowledge model. In partcular, because traned knowledge models (n the case of nducng DT classfers), at least as t turned out n our experment, are substantally less complex when traned upon fltered tranng sets. In ths manner, an expert can draw some general knowledge out of the nduced model. As even the smallest mprovement s of vtal mportance when mnng the software relablty measures data, we plan to further explore the possbltes that the proposed approach offers. References: [1] P.K. Kapur, O. Shatnaw, A.G. Aggarwal, R. Kumar, Unfed framework for developng testng effort dependent software relablty growth models, WSEAS Transactons on Systems, vol. 8, num. 4, 2009, pp [2] M.M. Breung, H.-P. Kregel, R.T. Ng, J. Sander, LOF: Identfyng densty-based local outlers, Proceedngs of ACM SIGMOD, [3] E.M. Knorr, R.T. Ng, Algorthms for mnng dstance-based outlers n large datasets, Proceedngs of the 24th VLDB Conference, ISSN: Issue 11, Volume 6, November 2009
10 [4] V. Podgorelec, M. Hercko, I. Rozman, Improvng mnng of medcal data by outlers predcton, Proceedngs of the 18th IEEE Symposum on Computer-based Medcal Systems CBMS'2005, [5] V. Podgorelec, P. Kokol, Towards more optmal medcal dagnosng wth evolutonary algorthms, Journal of Medcal Systems, Kluwer Academc/Plenum Press, vol. 25, num. 3, 2001, pp [6] V. Podgorelec, P. Kokol, B. Stglc, I. Rozman, Decson trees: an overvew and ther use n medcne, Journal of Medcal Systems, Kluwer Academc/Plenum Press, vol. 26, num. 5, 2002, pp [7] V. Podgorelec, P. Kokol, Evolutonary nduced decson trees for dangerous software modules predcton, Informaton Processng Letters, vol. 82, num. 1, 2002, pp [8] T. Baeck, Evolutonary Algorthms n Theory and Practce, Oxford Unversty Press, Inc., [9] D.E. Goldberg, Genetc Algorthms n Search, Optmzaton, and Machne Learnng, Readng MA: Addson Wesley, [10] J.H. Holland, Adaptaton n natural and artfcal systems, Cambrdge, MA: MIT Press, [11] A. Rotshten, M. Posner, H. Rakytyanska, Fuzzy IF-THEN Rules Extracton for Medcal Dagnoss Usng Genetc Algorthm, WSEAS Transactons on Systems, vol. 3, num. 2, 2004, pp [12] C.-S. Huang, Y.-J. Ln, C. Ln, Implementaton of classfers for choosng nsurance polcy usng decson trees: a case study, WSEAS Transactons on Computers, vol. 7, num. 10, 2008, pp [13] J.R. Koza, Genetc Programmng: On the Programmng of Computers by Natural Selecton, Cambrdge, MA: MIT Press, [14] M. Kasparova, J. Krupka, P. Jrava, Applcaton of decson trees n problem of ar qualty modellng n the Czech Republc localty, WSEAS Transactons on Systems, vol. 7, num. 10, 2008, pp [15] M. Zorman, B. Cernohorsk, G. Gorsek, M. Ojstersek, Hybrd evolutonary bult decson trees for predcton of perspectve cross-country skers, WSEAS Transactons on Informaton Scence and Applcatons, vol. 2, num. 1, 2005, pp [16] M. Pghn, P. Kokol, RPSM: A rsk-predctve structural expermental metrc, Proceedngs of FESMA'99, pp , [17] V. Podgorelec, P. Kokol, I. Rozman, AREX - Classfcaton rules extractng algorthm based on automatc programmng, Proceedngs of the 15th European Conference on Artfcal Intellgence ECAI 2002, pp , [18] J.R. Qunlan, C4.5: Programs for Machne Learnng, Morgan Kaufmann, [19], RuleQuest Research Data Mnng Tools, [20], Machne Learnng n C++, MLC++ lbrary, [21] J. Demsar, B. Zupan, Orange: From Expermental Machne Learnng to Interactve Data Mnng, Whte Paper ( Faculty of Computer and Informaton Scence, Unversty of Ljubljana, [22] V. Podgorelec, P. Kokol, M. Zorman, M. Sprogar, M. Pghn, The Operatve Constrants of Software Relablty Predcton Methods, Proceedngs of the World Multconference SCI'2001, ISSN: Issue 11, Volume 6, November 2009
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis
The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.
More informationFeature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure
More informationForecasting the Direction and Strength of Stock Market Movement
Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems
More informationWhat is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
More informationThe Greedy Method. Introduction. 0/1 Knapsack Problem
The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationMining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System
Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons
More informationNEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION
NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State
More informationLogistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification
Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson
More informationLuby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
More informationGender Classification for Real-Time Audience Analysis System
Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,
More informationBusiness Process Improvement using Multi-objective Optimisation K. Vergidis 1, A. Tiwari 1 and B. Majeed 2
Busness Process Improvement usng Mult-objectve Optmsaton K. Vergds 1, A. Twar 1 and B. Majeed 2 1 Manufacturng Department, School of Industral and Manufacturng Scence, Cranfeld Unversty, Cranfeld, MK43
More informationA hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel
More informationbenefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).
REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or
More informationModule 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
More informationDescriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications
CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary
More informationECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble
1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In
More informationCS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements
Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there
More informationBayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending
Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success
More informationAnts Can Schedule Software Projects
Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,
More informationCan Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang
More informationOn the Optimal Control of a Cascade of Hydro-Electric Power Stations
On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
More informationOn-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features
On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com
More informationPlanning for Marketing Campaigns
Plannng for Marketng Campagns Qang Yang and Hong Cheng Department of Computer Scence Hong Kong Unversty of Scence and Technology Clearwater Bay, Kowloon, Hong Kong, Chna (qyang, csch)@cs.ust.hk Abstract
More informationSingle and multiple stage classifiers implementing logistic discrimination
Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,
More informationSoftware project management with GAs
Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de
More informationA DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul
More informationA Secure Password-Authenticated Key Agreement Using Smart Cards
A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,
More informationIMPACT ANALYSIS OF A CELLULAR PHONE
4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng
More informationFREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan
More informationStudy on Model of Risks Assessment of Standard Operation in Rural Power Network
Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,
More informationCalculating the high frequency transmission line parameters of power cables
< ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,
More informationSearching for Interacting Features for Spam Filtering
Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software
More informationPerformance Analysis and Coding Strategy of ECOC SVMs
Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School
More informationAn Inductive Fuzzy Classification Approach applied to Individual Marketing
An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based
More informationAn Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement
An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence
More informationStatistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
More informationFault tolerance in cloud technologies presented as a service
Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance
More informationInstitute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic
Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange
More informationSCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS
SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS Magdalena Rogalska 1, Wocech Bożeko 2,Zdzsław Heduck 3, 1 Lubln Unversty of Technology, 2- Lubln, Nadbystrzycka 4., Poland. E-mal:rogalska@akropols.pol.lubln.pl
More informationAn Analysis of Dynamic Severity and Population Size
An Analyss of Dynamc Severty and Populaton Sze Karsten Wecker Unversty of Stuttgart, Insttute of Computer Scence, Bretwesenstr. 2 22, 7565 Stuttgart, Germany, emal: Karsten.Wecker@nformatk.un-stuttgart.de
More informationComputer-assisted Auditing for High- Volume Medical Coding
Computer-asssted Audtng for Hgh-Volume Medcal Codng Computer-asssted Audtng for Hgh- Volume Medcal Codng by Danel T. Henze, PhD; Peter Feller, MS; Jerry McCorkle, BA; and Mark Morsch, MS Abstract The volume
More informationAn Alternative Way to Measure Private Equity Performance
An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate
More informationRobust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School
Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management
More informationRisk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008
Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn
More informationAn Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services
An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao
More informationCourse outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy
Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton
More informationSciences Shenyang, Shenyang, China.
Advanced Materals Research Vols. 314-316 (2011) pp 1315-1320 (2011) Trans Tech Publcatons, Swtzerland do:10.4028/www.scentfc.net/amr.314-316.1315 Solvng the Two-Obectve Shop Schedulng Problem n MTO Manufacturng
More informationAn Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
More informationA novel Method for Data Mining and Classification based on
A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: lhan-gege@126.com Abstract Data mnng has been attached great
More informationLecture 2: Single Layer Perceptrons Kevin Swingler
Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses
More informationMooring Pattern Optimization using Genetic Algorithms
6th World Congresses of Structural and Multdscplnary Optmzaton Ro de Janero, 30 May - 03 June 005, Brazl Moorng Pattern Optmzaton usng Genetc Algorthms Alonso J. Juvnao Carbono, Ivan F. M. Menezes Luz
More informationTHE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION
Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh
More informationVision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION
Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble
More informationA Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression
Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,
More informationAN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE
AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent
More informationA New Task Scheduling Algorithm Based on Improved Genetic Algorithm
A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng Envronment Congcong Xong, Long Feng, Lxan Chen A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng
More informationTesting and Debugging Resource Allocation for Fault Detection and Removal Process
Internatonal Journal of New Computer Archtectures and ther Applcatons (IJNCAA) 4(4): 93-00 The Socety of Dgtal Informaton and Wreless Communcatons, 04 (ISSN: 0-9085) Testng and Debuggng Resource Allocaton
More informationA spam filtering model based on immune mechanism
Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):2533-2540 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A spam flterng model based on mmune mechansm Ya-png
More informationMachine Learning and Software Quality Prediction: As an Expert System
I.J. Informaton Engneerng and Electronc Busness, 2014, 2, 9-27 Publshed Onlne Aprl 2014 n MECS (http://www.mecs-press.org/) DOI: 10.5815/jeeb.2014.02.02 Machne Learnng and Software Qualty Predcton: As
More information1. Introduction. Graham Kendall School of Computer Science and IT ASAP Research Group University of Nottingham Nottingham, NG8 1BB gxk@cs.nott.ac.
The Co-evoluton of Tradng Strateges n A Mult-agent Based Smulated Stock Market Through the Integraton of Indvdual Learnng and Socal Learnng Graham Kendall School of Computer Scence and IT ASAP Research
More informationHowHow to Find the Best Online Stock Broker
A GENERAL APPROACH FOR SECURITY MONITORING AND PREVENTIVE CONTROL OF NETWORKS WITH LARGE WIND POWER PRODUCTION Helena Vasconcelos INESC Porto hvasconcelos@nescportopt J N Fdalgo INESC Porto and FEUP jfdalgo@nescportopt
More informationJ. Parallel Distrib. Comput.
J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n
More informationConversion between the vector and raster data structures using Fuzzy Geographical Entities
Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,
More informationSupport Vector Machines
Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.
More informationHOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA*
HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA* Luísa Farnha** 1. INTRODUCTION The rapd growth n Portuguese households ndebtedness n the past few years ncreased the concerns that debt
More informationThe OC Curve of Attribute Acceptance Plans
The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4
More informationMinimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures
Mnmal Codng Network Wth Combnatoral Structure For Instantaneous Recovery From Edge Falures Ashly Joseph 1, Mr.M.Sadsh Sendl 2, Dr.S.Karthk 3 1 Fnal Year ME CSE Student Department of Computer Scence Engneerng
More informationHow To Understand The Results Of The German Meris Cloud And Water Vapour Product
Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller
More informationSurvey on Virtual Machine Placement Techniques in Cloud Computing Environment
Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center
More informationRecurrence. 1 Definitions and main statements
Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.
More informationMethodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications
Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and
More informationMining Multiple Large Data Sources
The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of
More informationRate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process
Dsadvantages of cyclc TDDB47 Real Tme Systems Manual scheduler constructon Cannot deal wth any runtme changes What happens f we add a task to the set? Real-Tme Systems Laboratory Department of Computer
More information1 Example 1: Axis-aligned rectangles
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton
More informationFace Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)
Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton
More informationApplication of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems
1 Applcaton of Mult-Agents for Fault Detecton and Reconfguraton of Power Dstrbuton Systems K. Nareshkumar, Member, IEEE, M. A. Choudhry, Senor Member, IEEE, J. La, A. Felach, Senor Member, IEEE Abstract--The
More informationPredicting Software Development Project Outcomes *
Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104
More informationDamage detection in composite laminates using coin-tap method
Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the
More informationA GENETIC ALGORITHM-BASED METHOD FOR CREATING IMPARTIAL WORK SCHEDULES FOR NURSES
82 Internatonal Journal of Electronc Busness Management, Vol. 0, No. 3, pp. 82-93 (202) A GENETIC ALGORITHM-BASED METHOD FOR CREATING IMPARTIAL WORK SCHEDULES FOR NURSES Feng-Cheng Yang * and We-Tng Wu
More informationThe Journal of Systems and Software
The Journal of Systems and Software 82 (2009) 241 252 Contents lsts avalable at ScenceDrect The Journal of Systems and Software journal homepage: www. elsever. com/ locate/ jss A study of project selecton
More informationPOLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and
POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n
More informationPerformance Management and Evaluation Research to University Students
631 A publcaton of CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 Guest Edtors: Peyu Ren, Yancang L, Hupng Song Copyrght 2015, AIDIC Servz S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 The Italan Assocaton
More informationProject Networks With Mixed-Time Constraints
Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
More informationUsing Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem
Usng Mult-obectve Metaheurstcs to Solve the Software Proect Schedulng Problem Francsco Chcano Unversty of Málaga, Span chcano@lcc.uma.es Francsco Luna Unversty of Málaga, Span flv@lcc.uma.es Enrque Alba
More information8 Algorithm for Binary Searching in Trees
8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the
More informationUsing an Adaptive Fuzzy Logic System to Optimise Knowledge Discovery in Proteomics
Usng an Adaptve Fuzzy Logc System to Optmse Knowledge Dscovery n Proteomcs James Malone, Ken McGarry and Chrs Bowerman School of Computng and Technology Sunderland Unversty St. Peter s Way, Sunderland,
More informationRELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT
Kolowrock Krzysztof Joanna oszynska MODELLING ENVIRONMENT AND INFRATRUCTURE INFLUENCE ON RELIABILITY AND OPERATION RT&A # () (Vol.) March RELIABILITY RIK AND AVAILABILITY ANLYI OF A CONTAINER GANTRY CRANE
More informationA DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña
Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION
More informationPSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12
14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed
More informationCredit Limit Optimization (CLO) for Credit Cards
Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt
More informationMETHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng
More informationA Simple Approach to Clustering in Excel
A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa
More informationEstimating the Development Effort of Web Projects in Chile
Estmatng the Development Effort of Web Projects n Chle Sergo F. Ochoa Computer Scences Department Unversty of Chle (56 2) 678-4364 sochoa@dcc.uchle.cl M. Cecla Bastarrca Computer Scences Department Unversty
More informationAnt Colony Optimization for Economic Generator Scheduling and Load Dispatch
Proceedngs of the th WSEAS Int. Conf. on EVOLUTIONARY COMPUTING, Lsbon, Portugal, June 1-18, 5 (pp17-175) Ant Colony Optmzaton for Economc Generator Schedulng and Load Dspatch K. S. Swarup Abstract Feasblty
More informationSelecting Best Employee of the Year Using Analytical Hierarchy Process
J. Basc. Appl. Sc. Res., 5(11)72-76, 2015 2015, TextRoad Publcaton ISSN 2090-4304 Journal of Basc and Appled Scentfc Research www.textroad.com Selectng Best Employee of the Year Usng Analytcal Herarchy
More informationDocument Clustering Analysis Based on Hybrid PSO+K-means Algorithm
Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,
More informationBUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, dskim@ssu.ac.kr
Proceedngs of the 41st Internatonal Conference on Computers & Industral Engneerng BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK Yeong-bn Mn 1, Yongwoo Shn 2, Km Jeehong 1, Dongsoo
More informationRisk Model of Long-Term Production Scheduling in Open Pit Gold Mining
Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,
More informationChapter 6. Classification and Prediction
Chapter 6. Classfcaton and Predcton What s classfcaton? What s Lazy learners (or learnng from predcton? your neghbors) Issues regardng classfcaton and Frequent-pattern-based predcton classfcaton Classfcaton
More information