Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets

Size: px
Start display at page:

Download "Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets"

Transcription

1 Improved Mnng of Software Complexty Data on Evolutonary Fltered Tranng Sets VILI PODGORELEC Insttute of Informatcs, FERI Unversty of Marbor Smetanova ulca 17, SI-2000 Marbor SLOVENIA Abstract: - Wth the evoluton of nformaton technology and software systems, software relablty has become one of the most mportant topcs of software engneerng. As the dependency of socety on software systems ncrease, so ncreases also the mportance of effcent software fault predcton. In ths paper we present a new approach to mprovng the classfcaton of faulty software modules. The proposed approach s based on flterng tranng sets wth the ntroducton of data outlers dentfcaton and removal method. The method uses an ensemble of evolutonary nduced decson trees to dentfy the outlers. We argue that a classfer traned by a fltered dataset captures a more general knowledge model and should therefore perform better also on unseen cases. The proposed method s appled on a real-world software relablty analyss dataset and the obtaned results are dscussed. Key-Words: - data mnng, classfcaton, evolutonary decson trees, flterng tranng sets, software fault predcton, search-based software engneerng 1 Introducton The early and accurate dentfcaton of potentally dangerous (faulty) software modules s of vtal mportance for better software relablty. Relablty s one of the most mportant aspects of software systems of any knd (nformaton systems, embedded systems, etc.). The sze and complexty of software s growng dramatcally durng last decades. The demand for hghly complex software systems s ncreasng more rapdly than the ablty to desgn, mplement and mantan them. When the requrements for and dependences of computers ncrease, the possblty of crses from falures also ncreases. The mpact of these falures ranges from nconvenence to economc damages to loss of lves therefore t s clear that software relablty s becomng a major concern not only for software engneers and computer scentsts, but also for the socety as a whole. Therefore, the employment of an effcent fault predctve technque to foresee dangerous software modules s essental. Wth the creaton of large emprcal databases of software projects, as a result of stmulated research on estmaton models, metrcs and methods for measurng and mprovng processes and products, ntellgent mnng of these datasets can largely add to the mprovement of software relablty [1]. In order to help the software engneers n predctng the faulty software modules, computerzed data mnng and decson support tools can be used whch are able to help software engneers to process a huge amount of data avalable from prevous software projects and suggest the probable predcton based on the values of several mportant attrbutes. Black-box classfcaton methods (neural networks for example) are not very approprate for ths knd of task, because the software experts want to evaluate and valdate the decson makng process nduced by those tools, before there s enough trust to use the tools n practce. On the other hand, the evaluaton of the nduced classfcaton rules produced by the computerzed tools by a software expert can be an mportant source of new knowledge on the assocatons of the avalable attrbutes and new laws of software relablty engneerng. In order to acheve ths goal, the classfcaton process should be easly understandable, nterpretable and straghtforward. One of the most popular and proven-useful approach are decson trees. However, t has been shown that decson trees are a weak classfer, prone to produce very dfferent solutons based on an even small change n nput (tranng) data. Therefore, naccuraces and nose n tranng data can easly lead to an naccurate result. ISSN: Issue 11, Volume 6, November 2009

2 The dea that we present n ths paper s to construct an outler predcton method that flters out so-called data outlers,.e. data tems that fall outsde the boundares that enclose most other data tems n the data set [1, 2]. When a fltered dataset s used to tran a classfer (a decson tree) t should produce a better and more relable classfcaton result. Furthermore, as a consequence of ncreased homogenety of the data, the results should be more general and thus smpler, less complex and easer to apply n general. Smlar method has already been appled to data mnng n medcne wth some success [3]. Our proposton for the predcton of outlers s the followng: f a sngle (known) data case s classfed dfferently by dfferent accurate classfers, t potentally contans contradctory nformaton (Fgure 1). Although ths contradcton s not necessarly an error n data, a usual decson tree classfer s not able to correctly construct a general model based on such data. Therefore, our proposton s that the general knowledge model bult by a decson tree (or some other nducton method n that matter) would be more effcent when a classfer would be traned wthout such msleadng data. 1.1 The am and scope of the paper The man objectve of ths paper s to nvestgate whether t s possble to mprove the classfcaton performance of several machne learnng algorthms when the outlers are fltered out of the tranng set. For the outler predcton method a set of evolutonary nduced decson trees s used. The proposed approach s appled to a software engneerng problem of predctng potentally dangerous (contanng errors and faults) software modules. The data mnng results (both quanttatve classfcaton results and qualtatve assessment of nduced models) of dfferent machne learnng algorthms are presented and dscussed. n data set 2 Flterng the dataset by outler predcton method The basc dea of outler predcton s to defne a crteron or crtera, upon whch for each data case from a dataset can be determned whether t belongs to the majorty of the cases or not. If the specfc data case regardng the defned crtera does not belong to the majorty, then t s called an outler (regardng specfc crtera). How the crtera are defned determnes the outler predcton method. In general, defnng outler crtera s not trval, whereas the dentfcaton of outlers based on these crtera s. In our prevous work we presented how to use evolutonary algorthm for the nducton of decson trees [4, 5, 6]. One of the greatest advantage of evolutonary constructon of decson trees, besde the proven effcency n classfyng, s the ablty to produce several comparably accurate classfers for the same dataset. Havng ths n mnd, t s possble to get a decson for the same data case based on dfferent classfers (usng dfferent attrbutes and/or dfferent relatons). class 1 classfers class 2 Fg 1. When a sngle data record s classfed dfferently by dfferent classfers, t potentally contans contradctory nformaton. In our method, we frst defne an approach to the dentfcaton of outlers. For ths purpose the algorthm for the constructon of decson trees ISSN: Issue 11, Volume 6, November 2009

3 gentrees s used (descrbed n the next secton). Wth gentrees a set of classfers (decson trees) are nduced. For each tranng object (data case from the tranng set) classfcaton cc (x) s calculated for all decson classes (Eq. 1) the resultng value represents the number of classfers that classfed object x wth the decson class. Then classfcaton confuson score CCS(x) s calculated for each tranng object x (Eq. 2) the result represents the confuson score of a set of classfers when classfyng object x; f all DTs gve the same classfcaton, then the result s 0; hgher numbers represent less homogeneous objects for classfcaton possble outlers. The hgher s the number CCS(x), more probably les the object x outsde the majorty area. Based on the CCS (Eq. 2) t s determned whch objects should be fltered out from the dataset for the classfcaton process. For ths purpose a tolerance threshold tt s defned for a dataset; f CCS(x)>tt then object x s fltered out. survval of the fttest, frst lad down by Charles Darwn, so by smulatng ths process, genetc algorthms are able to evolve solutons to real-world problems, f they have been sutably encoded [9]. They are often capable of fndng optmal solutons even n the most complex of search spaces or at least they offer sgnfcant benefts over other search and optmsaton technques. The basc mechansm of evolutonary algorthms s presented n Fgure 2. cc ( x) ;, num _ DTs j 1 1; class ( DT 0; otherwse ( 1... num _ classes ) j, x) (1) CCS 2 num _ classes cc ( x) ( x) 1 (2) 1 cc max ( x) Fg 2. The basc mechansm of evolutonary algorthms: selecton, crossover and mutaton. 3 Backend technologes: evolutonary algorthms and decson trees The central pont of our approach, as has been sad above, s the use of evolutonary nduced decson trees whch are beng used to construct a set of classfers for dentfyng outlers n a tranng set. Both backend technologes enablng our approach are brefly presented frst, followng by the descrpton of our algorthm for nducng decson trees. 3.1 Evolutonary algorthms Genetc algorthms are adaptve heurstc search methods whch may be used to solve all knds of complex search and optmsaton problems [7, 8]. They are based on the evolutonary deas of natural selecton and genetc processes of bologcal organsms. Lke natural populatons evolve accordng to the prncples of natural selecton and A typcal genetc algorthm operates on a set of solutons (populaton) wthn the search space. The search space represents all the possble solutons whch can be obtaned for the gven problem, and s usually very complex or even nfnte. Every pont of the search space s one of the possble solutons and therefore the am of the genetc algorthm s to fnd an optmal pont or at least come as close to t as possble. The genetc algorthm conssts of three genetc operators: selecton, crossover (recombnaton), and mutaton. Selecton s the survval of the fttest ndvduals wthn the genetc algorthm wth the am of gvng the preference to the best ones. For ths purpose all solutons have to be evaluated, whch s done wth the use of the evaluaton functon. Selecton determnes ndvduals to be used for the second genetc operator - crossover or recombnaton, where from two good ndvduals a new, even better one s constructed. The crossover process s repeated untl the whole new populaton s completed wth the offsprng produced by the ISSN: Issue 11, Volume 6, November 2009

4 crossover. All constructed ndvduals have to preserve the feasblty regardng the gven problem; n ths manner t s mportant to coordnate nternal representaton of ndvduals wth genetc operators. The last genetc operator s mutaton, whch s an occasonal (low probablty) alteraton of an ndvdual that helps to fnd an optmal soluton to the gven problem faster and more relably. DTs have been already used n a large amount of dfferent applcatons: n medcne [5], fnance [11], envronmental studes [12], etc. Ther greatest advantage s comparatvely hgh classfcaton accuracy whle preservng the transparent knowledge model whch can be easly nterpreted and valdated by doman experts. Ths has made DTs one of the most popular classfers. 3.2 Decson trees Inductve nference s the process of movng from concrete examples to general models, where the goal s to learn how to classfy objects by analysng a set of nstances (already solved cases) whose classes are known. Instances are typcally represented as attrbute-value vectors. Tranng nput conssts of a set of such vectors, each belongng to a known class, and the output conssts of a mappng from attrbute values to classes. Ths mappng should accurately classfy both the gven nstances and other unseen nstances. A decson tree (DT) [5, 9] s a formalsm for expressng such mappngs and conssts of tests or attrbute nodes lnked to two or more sub-trees and leafs or decson nodes labeled wth a class whch means the decson. A test node computes some outcome based on the attrbute values of an nstance, where each possble outcome s assocated wth one of the sub-trees. An nstance s classfed by startng at the root node of the tree. If ths node s a test, the outcome for the nstance s determned and the process contnues usng the approprate subtree. When a leaf s eventually encountered, ts label gves the predcted class of the nstance. An example of a DT s presented n Fgure 3. Fg 3. An example of a decson tree. 3.3 Evolutonary nducton of decson trees: the gentrees algorthm Evolutonary algorthms (EAs) are generally used for very complex optmzaton tasks [8], for whch no effcent determnstc or heurstc method s developed. Constructon of DTs s a complex task, but an exact method exsts, that n general works effcently and relably [5, 9]. At a frst glance there s no reason to use EAs. Nevertheless, there are some objectve reasons that justfy our evolutonary approach. Frst, EAs provde a very general concept that can be used n all knds of decson makng problems. Because of ther robustness they can be used also on ncomplete, nosy data (whch often happens because of measurement errors, unavalablty of proper nstruments, etc.). Ths s not very successfully solvable by tradtonal technques of DT constructon. Furthermore, EAs use evolutonary prncples to evolve solutons, therefore solutons can be found that can be easly overlooked otherwse. Another mportant advantage of EAs s the possblty of optmzng the decson tree's topology and adaptng class ntervals for numerc attrbutes, wthn the evoluton process. And the most mportant advantage of the evolutonary approach n our case s that not only one but several equally qualtatve solutons are obtaned for the same dataset. In ths way the same decson can be made based on dfferent attrbutes and/or combnaton of the attrbutes. By dfferent settngs of parameters n EA runs, searchng can be drected to dfferent stuatons that gve us dfferent solutons for our fnal decson forest. Of course there are also some drawbacks of our method regardng the heurstc nducton, the most obvous one beng the hgher nducton tme complexty. When defnng the nternal representaton of ndvduals wthn the genetc populaton, together wth the approprate genetc operators that wll work upon the populaton, t s mportant to assure the feasblty of all solutons durng the whole evoluton process. We decded to present an ndvdual drectly as a DT. In ths manner all ntermedate solutons are feasble, no nformaton s ISSN: Issue 11, Volume 6, November 2009

5 lost because of the converson between nternal representaton and the DT, the ftness functon (FF) can be straghtforward, etc. The problem wth drect codng of soluton may brng some problems n defnng of genetc operators. As DTs may be seen as a knd of smple computer programs (wth attrbute nodes beng condtonal clauses and decson nodes beng assgnments) we decded to defne genetc operators smlar to those used n genetc programmng where ndvduals are computer program trees [10]. The nducton of DT-lke classfers wth the use of evolutonary algorthms has been already used n several cases [5, 14, 15], also appled to software engneerng [6] Inducton of DTs for the ntal populaton Frst step of the genetc algorthm s the nducton of the ntal populaton. A random decson tree s constructed based on the followng algorthm: 1. nput: number of attrbute nodes V that wll be n the tree 2. select an attrbute A from the set of all possble K attrbutes and defne t a root node tn 3. n accordance wth the selected attrbute's A type (dscrete or contnuous) defne a test for ths node tn: a. for contnuous attrbutes n a form of tn(a ) <, where tn(a ) s the attrbute value for a data object and s a splt constant b. for dscrete attrbutes two dsjunctve sets of all possble attrbute values are randomly defned 4. connect empty leaves to both new branches from node tn 5. randomly select an empty leaf node tn (note below) 6. randomly select an attrbute A from the set of all possble K attrbutes (note below) 7. replace the selected leaf node tn wth the attrbute A and go to step 3 8. fnsh when V attrbute nodes has been created In the above algorthm: - the probablty of selectng an empty leaf s decreased wth the depth of the leaf n a growng tree, and - the probablty of choosng an attrbute depends on a number of prevous uses of that attrbute n a tree n ths manner unused attrbutes have better chances to be selected. A smplfed pseudo-code of the above algorthm s presented at Fgure 4. // ntal decson trees Select number of nodes N repeat Select an attrbute X Defne f t (X)< or f t (X ) V Connect empty leaves to node Randomly select an empty leaf node untl N nodes are created Fg 4. The basc algorthm for constructng random DTs. For each empty leaf the followng algorthm determnes the approprate decson class: let S be the tranng set of all tranng objects N wth M possble decson classes 1,.., M and N s the number of objects wthn S of a class. Let S tn be the sample set at node tn (an empty leaf for whch we are tryng to select a decson class) wth N tn objects; N tn s the number of objects wthn S tn of a decson class. Now we can defne a functon that measures a potental percentage of correctly classfed objects of a class : F tn, ) N N tn ( (3) Decson tn for the leaf node tn s then marked as the decson, for whch F(tn,) s maxmal. The rankng of an ndvdual DT wthn a populaton s based on the FF: FF M V w ( 1 acc ) c( tn ) wu nu (4) 1 1 where M s the number of decson classes, V s the number of attrbute nodes n a tree, acc s the accuracy of classfcaton of objects of a specfc decson class, w s the mportance weght for classfyng the objects of the decson class, c(tn ) s the cost of usng the attrbute n a node tn, nu s number of unused decson (leaf) nodes,.e. where no object from the tranng set fall nto, and w u s the weght of the presence of unused decson nodes n a tree. Accordng to FF the best trees (the most ft ones) have the lowest functon values the am of the evolutonary process s to mnmze the value of FF for the best tree. A near optmal DT would: 1) classfy all tranng objects wth accuracy n ISSN: Issue 11, Volume 6, November 2009

6 accordance wth mportance weghts w (some decson classes may be more mportant than the others), 2) have very lttle unused decson nodes (there s no evaluaton possble for ths knd of decson nodes regardng the tranng set), and 3) consst of low-cost attrbute nodes (n ths manner the desrable/undesrable attrbutes can be prortzed). qualtatve solutons are obtaned regardng the chosen FF. The evoluton stops when an optmal or at least an acceptable soluton s found or f the ftness score of the best ndvdual does not change for a predefned (large) number of generatons. A smplfed pseudo-code algorthm for the evolutonary process of mprovng the qualty of a DT (n accordance wth FF) s presented at Fgure Crossover and mutaton Crossover works on two selected ndvduals as an exchange of two randomly selected sub-trees. In ths manner a randomly selected tranng object s selected frst whch s used to determne paths (by fndng a decson through the tree) n both selected trees. Then an attrbute node s randomly selected on a path n the frst tree and an attrbute s randomly selected on a path n the second tree. Fnally, the sub-tree from a selected attrbute node n the frst tree s replaced wth the sub-tree from a selected attrbute node n the second tree and n ths manner an offsprng s created whch s put nto a new populaton. Mutaton conssts of several parts: 1) one randomly selected attrbute node s replaced wth another attrbute, randomly chosen from the set of all attrbutes; 2) a test n a randomly selected attrbute node s changed,.e. the splt constant s mutated; 3) a randomly selected decson (leaf) node s replaced by an attrbute node; 4) a randomly selected attrbute node s replaced by a decson node. // evolvng DTs repeat tournament selecton crossover mutaton evaluate all DTs wth FF untl termnaton crtera reached Fg 5. The basc algorthm for evolvng DTs. 4 Applcaton of the method The descrbed tranng set flterng method has been appled to a real-world software relablty analyss dataset, composed at the Unversty of Udne, Italy, from the software development project of a hosptal nformaton system the whole medcal software system conssts of 904 modules n C programmng language representng more than lnes of code. Frst the modules have been dentfed ether as OK or DANGEROUS by applyng the model developed by Pghn [11] the modules that contan less than 5 errors were set as OK and the others as DANGEROUS. A set of 168 attrbutes, contanng varous software complexty measures, has been determned for each software module. From all 904 modules 804 have been randomly selected for the tranng set, and the remanng 100 modules have been selected for the testng set. A set of classfers were nduced wth gentrees based on the tranng set. Then outlers were dentfed usng the class confuson score metrcs (Eq. 2), whch were removed from the orgnal tranng set n order to get a fltered tranng set. Fnally, some well-known classfcaton algorthms were used on both orgnal and fltered tranng set n order to compare classfcaton results. The followng classfcaton algorthms have been used: AREX [12], ID3, C4.5 [13, 14], Naïve-Bayes (N-B), nstance-based classfer (IB,.e. k-nearest neghbors), and logstc regresson (LogReg) [15, 16]. All the results are the averages of 10-fold crossvaldaton. Wth the combnaton of presented crossover that works as a constructve operator towards local optmums, and mutaton that works as a destructve operator n order to keep the needed genetc dversty, the searchng for the soluton tends to be drected toward the globally optmal soluton. In ths manner the optmal soluton s the most approprate DT regardng our specfc needs (expressed n the form of FF). As the evoluton repeats, more 4.1 A Dataset The used software complexty measures dataset (SCM dataset) has been carefully composed from a well-prepared protocol [17]. The startng pont for the analyss was the defnton and measurement of a set of expermental attrbutes connected to the structure of software products after the code phase. Such parameters may, for example, be the total number of lnes of code and lnes of comments, the ISSN: Issue 11, Volume 6, November 2009

7 occurrence of varous types of nstructons, the operators and the types of data used. For each software module, moreover, the fault sgnals up to the moment when measurement started were consdered. By faults all the malfunctons encountered durng the nternal test phase and after the release of the software are meant. In our expermental envronment the code phase ncluded a prelmnary test of modules. After ths phase the modules went on to the real test sesson, n whch faults were sgnaled and measurement started. What had to be dealt wth was whether the chosen set of parameters would be suffcently large to dentfy the structure of a program. Accordngly t has been started wth a very large set of parameters. These parameters were affected by the persstence of multcollnearty. It was necessary to reduce the total number of parameters to a smaller set of ndependent parameters. Ths was acheved by statstcal procedures elmnatng those whch were heavly dependent on other parameters n explanng the presence of faults n a code, or whch were completely rrelevant. A subset of 168 structural parameters was defned whch the factoral analyss dentfed as beng reasonably free from multcollnearty, plus the dependent varable, the number of faults. The basc propertes of the SCM dataset are presented n Table 1. Table 1. The basc propertes of the SCM dataset. number of cases 904 number of attrbutes nomnal 1 - contnuous 167 number of decson classes 2 decson classes dstrbuton 76.4%; 23.6% 4.2 Quanttatve results As descrbed above, after flterng out the outlers by the proposed outler predcton method, the reduced data set has been used by some well-known classfcaton methods. When usng the same tranng set and the same parameter settng, the algorthms produce the same determnstc results (for example C4.5 algorthm uses entropy measures for the nducton of decson trees, etc). In ths manner, the dfference n acheved classfcaton effectveness on a testng set can be objectvely compared between the nduced classfers traned by ether the orgnal, non-fltered tranng set or the fltered tranng set. The classfcaton results are presented n tables 2 and 3 and on fgures 4 and 5. For the SCM dataset fve classfers were nduced and the tolerance threshold selected tt=0.2; f none or only one classfer (out of fve) msclassfed an object, then the object was not dentfed as an outler. Altogether 29 objects (from 804 n the orgnal tranng set) were removed from the tranng set (3.6% removal). Table 2. Average classfcaton accuraces on the SCM tranng dataset. classfcaton algorthm accuracy on the tranng set [%] orgnal set fltered set AREX C ID IB Naïve-Bayes LogReg Table 3. Average classfcaton accuraces on the SCM testng dataset. classfcaton accuracy on the tranng set [%] algorthm orgnal set fltered set AREX C ID IB Naïve-Bayes LogReg ,00 100,00 95,00 90,00 85,00 80,00 75,00 70,00 Accuracy on tranng set for SCM dataset. AREX C4.5 ID3 IB Nave- Bayes orgnal fltered LogReg Fg. 4. Classfcaton results on the SCM tranng dataset. ISSN: Issue 11, Volume 6, November 2009

8 85,00 Accuracy on testng set for SCM dataset. rules, whereas the classfers nduced on the orgnal tranng set are much more demandng to nterpret. 82,00 79,00 76,00 73,00 70,00 AREX C4.5 ID3 IB Nave- Bayes orgnal fltered LogReg Fg. 5. Classfcaton results on the SCM testng dataset All the results presented are based on the 10-fold cross valdaton for each algorthm on a tranng set and a testng set. The classfcaton results of the method AREX are based on 10 ndependent evolutonary runs for each fold. 4.3 Qualtatve results The possblty of accurate predctons of the potentally dangerous software modules based on the software complexty attrbutes s very mportant for software engneers n order to mprove the software relablty. The proposed classfcaton method proves to be effectve n performng ths task. However, the knowledge (.e. combnatons of the used attrbutes) used to make the predctons would be of an mmense mportance n order to decrease the possblty of the problems to arse n the frst place. Therefore, the bult knowledge models (lke decson trees or decson rules) should be studed to fnd ths knowledge. An nterestng phenomenon that arose wth the flterng of the dentfed outlers from the orgnal tranng set s the fact, that the bult knowledge models based upon the fltered learnng set were much less complex than those bult on the orgnal, non-fltered learnng set. Ths means that less attrbutes were used to predct the faulty modules and the overall models were smpler, smaller and less complex easer to nterpret. For the comparson on Fgure 6 there are two decson trees bult on the orgnal, non-fltered learnng set, and on Fgure 7 there are a few decson trees bult on the fltered learnng set; the accuracy of all the decson models are pretty much the same. We can see that n the case of fltered tranng set, the resultng classfers are almost as smple as smple alpha --[< ] OK --[>= ] sgnf_of_comments --[< ] StrCtrl_lnes --[< ] OK --[>= ] DANGEROUS --[>= ] selecton_nstr --[<3.799] DANGEROUS --[>=3.799] formal_functon_params --[<19.162] DANGEROUS --[>=19.162] sgnf_of_comments --[< ] OK --[>= ] DANGEROUS words_of_comments --[< ] break --[<12.282] functon_calls_to_funcs --[<11.985] vect_functon_args --[<25.088] const_wth_#defne --[<41.584] formal_func_pars --[<8.134] OK --[>=8.134] DANGEROUS --[>=41.584] DANGEROUS --[>=25.088] fletype --[c,r,h] OK --[s] DANGEROUS --[>=11.985] StrCtrl_lnes --[<10.611] DANGEROUS --[>=10.611] OK --[>=12.282] DANGEROUS --[>= ] DANGEROUS Fg. 6. Two of the nduced decson trees for predctng dangerous software modules on the orgnal, non-fltered learnng set. tot_of_comments --[< ] break --[< ] OK --[>= ] DANGER --[>= ] DANGER declared_vect --[< ] solo_comment_lnes --[< ] OK --[>= ] DANGER --[>= ] DANGER StrCtrl_lnes --[< ] sgnf_of_comments --[< ] OK --[>= ] DANGER --[>= ] DANGER ISSN: Issue 11, Volume 6, November 2009

9 case --[< ] words_of_comments --[< ] OK --[>= ] DANGER --[>= ] DANGER Fg. 7. A few of the nduced decson trees for predctng dangerous software modules on the fltered learnng set. 4.4 Dscusson The classfcaton results on the SCM dataset show that all the decson tree based classfers (AREX, ID3, C4.5) mproved on the tranng set (as expected) and also mproved consderably on the testng set. Whereas the mprovement was expected on the tranng set, as the outlers are more dffcult to place n the general model, the mprovement on the testng set was only hoped for. The Naïve-Bayes classfer mproved on the tranng set, but scored the same on the testng set. The IB and LogReg classfers scored practcally the same results on the tranng set and show some mprovements on the testng set. From the above results t can be concluded that the flterng of the tranng dataset (by removng the outlers) has only a mnor effect on IB classfers; as they search for the most smlar objects n the dataset practcally the outlers have no effect on the search. The decson tree classfers beneft from flterng the removal of outlers helps to reduce the uncertanty that s ntroduced by outlers, even as the percentage of the removed objects s rather small (3.6%). As DTs are very weak classfers, where even a change of a sngle tranng case can result n a completely dfferent soluton, the removal of dentfed outlers (whch are selected based on ther complexty of beng correctly classfed) substantally mproves the classfcaton results. The Naïve-Bayes and logstc regresson classfers do not beneft consderably from flterng; however, the classfcaton results are not worse ether. We must know that statstcs, upon whch these classfers are based, normally works towards generalzaton, neglectng mnor (boundary) cases. For ths reason, n our opnon, the removal of outlers from the tranng set does not nfluence the results consderably. Generally speakng for all the used classfcaton methods: when usng the fltered tranng set, the quanttatve classfcaton results on the testng set are at least as good as when traned on the orgnal tranng set, and n many cases better. 5 Concluson A fault predctve technque to foresee dangerous software modules has been presented n the paper. It s based on a new outler predcton and removal technque, that s used to flter an orgnal tranng set, whch s then used to tran varous classfers. Our proposton was that the fltered tranng datasets used to tran the classfers can mprove the classfcaton results regardng the orgnal, nonfltered datasets. The obtaned results show an mprovement of the classfcaton performance wth the decson tree classfers (whch are weak classfers), where outlers have rather negatve effect on the tranng process. The proposed approach has only a mnor effect on the nstance based, Naïve-Bayes and logstc regresson classfers n these cases the outlers do not represent a bg ssue as statstcal methods already work prmarly n a generalzaton manner, neglectng mnor (boundary) cases. However, all the classfcaton results are at least as good as wth the orgnal, non-fltered dataset. Ths fact speaks n favor of usng the proposed outler predcton and removal method when nducng a fault predctve knowledge model. In partcular, because traned knowledge models (n the case of nducng DT classfers), at least as t turned out n our experment, are substantally less complex when traned upon fltered tranng sets. In ths manner, an expert can draw some general knowledge out of the nduced model. As even the smallest mprovement s of vtal mportance when mnng the software relablty measures data, we plan to further explore the possbltes that the proposed approach offers. References: [1] P.K. Kapur, O. Shatnaw, A.G. Aggarwal, R. Kumar, Unfed framework for developng testng effort dependent software relablty growth models, WSEAS Transactons on Systems, vol. 8, num. 4, 2009, pp [2] M.M. Breung, H.-P. Kregel, R.T. Ng, J. Sander, LOF: Identfyng densty-based local outlers, Proceedngs of ACM SIGMOD, [3] E.M. Knorr, R.T. Ng, Algorthms for mnng dstance-based outlers n large datasets, Proceedngs of the 24th VLDB Conference, ISSN: Issue 11, Volume 6, November 2009

10 [4] V. Podgorelec, M. Hercko, I. Rozman, Improvng mnng of medcal data by outlers predcton, Proceedngs of the 18th IEEE Symposum on Computer-based Medcal Systems CBMS'2005, [5] V. Podgorelec, P. Kokol, Towards more optmal medcal dagnosng wth evolutonary algorthms, Journal of Medcal Systems, Kluwer Academc/Plenum Press, vol. 25, num. 3, 2001, pp [6] V. Podgorelec, P. Kokol, B. Stglc, I. Rozman, Decson trees: an overvew and ther use n medcne, Journal of Medcal Systems, Kluwer Academc/Plenum Press, vol. 26, num. 5, 2002, pp [7] V. Podgorelec, P. Kokol, Evolutonary nduced decson trees for dangerous software modules predcton, Informaton Processng Letters, vol. 82, num. 1, 2002, pp [8] T. Baeck, Evolutonary Algorthms n Theory and Practce, Oxford Unversty Press, Inc., [9] D.E. Goldberg, Genetc Algorthms n Search, Optmzaton, and Machne Learnng, Readng MA: Addson Wesley, [10] J.H. Holland, Adaptaton n natural and artfcal systems, Cambrdge, MA: MIT Press, [11] A. Rotshten, M. Posner, H. Rakytyanska, Fuzzy IF-THEN Rules Extracton for Medcal Dagnoss Usng Genetc Algorthm, WSEAS Transactons on Systems, vol. 3, num. 2, 2004, pp [12] C.-S. Huang, Y.-J. Ln, C. Ln, Implementaton of classfers for choosng nsurance polcy usng decson trees: a case study, WSEAS Transactons on Computers, vol. 7, num. 10, 2008, pp [13] J.R. Koza, Genetc Programmng: On the Programmng of Computers by Natural Selecton, Cambrdge, MA: MIT Press, [14] M. Kasparova, J. Krupka, P. Jrava, Applcaton of decson trees n problem of ar qualty modellng n the Czech Republc localty, WSEAS Transactons on Systems, vol. 7, num. 10, 2008, pp [15] M. Zorman, B. Cernohorsk, G. Gorsek, M. Ojstersek, Hybrd evolutonary bult decson trees for predcton of perspectve cross-country skers, WSEAS Transactons on Informaton Scence and Applcatons, vol. 2, num. 1, 2005, pp [16] M. Pghn, P. Kokol, RPSM: A rsk-predctve structural expermental metrc, Proceedngs of FESMA'99, pp , [17] V. Podgorelec, P. Kokol, I. Rozman, AREX - Classfcaton rules extractng algorthm based on automatc programmng, Proceedngs of the 15th European Conference on Artfcal Intellgence ECAI 2002, pp , [18] J.R. Qunlan, C4.5: Programs for Machne Learnng, Morgan Kaufmann, [19], RuleQuest Research Data Mnng Tools, [20], Machne Learnng n C++, MLC++ lbrary, [21] J. Demsar, B. Zupan, Orange: From Expermental Machne Learnng to Interactve Data Mnng, Whte Paper ( Faculty of Computer and Informaton Scence, Unversty of Ljubljana, [22] V. Podgorelec, P. Kokol, M. Zorman, M. Sprogar, M. Pghn, The Operatve Constrants of Software Relablty Predcton Methods, Proceedngs of the World Multconference SCI'2001, ISSN: Issue 11, Volume 6, November 2009

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

Business Process Improvement using Multi-objective Optimisation K. Vergidis 1, A. Tiwari 1 and B. Majeed 2

Business Process Improvement using Multi-objective Optimisation K. Vergidis 1, A. Tiwari 1 and B. Majeed 2 Busness Process Improvement usng Mult-objectve Optmsaton K. Vergds 1, A. Twar 1 and B. Majeed 2 1 Manufacturng Department, School of Industral and Manufacturng Scence, Cranfeld Unversty, Cranfeld, MK43

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble 1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Ants Can Schedule Software Projects

Ants Can Schedule Software Projects Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

Planning for Marketing Campaigns

Planning for Marketing Campaigns Plannng for Marketng Campagns Qang Yang and Hong Cheng Department of Computer Scence Hong Kong Unversty of Scence and Technology Clearwater Bay, Kowloon, Hong Kong, Chna (qyang, csch)@cs.ust.hk Abstract

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

An Inductive Fuzzy Classification Approach applied to Individual Marketing

An Inductive Fuzzy Classification Approach applied to Individual Marketing An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS

SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS Magdalena Rogalska 1, Wocech Bożeko 2,Zdzsław Heduck 3, 1 Lubln Unversty of Technology, 2- Lubln, Nadbystrzycka 4., Poland. E-mal:rogalska@akropols.pol.lubln.pl

More information

An Analysis of Dynamic Severity and Population Size

An Analysis of Dynamic Severity and Population Size An Analyss of Dynamc Severty and Populaton Sze Karsten Wecker Unversty of Stuttgart, Insttute of Computer Scence, Bretwesenstr. 2 22, 7565 Stuttgart, Germany, emal: Karsten.Wecker@nformatk.un-stuttgart.de

More information

Computer-assisted Auditing for High- Volume Medical Coding

Computer-assisted Auditing for High- Volume Medical Coding Computer-asssted Audtng for Hgh-Volume Medcal Codng Computer-asssted Audtng for Hgh- Volume Medcal Codng by Danel T. Henze, PhD; Peter Feller, MS; Jerry McCorkle, BA; and Mark Morsch, MS Abstract The volume

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Sciences Shenyang, Shenyang, China.

Sciences Shenyang, Shenyang, China. Advanced Materals Research Vols. 314-316 (2011) pp 1315-1320 (2011) Trans Tech Publcatons, Swtzerland do:10.4028/www.scentfc.net/amr.314-316.1315 Solvng the Two-Obectve Shop Schedulng Problem n MTO Manufacturng

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

A novel Method for Data Mining and Classification based on

A novel Method for Data Mining and Classification based on A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: lhan-gege@126.com Abstract Data mnng has been attached great

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Mooring Pattern Optimization using Genetic Algorithms

Mooring Pattern Optimization using Genetic Algorithms 6th World Congresses of Structural and Multdscplnary Optmzaton Ro de Janero, 30 May - 03 June 005, Brazl Moorng Pattern Optmzaton usng Genetc Algorthms Alonso J. Juvnao Carbono, Ivan F. M. Menezes Luz

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm

A New Task Scheduling Algorithm Based on Improved Genetic Algorithm A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng Envronment Congcong Xong, Long Feng, Lxan Chen A New Task Schedulng Algorthm Based on Improved Genetc Algorthm n Cloud Computng

More information

Testing and Debugging Resource Allocation for Fault Detection and Removal Process

Testing and Debugging Resource Allocation for Fault Detection and Removal Process Internatonal Journal of New Computer Archtectures and ther Applcatons (IJNCAA) 4(4): 93-00 The Socety of Dgtal Informaton and Wreless Communcatons, 04 (ISSN: 0-9085) Testng and Debuggng Resource Allocaton

More information

A spam filtering model based on immune mechanism

A spam filtering model based on immune mechanism Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):2533-2540 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A spam flterng model based on mmune mechansm Ya-png

More information

Machine Learning and Software Quality Prediction: As an Expert System

Machine Learning and Software Quality Prediction: As an Expert System I.J. Informaton Engneerng and Electronc Busness, 2014, 2, 9-27 Publshed Onlne Aprl 2014 n MECS (http://www.mecs-press.org/) DOI: 10.5815/jeeb.2014.02.02 Machne Learnng and Software Qualty Predcton: As

More information

1. Introduction. Graham Kendall School of Computer Science and IT ASAP Research Group University of Nottingham Nottingham, NG8 1BB gxk@cs.nott.ac.

1. Introduction. Graham Kendall School of Computer Science and IT ASAP Research Group University of Nottingham Nottingham, NG8 1BB gxk@cs.nott.ac. The Co-evoluton of Tradng Strateges n A Mult-agent Based Smulated Stock Market Through the Integraton of Indvdual Learnng and Socal Learnng Graham Kendall School of Computer Scence and IT ASAP Research

More information

HowHow to Find the Best Online Stock Broker

HowHow to Find the Best Online Stock Broker A GENERAL APPROACH FOR SECURITY MONITORING AND PREVENTIVE CONTROL OF NETWORKS WITH LARGE WIND POWER PRODUCTION Helena Vasconcelos INESC Porto hvasconcelos@nescportopt J N Fdalgo INESC Porto and FEUP jfdalgo@nescportopt

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA*

HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA* HOUSEHOLDS DEBT BURDEN: AN ANALYSIS BASED ON MICROECONOMIC DATA* Luísa Farnha** 1. INTRODUCTION The rapd growth n Portuguese households ndebtedness n the past few years ncreased the concerns that debt

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures Mnmal Codng Network Wth Combnatoral Structure For Instantaneous Recovery From Edge Falures Ashly Joseph 1, Mr.M.Sadsh Sendl 2, Dr.S.Karthk 3 1 Fnal Year ME CSE Student Department of Computer Scence Engneerng

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process

Rate Monotonic (RM) Disadvantages of cyclic. TDDB47 Real Time Systems. Lecture 2: RM & EDF. Priority-based scheduling. States of a process Dsadvantages of cyclc TDDB47 Real Tme Systems Manual scheduler constructon Cannot deal wth any runtme changes What happens f we add a task to the set? Real-Tme Systems Laboratory Department of Computer

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems 1 Applcaton of Mult-Agents for Fault Detecton and Reconfguraton of Power Dstrbuton Systems K. Nareshkumar, Member, IEEE, M. A. Choudhry, Senor Member, IEEE, J. La, A. Felach, Senor Member, IEEE Abstract--The

More information

Predicting Software Development Project Outcomes *

Predicting Software Development Project Outcomes * Predctng Software Development Project Outcomes * Rosna Weber, Mchael Waller, June Verner, Wllam Evanco College of Informaton Scence & Technology, Drexel Unversty 3141 Chestnut Street Phladelpha, PA 19104

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the

More information

A GENETIC ALGORITHM-BASED METHOD FOR CREATING IMPARTIAL WORK SCHEDULES FOR NURSES

A GENETIC ALGORITHM-BASED METHOD FOR CREATING IMPARTIAL WORK SCHEDULES FOR NURSES 82 Internatonal Journal of Electronc Busness Management, Vol. 0, No. 3, pp. 82-93 (202) A GENETIC ALGORITHM-BASED METHOD FOR CREATING IMPARTIAL WORK SCHEDULES FOR NURSES Feng-Cheng Yang * and We-Tng Wu

More information

The Journal of Systems and Software

The Journal of Systems and Software The Journal of Systems and Software 82 (2009) 241 252 Contents lsts avalable at ScenceDrect The Journal of Systems and Software journal homepage: www. elsever. com/ locate/ jss A study of project selecton

More information

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n

More information

Performance Management and Evaluation Research to University Students

Performance Management and Evaluation Research to University Students 631 A publcaton of CHEMICAL ENGINEERING TRANSACTIONS VOL. 46, 2015 Guest Edtors: Peyu Ren, Yancang L, Hupng Song Copyrght 2015, AIDIC Servz S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216 The Italan Assocaton

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem

Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem Usng Mult-obectve Metaheurstcs to Solve the Software Proect Schedulng Problem Francsco Chcano Unversty of Málaga, Span chcano@lcc.uma.es Francsco Luna Unversty of Málaga, Span flv@lcc.uma.es Enrque Alba

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Using an Adaptive Fuzzy Logic System to Optimise Knowledge Discovery in Proteomics

Using an Adaptive Fuzzy Logic System to Optimise Knowledge Discovery in Proteomics Usng an Adaptve Fuzzy Logc System to Optmse Knowledge Dscovery n Proteomcs James Malone, Ken McGarry and Chrs Bowerman School of Computng and Technology Sunderland Unversty St. Peter s Way, Sunderland,

More information

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT Kolowrock Krzysztof Joanna oszynska MODELLING ENVIRONMENT AND INFRATRUCTURE INFLUENCE ON RELIABILITY AND OPERATION RT&A # () (Vol.) March RELIABILITY RIK AND AVAILABILITY ANLYI OF A CONTAINER GANTRY CRANE

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

Estimating the Development Effort of Web Projects in Chile

Estimating the Development Effort of Web Projects in Chile Estmatng the Development Effort of Web Projects n Chle Sergo F. Ochoa Computer Scences Department Unversty of Chle (56 2) 678-4364 sochoa@dcc.uchle.cl M. Cecla Bastarrca Computer Scences Department Unversty

More information

Ant Colony Optimization for Economic Generator Scheduling and Load Dispatch

Ant Colony Optimization for Economic Generator Scheduling and Load Dispatch Proceedngs of the th WSEAS Int. Conf. on EVOLUTIONARY COMPUTING, Lsbon, Portugal, June 1-18, 5 (pp17-175) Ant Colony Optmzaton for Economc Generator Schedulng and Load Dspatch K. S. Swarup Abstract Feasblty

More information

Selecting Best Employee of the Year Using Analytical Hierarchy Process

Selecting Best Employee of the Year Using Analytical Hierarchy Process J. Basc. Appl. Sc. Res., 5(11)72-76, 2015 2015, TextRoad Publcaton ISSN 2090-4304 Journal of Basc and Appled Scentfc Research www.textroad.com Selectng Best Employee of the Year Usng Analytcal Herarchy

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, dskim@ssu.ac.kr

BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, dskim@ssu.ac.kr Proceedngs of the 41st Internatonal Conference on Computers & Industral Engneerng BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK Yeong-bn Mn 1, Yongwoo Shn 2, Km Jeehong 1, Dongsoo

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,

More information

Chapter 6. Classification and Prediction

Chapter 6. Classification and Prediction Chapter 6. Classfcaton and Predcton What s classfcaton? What s Lazy learners (or learnng from predcton? your neghbors) Issues regardng classfcaton and Frequent-pattern-based predcton classfcaton Classfcaton

More information