TAXONOMIC EVIDENCE APPLYING ALGORITHMS OF INTELLIGENT DATA MINING. ASTEROIDS FAMILIES


Gregorio Perichinsky (1), Magdalena Servente (2), Arturo Carlos Servetto (1), Ramón García Martínez (3,2), Rosa Beatriz Orellana (4), Angel Luis Plastino (5)

(1) Databases and Operating System Laboratory, Computer Science Department, School of Engineering, University of Buenos Aires. Paseo Colón Nº th Floor South Wing (1063) Buenos Aires, Argentina. Phone: (54 11) (int. 140/145) FAX: (54 1)
(2) Intelligent Systems Laboratory, Computer Science Department, School of Engineering, University of Buenos Aires. Paseo Colón Nº th Floor South Wing (1063) Buenos Aires, Argentina. Phone: (54 11) (int. 140/145) FAX: (54 1)
(3) Software Engineering & Knowledge Engineering Center (CAPIS), Graduate School, Buenos Aires Institute of Technology. Madero 399 (1106) Buenos Aires, Argentina. Phone: (54-11) FAX: (54 1) ext 277
(4) Mechanics Laboratory, Celestial Mechanics Department, School of Astronomical and Geophysical Sciences, University of La Plata. Paseo del Bosque (1900) La Plata, Buenos Aires, Argentina. Phone: (54 221)
(5) PROTEM Laboratory, Department of Physical Sciences, School of Sciences, University of La Plata. C.C. 727 or (115 # 48/49) (1900) La Plata, Buenos Aires, Argentina. Phone: (54 221) (54 221) (ext. 247)

KEYWORDS: classification, cluster (family), spectrum, induction, divide and conquer, entropy.

ABSTRACT

Numerical Taxonomy aims to group operational taxonomic units (OTUs, taxons or taxa) into clusters through numerical methods, using so-called structure analysis. Obtaining clusters that constitute families has been the purpose of our latest series of projects. Structural analysis, based on phenotypic characteristics, exhibits the relationships, in terms of degrees of similarity, between two or more OTUs. Entities formed by dynamic domains of attributes change according to taxonomic requirements: the classification of objects to form families. Taxonomic objects are represented by applying the semantics of the Dynamic Relational Database Model.

Families of OTUs are obtained employing as tools i) the Euclidean distance and ii) nearest-neighbor techniques. Thus taxonomic evidence is gathered so as to quantify the similarity of each pair of OTUs (pair-group method) obtained from the basic data matrix. The main contribution up until now has been to introduce the concept of the spectrum of the OTUs, based on the states of their characters. The concept of family spectra emerges if the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius). A new taxonomic criterion is thereby formulated.

An astronomic application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, which has already been employed with reference to Data Mining.

This paper analyses the application of Machine Learning techniques to Data Mining. We focused our interest on the TDIDT (Top-Down Induction of Decision Trees) family of induction from pre-classified data, and in particular on the ID3 and C4.5 algorithms, created by Quinlan. We tried to determine the degree of efficiency achieved by the algorithms of the TDIDT family when applied in data mining to generate valid models of the data in classification problems, using the gain of entropy. Informatics (Data Mining and Computational Taxonomy) is always the original objective of our research.

1. Introduction

Taxonomic objects are here represented by applying the semantics of the Dynamic Relational Database Model: classification of objects to form families or clusters [1]. Families of OTUs are obtained employing as tools i) the Euclidean distance and ii) nearest-neighbor techniques. Thus taxonomic evidence is gathered so as to quantify the similarity of each pair of OTUs (pair-group method) obtained from the basic data matrix [2][3][4].

The main contribution of the series of papers presented until now was to introduce the concept of the spectrum of the OTUs, based on the states of their characters. The concept of family spectra emerges if the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius) [1]. Families or clusters are obtained by applying the integrated, independent domain technique dynamically to compute the Matrix of Similarity and then resorting to an iterative algorithm. A new taxonomic criterion was thereby formulated.

The considerable discrepancies and incongruities among the existing classifications resulting from astrophysical studies have motivated an interdisciplinary research program on the clustering of asteroids in stable families [5]. In our case, the work is carried out in an interdisciplinary way across Celestial Mechanics [5], Information Theory [6][7], Neural Networks [8], Dynamic Databases [1] and the algorithmics of Numerical Taxonomy [2][4], in order to uncover the depths of the structure formation of the Solar System.

An astronomic application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, which has already been employed with reference to Data Mining.

On the other hand: (i) the work of [1] has clarified subtle points concerning the long-term dynamic evolution of asteroid orbits, whose modeling is an essential prerequisite for deriving the proper elements (for the classification in families); and (ii) the availability of physical data on sizes, shapes, numerical taxonomy and rotation velocity for many hundreds of asteroids has prompted new analyses of families [1].
While the most populous families appear in quite homogeneous form under both criteria, the criterion based on composition and on physical and cosmochemical precedents identifies families with more difficulty, and the criterion that has identified families with the least difficulty is the one that uses data from celestial mechanics. We do not consider the transformation of isotropic and homogeneous sets, which changes the values of the eccentricity and the semi-axis to recompute the inter-gap zones of the asteroid belt as average velocities, nor the elimination of groups of 5 or fewer objects; we consider all of these to be outside a computational criterion.

1.1 Intelligent Data Mining: Introduction

Machine Learning is the field dedicated to the development of computational methods underlying learning processes and to applying computer-based learning systems to practical problems. Data Mining tries to solve problems related to the search for interesting patterns and important regularities in large databases [9][10]..[15]. Data Mining uses methods and strategies from other areas, including Machine Learning. When we apply Machine Learning techniques to solve a Data Mining problem, we refer to it as Intelligent Data Mining.

This paper analyses the TDIDT (Top-Down Induction of Decision Trees) family of induction, and in particular the C4.5 algorithm [13b][14]. We tried to determine the degree of efficiency achieved by the C4.5 algorithm when applied in data mining to generate valid models of the data in classification problems, using the gain of entropy.

The C4.5 algorithm generates decision trees and decision rules from pre-classified data. The divide-and-conquer method is used to build the decision trees. This method divides the input data into subsets according to some pre-established criteria. It then works on each of these subsets, dividing them again, until all the cases present in one subset belong to the same class.

2. Constructing the decision trees

2.1. ID3

The Induction of Decision Trees algorithm (ID3) was developed as a supervised learning method for building decision trees from a set of examples.
The examples must have a group of attributes and a class. The attributes and classes must be discrete, and the classes must be disjoint. The first versions of this algorithm allowed just two classes: positive and negative. This restriction was eliminated in later releases, but the restriction to disjoint classes was preserved. The descriptions generated by ID3 cover each one of the examples in the training set.

2.2. C4.5

The C4.5 algorithm is a descendant of the ID3 algorithm and solves many of its predecessor's limitations. For example, C4.5 works with continuous attributes by dividing the possible results into two branches: one for the values A <= N and another for A > N. Moreover, the trees are less bushy, because each leaf covers a distribution of classes rather than one class in particular as in the ID3 trees; this makes the trees shallower and more understandable [13b][14].

C4.5 generates a decision tree by partitioning the data recursively, following a depth-first strategy. Before making each partition, the system analyses all the possible tests that can divide the data set and selects the test with the highest information gain or the highest gain ratio. For a discrete attribute, it considers a test with n possible outcomes, n being the number of values the attribute can take. For a continuous attribute, a binary test is performed on each of the values that the attribute can take.
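As an illustration of the binary test on a continuous attribute described above, the following Python sketch enumerates every candidate test A <= N / A > N and reports the class distribution that each branch would cover. It is a minimal sketch, not Quinlan's implementation: the function name `binary_tests`, the semi-major axis values and the family labels attached to them are hypothetical and serve only as sample input.

```python
from collections import Counter


def binary_tests(values, labels):
    """Enumerate the binary tests A <= N / A > N for one continuous attribute.

    Every distinct observed value except the largest is tried as the
    threshold N (the largest would leave the A > N branch empty).
    Returns a list of (N, left_distribution, right_distribution).
    """
    tests = []
    for n in sorted(set(values))[:-1]:
        left = Counter(c for v, c in zip(values, labels) if v <= n)
        right = Counter(c for v, c in zip(values, labels) if v > n)
        tests.append((n, left, right))
    return tests


# Hypothetical sample: semi-major axes (AU) with made-up family labels.
a = [2.87, 2.88, 3.01, 3.02, 3.13, 3.14]
family = ["Koronis", "Koronis", "Eos", "Eos", "Themis", "Themis"]

for n, left, right in binary_tests(a, family):
    print(f"A <= {n}: {dict(left)} | A > {n}: {dict(right)}")
```

Each candidate would then be scored with the information gain or the gain ratio, and the best-scoring threshold kept as the test for that attribute.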

2.3. Decision trees

TDIDT trees, to which those generated by ID3 and later by C4.5 belong, are built with Hunt's method. The ID3 and C4.5 algorithms use the divide-and-conquer strategy to build the initial decision tree from the training data [16]. To build a decision tree from a set T of training data, the method divides the data at each step according to the values of the best attribute. Any test that divides T in a non-trivial manner, that is, so that at least two different subsets {T_i} are not empty, is acceptable.

Let the classes be {C_1, C_2, ..., C_k}. When T contains cases belonging to several classes, the idea is to refine T into subsets of cases that are, or seem to tend toward, collections of cases belonging to a single class. A test based on a single attribute is chosen, with one or more mutually exclusive outcomes {O_1, O_2, ..., O_n}. T is partitioned into the subsets T_1, T_2, ..., T_n, where T_i contains all the cases of T that have outcome O_i for the chosen test. The decision tree for T consists of a decision node identifying the test, with one branch for each possible outcome. The tree construction mechanism is applied recursively to each subset of training data, so that the i-th branch leads to the decision tree built from the subset T_i of training data.

Still, the ultimate objective behind the construction process is not just to find any decision tree, but to find one that reveals a certain structure of the domain, that is to say, a tree with predictive power. That is the reason why each leaf must cover a large number of cases, and why each partition must have the smallest possible number of classes. In an ideal case, we would like to choose at each step the test that generates the smallest decision tree. Basically, what we are looking for is a small decision tree consistent with the training data. We could explore and analyze all the possible decision trees and choose the simplest one. However, the search (hypothesis) space contains an exponential number of trees that would have to be explored.
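The recursive construction just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ID3 or C4.5 code: attributes are discrete, the test chosen at each node is simply the first attribute that splits T non-trivially (ID3 and C4.5 would instead pick the one with the best information gain or gain ratio), and the name `build_tree`, the nested-dictionary tree format and the sample cases are all hypothetical.

```python
from collections import Counter


def build_tree(cases, attributes):
    """Hunt-style recursive partitioning.

    cases: list of (attribute_dict, class_label) pairs, the set T.
    attributes: names of the attributes still available for testing.
    Returns a nested dict; leaves store the class distribution they cover.
    """
    distribution = Counter(label for _, label in cases)
    # Stop when T holds a single class or no attribute is left to test.
    if len(distribution) <= 1 or not attributes:
        return {"leaf": dict(distribution)}
    for attr in attributes:
        # Partition T into subsets T_i, one per outcome O_i of the test.
        subsets = {}
        for features, label in cases:
            subsets.setdefault(features[attr], []).append((features, label))
        if len(subsets) >= 2:  # non-trivial: at least two non-empty T_i
            remaining = [a for a in attributes if a != attr]
            return {
                "test": attr,
                "branches": {
                    outcome: build_tree(subset, remaining)
                    for outcome, subset in subsets.items()
                },
            }
    # No remaining attribute separates the cases.
    return {"leaf": dict(distribution)}


# Hypothetical, made-up training cases.
cases = [
    ({"zone": "inner", "incl": "low"}, "Flora"),
    ({"zone": "outer", "incl": "low"}, "Themis"),
    ({"zone": "outer", "incl": "high"}, "Eos"),
]
tree = build_tree(cases, ["zone", "incl"])
print(tree)
```

Because any non-trivial test is accepted here, the resulting tree is consistent with the training data but not necessarily small; the choice of test is exactly where the entropy-based criteria enter.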
The problem of finding the smallest decision tree consistent with the training data is NP-complete. To determine the best attribute for dividing the data at each step, both the information gain and the gain ratio were used. Moreover, the trees generated with the C4.5 algorithm were pruned; this post-pruning was carried out in order to avoid overfitting the data.

2.4. Transforming decision trees to decision rules

Decision trees that are too big or too bushy are somewhat difficult to read and understand, because each node must be interpreted in the context defined by the previous branches. In any decision tree, the conditions that must be satisfied when classifying a case can be found by following the path from the root to the leaf to which that case belongs. If that path were transformed directly into a production rule, the antecedent of the rule would be the conjunction of all the tests in the nodes that must be traversed to reach the leaf. All the antecedents of the rules built this way are mutually exclusive and exhaustive.

To transform a tree into decision rules, the C4.5 algorithm traverses the decision tree in preorder (from the root to the leaves, from left to right) and constructs a rule for each path from the root to a leaf. The rule's antecedent is the conjunction of the value tests belonging to each of the visited nodes, and its class is the one corresponding to the leaf reached.

2.5. Evaluation of the TDIDT family

We used a cross-validation approach to evaluate the decision trees and the production rules obtained. Each dataset was divided into two sets in the proportions 2/3 and 1/3: we used two thirds of the original data as a training set and one third to evaluate the results. We expressed the results of these tests in a confusion matrix, where each class had two values associated with it: the number of examples classified correctly and the number of examples classified as belonging to another class.

3. Requirements engineering

Hirayama

Examining the distribution of the asteroids with respect to their orbital elements, in particular their mean motion, inclination and eccentricity, condensations are observed in different places that seem random, although in some cases this is not so evident when only the probabilities are taken into account [1]. Asteroids are also grouped because they have similar inclinations, or because their orbital planes have practically the same pole (that of the orbit of Jupiter). Other groupings do not share the same center, but when the eccentricity and the longitude of the perihelion are plotted instead of the inclination and the longitude of the node, the distribution has the shape of a circle.

Continuing the development of this theory, there is no doubt that there are physical relationships connecting the asteroids. Because of this we can venture that associated asteroid families exist. The theory has been verified, and with it the formation of families such as KORONIS (fhn 158), EOS (fhn 221), THEMIS (fhn 24), FLORA (fhn 244), MARIA (fhn 170) and PHOCAEA (fhn 25), where fhn is the family head number.

The distribution of orbital elements in the asteroid belt is not random, which shows the existence of families: groups of asteroids whose semi-major axis, eccentricity and inclination (or its sine) cluster around certain special values, following Arnold (around 1969 fewer than 1735 objects were known) [1]. The agglomeration in families (clustering) has been verified by taking the proper elements, which correct the periodic perturbations produced by the secular variations caused by the major planets, such as Jupiter. Other groupings have been identified by their resonance characteristics, or as streams of asteroids (jet streams) through the FLORA family, and as objects that cross the orbit of Mars on orbits of higher eccentricity.

The classification of celestial bodies is based on physical attributes, on the phenotypic characteristics (characters or attributes) of the asteroids and, finally, on their genotypic or common origin. The neighborhood condition should be taken into account; high-density families are the most stable and the least random. The Hirayama families are confirmed; the small families are of low density, yet the probability of belonging to them is high, so their coupling by the pair-group method is possible.

Around 1982, Carusi and Valsecchi worked with a record of 2125 minor planets of asteroid type, whose grouping produced discrepancies between the results of computational classification methods based on physical parameters and those based on dynamical parameters [1]. This discrepancy among the statistical methods is disconcerting, since the relationship among the members of a family with respect to the dynamical parameters and any physical study carried out on them should agree. It can be observed that the growth in observations does not resolve the discrepancies.

The discrepancies among family identification methods arise from their probabilistic criteria, and the discovery of new asteroids seems to expose contradictions between them. In spite of all this, if there is congruity, the suspected families appear in reality (the scientific method of contrast); but if the methods are arbitrary they are always debatable, in addition to the methodological doubt [the authors]. Williams had already discussed Arnold's problem in terms of a criterion of uniform Poissonian density distribution and the proper elements.
In the 1980s, the usual analysis techniques relied on similarity and a generalized distance, but with personal judgement or manual handling rather than automatic classification. For this reason the variance (σ_j) of the domains and families came to be considered in the process of identifying the elements within a family or in subsequent ones. The accepted classes have been split into two types: 1) the class has been identified in two intervals without noticeable differences; and 2) the class was found mixed, coupled with other less important classes in overlapping intervals, so that masked families or less reliable contours may exist. These aspects should emerge from the statistical method itself.

These projects of the Jet Propulsion Laboratory, California Institute of Technology, yielded orbits that cross those of the major planets and that are split into families by the characteristics of the method. One characteristic is that the strong resonance does not appear in the asteroids and the weak one is taken as noise. The distances are taken from a straight Sun-planet line (Mars MXR, Jupiter JXR, Saturn SXR, etc.), and the proper values are more exact within the belt than outside it (something which endorses the theory of the authors).

For Knežević and Milani, the proper elements obtained from a second-order analytical theory for asteroids identified in the principal belt (main belt) are much more exact than those of small eccentricity and inclination in the region of the Themis family. This is because the short-period perturbations are eliminated and the principal second-order effects are taken into account, according to the results of an algorithm consistent with the modern dynamical theories of Kolmogorov-Arnold-Moser. The sample comprises about 3495 asteroids from the edition of the Leningrad Ephemerides of Minor Planets; Hildas, Trojans and near-Earth objects (q < 1.1 AU) were discarded. All this development appears unclear and arbitrary: there is no formal basis for the relationship between convergence, the number of iterations (quality code QC) and the number of asteroids.
The criterion of Zappalà, Cellino, Farinella and Knežević (1992 and subsequent) is important, since it produced an improved classification of asteroids into dynamical families by analyzing a database of numbered asteroids whose proper elements had been computed with a new second-order, fourth-degree secular perturbation theory and whose long-term stability had been verified. The multivariate criterion uses the data analysis technique of hierarchical clustering. It was applied to build, for each zone of the asteroid belt, a "dendrogram" graph in the proper elements space, with a distance function related to the incremental velocity needed for the orbital change after ejection from the fragmented parent body.

The significance parameters associated with each family, measured against the results of random concentrations (which requires transforming the anisotropic and inhomogeneous zones into homogeneous and isotropic ones for the inter-gap zones of the asteroid belt, modifying mechanical attributes such as the semi-major axis and the inclination), and the robustness (stability) parameters were obtained by repeating the classification procedure after varying the velocity elements by small quantities, so as to recompute the real zones from the calculations with artificially changed coefficients of the distance function.

The most important and robust families are, as usual, Themis, Eos and Koronis, which jointly include 14% of the known main-belt population; but 12 more reliable and robust families were found throughout the belt, most of which depart partially from previous classifications. This is the case of FLORA in the inner belt region, where reliable family identification is very difficult, mainly because the density is high and the accuracy of the proper inclinations and eccentricities is poor, chiefly on account of the proximity of a strong secular resonance. In this way 21 families are established with a method that is genuinely significant and fully automated.

Spectral analysis classification criterion

We have decided to carry out, with our spectral analysis criterion, the classification into families extended to the database of asteroid proper elements [1]. We recognize that the works of Zappalà are very important (automatic classification and hierarchical method) and a turning point in the early 1990s, but our approach is different because we work in computational taxonomy, in a taxonomic hyperspace, and not with a criterion based on composition and on physical and cosmochemical precedents. Zappalà uses a confusing methodology, with only one velocity variable, which transforms a homogeneous space into an inhomogeneous one and conversely, in a way that is not clearly univocal.

We thus incorporate an updated and larger set of osculating elements derived from the secular perturbation theory, whose accuracy (specifically, stability over time) has been extensively verified by long-term numerical integration. The classification is automatic; unlike the criterion of Zappalà, no data analysis technique that prejudges non-random groups in the proper elements space is used; the statistical significance of the groups is quantified; and the statistics of the important families are robust with respect to small random variations of the proper elements. All of this is based on an analysis in Computational Taxonomy.

We do not consider the transformation of isotropic and homogeneous sets, which changes the values of the eccentricity and the semi-axis to recompute the inter-gap zones of the asteroid belt as average velocities, nor the elimination of groups of 5 or fewer objects; we consider all of these to be outside a computational criterion. Thus, a new approach to Computational Taxonomy is presented, which has already been employed with reference to Data Mining.

Numerical Taxonomy
We infer an analogy of the taxonomic representation [1] in a dynamic relational database. We explain the theoretical development of a domain-structured database and how domains can be represented in a Dynamic Database. We then apply our model to the structural aspects of the taxonomy, applying scaling methods for domains [2][4].

We define the numerical methods used for establishing and defining clusters by their taxonomic distances. We shall let $C_{jk}$ stand for a general dissimilarity coefficient, of which the taxonomic distance $d_{jk}$ is a special example. Euclidean distances will be used in the explanation of clustering techniques. In discussing clustering procedures we make a useful distinction between three types of measure.

We use space-conserving or space-distorting clustering strategies; with the latter, it appears as though the space in the immediate vicinity of a cluster has been contracted or dilated. The criterion of admission for a candidate joining an extant cluster is constant in all pair-group methods. Thus we can represent the data matrix and compute the resemblance of normalized domains. The steps of clustering are the recomputation of the coefficient of similarity for future admission, followed by the admission criterion for new members to an established cluster [1].

Dispersion

Once a typical value of the variable of the states of the characters is known, it is necessary to have a parameter that gives an idea of how scattered, or concentrated, its values are with respect to the mean value [19]. The variance is considered as a second-order moment and represents the moment of inertia of the distribution of objects ("mass") with respect to its center of gravity, the centroid. The normalized variable $X'_j = (X_j - \bar{X}_j)/\sigma_j$ [2] represents the deviation of $X_j$ from its mean in units of $\sigma_j$.
Normalizing the states of the characters makes the mean of every character zero and its variance one. If we take the variance $\sigma_d^2$ as the value of the dispersion, we express the principle of least squares. Let $g(X_j)$ be a non-negative function of the variable $X_j$; then for every $K > 0$ the probability satisfies

$$P\big(g(X_j) \ge K\big) \le \frac{E[g(X_j)]}{K}.$$

If $g(X_j) = (X_j - \bar{X}_j)^2$ and $K = k^2 \sigma_j^2$, we obtain for all $k > 0$ the Bienaymé-Tchebycheff inequality:

$$P\big(|X_j - \bar{X}_j| \ge k\,\sigma_j\big) \le \frac{1}{k^2}.$$

This inequality shows that the quantity of mass (OTUs) of the distribution located outside the interval $\bar{X}_j - k\,\sigma_j < X_j < \bar{X}_j + k\,\sigma_j$ is at most $1/k^2$, which gives an idea of the use of $\sigma_j$ as a measure of dispersion or concentration.

Clusters and Spectra

In discussing Sequential, Agglomerative, Hierarchic and Nonoverlapping (SAHN) [4] clustering procedures we make a useful distinction between the three types of measure. We shall be concerned with clusters J, K and L containing $t_J$, $t_K$ and $t_L$ OTUs, respectively, where $t_J$, $t_K$ and $t_L$ are all $\ge 1$. OTUs j, k and l are contained in clusters J, K and L, respectively. Given two clusters J and K that are to be joined, the problem is to evaluate the dissimilarity between the resulting joint cluster and additional candidates L for further fusion. The fused cluster is denoted (J,K), with $t_{(J,K)} = t_J + t_K$ OTUs.

The cluster center or centroid represents an average object, which is simply a mathematical construct that permits the characterization of the density, the variance, the taxon radius and the range as INVARIANT quantities. The states of the taxonomic characters in a class, ordinarily defined with reference to the set of their properties, allow one to calculate the distances between the members of the class. The distances can be established from the similarity relationship among individuals (by computing a matrix of similarity).

Considering characteristic spectra [1], in addition to the states of the characters or attributes of the OTUs, we introduce here the new SPECTRAL concepts of i) OBJECT spectra and ii) FAMILY spectra. Within the taxonomic space this method of clustering delimits taxonomic groups in such a manner that they can be visualized as characteristic spectra of an OTU and characteristic spectra of the families.

We define an individual spectral metric for the set of distances between an OTU and the other OTUs of the set. Each OTU provides the states of its characters; the set of distances is therefore constant for each OTU as long as the taxonomic conditions do not change (in analogy with phasors), giving an individual taxonomic spectrum (ITS). The spectrum of taxonomic similarity is the set of distances between the OTUs of the set that determine the constant characteristics of a cluster or family for a given type of taxonomic conditions. Invariants are found that characterize each cluster. Among them we mention the variance, the radius, the density and the centroid.
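To make the notions of normalized characters, individual taxonomic spectrum and cluster invariants concrete, a small Python sketch follows. It is illustrative only: the text does not give explicit formulas for the invariants, so the variance is assumed here to be the mean squared distance of the members to the centroid and the radius the largest member-to-centroid distance; the density is omitted because its definition is not stated. The function names and the four sample OTUs are hypothetical.

```python
import math


def standardize(matrix):
    """Column-wise normalization X' = (X - mean) / sigma.

    Uses the population standard deviation and assumes every character
    actually varies (sigma != 0).
    """
    cols = list(zip(*matrix))
    means = [sum(c) / len(c) for c in cols]
    sigmas = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sigmas)]
            for row in matrix]


def spectrum(otus, j):
    """Individual taxonomic spectrum: Euclidean distances from OTU j
    to every other OTU of the set."""
    return [math.dist(otus[j], otus[k]) for k in range(len(otus)) if k != j]


def invariants(cluster):
    """Centroid, variance and radius of a cluster (assumed definitions)."""
    n = len(cluster)
    centroid = [sum(c) / n for c in zip(*cluster)]
    dists = [math.dist(p, centroid) for p in cluster]
    variance = sum(d * d for d in dists) / n  # mean squared distance
    radius = max(dists)                       # farthest member
    return centroid, variance, radius


# Four hypothetical OTUs described by two characters.
otus = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(standardize(otus))
print(spectrum(otus, 0))
print(invariants(otus))
```

`math.dist` requires Python 3.8 or later. Superposing the individual spectra of the members of one cluster would give the family spectrum in the sense used above.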
These invariants are associated with the spectra of taxonomic similarity that identify each family.

Tests of Intelligent Data Mining

A software system was constructed to evaluate the C4.5 algorithm. This system takes the training data as input and allows the user to choose whether to construct a decision tree according to C4.5. If the user chooses C4.5, the decision tree is generated, then it is pruned and the decision rules are built. The decision tree and the ruleset generated by C4.5 are evaluated separately from each other. We use the system to test the algorithms in different domains, mainly Elita: a database of asteroids.

Computing the Information Gain

When the set T contains examples belonging to different classes, a test is carried out on the different attributes and a partition is made according to the "best" attribute. To find the "best" attribute, information theory is used, which holds that the information is maximized when the entropy is minimized. The entropy determines the randomness or disorder of a set.

Suppose that we have negative and positive examples. In this context the entropy of the subset S, H(S), can be calculated as:

$$H(S) = -p^{+} \log p^{+} - p^{-} \log p^{-} \tag{3.4.1}$$

where $p^{+}$ is the probability that an example taken at random from S is positive. This probability may be calculated as

$$p^{+} = \frac{n^{+}}{n^{+} + n^{-}} \tag{3.4.2}$$

where $n^{+}$ is the number of positive examples of S and $n^{-}$ the number of negative examples. The probability $p^{-}$ is calculated analogously to $p^{+}$, replacing the number of positive examples by the number of negative examples, and conversely.

Generalizing expression (3.4.1) to any type of examples, we obtain the general formulation of the entropy:

$$H(S) = -\sum_{i=1}^{n} p_i \log p_i \tag{3.4.3}$$

In all the calculations related to the entropy, we define $0 \log 0$ to be equal to 0.
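Expressions (3.4.1) to (3.4.3) can be checked numerically with a short Python sketch. The base of the logarithm is not stated in the text; base 2 is assumed here, so the entropy is measured in bits. The function name `entropy` and the class counts in the examples are hypothetical.

```python
import math


def entropy(counts):
    """H(S) = -sum_i p_i log2 p_i, computed from the number of examples
    in each class. Classes with zero examples are skipped, which
    implements the convention 0 log 0 = 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


print(entropy([5, 5]))   # evenly mixed positive/negative set
print(entropy([10, 0]))  # pure set
print(entropy([9, 5]))   # unevenly mixed set
```

An evenly mixed two-class set gives the maximum disorder of 1 bit, while a pure set gives 0, which is why a partition into purer subsets lowers the entropy.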
If the attribute at divides the set S into the subsets $S_i$, $i = 1, 2, \ldots, n$, then the total entropy of the system of subsets will be:

$$H(S, at) = \sum_{i=1}^{n} P(S_i)\, H(S_i) \tag{3.4.4}$$

where $H(S_i)$ is the entropy of the subset $S_i$ and $P(S_i)$ is the probability that an example belongs to $S_i$. It can be calculated, using the relative sizes of the subsets, as:

$$P(S_i) = \frac{|S_i|}{|S|} \tag{3.4.5}$$

The gain of information may be calculated as the decrease in entropy. Thus:

$$I(S, at) = H(S) - H(S, at) \tag{3.4.6}$$
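The computation in (3.4.4) to (3.4.6) can be sketched in Python as follows. This is a minimal illustration for discrete attributes, not the C4.5 code: base-2 logarithms are assumed, and the function names, the attribute names and the four sample cases are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """H(S) of a list of class labels, in bits (base 2 is an assumption)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(cases, attribute):
    """I(S, at) = H(S) - sum_i P(S_i) H(S_i), with P(S_i) = |S_i| / |S|.

    cases: list of (attribute_dict, class_label) pairs.
    """
    labels = [label for _, label in cases]
    subsets = defaultdict(list)  # one subset S_i per value of the attribute
    for features, label in cases:
        subsets[features[attribute]].append(label)
    after = sum(len(s) / len(cases) * entropy(s) for s in subsets.values())
    return entropy(labels) - after


# Hypothetical, made-up cases: "zone" separates the classes, "size" does not.
cases = [
    ({"zone": "inner", "size": "small"}, "Flora"),
    ({"zone": "inner", "size": "large"}, "Flora"),
    ({"zone": "outer", "size": "small"}, "Themis"),
    ({"zone": "outer", "size": "large"}, "Themis"),
]
print(information_gain(cases, "zone"))
print(information_gain(cases, "size"))
```

The attribute with the larger gain, here the hypothetical "zone", would be selected as the test for the node.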

In expression (3.4.6), H(S) is the value of the entropy a priori, before carrying out the subdivision, and H(S, at) is the value of the entropy of the system of subsets generated by the partition according to at. The use of entropy to evaluate the best attribute is not the only method that exists or is used in Machine Learning. However, it is the one used by Quinlan in developing ID3 and its successor C4.5.

Numerical Data

Decision trees can be generated from discrete attributes as well as from continuous attributes. With discrete attributes, partitioning the set according to the value of an attribute is simple; with continuous attributes it is not. To solve this problem, the binary method can be used. This method consists of forming two ranges of values according to the value of an attribute, which can then be treated as symbolic.

4. Results and Conclusions

Results of the C4.5

C4.5 with post-pruning results in smaller and less bushy trees. If we analyze the trees obtained in the domain, we see that the error percentages obtained with C4.5 are between 3% and 3.7%, and that C4.5 generates smaller trees and smaller rulesets. This derives from the fact that each leaf in a generated tree covers a distribution of classes.

Error percentage, domain ELITA: [1] C4.5 trees with gain; [2] C4.5 rules with gain; [3] C4.5 trees with gain ratio; [4] C4.5 rules with gain ratio: < 3%.

From the analysis of this value we could conclude that no method can generate a clearly superior model for the domain. On the contrary, we could state that the error percentage does not appear to depend on the method used, but on the analyzed domain.

Hypothesis space

The hypothesis space for this algorithm is complete with respect to the available attributes. Because any value test can be represented with a decision tree, this algorithm avoids one of the principal risks of inductive methods that work by reducing the hypothesis space. An important feature of the C4.5 algorithm is that it uses all the available data at each step to choose the best attribute; this decision is made with statistical methods.
Ths fact favors ths algorthm over other algorthms because analyze how the nput dataset take the representaton nto decson trees n consstent forms. Once an attrbute has been selected as a decson node, the algorthm does not go back over ther choces. Ths s the reason why ths algorthm can converge to a local maxmum[20]. The C4.5 algorthm adds a certan degree of reconsderaton of ts choces n the post-prunng of the decson trees. Nevertheless, we can state that the results show that the proporton of error depends on the data doman. For future study, we suggest an analyss the nput datasets wth the numercal method of clusterng and choosng for the doman the method that mantans a low percentage error n extended databases as a robustness of the method. 5. Corollary From what has been sad, the work uses the Sequental, Agglomeratve, Herarchc and Nonoverlappng clusterng procedures, spectral analyss crteron and nvarants to accomplsh classfcatons n extended databases, of proper asterod elements, to structure famles. The pre-classfed data s an mportant nput to Intellgent Data Mnng, and Computatonal Taxonomy n Databases wll have always a low percentage error n extended databases as a robustness of the method; to combne a sure result. References [1]Perchnsky, G., Orellana, R., Plastno, A.L., Jmenez Rey, E. and Gross, M.D. "Spectra of Taxonomc Evdence n Databases." Proceedngs of XVIII Internatonal Conference on Appled Informatcs. (Paper ).Innsbruck. Austra [2]Crsc, J.V., Lopez Armengol, M.F. "Introducton to Theory and Practce of the Numercal Taxonomy", A.S.O. Regonal Program of Scence and Technology for Development. Washngton D.C. Spansh [3]Gennar,J.H. A Survey of Clusterng Methods (b). Techncal Report Department of Computer Scence and Informatcs. Unversty of Calforna., Irvne, CA [4]Sokal, R.R., Sneath, P.H.A. "Numercal Taxonomy".W.H.Freeman and Company [5]Zappala, V, Cellno,A., Farnella,P., Mlan,A., The Astronomcal Journal, 107, [6]Abramson,N., Informaton Theory and Codng. McGraw Hll. 
Paraninfo. Madrid.
[7] Hamming, R.W. Coding and Information Theory. Englewood Cliffs, NJ: Prentice Hall.
[8] Freeman, J.A., Skapura, D.M. Neural Networks. Algorithms, Applications and Techniques of Programming. Addison Wesley Iberoamericana. Spanish.
[9] Michalski, R.S. A Theory and Methodology of Inductive Learning. In Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (1983) Machine Learning: An Artificial Intelligence Approach, Vol. I. Morgan Kaufmann, USA.
[10] Quinlan, J.R. Induction of Decision Trees. In Machine Learning, Ch. 1. Morgan Kaufmann.
[11] Quinlan, J.R. Generating Production Rules from Decision Trees. Proceedings of the Tenth International Joint Conference on Artificial Intelligence. San Mateo, CA, Morgan Kaufmann, USA.
[12] Quinlan, J.R. Decision Trees and Multi-valued Attributes. In J.E. Hayes, D. Michie, and J. Richards (eds.), Machine Intelligence, V. II. Oxford University Press, Oxford, UK.
[13] Quinlan, J.R. Learning Efficient Classification Procedures and Their Application to Chess Games. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (eds.), Machine Learning, The Artificial Intelligence Approach. Morgan Kaufmann, V. II, Ch. 15, USA.
[13b] Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California, USA.
[14] Quinlan, J.R. Improved Use of Continuous Attributes in C4.5. Basser Department of Computer Science, University of Sydney, Australia.
[15] Quinlan, J.R. Learning First-Order Definitions of Functions. Basser Department of Computer Science, University of Sydney, Australia.
[16] Hunt, E.B., Marin, J., Stone, P.J. (1995-AI). Experiments in Induction. New York: Academic Press, USA.
[17] Hirayama, K. Present State of the Families of Asteroids. Proceedings of the Physics-Mathematics Society. Japan II:9.
[18] Cramer, Harald. Mathematical Methods in Statistics. Aguilar Edition. Madrid. Spanish.
[19] Mitchell, T. Machine Learning. MCB/McGraw-Hill, Carnegie Mellon University, USA.
[20] Mitchell, T. Decision Trees. Cornell University, USA.
[21] Feynman, R.P., Leighton, R.B. and Sands, M. Lectures on Physics, Mainly Mechanics, Radiation and Heat. pp. 28-6 ff, 29-1 ff.
[22] Hecht, E. and Zajac, A., Optics. Inter-American Educational Fund. Spanish. 1977.

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering Lecture 7a Clusterng Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Clusterng Groups together smlar nstances n the data sample Basc clusterng problem: dstrbute data nto k dfferent groups such that

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University Tme Seres Analyss n Studes of AGN Varablty Bradley M. Peterson The Oho State Unversty 1 Lnear Correlaton Degree to whch two parameters are lnearly correlated can be expressed n terms of the lnear correlaton

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Lecture 18: Clustering & classification

Lecture 18: Clustering & classification O CPS260/BGT204. Algorthms n Computatonal Bology October 30, 2003 Lecturer: Pana K. Agarwal Lecture 8: Clusterng & classfcaton Scrbe: Daun Hou Open Problem In HomeWor 2, problem 5 has an open problem whch

More information

MAPP. MERIS level 3 cloud and water vapour products. Issue: 1. Revision: 0. Date: 9.12.1998. Function Name Organisation Signature Date

MAPP. MERIS level 3 cloud and water vapour products. Issue: 1. Revision: 0. Date: 9.12.1998. Function Name Organisation Signature Date Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

1 Approximation Algorithms

1 Approximation Algorithms CME 305: Dscrete Mathematcs and Algorthms 1 Approxmaton Algorthms In lght of the apparent ntractablty of the problems we beleve not to le n P, t makes sense to pursue deas other than complete solutons

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Ring structure of splines on triangulations

Ring structure of splines on triangulations www.oeaw.ac.at Rng structure of splnes on trangulatons N. Vllamzar RICAM-Report 2014-48 www.rcam.oeaw.ac.at RING STRUCTURE OF SPLINES ON TRIANGULATIONS NELLY VILLAMIZAR Introducton For a trangulated regon

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

The Analysis of Outliers in Statistical Data

The Analysis of Outliers in Statistical Data THALES Project No. xxxx The Analyss of Outlers n Statstcal Data Research Team Chrysses Caron, Assocate Professor (P.I.) Vaslk Karot, Doctoral canddate Polychrons Economou, Chrstna Perrakou, Postgraduate

More information

How Much to Bet on Video Poker

How Much to Bet on Video Poker How Much to Bet on Vdeo Poker Trstan Barnett A queston that arses whenever a gae s favorable to the player s how uch to wager on each event? Whle conservatve play (or nu bet nzes large fluctuatons, t lacks

More information

The covariance is the two variable analog to the variance. The formula for the covariance between two variables is

The covariance is the two variable analog to the variance. The formula for the covariance between two variables is Regresson Lectures So far we have talked only about statstcs that descrbe one varable. What we are gong to be dscussng for much of the remander of the course s relatonshps between two or more varables.

More information

An Inductive Fuzzy Classification Approach applied to Individual Marketing

An Inductive Fuzzy Classification Approach applied to Individual Marketing An Inductve Fuzzy Classfcaton Approach appled to Indvdual Marketng Mchael Kaufmann, Andreas Meer Abstract A data mnng methodology for an nductve fuzzy classfcaton s ntroduced. The nducton step s based

More information

On Mean Squared Error of Hierarchical Estimator

On Mean Squared Error of Hierarchical Estimator S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Abstract. 260 Business Intelligence Journal July IDENTIFICATION OF DEMAND THROUGH STATISTICAL DISTRIBUTION MODELING FOR IMPROVED DEMAND FORECASTING

Abstract. 260 Business Intelligence Journal July IDENTIFICATION OF DEMAND THROUGH STATISTICAL DISTRIBUTION MODELING FOR IMPROVED DEMAND FORECASTING 260 Busness Intellgence Journal July IDENTIFICATION OF DEMAND THROUGH STATISTICAL DISTRIBUTION MODELING FOR IMPROVED DEMAND FORECASTING Murphy Choy Mchelle L.F. Cheong School of Informaton Systems, Sngapore

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Passive Filters. References: Barbow (pp 265-275), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6)

Passive Filters. References: Barbow (pp 265-275), Hayes & Horowitz (pp 32-60), Rizzoni (Chap. 6) Passve Flters eferences: Barbow (pp 6575), Hayes & Horowtz (pp 360), zzon (Chap. 6) Frequencyselectve or flter crcuts pass to the output only those nput sgnals that are n a desred range of frequences (called

More information

Cluster Analysis. Cluster Analysis

Cluster Analysis. Cluster Analysis Cluster Analyss Cluster Analyss What s Cluster Analyss? Types of Data n Cluster Analyss A Categorzaton of Maor Clusterng Methos Parttonng Methos Herarchcal Methos Densty-Base Methos Gr-Base Methos Moel-Base

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

where the coordinates are related to those in the old frame as follows.

where the coordinates are related to those in the old frame as follows. Chapter 2 - Cartesan Vectors and Tensors: Ther Algebra Defnton of a vector Examples of vectors Scalar multplcaton Addton of vectors coplanar vectors Unt vectors A bass of non-coplanar vectors Scalar product

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets

Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets Improved Mnng of Software Complexty Data on Evolutonary Fltered Tranng Sets VILI PODGORELEC Insttute of Informatcs, FERI Unversty of Marbor Smetanova ulca 17, SI-2000 Marbor SLOVENIA vl.podgorelec@un-mb.s

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures

Minimal Coding Network With Combinatorial Structure For Instantaneous Recovery From Edge Failures Mnmal Codng Network Wth Combnatoral Structure For Instantaneous Recovery From Edge Falures Ashly Joseph 1, Mr.M.Sadsh Sendl 2, Dr.S.Karthk 3 1 Fnal Year ME CSE Student Department of Computer Scence Engneerng

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

x f(x) 1 0.25 1 0.75 x 1 0 1 1 0.04 0.01 0.20 1 0.12 0.03 0.60

x f(x) 1 0.25 1 0.75 x 1 0 1 1 0.04 0.01 0.20 1 0.12 0.03 0.60 BIVARIATE DISTRIBUTIONS Let be a varable that assumes the values { 1,,..., n }. Then, a functon that epresses the relatve frequenc of these values s called a unvarate frequenc functon. It must be true

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

Formation of probabilistic concepts through observations containing. discrete and continuous attributes.

Formation of probabilistic concepts through observations containing. discrete and continuous attributes. Formaton of probablstc concepts through observatons contanng dscrete and contnuous attrbutes Rcardo Batsta Rebouças, João José Vasco Furtado Mestrado em Informátca Aplcada (MIA) Unversdade de Fortaleza

More information

greatest common divisor

greatest common divisor 4. GCD 1 The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no

More information

ErrorPropagation.nb 1. Error Propagation

ErrorPropagation.nb 1. Error Propagation ErrorPropagaton.nb Error Propagaton Suppose that we make observatons of a quantty x that s subject to random fluctuatons or measurement errors. Our best estmate of the true value for ths quantty s then

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Automated information technology for ionosphere monitoring of low-orbit navigation satellite signals

Automated information technology for ionosphere monitoring of low-orbit navigation satellite signals Automated nformaton technology for onosphere montorng of low-orbt navgaton satellte sgnals Alexander Romanov, Sergey Trusov and Alexey Romanov Federal State Untary Enterprse Russan Insttute of Space Devce

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

QUANTUM MECHANICS, BRAS AND KETS

QUANTUM MECHANICS, BRAS AND KETS PH575 SPRING QUANTUM MECHANICS, BRAS AND KETS The followng summares the man relatons and defntons from quantum mechancs that we wll be usng. State of a phscal sstem: The state of a phscal sstem s represented

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of
