TAXONOMIC EVIDENCE APPLYING ALGORITHMS OF INTELLIGENT DATA MINING. ASTEROIDS FAMILIES


Gregorio Perichinsky (1), Magdalena Servente (2), Arturo Carlos Servetto (1), Ramón García Martínez (3,2), Rosa Beatriz Orellana (4), Angel Luis Plastino (5)

(1) Databases and Operating System Laboratory, Computer Science Department, School of Engineering, University of Buenos Aires. Paseo Colón Nº th Floor South Wing (1063) Buenos Aires, Argentina. Phone: (54 11) (int. 140/145) FAX: (54 1)
(2) Intelligent System Laboratory, Computer Science Department, School of Engineering, University of Buenos Aires. Paseo Colón Nº th Floor South Wing (1063) Buenos Aires, Argentina. Phone: (54 11) (int. 140/145) FAX: (54 1)
(3) Software Engineering & Knowledge Engineering Center (CAPIS), Graduate School, Buenos Aires Institute of Technology. Madero 399 (1106) Buenos Aires, Argentina. Phone: (5411) FAX: (54 1) ext 277
(4) Mechanics Laboratory, Celestial Mechanics Department, School of Astronomical and Geophysical Sciences, University of La Plata. Paseo del Bosque (1900) La Plata, Buenos Aires, Argentina. Phone: (54 221)
(5) PROTEM Laboratory, Department of Physical Sciences, School of Sciences, University of La Plata. C.C. 727 or (115 # 48/49) (1900) La Plata, Buenos Aires, Argentina. Phone: (54 221) (54 221) (ext. 247)

KEYWORDS: classification, cluster (family), spectrum, induction, divide and rule, entropy.

ABSTRACT

Numerical Taxonomy aims to group operational taxonomic units (OTUs, taxons or taxa) into clusters through numerical methods, using so-called structure analysis. Obtaining the clusters that constitute families was the purpose of this latest series of projects. Structural analysis, based on phenotypic characteristics, exhibits the relationships, in terms of degrees of similarity, between two or more OTUs. Entities formed by dynamic domains of attributes change according to taxonomical requirements: classification of objects to form families. Taxonomic objects are represented by applying the semantics of the Dynamic Relational Database Model.
Families of OTUs are obtained employing as tools i) the Euclidean distance and ii) nearest neighbor techniques. Thus taxonomic evidence is gathered so as to quantify the similarity for each pair of OTUs (pair-group method) obtained from the basic data matrix. The main contribution up until now is the introduction of the concept of the spectrum of the OTUs, based on the states of their characters. The concept of family spectra emerges if the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius). A new taxonomic criterion is thereby formulated. An astronomic application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, one that has already been employed with reference to Data Mining.

This paper analyses the application of Machine Learning techniques to Data Mining. We focused our interest on the TDIDT (Top-Down Induction of Decision Trees) family of induction from preclassified data, and in particular on the ID3 and C4.5 algorithms, created by Quinlan. We tried to determine the degree of efficiency achieved by the algorithms of the TDIDT family when applied in data mining to generate valid models of the data in classification problems with the Gain of Entropy. Informatics (Data Mining and Computational Taxonomy) is always the original objective of our research.

1. Introduction

Taxonomic objects are here represented by the application of the semantics of the Dynamic Relational Database Model: classification of objects to form families or clusters [1]. Families of OTUs are obtained employing as tools i) the Euclidean distance and ii) nearest neighbor techniques. Thus taxonomic evidence is gathered so as to quantify the similarity for each pair of OTUs (pair-group method) obtained from the basic data matrix [2][3][4]. The main contribution of the series
of papers presented until now was to introduce the concept of the spectrum of the OTUs, based on the states of their characters. The concept of family spectra emerges if the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius) [1]. The integrated, independent domain technique is applied dynamically to compute the Matrix of Similarity and, by recourse to an iterative algorithm, families or clusters are obtained. A new taxonomic criterion was thereby formulated.

The considerable discrepancies and incongruities among existing classifications derived from astrophysical studies have motivated an interdisciplinary research program that observes a clustering of asteroids in stable families [5]. In our case the work is interdisciplinary, drawing on Celestial Mechanics [5], Information Theory [6][7], Neural Networks [8], Dynamic Databases [1] and the algorithmics of Numerical Taxonomy [2][4], in order to probe the depths of the structure formation of the Solar System.

An astronomic application is worked out. The result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, one that has already been employed with reference to Data Mining.

On the other hand: (i) the work of [1] has clarified subtle points concerning the long-term dynamic evolution of asteroid orbits, whose modeling is an essential prerequisite for deriving the proper elements (for the classification in families); and (ii) the availability of physical data on sizes, shapes, numerical taxonomy and rotation velocity for many hundreds of asteroids has prompted new analyses of families [1].
While the most populous families appear in quite homogeneous form under both criteria, the criterion based on composition and on physical and cosmochemical precedents identifies families with greater difficulty, and the criterion that has identified families with the least difficulty is the one that uses data from celestial mechanics. We do not consider transforming isotropic and homogeneous sets, changing the values of the eccentricity and the semiaxis to recompute the inter-gap zones of the asteroid belt into average velocities, or eliminating groups of 5 or fewer objects, all of which we consider to lie outside a computational criterion.

1.1 Intelligent Data Mining Introduction

Machine Learning is the field dedicated to the development of computational methods underlying learning processes and to applying computer-based learning systems to practical problems. Data Mining tries to solve problems related to the search for interesting patterns and important regularities in large databases [9] [10]..[15]. Data Mining uses methods and strategies from other areas, including Machine Learning. When we apply Machine Learning techniques to solve a Data Mining problem, we refer to it as Intelligent Data Mining.

This paper analyses the TDIDT (Top-Down Induction of Decision Trees) induction family, and in particular the C4.5 algorithm [13b][14]. We tried to determine the degree of efficiency achieved by the C4.5 algorithm when applied in data mining to generate valid models of the data in classification problems with the Gain of Entropy.

The C4.5 algorithm generates decision trees and decision rules from preclassified data. The divide and rule method (usually called "divide and conquer") is used to build the decision trees. This method divides the input data into subsets according to some pre-established criteria. Then it works on each of these subsets, dividing them again, until all the cases present in one subset belong to the same class.

2. Constructing the decision trees

2.1. ID3

The Induction Decision Trees algorithm was developed as a supervised learning method for building decision trees from a set of examples.
The examples must have a group of attributes and a class. The attributes and classes must be discrete, and the classes must be disjoint. The first versions of this algorithm allowed just two classes: positive and negative. This restriction was eliminated in later releases, but the disjoint classes restriction was preserved. The descriptions generated by ID3 cover each one of the examples in the training set.

2.2. C4.5

The C4.5 algorithm is a descendant of the ID3 algorithm and solves many of its predecessor's limitations. For example, C4.5 works with continuous attributes by dividing the possible results into two branches: one for the values A <= N and another for A > N. Moreover, the trees are less bushy because each leaf covers a distribution of classes and not one class in particular, as in the ID3 trees; this makes the trees less deep and more understandable [13b][14]. C4.5 generates a decision tree by partitioning the data recursively, following a depth-first strategy. Before making each partition, the system analyses all the possible tests that can divide the data set and selects the test with the highest information gain or the highest gain ratio. For a discrete attribute, it considers a test with n possible outcomes, n being the number of values that the attribute can take. For a continuous attribute, a binary test is performed on each of the values that the attribute can take.
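The binary test on a continuous attribute can be illustrated with a short sketch. It is not the authors' implementation: it assumes base-2 Shannon entropy, tries every observed value except the largest as the threshold N, and reports both the information gain and the gain ratio. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (N, gain, gain_ratio) for the binary test A <= N / A > N
    with the highest information gain, or None if no split is possible."""
    base = entropy(labels)
    total = len(values)
    best = None
    # Candidate thresholds: every observed value except the largest,
    # so that both branches are always non-empty.
    for n in sorted(set(values))[:-1]:
        left = [c for v, c in zip(values, labels) if v <= n]
        right = [c for v, c in zip(values, labels) if v > n]
        p_left, p_right = len(left) / total, len(right) / total
        gain = base - (p_left * entropy(left) + p_right * entropy(right))
        split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
        ratio = gain / split_info
        if best is None or gain > best[1]:
            best = (n, gain, ratio)
    return best


# Hypothetical values of one continuous attribute and their classes.
axis = [2.1, 2.2, 2.3, 3.0, 3.1, 3.2]
family = ["A", "A", "A", "B", "B", "B"]
threshold, gain, ratio = best_threshold(axis, family)
```

With these made-up values the chosen test is A <= 2.3, which separates the two classes completely. Selecting by gain ratio instead of gain only requires comparing `ratio` in place of `gain` inside the loop.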
2.3. Decision trees

The TDIDT trees, to which those generated by ID3 and later by C4.5 belong, are built with the method of Hunt. The ID3 and C4.5 algorithms use the divide and rule strategy to build the initial decision tree from the training data [16]. To build a decision tree from a set T of training data, this method divides the data at each step according to the values of the best attribute. Any test that divides T in a non-trivial manner, that is, so that at least two of the subsets {T_i} are not empty, is acceptable.

Let the classes be {C_1, C_2, ..., C_k}. When T contains cases belonging to several classes, the idea is to refine T into subsets of cases that tend, or seem to tend, toward collections of cases belonging to a single class. A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O_1, O_2, ..., O_n}. T is partitioned into the subsets T_1, T_2, ..., T_n, where T_i contains all the cases of T that have the outcome O_i for the chosen test. The decision tree for T consists of a decision node identifying the test, with a branch for each possible outcome. The tree construction mechanism is applied recursively to each subset of training data, so that the i-th branch leads to the decision tree built from the subset T_i of training data.

Still, the ultimate objective behind the process of constructing the decision tree is not just to find any decision tree, but to find a decision tree that reveals a certain structure of the domain, that is to say, a tree with predictive power. That is the reason why each leaf must cover a large number of cases, and why each partition must have the smallest possible number of classes. In an ideal case, we would like to choose at each step the test that generates the smallest decision tree. Basically, what we are looking for is a small decision tree consistent with the training data. We could explore and analyze all the possible decision trees and choose the simplest one. However, the search and hypothesis space contains an exponential number of trees that would have to be explored.
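The recursive construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ID3 or C4.5 code: attributes are discrete, the tree is a nested dictionary, and the test selection is passed in as a function. The helper `first_nontrivial` simply takes the first attribute that yields at least two non-empty subsets; ID3 and C4.5 would rank the attributes by information gain or gain ratio instead. All names and the sample cases are hypothetical.

```python
from collections import Counter


def build_tree(cases, attributes, choose_test):
    """Hunt-style recursive partitioning of a non-empty set T.

    cases       -- list of (attribute_dict, class_label) pairs (the set T)
    attributes  -- names of the discrete attributes still usable as tests
    choose_test -- function(cases, attributes) -> attribute name or None
    """
    classes = Counter(label for _, label in cases)
    majority = classes.most_common(1)[0][0]
    # Leaf: every case in T belongs to one class, or no test is left.
    if len(classes) == 1 or not attributes:
        return {"leaf": majority}
    attr = choose_test(cases, attributes)
    if attr is None:
        return {"leaf": majority}
    # Partition T into T_1..T_n, one subset per outcome O_i of the test.
    subsets = {}
    for values, label in cases:
        subsets.setdefault(values[attr], []).append((values, label))
    remaining = [a for a in attributes if a != attr]
    return {"test": attr,
            "branches": {outcome: build_tree(subset, remaining, choose_test)
                         for outcome, subset in subsets.items()}}


def first_nontrivial(cases, attributes):
    """Return the first attribute whose test gives two or more non-empty subsets."""
    for a in attributes:
        if len({values[a] for values, _ in cases}) > 1:
            return a
    return None


def classify(tree, values):
    """Follow the branches matching `values` until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["branches"][values[tree["test"]]]
    return tree["leaf"]


# Hypothetical preclassified cases with two discrete attributes.
cases = [
    ({"zone": "inner", "ecc": "low"}, "Flora"),
    ({"zone": "inner", "ecc": "high"}, "Flora"),
    ({"zone": "outer", "ecc": "low"}, "Themis"),
    ({"zone": "outer", "ecc": "high"}, "Eos"),
]
tree = build_tree(cases, ["zone", "ecc"], first_nontrivial)
```

Passing the selection rule as a parameter keeps the recursion independent of the criterion used to rank the tests, which is the only point where the members of the TDIDT family differ in this sketch.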
The problem of finding the smallest decision tree consistent with the training data is NP-complete. To calculate which is the best attribute to divide the data at each step, both the information gain and the gain ratio were used. Moreover, the trees generated with the C4.5 algorithm were pruned according to the method of that algorithm; this post-pruning was done in order to avoid overfitting the data.

2.4. Transforming decision trees to decision rules

Decision trees that are too big or too bushy are somewhat difficult to read and understand because each node must be interpreted in the context defined by the previous branches. In any decision tree, the conditions that must be satisfied when classifying a case can be found by following a trail from the root to the leaf to which that case belongs. If that trail were transformed directly into a production rule, the antecedent of the rule would be the conjunction of all the tests in the nodes that must be traversed to reach the leaf. All the antecedents of the rules built this way are mutually exclusive and exhaustive.

To transform a tree into decision rules, the C4.5 algorithm traverses the decision tree in preorder (from the root to the leaves, from left to right) and constructs a rule for each path from the root to a leaf. The rule's antecedent is the conjunction of the value tests belonging to each of the visited nodes, and the class is the one corresponding to the leaf reached.

2.5. Evaluation of the TDIDT family

We used a cross-validation approach to evaluate the decision trees and the production rules obtained. Each dataset was divided into two sets in proportions of two thirds and one third. We used two thirds of the original data as a training set and one third to evaluate the results. We expressed the results of these tests in a confusion matrix, where each class had two values associated with it: the number of examples classified correctly and the number of examples classified as belonging to another class.

3. Requirements engineering

Hirayama

Examining the distribution of the asteroids with respect to their orbital elements, in particular their mean motion, inclination and eccentricity, condensations are observed in different places that seem to occur at random, but in some cases, taking only probabilistic quantities into account, this randomness is not so evident [1]. The asteroids are also grouped by having nearby inclinations, or the planes of their orbits have practically the same pole (that of the orbit of Jupiter). Other groupings do not have the same center, but if the graph is drawn taking the eccentricity and the longitude of the perihelion instead of the inclination and the longitude of the node, the distribution has the shape of a circumference.

Continuing the development of this theory, there is no doubt that there are physical relationships that connect the asteroids. For this reason we can venture that associated asteroid families exist. The theory stands verified, and with it the formation of families such as KORONIS (fhn 158), EOS (fhn 221), THEMIS (fhn 24), FLORA (fhn 244), MARIA (fhn 170) and PHOCAEA (fhn 25) (where fhn is the family head number). The distribution of orbital elements in the asteroid belt is not random, which shows the existence of families: groups of asteroids whose semi-major axis, eccentricity and inclination (or its sine) approximate a cluster around certain special values, following Arnold (about 1969 there were fewer than 1735
objects) [1]. The agglomeration into families (clustering) has been verified by correcting the periodic perturbations produced by the secular variations caused by the major planets, such as Jupiter, and taking the proper elements. Other groupings have been identified through proper resonance characteristics or streams of driven asteroids (JET STREAMS) across the FLORA family, and through objects that cross MARS on orbits of higher-order eccentricity.

Celestial bodies are classified on the basis of physical attributes, of the phenotypic characteristics (characters or attributes) of the asteroids and, finally, of their genotypic or common origin. The condition of close vicinity should be taken into account, and the high-density families are the most stable and least random. The families of Hirayama are confirmed; the small families are of low density, the probability of belonging to them is high, and therefore their coupling by the pair-group method is possible.

Around 1982, with Carusi and Valsecchi, there was a record of 2125 minor planets of asteroid type, whose groupings produce discrepancies in the results of computational classification methods based on physical and dynamical parameters [1]. This discrepancy among the statistical methods is disconcerting, since the relationship among the members of a family with respect to the dynamical parameters should agree with any physical study carried out on them. It can be observed that the growth in observations does not resolve the discrepancies. Among the methods of family identification, the discrepancies emerge from their probabilistic criteria, and with the future discovery of new asteroids there seems to be a contradiction between them. In spite of all this, if there is congruity, the suspected families appear in reality (scientific method of contrast), but if the methods are arbitrary they are always debatable, in addition to the methodological doubt [the authors]. Williams had already discussed the problem of Arnold in terms of a criterion of uniform Poissonian density distribution and the proper elements.
In the 1980s the usual practice was analysis by similarity and a generalized distance, but with the use of personal judgement or manual handling rather than automatic classification. This led to considering the variance ($\sigma_j$) of the domains and families in the process of identifying elements within the family or the subsequent ones. The accepted classes have been split into two types: 1) the class has been identified in two intervals, without noticeable differences; and 2) the class was found mixed, coupled with other less important classes in overlapping intervals, so that masked families or less reliable contours may exist; these aspects should emerge from the statistical method itself.

These projects of the Jet Propulsion Laboratory, California Institute of Technology, yielded orbits that cross those of the major planets and that are split into families, as a characteristic of the method. One characteristic is that the strong resonance does not appear in the asteroids and the weak one is taken as noise. The distances are taken from a straight SUN-PLANET line (Mars MXR, Jupiter JXR, Saturn SXR, etc.), and the proper values are more exact within the belt than outside it (something which endorses the theory of the authors).

For Knezevic and Milani, the proper elements from a second-order analytical theory, for asteroids identified in the principal belt (main belt), are much more exact than those of small eccentricity and inclination in the region of the Themis family. This is because the short-period perturbations are eliminated and the principal second-order dependent effects are taken into account, according to the results of an algorithm consistent with the modern dynamic theories of Kolmogorov-Arnold-Moser. About 3495 asteroids from the edition of the Leningrad Ephemerides of the Minor Planets are covered. Hildas, Trojans and near-Earth objects (q < 1.1 AU) were discarded. All this development appears less clear and arbitrary: there is no formal basis for the relationship between the number of iterations needed for convergence (quality code QC) and the number of asteroids.
The criterion of Zappala, Cellino, Farinella and Knezevic (1992 and subsequent) is important, since it produced an improved classification of asteroids into dynamic families by analyzing a database of numbered asteroids whose proper elements had been computed with a new second-order, fourth-degree secular perturbation theory and whose long-term stability had been verified. The multivariate criterion uses the technique of hierarchic clustering data analysis. It was applied to build, for each zone of the asteroid belt, a "dendrogram" graph in the space of proper elements, with a distance function related to the incremental velocity needed for the orbital change after ejection from the fragmented parent body.

The significance parameters associated with each family, measured against the results for random concentrations (so as to transform the anisotropic and inhomogeneous zones into homogeneous and isotropic inter-gap zones of the asteroid belt, modifying mechanical attributes such as the semi-major axis and the inclination), and the robustness (stability) parameters were obtained by repeating the classification procedure after varying the velocity elements by small quantities, in order to recompute the real zones from the calculations with artificially changed coefficients of the distance function.

The most important and robust families are, as usual, Themis, Eos and Koronis, which jointly include 14% of the known main-belt population; but 12 more reliable and robust families were found throughout the belt, the majority departing partially from previous classifications. This is the case of FLORA in the region of the inner belt, which makes a reliable identification of families very difficult, mainly when they have a high density and the
accuracy of the proper inclinations and eccentricities is poor, mainly on account of the proximity of a strong secular resonance. In this way 21 families are constituted with a genuinely important and totally automated method.

Spectral analysis classification criterion

We have decided to carry out, with our spectral analysis criterion, classifications extended to the database of proper elements of asteroids in families [1]. We recognize that the works of Zappala are very important (automatic classification and hierarchic method) and a turning point in the early 1990s, but our approach is different because we work in computational taxonomy, in a taxonomic hyperspace, and not with a criterion based on composition and on physical and cosmochemical precedents. Zappala uses a confusing methodology, with only one velocity variable, that transforms a homogeneous space into an inhomogeneous one and conversely, in a way that is not clearly univocal.

We thus incorporate, in automatic form, an updated and larger set of osculating elements derived from the secular perturbation theory, whose accuracy (specifically, stability over time) has been extensively verified by long-term numerical integration. The technique of data analysis on non-random groups in the space of proper elements, and the quantitative statistical significance of these groups, are not used as in the criterion of Zappala; the statistics for the important families are robust with respect to small random variations of the proper elements, all based on an analysis in Computational Taxonomy.

We do not consider transforming isotropic and homogeneous sets, changing the values of the eccentricity and the semiaxis to recompute the inter-gap zones of the asteroid belt into average velocities, or eliminating groups of 5 or fewer objects, all of which we consider to lie outside a computational criterion. Thus, a new approach to Computational Taxonomy is presented, one that has already been employed with reference to Data Mining.

Numerical Taxonomy
We infer an analogy with the taxonomic representation [1] in a dynamic relational database. We explain the theoretical development of a database structured by domains and how it can be represented in a Dynamic Database. We then apply our model to the structural aspects of the taxonomy, applying scaling methods for domains [2][4].

We define the numerical methods used for establishing and defining clusters by their taxonomic distances. We shall let $C_{jk}$ stand for a general dissimilarity coefficient, of which the taxonomic distance $d_{jk}$ is a special example. Euclidean distances will be used in the explanation of clustering techniques. In discussing clustering procedures we make a useful distinction between three types of measure.

We use space-conserving or space-distorting clustering strategies, under which the space in the immediate vicinity of a cluster appears to have been contracted or dilated; the criterion of admission for a candidate joining an extant cluster is constant in all pair-group methods. Thus we can represent the data matrix and compute the resemblance of normalized domains. The steps of clustering are the recomputation of the coefficient of similarity for future admission, followed by the admission criterion for new members of an established cluster [1].

Dispersion

Once a typical value of the variable of the states of the characters is known, it is necessary to have a parameter that gives an idea of how scattered, or concentrated, its values are with respect to the mean value [19]. The variance is considered as a moment of second order and represents the moment of inertia of the distribution of objects ("mass") with respect to its center of gravity, the centroid. The normalized variable

$$X'_j = \frac{X_j - \bar{X}_j}{\sigma_j}$$

[2] represents the deviation of $X_j$ with respect to its mean in units of $\sigma_j$.
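As a rough illustration of the steps named so far (normalizing the characters, computing Euclidean taxonomic distances and locating the nearest pair of OTUs), the following sketch uses only the Python standard library. It assumes a basic data matrix with one row per OTU and one column per character, the population standard deviation as $\sigma_j$, and no constant column; the function names and the sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize(matrix):
    """Standardize each character (column): subtract its mean and divide
    by its standard deviation. Assumes no column is constant."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sigmas)]
            for row in matrix]


def distance_matrix(rows):
    """Pairwise Euclidean (taxonomic) distances d_jk between OTUs."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def closest_pair(dist):
    """Indices (j, k), j < k, of the most similar pair of OTUs."""
    n = len(dist)
    return min(((j, k) for j in range(n) for k in range(j + 1, n)),
               key=lambda jk: dist[jk[0]][jk[1]])


# Hypothetical basic data matrix: 4 OTUs described by 3 characters.
data = [
    [2.20, 0.15, 0.08],
    [2.25, 0.14, 0.09],
    [3.13, 0.15, 0.02],
    [3.02, 0.07, 0.17],
]
normalized = normalize(data)
distances = distance_matrix(normalized)
pair = closest_pair(distances)  # first candidates for a pair-group fusion
```

Normalizing before measuring distances keeps a character with a large numerical range from dominating the result. The closest pair is where a nearest neighbor pair-group procedure would begin.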
The normalization of the states of the characters makes the average of every character equal to zero and its variance equal to one. If we take the variance $\sigma_d^2$ as the value of the dispersion, we express the principle of least squares. Let $g(X_j)$ be a non-negative function of the variable $X_j$; then for all $K > 0$ the probability satisfies

$$P\big(g(X_j) \ge K\big) \le \frac{E[g(X_j)]}{K}$$

If $g(X_j) = (X_j - \bar{X}_j)^2$ and $K = k^2 \sigma_j^2$, we obtain for all $k > 0$ the Bienaymé-Tchebycheff inequality:

$$P\big(|X_j - \bar{X}_j| \ge k\,\sigma_j\big) \le \frac{1}{k^2}$$

This inequality shows that the quantity of mass (OTUs) of the distribution located outside the interval $\bar{X}_j - k\,\sigma_j < X_j < \bar{X}_j + k\,\sigma_j$ has a maximal value equal to $1/k^2$, which gives an idea of the use of $\sigma_j$ as a measure of dispersion or concentration.

Clusters and Spectra

In discussing Sequential, Agglomerative, Hierarchic and Non-overlapping (SAHN) [4] clustering procedures we
make a useful distinction between the three types of measure. We shall be concerned with clusters J, K and L containing $t_j$, $t_k$ and $t_l$ OTUs, respectively, where $t_j$, $t_k$ and $t_l$ are all $\ge 1$. OTUs j, k and l are contained in clusters J, K and L, respectively. Given two clusters J and K that are to be joined, the problem is to evaluate the dissimilarity between the resulting joint cluster and additional candidates L for further fusion. The fused cluster is denoted (J,K), with $t_{j,k} = t_j + t_k$ OTUs.

The cluster center or centroid represents an average object, which is simply a mathematical construct that permits the characterization of the density, the variance, the taxon radius and the range as INVARIANT quantities. The states of the taxonomic characters in a class, defined ordinarily with reference to the set of their properties, allow one to calculate the distances between the members of the class. The distances can be established by the similarity relationship among individuals (obtaining a computed matrix of similarity).

Considering characteristic spectra [1], in addition to the states of the characters or attributes of the OTUs, we introduce here the new SPECTRAL concepts of i) OBJECT and ii) FAMILY SPECTRA. Within the taxonomic space this method of clustering delimits taxonomic groups in such a manner that they can be visualized as characteristic spectra of an OTU and characteristic spectra of the families.

We define an individual spectral metric for the set of distances between an OTU and the other OTUs of the set. Each one is given by the states of the characters and is therefore constant for each OTU if the taxonomic conditions do not change (in analogy with phasors), so that each OTU has an individual taxonomic spectrum (ITS). The spectrum of taxonomic similarity is the set of distances between the OTUs of the set that determine the constant characteristics of a cluster or family, for a given type of taxonomic conditions. Invariants are found that characterize each cluster. Among them we mention the variance, the radius, the density and the centroid.
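A small sketch may help fix these ideas. It rests on assumptions that the text does not spell out: the individual spectrum is taken as the sorted list of Euclidean distances from one OTU to the others, the variance as the mean squared distance of the members to the centroid, and the radius as the distance from the centroid to the farthest member. Density is left out because no formula for it is given here. The function names and the sample points are hypothetical.

```python
import math


def individual_spectrum(otus, i):
    """Sorted distances from OTU i to every other OTU of the set."""
    return sorted(math.dist(otus[i], other)
                  for k, other in enumerate(otus) if k != i)


def family_invariants(members):
    """Centroid, variance and radius of a cluster of OTUs."""
    n = len(members)
    centroid = [sum(col) / n for col in zip(*members)]
    distances = [math.dist(m, centroid) for m in members]
    # Second moment of the members about the centroid.
    variance = sum(d * d for d in distances) / n
    # Assumption: the radius is the distance to the farthest member.
    radius = max(distances)
    return centroid, variance, radius


# Hypothetical family of four OTUs in a two-character space.
family = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
spectrum = individual_spectrum(family, 0)
centroid, variance, radius = family_invariants(family)
```

As long as the members and their character states do not change, these quantities stay fixed, which is the sense in which the text calls them invariants.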
These invariants are associated with the spectra of taxonomic similarity that identify each family.

Tests of Intelligent Data Mining

A software system was constructed to evaluate the C4.5 algorithm. This system takes the training data as input and allows the user to choose whether to construct a decision tree according to C4.5. If the user chooses C4.5, the decision tree is generated, then it is pruned and the decision rules are built. The decision tree and the ruleset generated by C4.5 are evaluated separately from each other. We used the system to test the algorithms in different domains, mainly Elita: a base of asteroids.

Compute of the Information Gain

In the cases in which the set T contains examples belonging to different classes, a test is carried out on the different attributes and a partition is made according to the "better" attribute. To find the "better" attribute, information theory is used, which holds that information is maximized when entropy is minimized. The entropy determines the randomness or disorder of a set.

We suppose that we have negative and positive examples. In this context the entropy of the subset S, H(S), can be calculated as:

$$H(S) = -p^{+} \log p^{+} - p^{-} \log p^{-} \qquad (3.4.1)$$

where $p^{+}$ is the probability that an example taken at random from S is positive. This probability may be calculated as

$$p^{+} = \frac{n^{+}}{n^{+} + n^{-}} \qquad (3.4.2)$$

$n^{+}$ being the number of positive examples of S, and $n^{-}$ the number of negative examples. The probability $p^{-}$ is calculated in analogous form to $p^{+}$, replacing the number of positive examples by the number of negative examples, and conversely.

Generalizing expression (3.4.1) to any type of examples, we obtain the general formulation of the entropy:

$$H(S) = -\sum_{i=1}^{n} p_i \log p_i \qquad (3.4.3)$$

In all the calculations related to the entropy, we define $0 \log 0$ equal to 0.
If the attribute at divides the set S into the subsets $S_i$, $i = 1, 2, \ldots, n$, then the total entropy of the system of subsets will be:

$$H(S, at) = \sum_{i=1}^{n} P(S_i)\, H(S_i) \qquad (3.4.4)$$

where $H(S_i)$ is the entropy of the subset $S_i$ and $P(S_i)$ is the probability that an example belongs to $S_i$. It can be calculated, using the relative sizes of the subsets, as:

$$P(S_i) = \frac{|S_i|}{|S|} \qquad (3.4.5)$$

The gain of information may be calculated as the decrease in entropy. Thus:

$$I(S, at) = H(S) - H(S, at) \qquad (3.4.6)$$
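Expressions (3.4.3) to (3.4.6) translate almost directly into code. The sketch below is a minimal illustration, not the evaluated system: it assumes logarithms in base 2 (the text does not fix the base) and discrete attributes stored in dictionaries. The function names and the sample cases are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_i p_i log2 p_i over the classes present in S.
    Absent classes add nothing, matching the convention 0 log 0 = 0."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def partition_entropy(cases, attribute):
    """H(S, at): entropy of the subsets S_i induced by `attribute`,
    each weighted by P(S_i) = |S_i| / |S|."""
    subsets = {}
    for values, label in cases:
        subsets.setdefault(values[attribute], []).append(label)
    return sum(len(s) / len(cases) * entropy(s) for s in subsets.values())


def information_gain(cases, attribute):
    """I(S, at) = H(S) - H(S, at)."""
    labels = [label for _, label in cases]
    return entropy(labels) - partition_entropy(cases, attribute)


# Hypothetical preclassified cases: (attribute values, class).
cases = [
    ({"zone": "inner", "size": "small"}, "Flora"),
    ({"zone": "inner", "size": "large"}, "Flora"),
    ({"zone": "outer", "size": "small"}, "Themis"),
    ({"zone": "outer", "size": "large"}, "Themis"),
]
gain_zone = information_gain(cases, "zone")
gain_size = information_gain(cases, "size")
```

In this made-up set the attribute `zone` separates the two classes completely, so its gain equals the full entropy of S (one bit), while `size` leaves every subset as mixed as S and its gain is zero.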
In expression (3.4.6), $H(S)$ is the value of the entropy a priori, before carrying out the subdivision, and $H(S, at)$ is the value of the entropy of the system of subsets generated by the partition according to at. The use of the entropy to evaluate the best attribute is not the only method that exists or is used in Automatic Learning. However, it is the one used by Quinlan in developing ID3 and its successor C4.5.

Numerical Data

The decision trees can be generated with discrete attributes as well as with continuous attributes. When working with discrete attributes, the partition of the set according to the value of an attribute is simple. With continuous attributes the partition is not as direct; to solve this problem, the binary method can be used. This method consists of forming two ranges of values according to the value of an attribute, which can then be taken as symbolic.

4. Results and Conclusions

Results of the C4.5

C4.5 with post-pruning results in smaller and less bushy trees. If we analyze the trees obtained in the domain, we see that the percentages of error obtained with C4.5 are between 3% and 3.7%, since C4.5 generates smaller trees and smaller rulesets. This derives from the fact that each leaf in a generated tree covers a distribution of classes.

Error percentage {ELITA}: [1] C4.5 Gain, Trees; [2] C4.5 Gain, Rules; [3] C4.5 Proportion of Gain, Trees; [4] C4.5 Proportion of Gain, Rules: < 3%

From the analysis of this value we could conclude that no method generates a clearly superior model for the domain. On the contrary, we could state that the error percentage does not appear to depend on the method used, but on the analyzed domain.

Hypothesis space

The hypothesis space for this algorithm is complete with respect to the available attributes. Because any value test can be represented with a decision tree, this algorithm avoids one of the principal risks of inductive methods that work by reducing the hypothesis space. An important feature of the C4.5 algorithm is that it uses all the available data at each step to choose the best attribute; this decision is made with a statistical method.
Using all the available data at each step favors this algorithm over others, because it analyzes how the input dataset is represented as decision trees in a consistent form. Once an attribute has been selected as a decision node, the algorithm does not go back over its choices. This is the reason why this algorithm can converge to a local maximum [20]. The C4.5 algorithm adds a certain degree of reconsideration of its choices in the post-pruning of the decision trees.

Nevertheless, we can state that the results show that the proportion of error depends on the data domain. For future study, we suggest analyzing the input datasets with the numerical method of clustering and choosing for the domain the method that maintains a low percentage error in extended databases, as a measure of the robustness of the method.

5. Corollary

From what has been said, the work uses the Sequential, Agglomerative, Hierarchic and Nonoverlapping clustering procedures, the spectral analysis criterion and invariants to accomplish classifications of proper asteroid elements in extended databases, in order to structure families. The preclassified data is an important input to Intelligent Data Mining, and Computational Taxonomy in Databases will always have a low percentage error in extended databases, as a sign of the robustness of the method; combining the two gives a reliable result.

References

[1] Perichinsky, G., Orellana, R., Plastino, A.L., Jimenez Rey, E. and Grossi, M.D. "Spectra of Taxonomic Evidence in Databases." Proceedings of XVIII International Conference on Applied Informatics. (Paper ). Innsbruck. Austria.
[2] Crisci, J.V., Lopez Armengol, M.F. "Introduction to Theory and Practice of the Numerical Taxonomy", A.S.O. Regional Program of Science and Technology for Development. Washington D.C. Spanish.
[3] Gennari, J.H. A Survey of Clustering Methods (b). Technical Report. Department of Computer Science and Informatics. University of California, Irvine, CA.
[4] Sokal, R.R., Sneath, P.H.A. "Numerical Taxonomy". W.H. Freeman and Company.
[5] Zappala, V., Cellino, A., Farinella, P., Milani, A., The Astronomical Journal, 107,
[6] Abramson, N., Information Theory and Coding. McGraw Hill. Paraninfo. Madrid.
[7] Hamming, R.W. Coding and information theory. Englewood Cliffs, NJ: Prentice Hall.
[8] Freeman, J.A., Skapura, D.M. Neural Networks. Algorithms, applications and techniques of programming. Addison Wesley. Iberoamericana. Spanish.
[9] Michalski, R. S. A Theory and Methodology of Inductive Learning. In Michalski, R. S., Carbonell, J. G., Mitchell, T. M. (1983) Machine Learning: An Artificial Intelligence Approach, Vol. I. Morgan Kaufmann, USA.
[10] Quinlan, J.R. Induction of Decision Trees. In Machine Learning, Ch. 1, p. Morgan Kaufmann.
[11] Quinlan, J.R. Generating Production Rules from Decision Trees. Proceedings of the Tenth International Joint Conference on Artificial Intelligence, p. San Mateo, CA., Morgan Kaufmann, USA.
[12] Quinlan, J.R. Decision trees and multivalued attributes. In J.E. Hayes, D. Michie, and J. Richards (eds.), Machine Intelligence, V. II, p. Oxford University Press, Oxford, UK.
[13] Quinlan, J.R. Learning Efficient Classification Procedures and Their Application to Chess Games. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.) Machine Learning, The Artificial Intelligence Approach. Morgan Kaufmann, V. II, Ch. 15, p. , USA.
[13b] Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California, USA.
[14] Quinlan, J.R. Improved Use of Continuous Attributes in C4.5. Basser Department of Computer Science, University of Sydney, Australia.
[15] Quinlan, J.R. Learning First-Order Definitions of Functions. Basser Department of Computer Science, University of Sydney, Australia.
[16] Hunt, E.B., Marin, J., Stone, P.J (1995AI). Experiments in Induction. New York: Academic Press, USA.
[17] Hirayama, K. Present State of the Families of Asteroids. Proceedings of Physics-Mathematics Society. Japan II:9. pp.
[18] Cramer, Harald. Mathematical Methods of Statistics. Aguilar Edition. Madrid. Spanish.
[19] Mitchell, T. Machine Learning. MCB/McGraw-Hill, Carnegie Mellon University, USA.
[20] Mitchell, T. Decision Trees. Cornell University, USA.
[21] Feynman, R.P., Leighton, R.B. & Sands, M. Lectures on Physics, Mainly Mechanics, Radiation and Heat. pp. ff, 286 ff, 291 ff,
[22] Hecht, E. and Zajac, A., Optics. Inter-American Educational Fund. pp. Spanish 1977.