Big Data Analysis and Reporting with Decision Tree Induction

PETRA PERNER
Institute of Computer Vision and Applied Computer Sciences, IBaI
Postbox 30 11 14, 04251 Leipzig
GERMANY
pperner@ibai-institut.de, www.ibai-institut.de

Abstract: Data mining methods are widely used across many disciplines to identify patterns, rules or associations among huge volumes of data. While in the past mostly black-box methods such as neural nets and support vector machines have been heavily used in technical domains, methods that have explanation capability are preferred in medical domains. Nowadays, data mining methods with explanation capability are also used for technical domains after more work on the advantages and disadvantages of the methods has been done. Decision tree induction such as C4.5 is the most preferred method since it works well on average regardless of the data set being used. This method can easily learn a decision tree without heavy user interaction, while in neural nets a lot of time is spent on training the net. Cross-validation methods can be applied to decision tree induction methods; these methods ensure that the calculated error rate comes close to the true error rate. The error rate and the particular goodness measures described in this paper are quantitative measures that provide help in understanding the quality of the model. The data collection problem with its noise problem has to be considered. Specialized accuracy measures and proper visualization methods help to understand this problem. Since decision tree induction is a supervised method, the associated data labels constitute another problem. Re-labeling should be considered after the model has been learnt. This paper also discusses how to fit the learnt model to the expert's knowledge. The problem of comparing two decision trees with respect to their explanation power is discussed. Finally, we summarize our methodology on the interpretation of decision trees.

Key-Words: Big Data Analysis, Reporting and Visualization, Decision Tree Induction, Comparison of Decision Trees, Classification, Similarity Measure

1 Introduction
Data mining methods are widely used across many disciplines to identify patterns, rules or associations among huge volumes of data. Different methods can be applied to accomplish this. While in the past mostly black-box methods such as neural nets and support vector machines (SVM) have been heavily used in technical domains, methods that have explanation capability have been particularly used in medical domains, since a physician likes to understand the outcome of a classifier and map it to his domain knowledge; otherwise, the level of acceptance of an automatic system is low. Nowadays, data mining methods with explanation capability are also used for technical domains after more work on the advantages and disadvantages of the methods has been done. The most preferred method among the methods with explanation capability is the decision tree induction method [1]. This method can easily learn a decision tree without heavy user interaction, while in neural nets a lot of time is spent on training the net. Cross-validation methods can be applied to decision tree induction methods; these methods ensure that the calculated error rate comes close to the true error rate. A large number of decision tree methods exist, but the method that works well on average on all kinds of data sets is still the C4.5 decision tree method and some of its variants.
Although the user can easily apply this method to his data set thanks to all the different tools that are available and set up in such a way that a non-computer-science specialist can use them without any problem, the user is still faced with the problem of how to interpret the result of a decision tree induction method. This problem especially arises when two different data sets for one problem are available or when the data set is collected in temporal sequence. Then the data set grows over time and the results might change.
The aim of this paper is to give an overview of the problems that arise when interpreting decision trees. This paper is aimed at providing the user with a methodology on how to use the resulting model of decision tree induction methods. In Section 2, we explain the data collection problem. In Section 3, we review how decision tree induction based on the entropy principle works. In Section 4, we present quantitative and qualitative
measures that allow a user to judge the performance of a decision tree. Finally, in Section 5 we discuss the results achieved so far with our methodology.

2 The Problem
Many factors influence the result of the decision tree induction process. The data collection problem is a tricky pitfall. The data might become very noisy due to some subjective or system-dependent problems during the data collection process.
Newcomers in data mining go into data mining step by step. First, they will acquire a small data base that allows them to test what can be achieved by data mining methods. Then, they will enlarge the data base, hoping that a larger data set will result in better data mining results. But often this is not the case.
Others may have big data collections that have been collected in their daily practice, such as in marketing and finance. At a certain point, they want to analyze these data with data mining methods. If they do this based on all data, they might be faced with a lot of noise in the data, since customer behavior might have changed over time due to some external factors such as economic factors, climate condition changes in a certain area, and so on. Web data can change severely over time. People from different geographic areas and different nations can access a website and leave a distinct track depending on the geographic area they are from and the nation they belong to.
If the user has to label the data, then it might be apparent that the subjective decision about the class the data set belongs to might result in some noise. Depending on the expert's form on the day or on his level of experience, he will label the data more or less well. Oracle-based classification methods [12][13] or similarity-based methods [14][15] might help the user to overcome such subjective factors.
If the data have been collected over an extended period of time, there might be some data drift. In case of a web-based shop, the customers frequenting the shop might have changed because the products now attract other groups of people. In a medical application the data might change because the medical treatment protocol has been changed. This has to be taken into consideration when using the data.
It is also possible that the data are collected in time intervals. The data in time period 1 might have other characteristics than the data collected in time period 2. In agriculture this might be true because the weather conditions have changed. If this is the case, the data cannot make up a single data set. The data must be kept separate, with a tag indicating that they were collected under different weather conditions.
In this paper we describe the behavior of decision tree induction under changing conditions (see Figure 1) in order to give the user a methodology for using decision tree induction methods. The user should be able to detect such influences based on the results of the decision tree induction process.

Fig. 1 The Data Collection Problem

3 Decision Tree Induction Based on the Gain Ratio (Entropy-Based Measure)
The application of decision tree induction methods requires some basic knowledge of how decision tree induction methods work. This section reviews the basic properties of decision tree induction [17]. Decision trees recursively split the decision space into subspaces based on the decision rules in the nodes until the final stop criterion is reached or the remaining sample set does not suggest further splitting (see Figure 2).
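As a concrete anchor for what follows, here is a minimal sketch of this overall induction procedure using scikit-learn (an assumed tool choice, not prescribed by the paper; its CART algorithm is a close relative of C4.5) on the IRIS data set used later in this section. The recursive splitting, the stop criterion and the pruning are all handled by the library:

```python
# A minimal sketch of the overall induction procedure, assuming scikit-learn
# as the tool (its CART algorithm is a close relative of C4.5): the recursive
# splitting, the stop criterion and the pruning are handled by the library.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(criterion="entropy",  # entropy-based splitting
                             ccp_alpha=0.01,       # cost-complexity pruning
                             random_state=0)
clf.fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```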
For this recursive splitting process, the tree building process must always pick from all attributes the one that shows the best result on the attribute selection criterion for the remaining sample set. Whereas for categorical attributes the partition of the attribute values is given a priori, the partition (also called attribute discretization) of the attribute values for numerical attributes must be determined.

Fig. 2 Overall Tree Induction Procedure

Attribute discretization can be done before or during the tree building process [2]. We will consider the case where the attribute discretization is done during the tree building process.
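The following is a minimal, self-contained sketch of such an in-tree binary discretization, as an illustration of the idea rather than Quinlan's exact procedure: scan the midpoints between sorted attribute values and keep the cut point that gives the lowest class entropy of the resulting partition. The helper names and the toy sample are illustrative assumptions:

```python
# A sketch of in-tree binary discretization of one numerical attribute:
# scan candidate cut points and keep the one with the lowest weighted class
# entropy (equivalently, the highest information gain). Illustrative only.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [l for _, l in pairs]
    best_cut, best_h = None, float("inf")
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between identical attribute values
        cut = (xs[i] + xs[i - 1]) / 2.0
        # Weighted class entropy of the two subsets induced by the cut
        h = (i / len(xs)) * entropy(ys[:i]) + \
            ((len(xs) - i) / len(xs)) * entropy(ys[i:])
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut, best_h

# Toy sample in the spirit of the IRIS petal-length attribute
values = [1.4, 1.3, 1.5, 1.7, 4.7, 4.5, 4.9, 5.0]
labels = ["setosa"] * 4 + ["versicolor"] * 4
print(best_cut_point(values, labels))  # -> (3.1, 0.0) for this toy sample
```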
The discretization must be carried out before the attribute selection process, since the selected partition of the attribute values of a numerical attribute highly influences the prediction power of that attribute.
After the attribute selection criterion has been calculated for all attributes based on the remaining sample set, the resulting values are evaluated and the attribute with the best value for the attribute selection criterion is selected for further splitting of the sample set. Then the tree is extended by two further nodes. To each node the subset created by splitting on the attribute values is assigned, and the tree building process is repeated.
Decision tree induction is a supervised method. It requires that the data are labeled by their class. The induced decision tree tends to overfit the data. In Figure 3 we have demonstrated this situation based on a tree induced from the well-known IRIS data set. Overfitting is typically due to noise in the attribute values and class information present in the training set. The tree building process will produce subtrees that fit this noise. This causes an increased error rate when classifying unseen cases. Pruning the tree, which means replacing subtrees with leaves, helps to avoid this problem (see Figure 4).

Fig. 3 Decision Tree, original
Fig. 4 Decision Tree, pruned

In case of the IRIS data set the pruned tree provides better accuracy than the unpruned tree. However, pruning is often based on a statistical model assumption that might not always fit the particular data. Therefore, there might be a situation where the unpruned tree gives better results than the pruned tree, even when checked with new data.

3.1 Attribute Splitting Criteria
Following the theory of the Shannon channel, we consider the data set as the source and measure the impurity of the received data when transmitted via the channel. The transmission over the channel results in a partitioning of the data set into subsets based on splits on the attribute values J of the attribute A. The aim should be to transmit the signal with the least loss of information. This can be described by the following criterion:

IF I(A) = I(C) - I(C/J) = Max THEN select attribute A

where I(A) is the information gained by attribute A, I(C) is the entropy of the receiver or the expected entropy needed to generate the messages C_1, C_2, ..., C_m, and I(C/J) is the entropy lost when branching on the attribute values J of attribute A.
For the calculation of this criterion we first consider the contingency table in Table 1, with m the number of classes, n the number of attribute values J, N the number of examples, L_i the number of examples with the attribute value J_i, R_j the number of examples belonging to class C_j, and x_ij the number of examples belonging to class C_j and having attribute value J_i.

Table 1 Contingency Table for an Attribute

Now, we can define the entropy of the class C by:

I(C) = -\sum_{j=1}^{m} \frac{R_j}{N} \, ld \, \frac{R_j}{N}    (1)

The entropy of the class given the feature values is:
I(C/J) = -\sum_{i=1}^{n} \frac{L_i}{N} \sum_{j=1}^{m} \frac{x_{ij}}{L_i} \, ld \, \frac{x_{ij}}{L_i} = \frac{1}{N} \left( \sum_{i=1}^{n} L_i \, ld \, L_i - \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} \, ld \, x_{ij} \right)    (2)

The best feature is the one that achieves the lowest value of (2) or, equivalently, the highest value of the "mutual information" I(C) - I(C/J). The main drawback of this measure is its sensitivity to the number of attribute values. In the extreme, a feature that takes N distinct values for the N examples achieves complete discrimination between the classes, giving I(C/J) = 0, even though the feature may consist of random noise and may be useless for predicting the classes of future examples. Therefore, Quinlan [3] introduced a normalization by the entropy of the attribute itself:

G(A) = I(A) / I(J)   with   I(J) = -\sum_{i=1}^{n} \frac{L_i}{N} \, ld \, \frac{L_i}{N}    (3)

Other normalizations have been proposed by Coppersmith et al. [4] and Lopez de Mantaras [5]. Comparative studies have been done by White and Liu [6].
The behavior of the entropy is very interesting [7]. Figure 5 shows the graph of the single term -p ld p. The graph is not symmetrical. It has its maximum at p = 1/e, that is, when 37% of the data have the same value. In that case this value will trump all other values.

Fig. 5 Diagram of -p ld p; the maximum is at p = 1/e ≈ 0.37; H = 0.5 is assumed for p = 0.25 and p = 0.5

In case of a binary split, we are faced with the situation that there are two sources with the signal probabilities p and 1-p. The entropy of this situation is shown in Figure 6. It has its maximum when both values are equally distributed. The maximum value of the splitting criterion will be reached if most of the samples fall on one side of the split. The decision tree induction algorithm will always favor splits that meet this situation. Figure 11 demonstrates this situation based on the IRIS data set. The visualization shows the user the location of the class-specific data depending on two attributes. This helps the user to understand what changed in the data.

Fig. 6 Behaviour of H(p, 1-p) under the condition that the two sources have the signal probabilities p and 1-p
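Taken together, Eqs. (1)-(3) can be computed directly from the contingency table of Table 1. The following is a minimal sketch; the helper `ld`, the function name and the toy table are illustrative assumptions, and it is assumed that every attribute value occurs at least once:

```python
# A sketch of the criteria in Eqs. (1)-(3), computed from a contingency
# table X with rows = attribute values J_i and columns = classes C_j
# (entries x_ij). "ld" is the logarithm to base 2.
import numpy as np

def ld(p):
    # 0 * ld 0 is taken as 0 by convention
    return np.where(p > 0, np.log2(np.where(p > 0, p, 1)), 0.0)

def gain_ratio(X):
    X = np.asarray(X, dtype=float)
    N = X.sum()
    L = X.sum(axis=1)                            # examples per attribute value
    R = X.sum(axis=0)                            # examples per class
    I_C = -np.sum(R / N * ld(R / N))             # Eq. (1)
    I_CJ = -np.sum(X / N * ld(X / L[:, None]))   # Eq. (2)
    I_J = -np.sum(L / N * ld(L / N))             # entropy of the attribute
    I_A = I_C - I_CJ                             # mutual information
    return I_A / I_J if I_J > 0 else 0.0         # Eq. (3)

# Toy contingency table: 2 attribute values x 2 classes
print(gain_ratio([[8, 2], [1, 9]]))  # approx. 0.40
```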
4 How to Interpret the Results of a Decision Tree

4.1 Quantitative Measures of the Quality of the Decision Tree Model
One of the most important measures of the quality of a decision tree is accuracy. This measure is judged based on the available data set. Usually, cross-validation is used for evaluating the model, since it is never clear whether the available data set is a good representation of the entire domain. Compared to test-and-train, cross-validation can provide a measure statistically close to the true error rate. Especially if one has small sample sets, the prediction of the error rate based on cross-validation is a must. Although this is a well-known fact by now, results based on test-and-train and small sample sets are still frequently presented. In case of neural nets there is hardly any work available that judges the error rate based on cross-validation. If a larger data set is available, cross-validation is also the better choice for the estimation of the error rate, since one can never be sure that the data set covers the properties of the whole domain. Faced with the problem of computational complexity, n-fold cross-validation is a good choice. It splits the whole data set into n blocks and runs the train-and-test cycle n times. The output of cross-validation is the mean accuracy. As is known from statistics, it is much better to estimate a measure based on single measures obtained from a data set split into blocks and to average over these measures than to estimate the measure based on a single shot on the whole data set. Moreover, the variance of the accuracy gives another hint in regard to how good the measure is: if the variance is high, there is much noise in the data; if the variance is low, the result is much more stable.
The quality of a neural net is often not judged based on cross-validation. Cross-validation requires setting up a new model in each loop of the cycle. The mean accuracy over the accuracy values of all single cycles is calculated, as well as the standard deviation of the accuracy. Neural nets are not automatically set up, but decision trees are. A neural network needs a lot of training, and people claim that such a neural net - once it is stable in its behavior - is the gold standard. However, its accuracy is judged based on the test-and-train approach, and it is not sure that this is the true accuracy. Bootstrapping for the evaluation of accuracy is another choice, but it is much more computationally expensive than cross-validation; therefore, many tools do not provide this procedure.
Accuracy and its standard deviation are overall measures. The standard deviation of the accuracy can be taken as a measure of how stable the model is. A high standard deviation might show that the data are very noisy and that the model might change when new data become available. More detailed measures can be calculated that give a deeper insight into the behavior of the model [8]. The most widely used evaluation criterion for a classifier is the error rate f_r = N_f / N, with N_f the number of falsely classified samples and N the whole number of samples. In addition, we use a contingency table in order to show the qualities of a classifier, see Table 2.

Table 2 Contingency Table

The table contains the assigned and the real class distribution as well as the marginal distributions; c_ij denotes the entries. The main diagonal contains the numbers of correctly classified samples. The last column shows the number of samples assigned to each class, and the last line shows the real class distribution. Based on this table, we can calculate parameters that assess the quality of the classifier. The correctness p is the number of correctly classified samples over the number of samples:

p = \frac{\sum_{i=1}^{m} c_{ii}}{\sum_{i=1}^{m} \sum_{j=1}^{m} c_{ij}}    (4)

For the investigation of the classification quality, we measure the classification quality p_ki according to a particular class i and the rate of correctly classified samples p_ti for one class i:

p_{ki} = \frac{c_{ii}}{\sum_{j=1}^{m} c_{ji}} , \quad p_{ti} = \frac{c_{ii}}{\sum_{j=1}^{m} c_{ij}}    (5)

Other criteria, shown in Table 3, are also important when judging the quality of a model.

Table 3 Criteria for the Comparison of Learned Classifiers
Generalization capability of the classifier: error rate based on the test data set
Representation of the classifier: error rate based on the design data set
Classification costs: number of features used for classification; number of nodes or neurons
Explanation capability: can a human understand the decision?
Learning performance: learning time; sensitivity to the class distribution in the sample set

One of these criteria is the cost of classification, expressed by the number of features and the number of decisions used during classification. Another criterion is the time needed for learning. We also consider the explanation capability of the classifier as a further quality criterion. It is also important to know whether the classification method can correctly learn the classification function (the mapping of the attributes to the classes) based on the training data set.
Therefore, we not only consider the error rate based on the test set, we also consider the error rate based on the training data set.
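As an illustration of these measures, the following sketch computes the cross-validated mean accuracy with its standard deviation and the per-class ratios in the spirit of Eqs. (4)-(5). It assumes scikit-learn as the tool (the paper does not prescribe one), and note that the paper's row/column convention for Table 2 may be transposed relative to scikit-learn's confusion matrix:

```python
# A sketch of the quantitative measures of Section 4.1 on the IRIS data set:
# k-fold cross-validated mean accuracy and standard deviation, plus the
# class-specific measures of Eq. (5) computed from the contingency table.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# Mean accuracy and its standard deviation over 10 folds
scores = cross_val_score(clf, iris.data, iris.target, cv=10)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Contingency table: here rows = real class, columns = assigned class
y_pred = cross_val_predict(clf, iris.data, iris.target, cv=10)
C = confusion_matrix(iris.target, y_pred)
diag = np.diag(C)
p = diag.sum() / C.sum()        # correctness, Eq. (4)
p_k = diag / C.sum(axis=0)      # c_ii over a column sum, cf. Eq. (5)
p_t = diag / C.sum(axis=1)      # c_ii over a row sum, cf. Eq. (5)
print("p =", round(p, 3), "p_ki =", p_k.round(3), "p_ti =", p_t.round(3))
```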
4.2 Explanation Capability of the Decision Tree Model
Suppose we have a data set X with n samples. The outcome of the data mining process is a decision tree that represents the model in a hierarchical, rule-based fashion. Each path from the top of the tree to a leaf can be described by a rule that combines the decisions of the individual nodes by a logical AND. The closer a decision is to the leaf, the more noise it contains, since the entire data set is successively split into two parts from top to bottom, and in the end only a few samples are contained in the resulting subsets.
Pruning is performed to keep the model from overfitting the data. Pruning provides a more compact tree and often a better model in terms of accuracy. The pruning algorithm is based on an assumption about the distribution of the data. If this assumption does not fit the data, the pruned model does not have better accuracy. Then it is better to stay with the unpruned tree.
When users feel confident about the data mining process, they are often keen on getting more data. Then they apply the updated data set, which combines the data set X and the new data set X' and contains n+t samples (n < n+t), to the decision tree induction. If the resulting model only changes in nodes close to the leaves of the decision tree, the user understands why this is so. There will be confusion when the whole structure of the decision tree has changed, especially when the attribute in the root node changes. The root node decision should be the most confident decision. The reason for a change can be that there were always two competing attributes with slightly different values for the attribute selection criterion. Now, based on the new data, the attribute ranked second in the former procedure is ranked first. When this happens, the whole structure of the tree will change, since a different attribute in the first node will result in a different first split of the entire data set.
It is important that this situation is visually presented to the user so that he can judge what happened. Often the user already has some domain knowledge and prefers a certain attribute to be ranked first. A way to enable such a preference is to allow the user to actively pick the attribute for a node. Visualization techniques should allow showing the user the location of the class-specific data depending on two attributes, as shown in Figure 11. This helps the user to understand what changed in the data. From a list of attributes the user can pick two attributes, and the respective graph will be presented.
Another way to judge this situation is to look at the variance of the accuracy. If the variance is high, the model is not stable yet. The data do not give enough confidence in regard to the decision.
The described situation can indicate that something is wrong with the data. It often helps to talk to the user and figure out how the new data set has been obtained. To give an example: a data base contains information about the mortality rate of patients who have been treated for breast cancer. Information about the patients, such as age, size, weight, measurements taken during the treatment, and finally the success or failure, is reported. In the time period T1, treatment with a certain cocktail of medicine, radioactive treatment and physiotherapy has taken place; this kind of treatment is called a protocol. In the time period T2, the physicians changed the protocol, since other medicine became available or other treatment procedures were reported in the medical literature as being more successful.
The physicians know about the change in protocol, but they did not inform you accordingly. Then the whole tree might change, and as a result the decision rules change and the physicians cannot confirm the resulting knowledge, since it does not fit the knowledge about the disease they have established in the meantime. The resulting tree has to be discussed with the physicians; the outcome may be that in the end the new protocol is simply not good.

4.3 Revision of the Data Labels
Noisy data might be caused by wrong labels applied by the expert to the data. A review of the data with an expert is necessary to ensure that the data labels are correct. Therefore, the data are classified by the learnt model. All data sets that are misclassified are reviewed by the domain expert. If the expert is of the opinion that a data set needs another label, then the data set is relabeled. The tree is learnt again based on the newly labeled data set.

4.4 Comparison of two Decision Trees
Two data sets of the same domain that might be taken at different times might result in two different decision trees. Then the question arises how similar these two decision trees are. If the models are not similar, then something significant has changed in the data set. The path from the top of a decision tree to a leaf is described by a rule like IF attribute A <= x AND attribute B <= y AND attribute C <= z THEN Class_1. The transformation of a decision tree into such a rule-based representation can easily be done. The location of an attribute is fixed by the structure of the decision tree.
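As an illustration, the following sketch performs this transformation for a tree fitted with scikit-learn (an assumed tool choice; the traversal follows its tree_ API) and prints one IF ... AND ... THEN rule per root-to-leaf path:

```python
# A sketch of the rule transformation described above: walk a fitted
# decision tree and print one IF ... AND ... THEN rule per path.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

def tree_to_rules(tree, feature_names, class_names, node=0, conds=None):
    conds = conds or []
    t = tree.tree_
    if t.children_left[node] == -1:  # leaf: emit the collected rule
        label = class_names[t.value[node][0].argmax()]
        print("IF " + " AND ".join(conds) + " THEN " + label)
        return
    name, thr = feature_names[t.feature[node]], t.threshold[node]
    tree_to_rules(tree, feature_names, class_names, t.children_left[node],
                  conds + [f"{name} <= {thr:.2f}"])
    tree_to_rules(tree, feature_names, class_names, t.children_right[node],
                  conds + [f"{name} > {thr:.2f}"])

tree_to_rules(clf, iris.feature_names, list(iris.target_names))
```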
Comparison of rule sets is known from rule induction methods in different domains [9]. Here the induced rules are usually compared to the human-built rules [10][11]. Often this is done manually and should give a measure of how good the constructed rule set is. These kinds of rules can also be automatically compared by substructure mining. The following questions can be asked: a) How many rules are identical? b) How many of them are identical compared to all rules? c) Which rules contain partial structures of the decision tree?
We propose a first similarity measure for the difference between the two models as follows:
1. Transform the two decision trees d_1 and d_2 into rule sets.
2. Order the rules of the two decision trees according to the number n of attributes in a rule.
3. Build substructures of all l rules by decomposing the rules into their substructures.
4. Compare each pair of rules i and j of the two decision trees d_1 and d_2 over their n_i and n_j substructures with s attributes.
5. Build the similarity measure SIM_ij according to formulas (6)-(8).

Fig. 7 Decision_Tree_1, Sim_{d1,d1} = 1
Fig. 8 Substructures of Decision_Tree_1 to Decision_Tree_2; Sim_{d1,d2} = 0.9166

The similarity measure is:

SIM_{ij} = \frac{1}{n} ( Sim_1 + Sim_2 + \ldots + Sim_k + \ldots + Sim_n )    (6)

with n = \max(n_i, n_j) and

Sim_k = \begin{cases} 1 & \text{if the substructures are identical} \\ 0 & \text{otherwise} \end{cases}    (7)

If the rule contains a numerical attribute A <= k_1 and A' <= k_2 = k_1 + x, then the similarity measure is

Sim_{num} = \begin{cases} 1 - x/t & \text{for } x < t \\ 0 & \text{for } x \ge t \end{cases}    (8)

with t a user-chosen value that allows x to lie within a tolerance range of s% (e.g. 10%) of k_1. That means that as long as the cut-point k_2 is within the tolerance range of the first cut-point k_1 we consider the terms as similar; outside the tolerance range they are dissimilar. Small changes around the first cut-point are allowed, while a cut-point far from the first cut-point means that something serious has happened to the data.

Fig. 9 Substructures of Decision_Tree_1 to Decision_Tree_3; Sim_{d1,d3} = 0.375; Sim_{d2,d3} = 0.375
Fig. 10 Decision_Tree_4, dissimilar to all other Decision Trees, Sim_{d1,d4} = 0
The similarity measure for a whole substructure with s attributes is:

Sim_{num} = \begin{cases} \frac{1}{s} \sum_{z=1}^{s} Sim_z & \text{for } A = A' \\ 0 & \text{otherwise} \end{cases}    (9)

The overall similarity between two decision trees d_1 and d_2 is

Sim_{d_1,d_2} = \frac{1}{l} \sum_{i=1}^{l} \max_j SIM_{ij}    (10)

for comparing the rules i of decision tree d_1 with the rules j of decision tree d_2. Note that Sim_{d1,d2} and Sim_{d2,d1} need not be the same.
The comparison of Decision_Tree_1 in Figure 7 with Decision_Tree_2 in Figure 8 gives a similarity value of 0.75 based on the measure described above. The upper structure of Decision_Tree_2 is similar to Decision_Tree_1, but Decision_Tree_2 has a few more lower leaves. Decision_Tree_3 in Figure 9 is similar to Decision_Tree_1 and Decision_Tree_2 with a similarity value of 0.125. Decision_Tree_4 in Figure 10 has no similarity at all to the other trees; the similarity value is zero.
Such a similarity measure can help an expert to understand the developed model and also to compare two models that have been built from two data sets, wherein one contains N examples and the other one contains N+L samples. There are other options for constructing the similarity measure [16].
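A minimal sketch of Eqs. (6)-(10) follows, under simplifying assumptions not made by the paper: a rule is an ordered list of (attribute, operator, cut-point) conditions, substructures are compared position by position, and the tolerance t is taken as s% of the first cut-point, as in the text after Eq. (8). The toy rule sets are hypothetical:

```python
# A self-contained sketch of the tree similarity of Eqs. (6)-(10).
def sim_condition(c1, c2, s=0.10):
    (a1, op1, k1), (a2, op2, k2) = c1, c2
    if a1 != a2 or op1 != op2:
        return 0.0                  # Eq. (7): substructures not identical
    t = abs(k1) * s                 # tolerance range, s% of k_1
    x = abs(k2 - k1)
    # Eq. (8): linear decay inside the tolerance range, 0 outside
    return 1.0 - x / t if t > 0 and x < t else (1.0 if x == 0 else 0.0)

def sim_rules(r1, r2):
    n = max(len(r1), len(r2))       # Eq. (6): missing conditions score 0
    return sum(sim_condition(c1, c2) for c1, c2 in zip(r1, r2)) / n

def sim_trees(rules1, rules2):
    # Eq. (10): best match over the rules of d2 for every rule of d1;
    # note the measure is not symmetric in general.
    return sum(max(sim_rules(r1, r2) for r2 in rules2)
               for r1 in rules1) / len(rules1)

# Two hypothetical rule sets over the IRIS attributes
d1 = [[("petal_len", "<=", 2.45)],
      [("petal_len", ">", 2.45), ("petal_wid", "<=", 1.75)]]
d2 = [[("petal_len", "<=", 2.50)],
      [("petal_len", ">", 2.50), ("petal_wid", "<=", 1.75)]]
print(round(sim_trees(d1, d2), 3))
```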
5 Conclusion
The aim of this paper is to discuss how to deal with the result of data mining methods such as decision tree induction. This paper has been prompted by the fact that domain experts are able to use the tools for decision tree induction but have a hard time interpreting the results. A lot of factors have to be taken into consideration. The quantitative measures give a good overview in regard to the quality of the learnt model. But computer science experts claim that decision trees have explanation capability and that, compared to neural nets and SVMs, the user can understand the decision. This is only partially true. Of course, the user can follow the path of a decision from the top to the leaves, and this provides him with a rule whose node decisions are combined by logical ANDs. But often this is tricky. A user likes rules that fit his domain knowledge and make sense in some way. Often this is not the case, since the attributes most favored by the user do not appear at a high position in the tree. That makes the interpretation of a decision tree difficult. The user's domain knowledge, even if it is only limited, is an indicator of whether he accepts the tree or not.
Some features a decision tree induction algorithm should have are mentioned in this paper. The decision tree induction algorithm should allow the user to interact with the induction algorithm. If two attributes are ranked more or less the same, the user should be able to choose which one of the attributes to pick. The noise in the data should be checked with respect to different aspects. Quality measures of the model, like mean accuracy, standard deviation and class-specific accuracy, are necessary in order to judge the quality of a learnt decision tree correctly. The evaluation should be done by cross-validation, as test-and-train methods are not up-to-date anymore. Among other things, explanation features are needed that show the split of the attributes and how it is represented in the decision space. Simple visualization techniques, like 2-d diagram plots, are often helpful to discover what happened in the data. Wrong labels have to be discovered by an oracle-based classification scheme. This should be supported by the tool. The comparison of two learned trees from the same domain is another important issue that a user needs in order to understand what has changed. Therefore, proper similarity measures are needed that give a measure of goodness.

References:
[1] Perner, P.: Data Mining on Multimedia Data. LNCS, vol. 2558, Springer Verlag (2002)
[2] Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features. Machine Learning, 14th IJCAI, pp. 194-202 (1995)
[3] Quinlan, J.R.: Decision Trees and Multi-Valued Attributes. In: Hayes, J.E., Michie, D., Richards, J. (eds.) Machine Intelligence 11, Oxford University Press (1988)
[4] Coppersmith, D., Hong, S.J., Hosking, J.: Partitioning Nominal Attributes in Decision Trees. Journal of Data Mining and Knowledge Discovery 3(2), 100-200 (1999)
[5] de Mantaras, R.L.: A Distance-Based Attribute Selection Measure for Decision Tree Induction. Machine Learning 6, 81-92 (1991)
[6] White, A.P., Liu, W.Z.: Bias in Information-Based Measures in Decision Tree Induction. Machine Learning 15, 321-329 (1994)
[7] Philipow, E.: Handbuch der Elektrotechnik, Bd. 2, Grundlagen der Informationstechnik, pp. 158-171. Technik Verlag, Berlin (1987)
[8] Perner, P., Zscherpel, U., Jacobsen, C.: A Comparison between Neural Networks and Decision Trees Based on Data from Industrial Radiographic Testing. Pattern Recognition Letters 22, 47-54 (2001)
[9] Georg, G., Séroussi, B., Bouaud, J.: Does GEM-Encoding Clinical Practice Guidelines Improve the Quality of Knowledge Bases? A Study with the Rule-Based Formalism. AMIA Annu Symp Proc 2003, pp. 254-258 (2003)
[10] Lee, S., Lee, S.H., Lee, K.C., Lee, M.H., Harashima, F.: Intelligent Performance Management of Networks for Advanced Manufacturing Systems. IEEE Transactions on Industrial Electronics 48(4), 731-741 (2001)
[11] Bazijanec, B., Gausmann, O., Turowski, K.: Parsing Effort in a B2B Integration Scenario - An Industrial Case Study. In: Enterprise Interoperability II, Part IX, pp. 783-794. Springer Verlag (2007)
[12] Muggleton, S.: Duce - An Oracle-Based Approach to Constructive Induction. In: Proceedings of the Tenth International Joint Conference on Artificial Intelligence (IJCAI 87), pp. 287-292 (1987)
[13] Wu, B., Nevatia, R.: Improving Part-Based Object Detection by Unsupervised, Online Boosting. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-8 (2007)
[14] Whiteley, J.R., Davis, J.F.: A Similarity-Based Approach to Interpretation of Sensor Data Using Adaptive Resonance Theory. Computers & Chemical Engineering 18(7), 637-661 (1994)
[15] Perner, P.: Prototype-Based Classification. Applied Intelligence 28(3), 238-246 (2008)
[16] Perner, P.: A Method for Supporting the Domain Expert by the Interpretation of Different Decision Trees Learnt from the Same Domain. Quality and Reliability Engineering International 30(7), 985-992 (2014)
[17] Perner, P.: Decision Tree Induction Methods and Their Application to Big Data. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds.) Modeling and Processing for Next-Generation Big-Data Technologies: With Applications and Case Studies. Modeling and Optimization in Science and Technologies, vol. 4, pp. 57-88. Springer Verlag (2015)
Appendix

Fig. 11 Decision Surfaces for two Attributes at each Level of the Decision Tree shown in Figure 3
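A Figure-11-style view can be reproduced with a few lines of code. The following sketch is an illustrative reconstruction, not the original figure; matplotlib and scikit-learn are assumed tools, and the attribute pair is user-picked as described in Section 4.2. It scatters the class-specific IRIS data over two attributes and overlays the learned cut-points:

```python
# An illustrative reconstruction of a Figure-11-style plot: class-specific
# data over two chosen attributes, with the cut-points of a small decision
# tree (fitted on just those attributes) drawn as dashed lines.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
ix, iy = 2, 3                     # petal length and petal width (user-picked)
X = iris.data[:, [ix, iy]]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, iris.target)

for c, marker in zip(range(3), "o^s"):
    pts = X[iris.target == c]
    plt.scatter(pts[:, 0], pts[:, 1], marker=marker,
                label=iris.target_names[c])

# Each internal node contributes one cut-point orthogonal to its axis
t = clf.tree_
for node in range(t.node_count):
    if t.children_left[node] != -1:   # internal node
        if t.feature[node] == 0:
            plt.axvline(t.threshold[node], linestyle="--", color="gray")
        else:
            plt.axhline(t.threshold[node], linestyle="--", color="gray")

plt.xlabel(iris.feature_names[ix])
plt.ylabel(iris.feature_names[iy])
plt.legend()
plt.show()
```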