A Genetic Algorithm for Constructing Compact Binary Decision Trees


Journal of Pattern Recognition Research (2009) 1-13
Received Jul 7, 2008. Accepted Feb 6, 2009.

A Genetic Algorithm for Constructing Compact Binary Decision Trees

Sung-Hyuk Cha, Charles Tappert
Computer Science Department, Pace University
861 Bedford Road, Pleasantville, New York 10570, USA

Abstract
Tree-based classifiers are important in pattern recognition and have been well studied. Although the problem of finding an optimal decision tree has received attention, it is a hard optimization problem. Here we propose utilizing a genetic algorithm to improve on the finding of compact, near-optimal decision trees. We present a method to encode and decode a decision tree to and from a chromosome on which genetic operators such as mutation and crossover can be applied. Theoretical properties of decision trees, of the encoded chromosomes, and of fitness functions are presented.

Keywords: Binary Decision Tree, Genetic Algorithm.

1. Introduction
Decision trees have been well studied and widely used in knowledge discovery and decision support systems. Here we are concerned with decision trees for classification, where the leaves represent classifications and the branches represent feature-based splits that lead to the classifications. These trees approximate discrete-valued target functions and are a widely used practical method for inductive inference [1]. Decision trees have prospered in knowledge discovery and decision support systems because of their natural and intuitive paradigm: classifying a pattern through a sequence of questions.

Algorithms for constructing decision trees, such as ID3 [1-3], often use heuristics that tend to find short trees. Finding the shortest decision tree, however, is a hard optimization problem [4, 5]. Attempts to construct short, near-optimal decision trees have been described (see the extensive survey [6]). Gehrke et al. developed a bootstrapped optimistic algorithm for decision tree construction [7]. For continuous attribute data, Zhao and Shirasaka [8] suggested an evolutionary design, Bennett and Blue [9] proposed an extreme point tabu search algorithm, and Pajunen and Girolami [10] exploited linear independent component analysis (ICA) to construct binary decision trees.

Genetic algorithms (GAs) use an optimization technique based on natural evolution [11, 12]. GAs have been used to find near-optimal decision trees in two ways. On the one hand, they have been used to select the attributes from which decision trees are then constructed, in a hybrid or preprocessing manner [13-15]. On the other hand, they have been applied directly to decision trees [16, 17]. A problem that arises with the direct approach is that an attribute may appear more than once in a path of the tree. In this paper, we describe an alternate method of constructing near-optimal binary decision trees, proposed succinctly in [18]. In order to utilize genetic algorithms, decision trees must be represented as chromosomes on which genetic operators such as mutation and crossover can be applied. The main contribution of this paper is a new scheme to encode and decode a decision tree to and from a chromosome.

The remainder of the paper is organized as follows. Section 2 reviews decision trees and defines a new function, denoted d̈, that describes the complexity. Section 3 presents the encoding/decoding of decision trees to/from chromosomes, which stems from d̈; genetic operators such as mutation and crossover; and fitness functions and their analysis. Finally, Section 4 concludes this work.

© 2009 JPRR. All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, or to republish, requires a fee and/or special permission from JPRR.

[Fig. 1: Decision trees consistent and inconsistent with D: (a) T_1, consistent with D, and (b) T_x, inconsistent with D.]

2. Preliminary
Let D be a set of labeled training data: a database of instances represented by attribute-value pairs, where each attribute can have a small number of possible disjoint values. Here we consider only binary attributes. Hence, D has n instances, where each instance x_i consists of d ordered binary attributes and a target value, which is one of c states of nature, Ω. The following sample database, where n = 6, d = 4, c = 2, and Ω = {0, 1}, will be used for illustration throughout the rest of this paper.

[Table: the sample database D of instances x_1, ..., x_6, each with four binary attribute values and a class label from Ω.]

Algorithms that construct a decision tree take a set of training instances as input and output a learned discrete-valued target function in the form of a tree. A decision tree is a rooted tree T consisting of internal nodes representing attributes, leaf nodes representing labels, and edges representing the attributes' possible values. Branches represent the attributes' possible values; in binary decision trees, left branches have values of 0 and right branches have values of 1, as shown in Fig. 1. For simplicity we omit the value labels in some later figures. A decision tree represents a disjunction of conjunctions: in Fig. 1(a), for example, T_1 represents each of the two states as a disjunction of conjunctions of attribute values, where each conjunction corresponds to a path from the root to a leaf. A decision tree based on a database with c classes is a c-class classifier. Decision trees classify instances by traversing from the root node to a leaf node: the classification process starts at the root node of the tree, tests the attribute specified at that node, and then moves down the tree branch corresponding to the given attribute value.
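To make the traversal concrete, the following sketch (ours, not from the paper) classifies an instance, assuming a tree is represented either as a bare class label (a leaf) or as a triple of attribute index, left subtree, and right subtree:

def classify(tree, x):
    # Walk from the root: a leaf is a bare label; an internal node is a
    # (attribute_index, left_subtree, right_subtree) triple.
    while isinstance(tree, tuple):
        attr, left, right = tree
        tree = left if x[attr] == 0 else right  # value 0 -> left, 1 -> right
    return tree

For example, classify(tree, (0, 1, 1, 0)) asks one question per level until a leaf is reached.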

[Fig. 2: Two decision trees consistent with D: (a) T_1, with 11 nodes, built by ID3, and (b) T_3, with 9 nodes, built by the GA.]

Fig. 1 shows two decision trees, T_1 and T_x. The decision tree T_1 is said to be a consistent decision tree because it is consistent with all instances in D. The decision tree T_x, however, is inconsistent with D: it classifies one of the training instances as the opposite of its actual class in D.

There are two important properties of a binary decision tree:

Property 1 The size of a decision tree with l leaves is 2l − 1.

Property 2 The lower and upper bounds of l for a consistent binary decision tree are c and n: c ≤ l ≤ n.

The number of leaves in a consistent decision tree must be at least c in the best case. In the worst case, the number of leaves equals the size of D, with each instance corresponding to a unique leaf, e.g., T_1 and T_2.

2.1 Occam's Razor and ID3
Among the numerous decision trees that are consistent with the training database D, Fig. 2 shows two. All instances x_1, ..., x_6 are classified correctly by both decision trees T_1 and T_3. However, an unknown instance that is not in the training set is classified differently by the two decision trees: T_1 assigns it one class whereas T_3 assigns it the other. This inductive inference is a fundamental problem in machine learning. The minimum description length principle, formalized from Occam's Razor [19], is a very important concept in machine learning theory [1, 2]. Albeit controversial, many decision-tree building algorithms such as ID3 [3] prefer smaller (more compact: shorter depth, fewer nodes) trees, and thus the unknown instance is preferably classified as T_3 classifies it, because T_3 is shorter than T_1. In other words, T_3 has a simpler description than T_1. The shorter the tree, the fewer the questions required to classify instances.

Based on Occam's Razor, Ross Quinlan proposed a heuristic that tends to find smaller decision trees [3]. The algorithm is called ID3 (Iterative Dichotomizer 3), and it utilizes the entropy, a measure of the homogeneity of the examples, defined in equation (1):

Entropy(D) = Σ_{x∈{0,1}} −P(x) log₂ P(x)   (1)
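As a hedged illustration, equation (1) can be computed directly from a list of (attribute-tuple, label) pairs — our assumed representation of D:

import math
from collections import Counter

def entropy(D):
    # Entropy of the labeled database D, per Eq. (1).
    n = len(D)
    counts = Counter(label for _, label in D)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())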

[Fig. 3: Illustration of the ID3 algorithm — the database D and the sub-databases D_L, D_R, D_LL, etc., with the information gain of each candidate attribute at each node.]

Information gain, or simply gain, is defined in terms of entropy, where X is one of the attributes in D. When all attributes are binary, the gain can be defined as in equation (2):

gain(D, X) = Entropy(D) − ( (|D_L|/|D|) Entropy(D_L) + (|D_R|/|D|) Entropy(D_R) )   (2)

The ID3 algorithm first selects as the root node the attribute whose gain is maximal. For all subtrees of the root node, it iteratively finds the next attribute whose gain is maximal. Fig. 3 illustrates the ID3 algorithm. Starting with the root node, it evaluates all attributes in the database; the attribute with the highest gain is selected as the root node. It then partitions the database D into two sub-databases, D_L and D_R, and for each sub-database it again calculates the gains. As a result, the decision tree T_1 is built. However, as is apparent from Fig. 2, the ID3 algorithm does not necessarily find the smallest decision tree.
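A minimal sketch of equation (2) and the greedy ID3 loop, reusing entropy() from the sketch above and assuming contradiction-free data, might look like this:

from collections import Counter

def split(D, a):
    # Partition D on binary attribute a into (D_L, D_R) for values 0 and 1.
    D_L = [(x, y) for x, y in D if x[a] == 0]
    D_R = [(x, y) for x, y in D if x[a] == 1]
    return D_L, D_R

def gain(D, a):
    # Information gain of attribute a on database D, per Eq. (2).
    D_L, D_R = split(D, a)
    remainder = sum(len(part) / len(D) * entropy(part)
                    for part in (D_L, D_R) if part)
    return entropy(D) - remainder

def id3(D, attributes):
    # Greedy ID3: choose the max-gain attribute, recurse on both halves.
    labels = {y for _, y in D}
    if len(labels) == 1:
        return labels.pop()
    best = max(attributes, key=lambda a: gain(D, a))
    D_L, D_R = split(D, best)
    if not D_L or not D_R:  # zero-gain split: fall back to the majority label
        return Counter(y for _, y in D).most_common(1)[0][0]
    rest = [a for a in attributes if a != best]
    return (best, id3(D_L, rest), id3(D_R, rest))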

2.2 Complexity of the d̈ Function
To the extent that smaller trees are preferred, it becomes interesting to find a smallest decision tree. Finding a smallest decision tree, though, is an NP-complete problem [4, 5]. To comprehend the complexity of finding a smallest decision tree, consider a full binary decision tree T^f (the superscript f denotes full), where each path from the root to a leaf contains all the attributes exactly once, as exemplified in Fig. 4. There are 2^d leaves, where d + 1 is the height of the full decision tree. In order to build a consistent full binary tree, one may choose any attribute as the root node; e.g., there are four choices in Fig. 4. At each node in the second level, one can choose any attribute that is not at the root; e.g., three choices in Fig. 4. We denote the number of possible full binary trees with d attributes as d̈.

[Fig. 4: A sample full binary decision tree structure T^f of height d + 1, showing the subset of instances {x_1, ..., x_6} that reaches each node.]

Definition 1 The number of possible full binary trees with d attributes is formally defined as

d̈ = Π_{i=0}^{d−1} (d − i)^{2^i}.   (3)

Table 1: Comparison of the d̈ and d! functions.

 d   d̈             d!
 1   1             1
 2   2             2
 3   12            6
 4   576           24
 5   1,658,880     120
 6   1.6511e+13    720
 7   1.9084e+27    5,040
 8   2.9135e+55    40,320
 9   7.6395e+111   362,880
10   5.8362e+224   3,628,800

As illustrated in Table 1, as d grows, the function d̈ grows faster than polynomial, exponential, and even factorial functions. The factorial function d! is the product of all positive integers less than or equal to d, and many combinatorial optimization problems have a search space of complexity O(d!). The search space of full binary decision trees is much larger, i.e., d̈ = Ω(d!); lower and upper bounds for d̈ are Ω(2^d) and o(d^{2^d}). Note that real decision trees, such as T_1 in Fig. 1(a), can be sparse, because internal nodes can become leaves as soon as they are homogeneous. Hence, the search space for finding a shortest binary decision tree can be smaller than d̈.
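Definition 1 is easy to check numerically: equation (3) is equivalent to the recurrence d̈ = d · ((d−1)̈)², since the root can be any of the d attributes and each of its two subtrees is an independent full tree on the remaining d − 1 attributes. A short sketch (ours) that reproduces Table 1:

import math

def full_tree_count(d):
    # Number of full binary decision trees on d attributes: Eq. (3),
    # computed via the recurrence T(d) = d * T(d-1)**2 with T(0) = 1.
    return 1 if d == 0 else d * full_tree_count(d - 1) ** 2

for d in range(1, 11):
    print(f"{d:2d}  {float(full_tree_count(d)):.4e}  {math.factorial(d)}")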

[Fig. 5: Encoding and decoding schema: (a) the encoded tree T^e for T^f in Fig. 4, with the value ranges {1~4}, {1~3}, {1~2} by level, and (b) its chromosome attribute-selection scheduling string S^f_1.]

3. Genetic Algorithm for Binary Decision Tree Construction
Genetic algorithms can provide good solutions to many optimization problems [11, 12]. They are based on the natural processes of evolution and the survival-of-the-fittest concept. In order to use the genetic algorithm process, one must define at least the following four steps: encoding, genetic operators such as mutation and crossover, decoding, and a fitness function.

3.1 Encoding
For genetic algorithms to construct decision trees, the decision trees must be encoded so that genetic operators, such as mutation and crossover, can be applied. Let A = {a_1, a_2, ..., a_d} be the ordered attribute list. We illustrate and describe the process by considering the full binary decision tree T^f in Fig. 4, where A = {A, B, C, D}. Graphically speaking, the encoding process converts each attribute name in T^f into the index of that attribute in the ordered attribute list A, recursively, starting from the root, as depicted in Fig. 5. For example, the root is C and its index in A is 3. Recursively, for each sub-tree, the attribute list is updated to A − {C} = {A, B, D}. The possible integer values at a node in the i-th level of the encoded decision tree T^e thus run from 1 to d − i + 1. Finally, a breadth-first traversal generates the chromosome string S, whose structure stems from the d̈ function. For T^f the chromosome string S^f_1 is given in Fig. 5(b). A tree that is not full is encoded the same way, with the positions that do not correspond to internal nodes treated as don't-care positions that may hold any number within the restricted bounds; T_1 and T_3 in Fig. 2 are encoded into S_1 = ⟨4, ...⟩ and S_3, respectively. Let us call this a chromosome attribute-selection scheduling string, S, on which genetic operators can be applied. Properties of S include:

Property 3 The parent position of position i is ⌊i/2⌋, except for i = 1, the root.

Property 4 The left and right child positions of position i are 2i and 2i + 1, respectively, if i ≤ 2^{d−2} − 1; otherwise, there are no children.

Property 5 The length of S is exponential in d: |S| = 2^{d−1} − 1.

Property 6 The possible integer values at position i are 1 to d − ⌈log₂(i + 1)⌉ + 1: S_i ∈ {1, ..., d − ⌈log₂(i + 1)⌉ + 1}.

Property 6 provides the restricted bounds for each position of S.
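Properties 3 to 6 translate directly into index arithmetic; a small sketch (ours) under the paper's 1-indexed-position convention:

import math

def parent(i):
    # Property 3: the parent of position i; the root (i = 1) has none.
    return None if i == 1 else i // 2

def children(i, d):
    # Property 4: child positions, or None once the encoded levels end.
    return (2 * i, 2 * i + 1) if i <= 2 ** (d - 2) - 1 else None

def value_bound(i, d):
    # Property 6: the largest legal value at position i.
    return d - math.ceil(math.log2(i + 1)) + 1

d = 4
length = 2 ** (d - 1) - 1   # Property 5: |S| = 7 when d = 4
print([value_bound(i, d) for i in range(1, length + 1)])  # [4, 3, 3, 2, 2, 2, 2]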

[Fig. 6: Mutation operator — mutating S^f_1 yields S_4 and S_5, which decode to T^f_4 and T^f_5.]

Property 7 The number of possible strings S, π(S), is d̈:

π(S) = Π_{i=1}^{|S|} (d − ⌈log₂(i + 1)⌉ + 1).

The lower and upper bounds of π(S) are Ω(2^{|S|}) and o(d^{|S|}).

3.2 Genetic Operators
Two of the most common genetic operators are mutation and crossover. The mutation operator changes the value at a certain position of a string to one of the possible values in that position's range. We illustrate the mutation process on the attribute-selection scheduling string S^f_1 = ⟨3, ...⟩ in Fig. 6. If a mutation occurs in the first position and changes the value to 4, which is in the range {1, ..., 4}, T^f_4 is generated. If a mutation happens in the third position and changes the value to another value in the range {1, ..., 3}, then T^f_5 is generated. As long as the changed value is within the allowed range, the resulting new string always generates a valid full binary decision tree.

Consider S_x = ⟨4, ...⟩, which differs from S_1 only at don't-care positions. A full binary decision tree is built according to this schedule, and the final decision tree for S_x is T_1 in Fig. 2; i.e., S_x is equivalent to S_1. There are 3 × 2 × 2 = 12 equivalent schedules that produce T_1. Therefore, for the mutation to be effective, we apply the mutation operator only to those positions that are not don't-care positions.

To illustrate the crossover operator, consider the two parent attribute-selection scheduling strings P_1 and P_2 in Fig. 7. After randomly selecting a split point, the first part of P_1 and the last part of P_2 are joined to yield a child string S_6; reversing the roles of the parents produces a second child, S_7. The resulting full binary decision trees for these two children are T^f_6 and T^f_7, respectively.
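Both operators are closed over valid strings, since each position's legal range depends only on the position (Property 6), not on the rest of the string. A hedged sketch (ours) of the two operators:

import math
import random

def mutate(S, d):
    # Redraw one position within its legal range; S is 0-indexed here,
    # so index j corresponds to the paper's position j + 1.
    S = list(S)
    j = random.randrange(len(S))
    S[j] = random.randint(1, d - math.ceil(math.log2(j + 2)) + 1)
    return S

def crossover(P1, P2):
    # One-point crossover: swap the tails after a random split point.
    p = random.randrange(1, len(P1))
    return P1[:p] + P2[p:], P2[:p] + P1[p:]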

3.3 Decoding
Decoding is the reverse of the encoding process in Fig. 5. Starting from the root node, we place the attribute given by the chromosome schedule S, which contains index values into the attribute list A. When an attribute a is selected, D is divided into left and right branches, D_L and D_R: D_L consists of all the x_i having an a-value of 0 and D_R consists of all the x_i having an a-value of 1. For each pair of sub-trees we repeat the process recursively with the new attribute list A − {a}. When a node becomes homogeneous, i.e., all class values of its instances are the same, we label it as a leaf. Fig. 8 displays the decision trees decoded from S_4, S_5, S_6, and S_7.

[Fig. 7: Crossover operator — the parents P_1 and P_2 produce the children S_6 and S_7, which decode to T^f_6 and T^f_7.]

[Fig. 8: Decoded decision trees T_4, T_5, T_6, and T_7.]

Sometimes a chromosome introduces mutants. For instance, consider a chromosome S_8 = ⟨3, 3, ...⟩, which results in T_8 in Fig. 9. An empty, unlabeled leaf occurs when, at a node, the attribute a has non-homogeneous labels but the a column of the node's instances is either all 0s or all 1s; in other words, attribute a provides no information gain there, and one of its branches receives no instances. We refer to such a decision tree as a mutant tree, for two reasons. First, what label should we put on the empty leaf? The label may be chosen at random, but the empty sub-database provides no clue. Second, if we allow a label there, it may violate Property 2: the number of leaves l may exceed n. Indeed, T_8 behaves identically to T_6 with respect to D. Thus, mutant decision trees will not be chosen as the fittest trees under the fitness functions presented in the next section.

[Fig. 9: Mutant binary decision tree T_8.]
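A sketch (ours) of the decoding procedure of Section 3.3, reusing split() from the ID3 sketch and assuming contradiction-free data; it returns None for an empty branch, which corresponds to the unlabeled leaf of a mutant tree:

def decode(S, D, attributes, pos=1):
    # Decode chromosome S (the paper's 1-indexed positions) over database D.
    labels = {y for _, y in D}
    if len(labels) <= 1:                    # homogeneous or empty: make a leaf
        return labels.pop() if labels else None
    k = S[pos - 1] if pos <= len(S) else 1  # past |S| only one attribute remains
    a = attributes[k - 1]                   # S stores 1-based indices into the list
    D_L, D_R = split(D, a)
    rest = [b for b in attributes if b != a]
    return (a, decode(S, D_L, rest, 2 * pos),
               decode(S, D_R, rest, 2 * pos + 1))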

3.4 Fitness Functions
Each attribute-selection scheduling string S must be evaluated according to a fitness function. We consider two cases: in the first, D contains no contradicting instances and d is a small finite number; in the other, d is very large. Contradicting instances are those whose attribute values are identical but whose target values differ. The case with small d and no contradicting instances has application to network function representation [20].

Theorem 1 Every attribute-selection scheduling string S produces a consistent full binary decision tree as long as the set of instances D contains no contradicting instances.

Proof Every attribute-selection scheduling string S produces a full binary decision tree, e.g., as in Fig. 4. Since each instance in D corresponds to a certain branch, add the leaves with the target values from the set of instances accordingly. If the target value is not available in the training set, add a random target value. This becomes a complete truth table, as the number of leaves is 2^d. The value predicted by the decision tree is always consistent with the actual target value in D.

If there are two instances with the same attribute values but different classes, no consistent binary decision tree exists. Since the databases we consider contain no contradicting instances, one obvious fitness function is the size of the tree: the fitness function f_s is the size, the total number of nodes, e.g., f_s(T_1) = 11 and f_s(T_3) = 9. Here, however, we use a different fitness function f_d: the sum of the depths of the leaf nodes reached by each instance. This is equivalent to the sum of the numbers of questions asked for each instance, e.g., f_d(T_1) = 18 and f_d(T_3) = f_d(T_6) = 15, as shown in Table 2. This requires the assumption that each instance in D has equal prior probability. Both of these evaluation functions suggest that T_3 and T_6 are better than T_1, which is produced by the ID3 algorithm.

For two decision trees of the same size (2l − 1), where l is the number of leaves, the number of questions to be asked can still differ, by Theorem 2.

Theorem 2 There exist T_x and T_y such that f_s(T_x) = f_s(T_y) and f_d(T_x) < f_d(T_y).

Proof Let T_x be a balanced binary tree and T_y be a skewed, or linear, binary tree. Then f_d(T_x) = Θ(n log n) whereas f_d(T_y) = Θ(n²) = Θ(Σ_{i=1}^{n} i).
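Under our tuple representation of trees, the two fitness functions are a few lines each; smaller values are fitter in both cases:

def f_s(tree):
    # Size fitness: total number of nodes (2l - 1 for l leaves, Property 1).
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    return 1 + f_s(left) + f_s(right)

def f_d(tree, D):
    # Depth fitness: the number of questions asked, summed over all instances.
    total = 0
    for x, _ in D:
        node, depth = tree, 0
        while isinstance(node, tuple):
            a, left, right = node
            node = left if x[a] == 0 else right
            depth += 1
        total += depth
    return total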

[Table 2: Training instances x_1, ..., x_6 and their depths in the decision trees T_1 through T_8, with the column sums giving f_d.]

The f_d fitness function prefers not only shorter decision trees to larger ones, but also balanced decision trees to skewed ones.

Corollary 1 n log₂ c ≤ f_d(T_x) ≤ Σ_{i=1}^{n} i, where n is the number of instances.

Proof By Property 2, in the best case the decision tree will have l = c leaves. In the best case, the tree is completely balanced, with height log₂ c. Thus the lower bound for f_d(T_x) is n log₂ c. In the worst case, the number of leaves is n by Property 2 and the decision tree forms a skewed, or linear, binary tree. Thus the upper bound is Σ_{i=1}^{n} i.

If f_d(T_x) < n log₂ c, then T_x classifies only a subset of the classes; regardless of its size, such a tree should not be selected. If f_d(T_x) > Σ_{i=1}^{n} i, there is a mutant node and T_x can be shortened.

When d is large, the size of the chromosome |S| explodes, by Property 5. In this case we must limit the height of the decision tree: the chromosome has a finite length and guides the attribute selection only up to a certain height. Since typically n ≪ 2^d, a good choice for the height of the decision tree is h = ⌈log₂ n⌉, i.e., |S| < n. When a branch reaches height h, the node is made a leaf labeled with the class whose prior probability is highest, instead of selecting another attribute. When the height is limited, decision trees may no longer be consistent with D.

Suppose that the height is limited to 3, i.e., h = 3 in the case of Fig. 3. Fig. 10 shows the pruned binary decision tree, T_9. At the 4th node in breadth-first traversal order, instead of choosing another attribute as in Fig. 3, the node is labeled with the class whose prior probability is higher in the sub-database D_LL. When the prior probabilities are the same, either class can be chosen. Let f_a be the fitness function that represents the percentage of instances correctly classified by the decision tree. In Fig. 10, {x_1, x_2, x_4, x_5} are correctly predicted by T_9, and hence f_a(T_9) = 4/6.

We randomly generated a training database of binary attributes and limited the tree height to 8. Fig. 11 shows the initial population of binary decision trees generated by random chromosomes, plotted with respect to their f_d(T_x) and their accuracy f_a(T_x) on the training examples. The sizes of the decision trees and their accuracies on the training examples show no correlation.
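A sketch (ours) of the large-d variant: height-limited decoding with majority-class leaves, plus the accuracy fitness f_a built on classify() from Section 2. The helper names are our assumptions, not the paper's:

from collections import Counter

def decode_limited(S, D, attributes, h, pos=1):
    # As decode(), but at depth h emit a majority-class leaf instead of
    # selecting another attribute (ties may be broken either way).
    labels = Counter(y for _, y in D)
    if len(labels) <= 1 or h == 0:
        return labels.most_common(1)[0][0] if labels else None
    k = S[pos - 1] if pos <= len(S) else 1
    a = attributes[k - 1]
    D_L, D_R = split(D, a)
    rest = [b for b in attributes if b != a]
    return (a, decode_limited(S, D_L, rest, h - 1, 2 * pos),
               decode_limited(S, D_R, rest, h - 1, 2 * pos + 1))

def f_a(tree, D):
    # Accuracy fitness: the fraction of D the tree classifies correctly.
    return sum(classify(tree, x) == y for x, y in D) / len(D)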

[Fig. 10: Pruned decision tree T_9 at the height of 3, with the sub-databases at each node and the classes predicted by T_9.]

[Fig. 11: Binary decision trees of the initial population with respect to f_d and accuracy f_a.]

4. Discussion
In this paper, we reviewed binary decision trees and demonstrated how to utilize genetic algorithms to find compact binary decision trees. By limiting the tree's height, the presented method guarantees finding a decision tree better than or equal to those of the best known algorithms, since the trees such algorithms produce can be placed in the initial population.

Methods that apply GAs directly to decision trees [16, 17] can yield subtrees that are never visited, as shown in Fig. 12. After the mutation operator is applied at node O in T_a, the result T_c has a dashed subtree that is never visited. After crossover between T_a and T_b, the child trees T_d and T_e also have dashed subtrees. These unnecessary subtrees occur whenever an attribute occurs more than once in a path of a decision tree. With the encoding of the decision tree, however, this problem never occurs, as illustrated in Fig. 13. When d is large, limiting the height is recommended.

The encoding of the binary decision tree in this paper utilized breadth-first traversal. However, pre-order depth-first traversal can also be utilized, as shown in Fig. 13. The mutation shown in this paper is still valid in the pre-order depth-first traversal representation, but crossover may cause a mutant when the two subtrees to be switched are at different levels.

The index number of a node may exceed its limit due to a crossover; thus a mutation immediately after crossover is inevitable. Analyzing and implementing the depth-first traversal of the encoded decision tree remains ongoing work. Encoding and decoding non-binary decision trees, where different attributes have different numbers of possible values, is an open problem.

References
[1] Mitchell, T. M., Machine Learning, McGraw-Hill, 1997.
[2] Duda, R. O., Hart, P. E., and Stork, D. G., Pattern Classification, 2nd Ed., Wiley-Interscience, 2000.
[3] Quinlan, J. R., Induction of decision trees, Machine Learning, 1(1), pp. 81-106, 1986.
[4] Hyafil, L. and Rivest, R. L., Constructing optimal binary decision trees is NP-complete, Information Processing Letters, Vol. 5, No. 1, pp. 15-17, 1976.
[5] Bodlaender, H. L. and Zantema, H., Finding small equivalent decision trees is hard, International Journal of Foundations of Computer Science, Vol. 11, No. 2, World Scientific Publishing, 2000.
[6] Safavian, S. R. and Landgrebe, D., A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 3, pp. 660-674, 1991.
[7] Gehrke, J., Ganti, V., Ramakrishnan, R., and Loh, W., BOAT — Optimistic Decision Tree Construction, in Proc. of the ACM SIGMOD Conference on Management of Data, 1999, pp. 169-180.
[8] Zhao, Q. and Shirasaka, M., A Study on Evolutionary Design of Binary Decision Trees, in Proceedings of the Congress on Evolutionary Computation, Vol. 3, IEEE, 1999.
[9] Bennett, K. and Blue, J., Optimal decision trees, Tech. Rpt. No. 214, Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, New York, 1996.
[10] Pajunen, P. and Girolami, M., Implementing decisions in binary decision trees using independent component analysis, in Proceedings of ICA, 2000.
[11] Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[12] Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, 1996.
[13] Kim, K. M., Park, J. J., Song, M. H., Kim, I. C., and Suen, C. Y., Binary Decision Tree Using Genetic Algorithm for Recognizing Defect Patterns of Cold Mill Strip, LNCS, Springer, 2004.
[14] Teeuwsen, S. P., Erlich, I., El-Sharkawi, M. A., and Bachmann, U., Genetic algorithm and decision tree-based oscillatory stability assessment, IEEE Transactions on Power Systems, Vol. 21, Issue 2, May 2006.
[15] Bala, J., Huang, J., Vafaie, H., DeJong, K., and Wechsler, H., Hybrid learning using genetic algorithms and decision trees for pattern classification, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, 1995.
[16] Papagelis, A. and Kalles, D., GATree: Genetically evolved decision trees, in Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence, 2000.
[17] Fu, Z., An Innovative GA-Based Decision Tree Classifier in Large Scale Data Mining, LNCS Vol. 1704, Springer, 1999.
[18] Cha, S. and Tappert, C. C., Constructing Binary Decision Trees using Genetic Algorithms, in Proceedings of the International Conference on Genetic and Evolutionary Methods, July 14-17, 2008, Las Vegas, Nevada.
[19] Grünwald, P., The Minimum Description Length Principle, MIT Press, June 2007.
[20] Martinez, T. R. and Campbell, D. M., A Self-Organizing Binary Decision Tree for Incrementally Defined Rule-Based Systems, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 21, No. 5, pp. 1231-1238, 1991.

[Fig. 12: Direct GA on decision trees — mutation at node O of T_a yields T_c, and crossover between T_a and T_b yields the children T_d and T_e, each containing dashed subtrees that are never visited.]

[Fig. 13: Pre-order depth-first traversal for decision trees.]
