Decision Trees


1 Introduction to Tree-Based Methods

1.1 The general setting

Supervised learning problem. Data: {(y_i, x_i) : i = 1, ..., n}, where y is the response, x is a p-dimensional input vector, and n is the sample size. The response y can be binary or categorical, giving classification (or decision) trees, or continuous, giving regression trees. The predictor vector x can be a mixture of continuous and categorical variables.

A tree-based method (or recursive partitioning) recursively partitions the predictor space to model the relationship between the response y and the predictors (x_1, ..., x_p).

1.2 Advantages of trees

- It is nonparametric, requiring few statistical assumptions.
- It can be applied to various data structures, handling both ordered and categorical variables in a simple and natural way. In particular, recursive partitioning is exceptionally efficient in handling categorical predictors.
- It ameliorates the curse of dimensionality. When p > 2, conventional nonparametric smoothing techniques become computationally infeasible. As p increases, parametric models also encounter problems, such as variable selection, transformation, and interaction handling. Tree methods perform stepwise variable selection, complexity reduction, and (implicit) interaction handling automatically.
- Invariant under all monotone transformations of individual ordered predictors.
- Efficient in handling missing values, and provides variable importance rankings.
- The output gives easily understood and interpreted information. Interpretability is one of the main advantages of decision trees compared to black-box methods such as neural networks.
- The hierarchical (binary) tree structure automatically and optimally groups data, which makes it an excellent tool in medical prognosis/diagnosis.
- Provides a natural platform for handling heterogeneity in the data by allowing different models to be fit to different groups (treed models).
- Trade-off: tree models are robust yet unstable.

1.3 A brief history of tree modeling

- Morgan and Sonquist (1963): Automatic Interaction Detection (AID).
- Breiman, Friedman, Olshen, and Stone (1984): Classification And Regression Trees (CART). Addressed tree size selection (pruning) and many other issues, such as missing values and variable importance. Greatly advanced the use of tree methods in various application fields.
- Extensions: Freund and Schapire (1996) and Friedman (2001): boosting. Breiman (1996): bagging. Breiman (2001): random forests.

1.4 An example and terminology

The stage C prostate cancer example. The dataset contains information about 146 stage C prostate cancer patients. The main clinical endpoint of interest is whether the disease recurs after initial surgical removal of the prostate, and the time interval
to that progression (if any). The endpoint of this example is pgstat, which takes the value 1 if the disease has progressed and 0 if not. Below is a short description of the variables. The data form a matrix of 146 rows and 8 columns corresponding to the following 8 variables:

- pgtime = time to progression in years
- pgstat = status at last follow-up: 1 = progressed, 0 = censored
- age = age at diagnosis
- eet = early endocrine therapy: 1 = no, 2 = yes
- g2 = % of cells in G2 phase, from flow cytometry
- grade = tumor grade (1, 2, 3, 4)
- gleason = Gleason score (a competing grading system, 3-10)
- ploidy = diploid/tetraploid/aneuploid DNA pattern

The file stagec.r shows how to construct a classification tree predicting pgstat from the last six variables (age, eet, g2, grade, gleason, ploidy).

Terminology: node, root node, parent node, child node, split, leaf (terminal) node, internal node, and path.

- A tree is built from the root node (top) down to the leaf (terminal) nodes (bottom).
- A record first enters the root node. A test (split) is applied to determine to which child node it should go next. The process is repeated until the record arrives at a leaf node.
- The path from the root to a leaf node provides an expression of a rule.

1.5 References

[1] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees, Chapman and Hall. Sections 2.2 and 2.7.
[2] Atkinson, E. J. and Therneau, T. M. An Introduction to Recursive Partitioning Using the RPART Routines, Mayo Foundation.
[3] Berk, R. (2008) Statistical Learning from a Regression Perspective, Springer.
2 Growing a Large Tree

We follow the CART methodology to develop tree models, which consists of three major steps:

1. Grow a large initial tree T_0.
2. Iteratively truncate branches of T_0 to obtain a sequence of optimally pruned (nested) subtrees.
3. Select the best tree size based on validation, provided by either a test sample or cross-validation (CV).

To illustrate, we consider decision trees with binary responses: y_i = 1 when an event of interest occurs to subject i, and 0 otherwise. In this section, we focus on how to grow a large tree. The four elements needed in the initial tree-growing procedure are:

1. A set of binary questions to induce a split.
2. A goodness-of-split criterion to evaluate a split.
3. A stop-splitting rule.
4. A rule for assigning every terminal node to a class (0 or 1).

We discuss each of these elements in the sections that follow.

2.1 Possible Number of Splits

The first problem in tree construction is how to determine the number of partitions to examine at each node. An exhaustive (greedy) search algorithm considers all possible partitions of all input variables at every node in the tree. However, the number of candidate splits grows rapidly when there are many
variables or when there are many levels in one or more variables. This makes an exhaustive search prohibitively expensive.

Examples

1. Suppose x is an ordinal variable with four levels 1, 2, 3, and 4. What is the total number of possible splits, considering only binary ones?
   Solution: the two-way splits are 1|234, 12|34, and 123|4, so there are 3. In general, there are L - 1 possible splits for an ordinal variable with L levels.

2. Suppose x is a numerical variable with 100 distinct values. What is the total number of possible splits?
   Solution: the formula above for ordinal variables also applies to numerical variables, where L now denotes the number of distinct values in the observed sample. Total number of splits = 100 - 1 = 99.

3. Suppose x is a nominal variable with four categories a, b, c, d. What is the total number of possible binary splits?
   Solution: a|bcd, b|acd, c|abd, d|abc, ab|cd, ac|bd, ad|bc. Total number of binary splits = 7. In general, the total number of possible binary splits is 2^(L-1) - 1.

Reducing the Number of Possible Partitions for Nominal Variables

For a categorical predictor with many levels {b_1, ..., b_L}, one way to reduce the number of splits is to rank the levels as {b_(1), ..., b_(L)} according to the occurrence rate within the node,

    p(1 | b_(1)) <= p(1 | b_(2)) <= ... <= p(1 | b_(L)),

and then treat the predictor as an ordinal input (see CART, p. 101).
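The split counts above are easy to check with a short sketch. The function names are mine, and only the standard library is used; fixing the first category on the left side is one way to avoid double-counting the interchangeable left/right groups.

```python
from itertools import combinations

def ordinal_split_count(L):
    """Number of binary splits for an ordinal or numerical variable
    with L distinct levels/values: L - 1."""
    return L - 1

def nominal_split_count(L):
    """Number of binary splits for a nominal variable with L
    categories: 2^(L-1) - 1."""
    return 2 ** (L - 1) - 1

def nominal_splits(levels):
    """Enumerate the distinct binary partitions of a list of categories.
    Fixing the first level in the left group avoids counting each
    partition twice (left and right are interchangeable)."""
    first, rest = levels[0], levels[1:]
    splits = []
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(levels) - left
            if right:  # both sides must be non-empty
                splits.append((left, right))
    return splits

print(ordinal_split_count(4))                     # 3
print(ordinal_split_count(100))                   # 99
print(len(nominal_splits(["a", "b", "c", "d"])))  # 7
```

Enumerating {a, b, c, d} reproduces exactly the seven partitions listed in Example 3.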
2.2 Node-Impurity-Based Splitting Criteria

In general, the impurity i(t) of node t can be defined as a nonnegative function of p(0 | t) and p(1 | t), where p(0 | t) and p(1 | t) denote the proportions of the cases in node t belonging to classes 0 and 1, respectively. More formally,

    i(t) = φ(p_1), where p_1 = p(y = 1 | t),

and the impurity function φ(·) is largest when both classes are equally mixed together and smallest when the node contains only one class. Hence, it has the following properties:

1. φ(p) >= 0.
2. φ(p) attains its minimum 0 when p = 0 or p = 1.
3. φ(p) attains its maximum when p = 1 - p = 1/2.
4. φ(p) = φ(1 - p), i.e., φ(p) is symmetric about p = 1/2.

Common choices of φ include the minimum error, the entropy function, and the Gini index.

The Minimum (or Bayes) Error

    φ(p) = min(p, 1 - p).

This measure corresponds to the misclassification rate when majority vote is used. The minimum error is rarely used in practice because it does not sufficiently reward purer nodes (CART, p. 99).

The Entropy Function

    φ(p) = -p log(p) - (1 - p) log(1 - p).

Quinlan (1993) first proposed using the reduction of entropy as a goodness-of-split criterion. Ripley (1996) showed that the entropy reduction criterion is equivalent to using the likelihood-ratio chi-square statistic for association between the branches and the target categories.
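The three impurity functions and properties 1-4 can be sketched in a few lines (function names are mine; natural log is used for the entropy, with 0 log 0 taken to be 0):

```python
import math

def min_error(p):
    """Minimum (Bayes) error: min(p, 1 - p)."""
    return min(p, 1 - p)

def entropy(p):
    """Entropy: -p log p - (1 - p) log(1 - p), with 0 log 0 := 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def gini(p):
    """Gini index: p(1 - p)."""
    return p * (1 - p)

# All three satisfy the stated properties of an impurity function:
for phi in (min_error, entropy, gini):
    assert phi(0.0) == 0.0 and phi(1.0) == 0.0   # minimum 0 at pure nodes
    assert phi(0.5) >= phi(0.3) >= phi(0.1)      # largest near an equal mix
    assert abs(phi(0.3) - phi(0.7)) < 1e-12      # symmetric about p = 1/2
print("all impurity properties hold")
```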
The Gini Index

    φ(p) = p(1 - p).

Breiman et al. (1984) proposed using the reduction of the Gini index as a goodness-of-split criterion. It has been observed that this rule has an undesirable end-cut preference problem (Breiman et al., 1984, Ch. 11): it gives preference to splits that result in two child nodes of extremely unbalanced sizes. To resolve this problem, a modification called the delta splitting method has been adopted in both the THAID (Morgan and Messenger, 1973) and CART programs.

Because of the above concerns, from now on "impurity" refers to the entropy criterion unless stated otherwise.

Computation of i(t)

The computation of impurity is simple when the occurrence rate p(y = 1 | t) in node t is available. In many applications, such as prospective studies, this occurrence rate can be estimated empirically from the data. At other times (e.g., retrospective studies), additional prior information may be required to estimate the occurrence rate.

For a given split s, we have the following 2 x 2 table according to the split and the response:

                     response
    node            0       1       total
    left  (t_L)     n_11    n_12    n_1.
    right (t_R)     n_21    n_22    n_2.
    total           n_.1    n_.2    n

In prospective studies, p = p(y = 1 | t_L) and 1 - p = p(y = 0 | t_L) can be estimated by n_12/n_1. and n_11/n_1., respectively. Hence

    i(t_L) = -(n_12/n_1.) log(n_12/n_1.) - (n_11/n_1.) log(n_11/n_1.).
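The formula for i(t_L) can be computed directly from the first row of the table. A minimal sketch (the helper name is mine), guarding the 0 log 0 = 0 convention:

```python
import math

def node_entropy(n11, n12):
    """Entropy impurity of the left node t_L from the 2x2 split table:
    n11 = count of class-0 cases in t_L, n12 = count of class-1 cases."""
    n1 = n11 + n12  # row total n_1.
    total = 0.0
    for n in (n11, n12):
        if n > 0:  # 0 * log(0) is taken to be 0
            total -= (n / n1) * math.log(n / n1)
    return total

print(node_entropy(20, 20))  # equal mix: log(2), about 0.6931
print(node_entropy(40, 0))   # pure node: 0.0
```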
In fact, it can be shown that the above entropy criterion is proportional to the maximized log-likelihood associated with t_L. In light of this fact, many node-splitting criteria originate from the maximum of certain likelihood functions. The importance of this observation will be appreciated later.

Goodness-of-Split Measure

Let s be any candidate split and suppose s divides t into t_L and t_R such that the proportions of the cases in t going into t_L and t_R are p_L and p_R, respectively. Define the reduction in node impurity as

    Δi(s, t) = i(t) - [p_L i(t_L) + p_R i(t_R)],

which provides a goodness-of-split measure for s. The best split s* for node t provides the maximum impurity reduction, i.e.,

    Δi(s*, t) = max_{s ∈ S} Δi(s, t).

Then t is split into t_L and t_R according to the split s*, and the search procedure for the best split is repeated on t_L and t_R separately. A node becomes a terminal node when prespecified terminal-node conditions are satisfied.

2.3 Alternative Splitting Criteria

There are two alternative splitting criteria: the twoing rule and the χ² test.

The twoing rule is an alternative measure of the goodness of a split:

    (p_L p_R / 4) [ Σ_{j=0,1} | p(y = j | t_L) - p(y = j | t_R) | ]².

For a binary response, the twoing rule coincides with the use of the Gini index, which has the end-cut preference problem.

The Pearson chi-square test statistic measures the difference between the observed cell frequencies and the expected cell frequencies (under the independence assumption). The p-value associated with the χ² test may be used as a goodness-of-split measure.
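The impurity reduction Δi(s, t) can be sketched directly from the child-node class counts (function names are mine; entropy is the impurity, as adopted above):

```python
import math

def entropy(p):
    """Entropy impurity, with 0 log 0 := 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def impurity_reduction(left, right):
    """Delta i(s, t) = i(t) - [pL i(tL) + pR i(tR)], where left and right
    are (n_class0, n_class1) counts in the two child nodes."""
    nL, nR = sum(left), sum(right)
    n = nL + nR
    i_t = entropy((left[1] + right[1]) / n)  # parent impurity
    i_L = entropy(left[1] / nL)
    i_R = entropy(right[1] / nR)
    return i_t - (nL / n) * i_L - (nR / n) * i_R

# A split that separates the classes perfectly removes all impurity:
print(impurity_reduction((50, 0), (0, 50)))    # equals entropy(0.5)
# An uninformative split gives no reduction:
print(impurity_reduction((25, 25), (25, 25)))  # 0.0
```

The best split s* is simply the candidate maximizing this quantity over all variables and cutpoints.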
2.4 Input Variables with Different Numbers of Possible Splits

There are more splits to consider on a variable with more levels. Therefore, the maximum possible value of the goodness-of-split measure tends to grow as the number of possible splits, m, increases. For example, there is only one split for a binary input variable, but 511 possible binary splits for a nominal input variable with 10 levels. Thus, all commonly used splitting criteria (e.g., the Gini index, entropy, and the Pearson χ² test) favor variables with a large number of possible splits. This problem has been identified as the variable selection bias problem (Loh, 2002).

- No adjustment is available for the Gini index.
- The information gain ratio can be used to adjust entropy (Quinlan, 1993): the entropy reduction is divided by a term reflecting the number of levels of the input in the parent node.
- A Bonferroni-type adjustment can be used for the χ² test (Kass, 1980). The Kass adjustment multiplies the p-value by m, the number of possible splits.

To achieve unbiased splits, Loh (2002) proposed a residual-based method that first selects the most important variable, and then applies the greedy search only to this variable to find the best cutpoint.

2.5 References

[1] Berk, R. (2008) Statistical Learning from a Regression Perspective, Springer. Section 3.3.
[2] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees, Chapman and Hall. Chapters 2 and 4.
[3] Kass, G. V. (1980) An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29.
[4] Loh, W.-Y. (2002) Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12.
3 Tree Pruning

3.1 Motivation

Why do we need tree pruning?

- We want a final tree model that can generalize well to new data.
- A decision tree can be grown until every node is pure, so that the resubstitution misclassification rate is 0.
- A small tree with only a few branches may fail to capture enough of the signal.

A Simulated Example from CART (page 60)

[Table from CART, not reproduced here: number of terminal nodes vs. estimated (resubstitution) misclassification rate and true misclassification rate.]

Note that in the above table:
- the difference between the estimated misclassification rate and the true misclassification rate grows after a certain number of nodes;
- the optimal number of terminal nodes is 10 in this example: trees with fewer than 10 nodes underfit the data, and trees with more than 10 nodes overfit the data.

Balance Between Bias and Variance

Tree complexity can be measured by the number of leaves, the number of splits, or the depth. A well-fitted tree has low bias (i.e., it captures enough signal) and low variance (i.e., it does not adapt to noise). Determining tree complexity usually involves balancing bias against variance: an underfitted tree with insufficient complexity has high bias and low variance, while an overfitted tree has low bias and high variance.

3.2 Before CART: Top-Down Pruning by Stopping Rules

Instead of growing a large tree, one may use a set of stopping rules to decide when to declare a terminal node. Strategies for stopping the growth of a tree include: (1) limiting the depth of the tree; (2) setting a minimum number of cases for a terminal node; (3) setting a minimum statistical significance that a split has to reach.

Problems with top-down pruning:

- Stopping rules are often subjective.
- Both underfitting and overfitting may occur.

Treatment: bottom-up pruning procedures.
3.3 Misclassification Cost and Cost-Complexity Measure

A few notes:

- CART first grows a large initial tree T_0 using loose stopping rules and then selects a subtree of T_0 as the best tree structure.
- Unfortunately, evaluating all possible subtrees is not computationally feasible even for moderately sized trees, because the number of subtrees grows much faster than the number of terminal nodes in the initial tree.
- To narrow down the choices of subtrees from which the best-sized subtree is to be selected, CART iteratively prunes off the "weakest link" to obtain a nested set of best subtrees of sizes ranging from that of T_0 down to 1.
- In CART, the complexity of a tree model is determined by the total number of terminal nodes it has.

Some tree terminology:

- Descendant and ancestor: a node t is called a descendant of a higher node h if there is a connected path down the tree leading from h to t. If t is a descendant of h, then h is an ancestor of t.
- Set and number of terminal nodes: let T̃ denote the set of all terminal nodes of T, so that T - T̃ is the set of all internal nodes of T. Let |·| denote cardinality, i.e., the number of elements in a set; |T̃| is then the number of terminal nodes of tree T. For a binary tree, |T| = 2|T̃| - 1.
- Subtree of T: a tree T_1 is a subtree of T, denoted T_1 ⪯ T, if T_1 has the same root node as T and every node h of T_1 is a node of T.
- Branch: a tree T_h is called a branch of T if T_h has root node h ∈ T and all descendants of h in T are descendants of h in T_h.
- Pruning a branch: pruning a branch T_h from a tree T consists of deleting from T all descendants of h, that is, cutting off all of T_h except its root node. The pruned tree is denoted T - T_h.
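The counting identity |T| = 2|T̃| - 1 for binary trees is easy to verify with a toy node structure (the class and function names are mine):

```python
class Node:
    """A binary tree node; both children None means a terminal node."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def terminal_nodes(t):
    """The set T~ of terminal (leaf) nodes of the tree rooted at t."""
    if t.left is None and t.right is None:
        return [t]
    return terminal_nodes(t.left) + terminal_nodes(t.right)

def all_nodes(t):
    """All nodes of the tree rooted at t (internal plus terminal)."""
    if t.left is None and t.right is None:
        return [t]
    return [t] + all_nodes(t.left) + all_nodes(t.right)

# A binary tree with 4 terminal nodes:
T = Node(Node(Node(), Node()), Node(Node(), Node()))
print(len(terminal_nodes(T)))  # 4
print(len(all_nodes(T)))       # 7, i.e. |T| = 2|T~| - 1
```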
Illustration of concepts: page 31 (Source: LeBlanc and Crowley, JASA, 1993).

To evaluate branches, define the goodness of fit of a tree as

    R(T) = Σ_{t ∈ T̃} R(t),

where R(t) measures the quality (or goodness of fit) of node t. Because our ultimate goal is to classify objects, R(t) is commonly chosen to be the misclassification rate.

Two types of errors:

- False positive error: a case with true response value 0 (-) is falsely classified as 1 (+).
- False negative error: a case with true response value 1 (+) is falsely classified as 0 (-).

The two types of errors may need to be weighted with different costs.

Modifying Majority Voting by Incorporating Misclassification Cost

The class membership of 0 or 1 for a node now depends on whether the total cost of the false positive errors is higher or lower than that of the false negative errors. Let c(i | j) denote the cost associated with misclassifying a true class j as class i. Node t is assigned to class j (j = 0 or 1) if j has the smaller misclassification cost, i.e.,

    c(j | 1-j) p(y = 1-j | t) <= c(1-j | j) p(y = j | t),

or, in terms of the cases in node t,

    Σ_{i ∈ t : y_i = 1-j} c(j | y_i) <= Σ_{i ∈ t : y_i = j} c(1-j | y_i).
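The cost-weighted assignment rule can be sketched from the node's class counts (the function name and the default unit costs are mine):

```python
def assign_class(n0, n1, c10=1.0, c01=1.0):
    """Assign a node to class 0 or 1 by minimizing misclassification cost.

    n0, n1 : counts of class-0 and class-1 cases in the node
    c10    : cost c(1|0) of calling a true 0 a 1 (false positive)
    c01    : cost c(0|1) of calling a true 1 a 0 (false negative)
    """
    cost_if_0 = c01 * n1  # declaring class 0 misclassifies all class-1 cases
    cost_if_1 = c10 * n0  # declaring class 1 misclassifies all class-0 cases
    return 0 if cost_if_0 <= cost_if_1 else 1

print(assign_class(90, 10))          # equal costs reduce to majority vote: 0
print(assign_class(90, 10, c01=20))  # costly false negatives flip it: 1
```

With c10 = c01 the rule reduces to ordinary majority voting.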
Example: consider a node with 44 preterm (1) and 356 full-term (0) babies. Using the simple majority-vote principle, the node would be classified as full term. However, in order to reduce the error of misclassifying preterm babies as term babies, we may define the costs to be c(1 | 0) = 1 and c(0 | 1) = 10. What class will the node be assigned to?

The goodness-of-fit measure R(T) alone is not sufficient for determining which subtree is better, especially because larger trees typically have smaller values of R(T). To develop a better measure of the performance (predictive ability) of tree T, we penalize the misclassification cost by the tree size |T̃|. Define the cost-complexity measure of tree T as

    R_α(T) = R(T) + α |T̃|,

where α >= 0 is the complexity parameter, used to penalize large trees.

3.4 CART: Cost-Complexity Pruning

If the complexity parameter α is 0, then the initial tree T_0 is the best, i.e., it has the smallest cost-complexity measure; as the complexity parameter goes to infinity, the tree containing only the root node becomes the best. Note that as the complexity parameter increases from 0, there will be a link (internal node) h that first becomes ineffective. What do we mean by ineffective? The node h as a terminal node is better than the branch T_h, i.e.,

    R_α(h) <= R_α(T_h),

or

    R(h) + α <= R(T_h) + α |T̃_h|,  i.e.,  α >= (R(h) - R(T_h)) / (|T̃_h| - 1).

Let α_h = (R(h) - R(T_h)) / (|T̃_h| - 1); this is the threshold at which the internal node (link) h turns into a terminal node. Compute this threshold for every link. The link with the smallest threshold is identified as the weakest link h*.
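The cost-complexity measure itself is one line; the subtree sequence below is hypothetical, invented to show how increasing α shifts the minimizer toward smaller trees:

```python
def cost_complexity(R, n_terminal, alpha):
    """R_alpha(T) = R(T) + alpha * |T~| for a tree with misclassification
    cost R and n_terminal terminal nodes."""
    return R + alpha * n_terminal

# Hypothetical subtree sequence: pairs (R(T), |T~|). The resubstitution
# cost falls as the tree grows; the penalty term reverses the ranking
# once alpha is large enough.
subtrees = [(0.30, 1), (0.18, 3), (0.12, 6), (0.10, 12)]
for alpha in (0.0, 0.01, 0.05):
    best = min(subtrees, key=lambda t: cost_complexity(t[0], t[1], alpha))
    print(alpha, best)  # larger alpha favors smaller trees
```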
Denote the pruned subtree after truncating T_h* by T_1 = T_0 - T_h*, and repeat the same procedure: consider all internal nodes of T_1 and prune off its weakest link to obtain T_2, and so on.

The algorithm:

    j = 0; T = T_0
    while |T̃| >= 2:
        for every h ∈ T - T̃, compute α_h = (R(h) - R(T_h)) / (|T̃_h| - 1)
        j = j + 1
        α_j = min_h α_h; let h* be the corresponding link
        T_j = T - T_h*
        T = T_j

The pruning algorithm results in a nested sequence of optimally pruned subtrees

    T_0 ⊃ T_1 ⊃ ... ⊃ T_m,

where T_m denotes the tree with the root node only, and a corresponding sequence of thresholds satisfying 0 = α_0 < α_1 < ... < α_m. CART shows that for α ∈ [α_k, α_{k+1}), k = 0, ..., m, tree T_k is the smallest subtree that minimizes the cost-complexity measure R_α(T).

3.5 References

[1] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees, Chapman and Hall.
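The weakest-link algorithm above can be sketched on a toy tree. This is a simplified illustration, not CART itself: each node is a dict carrying an assumed precomputed resubstitution cost R(t), and terminal nodes simply lack children.

```python
import copy

def is_leaf(t):
    return "left" not in t

def branch_stats(t):
    """(R(T_h), |T~_h|): total cost of the branch's terminal nodes and
    the number of terminal nodes."""
    if is_leaf(t):
        return t["R"], 1
    RL, nL = branch_stats(t["left"])
    RR, nR = branch_stats(t["right"])
    return RL + RR, nL + nR

def internal_nodes(t):
    if is_leaf(t):
        return []
    return [t] + internal_nodes(t["left"]) + internal_nodes(t["right"])

def weakest_link_threshold(h):
    """alpha_h = (R(h) - R(T_h)) / (|T~_h| - 1)."""
    R_branch, n_term = branch_stats(h)
    return (h["R"] - R_branch) / (n_term - 1)

def prune_sequence(tree):
    """Return the nested subtree sequence T_0, T_1, ..., T_m and the
    thresholds 0 = alpha_0 < alpha_1 < ... < alpha_m."""
    tree = copy.deepcopy(tree)
    sequence, alphas = [copy.deepcopy(tree)], [0.0]
    while not is_leaf(tree):
        weakest = min(internal_nodes(tree), key=weakest_link_threshold)
        alphas.append(weakest_link_threshold(weakest))
        del weakest["left"], weakest["right"]  # h* becomes a terminal node
        sequence.append(copy.deepcopy(tree))
    return sequence, alphas

# A 3-leaf initial tree with assumed node costs:
T0 = {"R": 0.60,
      "left": {"R": 0.25, "left": {"R": 0.10}, "right": {"R": 0.05}},
      "right": {"R": 0.15}}
seq, alphas = prune_sequence(T0)
print(len(seq))  # 3 nested subtrees: 3 leaves, 2 leaves, root only
print(alphas)    # increasing thresholds: 0 < ~0.10 < ~0.20
```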
[Insert: LeBlanc's 1993 JASA paper.]
4 Tree Size Selection

As the third step in the CART algorithm, we now need to identify one (or several) optimally sized tree from the subtree sequence as the final tree model. This step is equivalent to selecting the best tree size. A natural approach is to choose the subtree that optimizes an estimate of a performance measure. However, the resubstitution estimate based on the training sample tends to be overly optimistic because of the very adaptive nature of decision trees. We need validation methods (test sample and cross-validation) to develop a more honest estimate of performance.

4.1 The Test Sample Method

Step 1: Split the data randomly into two sets: the learning sample L_1 (66.7%) and the test sample L_2 (33.3%).

- The learning sample is also called the training sample, and the test sample is sometimes called the validation sample.
- The above 2:1 ratio is quoted from CART. A different ratio may be applied depending on the total sample size. For example, when a huge amount of data is available, one may apply a larger proportion for the test sample, e.g., a 1:1 ratio for the learning and test samples.
- Stratified sampling may be applied, based on the outcome variable or important input variables.

Step 2: Using the training sample L_1 only, grow a large initial tree and then prune it back to obtain a nested sequence of subtrees T_0 ⊃ ... ⊃ T_M.

Step 3: Send the test sample L_2 down each subtree and compute the misclassification cost R^ts(T_m) based on the test sample for each subtree T_m, m = 0, 1, ..., M. The subtree with the smallest misclassification cost is then selected as the best subtree T*. That is,

    R^ts(T*) = min_m R^ts(T_m).
Once the best subtree T* is determined, R^ts(T*) is used as an estimate of its misclassification cost.

Advantages and disadvantages of the test sample method:

- Very straightforward.
- Does not use all of the data: the sequence of subtrees, from which the best tree model is selected, is based on only 2/3 of the data, and the estimate of the misclassification cost R(T*) is based on only 1/3 of the data. Hence the test-sample-based estimator of tree performance has high variance when we do not have much data.

4.2 Cross-Validation (CV)

- Often applied when the sample size is moderate or small. (Even when the sample size seems large, we may still not have data to waste if the target variable is sparse or if there are a large number of input variables.)
- Does not waste data.
- One of the resampling techniques, which generate samples from the one sample at hand; other resampling techniques include the bootstrap and the jackknife.

V-fold cross-validation:

Step 1: The whole sample L is randomly divided into V subsets L_v, v = 1, ..., V. The sample sizes of the V subsets should be all equal, or as nearly equal as possible.

- The v-th learning sample is L^(v) = L - L_v, v = 1, ..., V. The v-th subset L_v is used as the test sample corresponding to L^(v).
- The value of V needs to be reasonably large so that the size of each learning sample, a fraction (V-1)/V of the data, is close to the size of the full data set. CART's suggestion is V = 10, in which case each learning sample contains 90% of the data and each test sample contains 10% of the data.
- Stratified sampling may be used to ensure balance on important variables.

Step 2: For each fixed v = 1, ..., V, grow a large initial tree and prune it back using only L^(v). The pruning procedure provides a nested sequence of optimally pruned subtrees

    T_0^(v) ⊃ T_1^(v) ⊃ ... ⊃ T_M^(v).

Also grow and prune a tree based on all of the data to obtain the nested sequence of best pruned subtrees T_0 ⊃ T_1 ⊃ ... ⊃ T_M and a corresponding sequence of complexity parameters 0 = α_0 < α_1 < ... < α_M < α_{M+1} = ∞.

Step 3: Now we want to select the best subtree from the subtree sequence T_0 ⊃ T_1 ⊃ ... ⊃ T_M based on the minimum misclassification cost. How do we achieve this through V-fold cross-validation? Let us first review an important property of the CART pruning procedure.

Theorem (Theorem 3.10 in CART, page 71): For m >= 1, T_m is the smallest subtree that minimizes the cost-complexity measure R_α(T) for every complexity parameter α such that α_m <= α < α_{m+1}.

The above theorem implies that we can get the optimally pruned subtree for any penalty α from the efficient pruning algorithm. Define

    α'_m = sqrt(α_m α_{m+1}), m = 0, 1, ..., M,

so that each α'_m is the geometric midpoint of the interval [α_m, α_{m+1}). Here, {α_m : m = 0, 1, ..., M+1} are obtained by applying the cost-complexity pruning algorithm to the entire sample L. Note that for each v = 1, ..., V, we have the optimally pruned subtree T^(v)(α'_m) for complexity parameter α'_m, m = 1, ..., M.

Now we want to find the complexity parameter α' that minimizes the average of the estimated misclassification costs over v = 1, ..., V.

Fix the value of v: for each m = 1, ..., M, L_v is sent down the tree T^(v)(α'_m), and the quantity

    R^CV_v(T^(v)(α'_m)) = Σ_{t ∈ T̃^(v)(α'_m)} Σ_{i : (x_i, y_i) ∈ t ∩ L_v} R(i)

is calculated, where R(i) is the misclassification cost for observation i, which belongs to L_v and falls into terminal node t.
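The fold construction and the geometric midpoints can be sketched as follows (function names are mine; the midpoint helper is applied to finite thresholds only, since α_{M+1} = ∞):

```python
import math
import random

def make_folds(n, V=10, seed=1):
    """Randomly divide indices 0..n-1 into V near-equal subsets L_1..L_V."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[v::V] for v in range(V)]

def learning_sample(folds, v):
    """The v-th learning sample L^(v) = L - L_v."""
    return [i for u, fold in enumerate(folds) if u != v for i in fold]

def geometric_midpoints(alphas):
    """alpha'_m = sqrt(alpha_m * alpha_{m+1}) for consecutive finite thresholds."""
    return [math.sqrt(a * b) for a, b in zip(alphas, alphas[1:])]

folds = make_folds(146, V=10)
print(sorted(len(f) for f in folds))   # fold sizes differ by at most 1
print(len(learning_sample(folds, 0)))  # roughly 90% of the 146 cases
print(geometric_midpoints([0.0, 0.1, 0.2, 0.4]))
```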
Sum over v: the above quantity is summed over the V subsamples to obtain

    R^CV(T(α'_m)) = Σ_{v=1}^{V} R^CV_v(T^(v)(α'_m)).

The best pruned subtree can then be defined as the subtree T(α'_*) that minimizes the cross-validated estimate of the misclassification cost:

    R^CV(T(α'_*)) = min_m R^CV(T(α'_m)).

4.3 The 1-SE Rule

There is one problem with methods based on honest estimates of the misclassification cost (test sample and cross-validation). The estimate of the misclassification cost (or prediction error) tends to decrease rapidly as the tree size grows from the root node. Then there is a wide, flat valley in which the estimated misclassification cost rises slowly as the number of terminal nodes gets large (see the figure on page 79 of CART). Breiman et al. (1984) note that there may be considerable variability in the location of the minimum misclassification cost. CART proposes an ad hoc fix, namely the 1-SE rule.

The 1-SE rule is designed

- to keep the tree as simple as possible without sacrificing much accuracy, and
- to reduce instability in tree selection.

CART selects the smallest subtree T** such that R̂(T**) is no more than one standard error greater than R̂(T*), where R̂ denotes either R^ts or R^CV and T* denotes the best subtree from the corresponding validation method. Namely, among all subtrees with R̂ less than R̂(T*) + SE(R̂(T*)), T** has the smallest size.

4.4 References

[1] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees, Chapman and Hall.
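The 1-SE selection can be sketched as follows. The cost, standard-error, and size values below are hypothetical, chosen to mimic the wide flat valley described above; the function name is mine.

```python
def one_se_rule(costs, ses, sizes):
    """Pick the smallest subtree whose estimated cost is within one
    standard error of the minimum: R^(T**) <= R^(T*) + SE(R^(T*)).

    costs : cross-validated (or test-sample) cost estimates per subtree
    ses   : standard errors of those estimates
    sizes : number of terminal nodes per subtree
    """
    best = min(range(len(costs)), key=lambda m: costs[m])
    threshold = costs[best] + ses[best]
    eligible = [m for m in range(len(costs)) if costs[m] <= threshold]
    return min(eligible, key=lambda m: sizes[m])

# Hypothetical subtree sequence: costs drop fast, then rise slowly.
costs = [0.40, 0.26, 0.215, 0.21, 0.22, 0.23]
ses   = [0.05, 0.04, 0.03, 0.03, 0.03, 0.03]
sizes = [1, 3, 6, 10, 14, 20]
print(one_se_rule(costs, ses, sizes))  # 2: the 6-leaf tree, not the 10-leaf minimizer
```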
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19  Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19  Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.standrews.ac.uk twk@standrews.ac.uk Tom Kelsey ID505919B &
More informationClassification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and treebased classification techniques.
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model
10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1
More informationInsurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20thcentury statistics dealt with maximum likelihood
More informationEXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationREPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB NO. 07040188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationFeature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE541 28 Skövde
More informationDecision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
More informationENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationDECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com
More informationASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS
DATABASE MARKETING Fall 2015, max 24 credits Dead line 15.10. ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS PART A Gains chart with excel Prepare a gains chart from the data in \\work\courses\e\27\e20100\ass4b.xls.
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme
More informationClassification and Regression Trees (CART) Theory and Applications
Classification and Regression Trees (CART) Theory and Applications A Master Thesis Presented by Roman Timofeev (188778) to Prof. Dr. Wolfgang Härdle CASE  Center of Applied Statistics and Economics Humboldt
More informationTRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationDistances, Clustering, and Classification. Heatmaps
Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be
More informationLecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationThe More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner
Paper 33612015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,
More informationModelBased Recursive Partitioning for Detecting Interaction Effects in Subgroups
ModelBased Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Webbased Analytics Table
More informationIBM SPSS Decision Trees 21
IBM SPSS Decision Trees 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 104. This edition applies to IBM SPSS Statistics 21 and to all
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationTHE HYBRID CARTLOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CATLOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most datamining projects involve classification problems assigning objects to classes whether
More informationCLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationDecisionTree Learning
DecisionTree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: TopDown Induction of Decision Trees Numeric Values Missing Values
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More informationApplied Multivariate Analysis  Big data analytics
Applied Multivariate Analysis  Big data analytics Nathalie VillaVialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org M1 in Economics and Economics and Statistics Toulouse School of
More informationSmart Grid Data Analytics for Decision Support
1 Smart Grid Data Analytics for Decision Support Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA Prakash.Ranganathan@engr.und.edu, 7017774431
More informationL13: crossvalidation
Resampling methods Cross validation Bootstrap L13: crossvalidation Bias and variance estimation with the Bootstrap Threeway data partitioning CSCE 666 Pattern Analysis Ricardo GutierrezOsuna CSE@TAMU
More informationEfficiency in Software Development Projects
Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University aneeshchinubhai@gmail.com Abstract A number of different factors are thought to influence the efficiency of the software
More informationL25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo GutierrezOsuna
More informationDecision Trees for Predictive Modeling
Decision Trees for Predictive Modeling Padraic G. Neville SAS Institute Inc. 4 August 1999 What a Decision Tree Is...................2 What to Do with a Tree................... 3 Variable selection Variable
More informationData Mining Algorithms for Classification
Data Mining Algorithms for Classification BSc Thesis Artificial Intelligence Author: Patrick Ozer Radboud University Nijmegen January 2008 Supervisor: Dr. I.G. SprinkhuizenKuyper Radboud University Nijmegen
More informationCART: Classification and Regression Trees
Chapter 10 CART: Classification and Regression Trees Dan Steinberg Contents 10.1 Antecedents... 180 10.2 Overview... 181 10.3 A Running Example... 181 10.4 The Algorithm Briefly Stated... 183 10.5 Splitting
More informationLeveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS1332014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
More informationDecision Trees. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. www.cs.cmu.edu/~awm awm@cs.cmu.
Decision Trees Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm awm@cs.cmu.edu 422687599 Copyright Andrew W. Moore Slide Decision Trees Decision trees
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 20092010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationRegression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationTHE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS XIAO CHENG. (Under the Direction of Jeongyoun Ahn)
THE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS by XIAO CHENG (Under the Direction of Jeongyoun Ahn) ABSTRACT Big Data has been the new trend in businesses.
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationREVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationS032008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY
S032008 The Difference Between Predictive Modeling and Regression Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT Predictive modeling includes regression, both logistic and linear,
More informationIntroduction to Learning & Decision Trees
Artificial Intelligence: Representation and Problem Solving 538 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning?  more than just memorizing
More information2 Decision tree + Crossvalidation with R (package rpart)
1 Subject Using crossvalidation for the performance evaluation of decision trees with R, KNIME and RAPIDMINER. This paper takes one of our old study on the implementation of crossvalidation for assessing
More informationHow to Conduct a Hypothesis Test
How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationMicrosoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
More informationA Study Of Bagging And Boosting Approaches To Develop MetaClassifier
A Study Of Bagging And Boosting Approaches To Develop MetaClassifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet524121,
More informationCART 6.0 Feature Matrix
CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationPASS Sample Size Software
Chapter 250 Introduction The Chisquare test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial
More informationA Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 114222016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
More informationDECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four
More informationData Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 20150305
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 20150305 Roman Kern (KTI, TU Graz) Ensemble Methods 20150305 1 / 38 Outline 1 Introduction 2 Classification
More informationIncreasing Classification Accuracy. Data Mining: Bagging and Boosting. Bagging 1. Bagging 2. Bagging. Boosting Metalearning (stacking)
Data Mining: Bagging and Boosting Increasing Classification Accuracy Andrew Kusiak 2139 Seamans Center Iowa City, Iowa 522421527 andrewkusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel: 319335
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002Topics in StatisticsBiological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationBOOSTING  A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on elearning (elearning2014), 2223 September 2014, Belgrade, Serbia BOOSTING  A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining BecerraFernandez, et al.  Knowledge Management 1/e  2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationNonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning
Nonnegative Matrix Factorization (NMF) in Semisupervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
More informationANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India lav_dlr@yahoo.com
More information