Package NHEMOtree. February 19, 2015



Type: Package
Package: NHEMOtree
Title: Non-hierarchical evolutionary multi-objective tree learner to perform cost-sensitive classification
Depends: partykit, emoa, sets, rpart
Version: 1.0
Date: 2013-05-05
Author: Swaantje Casjens
Maintainer: Swaantje Casjens <swaantje.casjens@tu-dortmund.de>
Description: NHEMOtree performs cost-sensitive classification by solving the two-objective optimization problem of minimizing the misclassification rate and the total costs for classification. The three methods comprised in NHEMOtree are based on EMOAs with either tree representation or bitstring representation with an enclosed classification tree algorithm.
License: GPL-3
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2013-05-09 15:46:02

R topics documented:
    NHEMO
    NHEMOtree
    NHEMO_Cutoff
    plot.nhemotree
    Sim_Costs
    Sim_Data
    Wrapper
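All functions in the package take their variable costs from a CostMatrix, a two-column data frame with variable names in the first column and costs in the second. A minimal hand-built example of that structure (a sketch; the column names and cost values here are illustrative assumptions, not mandated by the package, which reads the columns by position):

```r
# Hand-built cost matrix: one row per explanatory variable,
# first column = variable name, second column = cost.
CostMatrix <- data.frame(
  Variable = paste0("X", 1:10),
  Costs    = c(5, 4, 4, 3, 2, 1, 1, 1, 1, 1)
)
head(CostMatrix)
```

Sim_Costs() returns a ready-made cost matrix of this shape for the simulated data from Sim_Data().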

NHEMO    Non-hierarchical evolutionary multi-objective tree learner to perform cost-sensitive classification

Description

NHEMO performs cost-sensitive classification by solving the non-hierarchical evolutionary two-objective optimization problem of minimizing the misclassification rate and the total costs for classification. NHEMO is based on an EMOA with tree representation, with cutoff optimization conducted by the EMOA.

Usage

NHEMO(data, CostMatrix, gens = 50, popsize = 50, max_nodes = 10,
      ngens = 14, bound = 10^-10, init_prob = 0.8,
      ps = c("tournament", "roulette", "winkler"), tournament_size = 4,
      crossover = c("standard", "brood", "poli"), brood_size = 4,
      crossover_prob = 0.5, mutation_prob = 0.5, CV = 5, vim = 0, ...)

Arguments

data: A data frame containing the class for each observation in its first column and, in the remaining columns, the variables specified in formula.
CostMatrix: A data frame containing the names (first column) and the costs (second column) of each explanatory variable in formula or x. NHEMOtree does not work with missing data in CostMatrix.
gens: Maximal number of generations of the EMOA (default: gens=50).
popsize: Population size in each generation of the EMOA (default: popsize=50).
max_nodes: Maximal number of nodes within each tree (default: max_nodes=10).
ngens: Preceding generations for Online Convergence Detection (OCD, default: ngens=14) (see Details).
bound: Variance limit for Online Convergence Detection (default: bound=10^-10).
init_prob: Degree of initialization in [0, 1], i.e. the probability of a node having a subnode (default: init_prob=0.8).
ps: Type of parent selection: "tournament" for tournament selection (default), "roulette" for roulette-wheel selection, and "winkler" for random selection of the first parent and roulette-wheel selection of the second.
tournament_size: Size of the tournament for ps="tournament" (default: tournament_size=4).

crossover: Crossover operator: "standard" for one-point crossover swapping two randomly chosen subtrees of the parents, "brood" for brood crossover (for details see Tackett (1994)), "poli" for a size-dependent crossover with the crossover point taken from the so-called common region of both parents (for details see Poli and Langdon (1998)).
brood_size: Number of offspring created by brood crossover (default: brood_size=4).
crossover_prob: Probability to perform crossover, in [0, 1] (default: crossover_prob=0.5).
mutation_prob: Probability to perform mutation, in [0, 1] (default: mutation_prob=0.5).
CV: Number of cross-validation steps, a natural number greater than 1 (default: CV=5).
vim: Variable importance measure used to improve standard crossover: vim=0 for no variable importance measure (default), vim=1 for simple absolute frequency, vim=2 for simple relative frequency, vim=3 for relative frequency, vim=4 for linearly weighted relative frequency, vim=5 for exponentially weighted relative frequency, and vim=6 for permutation accuracy importance.
...: Arguments passed to or from other methods.

Details

With NHEMO a cost-sensitive classification can be performed, i.e. a tree learner is built by solving a two-objective optimization problem that minimizes both the misclassification rate and the total costs for classification (the sum of the costs of all variables used in the classifier). The non-hierarchical evolutionary multi-objective tree learner performs this multi-objective optimization based on an EMOA with tree representation. It optimizes both objectives simultaneously, without any hierarchy, and generates Pareto-optimal classifiers, which are trees, to solve the problem. Cutoffs of the tree learners are optimized by the EMOA.

Termination criteria are the maximal number of generations and the Online Convergence Detection (OCD) proposed by Wagner and Trautmann (2010). Here, OCD uses the dominated hypervolume as quality criterion.
If its variance over the last g generations is significantly below a given threshold L according to the one-sided χ²-variance test, OCD stops the run. We followed the suggestion of Wagner and Trautmann (2010) and used their parameter settings as default values.

Missing data in the grouping variable or the explanatory variables are excluded from the analysis automatically. NHEMO does not work with missing data in CostMatrix.

Setting the costs of all explanatory variables to 1 results in optimizing the number of explanatory variables in the tree learner as second objective.

Author(s)

Swaantje Casjens

References

R. Poli and W.B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231-252, 1998.
W.A. Tackett. Recombination, selection and the genetic construction of computer programs. PhD thesis, University of Southern California, 1994.
T. Wagner and H. Trautmann. Online convergence detection for evolutionary multi-objective algorithms revisited. In: IEEE Congress on Evolutionary Computation, 1-8, 2010.

See Also

NHEMOtree

Examples

# Simulation of data and costs
d <- Sim_Data(Obs=200)
CostMatrix <- Sim_Costs()

# NHEMO calculations with function NHEMOtree and method="NHEMO"
res <- NHEMOtree(method="NHEMO", formula=y2~., data=d, CostMatrix=CostMatrix,
                 gens=5, popsize=10, max_nodes=5, ngens=5, bound=10^-10,
                 init_prob=0.8, ps="tournament", tournament_size=4,
                 crossover="standard", crossover_prob=0.1, mutation_prob=0.1,
                 CV=5, vim=1)
res


NHEMOtree    Non-hierarchical evolutionary multi-objective tree learner to perform cost-sensitive classification

Description

NHEMOtree performs cost-sensitive classification by solving the two-objective optimization problem of minimizing the misclassification rate and the total costs for classification. The three methods comprised in NHEMOtree ("NHEMO", "NHEMO_Cutoff", "Wrapper") are based on EMOAs with either tree representation or bitstring representation with a classification tree as enclosed classifier.

Usage

NHEMOtree(method = c("NHEMO", "NHEMO_Cutoff", "Wrapper"), formula, data,
          x, grouping, CostMatrix, ...)

Arguments

method: Optimization method: "NHEMO" for standard non-hierarchical evolutionary multi-objective optimization with cutoff optimization by the EMOA, "NHEMO_Cutoff" for NHEMO with local cutoff optimization, "Wrapper" for the wrapper approach (see Details).
formula: A formula of the form groups ~ x1 + x2 + ... The response is the grouping factor and the right-hand side specifies the explanatory variables.
data: Data frame from which the variables specified in formula are preferentially to be taken.

x: (Required if no formula is given as the principal argument.) A matrix or data frame containing the explanatory variables.
grouping: (Required if no formula is given as the principal argument.) A factor specifying the class for each observation.
CostMatrix: A data frame containing the names (first column) and the costs (second column) of each explanatory variable in formula or x. NHEMOtree does not work with missing data in CostMatrix.
...: Arguments passed to or from other methods.

Details

With NHEMOtree three types of cost-sensitive classification can be performed, i.e. a tree learner is built by solving a two-objective optimization problem that minimizes both the misclassification rate and the total costs for classification (the summed costs of all variables used in the classifier).

First, the non-hierarchical evolutionary multi-objective tree learner (NHEMOtree) performs this multi-objective optimization based on an EMOA with tree representation (method="NHEMO"). It optimizes both objectives simultaneously, without any hierarchy, and generates Pareto-optimal classifiers, which are binary trees, to solve the problem. Cutoffs of the tree learners are optimized by the EMOA.

Second, NHEMOtree with local cutoff optimization works like NHEMOtree, but the cutoffs of the tree learner are optimized analogously to a classification tree with recursive partitioning based either on the Gini index or on the misclassification rate (method="NHEMO_Cutoff").

Third, a wrapper approach based on NSGA-II with an enclosed classification tree algorithm can be chosen to solve the multi-objective optimization problem (method="Wrapper"). The classification trees are built with rpart. However, wrapper approaches suffer from a hierarchy in the objectives, in which the misclassification rate is minimized first, followed by the costs.
Termination criteria of the EMOAs are the maximal number of generations and the Online Convergence Detection (OCD) proposed by Wagner and Trautmann (2010). Here, OCD uses the dominated hypervolume as quality criterion. If its variance over the last g generations is significantly below a given threshold L according to the one-sided χ²-variance test, OCD stops the run. We followed the suggestion of Wagner and Trautmann (2010) and used their parameter settings as default values.

Missing data in the grouping variable or the explanatory variables are excluded from the analysis automatically. NHEMOtree does not work with missing data in CostMatrix.

Setting the costs of all explanatory variables to 1 results in optimizing the number of explanatory variables in the tree learner as second objective.

Value

Values returned by NHEMOtree are

S_Metric: S metric of the final population.
Misclassification_total: Range of the misclassification rate of the final population.

Costs_total: Range of the costs of the final population.
VIMS: Variable importance measures of all explanatory variables for method=c("NHEMO", "NHEMO_Cutoff").
Trees: Individuals of the final population.
Best_Tree: Individual of the final population with the lowest misclassification rate, together with its misclassification rate and costs, for method="Wrapper".
S_Metrik_temp: S metric values of each generation.
method: Method run with NHEMOtree.

Author(s)

Swaantje Casjens

References

L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, 1984.
K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, 2002.
R. Poli and W.B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231-252, 1998.
W.A. Tackett. Recombination, selection and the genetic construction of computer programs. PhD thesis, University of Southern California, 1994.
T. Wagner and H. Trautmann. Online convergence detection for evolutionary multi-objective algorithms revisited. In: IEEE Congress on Evolutionary Computation, 1-8, 2010.

See Also

plot.nhemotree, NHEMO, NHEMO_Cutoff, Wrapper

Examples

# Simulation of data and costs
d <- Sim_Data(Obs=200)
CostMatrix <- Sim_Costs()

# NHEMOtree calculations
res <- NHEMOtree(method="NHEMO", formula=y2~., data=d, CostMatrix=CostMatrix,
                 gens=5, popsize=5, crossover_prob=0.1, mutation_prob=0.1)
res
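As noted in Details, setting every variable's cost to 1 turns the second objective into the number of explanatory variables used by the tree. A unit-cost matrix can be derived directly from the data; this is a sketch using a toy stand-in for the simulated data, and the column names of the cost matrix are illustrative assumptions:

```r
# Toy stand-in for Sim_Data output: class in the first column,
# 20 explanatory variables in the remaining columns.
d <- data.frame(y = factor(rep(1:4, 25)),
                matrix(rnorm(100 * 20), ncol = 20))

# Unit costs for every explanatory column (all columns except the class),
# so the second objective counts the variables used in the classifier.
UnitCosts <- data.frame(Variable = names(d)[-1],
                        Costs    = rep(1, ncol(d) - 1))
```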

NHEMO_Cutoff    Non-hierarchical evolutionary multi-objective optimization with local cutoff optimization

Description

NHEMO_Cutoff performs cost-sensitive classification by solving the non-hierarchical evolutionary two-objective optimization problem of minimizing the misclassification rate and the total costs for classification. NHEMO_Cutoff is based on an EMOA with tree representation and local cutoff optimization. Cutoffs of the tree learner are optimized analogously to a classification tree with recursive partitioning based either on the Gini index or on the misclassification rate.

Usage

NHEMO_Cutoff(data, CostMatrix, gens = 50, popsize = 50, max_nodes = 10,
             ngens = 14, bound = 10^-10, init_prob = 0.8,
             ps = c("tournament", "roulette", "winkler"), tournament_size = 4,
             crossover = c("standard", "brood", "poli"), brood_size = 4,
             crossover_prob = 0.5, mutation_prob = 0.5, CV = 5, vim = 0,
             ncutoffs = 10, opt = c("gini", "mcr"))

Arguments

data: A data frame containing the class for each observation in its first column and, in the remaining columns, the variables specified in formula.
CostMatrix: A data frame containing the names (first column) and the costs (second column) of each explanatory variable in formula or x. NHEMOtree does not work with missing data in CostMatrix.
gens: Maximal number of generations of the EMOA (default: gens=50).
popsize: Population size in each generation of the EMOA (default: popsize=50).
max_nodes: Maximal number of nodes within each tree (default: max_nodes=10).
ngens: Preceding generations for Online Convergence Detection (OCD, default: ngens=14) (see Details).
bound: Variance limit for Online Convergence Detection (default: bound=10^-10).
init_prob: Degree of initialization in [0, 1], i.e. the probability of a node having a subnode (default: init_prob=0.8).
ps: Type of parent selection: "tournament" for tournament selection (default), "roulette" for roulette-wheel selection, and "winkler" for random selection of the first parent and roulette-wheel selection of the second.
tournament_size: Size of the tournament for ps="tournament" (default: tournament_size=4).

crossover: Crossover operator: "standard" for one-point crossover swapping two randomly chosen subtrees of the parents, "brood" for brood crossover (for details see Tackett (1994)), "poli" for a size-dependent crossover with the crossover point taken from the so-called common region of both parents (for details see Poli and Langdon (1998)).
brood_size: Number of offspring created by brood crossover (default: brood_size=4).
crossover_prob: Probability to perform crossover, in [0, 1] (default: crossover_prob=0.5).
mutation_prob: Probability to perform mutation, in [0, 1] (default: mutation_prob=0.5).
CV: Number of cross-validation steps, a natural number greater than 1 (default: CV=5).
vim: Variable importance measure used to improve standard crossover: vim=0 for no variable importance measure (default), vim=1 for simple absolute frequency, vim=2 for simple relative frequency, vim=3 for relative frequency, vim=4 for linearly weighted relative frequency, vim=5 for exponentially weighted relative frequency, and vim=6 for permutation accuracy importance.
ncutoffs: Number of cutoffs per explanatory variable to be tested for optimality (default: ncutoffs=10).
opt: Type of local cutoff optimization: "gini" for local cutoff optimization by the Gini index (default), "mcr" for local cutoff optimization by the misclassification rate.

Details

The non-hierarchical evolutionary multi-objective tree learner (NHEMOtree) with local cutoff optimization solves a two-objective optimization problem that minimizes both the misclassification rate and the total costs for classification (the summed costs of all variables used in the classifier). It is based on an EMOA with tree representation. It optimizes both objectives simultaneously, without any hierarchy, and generates Pareto-optimal classifiers, which are binary trees, to solve the problem. Cutoffs of the tree learner are optimized analogously to a classification tree with recursive partitioning based either on the Gini index or on the misclassification rate.
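The parent-selection schemes selectable via ps can be sketched in a few lines of base R. This is a generic illustration of the operators, not the package's internal code; names and the convention that lower fitness (misclassification) is better are assumptions:

```r
# Tournament selection: draw tournament_size candidates at random,
# keep the index of the fittest (lowest fitness) among them.
tournament_select <- function(fitness, tournament_size = 4) {
  cand <- sample(seq_along(fitness), tournament_size)
  cand[which.min(fitness[cand])]
}

# Roulette-wheel selection: sample one index with probability
# proportional to inverted fitness, so fitter individuals are
# more likely to be chosen.
roulette_select <- function(fitness) {
  weights <- max(fitness) - fitness + 1e-9
  sample(seq_along(fitness), 1, prob = weights)
}

set.seed(1)
fit <- runif(50)              # toy fitness values for a population of 50
p1  <- tournament_select(fit) # first parent
p2  <- roulette_select(fit)   # second parent
```

Under this reading, ps="winkler" would pick the first parent uniformly at random and the second by the roulette wheel.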
Termination criteria of NHEMO_Cutoff are the maximal number of generations and the Online Convergence Detection (OCD) proposed by Wagner and Trautmann (2010). Here, OCD uses the dominated hypervolume as quality criterion. If its variance over the last g generations is significantly below a given threshold L according to the one-sided χ²-variance test, OCD stops the run. We followed the suggestion of Wagner and Trautmann (2010) and used their parameter settings as default values.

Missing data in the grouping variable or the explanatory variables are excluded from the analysis automatically. NHEMO_Cutoff does not work with missing data in CostMatrix.

Setting the costs of all explanatory variables to 1 results in optimizing the number of explanatory variables in the tree learner as second objective.

Author(s)

Swaantje Casjens

References

R. Poli and W.B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231-252, 1998.

W.A. Tackett. Recombination, selection and the genetic construction of computer programs. PhD thesis, University of Southern California, 1994.
T. Wagner and H. Trautmann. Online convergence detection for evolutionary multi-objective algorithms revisited. In: IEEE Congress on Evolutionary Computation, 1-8, 2010.

See Also

NHEMOtree

Examples

# Simulation of data and costs
d <- Sim_Data(Obs=200)
CostMatrix <- Sim_Costs()

# NHEMO_Cutoff calculations with function NHEMOtree and method="NHEMO_Cutoff"
res <- NHEMOtree(method="NHEMO_Cutoff", formula=y2~., data=d, CostMatrix=CostMatrix,
                 gens=5, popsize=5, max_nodes=5, ngens=5, bound=10^-10,
                 init_prob=0.8, ps="tournament", tournament_size=4,
                 crossover="standard", crossover_prob=0.1, mutation_prob=0.1,
                 CV=5, vim=1, ncutoffs=5, opt="mcr")
res


plot.nhemotree    Plot Method for Class NHEMOtree

Description

Generates different plots for class NHEMOtree.

Usage

## S3 method for class 'NHEMOtree'
plot(x, data, which = c("pareto", "S_Metric", "VIMS", "Tree"), vim = 0, ...)

Arguments

x: An object of class NHEMOtree.
data: Data which was used in NHEMOtree.
which: Type of plot: "Pareto" for the final Pareto front, "S_Metric" for the increase of the dominated hypervolume by generation, "VIMS" for the variable importance of each explanatory variable, "Tree" for the individual with the lowest misclassification rate.

vim: Variable importance measure to be plotted for which="VIMS": vim=1 for simple absolute frequency, vim=2 for simple relative frequency, vim=3 for relative frequency, vim=4 for linearly weighted relative frequency, vim=5 for exponentially weighted relative frequency, vim=6 for permutation accuracy importance (default: plots for all VIMs).
...: Arguments passed to or from other methods.

Author(s)

Swaantje Casjens

See Also

NHEMOtree

Examples

# Simulation of data and costs
d <- Sim_Data(Obs=200)
CostMatrix <- Sim_Costs()
res <- NHEMOtree(method="NHEMO", formula=y2~., data=d, CostMatrix=CostMatrix,
                 gens=5, popsize=5, crossover_prob=0.1, mutation_prob=0.1)

plot(data=d, x=res, which="p")
plot(data=d, x=res, which="s", col=3, type="l")
plot(data=d, x=res, which="v", vim=5)
plot(data=d, x=res, which="t")


Sim_Costs    Simulation of a cost matrix for use in NHEMOtree

Description

Simulation of a cost matrix for 10 variables, to be used in combination with Sim_Data and analyzed with NHEMOtree.

Usage

Sim_Costs()

Author(s)

Swaantje Casjens

See Also

Sim_Data, NHEMOtree

Examples

Sim_Costs()


Sim_Data    Simulation of data for use in NHEMOtree

Description

Simulation of data with one grouping variable containing four classes and 20 explanatory variables. Variables X1 to X3 are informative for separating the four classes: X1 separates class 1, X2 separates class 1 and class 2, and X3 separates class 3 from class 4. Variables X4, X5, and X6 are created on the basis of X3 and can also be used to separate class 3 from class 4, but with decreased prediction accuracy.

Usage

Sim_Data(Obs, VG=1, VP1=0.05, VP2=0.1, VP3=0.3)

Arguments

Obs: Number of observations.
VG: Overall accuracy for data separation in [0, 1], with VG=1 (default) for perfect separation.
VP1: Decrease of prediction accuracy of variable X4 in comparison with X3 for separating class 3 from class 4 (default: VP1=0.05).
VP2: Decrease of prediction accuracy of variable X5 in comparison with X3 for separating class 3 from class 4 (default: VP2=0.1).
VP3: Decrease of prediction accuracy of variable X6 in comparison with X3 for separating class 3 from class 4 (default: VP3=0.3).

Details

With this function, data with one grouping variable containing four classes and 20 explanatory variables is simulated. Variable X1 separates class 1, X2 separates class 1 and class 2, and X3 separates class 3 from class 4. For all samples belonging to the according classes, the explanatory variables X1 to X3 are drawn from a normal distribution with μ = 80 and σ² = 25. Samples not allocated to the corresponding class are drawn from a uniform distribution with minimum 0 and an adjustable maximum value; the maximum values of the uniform distributions are the smallest drawn random values of each variable.

Variables X4, X5, and X6 are created on the basis of X3 and also separate class 3 from class 4. However, the prediction accuracy of these variables decreases gradually; the decrease is set by VP1, VP2, and VP3.
Thus, the according share of the discriminating samples of former class 3 is disturbed by assigning a value drawn from a uniform distribution. Accordingly, X4, X5, and X6 discriminate class 3 worse than X3. X7 to X10 are noisy variables drawn from a normal distribution and contain no information.

Noise is added to the class assignment by a binomial distribution: each potential class is the equivalent class only with probability VG and one of the other classes with probability 1-VG.

Variable costs correlate with prediction accuracy, so that variables containing more information are more expensive than variables with little or no information. The costs of the variables are generated with the function Sim_Costs.

Author(s)

Swaantje Casjens

See Also

Sim_Costs, NHEMOtree

Examples

d <- Sim_Data(Obs=200)
head(d)


Wrapper    NSGA-II wrapper with enclosed classification tree algorithm

Description

This wrapper approach is based on the Nondominated Sorting Genetic Algorithm II (NSGA-II) with an enclosed classification tree algorithm. It performs cost-sensitive classification by solving the two-objective optimization problem of minimizing the misclassification rate and the total costs for classification.

Usage

Wrapper(data, CostMatrix, gens = 50, popsize = 50, ngens = 14,
        bound = 10^-10, init_prob = 0.8, crossover_prob = 0.5,
        mutation_prob = 0.5, CV = 5, ...)

Arguments

data: A data frame containing the class for each observation in its first column and, in the remaining columns, the variables specified in formula.
CostMatrix: A data frame containing the names (first column) and the costs (second column) of each explanatory variable in formula or x. NHEMOtree does not work with missing data in CostMatrix.

gens: Maximal number of generations of the EMOA (default: gens=50).
popsize: Population size in each generation of the EMOA (default: popsize=50).
ngens: Preceding generations for Online Convergence Detection (OCD, default: ngens=14) (see Details).
bound: Variance limit for Online Convergence Detection (default: bound=10^-10).
init_prob: Degree of initialization in [0, 1], i.e. the share of activated explanatory variables in the individuals of the initial population (default: init_prob=0.8).
crossover_prob: Probability to perform crossover, in [0, 1] (default: crossover_prob=0.5).
mutation_prob: Probability to perform mutation, in [0, 1] (default: mutation_prob=0.5).
CV: Number of cross-validation steps, a natural number greater than 1 (default: CV=5).
...: Arguments passed to or from other methods.

Details

With Wrapper, a wrapper approach based on the Nondominated Sorting Genetic Algorithm II (NSGA-II) with an enclosed classification tree algorithm is used to solve the two-objective optimization problem of minimizing the misclassification rate and the total costs for classification (the summed costs of all variables used in the classifier). The classification trees are constructed with rpart. However, wrapper approaches suffer from a hierarchy in the objectives, in which the misclassification rate is minimized first, followed by the costs. Parent selection is always performed by binary tournament, and only one-point crossover is available. Alternatives are the proposed non-hierarchical evolutionary multi-objective tree learners NHEMO and NHEMO_Cutoff, which do not suffer from a hierarchy in the objectives.

Termination criteria of the NSGA-II are the maximal number of generations and the Online Convergence Detection (OCD) proposed by Wagner and Trautmann (2010). Here, OCD uses the dominated hypervolume as quality criterion.
If its variance over the last g generations is significantly below a given threshold L according to the one-sided χ²-variance test, OCD stops the run. We followed the suggestion of Wagner and Trautmann (2010) and used their parameter settings as default values.

Missing data in the grouping variable or the explanatory variables are excluded from the analysis automatically. Wrapper does not work with missing data in CostMatrix.

Setting the costs of all explanatory variables to 1 results in optimizing the number of explanatory variables in the tree learner as second objective.

Author(s)

Swaantje Casjens

References

L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, 1984.
K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, 2002.

R. Poli and W.B. Langdon. Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6(3):231-252, 1998.
W.A. Tackett. Recombination, selection and the genetic construction of computer programs. PhD thesis, University of Southern California, 1994.
T. Wagner and H. Trautmann. Online convergence detection for evolutionary multi-objective algorithms revisited. In: IEEE Congress on Evolutionary Computation, 1-8, 2010.

See Also

NHEMOtree

Examples

# Simulation of data and costs
d <- Sim_Data(Obs=200)
CostMatrix <- Sim_Costs()

# Wrapper calculations with function NHEMOtree and method="Wrapper"
res <- NHEMOtree(method="Wrapper", formula=y2~., data=d, CostMatrix=CostMatrix,
                 gens=5, popsize=5, init_prob=0.8, ngens=14, bound=10^-10,
                 crossover_prob=0.1, mutation_prob=0.1, CV=5)
res
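The wrapper's bitstring representation encodes one bit per explanatory variable (active or not), and recombination is restricted to one-point crossover. The operator can be illustrated in base R; this is a generic sketch, not the package's internal code, and the function name is an assumption:

```r
# One-point crossover for bitstring individuals: cut both parents at the
# same random position and swap the tails.
one_point_crossover <- function(p1, p2) {
  stopifnot(length(p1) == length(p2))
  cut <- sample(seq_len(length(p1) - 1), 1)
  list(c(p1[1:cut], p2[(cut + 1):length(p2)]),
       c(p2[1:cut], p1[(cut + 1):length(p1)]))
}

set.seed(42)
parent1 <- rep(1, 10)  # all 10 variables active
parent2 <- rep(0, 10)  # no variables active
offspring <- one_point_crossover(parent1, parent2)
```

Each offspring inherits a prefix from one parent and a suffix from the other, so between the two children every bit position still carries exactly one copy of each parent's value.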

Index

Topic Classification tree:
    Wrapper
Topic Classification:
    NHEMO, NHEMO_Cutoff, NHEMOtree, plot.nhemotree
Topic Evolutionary algorithms:
    NHEMO, NHEMO_Cutoff, NHEMOtree, plot.nhemotree, Wrapper
Topic Multi-objective optimization:
    NHEMO, NHEMO_Cutoff, NHEMOtree, plot.nhemotree, Wrapper
Topic Non-hierarchical evolutionary multi-objective tree learner:
    NHEMO, NHEMO_Cutoff, NHEMOtree, Sim_Costs, Sim_Data
Topic Nondominated Sorting Genetic Algorithm II:
    Wrapper
Topic Plots for NHEMOtree:
    plot.nhemotree
Topic Wrapper approach:
    Wrapper

NHEMO, NHEMO_Cutoff, NHEMOtree, plot.nhemotree, print.nhemotree (NHEMOtree), Sim_Costs, Sim_Data, Wrapper