Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Behrouz MINAEI-BIDGOLI 1, Gerd KORTEMEYER 2, and William F. PUNCH 1

1 Genetic Algorithms Research and Applications Group (GARAGe), Department of Computer Science & Engineering, Michigan State University, East Lansing, MI 48824, USA, {minaeibi, punch}@cse.msu.edu, http://garage.cse.msu.edu

2 Division of Science and Math Education, Michigan State University, LITE lab, East Lansing, MI 48824, USA, korte@lite.msu.edu, http://www.lon-capa.org

ABSTRACT

A key objective of data mining is to uncover the hidden relationships among the objects in a given data set. Classification has emerged as a popular task for finding groups of similar objects in order to predict unseen test data points. Classification fusion combines multiple classifications of data into a single classification solution of higher accuracy. Feature extraction aims to reduce the computational cost of feature measurement, increase classifier efficiency, and allow greater classification accuracy based on the process of deriving new features from the original features. Web-based educational systems now collect vast amounts of data on user patterns, and data mining methods can be applied to these databases to discover interesting associations based on students' features, problems' attributes, and the actions taken by students in solving homework and exam problems. This paper presents an approach for classifying students in order to predict their final grades based on features extracted from logged data in an educational web-based system. By weighting feature vectors to represent feature importance using a Genetic Algorithm, we can optimize the prediction accuracy and obtain significant improvement over raw classification. This work represents a rigorous application of known classifiers as a means of analyzing and comparing the use and performance of students who have taken a technical course that was partially or completely administered via the web.
Keywords: Web-based Educational System, Data Mining, Classification Fusion, Prediction, Genetic Algorithm, Feature Extraction, Feature Importance.

1. INTRODUCTION

The ever-increasing progress of network-distributed computing, and particularly the rapid expansion of the web, has had a broad impact on society in a relatively short period of time. Education is on the brink of a new era based on these changes. Online delivery of educational instruction provides the opportunity to bring colleges and universities new energy, students, and revenues. Many leading educational institutions are working to establish an online teaching and learning presence. Several web-based educational systems with different capabilities and approaches have been developed to deliver online education in an academic setting. In particular, Michigan State University (MSU) has pioneered some of these systems to provide an infrastructure for online instruction. The research presented here was performed on part of the latest online educational system developed at MSU, the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA).

LON-CAPA is involved with three kinds of large data sets: 1) educational resources such as web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations; 2) information about users who create, modify, assess, or use these resources; and 3) activity log databases which record the actions taken by students in solving homework and exam problems. In other words, we have three ever-growing pools of data. This paper investigates methods for extracting useful and interesting patterns from these large databases using online educational resources and their recorded paths
within the system. We aim to answer the following research questions: Can we find classes of students? In other words, do there exist groups of students who use these online resources in a similar way? If so, can we predict a class for any individual student? With this information, can we then help a student use the resources better, based on the usage of the resources by other students in their group? We find similar patterns of use in the data gathered from LON-CAPA, and eventually make predictions as to the most beneficial course of studies for each learner based on their past and present usage. The system could then make suggestions to the learner as to how best to proceed.

2. BACKGROUND

Genetic Algorithms (GAs) have been shown to be an effective tool for data analysis and pattern recognition [1], [2], [3]. An important aspect of GAs in a learning context is their use in pattern recognition. There are two different approaches to applying a GA in pattern recognition:

1. Apply a GA directly as a classifier. Bandyopadhyay and Murthy [4] applied a GA to find the decision boundary in an N-dimensional feature space.

2. Use a GA as an optimization tool for resetting the parameters in other classifiers. Most applications of GAs in pattern recognition optimize some parameters in the classification process. Many researchers have used GAs for feature selection [5], [6], [7], [8]. GAs have been applied to find an optimal set of feature weights that improve classification accuracy. First, a traditional feature extraction method such as Principal Component Analysis (PCA) is applied, and then a classifier such as k-NN is used to calculate the fitness function for the GA [9], [10]. Combination of classifiers is another area in which GAs have been used for optimization. Kuncheva and Jain [11] used a GA for selecting the features as well as the types of the individual classifiers in their design of a Classifier Fusion System. GAs have also been used for selecting the prototypes in case-based classification [12].

In this paper we focus on the second approach and use a GA to optimize a combination of classifiers.
Our objective is to predict the students' final grades based on their web-use features, which are extracted from the homework data. We design, implement, and evaluate a series of pattern classifiers with various parameters in order to compare their performance on a dataset from LON-CAPA. Error rates for the individual classifiers, their combination, and the GA-optimized combination are presented.

Two approaches have been proposed for the problem of feature extraction and selection. The filter model chooses features by heuristically determined goodness/relevance or by prior knowledge, while the wrapper model does this by the feedback of classifier evaluation, or experiment. Research has shown that the wrapper model outperforms the filter model in terms of predictive power on unseen data [13]. We propose a wrapper model for feature extraction through setting different weights for the features and getting feedback from ensembles of classifiers.

3. DATASETS AND FEATURES

We selected 14 student/course data sets of LON-CAPA courses held at MSU in the spring semester 2002 (SS02), fall semester 2002 (FS02), spring semester 2003 (SS03), and fall semester 2003 (FS03), as shown in Table 1.

Table 1. The 14 MSU courses held on LON-CAPA

Course    Title                                       Term
ADV 205   Principles of Advertising                   SS03
BS 111    Biological Science: Cells and Molecules     SS02
BS 111    Biological Science: Cells and Molecules     SS03
CE 280    Civil Engineering: Intro Environment Eng.   SS03
FI 414    Advanced Business Finance (w)               SS03
LBS 271   Lyman Briggs School: Physics I              FS02
LBS 272   Lyman Briggs School: Physics II             SS03
MT 204    Medical Tech.: Mechanisms of Disease        SS03
MT 432    Clinic Immun. & Immunohematology            SS03
PHY 183   Physics Scientists & Engineers I            SS02
PHY 183   Physics Scientists & Engineers I            SS03
PHY 231c  Introductory Physics I                      SS03
PHY 232c  Introductory Physics II                     FS03
PHY 232   Introductory Physics II                     FS03

Table 2.
Characteristics of the courses held on LON-CAPA

Course    # of       # of      Size of       Size of      # of
          Students   Problems  Activity Log  Useful Data  Transactions
ADV 205   609        773        82.5 MB      12.1 MB        424,481
BS 111    372        229       361.2 MB      34.1 MB      1,112,394
BS 111    402        229       367.6 MB      50.2 MB      1,689,656
CE 280    178        196        28.9 MB       3.5 MB        127,779
FI 414    169         68        16.8 MB       2.2 MB         83,715
LBS 271   132        174       119.8 MB      18.7 MB        706,700
LBS 272   102        166        73.9 MB      15.3 MB        585,524
MT 204     27        150         5.2 MB       0.7 MB         23,741
MT 432     62        150        20.0 MB       2.4 MB         90,120
PHY 183   227        184       140.3 MB      21.3 MB        452,342
PHY 183   306        255       210.1 MB      26.8 MB        889,775
PHY 231c   99        247        67.2 MB      14.1 MB        536,691
PHY 232c   83        194        55.1 MB      10.9 MB        412,646
PHY 232   220        259       138.5 MB      19.7 MB        981,568

Table 2 shows the characteristics of these 14 courses. For example, the third row of the table shows that BS 111 (Biological Science: Cells and Molecules) was held in spring semester 2003, integrated 229 online homework problems, and 402 students used LON-CAPA for this course. The course had an activity log of approximately 368 MB. Using some Perl script modules for cleansing the data, we found 48 MB of useful data in
the BS 111 SS03 course. From these logged data we then pulled out 1,689,656 transactions (interactions between students and homework/exam/quiz problems), from which we extracted the following nine features:

1. Total number of attempts before the correct answer is derived
2. Total number of correct answers (success rate)
3. Success on the first try
4. Success on the second try
5. Success after 3 to 9 attempts
6. Success after 10 or more attempts
7. Total time from the first attempt until the correct answer
8. Total time spent on the problem, regardless of success
9. Participation in the provided online communication mechanisms with other students and instructional staff

Based on the above extracted features in each course, we classify the students, and try to predict, for every individual student in the test data, to which class he/she belongs. We categorize the students with one of two class labels: "Passed" for grades higher than 2.0, and "Failed" for grades less than or equal to 2.0, where the MSU grading system is based on grades from 0.0 to 4.0.

4. CLASSIFICATION FUSION

Pattern recognition has a wide variety of applications in many different fields, such that it is not possible to come up with a single classifier that can give good results in all cases. The optimal classifier in each case is highly dependent upon the problem domain. In practice, one might come across a case where no single classifier can achieve an acceptable level of accuracy. In such cases it would be better to pool the results of different classifiers to achieve the optimal accuracy. Every classifier operates well on different aspects of the training or test feature vector. As a result, assuming appropriate conditions, combining multiple classifiers may improve classification performance when compared with any single classifier.

The scope of this study is restricted to comparing some popular non-parametric pattern classifiers and a single parametric pattern classifier according to the error estimate. Four different classifiers using the LON-CAPA dataset are compared in this study.
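As a hypothetical sketch of the feature extraction and labeling described in Section 3 (the tuple layout of the per-problem log records and the function names are assumptions for illustration, not the actual LON-CAPA schema):

```python
# Hypothetical sketch: building a per-student feature vector and a binary
# class label from LON-CAPA-style logs. Each attempt record is assumed to be
# (n_tries, solved, seconds_to_correct, seconds_total); this layout is an
# illustration, not the real schema.

def label_student(grade):
    """MSU grades run 0.0-4.0; Passed > 2.0, Failed <= 2.0."""
    return "Passed" if grade > 2.0 else "Failed"

def extract_features(attempts):
    """attempts: list of (n_tries, solved, seconds_to_correct, seconds_total)."""
    n = len(attempts)
    solved = [a for a in attempts if a[1]]
    first_try = sum(1 for a in solved if a[0] == 1)
    second_try = sum(1 for a in solved if a[0] == 2)
    mid_tries = sum(1 for a in solved if 3 <= a[0] <= 9)
    many_tries = sum(1 for a in solved if a[0] >= 10)
    return [
        sum(a[0] for a in attempts),      # 1. total attempts before correct
        len(solved) / n if n else 0.0,    # 2. success rate
        first_try,                        # 3. success on first try
        second_try,                       # 4. success on second try
        mid_tries,                        # 5. success after 3-9 attempts
        many_tries,                       # 6. success after >= 10 attempts
        sum(a[2] for a in solved),        # 7. total time until correct answer
        sum(a[3] for a in attempts),      # 8. total time spent, regardless
        0,                                # 9. communication count (not in this toy log)
    ]
```

Each student thus contributes one nine-dimensional vector plus a Passed/Failed label, which is the form the classifiers below consume.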
The classifiers used in this study include the quadratic Bayesian classifier, 1-nearest neighbor (1-NN), k-nearest neighbor (k-NN), and the Parzen window. (The classifiers are coded in MATLAB 6.5.) These are some of the common classifiers used in most practical classification problems. After some preprocessing operations, the optimal k = 3 was chosen for the k-NN algorithm. To improve classification performance, a fusion of classifiers is performed.

Normalization. Having assumed in the Bayesian and Parzen-window classifiers that the features are normally distributed, it is necessary that the data for each feature be normalized. This ensures that each feature has the same weight in the decision process. Assuming that the given data is Gaussian, this normalization is performed using the mean and standard deviation of the training data. In order to normalize the training data, it is necessary first to calculate the sample mean µ and the standard deviation σ of each feature in the dataset, and then normalize the data using Eq. (1):

    x' = (x − µ) / σ    (1)

This ensures that each feature of the training dataset has a normal distribution with a mean of zero and a standard deviation of one. In addition, the k-NN method requires normalization of all features into the same range.

Combination of Multiple Classifiers. By combining multiple classifiers we improve classifier performance. There are different ways one can think of combining classifiers. The simplest way is to find the overall error rate of each classifier and choose the one which has the lowest error rate on the given dataset. This is called offline classification fusion. This may not appear to be a true classification fusion; however, in general, it has better performance than the individual classifiers. The second method, which is called online classification fusion, uses all of the classifiers followed by a vote. The class getting the maximum number of votes from the individual classifiers is assigned to the test sample. Using the second method we show that classification fusion can achieve a significant accuracy improvement in all given data sets.
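The two steps above — z-score normalization with training-set statistics (Eq. 1) and online fusion by majority vote — can be sketched minimally as follows; the classifier objects here are toy stand-ins, not the paper's MATLAB implementations:

```python
# Sketch of Eq. (1) normalization and majority-vote ("online") fusion.
# Statistics are fitted on training data only, then applied to any sample.
import statistics

def zscore_fit(train_columns):
    """Return (mean, std) per feature, computed on the training data only."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in train_columns]

def zscore_apply(x, params):
    """Apply x' = (x - mu) / sigma per feature; degenerate features map to 0."""
    return [(xi - mu) / sigma if sigma else 0.0
            for xi, (mu, sigma) in zip(x, params)]

def fuse_by_vote(classifiers, x):
    """Each classifier is a callable x -> label; ties go to the first-seen label."""
    votes = {}
    for clf in classifiers:
        label = clf(x)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

With four classifiers and two class labels, a 3-to-1 or 4-to-0 vote decides the sample; the offline variant would instead pick the single classifier with the lowest cross-validated error.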
A GA is employed to determine whether classification fusion performance can be maximized.

5. OPTIMIZING CLASSIFICATION FUSION WITH GENETIC ALGORITHMS

Our goal is to find a population of best weights for every feature vector, which minimize the classification error rate. The feature vector for our predictors is the set of nine variables for every student: the number of attempts before the correct answer is derived, the success rate, success on the first try, success on the second try, success with a number of tries between three and nine, success with a high number of tries, the time at which the student got the problem correct
relative to the due date, the total time spent on the problem, and participation in the online communication mechanisms. We randomly initialized a population of nine-dimensional weight vectors with values between 0 and 1, corresponding to the feature vector, and experimented with different population sizes. We found good results using a population with 200 individuals. Real-valued populations may be initialized using the GA MATLAB Toolbox function crtrp. For example, to create a random population of 200 individuals with nine variables each, we define boundaries on the variables in FieldD, a matrix containing the boundaries of each variable of an individual:

    FieldD = [ 0 0 0 0 0 0 0 0 0;   % lower bound
               1 1 1 1 1 1 1 1 1 ]; % upper bound

We create an initial population with Chrom = crtrp(200, FieldD), so we have, for example:

    Chrom = 0.23  0.17  0.95  0.38  0.06  0.26  0.31  0.52  0.65
            0.35  0.09  0.43  0.64  0.20  0.54  0.43  0.90  0.32
            0.50  0.10  0.09  0.65  0.68  0.46  0.29  0.67  0.13
            0.21  0.29  0.89  0.48  0.63  0.81  0.05  0.12  0.71

We used the simple genetic algorithm (SGA), which is described by Goldberg in [14]. The SGA uses common GA operators to find a population of solutions which optimize the fitness values. We used Stochastic Universal Sampling [14] as our selection method. A form of stochastic universal sampling is implemented by obtaining a cumulative sum of the fitness vector, FitnV, and generating N equally spaced numbers between 0 and sum(FitnV). Thus, only one random number is generated, all the others being equally spaced from that point. The indices of the individuals selected are determined by comparing the generated numbers with the cumulative sum vector. The probability of an individual being selected is then given by

    F(x_i) = f(x_i) / Σ_{ind=1..N} f(x_ind)    (2)

where f(x_i) is the fitness of individual x_i and F(x_i) is the probability of that individual being selected. The operation of crossover is not necessarily performed on all strings in the population. Instead, it is applied with a probability Px when the pairs are chosen for breeding. We selected Px = 0.7.
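The Stochastic Universal Sampling scheme just described — one random offset, then N equally spaced pointers over the cumulative fitness — can be sketched as a generic pure-Python analogue (this stands in for the GA Toolbox routine, and is not the toolbox code itself):

```python
# Sketch of Stochastic Universal Sampling (SUS): a single random number in
# [0, total/N) places the first pointer; the remaining N-1 pointers are
# equally spaced from it, so selection pressure matches Eq. (2) exactly.
import itertools
import random

def sus_select(fitness, n):
    """Return the indices of n individuals selected by SUS."""
    total = sum(fitness)
    cumulative = list(itertools.accumulate(fitness))
    step = total / n
    start = random.uniform(0, step)          # the only random draw
    pointers = [start + i * step for i in range(n)]
    selected, idx = [], 0
    for p in pointers:
        while cumulative[idx] < p:           # walk the cumulative-sum vector
            idx += 1
        selected.append(idx)
    return selected
```

Because the pointers are evenly spaced, an individual holding a fraction F(x_i) of the total fitness receives between floor(N·F) and ceil(N·F) copies, with lower variance than repeated roulette-wheel spins.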
Intermediate recombination combines parent values using the following formula [15]:

    Offspring = parent1 + Alpha × (parent2 − parent1)    (3)

where Alpha is a scaling factor chosen uniformly in the interval [-0.25, 1.25].

A further genetic operator, mutation, is applied to the new chromosomes with a set probability Pm. Mutation causes the individual genetic representation to be changed according to some probabilistic rule. Mutation is generally considered to be a background operator that ensures that the probability of searching a particular subspace of the problem space is never zero. This has the effect of tending to inhibit the possibility of converging to a local optimum, rather than the global optimum. We used 1/800 as our mutation rate. The mutation of each variable is calculated as follows:

    MutatedVar = Var + MutMx × range × MutOpt(2) × delta    (4)

where delta is an internal matrix which specifies the normalized mutation step size, MutMx is an internal mask table, and MutOpt specifies the mutation rate and its shrinkage during the run.

During the reproduction phase, each individual is assigned a fitness value derived from its raw performance measure given by the objective function. This value is used in the selection to bias towards fitter individuals. Highly fit individuals, relative to the whole population, have a high probability of being selected for mating, whereas less fit individuals have a correspondingly low probability of being selected. The error rate is measured in each round of cross-validation by dividing the total number of misclassified examples by the total number of test examples. Therefore, our fitness function measures the accuracy rate achieved by classification fusion, and our objective is to maximize this performance (minimize the error rate).

6. EXPERIMENTS

Without using the GA, the overall results of classification performance on our datasets for the four classifiers and classification fusion are shown in Table 3. Regarding the individual classifiers, mostly, 1-NN and k-NN have the best performance.
However, classification fusion improved the classification accuracy significantly in all data sets; it achieved on average 79% accuracy over the given data sets.

Table 3. Comparing the average performance (%) of ten runs of the classifiers on the given datasets using 10-fold cross-validation, without the GA

Data sets      Bayes  1-NN  k-NN  Parzen Window  Classification Fusion
ADV 205, 03    55.7   69.9  70.7  55.8           78.2
BS 111, 02     54.6   67.8  69.6  57.6           74.9
BS 111, 03     52.6   62.1  55.0  59.7           71.2
CE 280, 03     66.6   73.6  74.9  65.2           81.4
FI 414, 03     65.0   76.4  72.3  70.3           82.2
LBS 271, 02    66.9   75.6  73.8  59.6           79.2
LBS 272, 03    72.3   70.4  69.6  65.3           77.6
MT 204, 03     63.4   71.5  68.4  56.4           82.2
MT 432, 03     67.6   77.6  79.1  59.8           84.0
PHY 183, 02    73.4   76.8  80.3  65.0           83.9
PHY 183, 03    59.6   66.5  70.4  54.4           76.6
PHY 231c, 03   56.7   74.5  72.6  60.9           80.7
PHY 232c, 03   65.6   71.7  75.6  57.8           81.6
PHY 232, 03    59.9   73.5  71.4  56.3           79.8

For the GA optimization, we used 200 individuals (weight vectors) in our population, running the GA over 500
generations. We ran the program 10 times and averaged the results, which are shown in Table 4. The results in Table 4 represent the mean performance with a two-tailed t-test at a 95% confidence interval for every data set. For the improvement of the GA over the non-GA result, a p-value indicating the probability of the null hypothesis (that there is no improvement) was also computed, showing the significance of the GA optimization. All have p < 0.001, indicating significant improvement. Therefore, using the GA, in all cases we obtained more than approximately a 10% mean individual performance improvement, and about a 10 to 17% best individual performance improvement. Fig. 1 shows the results of one of the ten runs in the case of two classes (Passed and Failed). The dotted line represents the population mean, and the solid line shows the best individual at each generation and the best value yielded by the run. (Due to space limitations, only the graph for the BS 111 2003 GA optimization is shown.)

Table 4. Comparing the classification fusion performance on the given datasets without the GA and using the GA (mean individual), and the improvement, with 95% confidence intervals

Data sets      Without GA     GA Optimized   Improvement
ADV 205, 03    78.19 ± 1.34   89.11 ± 1.23   10.92 ± 0.94
BS 111, 02     74.93 ± 2.12   87.25 ± 0.93   12.21 ± 1.65
BS 111, 03     71.19 ± 1.34   81.09 ± 2.42    9.82 ± 1.33
CE 280, 03     81.43 ± 2.13   92.61 ± 2.07   11.36 ± 1.41
FI 414, 03     82.24 ± 1.54   91.73 ± 1.21    9.50 ± 1.76
LBS 271, 02    79.23 ± 1.92   90.02 ± 1.65   10.88 ± 0.64
LBS 272, 03    77.56 ± 0.87   87.61 ± 1.03   10.11 ± 0.62
MT 204, 03     82.24 ± 1.65   91.93 ± 2.23    9.96 ± 1.32
MT 432, 03     84.03 ± 2.13   95.21 ± 1.22   11.16 ± 1.28
PHY 183, 02    83.87 ± 1.73   94.09 ± 2.84   10.22 ± 1.92
PHY 183, 03    76.56 ± 1.37   87.14 ± 1.69    9.36 ± 1.14
PHY 231c, 03   80.67 ± 1.32   91.41 ± 2.27   10.74 ± 1.34
PHY 232c, 03   81.55 ± 0.13   92.39 ± 1.58   10.78 ± 1.53
PHY 232, 03    79.77 ± 1.64   88.61 ± 2.45    9.13 ± 2.23
Total Average  78.98 ± 12     90.03 ± 1.30   10.53 ± 56

Finally, we can examine the individuals (the feature weights) by which we obtained the improved results.
This feature weighting indicates the importance of each feature for making the required classification. In most cases the results are similar to those of multiple linear regression or of tree-based software (like CART) that uses statistical methods to measure feature importance. The GA feature weighting results, as shown in Table 5, indicate that "success with a high number of tries" is the most important feature. The "total number of correct answers" feature is also the most important in some cases.

Fig. 1. GA-optimized Combination of Multiple Classifiers (CMC) performance in the case of two class labels (Passed and Failed) for BS 111 2003; 200 weight-vector individuals, 500 generations

Table 5. Relative feature importance (%), using GA weighting, for the BS 111 2003 course

Feature                              Importance %
Average Number of Tries              18.9
Total Number of Correct Answers      84.7
# of Successes on the First Try      24.4
# of Successes on the Second Try     26.5
Got Correct with 3-9 Tries           21.2
Got Correct with # of Tries >= 10    91.7
Time Spent to Solve the Problems     32.1
Total Time Spent on the Problems     36.5
# of Communications                   3.6

Table 6 shows the importance of the nine features in the BS 111 SS03 course, applying the Gini splitting criterion. Based on Gini, a statistical property called information gain measures how well a given feature separates the training examples in relation to their target classes. Gini characterizes the impurity of an arbitrary collection of examples S at a specific node N. In [16] the impurity of a node N is denoted by i(N), such that:

    Gini(S) = i(N) = Σ_{i≠j} P(ωi) P(ωj) = 1 − Σ_j P(ωj)²    (5)

where P(ωj) is the fraction of examples at node N that go to category ωj. Gini attempts to separate classes by focusing on one class at a time. It will always favor working on the largest or, if costs or weights are used, the most important class in a node.
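The Gini impurity of Eq. (5) can be computed directly from the class labels reaching a node, as this short sketch shows:

```python
# Sketch of Eq. (5): Gini impurity i(N) = 1 - sum_j P(w_j)^2, where P(w_j)
# is the fraction of examples at node N belonging to class w_j.

def gini_impurity(labels):
    """Gini impurity of a collection of class labels; 0.0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A pure node (all Passed or all Failed) scores 0, and a 50/50 two-class node scores the maximum of 0.5; CART-style importance then credits each feature by how much its splits reduce this impurity.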
Table 6. Feature importance for BS 111 2003, using the decision-tree software CART, applying the Gini criterion

Variable                             Importance
Total Number of Correct Answers      100.00
Got Correct with # of Tries >= 10     93.34
Average Number of Tries               58.61
# of Successes on the First Try       37.70
Got Correct with 3-9 Tries            30.31
# of Successes on the Second Try      23.17
Time Spent to Solve the Problems      16.60
Total Time Spent on the Problems      14.47
# of Communications                    2.21

Comparing the results in Table 5 (GA weighting) and Table 6 (Gini index criterion) shows very similar output, which demonstrates the merits of the proposed method for detecting feature importance.

7. CONCLUSIONS

We proposed a new approach to classifying student usage of web-based instruction. Four classifiers were used in grouping the students. A combination of multiple classifiers leads to a significant accuracy improvement in the given data sets. Weighting the features and using a genetic algorithm to minimize the error rate improves the prediction accuracy by at least 10% in all test cases. In cases where the number of features is low, feature weighting worked much better than feature selection. The successful optimization of student classification in all cases demonstrates the merits of using the LON-CAPA data to predict the students' final grades based on their features, which are extracted from the homework data. This approach is easily adaptable to different types of courses and different population sizes, and allows for different features to be analyzed. This work represents a rigorous application of known classifiers as a means of analyzing and comparing the use and performance of students who have taken a technical course that was partially or completely administered via the web. For future work, we plan to implement such an optimized assessment tool for every student on any particular problem. We can then track students' behavior on a particular problem over several semesters in order to achieve more reliable prediction.

8. ACKNOWLEDGMENT

This work was partially supported by the National Science Foundation under ITR 0085921.

9. REFERENCES

1.
M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and A.K. Jain, Dimensionality Reduction Using Genetic Algorithms, IEEE Transactions on Evolutionary Computation, Vol. 4 (2000) 164-171
2. A.K. Jain and D. Zongker, Feature Selection: Evaluation, Application, and Small Sample Performance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, February (1997)
3. K.A. De Jong, W.M. Spears, and D.F. Gordon, Using genetic algorithms for concept learning, Machine Learning 13 (1993) 161-188
4. S. Bandyopadhyay and C.A. Murthy, Pattern Classification Using Genetic Algorithms, Pattern Recognition Letters, Vol. 16 (1995) 801-808
5. J. Bala, K. De Jong, J. Huang, H. Vafaie, and H. Wechsler, Using learning to facilitate the evolution of features for recognizing visual concepts, Evolutionary Computation 4(3), Special Issue on Evolution, Learning, and Instinct: 100 Years of the Baldwin Effect (1997)
6. C. Guerra-Salcedo and D. Whitley, Feature selection mechanisms for ensemble creation: a genetic search perspective, in: A.A. Freitas (Ed.), Data Mining with Evolutionary Algorithms: Research Directions - Papers from the AAAI Workshop, 13-17, Technical Report WS-99-06, AAAI Press (1999)
7. H. Vafaie and K. De Jong, Robust feature selection algorithms, Proceedings of the IEEE International Conference on Tools with AI, Boston, MA, USA, Nov. (1993) 356-363
8. M.J. Martin-Bautista and M.A. Vila, A survey of genetic feature selection in mining issues, Proceedings of the Congress on Evolutionary Computation (CEC-99), Washington, D.C., July (1999) 1314-1321
9. M. Pei, E.D. Goodman, and W.F. Punch, Pattern Discovery from Data Using Genetic Algorithms, Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery & Data Mining (PAKDD-97) (1997)
10. W.F. Punch, M. Pei, L. Chia-Shun, E.D. Goodman, P. Hovland, and R. Enbody, Further research on feature selection and classification using genetic algorithms, in: 5th International Conference on Genetic Algorithms, Champaign, IL (1993) 557-564
11. L.I. Kuncheva and L.C. Jain, Designing Classifier Fusion Systems by Genetic Algorithms,
IEEE Transactions on Evolutionary Computation, Vol. 33 (2000) 351-373
12. D.B. Skalak, Using a Genetic Algorithm to Learn Prototypes for Case Retrieval and Classification, Proceedings of the AAAI-93 Case-Based Reasoning Workshop, Washington, D.C., American Association for Artificial Intelligence, Menlo Park, CA (1994) 64-69
13. G.H. John, R. Kohavi, and K. Pfleger, Irrelevant Features and the Subset Selection Problem, Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, CA (1994) 121-129
14. D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, MA (1989)
15. H. Mühlenbein and D. Schlierkamp-Voosen, Predictive Models for the Breeder Genetic Algorithm: Continuous Parameter Optimization, Evolutionary Computation, Vol. 1, No. 1 (1993) 25-49
16. R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd Edition, John Wiley & Sons, Inc., New York, NY (2001)