Genetic Algorithms applied to Clustering Problem and Data Mining


Proceedings of the 7th WSEAS International Conference on Simulation, Modelling and Optimization, Beijing, China, September 15-17, 2007

J.F. JIMENEZ (a), F.J. CUEVAS (b), J.M. CARPIO (a)
(a) Instituto Tecnológico de León, Av. Tecnológico s/n, Fracc. Julián de Obregón, C.P. 37290, León, Guanajuato, México
(b) Centro de Investigaciones en Óptica A.C., Loma del Bosque 115, C.P. 37150, León, Guanajuato, México
fcuevas@cio.mx, http://www.cio.mx and http://smbatleonedumx/prncpalhtml

Abstract: - Clustering techniques have obtained adequate results when applied to data mining problems. However, different runs of the same clustering technique on a specific dataset may produce different solutions. The cause of this difference is the choice of the initial cluster setting and of the values of the parameters associated with the technique. Defining good initial settings and optimal parameter values is not an easy task, particularly because both vary widely from one dataset to another. In this paper the authors investigate the use of Genetic Algorithms to determine the best initialization of the clusters, as well as to optimize the initial parameters. The experimental results show the great potential of Genetic Algorithms for improving the clusters, since they not only optimize the clusters but also resolve the problem of the number of clusters, which previously had to be given a priori. Clustering techniques are among the most used in information analysis or Data Mining; this method was applied to a data mining data set.

Key-words: - Clustering Techniques, Data Mining, k-means, Genetic Algorithms

1 Introduction

Clustering has always been a key task in the process of acquiring knowledge. The complexity and especially the diversity of phenomena have forced society to organize things based on their similarities. The objective of cluster analysis is to sort the observations into clusters such that the degree of natural association is high among members of the same cluster and low between members of different clusters (Berry, 2003; Tou and Gonzalez, 1974; Webb, 2002). The complexity of such a task is easily recognized given the number of possible arrangements. Sensitivity to initial points and convergence to local optima are usually among the problems affecting iterative techniques such as k-means (Bradley and Fayyad, 1998).

Widely used, cluster analysis has drawn the attention of a very large number of academic disciplines. Most of the work done on the internal spatial and social structure of cities has in some way used classification as a basis for analysing data sets with data mining.

There are several established methods for generating a clustering algorithmically (Everitt, 1992; Kaufman and Rousseeuw, 1990; Gersho and Gray, 1992). The most cited and widely used method is the k-means algorithm (McQueen, 1967). It begins with an initial solution, which is iteratively improved using two different optimality criteria in turn until a local minimum has been reached. The algorithm is easy to implement and it gives good results in most cases.

The clustering problem includes two search and selection sub-problems: (1) the number of clusters to form (k), and (2) the initial elements of these clusters. Clustering of adequate quality has been obtained with genetic algorithms (GA) (Kivijärvi and Fränti, 2003; Naldi and Carvalho, 2003; Wang et al., 2006), which solve the problem of cluster initialization. However, they do not solve the problem of selecting the number of clusters. In this paper we propose an adaptive genetic algorithm for the clustering problem. Our aim is to give an effective algorithm that obtains good solutions to the optimization problem without explicit parameter tuning: each individual of the GA population contains a set of parameter values, and these parameters are used to generate the clusters.

2 Data mining and Clustering

Data mining is an interdisciplinary field, the confluence of a set of disciplines including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high-performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology. A data mining system has the potential to generate thousands or even millions of patterns, or rules. Are all of the patterns interesting? The answer is no: only a small fraction of the patterns potentially generated would actually be of interest to any given user.

Clustering has also been studied in the fields of machine learning and statistical pattern recognition as a type of unsupervised learning, because it does not rely on predefined class-labeled training examples (Duda, Hart, & Stork, 2001).

The kinds of knowledge to be mined: this specifies the data mining functions to be performed, such as characterization, discrimination, association, classification, clustering, or evolution analysis. For instance, when studying the buying habits of customers in Mexico, you may choose to mine associations between customers.

Within data mining, practical pattern-classification and knowledge-discovery problems require the selection of a subset of attributes or features to represent the patterns to be classified. Some works handle this with genetic algorithms, memetic algorithms and, in some cases, cultural algorithms (Sikora and Piramuthu, 2007; Ochoa et al., 2007). In our case we use clustering techniques and optimize them with a GA.

3 Clustering

Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than to objects in different clusters, according to some defined criteria. The clustering problem is defined as follows. Given a set of N data objects $x_i$, partition the data set into K clusters in such a way that similar objects are clustered together and objects with
dissimilar features belong to different clusters. Given M patterns $x_1, x_2, \ldots, x_M$, a clustering process consists of searching for K clusters $S_j$, $j = 1, 2, \ldots, K$. Every cluster is characterized by its centroid (mean) $Z_j$, $j = 1, 2, \ldots, K$, which is the optimal pattern of the cluster. Clustering techniques function as indicated in Figure 1. The general clustering problem includes two subproblems: (1) initialization of the centroids and processing of the patterns, and (2) deciding the number of clusters.

[Fig. 1. Functional scheme of clustering techniques: the patterns $X_1, X_2, \ldots, X_M$ and the parameters enter the clustering algorithm, which outputs the clusters $S_1, S_2, \ldots, S_K$.]

Typically, the k-means algorithm (McQueen, 1967) starts with an initialization process in which seed positions are defined. This initial step can have a significant impact on the performance of the method and can be done in a number of ways (Bradley and Fayyad, 1998). After the initial seeds have been defined, each data element is assigned to the nearest seed. The next step consists of repositioning the seeds; this can be done after all elements are assigned to the nearest seed, or as each one of the elements is assigned. After this, a new assignment step is necessary, and the process goes on until no further improvement can be made, in other words until a local optimum has been found. Because the assignments are made on the basis of the distance to the nearest seed, this process implicitly minimizes the sum of the squared distances between each data point and its nearest cluster centroid (Bradley and Fayyad, 1998).

The simplest measure of similarity is distance. If d is a measure of dissimilarity defined between two patterns, it evidently satisfies:

$$d(X_i, X_i) = 0, \qquad d(X_i, X_j) \geq 0 \tag{1}$$

Different expressions can be found in the scientific literature (Bow, 2002; Tou and Gonzalez, 1974; Webb, 2002). The Euclidean distance is a widely used distance function in the clustering context, and it is calculated as:

$$d(x_i, x_j) = \sqrt{\sum_{n=1}^{N} \left(x_i^{(n)} - x_j^{(n)}\right)^2} \tag{2}$$

where $x_i^{(n)}$ denotes the n-th characteristic of pattern $x_i$.

The most important choice in the clustering method is the objective function for evaluating the quality of a cluster. A commonly used objective criterion is to minimize the sum of squared distances of the data objects to their cluster representatives. If $Z_j$ is the centroid of cluster $S_j$, calculated as

$$Z_j = \frac{1}{N_j} \sum_{X \in S_j} X \tag{3}$$

where $N_j$ is the number of patterns in $S_j$, then the sum of squared errors is:

$$J_e = \sum_{j=1}^{K} \sum_{X \in S_j} \lVert X - Z_j \rVert^2 \tag{4}$$

The specifications of the k-means algorithm are the following:

A. The algorithm partitions a data set of M patterns into K clusters.
B. The algorithm converges to a local optimum of $J_e$.

1. Select the centroids $\{z_1, z_2, \ldots, z_K\}$ at random.
2. Repeat until the stop criterion is satisfied:
   a) Assign every pattern of the data set to the nearest cluster:

      $$x_i \in S_j \iff d(x_i, Z_j) \leq d(x_i, Z_l), \quad \forall x_i \in M,\ l = 1, \ldots, K \tag{5}$$

   b) Update the centroids from the newly assigned patterns:

      $$Z_j = E(x), \quad x \in S_j \tag{6}$$

4 Genetic Algorithms

Evolution has proven to be a very powerful mechanism for finding good solutions to difficult problems. One can look at natural selection as an optimization method which tries to produce adequate solutions for particular environments. In spite of the large number of applications of GA to different types of optimization problems, there is very little research on using this kind of approach for the clustering problem (Kivijärvi and Fränti, 2003; Naldi and Carvalho, 2006; Wang, 2003). In fact, given the quality of the solutions that this technique has shown in different fields and problems (Mitchell, 1996), it makes perfect sense to try it on clustering problems. The flexibility associated with GA is one important advantage to consider: with the same genome representation, and just by changing the fitness function, one obtains a different algorithm. In the case of spatial analysis this is particularly important, since one can try different fitness functions in an exploratory phase.

5 Solving the Clustering Problem using the k-means method

An individual in the
traditional genetic algorithm encodes a solution ω. In this case the solution is given in the individual, which encodes it. The reason for this conceptual distinction between the individual and the solution is that an individual includes the input parameters of the k-means algorithm. Before coding the Genetic Algorithm, we must encode the chromosome chain that contains all the genetic information of our system. In this case we analyze a massive repository of information, whose scheme is as follows:

Table 1. Scheme of the information set

         Car 1    Car 2    ...    Car N
X_1
X_2
...
X_M

where M is the number of samples, and N is the number of characteristics or dimensions of every sample. The chromosome encodes the following parameters of the k-means algorithm: (1) the number of clusters k, and (2) the number of characteristics or attributes that will be used in the clustering process, denominated C, in the range [1, N]. The chromosome structure can then be represented as follows:

Number of Clusters (K) | Car 1 | Car 2 | ... | Car C

With the information previously described, a chromosome chain ω is generated. It is composed of the number of clusters k, which is selected at random from a range bounded above by K_max in order to explore the search space of the clusters, and of the numbers of the characteristics to use in the clustering technique. As an example, take K_max = 7 and C = 3 of 7 possible characteristics, each of which can be represented by 3 bits (see Table 2).

Table 2. Example of the genotype of the chromosome chains

Chromosome   Number of Clusters (K)   Car 1   Car 2   Car 3   Chromosome chain
1            011                      110     001     101     011110001101
2            101                      011     100     [?]     101011100[?]
3            111                      100     101     011     111100101011

The chromosome is generated at random, where the number of clusters K is at most K_max and Car_i lies in [1, maximum number of characteristics or columns of the data set] for i = 1, 2, and 3. The next step generates its phenotype, which is the representation in decimal form.

Table 3. Example of the phenotype of the chromosome chains

Chromosome   Number of Clusters (K)   Car 1   Car 2   Car 3   Chromosome chain
1            3                        6       1       5       3,6,1,5
2            5                        3       4       [?]     5,3,4,[?]
3            7                        4       5       3       7,4,5,3

These characteristics are the input parameters of the k-means algorithm.

Scheme of the algorithm proposed for solving clustering problems:

1. Obtain the crossover probability ($P_C$), the mutation probability ($P_M$), the population size (G), and the maximum number of generations (T).
2. Generate G random individuals to form the initial generation.
3. Iterate the following for T generations:
   a) Apply k-means to the G individuals.
   b) Obtain the cluster statistics of every individual.
   c) Select $G_B$ surviving individuals for the new generation.
   d) Select $G - G_B$ pairs of individuals as the set of parents.
   e) For each pair of parents (a, b) do the following:
      i) Create the solution $\omega_n$ of the offspring n by crossing the solutions of the parents.
      ii) Mutate $\omega_n$ with probability $P_M$.
      iii) Evaluate the quality of the solution $\omega_n$ with the fitness function.
      iv) Add n to the new generation.
   f) Replace the current generation by the new generation.
4. Output the best solution of the final generation.

In every run of k-means, statistics are obtained for ω, such as the standard deviation and the distances
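between clusters, so that they can be evaluated by the fitness function. The chromosome chain ω is evaluated by the fitness function; in this case the criteria of the clustering technique are the following.

(1) To maximize the distance between clusters. This function can be written as:

$$\mathrm{distc} = \sum_{i=1}^{K} \sum_{j=1}^{K} (Z_i - Z_j)^2 \tag{7}$$

where distc measures the average of the distances between the centroids.

(2) To minimize the internal standard deviation of every cluster. This function can be written as:

$$\mathrm{desvc} = \sum_{i=1}^{K} \sum_{j=1}^{K} (\sigma_i - \sigma_j)^2 \tag{8}$$

The desvc term minimizes the sum of squared errors of the standard deviations of the clusters. Finally, combining the functions distc and desvc, we obtain the fitness function $f(M)$:

[Equation (9): $f(M)$, built from the double sums over $(Z_i - Z_j)$ and $(\sigma_i - \sigma_j)$ together with $\sigma$ terms; the typeset layout is not recoverable.]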
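The evaluation of one individual can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`decode`, `distc`, `desvc`, `fitness`) and the toy clusters are hypothetical, squared differences are assumed in both double sums, each cluster's deviation is taken as the root-mean-square distance to its centroid, and the ratio that combines the two terms is an assumption chosen only so that a larger centroid separation and more uniform cluster deviations both raise the score.

```python
import math


def decode(chain, width=3):
    """Turn a bit string into its phenotype: one integer per `width`-bit field."""
    return [int(chain[i:i + width], 2) for i in range(0, len(chain), width)]


def centroid(points):
    """Mean of a list of equal-length tuples (one cluster)."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def cluster_sigma(points, center):
    """Assumed deviation measure: RMS distance of the points to the centroid."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))


def distc(centroids):
    """Sum of squared distances over all ordered pairs of centroids (assumed form)."""
    return sum(math.dist(a, b) ** 2 for a in centroids for b in centroids)


def desvc(sigmas):
    """Sum of squared differences over all ordered pairs of deviations (assumed form)."""
    return sum((s - t) ** 2 for s in sigmas for t in sigmas)


def fitness(clusters):
    """Assumed combination (not the paper's equation): distc / (1 + desvc)."""
    centers = [centroid(c) for c in clusters]
    sigmas = [cluster_sigma(c, z) for c, z in zip(clusters, centers)]
    return distc(centers) / (1.0 + desvc(sigmas))


# A 12-bit chain: number of clusters followed by three characteristic indices.
phenotype = decode("011110001101")

# Two hypothetical partitions of the same four points.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(phenotype)
print(fitness(good) > fitness(bad))
```

In a full run, the first decoded field would set the number of clusters and the remaining fields would select the data columns passed to k-means; the resulting clusters would then be scored with `fitness`. The partition that groups nearby points scores higher here because its centroids are far apart while both clusters have the same spread.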

In the fitness function, M is the set of patterns; before this fitness function is applied, M is evaluated by a clustering technique, in this case k-means.

6 Test Results

The data set used in the simulated test had M = [?]000 samples and N = [?] characteristics; only two characteristics were used to generate five clusters. The clusters were generated by taking 5 random centroids, and the samples were spread around them using a Gaussian distribution. The other characteristics were generated using a uniform distribution. Figure 3 shows the original computer-generated clusters.

[Fig. 3. Clusters generated by computer simulation.]

[Fig. 4. Graphs of the results of k-means and the proposed algorithm at (a) Gen [?], (b) Gen 6, (c) Gen [?]0, (d) Gen 35.]

The input parameters used in the GA were the following: $P_C$ = 0.8, $P_M$ = 0.00[?], G = [?]0, T = 30, and a Boltzmann selection method was used. Figure 4 shows the results of the proposed GA technique in generations [?], 6, [?]0 and 35. Finally, in generation 35 the original clusters are recovered.

7 Conclusion

The quality of the result of a clustering method depends to a great extent on its initial parameters. In this paper we proposed a genetic algorithm that adapts the initial parameters. The GA technique is applied to the k-means clustering method to determine the number of clusters and the characteristics to take into consideration in the clustering process. Future work consists of using different representation schemes for the GA and comparing the qualities and shortcomings of the different representations.

References:
[1] Berry Michael W.: Survey of Text Mining: Clustering, Classification, and Retrieval. John Wiley & Sons (2003)
[2] Bow Sing-Tze: Pattern Recognition and Image Preprocessing. Marcel Dekker Inc. (2002)
[3] Bradley P., Fayyad U.: Refining Initial Points for K-Means Clustering. In J. Shavlik, editor, Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann (1998)
[4] Duda Richard O., Hart Peter E.: Pattern Classification. John Wiley & Sons (2001)
[5] Goldberg David E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing (1989)
[6] Gonzalez Rafael C., Woods Richard E.: Digital Image Processing. Addison Wesley (2002)
[7] Hartigan J.: Clustering Algorithms. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons (1975)
[8] Haupt Randy L., Haupt Sue Ellen: Practical Genetic Algorithms. John Wiley & Sons (2005)
[9] Jain A.K., Dubes R.C.: Algorithms for Clustering Data. Prentice-Hall (1998)
[10] Kivijärvi Juha, Fränti Pasi: Self-Adaptive Genetic Algorithm for Clustering. Journal of Heuristics, Kluwer Academic Publishers 9: 113-129 (2003)
[11] Marques de Sá J.P.: Pattern Recognition: Concepts, Methods and Applications. Springer (2001)
[12] Mitchell Melanie: An Introduction to Genetic Algorithms. MIT Press, London (1999)
[13] Naldi Murilo C., Carvalho André: Partitional clustering improvement with Genetic Algorithms (2006)
[14] Ochoa Alberto, Ponce Julio, Baltazar Rosario: An approach to Cultural Algorithms from Data Mining. (COMCEV07) Mexican Congress of Evolutionary Computation (2007)
[15] Pedrycz Witold: Knowledge-Based Clustering. John Wiley & Sons (2005)

[16] Sato M., Sato Y., Jain L.: Fuzzy Clustering Models and Applications. Springer-Verlag (1997)
[17] Sikora Riyaz, Piramuthu Selwyn: Framework for efficient feature selection in genetic algorithm based data mining. European Journal of Operational Research 180(2): 723-737 (2007)
[18] Tou Julius T., Gonzalez Rafael C.: Pattern Recognition Principles. Addison-Wesley (1974)
[19] Una-May O'Reilly, Tina Yu: Genetic Programming Theory and Practice II. Springer (2005)
[20] Wang Chang, Zengqiang Chen, Qinlin Sun, Zhuzhi Yuan: Clustering of Amino Acid Sequences based on K-Medoids Method. Journal of Computer Engineering, Vol. 9 No. 8 (2003)
[21] Wang Chang, Zengqiang Chen, Zhuzhi Yuan: K-Means Clustering Based on Genetic Algorithm. Journal of Computer Science, Vol. 30 No. [?] (2003)
[22] Webb Andrew R.: Statistical Pattern Recognition Principles. John Wiley & Sons (2002)