Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council



Similar documents
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

Forecasting the Direction and Strength of Stock Market Movement

What is Candidate Sampling

An Interest-Oriented Network Evolution Mechanism for Online Communities

How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

An Alternative Way to Measure Private Equity Performance

Searching for Interacting Features for Spam Filtering

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

A Secure Password-Authenticated Key Agreement Using Smart Cards

PEER REVIEWER RECOMMENDATION IN ONLINE SOCIAL LEARNING CONTEXT: INTEGRATING INFORMATION OF LEARNERS AND SUBMISSIONS

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

DEFINING %COMPLETE IN MICROSOFT PROJECT

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

A Performance Analysis of View Maintenance Techniques for Data Warehouses

Traffic-light a stress test for life insurance provisions

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

Calculation of Sampling Weights

Semantic Link Analysis for Finding Answer Experts *

Mining Multiple Large Data Sources

Enterprise Master Patient Index

Recurrence. 1 Definitions and main statements

Performance Analysis and Coding Strategy of ECOC SVMs

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

Using Content-Based Filtering for Recommendation 1

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Single and multiple stage classifiers implementing logistic discrimination

Multiple-Period Attribution: Residuals and Compounding

Design and Development of a Security Evaluation Platform Based on International Standards

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

Extending Probabilistic Dynamic Epistemic Logic

IT09 - Identity Management Policy

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce

Web Object Indexing Using Domain Knowledge *

Data Mining from the Information Systems: Performance Indicators at Masaryk University in Brno

Software project management with GAs

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Multi-sensor Data Fusion for Cyber Security Situation Awareness

Can Auto Liability Insurance Purchases Signal Risk Attitude?

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

Support Vector Machines

Lecture 2: Single Layer Perceptrons Kevin Swingler

Complex Service Provisioning in Collaborative Cloud Markets

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

RequIn, a tool for fast web traffic inference

A Dynamic Energy-Efficiency Mechanism for Data Center Networks

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

Traffic State Estimation in the Traffic Management Center of Berlin

Improved SVM in Cloud Computing Information Mining

Gender Classification for Real-Time Audience Analysis System

Nordea G10 Alpha Carry Index

The Greedy Method. Introduction. 0/1 Knapsack Problem

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

A Simple Approach to Clustering in Excel

Master s Thesis. Configuring robust virtual wireless sensor networks for Internet of Things inspired by brain functional networks

IMPACT ANALYSIS OF A CELLULAR PHONE

A Dynamic Load Balancing for Massive Multiplayer Online Game Server

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Active Learning for Interactive Visualization

A Passive Network Measurement-based Traffic Control Algorithm in Gateway of. P2P Systems

Introduction CONTENT. - Whitepaper -

Research on Transformation Engineering BOM into Manufacturing BOM Based on BOP

Cooperative Load Balancing in IEEE Networks with Cell Breathing


Effective Network Defense Strategies against Malicious Attacks with Various Defense Mechanisms under Quality of Service Constraints

Comparison of Domain-Specific Lexicon Construction Methods for Sentiment Analysis

Dynamic Fuzzy Pattern Recognition

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

Capacity-building and training

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

ERP Software Selection Using The Rough Set And TPOSIS Methods

A heuristic task deployment approach for load balancing

Using Association Rule Mining: Stock Market Events Prediction from Financial News

Financial Mathemetics

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Hosting Virtual Machines on Distributed Datacenters

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

The program for the Bachelor degrees shall extend over three years of full-time study or the parttime equivalent.

1 Example 1: Axis-aligned rectangles

Estimating the Development Effort of Web Projects in Chile

Transcription:

Usng Supervsed Clusterng Technque to Classfy Receved Messages n 137 Call Center of Tehran Cty Councl Mahdyeh Haghr 1*, Hamd Hassanpour 2 (1) Informaton Technology engneerng/e-commerce, Shraz Unversty (2) Department of Computer Eng. & IT, Shahrood Unversty of Technology,Shahrood,Iran Mahdyehaghr.66@gmal.com; h_hassanpour@yahoo.com Receved: 2011/05/20; Accepted: 2011/06/14 Pages: 15-24 Abstract Supervsed clusterng s a data mnng technque that assgns a set of data to predefned classes by analyzng dataset attrbutes. It s consdered as an mportant technque for nformaton retreval, management, and mnng n nformaton systems. Snce customer satsfacton s the man goal of organzatons n modern socety, to meet the requrements, 137 call center of Tehran cty councl s plannng to reduce the watng tme of customers, forwardng ther messages to the approprate servce manager to ncrease servce qualty. The cty councl s currently usng a manual approach dspatchng textual request messages. Snce ths process s very smlar to supervsed clusterng concept, n ths study, we appled the Naïve Bayes algorthm, whch s one of the most common algorthms n the supervsed clusterng arena to classfy receved messages regardng ther subjects. The performance results of the proposed technque ndcate ts hgh effcency n clusterng the receved messages wth 98% accuracy. Keywords: supervsed clusterng, classfcaton, Naïve Bayes algorthm, data mnng 1. Introducton Nowadays, due to the growng development of communcaton and nformaton technologes, our socety needs to take advantages of human and socal resources as much as possble. In ths regard, for nstance, reducng the watng tme of customers, ncreasng servce qualty and smplfyng the communcatons are needed n socal communcatons n ths age. Because of the growng development of IT-based communcaton tools and easy access to telecommuncatons systems such as phones, mobles, computers and the nternet, and due to engagement n dong routne tasks, usng smple and avalable technologes has a specal mportance. Therefore, Tehran muncpalty on a new plan made a system by usng nformaton technology scence and telecommuncaton tools. So, because of the drect vew and ctzens' actve collaboraton and people's nvolvement n managng ther own envronment, cvl tasks can be done quckly and properly. 137 call center of Tehran s founded as an nfrastructure n order to facltate communcatons of ctzens. The receved messages are classfed accordng to ther content nto predefned subject categores n the system by human operators and referred to the proper unt to be surveyed more. 15

Usng Supervsed Clusterng Technque to M. Haghr, H. Hassanpour Because of the mplementaton process n 137 call center that s naturally based on specfc and predefned choces, ths paper presents a model to classfy the receved messages automatcally. The proposed model uses a well known technque for textual document clusterng as an mportant step n ndexng, retreval, management and mnng of data n nformaton systems [1]. Clusterng methods are classfed nto two groups: supervsed and unsupervsed clusterng. Supervsed clusterng also known as classfcaton [2], s a data mnng technque used for extractng model from data on dscrete values [3]. Classfcaton evaluates the features of a data set, and then assgns them labels from a predefned set. Ths s the most common capablty of data mnng. Data classfcaton s a two-phase process [3]. In the frst phase, that s called learnng step, classfcaton system s made by a predefned set of data labels or concepts. In ths phase, the classfcaton algorthm makes the classfer by analyzng or learnng from tranng data set. In the second phase - generalzaton step [4], traned model s used to classfy unlabeled data. A test set s used to estmate the classfer performance. The man goal of generalzaton phase s to decrease nose mpact on the classfcaton results and ncrease classfer accuracy. Classfer accuracy on a gven test set s the percentage of tuples that have been correctly classfed. If classfer accuracy s admtted, t can be used to classfy tuples that ther class labels are unknown. Naïve Bayes, Roccho, K-Nearest Neghbors (K-NN), Regresson, Decson Trees, Neural Networks, Support Vector Machnes (SVM) and Decson Rule, can be referred to as the most mportant methods for text classfcatons problem [5]. Unsupervsed clusterng or smply clusterng [2], s n fact an unsupervsed learnng method. The goal s to dvde and gather a set of unstructured objects nto approprate clusters or groups so that smlar objects appear n the same cluster, and dfferent objects n the separate clusters [6,7]. The remander sectons of ths paper are as follows. Secton 2 explans our approach n detal. Followng, Secton 3 ntroduces our dataset and some related challenges related to Persan language of our dataset. Secton 4 s devoted to expermental results, and fnally secton 5 offers conclusons and suggests future work. 2. Introducng the proposed algorthm The classfcaton algorthm s used to classfy receved messages from ctzens n the 137 call center of Tehran cty councl based on Naïve Bayes algorthm. To know more about the mechansm of ths knd of algorthm, we wll explan t n great detals n ths secton. Assume we have m classes. Naïve Bayes classfcaton algorthm assgns X record to a class, say, whch has the maxmum posteror probablty: PC ( X ) > PC ( X ) for 1 j m, j (1) Wth ths assumpton, j maxmzes the posteror probablty. P( X C) PC ( ) P( C X) (2) P( X) 16

Journal of Advances n Computer Research (Vol. 2, No. 3, August 2011) 15-24 Snce the value of P(X) s the same for all classes, to maxmze (2),, t s enough to maxmze. Addtonally f that s named pror probablty s equal for all classes, t s only requred for to be maxmzed. If the pror probablty of classes s dfferent, t s possble to compute from (3): PC ( ) PC, D D (3) where s the number of records of class n the dataset, and s the total number of records n the data set. The amount of computed by (4): P( X C ) =Π P( X C ) (4) n k= 1 k In ths relaton, s the amount of attrbute of for record and s the total number of all attrbutes. If attrbutes are dscrete, then: P( X k C) P( X C ) = C, D (5) Snce our calculaton for classfyng a record, s based on the percentage of occurrng keywords of specal class (weght of the keyword for a subject), formula (6) s used to compute the weght of each keyword for each subject: ( ) P X C thefrequencyofeachkeywordforsubjectc = (6) thefrequencyofeachkeywordforallsubjects a. Offered archtecture for classfcaton Offered archtecture n ths paper can classfy the receved messages from ctzens to predefned subjects. Ths algorthm s presented n fgure (1), weghts keywords of each message for every subject and then calculates the weght of the subjects for each message by summng the weghts of the keywords n each message. Fnally, t assgns to the message the subject wth maxmum weght. 17

Usng Supervsed Clusterng Technque to M. Haghr, H. Hassanpour Algorthm SAMANE (Input Message: Strng, Output: Classfcaton of Message) 1- Select a Message; 2- Gve-Token(Message, Token-Set); 3- Fnd New Tokens n Relaton by Tokens n Token-Set and Insert Token-Set and Unte wth Related Token; 4- For Each Token n Token-Set Do Fnd All Message n Database n Whch It Occurs and Insert It n Message-Set; 4-1- For Each Message n Message-Set Do Fnd All Themes (Subjects) Related to Message; Themes-Frequency=Frequency of All Themes for Each Token; 4-2- For Each Theme n Theme-Set Do Weght (Token, Theme) = Frequency of Themes (Subjects)/ Themes-Frequency; 5- For Theme=1 to Number of Themes Do For Token=1 to Number of Tokens Do Sum (Theme) = Sum(Theme)+Weght(Token, Theme); 6- Max=Sum (1); For Theme=2 to Number of Themes Do If Sum(Theme)>Max Then Max=Sum(Theme); 7- Return Class=Subject(Max);. Fgure 1. offered classfcaton algorthm Table (1) represents a sample of the amount of the keywords relaton wth predefned subjects for each message. Ths assumed message has keywords that have been already exsted n subjects of all predefned subjects. Table 1. The relaton between keywords and defned subjects Subject Word Subject m Subject 2 Subject 1 Keyword 1 15% 40% 27% Keyword 2 20% 25% 10% Keyword n 5% 30% 36% Ths algorthm estmates the weght of each keyword n exstng subjects and calculates the weght of each message for each probable subject by computng the total assgned weghts of the keywords n each subject. The way of calculatng the weght of each keyword n dfferent subjects has been shown n (6). 18

Journal of Advances n Computer Research (Vol. 2, No. 3, August 2011) 15-24 The message classfcaton steps are as followng: 1. Select a message from receved messages; 2. Fnd the keywords exstng n the message; 3. Fnd synonyms and words relatng to the keywords and ntegrate them; 4. Make Token-Set (set of the keywords and the synonyms relatng to the message); 5. Send Token-Set to the weght calculaton unt for each subject; 6. Calculate the weghts for each subject for message; 7. Assgn the message to the subject wth the maxmum weght; 8. Regster nformaton so that t can be used for searchng and retrevng. b. The components of the offered archtecture Fgure 2. offered archtecture for classfcaton. Tranng data set Tranng data are sets of nformaton that transfer a knd of knowledge to the learner agent as a presupposed one. Actually, basc dfference between two methods of clusterng results from the dfference between ther learnng methods. Tranng nformaton gven to the system has been dscussed n the followng. - Class Labels The 137 call center of Tehran cty councl has classfed the receved messages from ctzens nto 44 specfc classes and then refers them to the relevant unt. Snce we plan to classfy automatc data based on the current process, and also we want to provde the same condton to compare the two methods, 44 classes wth the same labels are ncluded n the offered system. Moreover a class called "other cases" has been consdered so that a receved message out of the knowledge extent of the system can be classfed. - Tranng messages Each of the 44 classes n the manual system contans some related messages. These messages have been chosen and dscussed by a group of human expertse assgned by Tehran muncpalty and nclude all types of probable problems n every subject. These messages have been known as tranng messages n our system and ntroduced to the system by ther class labels. 19

Usng Supervsed Clusterng Technque to M. Haghr, H. Hassanpour - Keywords Preprocessng explores smple data n order to change them to approprate format for tranng [8]. One of the preprocessng phases s to extract keywords. Each document has one or more keywords or key phrases descrbng ts content, where these keywords or key phrases belong to a fnte set called controlled dctonary [5]. If there s a set of tranng documents wth dstnctve keywords for each of them, the process of extractng keywords wll be a supervsed learnng, otherwse wll be an unsupervsed learnng [8]. Due to the nature of our data tranng, we always face the short messages. Our target keywords are also exactly techncal n the urban management and belong to the predefned classes. So dentfyng and defnng these few techncal words whch many of them have been already known ncreases the accuracy and effcency of the system, and requres less tme regardng to dentfyng all the useless words such as addtonal words, conjunctons, pronouns, adverbs, plural suffxes, verbs and, etc. In ths paper we proft from supervsed learnng for keyword extracton. The offered system has been desgned n a way that t can dstngush defned keywords from all the suffxes, prefxes, punctuatons such as ".", ",", ";", "?", etc. and also mult syllabus words. So the keywords have been ntroduced to the system n ther smplest faces.. Test data set The data whch have been used to test the offered system, are the sent messages from the ctzens n the frst fve months n 2011n the dstrcts 1,2, 5 and 7 of the muncpalty of the regon 1 n Tehran. These 356 messages that have been already classfed manually are n 20 classes out of 44 tranng classes. In ths paper, t s supposed that ctzens' messages should be sent n Persan. c. Overcomng Persan language challenges In [9], some of the Persan language challenges n computer especally n the nternet and result of web searchng, have been mentoned. These challenges are as followng: ; «میتواند «and «میتواند «as such «می «unclung 1. Varety n usng clung or ; «درختها «and «درختها «as such «ها «unclung 2. Varety n usng clung or ; «پاکسازي «and «پاكسازي «as 3. Usng some prefxes or suffxes such ; «مسي له «and «مسا له «or «مسو ول «and «مسي ول «as n varous types such «همزه «Usng 4. ; «خانه مسکونی «and «خانە مسکونی «as such ة««usng 5. Usng or not ; «موسا «and «موسی «as such ا««by n Arabc words ended ي««usng 6. Varety n ; «اطاق «and «اتاق «as 7. Varety n spellng some words such 8. Usng European words n natve language or Persan translatons such as «ATM» and ; «خودپرداز «9. Usng or not usng rregular plurals for some words; 10. Transformng Latn words to Persan alphabets wth the orgnal pronuncaton such as ; «سورس «and «source» ; «قران «and «قرآن «as nterchangeably such آ««and ا««Usng 11. 12. Usng or not usng phonetc symbols for words. 20

Journal of Advances n Computer Research (Vol. 2, No. 3, August 2011) 15-24 The only dfference between our data and what found n computer, and the nternet s that our data are messages sent by ther own cell phones or mobles. Due to the present abltes of mobles n our country, cases such as 5 and 12 can't be set forth for dscusson. On the other hand, ctzens are supposed to send ther messages n Persan, so cases 8 and 10 can be gnored, too. Because of the system capacty to extract keywords from all of possble suffxes and prefxes, cases 1 and 2 have been corrected. Our offered system has a complete controlled dctonary that keywords belong to t. Ths dctonary n addton to synonymous words and phrases of keywords, conssts of dfferent spellng of them so that the system can respond to the cases 3, 6, 7 and 9 correctly. But for cases 4 and 11, an ablty s consdered n the system so that all the wrtng styles of vowels such as «,«و««ا and «ي» can be changed to the natve wrtng styles and then t wll survey them. 3. Evaluaton the offered system performance To estmate algorthm performance, three scales Precson(P), Recall(R) and Accuracy(A) were surveyed[10]. Four parameters TP, FN, FP and TN[11] that are defned n table (3), are needed to calculate these scales. Table 3. parameters of effcency scales Text Assgnng to class Not assgnng to class Belongng to class True Postve(tp) False Negatve(fn) Belongng to non class False Postve(fp) True Negatve(tn) Accuracy s the common scale used to evaluate accuracy of machne learnng algorthms. Ths scale s calculated by (7)[12]: tp+ tn A = tp + tn + fp + fn (7) Precson and Recall are scales used to estmate effcency of the text classfcaton algorthm. Precson scale calculates the accuracy of a searchng. Ths scale s calculated by (8)[12]: P tp = tp + fp (8) Recall scale, unlke the precson scale, calculates the rate of perfecton n a searchng [12]: R tp = tp + fn (9) 21

Usng Supervsed Clusterng Technque to M. Haghr, H. Hassanpour Table (4) presents the effcency scales of the system n all target classes. The average of ths asset value s used for fnal evaluaton. Table 4. Performance metrcs related to some target classes Class name Accuracy Precson Recall water 0.977 1 0.91 bus 0.991 0.75 0.86 asphalt 0.983 1 0.84 Socal damages 0.997 0.8 1 snow 0.985 0.83 0.88 Parks and green lands 0.985 0.66 0.79 traffc 1 1 1 Changng usage 0.997 1 0.88 Removng and ptchng 0.958 0.91 0.8 anmals 0.992 0.73 1 trees 0.983 0.89 0.94 trashes 0.966 0.71 0.77 buldng 0.983 0.95 0.9 Blockng path 0.98 1 0.53 washng 1 1 1 dredge 0.977 1 0.75 reparng 0.989 0.89 0.89 trouble 0.952 0.64 0.43 Ptchng safety marks 0.989 0.67 0.67 cleanng 0.98 1 0.61 Average 0.984 0.87 0.82 The results from the surveys showed that the system has totally made acceptable errors n the followng condtons: 1. The nearness of the message content to two or more groups wth the hgh usage and meanng resemblance; 2. The presence of dfferent usage and meanng keywords n the message; 3. Introducng the problem n a statement, wthout gvng any soluton or requestng to solve the problem. The frst error resultng from the hgh meanng resemblance and usage occurs because the rght range of the subject has not been pad attenton. For words such as park that has dfferent meanngs n dfferent sentences, t s necessary to extract not only the defned keywords but also survey the other parts of the sentence so that the target meanng n each sentence can be specfed through the relevant words wth each possble meanng. The thrd error occurs because the range of some groups s too wde. Although the ctzens can help survey ther message better and faster by requestng the muncpalty n ther own messages, at least by makng ther own requests besdes the ntroduced problems. Therefore, to correct the possble errors and ncrease the effcency of the system as much as possble, the followng actons can be taken. 1. Reformng the exstng groups by mergng or splttng the groups of less accuracy; 22

Journal of Advances n Computer Research (Vol. 2, No. 3, August 2011) 15-24 2. Usng a method besdes usng a dctonary to ndex latent semantc n the sentences made up of the dfferent applcable meanng keywords; 3. Approprate teachng and nformng the ctzens to ask ther own requests n the message. However, despte the ntroduced problems the rate of offered system precson s 87%, and ts Recall s 82%. Of course, we should notce that these acceptable percentages have consderably mproved and makes the system effcency ncrease by overcomng the presented dffcultes. The accuracy measure calculated s over 98%. Ths shows one of the most mportant advantages of the Naïve Bayes algorthm n comparson wth other algorthms that s ts persstence aganst the rrelevant attrbutes and keywords. 4. Concluson In ths paper, a method based on one of the most common supervsed algorthms, Naïve Bayes, was presented to classfy the ctzens' thematc messages n the 137 call center of Tehran cty councl. Ths algorthm uses 44 exstng classes n the manual system as tranng classes, 605 reference messages as ts tranng messages and 356 receved messages from Tehran ctzens n dstrct 4 of Tehran muncpalty of regon 1 for fve months n a year, as ts test data. Applyng ths method and evaluatng ts results proved ths algorthm wth accuracy up to over 98%, s a rght choce for our target. The rate of precson 87% and recallng 82% n classfyng the messages also suggests the comparatve success of the offered system n classfyng the target test data. Moreover, more study on the obtaned results showed that the effcency of the system can be ncreased by reformng the target groups used n the 137 call center and subsequently n the system and addng more convenence to the capacty of ndexng ts latent semantc, and teachng and nformng the ctzens correctly n presentng ther problems and requests. Our suggested future work conssts of usng other methods besde the current algorthm to ncrease accuracy. 5. References [1] Azzag, H., Gunot, C. and Venturn, G., "Data and text mnng wth herarchcal clusterng ants" Sprnger, Swarm Intellgence n Data Mnng. Vol. 34,: 153-189, 2006. [2] Eck, C., F., Zedat, N. and Zhao, Z., "Supervsed Clusterng- Algorthms and Benefts" 16th IEEE Internatonal Conference on Tools wth Artfcal Intellgence (ICTAI'04): 774-776, 2004. [3] Mazhar, N., Iman, M., Joudak, M. and Ghelchpour, A., " An overvew of classfcaton and ts algorthms" 3 th Data Mnng Conference (IDMC'09): Tehran, 2009. [4] Fazaeljavan, M. and Sadredn, M., H., " Investgatng Classfcaton and Assocaton Rule Mnng Technques for Intruson Detecton" 3 th Data Mnng Conference (IDMC'09): Tehran, 2009. [5] Sebastan, F., "Machne learnng n automated text categorzaton" Journal of ACM Computng Surveys., Vol. 34, 2002. [6] Fnley, T. and Joachms, T., "Supervsed clusterng wth support vector machnes" ICML '05 Proceedngs of the 22nd nternatonal conference on Machne learnng, 2005. [7] Berkhn, P., "Survey Of Clusterng Data Mnng Technques" Sprnger. Vol. 12: 1-56, 2002. [8] Ahmad, M., H., Monadjem, A., H. and Ayat, S., "Persan Text Classfcaton Usng Assocaton Rules" 4 th Data Mnng Conference (IDMC'10): Tehran, 2010. 23

Usng Supervsed Clusterng Technque to M. Haghr, H. Hassanpour [9] محسنیکبیر نینا میناییبیدگلی بهروز و کاشفی امید "مطالعه و بررسی اثر پیشوند و پسوند در شباهت معنایی جملات زبان فارسی با هدف کاربردي در سیستمهاي بازیابی اطلاعات" سومین کنفرانس دادهکاوي ایران تهران: دانشگاه علم و صنعت ایران 1388. [10] Wllams, N., Zander, S. and Armtage, G., "A prelmnary performance comparson of fve machne learnng algorthms for practcal IP traffc flow classfcaton" ACM SIGCOMM Computer Communcaton Revew. Vol. 36, October 2006. [11] Nguyen, T., T., T. and Armtage, G., "A survey of technques for nternet traffc classfcaton usng machne learnng" IEEE, Communcatons Surveys & Tutorals. Vol. 10: 56-76, 2008. [12] Fard, D., Harb, N. and Rahman, M., Z., "Combnng Nave Bayes and Decson Tree for Adaptve Intruson Detecton" Internatonal Journal of Network Securty & Its Applcatons: 12-25, 2010. 24