Customer Segmentation Using Clustering and Data Mining Techniques


International Journal of Computer Theory and Engineering, Vol. 5, No. 6, December 2013

Kishana R. Kashwan, Member, IACSIT, and C. M. Velu

Manuscript received December 25, 2012; revised February 28, 2013. Kishana R. Kashwan is with the Department of Electronics and Communication Engineering PG, Sona College of Technology (An Autonomous Institution Affiliated to Anna University), TPT Road, Salem-636005, Tamil Nadu, India (e-mail: drkrkashwan@gmail.com, drkrkashwan@sonatech.ac.in). C. M. Velu is with the Department of CSE, Dattakala Group of Institutions, Swami Chincholi, Daund, Pune 413130, India (e-mail: cmvelu41@gmail.com). DOI: 10.7763/IJCTE.2013.V5.811

Abstract: Clustering is a critically important step in the data mining process. It is a multivariate procedure well suited to segmentation applications in market forecasting and planning research. This paper is a comprehensive report on using the k-means clustering technique and the SPSS tool to develop a real-time, online system for a particular supermarket to predict sales in various annual seasonal cycles. The model developed was an intelligent tool that received inputs directly from sales data records and automatically updated the segmentation statistics at the end of each day's business. The model was successfully implemented and tested over a period of three months. A total of n = 2138 customers were observed, and the observations were divided into k = 4 similar groups. The classification was based on the nearest mean. An ANOVA analysis was also carried out to test the stability of the clusters. The actual day-to-day sales statistics were compared with the statistics predicted by the model. The results were quite encouraging and showed high accuracy.

Index Terms: Cluster analysis, data mining, customer segmentation, ANOVA analysis.

I. INTRODUCTION

Clustering is a statistical technique closely related to classification. It sorts raw data into meaningful clusters, that is, groups of relatively homogeneous observations. The objects of a particular cluster have similar characteristics and properties but differ from those of other clusters. The grouping is accomplished by finding similarities among the data according to characteristics found in the raw data [1]. The main objective is to find the optimum number of clusters. There are two basic types of clustering methods: hierarchical and non-hierarchical. Clustering is not a one-time task but a continuous, iterative process of knowledge discovery from huge quantities of raw and unorganized data [2]. For a particular classification problem, an appropriate clustering algorithm and parameters must be selected to obtain optimum results [3]. Clustering is a type of exploratory data mining used in many application-oriented areas such as machine learning, classification and pattern recognition [4]. In recent times, data mining has been gaining momentum for knowledge-based services such as distributed and grid computing. Cloud computing is yet another example of a frontier research topic in computer science and engineering.

For a clustering method, the most important property is that a tuple of a particular cluster is more likely to be similar to the other tuples within the same cluster than to the tuples of other clusters. For classification, the similarity measure is defined as $\mathrm{sim}(t_i, t_j)$ between any two tuples $t_i, t_j \in D$. For a given cluster $K_m$ of N points $\{t_{m1}, t_{m2}, \ldots, t_{mN}\}$, the centroid $C_m$ is defined as the middle (mean) of the cluster. Many clustering algorithms assume that a cluster is represented by one centrally located object in the cluster, called a medoid. The radius $R_m$ is the square root of the average squared distance from any point in the cluster to the centroid:

$$C_m = \frac{1}{N}\sum_{i=1}^{N} t_{mi}, \qquad R_m = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert t_{mi} - C_m \rVert^{2}}$$

We use the notation $M_m$ to indicate the medoid of cluster $K_m$. For given clusters $K_i$ and $K_j$, there are several ways to determine the distance between the clusters. A natural choice of distance is the Euclidean distance measure [5]. Single link is defined as the smallest distance between elements in different clusters, complete link as the largest such distance, and average link as the average distance between elements in different clusters:

$$\mathrm{dis}(K_i, K_j) = \min_{t_{il} \in K_i,\; t_{jm} \in K_j} \mathrm{dis}(t_{il}, t_{jm}) \quad \text{(single link)}$$

$$\mathrm{dis}(K_i, K_j) = \max_{t_{il} \in K_i,\; t_{jm} \in K_j} \mathrm{dis}(t_{il}, t_{jm}) \quad \text{(complete link)}$$

$$\mathrm{dis}(K_i, K_j) = \operatorname*{mean}_{t_{il} \in K_i,\; t_{jm} \in K_j} \mathrm{dis}(t_{il}, t_{jm}) \quad \text{(average link)}$$

If clusters are represented by centroids, the distance between two clusters is the distance between their respective centroids, $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(C_i, C_j)$, where $C_i$ and $C_j$ are the centroids of $K_i$ and $K_j$ respectively. If each cluster is represented by its medoid, the distance between the clusters can be defined as the distance between the medoids, $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(M_i, M_j)$, where $M_i$ and $M_j$ are the medoids of $K_i$ and $K_j$ respectively.

II. K-MEANS CLUSTERING TECHNIQUE

The algorithm is called k-means because the letter k represents the number of clusters chosen. An observation is assigned to the cluster whose mean is nearest to it. The principal function of the algorithm is finding the k means. First, an initial set of means is defined, and each observation is classified according to its distances to these centres [6]. Next, the mean of each cluster is computed again and the observations are reclassified based on the new set of means. This is repeated until the cluster means do not change much between successive iterations [7]. Finally, the cluster means are calculated once again and all cases are assigned to their permanent clusters.

Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation $x_i$ is a d-dimensional real vector, the k-means clustering algorithm aims to partition the n observations into k groups of observations called clusters, $S = \{S_1, S_2, \ldots, S_k\}$ with $k \le n$, so as to minimize the sum of squared distances between the observations within each cluster and the cluster mean [8]:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^{2}$$

where $\mu_i$ is the mean of the points in $S_i$. Given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, k-means iteratively identifies the clusters in the given raw data set, as summarized in Table I.

TABLE I: K-MEANS ALGORITHM (simplified simulation flow)
Begin
  Inputs: X = (x_1, x_2, ..., x_n)
  Determine: number of clusters k
  Initial centroids: C_1, C_2, ..., C_k
  Assign each input to the cluster with the closest centroid
  Determine: updated centroids C_1, C_2, ..., C_k
  Repeat: until the centroids do not change significantly (specified threshold value)
  Output: final stable centroids C_1, C_2, ..., C_k
End

In many cases, k-means is quite slow to converge: when very tight convergence is demanded, it can take a very long time (exponentially many iterations in the worst case). A reasonable threshold value may be specified for convergence so as to produce quick results without compromising much accuracy [9].

As shown in Table II, the Sum of Squared Errors (SSE) may be considerably reduced by defining a larger number of clusters. It is always desirable to improve the SSE without increasing the number of clusters, which is possible because k-means converges to a local minimum [10]. To decrease the SSE, a cluster may be split or a new cluster centroid may be introduced.

TABLE II: BISECTING K-MEANS ALGORITHM
Begin
  Initialize clusters
  Do:
    Remove a cluster from the list
    Select a cluster and bisect it using the k-means algorithm
    Compute SSE
    Choose the bisection with the least SSE
    Add the bisected clusters to the list of clusters
  Repeat: until the number of clusters has reached k
End

To increase the SSE, a cluster may be dispersed or two clusters may be merged. To obtain k clusters from the set of all observation points, the points are first split into two clusters, and then one of these clusters is split further into two clusters.
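The iteration of Table I and the bisecting procedure of Table II can be sketched in plain Python as below. This is an illustrative sketch, not the authors' SPSS-based implementation: the random choice of initial centroids, the squared-shift threshold `tol`, the number of bisection `trials`, and splitting the cluster with the largest SSE are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to its cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def kmeans(points, k, tol=1e-9, max_iter=100, seed=0):
    """Basic k-means loop in the spirit of Table I (assumed details noted above)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute centroids, keeping the old one if a cluster is empty.
        updated = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:  # threshold-based stopping rule
            break
    return centroids, clusters


def bisecting_kmeans(points, k, trials=5):
    """Bisecting k-means in the spirit of Table II."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Remove the cluster with the largest SSE from the list.
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)
        best = None
        if len(target) >= 2:
            # Bisect it several times with 2-means and keep the lowest-SSE split.
            for seed in range(trials):
                _, halves = kmeans(target, 2, seed=seed)
                if all(halves):
                    total = sse(halves[0]) + sse(halves[1])
                    if best is None or total < best[0]:
                        best = (total, halves)
        if best is None:  # the cluster cannot be split any further
            clusters.append(target)
            break
        clusters.extend(best[1])  # add both halves back to the list
    return clusters


# Small illustrative data set (made up for the example, not survey data).
data = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [1, 9], [2, 9]]
for group in bisecting_kmeans(data, 3):
    print(sorted(group), round(sse(group), 2))
```

Choosing the best of several 2-means runs per split is one common way to soften the local-minimum behaviour mentioned above; the SPSS procedure used in the paper may differ.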
In the bisecting procedure, the cluster of the largest size or the cluster with the largest SSE may initially be chosen for splitting. This is repeated until k clusters have been produced. It is thus easily observed that the SSE can be changed by splitting or merging clusters [11]. This property of k-means clustering is very desirable for marketing segmentation research. The new SSE is computed again after updating the cluster centroids, and this is repeated until the SSE reaches a minimum value or stops changing, a condition known as convergence. Mathematically, the SSE is

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in c_i} \lVert \mu_i - x \rVert^{2}$$

where $\mu_i$ is the centroid of the ith cluster $c_i$ and x is any point in that cluster. The condition for minimum SSE can be found by differentiating the SSE with respect to a centroid, setting the derivative equal to 0 and solving the equation [12]. For the jth cluster $c_j$:

$$\frac{\partial\,\mathrm{SSE}}{\partial \mu_j} = \sum_{x \in c_j} 2(\mu_j - x) = 0 \;\Longrightarrow\; m_j \mu_j = \sum_{x \in c_j} x \;\Longrightarrow\; \mu_j = \frac{1}{m_j} \sum_{x \in c_j} x$$

Here $m_j$ is the total number of elements in the jth cluster $c_j$. This shows that the minimum SSE is achieved when the centroid of each cluster equals the mean of the points in that cluster.

III. MARKET SEGMENTATION SURVEY

Market segmentation is the process of dividing customers into homogeneous groups that share characteristics such as buying habits, lifestyle and food preferences [13]. It is one of the most fundamental strategic planning and marketing concepts, wherein people are grouped under categories such as keenness, purchasing capability and interest to buy. The segmentation is performed according to the similarity of people in several dimensions related to the product under consideration. The more accurately and appropriately an organization segments its target customers, the more successful it is in the marketplace.
The main objective of market segmentation is to predict customers' needs accurately and thereby improve profitability by procuring or manufacturing products in the right quantity, at the right time, for the right customer, at optimum cost. To meet these stringent requirements, the k-means clustering technique may be applied to market segmentation to arrive at appropriate forecasting and planning decisions [14]. Cluster analysis can also classify objects such as brands and products, or attributes such as utility, durability and ease of use [15]: for example, which brands cluster together in terms of consumer perceptions for a positioning exercise, or which cities cluster together in terms of income, qualification, etc. [16].

The data set consisted of the usage of brands under different conditions, demographic variables and the varying attitudes of the customers. The respondents constituted a representative random sample of 2138 data points drawn from customer transactions in a retail supermarket where various household products were sold to its customers. The survey was carried out over a period of about 3 months. The modelling and testing of market segmentation using clustering for forecasting was based on the customers of a leading supermarket retail household supplier at its Chennai branch, India. The organization's name and the variables directly related to the organization are deliberately suppressed to maintain confidentiality, as per our agreement. It was required to map the profile of the target customers in terms of lifestyle, attitudes and perceptions. The main objective was to measure important variables or factors that can provide vital inputs for decision making in forecasting. The survey contained the 15 questionnaire items given in Table III.

TABLE III: VARIABLES CHOSEN (different variables used for marketing segmentation)
Var1: Prefer email to writing a letter
Var2: Feel that quality products are priced higher
Var3: Think wisely before buying anything
Var4: Television is a major source of entertainment
Var5: Entertainment is a necessity rather than a luxury
Var6: Prefer fast food and ready-to-use products
Var7: More health-conscious
Var8: Competition improves the quality
Var9: Women are active participants in purchase
Var10: The advertisements can play a positive role
Var11: Enjoy watching movies
Var12: Like modern style and fashion
Var13: Prefer branded products
Var14: Prefer outing on weekends
Var15: Prefer to pay by credit card rather than cash

A five-point rating scale was used to represent the variables in segmentation: the customers were asked to respond in the categories strongly agree (5), agree (4), no opinion (3), disagree (2) and strongly disagree (1). Euclidean distance was used as the distance measure in the cluster analysis, since it is ideally suited to similar interval-scaled variables. The response counts of the 2138 respondents for the 15 variables are shown in Table IV. For example, 412 customers strongly agreed that they preferred emails to writing letters, whereas 306 customers only agreed that they preferred email. Similarly, 541 customers strongly disagreed with the idea of emails, perhaps because they did not have access to the Internet. The graphical response on the five-point scale is shown in Fig. 1. The cluster mapping visualization of the response matrix is illustrated in Fig. 2, which shows that the distribution of the responses is quite wide and scattered but fairly uniform.

Fig. 1. Response of customers on the five-point scale.
Fig. 2. Customer response mapping for questionnaire.

TABLE IV: CUSTOMER RESPONSE ON THE SCALE OF FIVE POINTS (number of respondents)
Variable  Strongly agree (5)  Agree (4)  Disagree (2)  No opinion (3)  Strongly disagree (1)
Var1      412                 306        557           322             541
Var2      334                 606        216           737             245
Var3      513                 751        304           427             143
Var4      339                 628        433           501             237
Var5      232                 723        344           642             197
Var6      534                 430        636           302             236
Var7      116                 831        213           622             356
Var8      448                 727        223           552             188
Var9      530                 419        631           330             228
Var10     602                 223        749           310             254
Var11     517                 320        763           104             434
Var12     863                 403        151           311             410
Var13     652                 161        754           348             223
Var14     414                 629        237           712             146
Var15     324                 546        430           613             225

Next, the clustering was carried out as explained in Section II. The value of k was chosen as 4, with the aim of finding out what kind of 4 groups existed in the customer response matrix. For the values given in Table IV, k-means clustering was computed using the standard SPSS package. Table V shows the initial and the finally converged cluster centres with their means.

TABLE V: INITIAL AND FINAL CONVERGED CLUSTER CENTRES
          Initial cluster centre    Final cluster centre
Variable  1    2    3    4          1     2     3     4
Var1      1    4    4    3          1.80  3.13  2.80  4.00
Var2      3    5    3    2          2.60  3.50  2.20  1.50
Var3      5    1    3    1          3.40  2.50  3.20  1.00
Var4      4    4    2    5          2.80  3.25  2.60  5.00
Var5      3    5    1    3          3.20  3.88  2.60  2.00
Var6      5    4    2    4          4.40  3.25  3.40  3.00
Var7      3    5    1    4          2.40  4.38  1.40  4.00
Var8      2    1    5    2          3.00  2.00  4.60  2.00
Var9      3    1    2    1          3.80  2.63  1.80  1.50
Var10     2    5    2    2          3.40  3.80  3.00  3.00
Var11     4    3    4    1          3.60  3.13  4.20  2.50
Var12     1    3    5    2          2.00  3.63  3.60  2.50
Var13     1    5    1    2          2.20  4.00  2.40  2.50
Var14     1    5    1    4          2.20  3.88  2.40  3.50
Var15     5    2    2    4          4.60  2.75  1.80  3.00
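The abstract states that classification was based on the nearest mean. Assuming a new respondent is described by the 15 answers of Table III on the same five-point scale, a minimal sketch of that nearest-mean rule using the final centres of Table V might look like this; the example respondent is hypothetical and not taken from the survey.

```python
import math

# Final cluster centres copied from Table V (Var1..Var15 for each cluster).
FINAL_CENTRES = {
    1: [1.80, 2.60, 3.40, 2.80, 3.20, 4.40, 2.40, 3.00, 3.80, 3.40, 3.60, 2.00, 2.20, 2.20, 4.60],
    2: [3.13, 3.50, 2.50, 3.25, 3.88, 3.25, 4.38, 2.00, 2.63, 3.80, 3.13, 3.63, 4.00, 3.88, 2.75],
    3: [2.80, 2.20, 3.20, 2.60, 2.60, 3.40, 1.40, 4.60, 1.80, 3.00, 4.20, 3.60, 2.40, 2.40, 1.80],
    4: [4.00, 1.50, 1.00, 5.00, 2.00, 3.00, 4.00, 2.00, 1.50, 3.00, 2.50, 2.50, 2.50, 3.50, 3.00],
}


def nearest_cluster(response):
    """Return the cluster whose final centre is closest (Euclidean) to the response."""
    if len(response) != 15:
        raise ValueError("expected one answer per variable, Var1..Var15")
    return min(FINAL_CENTRES, key=lambda c: math.dist(response, FINAL_CENTRES[c]))


# Hypothetical respondent: five-point answers for Var1..Var15.
respondent = [4, 2, 1, 5, 2, 3, 4, 2, 2, 3, 3, 3, 3, 4, 3]
print(nearest_cluster(respondent))  # 4
```

The Euclidean distance here matches the distance measure named in the text; `math.dist` requires Python 3.8 or later.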
The initial cluster visualization is shown in Fig. 3, with a quite scattered distribution. The initial centres were randomly selected and thus had wide variations; SPSS iterations were then performed until there was no significant change in the positions of the cluster centres. This condition is called convergence, and as a result, the refined and stable cluster centres illustrated in Fig. 4 were achieved.

Fig. 3. Initial clusters distribution as chosen randomly.
Fig. 4. Final clusters distribution as computed by SPSS.

IV. STATISTICAL SIMULATION TESTS

Many statistical tests are normally used in the clustering process. In statistics, a very important tool called Analysis of Variance (ANOVA) is often employed for various analyses and data processing tasks, including clustering and data mining. The ANOVA test gives suitable inferences while splitting and merging clusters dynamically. The stability of the clusters can be checked by splitting the sample and repeating the cluster analysis. Typically, ANOVA consists of various statistical models and their related procedures, in which the variance observed in a particular random variable is divided into components attributable to different sources of variation. ANOVA is invariably used when comparing several means or centroids; in its simplest form, it provides a statistical test of whether or not the means or centroids of several groups are all equal. The ANOVA statistics computed on the data collected for market segmentation are listed in Table VI.

TABLE VI: ANOVA ANALYSIS
Variable  Cluster MS  Error MS  F-statistic  P-value
Var1      3.050       1.315     2.318        0.114
Var2      3.072       1.083     2.835        0.071
Var3      2.572       1.630     1.577        0.234
Var4      1.633       0.943     1.730        0.201
Var5      2.505       1.605     1.560        0.238
Var6      1.705       1.505     1.133        0.365
Var7      9.650       0.390     24.704       0.000
Var8      8.550       0.681     12.550       0.000
Var9      1.300       1.865     0.696        0.567
Var10     5.556       0.730     7.539        0.002
Var11     2.738       1.020     2.683        0.082
Var12     4.083       1.293     3.156        0.054
Var13     7.255       0.799     9.081        0.001
Var14     1.622       1.880     0.862        0.480
Var15     2.850       1.465     1.944        0.163

The test makes clear which of the 15 chosen variables differ significantly across the 4 final clusters obtained by k-means clustering. The last column of Table VI indicates that variables 2, 7, 8, 10, 11, 12 and 13 are significant at the 0.10 significance level, since their p-values are all < 0.10. The remaining variables are not statistically significant at this level, as their p-values are all > 0.10.

Table V shows the difference between the initial and final centres. For example, for variable 2 there is a difference of 0.8 between the initial and final centres of cluster-3 (3 versus 2.20). One of the largest differences, 1.6, occurs for variable 3 in cluster-1 (5 versus 3.40). The spatial formation of the clusters is quite different in the initial and final stages, as illustrated in Fig. 3 and Fig. 4.

For the market segmentation problem, Table V gives the outputs of k-means clustering for k = 4 (the chosen value), i.e., the stable and refined final cluster centres. Subsequently, the null hypothesis of equal cluster means was tested using the ANOVA method, as illustrated by Table VI. For example, cluster-1 is described by a mean value of 1.80 for Var1, 2.60 for Var2, 3.40 for Var3 and so on. In the same way, cluster-2 is described by means of 3.13 for Var1, 3.50 for Var2, 2.50 for Var3 and so on. All 15 variables were taken into consideration in interpreting the significance of the clusters formed. The ANOVA analysis is illustrated graphically in Fig. 5 for the cluster mean square, error mean square, F-statistic and p-value.

Fig. 5. Statistical distribution analysis of clusters.

For example, for cluster-1, the mean of 1.80 for Var1 is interpreted as people preferring email. Similarly, a mean of 3.40 for Var3 indicates that these people are careful while spending.
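A short sketch can make the reading of Table VI explicit: the one-way ANOVA F-statistic is the ratio of the cluster (between-group) mean square to the error (within-group) mean square, and at the 0.10 level a variable separates the clusters when its p-value is below 0.10. The values are copied from Table VI; the p-values are the reported ones and are not recomputed here, because the degrees of freedom are not stated.

```python
# Columns of Table VI: cluster MS, error MS, reported F-statistic, reported p-value.
TABLE_VI = {
    1: (3.050, 1.315, 2.318, 0.114),
    2: (3.072, 1.083, 2.835, 0.071),
    3: (2.572, 1.630, 1.577, 0.234),
    4: (1.633, 0.943, 1.730, 0.201),
    5: (2.505, 1.605, 1.560, 0.238),
    6: (1.705, 1.505, 1.133, 0.365),
    7: (9.650, 0.390, 24.704, 0.000),
    8: (8.550, 0.681, 12.550, 0.000),
    9: (1.300, 1.865, 0.696, 0.567),
    10: (5.556, 0.730, 7.539, 0.002),
    11: (2.738, 1.020, 2.683, 0.082),
    12: (4.083, 1.293, 3.156, 0.054),
    13: (7.255, 0.799, 9.081, 0.001),
    14: (1.622, 1.880, 0.862, 0.480),
    15: (2.850, 1.465, 1.944, 0.163),
}

ALPHA = 0.10  # significance level used in the text


def f_ratio(cluster_ms, error_ms):
    """One-way ANOVA F-statistic: between-cluster MS divided by error MS."""
    return cluster_ms / error_ms


# Variables whose reported p-value is below ALPHA differ significantly across clusters.
significant = [v for v, (_, _, _, p) in TABLE_VI.items() if p < ALPHA]

for v, (cms, ems, f, p) in TABLE_VI.items():
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"Var{v}: F = {f_ratio(cms, ems):.3f} (reported {f}), p = {p}, {verdict}")

print("Significant at 0.10:", significant)
```

Recomputing F from the rounded mean squares reproduces the reported F-statistics to within about 1%.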

The mean of 2.60 for Var2 suggests that quality products always come at a higher price. For the same variables, cluster-2 shows that people prefer conventional letters to email, as indicated by the mean of 3.13 for Var1. The mean of 3.50 for Var2 shows people who do not favour paying a high price for good quality, and the mean of 2.50 for Var3 shows that they tend to be neutral about care in spending. The variables for cluster-3 and cluster-4 can be compared similarly. For the given market segmentation problem, the 4 clusters were analyzed from various considerations. Cluster-1 was characterized by variables 1, 2, 5, 12, 13 and 14, as opposed to variables 4, 6, 9, 11 and 15, and was unsure about variable 7. The inference derived was that this group exhibited many traditional values, except that its members had adopted email. They had also begun to spend more liberally and were probably in transition on a few other factors, such as the acceptance of women as decision makers and greater use of credit cards for convenience. For cluster-2, variables 1, 5, 8, 9 and 15 did not share the statistical characteristics of variables 3, 4, 6, 7, 10, 11, 12, 13 and 14. Its members believed in negotiation or were aggressive buyers. It could be concluded that this was a group which liked to use credit cards, spent more freely, believed in women power, believed in economics rather than politics and felt that quality products could be worth purchasing. They also seemed to have a taste for a modern lifestyle and were fashion oriented. For cluster-4, variables 2, 4, 5, 7 and 10 belonged to this cluster, had statistical characteristics opposite to variables 1, 3, 6, 8, 11, 12 and 13, and were neutral in comparison to variables 14 and 15. It was concluded that this group was optimistic, free spending and a good target for TV advertising, particularly of consumer durables and entertainment, although its members were not necessarily influenced by brands.
They wanted value for money: if they understood that an item was worth its price, they would tend to buy it.

V. CONCLUSION AND DISCUSSIONS

In summary, the cluster analysis of the chosen sample of respondents revealed a great deal about the possible segments in the target customer population. Once the number of clusters was identified, the k-means clustering algorithm, a non-hierarchical method, was used. For computing k-means clustering, initial cluster centres were chosen, and the final stable cluster centres were then computed by iterating until the means stopped changing between iterations. This convergence condition was also achieved by setting a threshold value for the change in the mean. The final cluster centres contained the mean value of each variable in each cluster, and these were interpreted in multi-dimensional projections related to market forecasting and planning. To check the stability of the clusters, the sample data was first split into two parts, and it was checked whether similar, stable and distinct clusters emerged from both sub-samples. These analyses provided further illustrations of using the cluster method for market segmentation and forecasting. The computing-based system developed was intelligent and automatically presented results to the managers for quick decision making. Simulation tests were also computed for cluster brands and other characteristics of the clusters representing particular classes of people. Future work will involve more trials and further automation of market forecasting and planning.

ACKNOWLEDGMENT

The authors are deeply indebted and thankful to all who offered technical know-how and helped in collecting the market data. The authors also thank all customers who volunteered feedback and transactional information, and their family members for constant support and motivation.

REFERENCES

[1] I. S. Dhillon and D. M. Modha, "Concept decompositions for large sparse text data using clustering," Machine Learning, vol. 42, issue 1, pp. 143-175, 2001.
[2] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient K-means clustering algorithm," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, pp. 881-892, 2002.
[3] D. MacKay, "An example inference task: Clustering," in Information Theory, Inference and Learning Algorithms, Cambridge University Press, 2003, pp. 284-292.
[4] M. Inaba, N. Katoh, and H. Imai, "Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering," in Proc. 10th ACM Symposium on Computational Geometry, 1994, pp. 332-339.
[5] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, "NP-hard Euclidean sum-of-squares clustering," Machine Learning, vol. 75, pp. 245-249, 2009.
[6] S. Dasgupta and Y. Freund, "Random trees for vector quantization," IEEE Trans. on Information Theory, vol. 55, pp. 3229-3242, 2009.
[7] M. Mahajan, P. Nimbhorkar, and K. Varadarajan, "The planar k-means problem is NP-hard," LNCS, Springer, vol. 5431, pp. 274-285, 2009.
[8] A. Vattani, "K-means exponential iterations even in the plane," Discrete and Computational Geometry, vol. 45, no. 4, pp. 596-616, 2011.
[9] C. Elkan, "Using the triangle inequality to accelerate k-means," in Proc. 12th International Conference on Machine Learning (ICML), 2003.
[10] H. Zha, C. Ding, M. Gu, X. He, and H. D. Simon, "Spectral relaxation for K-means clustering," Neural Information Processing Systems, Vancouver, Canada, vol. 14, pp. 1057-1064, 2001.
[11] C. Ding and X.-F. He, "K-means clustering via principal component analysis," in Proc. Int'l Conf. Machine Learning (ICML), 2004, pp. 225-232.
[12] P.-N. Tan, V. Kumar, and M. Steinbach, Introduction to Data Mining, Pearson Education Inc. and Dorling Kindersley (India) Pvt. Ltd., New Delhi and Chennai Micro Print Pvt. Ltd., India, 2006.
[13] D. D. S. Garla and G. Chakraborty, "Comparison of Probabilistic-D and k-Means clustering in segment profiles for B2B markets," SAS Global Forum 2011, Management, SAS Institute Inc., USA.
[14] H.-B. Wang, D. Huo, J. Huang, Y.-Q. Xu, L.-X. Yan, W. Sun, X.-L. Li, and A. R. Sanchez Jr., "An approach for improving K-means algorithm on market segmentation," in Proc. International Conference on System Science and Engineering (ICSSE), IEEE Xplore, 2010.
[15] H. Hruschka and M. Natter, "Comparing performance of feedforward neural nets and K-means for cluster-based market segmentation," European Journal of Operational Research, Elsevier Science, vol. 114, pp. 346-353, 1999.
[16] P. Ahmad, "Pharmaceutical market segmentation using GA-K-means," European Journal of Economics, Finance and Administrative Sciences, issue 22, 2010.

Kishana R. Kashwan received the M.Tech. degree in Electronics Design and Technology and the Ph.D. degree in Electronics and Communication Engineering from Tezpur University (a central university of India), Tezpur, India, in 2002 and 2007 respectively. Presently he is a professor and Dean of Post Graduate Studies in the Department of Electronics and Communication Engineering (Post Graduate Studies), Sona College of Technology (An Autonomous Institution Affiliated to Anna University), TPT Road, Salem 636005, Tamil Nadu, India. He has published extensively at the international and national level and has travelled to many countries. His research areas are VLSI Design, Communication Systems, Circuits and Systems, and SoC/PSoC. He is also director of the Centre of Excellence in VLSI Design and Embedded SoC at Sona College of Technology. He is a member of the Academic Council, Research Committee and Board of Studies of Electronics and Communication Engineering at Sona College of Technology. He has successfully guided many scholars in their master's and doctoral theses. Kashwan has completed many funded research projects and is currently working on a few funded projects from the Government of India. Dr. Kashwan is a member of IEEE and IASTED and a Senior Member of IACSIT. He is a life member of ISTE and a Member of IE (India).

C. M. Velu received his M.E. in CSE from Sathyabama University. He is currently pursuing his doctoral program under the Faculty of Information and Communication Engineering, registered at Anna University Chennai, India. He has visited the UAE as a computer faculty member. He has served as CSE faculty for more than two and a half decades and has published many research papers in international and national journals. His areas of interest are Data Warehousing and Data Mining, Artificial Intelligence, Artificial Neural Networks, Digital Image Processing and Pattern Recognition.