Segmentation of stock trading customers according to potential value


Expert Systems with Applications 27 (2004)

H.W. Shin (a,*), S.Y. Sohn (b)
(a) Samsung Economy Research Institute, Kukje Center Building, 191, Hangangro 2-Ga, Seoul, South Korea
(b) Department of Computer Science and Industrial Systems Engineering, Yonsei University, Seoul, South Korea

Abstract

In this article, we use three clustering methods (K-means, self-organizing map, and fuzzy K-means) to find properly graded stock market brokerage commission rates based on the 3-month total trades in two transaction modes (representative assisted trading and an online trading system). Stock traders in each mode are classified by their total trade amount and by their trade amount in that mode. Our empirical analysis indicates that fuzzy K-means cluster analysis is the most robust approach for segmenting the customers of both transaction modes. We then propose a decision-tree-based rule to classify three groups of customers and suggest graded brokerage commission rates of 0.4, 0.45, and 0.5% for the representative assisted mode and 0.06, 0.1, and 0.18% for the online trading system.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Customer relationship management; Customer segmentation; K-means clustering; Self-organizing map; Fuzzy K-means

1. Introduction

The Korean stock market grew rapidly in the 1990s. Despite the financial crisis that hit Korea in 1997, there were more than 30 domestic security corporations, and the daily average stock transaction reached 4800 billion won in 2000, compared with 4100 billion won a year earlier. This indicates that transaction-based commission increased considerably as well. Commission is one of the main sources of profit for security corporations, and each security corporation sets its own commission rate to increase its profit.
The rate is typically based on the amount of each individual trade. However, this kind of system does not consider potential customer value over time. Customers who have traded more, cumulatively and continuously over a longer period, need to be treated better (Hartfeil, 1996). In commercial banking, Zeithaml, Rust, and Lemon (2001) reported that the top 20% of customers produced 82% of the bank's retail profit. Hunt (1999) showed that the charge system of an insurance corporation should not be uniform but should vary according to each customer's potential value. This argument supports the value of better treatment of loyal customers.

In this article, we propose a robust clustering algorithm to classify stock traders into several groups according to their 3-month transactions, in order to suggest a graded commission policy for each group. The clustering variables are the transactions made through representative assisted trading and through the online Home Trading System (HTS). The clustering methods used are K-means clustering, self-organizing map (SOM), and fuzzy K-means. The cutoff value of each customer group is set using a classification and regression tree (CART).

The rest of this article is organized as follows. Section 2 describes the three clustering methods along with the performance measure used to compare them. Section 3 applies the methods to field data and derives three groups of customers. Section 4 presents a new brokerage commission rate and compares it with the existing commission rate in terms of profit. Finally, Section 5 discusses the implications of our results and suggests areas for further study.

* Corresponding author.

2. Three clustering algorithms

Cluster analysis can be used to group objects (observations) on the basis of their variables. We use three
kinds of clustering methods for customer segmentation: K-means, SOM, and fuzzy K-means. To describe each method briefly, let us assume that we are interested in clustering N samples with respect to P variables into K clusters. For sample i, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$ represents a vector of P characteristic variables. Typically K is unknown, but for stock customer segmentation we use $K = 3$.

2.1. K-means clustering algorithm

The K-means method is widely used because it processes large data sets quickly. K-means clustering proceeds in the following order. First, K observations are randomly selected from the N observations, one for each cluster. They become the centers of the initial clusters. Second, each of the remaining $N - K$ observations is assigned to the nearest cluster in terms of the Euclidean distance with respect to $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$. After each observation is assigned to its nearest cluster, the center of that cluster is recomputed. Last, after all observations have been allocated, the Euclidean distance between each observation and each cluster's center point is calculated to confirm whether the observation is allocated to the nearest cluster.

2.2. Self-organizing map

The SOM is an unsupervised neural network model devised by Kohonen (1982). As with other neural networks, the analysis is based on the solution of a large number of simple operations that can be performed in parallel. The SOM network typically has two layers of nodes: an input layer and an output layer. The neurons in the output layer are arranged in a grid and are influenced by their neighbors in this grid. The goal is to automatically cluster the input samples in such a way that similar samples are represented by the same output neuron (Kim & Han, 2001; Mangiameli, Chen, & West, 1996).
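The K-means procedure of Section 2.1 can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the common batch variant (all observations are reassigned, then all centers are recomputed) rather than recomputing a center after every single assignment, and the function names `kmeans` and `nearest` as well as the parameters `n_iter` and `seed` are hypothetical.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k=3, n_iter=100, seed=0):
    """Batch K-means sketch; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: K randomly chosen observations become the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Step 2: assign every observation to its nearest center.
        labels = nearest(X, centers)
        # Step 3: recompute each center as the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final check: every observation sits in its nearest cluster.
    return nearest(X, centers), centers
```

Because the trade amounts in this study span several orders of magnitude, scaling the variables to a common range before calling such a routine would be a reasonable precaution.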
Since each of the characteristic variables is linked to every output neuron by a weighted connection, each output neuron $j$ $(j = 1, \ldots, K)$ has as many weights $w_j$ as there are input variables. Starting from randomly initialized weights, the network learns to adapt its weights to the input samples as follows. When an input sample $x_i$ is presented to the SOM network, the neurons compute the distance between their weight vectors $w_j = (w_{j1}, w_{j2}, \ldots, w_{jp}, \ldots, w_{jP})$ and the input $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$. The neuron with the minimum distance, called the winner, is then determined based on

$$\min_j D_j, \qquad D_j = \sqrt{\sum_{p=1}^{P} \left[ x_{ip} - w_{jp} \right]^2} \tag{1}$$

where $w_{jp}$ is the weight of the $j$th neuron linked to the $p$th variable. The weights of the winner and of the neurons in its neighborhood are then updated using the following equation:

$$w_j^{\text{new}} = w_j^{\text{old}} + \alpha \left( x_i - w_j^{\text{old}} \right) \tag{2}$$

where $w_j^{\text{new}}$ is the new weight vector and $w_j^{\text{old}}$ is the old weight vector of the $j$th neuron, and $\alpha$ is the learning rate $(0 < \alpha < 1)$. The procedure ends when the difference in the error (e.g. the average of the Euclidean distances between each input sample and its best matching weight vector) between the current and the previous iteration is smaller than a given value $\varepsilon$. After the stop criterion is satisfied, each neuron in the network represents a cluster.

2.3. Fuzzy K-means clustering analysis

Fuzzy set theory was introduced in the 1960s as a way of explaining uncertainty in data structure (Zadeh, 1965). Fuzzy K-means (also known as fuzzy c-means) clustering has been investigated by Bezdek (1981) and compared with the non-fuzzy clustering method. Hruschka (1986) and Weber (1996) showed in their empirical studies that fuzzy clustering provided more insight than non-fuzzy clustering in terms of market segment information. Fuzzy clustering segments the samples into $1 < K < N$ clusters, estimates each sample's cluster membership, and simultaneously estimates the cluster centers.
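For the SOM of Section 2.2, the winner search and the weight update can be sketched as follows. This is a minimal illustration, not the authors' network: it assumes the K output neurons sit on a one-dimensional grid, treats the neighborhood as the neurons within `radius` grid steps of the winner (with `radius=0` only the winner is updated), and the names `train_som`, `radius`, `eps`, `max_epochs`, and `seed` are hypothetical. The default `alpha=0.1` is the learning rate reported in the case study.

```python
import numpy as np

def quantization_error(X, W):
    """Average Euclidean distance from each sample to its best matching weight vector."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return d.min(axis=1).mean()

def train_som(X, k=3, alpha=0.1, radius=0, eps=1e-6, max_epochs=200, seed=0):
    """SOM sketch with k output neurons on a 1-D grid; returns (labels, weights)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random initial weights drawn inside the range of the data.
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(k, X.shape[1]))
    prev_err = np.inf
    for _ in range(max_epochs):
        for x in X[rng.permutation(len(X))]:
            # Winner: the neuron whose weight vector is closest to the sample.
            winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
            # Move the winner and its grid neighbors toward the sample.
            lo, hi = max(0, winner - radius), min(k, winner + radius + 1)
            W[lo:hi] += alpha * (x - W[lo:hi])
        # Stop when the error changes by less than eps between epochs.
        err = quantization_error(X, W)
        if abs(prev_err - err) < eps:
            break
        prev_err = err
    # Each neuron represents one cluster: label samples by their best match.
    labels = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2).argmin(axis=1)
    return labels, W
```

Since every update is a convex combination of the old weight and a sample, the weights stay inside the range spanned by the data.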
The membership of $x_i$ in cluster $s$, denoted $u_{si}$, lies between 0 and 1 and is defined as follows (Ozer, 2001):

$$u_{si} = \frac{1}{\displaystyle\sum_{j=1}^{K} \frac{\| x_i - v_s \|^{2/(m-1)}}{\| x_i - v_j \|^{2/(m-1)}}}, \qquad \text{for } x_i \neq v_j,\ \forall s, i,\ \text{and } m > 1 \tag{3}$$

where $m$ is the smoothing parameter that controls the fuzziness of the clusters, and $v_s = (v_{s1}, v_{s2}, \ldots, v_{sp}, \ldots, v_{sP})$ is the center of cluster $s$, defined as

$$v_s = \frac{\sum_{i=1}^{N} (u_{si})^m x_i}{\sum_{i=1}^{N} (u_{si})^m}, \qquad \forall s. \tag{4}$$

The optimal value of $u$ is obtained by minimizing the following objective function:

$$\min \sum_{i=1}^{N} \sum_{s=1}^{K} (u_{si})^m \, \| x_i - v_s \|^2 \tag{5}$$

The constraints used are as follows:

$$0 \le u_{si} \le 1, \qquad \forall s, i \tag{6}$$

$$\sum_{s=1}^{K} u_{si} = 1, \qquad \forall i. \tag{7}$$
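The membership and center formulas of the fuzzy K-means method are usually applied alternately until the memberships settle. The sketch below illustrates that alternation under stated assumptions and is not the authors' code: memberships are initialized at random, distances are floored at a tiny value to avoid division by zero when a sample coincides with a center, and the names `fuzzy_kmeans`, `n_iter`, `tol`, and `seed` are hypothetical. The default `m=1.2` is the smoothing parameter reported in the case study.

```python
import numpy as np

def fuzzy_kmeans(X, k=3, m=1.2, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy K-means (fuzzy c-means) sketch.

    Returns (U, V): U[i, s] is the membership of sample i in cluster s,
    V[s] is the center of cluster s.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random initial memberships; each row sums to one.
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: membership-weighted mean of the samples.
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances of every sample to every center, floored to avoid 0/0.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Membership update: ratio[i, s, j] = (d_is / d_ij) ** (2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

A hard assignment follows from `U.argmax(axis=1)`. With `m` close to 1 the exponent `2 / (m - 1)` is large, so rescaling the variables beforehand helps keep the ratios within floating-point range.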
Condition (6) ensures that the degrees of membership are between 0 and 1, and condition (7) means that, for a given sample, the degrees of membership across the clusters sum to one. Once the optimal values of $u$ are found, each sample is assigned to the cluster in which its membership $u$ is highest.

2.4. Performance comparison of the three clustering methods

We compare the performance of these clustering methods using the intraclass inertia presented in Michaud (1997). Intraclass inertia is a measure of how compact each cluster (class) is when the number of clusters is fixed. Usually the variables are scaled to be in the same range (Nair & Narendran, 1997). The mean of the $j$th cluster $C_j$, which has $n_j$ samples, is defined as

$$\bar{x}_j = (\bar{x}_{j1}, \bar{x}_{j2}, \ldots, \bar{x}_{jp}, \ldots, \bar{x}_{jP}), \qquad \text{where } \bar{x}_{jp} = \frac{1}{n_j} \sum_{i \in C_j} x_{ip} \tag{8}$$

The intraclass inertia $I_j$ of cluster $j$ is defined as

$$I_j = \frac{1}{n_j} \sum_{i \in C_j} \sum_{p=1}^{P} (x_{ip} - \bar{x}_{jp})^2 \tag{9}$$

Finally, the intraclass inertia $F(K)$ for a given K clusters is defined as

$$F(K) = \frac{1}{n} \sum_{j=1}^{K} n_j I_j = \frac{1}{n} \sum_{j=1}^{K} \sum_{i \in C_j} \sum_{p=1}^{P} (x_{ip} - \bar{x}_{jp})^2 \tag{10}$$

where $n = \sum_{j=1}^{K} n_j$ is the total number of samples. One can see that $F(K)$ is the average squared Euclidean distance between each observation and its cluster mean.

3. A case study

We randomly select 3000 customers of stock corporation A who had transaction records from the middle of July to the middle of October 1999 and apply the three clustering methods. The stock transaction modes used are either representative assisted or online HTS. HTS customers buy and sell their stocks directly, without the advice of the corporation's representatives.

Descriptive statistics of the sample data are as follows. About 78% of the total trade amount was made through online HTS. In terms of gender, 68% of the customers are male. However, the average trade amount attributable to female customers was 51% and 52% for the representative assisted and online HTS modes, respectively.
This suggests the importance of a marketing strategy for HTS and for female customers. In terms of age, those who are older than 60 mostly used the representative assisted mode. Their trade amount is also the highest among the age groups in both transaction modes. In terms of average transaction frequency, the representative assisted mode is used 1.8 times per month, while online HTS is used six times per month. We also estimate the correlation between the trade amount made in each transaction mode and their sum. The correlation between the two modes is relatively low (0.38), while the correlations between a single mode and the total transactions are 0.76 and 0.89 for representative assisted and online HTS, respectively.

3.1. Cluster analysis of customers

Clustering methods are used to segment the customers of each mode separately, using two variables per mode. The variables used for the representative assisted mode are the total trade amount and the representative assisted trade amount over the 3-month period. For the HTS mode, we use the total trade amount and the trade amount in HTS over the 3-month period. Customers are segmented into three clusters (Normal, Best, and VIP customers). After some experimentation with the parameters of the clustering methods, we set the SOM learning rate $(\alpha)$ to 0.1 and the fuzzy K-means smoothing parameter $(m)$ to 1.2. For comparison, the resulting compactness of the clusters produced by the three clustering methods (K-means, SOM, fuzzy K-means) is summarized in Table 1.

Table 1
Intraclass inertia of each clustering method

Clustering method | Mode | Intraclass inertia
K-means | Representative assisted mode | ... *
K-means | HTS | ...
SOM | Representative assisted mode | ...
SOM | HTS | ...
Fuzzy K-means | Representative assisted mode | ...
Fuzzy K-means | HTS | ... *
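The compactness measure behind this comparison, the intraclass inertia $F(K)$ of Section 2.4, is the average squared Euclidean distance between each observation and its cluster mean. A minimal sketch of that computation is given below; the function name `intraclass_inertia` is hypothetical, and hard cluster labels are assumed (for fuzzy K-means, the cluster with the highest membership).

```python
import numpy as np

def intraclass_inertia(X, labels):
    """Average squared Euclidean distance between each sample and its cluster mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        # Squared deviations of the members from their own cluster mean.
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total / len(X)
```

A smaller value means more compact clusters, so the method with the lowest inertia for a given mode is the preferred one. As a small check, four points forming two tight pairs score far lower when each pair is its own cluster than when all four share one cluster.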
For customer segmentation in the representative assisted mode, the K-means clustering method turns out to be the best, while for the segmentation of HTS customers the fuzzy K-means method is the winner. Table 2 and Fig. 1 present the segmentation of customers in the representative assisted mode using K-means clustering, while Table 3 and Fig. 2 present the segmentation of HTS customers using fuzzy K-means. The results indicate that the numbers of Best and VIP customers are small in the representative assisted mode compared with HTS.

Table 2
The segmentation of customers in representative assisted mode using K-means

Class | Number of customers | Cluster center: total trade amount for 3 months (won) | Cluster center: trade amount in representative assisted mode for 3 months (won)
Normal | ... | ... million | 6.4 million
Best | ... | ... billion | 25.6 billion
VIP | ... | ... billion | ... billion

Fig. 1. Transaction distribution in the representative assisted mode for 3 months.

Table 3
The segmentation of customers of HTS using fuzzy K-means

Class | Number of customers | Cluster center: total trade amount for 3 months (won) | Cluster center: trade amount in HTS mode for 3 months (won)
Normal | ... | ... million | 19.9 million
Best | ... | ... billion | 30.9 billion
VIP | ... | ... billion | ... billion

As shown in Figs. 1 and 2, one VIP customer has a very large total trade amount (558 billion won over 3 months). This customer may be considered an outlier. Therefore, we also compare the clustering results without this customer. The results are given in Table 4. In this case, fuzzy K-means has the best performance in the representative assisted mode. SOM is the most suitable for HTS, but fuzzy K-means performs fairly well there too. Overall, we conclude that fuzzy K-means provides relatively robust results in terms of intraclass inertia for both modes.

3.2. Classification of three groups of customers

In practice, we need threshold values to classify the three groups. We use a decision tree to find the threshold values for customer segmentation in both transaction modes. The outcome class (Normal, Best, VIP) is the one assigned by fuzzy K-means after deleting the outlier. Seventy percent of the data on the 2999 customers (all except the outlier) are assigned for training and 30% for validation, using a segment-based stratified sampling approach. We then use the CART algorithm to find the threshold values for the three groups. The trees in Figs. 3 and 4 show the threshold values for customer segmentation.

From Fig. 3, customers whose total trade amount in both modes over 3 months is less than about 19.3 billion won are defined as Normal customers. Customers whose total trade amount in both modes over 3 months is more than 19.3 billion won and whose trade amount in the representative assisted mode over 3 months is less than 125 billion won are defined as Best customers. The other customers are VIP customers.

From Fig. 4, customers whose trade amount in HTS over 3 months is less than about 13.6 billion won and whose total trade amount in both modes is less than 23.3 billion won are defined as Normal customers. Customers whose trade amount in HTS over 3 months is more than 13.6 billion won and whose total trade amount in both modes is more than 75.9 billion won are defined as VIP customers. The rest are considered Best customers.
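The threshold rules read from Figs. 3 and 4 can be written as two small functions. This is a sketch of the rules as stated in the text, not the fitted CART trees themselves: the function and constant names are hypothetical, and the handling of amounts exactly equal to a threshold is an assumption, since the text gives only "less than" and "more than".

```python
# Thresholds in won, taken from the rules stated for Figs. 3 and 4.
REP_TOTAL_CUT = 19_300_000_000    # total trade amount, representative assisted tree
REP_MODE_CUT = 125_000_000_000    # representative assisted trade amount
HTS_MODE_CUT = 13_600_000_000     # HTS trade amount
HTS_TOTAL_LOW = 23_300_000_000    # total trade amount, Normal bound in HTS tree
HTS_TOTAL_HIGH = 75_900_000_000   # total trade amount, VIP bound in HTS tree

def classify_representative(total_won, rep_won):
    """Tier of a representative assisted customer from 3-month amounts (Fig. 3 rule)."""
    if total_won < REP_TOTAL_CUT:
        return "Normal"
    if rep_won < REP_MODE_CUT:
        return "Best"
    return "VIP"

def classify_hts(total_won, hts_won):
    """Tier of an HTS customer from 3-month amounts (Fig. 4 rule)."""
    if hts_won < HTS_MODE_CUT and total_won < HTS_TOTAL_LOW:
        return "Normal"
    if hts_won > HTS_MODE_CUT and total_won > HTS_TOTAL_HIGH:
        return "VIP"
    return "Best"
```

For example, an HTS customer with 20 billion won traded in HTS and 30 billion won in total falls into neither the Normal nor the VIP rule and is therefore classified as Best.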
Fig. 2. Transaction distribution in HTS for 3 months.

4. New brokerage commission policy

In this section, we suggest a graded brokerage commission policy based on the three clusters of customers. The new policy must be effective enough to keep existing customers from churning, and at the same time it should yield sufficient profit for the security corporation. As shown in Table 5, we suggest that the commission for Normal, Best, and VIP customers be 0.5, 0.45, and 0.4% in the representative assisted mode and 0.18, 0.1, and 0.06% in HTS, respectively. This policy is then compared with the existing commission system of stock corporation A (see Table 6).

Next, we compare the profit under the existing commission policy with the profit under the proposed commission policy in Table 7. The proposed commission policy is based on the threshold values obtained by the decision tree using the fuzzy K-means algorithm. As shown in Table 7, the new policy would provide an expected profit similar to that of the existing policy. However, the proposed commission policy has additional positive effects on customer relationship management (CRM) because it recognizes the value of different levels of customers. Therefore, in the long run, we conclude that the new policy would bring higher profit than the existing commission policy.

5. Conclusion

In this article, we found fuzzy K-means clustering to be the most stable method for grouping stock trading customers and used it to classify three tiers of customers (Normal, Best, and VIP) based on the total trade amount over a 3-month period. For the Normal, Best, and VIP groups, respectively, the brokerage commission rate is set to 0.5, 0.45, and 0.4% for the representative assisted mode and 0.18, 0.1, and 0.06% for HTS.
This approach differs from the existing graded commission policy in that the proposed policy grades the commission according to the historically accumulated transaction amount of each customer. The new approach is expected to bring more profit by treating loyal customers better and thereby retaining them over a longer term. The data used for clustering in this article contain a relatively short history of customers' transactions. After the data warehousing project is completed and a larger amount of information has accumulated, the clustering may need to be redone for tuning. Our new policy depends mainly on the cumulative transaction amount. Other factors, such as the frequency of transactions, may need to be included in the policy.

Table 4
Intraclass inertia by clustering method (without the outlying customer)

Cluster analysis method | Mode | Intraclass inertia
K-means | Representative assisted mode | ...
K-means | HTS | ...
SOM | Representative assisted mode | ...
SOM | HTS | ... *
Fuzzy K-means | Representative assisted mode | ... *
Fuzzy K-means | HTS | ...
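As a worked illustration of the graded rates proposed in Section 4, the commission on a given trade amount is simply the amount multiplied by the rate of the customer's tier and mode. The sketch below encodes those rates; the names `PROPOSED_RATES_PERCENT` and `proposed_commission` are hypothetical, and the existing schedule of corporation A is deliberately not modeled here.

```python
# Proposed commission rates in percent, by mode and customer tier (Section 4).
PROPOSED_RATES_PERCENT = {
    "representative": {"Normal": 0.5, "Best": 0.45, "VIP": 0.4},
    "hts": {"Normal": 0.18, "Best": 0.1, "VIP": 0.06},
}

def proposed_commission(trade_won, mode, tier):
    """Commission in won on a trade amount under the proposed graded policy."""
    rate_percent = PROPOSED_RATES_PERCENT[mode][tier]
    return trade_won * rate_percent / 100.0
```

Under these rates, the same HTS trade amount costs a VIP customer one third of what it costs a Normal customer (0.06% against 0.18%), which is the incentive the graded policy is meant to create.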
Fig. 3. Classifying the customers of the representative assisted mode (unit: won; the number in parentheses is the count per class).

Fig. 4. Classifying the customers of the HTS mode (unit: won; the number in parentheses is the count per class).
More variations of the approach based on longer time-series data sets are left for further study.

Table 5
Newly proposed commission rate

Class | Brokerage commission in representative assisted mode (%) | Brokerage commission in HTS (%)
Normal | 0.5 | 0.18
Best | 0.45 | 0.1
VIP | 0.4 | 0.06

Table 6
Currently used commission rates of stock corporation A

Mode | Amount of transaction | Brokerage commission
Representative assisted mode | Under 200 million | 0.5%
Representative assisted mode | From 200 to 500 million | 0.45% + 1000
Representative assisted mode | Over 500 million | 0.4% + 500
HTS | Under 250 million | 0.23%
HTS | From 250 to 500 million | 0.19% + 1000
HTS | From 500 to 1000 million | 0.17% + 500
HTS | From 1000 to 3000 million | 0.15%
HTS | Over 3000 million | 0.09%

Table 7
Comparison of the two commission policies in stock corporation A (unit: won)

Class | Profit by the existing commission policy | Profit by the proposed commission policy
Representative assisted mode | 1,473,640,285 | 1,428,058,209
HTS | 1,349,532,283 | 1,356,896,165
Total commission | 2,823,172,568 | 2,784,954,374

Acknowledgement

This work was supported by grant No. R from Korea Science & Engineering Foundation.

References

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Hartfeil, G. (1996). Bank one measures profitability of customers, not just products. Journal of Retail Banking Services, 18(2).
Hruschka, H. (1986). Market definition and segmentation using fuzzy clustering methods. International Journal of Research in Marketing, 3.
Hunt, P. (1999). The pricing is right. Canadian Insurance Statistics.
Kim, K. S., & Han, I. (2001). The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications, 21(3).
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1).
Mangiameli, P., Chen, S. K., & West, D. A. (1996). Comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research, 93(2).
Michaud, P. (1997). Clustering techniques. Future Generation Computer Systems, 13(2).
Nair, G. J., & Narendran, T. T. (1997). Cluster goodness: a new measure of performance for cluster formation in the design of cellular manufacturing systems. International Journal of Production Economics, 48(1).
Ozer, M. (2001). User segmentation of online music services using fuzzy clustering. Omega, 29(2).
Weber, R. (1996). Customer segmentation for banks and insurance groups with fuzzy clustering techniques. In J. F. Baldwin (Ed.), Fuzzy logic. New York: Wiley.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8.
Zeithaml, V. A., Rust, R. T., & Lemon, K. N. (2001). The customer pyramid: creating and serving profitable customers. California Management Review, 43(4).
More informationPattern Recognition Using Feature Based DieMap Clusteringin the Semiconductor Manufacturing Process
Pattern Recognition Using Feature Based DieMap Clusteringin the Semiconductor Manufacturing Process Seung Hwan Park, ChengSool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, JunGeol Baek Abstract Depending
More informationClustering. Data Mining. Abraham Otero. Data Mining. Agenda
Clustering 1/46 Agenda Introduction Distance Knearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in
More informationThe Result Analysis of the Cluster Methods by the Classification of Municipalities
The Result Analysis of the Cluster Methods by the Classification of Municipalities PAVEL PETR, KAŠPAROVÁ MILOSLAVA System Engineering and Informatics Institute Faculty of Economics and Administration University
More informationIra J. Haimowitz Henry Schwarz
From: AAAI Technical Report WS9707. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Clustering and Prediction for Credit Line Optimization Ira J. Haimowitz Henry Schwarz General
More informationUsing Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson
Using Artificial Intelligence and Machine Learning Techniques. Some Preliminary Ideas. Presentation to CWiPP 1/8/2013 ICOSS Mark Tomlinson Artificial Intelligence Models Very experimental, but timely?
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationA Neural Network based Approach for Predicting Customer Churn in Cellular Network Services
A Neural Network based Approach for Predicting Customer Churn in Cellular Network Services Anuj Sharma Information Systems Area Indian Institute of Management, Indore, India Dr. Prabin Kumar Panigrahi
More informationComparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
More information6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototypebased Fuzzy cmeans
More informationUsing Predictive Analytics to Detect Fraudulent Claims
Using Predictive Analytics to Detect Fraudulent Claims May 17, 211 Roosevelt C. Mosley, Jr., FCAS, MAAA CAS Spring Meeting Palm Beach, FL Experience the Pinnacle Difference! Predictive Analysis for Fraud
More informationData Mining Project Report. Document Clustering. Meryem UzunPer
Data Mining Project Report Document Clustering Meryem UzunPer 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. Kmeans algorithm...
More informationNeural Network Addin
Neural Network Addin Version 1.5 Software User s Guide Contents Overview... 2 Getting Started... 2 Working with Datasets... 2 Open a Dataset... 3 Save a Dataset... 3 Data Preprocessing... 3 Lagging...
More informationVisualization of Breast Cancer Data by SOM Component Planes
International Journal of Science and Technology Volume 3 No. 2, February, 2014 Visualization of Breast Cancer Data by SOM Component Planes P.Venkatesan. 1, M.Mullai 2 1 Department of Statistics,NIRT(Indian
More informationDATA ANALYTICS USING R
DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationReview on Financial Forecasting using Neural Network and Data Mining Technique
ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:
More informationA Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value. Malaysia
A Hybrid Model of Data Mining and MCDM Methods for Estimating Customer Lifetime Value Amir Hossein Azadnia a,*, Pezhman Ghadimi b, Mohammad Molani Aghdam a a Department of Engineering, Ayatollah Amoli
More informationSTOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL
Stock AsianAfrican Market Trends Journal using of Economics Cluster Analysis and Econometrics, and ARIMA Model Vol. 13, No. 2, 2013: 303308 303 STOCK MARKET TRENDS USING CLUSTER ANALYSIS AND ARIMA MODEL
More informationFig. 1 A typical Knowledge Discovery process [2]
Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering
More informationPractical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
More informationStabilization by Conceptual Duplication in Adaptive Resonance Theory
Stabilization by Conceptual Duplication in Adaptive Resonance Theory Louis Massey Royal Military College of Canada Department of Mathematics and Computer Science PO Box 17000 Station Forces Kingston, Ontario,
More informationA Complete Gradient Clustering Algorithm for Features Analysis of Xray Images
A Complete Gradient Clustering Algorithm for Features Analysis of Xray Images Małgorzata Charytanowicz, Jerzy Niewczas, Piotr A. Kowalski, Piotr Kulczycki, Szymon Łukasik, and Sławomir Żak Abstract Methods
More informationUse of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationOpen Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *
Send Orders for Reprints to reprints@benthamscience.ae 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766771 Open Access Research on Application of Neural Network in Computer Network
More informationUSING SELFORGANIZED MAPS AND ANALYTIC HIERARCHY PROCESS FOR EVALUATING CUSTOMER PREFERENCES IN NETBOOK DESIGNS
International Journal of Electronic Business Management, Vol. 7, No. 4, pp. 297303 (2009) 297 USING SELFORGANIZED MAPS AND ANALYTIC HIERARCHY PROCESS FOR EVALUATING CUSTOMER PREFERENCES IN NETBOOK DESIGNS
More informationThere are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototypebased clustering Densitybased clustering Graphbased
More informationData Mining Techniques Chapter 7: Artificial Neural Networks
Data Mining Techniques Chapter 7: Artificial Neural Networks Artificial Neural Networks.................................................. 2 Neural network example...................................................
More informationCredit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 09742239 Volume 4, Number 13 (2014), pp. 13431348 International Research Publications House http://www. irphouse.com Credit Card Fraud
More informationPOSTHOC SEGMENTATION USING MARKETING RESEARCH
Annals of the University of Petroşani, Economics, 12(3), 2012, 3948 39 POSTHOC SEGMENTATION USING MARKETING RESEARCH CRISTINEL CONSTANTIN * ABSTRACT: This paper is about an instrumental research conducted
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationNeural Networks Kohonen SelfOrganizing Maps
Neural Networks Kohonen SelfOrganizing Maps Mohamed Krini ChristianAlbrechtsUniversität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and
More informationThe Research of Data Mining Based on Neural Networks
2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.09 The Research of Data Mining
More informationClustering. 15381 Artificial Intelligence Henry Lin. Organizing data into clusters such that there is
Clustering 15381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv BarJoseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is
More informationSurvey on Students Academic Failure and Dropout using Data Mining Techniques
ISSN 23202602 Volume 3, No.5, May 2014 P.Senthil Vadivu International et al., International Journal of of Advances in Computer in Computer Science and Science Technology, and 3(5), Technology May 2014,
More informationCluster analysis with SPSS: KMeans Cluster Analysis
analysis with SPSS: KMeans Analysis analysis is a type of data classification carried out by separating the data into groups. The aim of cluster analysis is to categorize n objects in k (k>1) groups,
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationIs a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
More informationLife Insurance Customers segmentation using fuzzy clustering
Available online at www.worldscientificnews.com WSN 21 (2015) 3849 EISSN 23922192 Life Insurance Customers segmentation using fuzzy clustering Gholamreza Jandaghi*, Hashem Moazzez, Zahra Moradpour Faculty
More informationLVQ PlugIn Algorithm for SQL Server
LVQ PlugIn Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico licinia.monteiro@tagus.ist.utl.pt I. Executive Summary In this Resume we describe a new functionality implemented
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationClassification of Engineering Consultancy Firms Using SelfOrganizing Maps: A Scientific Approach
International Journal of Civil & Environmental Engineering IJCEEIJENS Vol:13 No:03 46 Classification of Engineering Consultancy Firms Using SelfOrganizing Maps: A Scientific Approach Mansour N. Jadid
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDDLAB ISTI CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationIncorporating Soft Computing Techniques Into a Probabilistic Intrusion Detection System
154 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 32, NO. 2, MAY 2002 Incorporating Soft Computing Techniques Into a Probabilistic Intrusion Detection System
More informationData Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms
Data Mining Techniques forcrm Data Mining The nontrivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the nonobvious Useful knowledge
More informationData Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
More informationThe influence of teacher support on national standardized student assessment.
The influence of teacher support on national standardized student assessment. A fuzzy clustering approach to improve the accuracy of Italian students data Claudio Quintano Rosalia Castellano Sergio Longobardi
More informationAnalyzing Transaction Data
Wissuwa, Stefan; Dipl. Wirt.Inf. Wismar University s.wissuwa@wi.hswismar.de Cleve, Jürgen; Prof. Dr. Wismar University j.cleve@wi.hswismar.de Lämmel, Uwe; Prof. Dr. Wismar University u.laemmel@wi.hswismar.de
More informationRole of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
More informationCITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. SelfOrganizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理
CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學 SelfOrganizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理 Submitted to Department of Electronic Engineering 電 子 工 程 學 系 in Partial Fulfillment
More informationReal Stock Trading Using Soft Computing Models
Real Stock Trading Using Soft Computing Models Brent Doeksen 1, Ajith Abraham 2, Johnson Thomas 1 and Marcin Paprzycki 1 1 Computer Science Department, Oklahoma State University, OK 74106, USA, 2 School
More informationManagement Science Letters
Management Science Letters 4 (2014) 905 912 Contents lists available at GrowingScience Management Science Letters homepage: www.growingscience.com/msl Measuring customer loyalty using an extended RFM and
More informationMachine Learning and Data Mining. Clustering. (adapted from) Prof. Alexander Ihler
Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand
More informationA Basic Guide to Modeling Techniques for All Direct Marketing Challenges
A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview
More information