Expert Systems with Applications 27 (2004) 27–33
www.elsevier.com/locate/eswa

Segmentation of stock trading customers according to potential value

H.W. Shin a,*, S.Y. Sohn b

a Samsung Economy Research Institute, Kukje Center Building, 191, Hangangro 2-Ga, Seoul, South Korea
b Department of Computer Science and Industrial Systems Engineering, Yonsei University, Seoul, South Korea

Abstract

In this article, we use three clustering methods (K-means, self-organizing map, and fuzzy K-means) to find properly graded stock market brokerage commission rates based on the total trades made over 3 months in two transaction modes (representative assisted trading and an online trading system). Stock traders in each mode are segmented by their total trade amount together with their trade amount in that mode. Our empirical analysis indicates that fuzzy K-means cluster analysis is the most robust approach for segmenting the customers of both transaction modes. We then propose a decision-tree-based rule to classify customers into three groups and suggest brokerage commission rates of 0.5, 0.45, and 0.4% for the representative assisted mode and 0.18, 0.1, and 0.06% for the online trading system, from the lowest-volume group to the highest-volume group, respectively.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Customer relationship management; Customer segmentation; K-means clustering; Self-organizing map; Fuzzy K-means

1. Introduction

The Korean stock market grew rapidly in the 1990s. In spite of the financial crisis that occurred in Korea in 1997, there were more than 30 domestic security corporations, and the daily average stock transaction reached 4800 billion won in 2000, compared to 4100 billion won a year earlier. This indicates that the commission based on transactions increased considerably as well. This commission is one of the main sources of profit for security corporations, and each security corporation sets its own commission rate to increase its profit.
The rate is typically based on the amount of each individual trade. However, this kind of system does not consider the potential customer value over time. Those who have continuously traded more, in cumulative terms, over a longer time period need to be treated better (Hartfeil, 1996). In commercial banking, Zeithaml, Rust, and Lemon (2001) reported that the top 20% of customers produced 82% of the bank's retail profit. Hunt (1999) showed that the charge system of an insurance corporation should not be uniform but should differ according to each customer's potential value. This argument supports the value of better treatment of loyal customers.

In this article, we propose a robust clustering algorithm to classify stock traders into several groups in terms of their 3-month transactions in order to suggest a graded commission policy for each group. The variables used as clustering criteria are the transactions made through representative assisted trading and through the online Home Trading System (HTS). The clustering methods used are K-means clustering, the self-organizing map (SOM), and the fuzzy K-means method. The cut-off values for each customer group are set using a classification and regression tree (CART).

The rest of this article is organized as follows. In Section 2 we describe the three clustering methods along with the performance measure used for comparison. In Section 3 we apply the algorithms to field data and obtain three groups of customers. In Section 4 we present a new brokerage commission rate and compare it with the existing commission rate in terms of profit. Finally, in Section 5 we discuss the implications of our results and suggest areas for further study.

* Corresponding author. Tel.: +82-2-3780-8022; fax: +82-2-3780-8152. E-mail addresses: hyungwon.shin@samsung.com (H.W. Shin); sohns@yonsei.ac.kr (S.Y. Sohn).

2. Three clustering algorithms

Cluster analysis groups objects (observations) on the basis of their variables.
We use three kinds of clustering methods for customer segmentation: K-means, SOM, and fuzzy K-means. To describe each method briefly, let us assume that we are interested in clustering $N$ samples into $K$ clusters with respect to $P$ variables. For sample $i$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$ represents the vector of its $P$ characteristic variables. Typically $K$ is unknown, but for stock customer segmentation we use $K = 3$.

0957-4174/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2003.12.002

2.1. K-means clustering algorithm

The K-means method is widely used because it processes large data sets quickly. K-means clustering proceeds in the following order. First, $K$ of the $N$ observations are selected at random, one for each cluster, and become the centers of the initial clusters. Second, each of the remaining $N - K$ observations is assigned to the nearest cluster in terms of the Euclidean distance with respect to $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$; after each observation is assigned to its nearest cluster, the center of that cluster is recomputed. Last, once all observations have been allocated, the Euclidean distance between each observation and each cluster's center is calculated to confirm whether the observation is allocated to the nearest cluster.

2.2. Self-organizing map

The SOM is an unsupervised neural network model devised by Kohonen (1982). As with other neural networks, the analysis is based on a large number of simple operations that can be performed in parallel. The SOM network typically has two layers of nodes: an input layer and an output layer. The neurons in the output layer are arranged in a grid and are influenced by their neighbors in this grid. The goal is to cluster the input samples automatically in such a way that similar samples are represented by the same output neuron (Kim & Han, 2001; Mangiameli, Chen, & West, 1996).
Since each of the characteristic variables is linked to every output neuron by a weighted connection, each output neuron $j$ $(j = 1, \ldots, K)$ has a weight vector $w_j$ with as many weights as there are input variables. Starting from randomly initialized weights, the network learns to adapt its weights to the input samples as follows. When an input sample $x_i$ is presented to the SOM network, the neurons compute the distance between their weight vectors $w_j = (w_{j1}, w_{j2}, \ldots, w_{jp}, \ldots, w_{jP})$ and the input $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}, \ldots, x_{iP})$. The neuron with the minimum distance, called the winner, is then determined by

$$\min_j D_j = \sqrt{\sum_{p=1}^{P} (x_{ip} - w_{jp})^2} \tag{1}$$

where $w_{jp}$ is the weight of the $j$th neuron linked to the $p$th variable. The weights of the winner and of the neurons in its neighborhood are then updated using the following equation:

$$w_j^{new} = w_j^{old} + \alpha (x_i - w_j^{old}) \tag{2}$$

where $w_j^{new}$ is the new weight vector and $w_j^{old}$ is the old weight vector of the $j$th neuron, and $\alpha$ is the learning rate $(0 < \alpha < 1)$. The procedure ends when the difference in the error (e.g. the average of the Euclidean distances between each input sample and its best matching weight vector) between the current and the previous iteration is smaller than a given value $\varepsilon$. After the stop criterion is satisfied, each neuron in the network represents a cluster.

2.3. Fuzzy K-means clustering analysis

Fuzzy set theory was introduced in the 1960s as a way of explaining uncertainty in data structure (Zadeh, 1965). Fuzzy K-means (also known as fuzzy c-means) clustering was investigated by Bezdek (1981) and compared with the non-fuzzy clustering method. Hruschka (1986) and Weber (1996) showed in their empirical studies that fuzzy clustering provided more insight than non-fuzzy clustering in terms of market segment information.
Fuzzy clustering segments the samples into $K$ clusters $(1 < K < N)$, estimating the cluster membership of each sample and the cluster centers simultaneously. The membership of $x_i$ in cluster $s$, $u_{si}$, lies between 0 and 1 and is defined as follows (Ozer, 2001):

$$u_{si} = \frac{1}{\sum_{j=1}^{K} \dfrac{\|x_i - v_s\|^{2/(m-1)}}{\|x_i - v_j\|^{2/(m-1)}}}, \quad \text{for } x_i \neq v_j,\ \forall s, i, \text{ and } m > 1 \tag{3}$$

where $m$ is the smoothing parameter that controls the fuzziness of the clusters, and $v_s = (v_{s1}, v_{s2}, \ldots, v_{sp}, \ldots, v_{sP})$ is the center of cluster $s$, defined as

$$v_s = \frac{\sum_{i=1}^{N} (u_{si})^m x_i}{\sum_{i=1}^{N} (u_{si})^m}, \quad \forall s. \tag{4}$$

The optimal values of $u$ are obtained by minimizing the following objective function:

$$\min \sum_{i=1}^{N} \sum_{s=1}^{K} (u_{si})^m \|x_i - v_s\|^2 \tag{5}$$

The constraints used are as follows:

$$0 \le u_{si} \le 1, \quad \forall s, i \tag{6}$$

$$\sum_{s=1}^{K} u_{si} = 1, \quad \forall i. \tag{7}$$
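As a hedged illustration of Eqs. (3) and (4), the following minimal Python sketch alternates the membership update and the center update until the centers stop moving. The function names (`memberships`, `update_centers`, `fuzzy_k_means`), the caller-supplied initial centers, the stopping tolerance, and the treatment of a sample that coincides with a center are assumptions made for illustration; they are not details given in the text.

```python
import math


def memberships(points, centers, m):
    """Eq. (3): u[s][i] is the membership of sample i in cluster s."""
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for i, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        if min(dists) == 0.0:
            # Eq. (3) excludes x_i == v_j; as an assumption, give full
            # membership to the first center the sample coincides with.
            u[dists.index(0.0)][i] = 1.0
            continue
        for s, d_s in enumerate(dists):
            u[s][i] = 1.0 / sum((d_s / d_j) ** power for d_j in dists)
    return u


def update_centers(points, u, m):
    """Eq. (4): each center is the (u_si)^m-weighted mean of the samples."""
    dim = len(points[0])
    centers = []
    for row in u:
        weights = [w ** m for w in row]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[p] for w, x in zip(weights, points)) / total
            for p in range(dim)))
    return centers


def fuzzy_k_means(points, centers, m=1.2, max_iter=100, tol=1e-9):
    """Alternate Eqs. (3) and (4) from the given initial centers."""
    k = len(centers)
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    u = memberships(points, centers, m)
    # Hard assignment: each sample goes to its highest-membership cluster.
    labels = [max(range(k), key=lambda s: u[s][i]) for i in range(len(points))]
    return centers, u, labels
```

By construction the memberships of each sample sum to one across clusters, which is the constraint in (7). The default `m=1.2` is the value reported later in the case study; it gives an exponent $2/(m-1)$ of about 10, so memberships are close to 0 or 1 except for samples lying between clusters.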
Condition (6) ensures that the degrees of membership are between 0 and 1, and condition (7) means that, for a given sample, the degrees of membership across the clusters sum to one. Once the optimal values of $u$ are found, each sample is assigned to the cluster for which its membership $u$ is highest.

2.4. Performance comparison of the three clustering methods

We compare the performance of these clustering methods using the intraclass inertia presented in Michaud (1997). Intraclass inertia is a measure of how compact each cluster (class) is when the number of clusters is fixed. Usually the variables are scaled to be in the same range (Nair & Narendran, 1997). The mean of the $j$th cluster $C_j$, which has $n_j$ samples, is defined as $\bar{x}_j = (\bar{x}_{j1}, \bar{x}_{j2}, \ldots, \bar{x}_{jp}, \ldots, \bar{x}_{jP})$, where

$$\bar{x}_{jp} = \frac{1}{n_j} \sum_{i \in C_j} x_{ip} \tag{8}$$

The intraclass inertia $I_j$ of cluster $j$ is defined as

$$I_j = \frac{1}{n_j} \sum_{i \in C_j} \sum_{p=1}^{P} (x_{ip} - \bar{x}_{jp})^2 \tag{9}$$

Finally, the intraclass inertia $F(K)$ for a given $K$ clusters is defined as

$$F(K) = \frac{1}{n} \sum_{j=1}^{K} n_j I_j = \frac{1}{n} \sum_{j=1}^{K} \sum_{i \in C_j} \sum_{p=1}^{P} (x_{ip} - \bar{x}_{jp})^2 \tag{10}$$

where $n = \sum_{j} n_j$ is the total number of samples. One can see that $F(K)$ is the average squared Euclidean distance between each observation and its cluster mean.

3. A case study

We randomly select 3000 customers of stock corporation A who had transaction records from the middle of July to the middle of October 1999 and apply the three clustering methods. The stock transaction modes used are either representative assisted or online HTS. HTS customers buy and sell their stocks directly, without the advice of the corporation's representatives.

The descriptive statistics of the sample data are as follows. About 78% of the total trade amount was made through online HTS. In terms of gender, 68% of the customers are male. However, the average trade amounts made by female customers were 51 and 52% for the representative assisted and online HTS modes, respectively.
This suggests the importance of a marketing strategy for HTS and for female customers. In terms of age, those who are older than 60 mostly used the representative assisted mode. Their trade amount is also the highest among the age groups in both transaction modes. The average transaction frequency is 1.8 times per month in the representative assisted mode and six times per month in online HTS. We also estimate the correlation between the trade amount made in each transaction mode and the sum of the two. The correlation between the two modes is relatively low (0.38), while the correlations between a single mode and the total transactions are 0.76 and 0.89 for the representative assisted and online HTS modes, respectively.

3.1. Cluster analysis of customers

The clustering methods are used to segment the customers of each mode separately, using two variables per mode. The variables used for the representative assisted mode are the total trade amount and the representative assisted trade amount over the 3-month period. For the HTS mode, we use the total trade amount and the trade amount in HTS over the 3-month period. Customers are segmented into three clusters (Normal, Best, and VIP customers). After some experimentation with the parameters of the clustering methods, we set the SOM learning rate ($\alpha$) to 0.1 and the fuzzy K-means smoothing parameter ($m$) to 1.2. For comparison purposes, the resulting compactness of the clusters produced by the three clustering methods (K-means, SOM, fuzzy K-means) is summarized in Table 1.

Table 1
Intraclass inertia of each clustering method (* marks the lowest value for each mode)

Clustering method   Mode                           Intraclass inertia
K-means             Representative assisted mode   7.2685 × 10^16 *
K-means             HTS                            1.09 × 10^17
SOM                 Representative assisted mode   1.04394 × 10^17
SOM                 HTS                            1.12 × 10^18
Fuzzy K-means       Representative assisted mode   7.29 × 10^16
Fuzzy K-means       HTS                            9.55 × 10^16 *
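The compactness values compared in Table 1 follow Eq. (10). As a rough sketch, assuming the samples are given as tuples and the cluster assignments as a parallel list of labels (the function name `intraclass_inertia` is hypothetical), $F(K)$ can be computed as:

```python
def intraclass_inertia(points, labels):
    """F(K) of Eq. (10): the mean squared Euclidean distance between
    each observation and the mean of the cluster it is assigned to."""
    clusters = {}
    for x, label in zip(points, labels):
        clusters.setdefault(label, []).append(x)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Cluster mean, Eq. (8).
        mean = [sum(x[p] for x in members) / len(members) for p in range(dim)]
        # Sum of squared deviations of the members from the cluster mean.
        total += sum((x[p] - mean[p]) ** 2 for x in members for p in range(dim))
    return total / len(points)


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (14.0, 0.0)]
print(intraclass_inertia(points, [0, 0, 1, 1]))  # 2.5
```

A lower value means more compact clusters for the same $K$, which is how the methods are ranked in the comparison. Because the trade amounts are in won and are not rescaled in this sketch, values on the order of $10^{16}$ or more are to be expected for data of this magnitude.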
For customer segmentation in the representative assisted mode, the K-means clustering method turns out to be the best, while for the segmentation of HTS customers, the fuzzy K-means method is the winner. Table 2 and Fig. 1 show the segmentation of customers in the representative assisted mode using the K-means clustering method, while Table 3 and Fig. 2 show the segmentation of HTS customers using fuzzy K-means. The results indicate that the numbers of Best and VIP customers are smaller in the representative assisted mode than in HTS.

Table 2
The segmentation of customers in representative assisted mode using K-means (cluster centers; units: won)

Class    Number of customers   Total trade amount for 3 months   Trade amount in representative assisted mode for 3 months
Normal   2969                  31.0 million                      6.4 million
Best     30                    84.0 billion                      25.6 billion
VIP      1                     558.0 billion                     297.3 billion

Fig. 1. Transaction distribution in a representative assisted mode for 3 months.

Table 3
The segmentation of customers of HTS using fuzzy K-means (cluster centers; units: won)

Class    Number of customers   Total trade amount for 3 months   Trade amount in HTS mode for 3 months
Normal   2915                  26.1 million                      19.9 million
Best     80                    41.8 billion                      30.9 billion
VIP      5                     261.8 billion                     199.1 billion

As shown in Figs. 1 and 2, one of the VIP customers has a very large total trade amount (558 billion won for 3 months). This customer may be considered an outlier. Therefore, we also compare the clustering results without this particular customer. The results are given in Table 4. In this case, fuzzy K-means has the best performance in the representative assisted mode. SOM is the most suitable in HTS, but fuzzy K-means performs nearly as well. Overall, we conclude that fuzzy K-means provides relatively robust results in terms of intraclass inertia for both modes.

3.2. Classification of three groups of customers

In practice, we need threshold values to classify the three different groups. We use a decision tree to find the threshold values for customer segmentation in both transaction modes. The outcome class (Normal, Best, VIP) is the one assigned by fuzzy K-means after deleting the outlier. Seventy percent of the data of the 2999 customers (all except the outlier) are assigned to training, while 30% are assigned to validation, using a segment-based stratified sampling approach. We then use the CART algorithm to find the threshold values for the three groups. The trees in Figs. 3 and 4 show the threshold values for customer segmentation.

From Fig. 3, customers whose total trade amount in both modes for 3 months is less than about 19.3 billion won are defined as Normal customers. Customers whose total trade amount in both modes for 3 months is more than 19.3 billion won and whose trade amount in the representative assisted mode for 3 months is less than 125 billion won are defined as Best customers. The other customers are VIP customers.

From Fig. 4, customers whose trade amount in HTS for 3 months is less than about 13.6 billion won and whose total trade amount in both modes is less than 23.3 billion won are defined as Normal customers. Customers whose trade amount in HTS for 3 months is more than 13.6 billion won and whose total trade amount in both modes is more than 75.9 billion won are defined as VIP customers. The rest are considered Best customers.
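The threshold rules read from Figs. 3 and 4 can be written down directly as two small functions. This is only a sketch of the rules as stated in the text: the function names are hypothetical, the thresholds are the rounded values quoted above ("about 19.3 billion won", and so on), and the treatment of amounts exactly equal to a threshold is an assumption, since the text only says "less than" and "more than".

```python
BILLION = 1_000_000_000

# Rounded thresholds quoted in the text (won, 3-month amounts).
REP_TOTAL_CUT = 19_300_000_000   # Fig. 3: total trade amount
REP_MODE_CUT = 125 * BILLION     # Fig. 3: representative assisted amount
HTS_MODE_CUT = 13_600_000_000    # Fig. 4: HTS trade amount
HTS_TOTAL_LOW = 23_300_000_000   # Fig. 4: total trade amount, Normal bound
HTS_TOTAL_HIGH = 75_900_000_000  # Fig. 4: total trade amount, VIP bound


def classify_representative(total_won, rep_won):
    """Rule for the representative assisted mode (Fig. 3)."""
    if total_won < REP_TOTAL_CUT:
        return "Normal"
    # Assumption: a total exactly at the cut is treated as above it.
    if rep_won < REP_MODE_CUT:
        return "Best"
    return "VIP"


def classify_hts(total_won, hts_won):
    """Rule for the HTS mode (Fig. 4)."""
    if hts_won < HTS_MODE_CUT and total_won < HTS_TOTAL_LOW:
        return "Normal"
    if hts_won > HTS_MODE_CUT and total_won > HTS_TOTAL_HIGH:
        return "VIP"
    # Everything that is neither Normal nor VIP is Best.
    return "Best"


# Cluster centers from Table 2 and Table 3 as a sanity check.
print(classify_representative(84 * BILLION, 25_600_000_000))  # Best
print(classify_hts(261_800_000_000, 199_100_000_000))         # VIP
```

Applied to the cluster centers reported in Tables 2 and 3, these rules return the class of the corresponding cluster, which is a useful consistency check on the thresholds.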
Fig. 2. Transaction distribution in HTS for 3 months.

4. New brokerage commission policy

In this section, we suggest a graded brokerage commission policy based on the three clusters of customers. The new policy must be effective enough to prevent existing customers from churning, and at the same time it should yield sufficient profit for the security corporation. As shown in Table 5, we suggest that the commission for Normal, Best, and VIP customers be 0.5, 0.45, and 0.4% in the representative assisted mode and 0.18, 0.1, and 0.06% in HTS, respectively. This policy is then compared to the existing commission system of stock corporation A (see Table 6).

Next, we compare the profit under the existing commission policy with the profit under the proposed commission policy in Table 7. The proposed commission policy is based on the threshold values obtained by the decision tree using the fuzzy K-means classes. As shown in Table 7, the new policy would provide an expected profit similar to that of the existing policy. However, it should be noted that the proposed commission policy has additional positive effects on customer relationship management (CRM) because it recognizes the value of different levels of customers. Therefore, we expect that in the long run the new policy would bring higher profit than the existing commission policy.

5. Conclusion

In this article, we found fuzzy K-means clustering to be the most stable method for grouping stock trading customers and used it to classify customers into three tiers (Normal, Best, and VIP) based on the total trade amount over a 3-month period. The brokerage commission rates assigned to the Normal, Best, and VIP groups are 0.5, 0.45, and 0.4% for the representative assisted mode and 0.18, 0.1, and 0.06% for HTS, respectively.
This approach differs from the existing graded commission policy in that the proposed policy grades the commission according to the historically accumulated transaction amount of each customer. The new approach is expected to bring more profit by treating loyal customers better and thereby retaining them over a longer term. The data used in this article for clustering contain a relatively short history of customers' transactions. After the data warehousing project is completed and a larger amount of information has accumulated, the clustering may need to be redone for tuning. Our new policy depends mainly on the cumulative transaction amount. Other factors, such as the frequency of transactions, may need to be included in the policy.

Table 4
Intraclass inertia by clustering method, without the outlying customer (* marks the lowest value for each mode)

Clustering method   Mode                           Intraclass inertia
K-means             Representative assisted mode   6.19 × 10^16
K-means             HTS                            6.82 × 10^16
SOM                 Representative assisted mode   7.45 × 10^16
SOM                 HTS                            5.43 × 10^16 *
Fuzzy K-means       Representative assisted mode   4.12 × 10^16 *
Fuzzy K-means       HTS                            5.48 × 10^16
Fig. 3. Classifying the customers of the representative assisted mode (unit: won; the number in parentheses is the count per class).

Fig. 4. Classifying the customers of the HTS mode (unit: won; the number in parentheses is the count per class).
More variations of the approach, based on a longer time-series data set, are left for further study.

Table 5
Newly proposed commission rate

Class    Brokerage commission in representative assisted mode (%)   Brokerage commission in HTS (%)
Normal   0.5                                                        0.18
Best     0.45                                                       0.1
VIP      0.4                                                        0.06

Table 6
Currently used commission rates of stock corporation A

Mode                           Amount of transaction       Brokerage commission
Representative assisted mode   Under 200 million           0.5%
                               From 200 to 500 million     0.45% + 1000
                               Over 500 million            0.4% + 500
HTS                            Under 250 million           0.23%
                               From 250 to 500 million     0.19% + 1000
                               From 500 to 1000 million    0.17% + 500
                               From 1000 to 3000 million   0.15%
                               3000 million                0.09%

Table 7
Comparison of the two commission policies in stock corporation A (unit: won)

Class                          Profit by the existing commission policy   Profit by the proposed commission policy
Representative assisted mode   1,473,640,285                              1,428,058,209
HTS                            1,349,532,283                              1,356,896,165
Total commission               2,823,172,568                              2,784,954,374

Acknowledgement

This work was supported by grant No. R04-2002-000-20003-0 from Korea Science & Engineering Foundation.

References

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Hartfeil, G. (1996). Bank One measures profitability of customers, not just products. Journal of Retail Banking Services, 18(2), 23–29.
Hruschka, H. (1986). Market definition and segmentation using fuzzy clustering methods. International Journal of Research in Marketing, 3, 117–134.
Hunt, P. (1999). The pricing is right. Canadian Insurance Statistics, 26–28.
Kim, K. S., & Han, I. (2001). The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications, 21(3), 147–156.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.
Mangiameli, P., Chen, S. K., & West, D. A. (1996). Comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research, 93(2), 402–417.
Michaud, P. (1997). Clustering techniques. Future Generation Computer Systems, 13(2), 135–147.
Nair, G. J., & Narendran, T. T. (1997). Cluster goodness: a new measure of performance for cluster formation in the design of cellular manufacturing systems. International Journal of Production Economics, 48(1), 49–61.
Ozer, M. (2001). User segmentation of online music services using fuzzy clustering. Omega, 29(2), 193–206.
Weber, R. (1996). Customer segmentation for banks and insurance groups with fuzzy clustering techniques. In J. F. Baldwin (Ed.), Fuzzy logic. New York: Wiley.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zeithaml, V. A., Rust, R. T., & Lemon, K. N. (2001). The customer pyramid: creating and serving profitable customers. California Management Review, 43(4), 118–142.