DHL Data Mining Project. Customer Segmentation with Clustering


DHL Data Mining Project: Customer Segmentation with Clustering
Timothy TAN Chee Yong, Aditya Hridaya MISRA, Jeffery JI Jun Yao
3/30/2010

Table of Contents
Introduction to DHL and the data
Problem
Clustering Methods & Motivation
Initial Manual Logical Segmentation (A & B) using Basic Statistics
KDD Process, Methodology & Software used
Data Preparation: Data Preprocessing, Integration and Selection
Selection and Separating Customer Transaction Data into Regular and Irregular Customers for Mining
Regular Customers: Data Mining using Clustering
Regular Customers: Cluster Analysis with Interactive Visualization
Using Parallel Plot to decide Key Cluster Groups Focus for Regular Customers
Summary Statistics of Cluster Groups
Key Cluster Analysis for REGULAR Customers using Histogram to present distribution of different Cluster Groups
Validating Hierarchical Clustering (Key Clusters) with another Clustering technique, K-Means
Further Analysis to aid interpretation with Tree Map & industry attributes
Tree Map Analysis using cluster groupings
Findings & Recommendations for Decision Making for the Regular Customers Segment
Irregular Customers: Cluster Analysis with Interactive Visualization
Final Conclusion

Introduction to DHL and the data

Product: Real-world DHL 2008 Malaysia data set

Objective: To perform customer segmentation using data mining and to analyze useful patterns in DHL transaction data, in order to make recommendations that can help with marketing, pricing and future Business Intelligence decisions.

About DHL

DHL is the global market leader of the international express and logistics industry, specializing in providing innovative and customized solutions from a single source. DHL offers expertise in express, air and ocean freight, overland transport, contract logistics solutions as well as international mail services, combined with worldwide coverage and an in-depth understanding of local markets. DHL's international network links more than 220 countries and territories worldwide. Some 300,000 employees are dedicated to providing fast and reliable services that exceed customers' expectations.

Problem

DHL lacks the customer segmentation and profile information it needs to support decisions in marketing, pricing and other areas of the business. While it has strong domain knowledge and rich past transaction data, it lacks the expertise to mine interesting patterns from that data.

Clustering Methods & Motivation

The customer transaction data of a logistics company was analyzed. Data mining (clustering in particular) and visualization techniques were used to find meaningful relationships and patterns within the existing data. These techniques enable the company to target its customers more efficiently and improve its marketing processes. They also equip the organization with much-needed market intelligence, giving it better insight into and visibility of its customers. SAS JMP was used to interactively visualize the data and cluster the customers based on their properties and characteristics. JMP also enabled us to evaluate transactional patterns and to present and interpret the data through intuitive graphs. Parallel plots were used to study patterns across clusters, and bubble plots enabled us to identify temporal patterns within customer accounts.

For accurate decision making, we followed the knowledge discovery (KDD) process, transforming the raw data into useful knowledge. The raw data was preprocessed using SAS tools and explored using SAS JMP. The statistical summary features of JMP, combined with the visualization techniques, increase the potential for business decision making. The customers were broadly segmented into regular and irregular customers. Cluster and temporal analysis supported decision making for all the customer segments.

Initial Manual Logical Segmentation (A & B) using Basic Statistics

For completeness, we first used simple tools like Excel to explore the data and segment it logically, without using any data exploration or mining software.

[Figure: Logical Segment A]
[Figure: Alternate Logical Segment B]

For simplicity, we have broken the data into 3 logical buckets to best represent its highly skewed and uneven distribution.

[Figure: Rough sketch showing the distribution of the data]

KDD Process, Methodology & Software used

(Slides taken from Data Mining class slides.)

For Cleaning/Integration, we used SAS Integration Studio to clean and import the structured data into the SAS Suite.

For Selection & Transformation, SAS Enterprise Guide was used to select individual transaction records and transform them into derived, aggregated monthly and total data per customer, for attributes like total_revenue and total_shipments. The transaction data was also segmented into regular and irregular data in preparation for mining.

For Mining, SAS Enterprise Miner was used initially. However, due to limited hardware resources, the large dataset, and Miner's less friendly user interface, we decided to switch to SAS JMP, a lightweight data mining package under the SAS Suite, for our mining tasks.

Benefits of SAS JMP over SAS Enterprise Miner
- lightweight and thus faster
- supports dendrograms for clustering
- supports interactive visual analytics

For Pattern Evaluation & Data Presentation, SAS JMP was used to interactively generate intuitive graphs to analyze and interpret the data.

For Knowledge Representation to support DHL decision making, we have compiled our findings in a Microsoft Word document.

Data Preparation: Data Preprocessing, Integration and Selection

Selection of Tables

As the objective of the data mining project is customer segmentation, only the customer transaction information and other tables that may be useful were selected. The following ETL process was then carried out, as shown in the diagrams below:
- Phase 1: Useful attributes are selected and combined into a single table via inner joins.
- Phase 2: Aggregation. All transactions that belong to the same customer are aggregated into a single bill_account record.
- Phase 3: Bucketing. The aggregated temporal numerical values are bucketed into 12 separate months, with a 13th attribute holding the total.

The final table, which is ready for data mining, has a total of 70 columns and 20,386 unique customer accounts/records.
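The aggregation and bucketing phases can be illustrated with a minimal Python sketch. This is not the SAS Enterprise Guide job used in the project; the sample rows, the function name aggregate_by_account and the restriction to two attributes (revenue and shipments) are assumptions made for illustration, and the Phase 1 join is taken as already done.

```python
from collections import defaultdict

# Hypothetical joined transaction rows: (bill_account, month 1-12, revenue, shipments)
transactions = [
    ("A001", 1, 120.0, 2),
    ("A001", 1, 80.0, 1),
    ("A001", 3, 50.0, 1),
    ("B002", 12, 300.0, 4),
]

def aggregate_by_account(rows):
    """Build one record per bill_account with 12 monthly buckets plus a total."""
    sums = defaultdict(lambda: defaultdict(float))
    for account, month, revenue, shipments in rows:
        sums[account][f"revenue_{month}"] += revenue
        sums[account][f"shipments_{month}"] += shipments

    result = {}
    for account, rec in sums.items():
        flat = {}
        for attr in ("revenue", "shipments"):
            # Months without any transaction are kept as explicit zeros.
            monthly = [rec.get(f"{attr}_{m}", 0.0) for m in range(1, 13)]
            for m, value in enumerate(monthly, start=1):
                flat[f"{attr}_{m}"] = value
            flat[f"total_{attr}"] = sum(monthly)  # the 13th attribute
        result[account] = flat
    return result

customers = aggregate_by_account(transactions)
print(customers["A001"]["revenue_1"], customers["A001"]["total_revenue"])
```

Keeping the empty months as explicit zeros matters later, because the regular/irregular split depends on whether any monthly value is zero.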

Reduction of dimensions by the selection of key attributes for clustering input

Based on discussion with the client, who has the domain knowledge, 65 attribute columns were selected as inputs to mitigate the curse of dimensionality.

Selection and Separating Customer Transaction Data into Regular and Irregular Customers for Mining

For the purpose of future analysis, reference & targeted marketing for DHL, we have saved the records (data points) into separate fact tables for each cluster.

                                                   Percentage   No. of Customers (N)
No. of Regular Customer Transactions     1,975,    %            4478
No. of Irregular Customer Transactions   531,      %
Total:                                   2,507,    %

Definition and Rationale for splitting into Regular and Irregular Customers

We define regular customers as customers whose bill account shows consistent engagement with DHL (no zero values) in every month. All other customers are classified as irregular customers. Some of them have zero-value months in between months with transactions, and some are new customers with zero values in the first months of the year. However, for simplicity and for the purpose of clustering, we have separated all of them from the regular customers.
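The definition above amounts to a simple rule on the 12 monthly values of each bill account. A minimal sketch, assuming each account is represented by a list of 12 monthly transaction counts (the account ids and function names are hypothetical):

```python
def is_regular(monthly_values):
    """Regular means a non-zero value in every one of the 12 months."""
    return len(monthly_values) == 12 and all(v > 0 for v in monthly_values)

def split_customers(monthly_by_account):
    """Return (regular, irregular) dictionaries keyed by bill account."""
    regular, irregular = {}, {}
    for account, months in monthly_by_account.items():
        target = regular if is_regular(months) else irregular
        target[account] = months
    return regular, irregular

# Hypothetical monthly transaction counts per bill account.
sample = {
    "A001": [3, 4, 2, 5, 6, 3, 4, 4, 5, 3, 2, 4],   # active every month
    "B002": [3, 0, 2, 5, 0, 3, 4, 4, 5, 3, 2, 4],   # gaps in between
    "C003": [0, 0, 0, 0, 5, 6, 5, 7, 6, 5, 6, 5],   # started in May
}
regular, irregular = split_customers(sample)
print(sorted(regular), sorted(irregular))
```

Under this rule a customer who joined mid-year lands in the irregular group even if consistent afterwards, which is the simplification the text describes.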

Regular Customers: Data Mining using Clustering

65 attribute columns were used as inputs. The key attributes are the number of (1) transactions, (2) weight, (3) pieces, (4) shipments and (5) revenue, each recorded for every month together with its aggregated total. An example is shown here: Revenue_1 represents the aggregated revenue in dollars for the month of January.

Hierarchical Clustering Results

[Figure: Ward Hierarchical (Cluster Frequencies)]

Key Clusters (based on Counts):
1) Cluster 1
2) Cluster 11
3) Cluster 19

By analyzing the dendrogram and the distinctive jump in distance from cluster 21 to cluster 22 onwards, we decided to prune the tree and choose 21 clusters.
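To make the pruning decision concrete, here is a naive pure-Python sketch of agglomerative clustering with Ward's criterion. It is not JMP's implementation: the function names and sample points are hypothetical, the cubic-time search is only suitable for tiny inputs, and real attribute columns would normally be standardized first. At each step it merges the pair of clusters whose merge adds the least within-cluster sum of squares, and it records those merge costs, which are the heights one reads off a dendrogram.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(points, k):
    """Merge the cheapest pair repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]
    merge_costs = []
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merge_costs.append(cost)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merge_costs

# Hypothetical 2-attribute customers forming three obvious groups.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
clusters, merge_costs = ward_cluster(points, 3)
print(clusters)
print(merge_costs)
```

A sharp rise in the recorded merge cost from one step to the next plays the same role as the distinctive distance in the dendrogram: it suggests stopping before that merge.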

Regular Customers: Cluster Analysis with Interactive Visualization

In this section, having generated the clusters, we use visual analytics such as parallel plots, histograms, tree maps and other graphs to understand, group and interpret the clusters and to find useful patterns that may support decision making.

After clustering, we moved on to a more visual and analytical layout for data exploration. Interactive data visualization with parallel plots, histograms, tree maps and bubble plots covers almost all the properties of the data that we need in order to interpret it effectively. Using parallel plots, every point in the multi-dimensional data was mapped to a line on a 2-dimensional plane.

Using Parallel Plot to decide Key Cluster Groups Focus for Regular Customers

So as not to apply data mining blindly (simply relying on machine learning with the default K=20 size), we combine it with data visualization using parallel plots and human interpretation. The parallel plots show a clear general pattern that helps differentiate the cluster groupings (colours) for further analysis. See the zoomed-in view.

Grouping 1 (Total N= 3713)
Clusters: Cluster 1 (N=3713)
Analysis:
- Isolated cluster with very high values for all attributes
- Largest no. of customers

Grouping 2 (Total N= 415)
Clusters: Cluster 11 (N=415)
Analysis:
- Very low no. of customer transactions
- Moderately high values for other attributes
- 2nd largest no. of customers

Grouping 3 (Total N= 123)
Clusters: Cluster 19 (N=123)
Analysis:
- Very low no. of customer transactions
- High total billed weight
- Low pieces and shipments
- 4th largest no. of customers

Grouping 4 (Total N= 171)
Clusters: Cluster 6 (N=97), Cluster 16 (N=38), Cluster 2 (N=23), Cluster 5 (N=4), Cluster 21 (N=4), Cluster 4 (N=1), Cluster 18 (N=5)
Analysis:
- Very low no. of customer transactions
- Very low billed weight
- Very low to low pieces and shipments
- 3rd largest no. of customers

Grouping 5 (Total N= 55)
Clusters: Cluster 7 (N=1), Cluster 8 (N=17), Cluster 14 (N=7), Cluster 10 (N=21), Cluster 13 (N=3), Cluster 9 (N=1), Cluster 3 (N=1), Cluster 20 (N=1), Cluster 12 (N=1), Cluster 17 (N=1), Cluster 15 (N=1)
Analysis:
- Very low no. of customer transactions
- Very low to low total billed weight
- Very low pieces and shipments
- Smallest size market (55 Customers)

Zoomed in view

Cluster Group Parallel Plot Analysis

While clusters 1 and 11 behave similarly across the different attributes, cluster 19 follows a pattern of relatively sharp dips and rises. This can be attributed to the drop in the sum of pieces. Lines of different colours, representing different groupings, follow similar distributions; that is, the clusters can in turn be grouped into groups with similar characteristics.

By zooming in and observing the patterns, we can see similarities and differences between the clusters. Clusters with similar patterns have been placed in one cluster grouping and given a particular colour to help us distinguish them. For example, cluster 2 and cluster 21 are similar: both have relatively high sum_of_pieces and sum_of_shipments values compared with their other values, creating a mountain-like shape that is easily distinguished by the human eye. This shows how a parallel plot can help us group the clusters effectively and intuitively.

For further analysis, we create a new column, ClusterGrp, and assign each record to a cluster group based on its pattern. We can then generate summary statistics to analyse the cluster groups further.

Summary Statistics of Cluster Groups

Analysis

Cluster Grp 5 and 3 have similarly high revenue per transaction, the highest of all groups ($400+/trans), as well as revenue per shipment. They could possibly be grouped further into one customer segment, which would make up a sizeable 38% ( ) of the total revenue market share. Cluster Grp 1 has an extremely high revenue per weight volume.

By generating another parallel plot with useful ratios, we can make a more useful analysis. Note that since shipment and transaction values are almost the same, they are interchangeable.

Cluster Grp 1: Has the highest revenue per weight and the highest share of the total revenue (market) at 30%, and a relatively low weight per transaction/shipment. We suspect that most transactions in Cluster Grp 1 are transported by air. However, as the number of customers (N) is extremely high (3713), we cannot conclude that they are all air-based. These could instead be lightweight land- or sea-based shipments that are time-critical, which would explain their high revenue per transaction.

Cluster Grp 3 and 5: They have similar patterns. Cluster Grp 3 and 5 can be merged to give a sizeable share of market revenue of 39%. Given their low revenue per unit weight and high weight per shipment, they are likely to be served by a high-volume, slow-delivery, container-based service transported by land or ship.

Cluster Grp 2: Seems to be a middle-tier customer segment with moderate revenue per transaction and moderate revenue per unit weight, with 15% of the total revenue.

Cluster Grp 4: Has very low revenue per transaction and low weight per shipment/transaction, and comprises 16% of the total revenue market. However, they have the highest revenue per weight.
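The ratios used in this analysis are straightforward to derive from per-group totals. The sketch below shows one way to compute them; the group names and all numbers are made-up placeholders, not figures from the DHL data set, and the dictionary keys are assumed names.

```python
def group_ratios(groups):
    """Derive the comparison ratios for each cluster group from its totals."""
    total_revenue = sum(g["revenue"] for g in groups.values())
    ratios = {}
    for name, g in groups.items():
        ratios[name] = {
            "rev_per_trans": g["revenue"] / g["transactions"],
            "rev_per_weight": g["revenue"] / g["weight"],
            "weight_per_shipment": g["weight"] / g["shipments"],
            "revenue_share_pct": 100.0 * g["revenue"] / total_revenue,
        }
    return ratios

# Placeholder totals for two hypothetical groups.
groups = {
    "light_frequent": {"revenue": 3000.0, "transactions": 100, "weight": 500.0, "shipments": 100},
    "heavy_rare": {"revenue": 1000.0, "transactions": 10, "weight": 2000.0, "shipments": 10},
}
ratios = group_ratios(groups)
for name, r in ratios.items():
    print(name, r)
```

In this toy input the first group earns far more per unit weight while the second carries far more weight per shipment, which is the kind of contrast the air versus land/sea interpretation rests on.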

Based on this graph, we can see that Cluster Grp 1 (linear) and 5 are important to focus on, as together they make up more than 50% of the total revenue from regular customers. Cluster Grp 5 is interesting because it accounts for high revenue with just half the number of customer transactions. It would be interesting to find out why this is so. Cluster Grp 5 also has the largest share of billed weight.

Key Cluster Analysis for REGULAR Customers using Histogram to present distribution of different Cluster Groups

For the purpose of future analysis, reference & targeted marketing for DHL, we have saved the records (data points) into separate fact tables for each cluster.

Group 1, containing Cluster 1 (N=3713)
Relatively small Total_Rev but many N. M=734 M=28000
Cluster 1 has too many outliers and needs further classification. Use a decision tree?

Group 2, containing Cluster 11 (N=415)
Medium Total_Rev. M= M=4116

Group 3, containing Cluster 19 (N=123)
Large Total_Rev. M=17100 M=375000

*For the purpose of analysis, we have used sampling: the largest representative cluster in each group (marked *) was selected to generate the group's distribution.

Grouping 4, Cluster 6 (Total N= 150)
Cluster 6 (N=97)*
Cluster 16 (N=38)
Cluster 5 (N=4)
Cluster 21 (N=4)
Cluster 4 (N=1)
Cluster 18 (N=5)
Cluster 17 (N=1)
M=

Grouping 5 (Total N= 54)
Cluster 8 (N=17)
Cluster 14 (N=7)
Cluster 10 (N=21)*
Cluster 13 (N=3)
Cluster 9 (N=1)
Cluster 3 (N=1)
Cluster 20 (N=1)
Cluster 12 (N=1)
Cluster 3 (N=1)
Cluster 17 (N=1)
Cluster 10 M=700000

Validating Hierarchical Clustering (Key Clusters) with another Clustering technique, K-Means

Looking at the graph comparison, the results of the two techniques appear similar at first glance. To validate whether they are actually similar, we look at the actual mapping of each cluster based on its distribution and key attributes.

Based on the graph generated using Graph Builder, it seems that the Ward clusters can be mapped to the K-means clusters using the counts:
Ward Cluster 1 -> K-Means Cluster 17
Ward Cluster 11 -> K-Means Cluster 6
Ward Cluster 19 -> K-Means Cluster 9

However, as it would be presumptuous to simply conclude that the results of the two clustering techniques are the same, we investigate further by generating the distribution for each mapping to see whether the clusters point to the same data records.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 1 VS K-means Cluster 17)
Hierarchical Cluster 1 (N=3713)
K-means Cluster 17 (N=3892) M=33100

Ward Cluster 1 and K-Means Cluster 17: As their distributions and means are similar, we conclude that the two clusters are essentially the same.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 11 VS K-means Cluster 6)
Hierarchical Cluster 11 (N=415) M=
K-means Cluster 6 (N=421) M=

Ward Cluster 11 and K-Means Cluster 6: As their distributions and means are similar, we conclude that the two clusters are essentially the same.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 19 VS K-means Cluster 9)
Hierarchical Cluster 19 (N=123) Analysis:
K-means Cluster 9 (N=90) Analysis: M=

Ward Cluster 19 and K-Means Cluster 9: As their distributions and means are similar, we conclude that the two clusters are essentially the same.

Conclusion: The two different clustering techniques give similar results. Thus we can simply use a single clustering technique to segment and interpret the data.
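Comparing counts, distributions and means shows that two clusters look alike, but not that they contain the same records. A complementary check, sketched below, cross-tabulates the two label columns record by record and reports the overlap. This is an illustrative addition rather than the procedure used in the project; the function name and the toy labels are hypothetical, and the overlap measure is the Jaccard index (shared records divided by the union of both clusters).

```python
from collections import Counter

def best_matches(labels_a, labels_b):
    """For each cluster in labels_a, find the labels_b cluster sharing most records.

    Returns {a: (b, shared_count, jaccard_overlap)}.
    """
    pairs = Counter(zip(labels_a, labels_b))
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    result = {}
    for a in size_a:
        candidates = [(b, n) for (a2, b), n in pairs.items() if a2 == a]
        b, shared = max(candidates, key=lambda item: item[1])
        union = size_a[a] + size_b[b] - shared
        result[a] = (b, shared, shared / union)
    return result

# Toy labels for seven records: Ward cluster id and K-means cluster id per record.
ward = [1, 1, 1, 11, 11, 19, 19]
kmeans = [17, 17, 17, 6, 6, 9, 17]
matches = best_matches(ward, kmeans)
for ward_id, (kmeans_id, shared, overlap) in matches.items():
    print(ward_id, "->", kmeans_id, shared, round(overlap, 2))
```

An overlap close to 1 for a mapped pair supports treating the two clusters as the same segment; a low value means the similar counts are partly coincidental.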

Further Analysis to aid interpretation with Tree Map & industry attributes

To drill further down into the data set and view the industrial backgrounds of the different cluster groupings, we decided to generate tree maps for our analysis. The tree maps represent both the Division Description and the Major Group Description in the form of nested rectangles. By joining the industry code directory table with the customer and transaction tables, we can further analyse the key clusters based on their industry.

Reasons for using tree maps:
- They make very efficient use of space, so we used them to display many industries simultaneously.
- They reveal different patterns within the data by making use of the correlation between size and colour.

Tree Map Analysis using cluster groupings

[Figure: Cluster Grouping 1 (size by local_revenue)]
[Figure: Cluster Grouping 1 (size by billed_weight)]

[Figure: Cluster Grouping 2 (size by local_revenue)]
[Figure: Cluster Grouping 2 (size by billed_weight)]
[Figure: Cluster Grouping 3 (size by local_revenue)]
[Figure: Cluster Grouping 3 (size by billed_weight)]

[Figure: Cluster Grouping 4 (size by local_revenue)]
[Figure: Cluster Grouping 4 (size by billed_weight)]
[Figure: Cluster Grouping 5 (size by local_revenue)]
[Figure: Cluster Grouping 5 (size by billed_weight)]

Cluster Grouping Tree Map Analysis:

Tree maps are not directly linked to clustering; rather, they are used to understand the grouped clusters by studying their industrial details. Although not directly a part of clustering, tree maps helped us to view the industrial backgrounds.

Clusters and cluster groupings show low variability when sized by different attributes, i.e. local revenue and billed weight. For almost all the cluster groupings, local revenue is positively correlated with billed weight, so the contributions of the different clusters and cluster groupings to local revenue and to billed weight are roughly proportional. An exception is cluster 1, where the Electronics industry appears to contribute a major proportion of the billed weight, unlike its contribution to the local revenue.

As a data visualization technique, a tree map is rather static in nature. However, using JMP as our tool for mining as well as statistical analysis, we were able to filter the data dynamically by cluster grouping and regenerate the tree maps on the fly. Handling tree maps for interactive analytics was thus something we learnt through this project.

Electronics and General Merchandise extend across all the segments/cluster groupings. Cluster Grouping 1 focuses more on commodity shipping with a relatively low weight.

Linking it back to our cluster findings

Grp 1 (N=3713)
- Cluster properties: Highest revenue per weight; very high values for all attributes; largest no. of customers
- Industrial details (key contributors): Electronics Wholesale General Merchandise
- Revenue %: 30%

Grp 3 and 5, Total=178 (N=123) (N=55)
- Cluster properties: High weight per shipment; low revenue per unit weight
- Industrial details (key contributors): Electronics Industrial Machinery General Merchandise
- Revenue %: 39%

Grp 2 (N=415)
- Cluster properties: Moderate revenue per trans; moderate revenue per unit weight; 2nd largest no. of customers
- Industrial details (key contributors): Electronics Wholesale General Merchandise
- Revenue %: 15%

Grp 4 (N=171)
- Cluster properties: Has very low revenue/transaction; low weight per shipment/transaction; 3rd largest no. of customers
- Industrial details (key contributors): Electronics Wholesale General Merchandise Industrial Machinery
- Revenue %: 16%

Conclusion
- Grp 1 and Grp 3&5 have almost the same industrial background. The major differentiator for industry is Industrial Machinery for grp 2, which, along with the relatively low number of customers, contributes to its high weight per shipment.
- Cluster grp 1: Relatively low weight per shipment due to a very high number of customers.
- Industrial machinery has a huge proportion (which explains the high weight per shipment).
- Grp 2 and grp 4 have almost the same contribution to revenue.
- Parallel plots: Group 2's (revenue/weight) is TWICE Group 4's (revenue/weight).

Bubble Plot Analysis

[Figures: bubble plots for Cluster Groupings (1->5)]

[Figures: bubble plots for Clusters (1->21)]

To represent the data using the 3 key attributes that contributed to the cluster formation (local revenue, billed weight and shipments), and to embark on a temporal analysis, we decided to use bubble plots.

As an example of trend analysis, take cluster grouping 5. The revenue increases during the first 8 months. However, it falls during the later part of the year, with the largest dip observed in December. The size of the bubble (number of shipments) remains the same throughout the year except for December, when the number falls below the average. Re-plotting with local revenue as the bubble size, we see a similar trend for cluster grouping 5.

Findings & Recommendations for Decision Making for the Regular Customers Segment

Due to our limited domain knowledge of the logistics industry and the business, we find it hard to come up with detailed recommendations.

Irregular Customers: Cluster Analysis with Interactive Visualization

Based on statistical analysis, the irregular customers (those with zero revenue in at least one month) make up 1/3 of the total revenue.

Hierarchical clustering (Ward)

Deciding on the number of clusters
Number of clusters: 21. The most important clusters are 1 and 21.

[Figures: distributions for Cluster 1 and Cluster 21]

Cluster 1

The mean is 12 while the median is 6, which means the data in this cluster is skewed. As a measure of position, the median is therefore the better choice. For the majority of customers in this cluster, the total number of transactions is around 6 and the total yearly revenue is correspondingly small. Dividing the two figures gives a rough average revenue per transaction (1436/6 = 239.33).
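The reasoning here (a mean well above the median signals right skew, so the median is the safer measure of position) can be checked with a few lines of Python. The transaction counts below are invented so that they reproduce a mean of 12 and a median of 6; they are not the cluster's actual records. The last line repeats the division quoted in the text.

```python
import statistics

# Invented yearly transaction counts: most customers are small, one is very large.
transactions_per_customer = [2, 4, 5, 6, 6, 7, 54]

mean_trans = statistics.mean(transactions_per_customer)
median_trans = statistics.median(transactions_per_customer)

# A mean far above the median indicates a long right tail (a few heavy users).
is_right_skewed = mean_trans > median_trans

# Rough average revenue per transaction, using the figures quoted in the text.
revenue_per_transaction = round(1436 / 6, 2)

print(mean_trans, median_trans, is_right_skewed, revenue_per_transaction)
```

A single heavy customer is enough to double the mean here, which is why the median describes the typical customer better.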

Cluster 21

For this cluster, even though the customers did not transact consistently every month, their total number of transactions and revenue is still considerable and we should not ignore them. The difference between the mean and the median is not very big for this cluster, so we can take the mean as the measure of position. On average, customers in this cluster make 80 transactions per year, or about 6 to 7 transactions per month. We should study the characteristics of this group and find ways to turn these customers into loyal or regular customers.

Data Exploration

To find the unique characteristics of customers whose total number of transactions is above 500, we examine the following subset of records.

bill_account trans_1 trans_2 trans_3 trans_4 trans_5 trans_6 trans_7 trans_8 trans_9 trans_10 trans_11 trans_

Observation of the data: Some of the irregular customers may be new customers to DHL in 2008 and should not be categorized as irregular. For example, customer A started transacting with DHL in May and had a consistent number of transactions from May until the end of the year. Even though this customer has no transactions before May, that does not mean the customer is not loyal or consistent. We should try to separate these customers from the other irregular customers.
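Separating such customers could be done with a rule on the monthly transaction columns: find the first active month and require every month from then on to be non-zero. The sketch below is one possible formulation, not something implemented in the project; the function names and the sample series are hypothetical.

```python
def first_active_month(monthly):
    """Return the 1-based index of the first month with transactions, or None."""
    for month, count in enumerate(monthly, start=1):
        if count > 0:
            return month
    return None

def is_new_but_consistent(monthly):
    """True if the customer starts after January and never drops to zero again."""
    start = first_active_month(monthly)
    if start is None or start == 1:
        return False
    return all(count > 0 for count in monthly[start - 1:])

# Hypothetical trans_1 .. trans_12 values.
customer_a = [0, 0, 0, 0, 5, 6, 5, 7, 6, 5, 6, 5]    # joined in May, steady since
intermittent = [3, 0, 2, 5, 0, 3, 4, 4, 5, 3, 2, 4]  # gaps during the year

print(first_active_month(customer_a), is_new_but_consistent(customer_a))
print(is_new_but_consistent(intermittent))
```

Customers flagged by this rule could then be analysed with the regular customers over their active months, or kept as a separate segment.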

Another point we consider very important is that many customers have a lot of transactions with DHL in the first few months, after which the number of transactions gradually decreases to 0. We should try to find out the reason behind this. Is it due to the nature of the business (for example, fishing activities are determined by seasons), or is it because DHL was losing these customers to competitors?

Limitations of Data Mining (Clustering)

Based on what we analyzed, there are new customers who are regular (consistent) but who began their transactions with DHL after January. For simplicity's sake, we decided to exclude these "regular irregular" customers from the regular analysis. We analyzed about 4000 customers who have no zero transactions/revenues in any month. Since our focus is on customer segmentation by means of clustering, we find this analysis sufficient to derive unique clusters for groups of customers (N>100). However, although these customers are excluded here, they should be included for the purpose of targeted marketing (advertisement).

Further analysis could be done on the Irregular Customers

Further studies on new customers (with trailing zero values) and on customers with decreasing transactions could yield recommendations for targeted marketing and for pricing discounts to keep and win back customers.

Challenges, Limitations and Lessons Learnt

1) Use of Interactive Visual Analytics & Data Mining Techniques

To validate our final result (the analyst-driven segmentation of customers into 5 cluster groupings), we went back to the Ward clustering dendrogram and set the number of clusters to 5; the results are shown on the right. By comparing the system-generated results with our own, we realized that we had overlooked 2 outliers, Cluster 3 and Cluster 9, which are the 2 largest customers and which we had already identified at the start in our Initial Manual Logical Segmentation (A & B) using Basic Statistics. To rectify this, we can break our initial 5 cluster groupings into 6 by creating a separate cluster grouping for the two largest customers.

The system-generated 5 clusters correctly placed clusters 3 and 9 into a single cluster. However, the rest of its groupings (for example, merging clusters 1, 11 and 19 into one cluster) do not make sense, which shows that a human data mining analyst is still important for arriving at useful and logical findings.

2) Data complexity and understanding, and lack of domain knowledge
3) Doing Clustering with software (SAS Enterprise Miner to JMP)
4) To come up with meaningful and useful patterns for decision making

Final Conclusion

Knowledge discovery with DHL's large data set has enabled us to adopt the roles of a Database Analyst (DBA), a Data Analyst, and to some extent a Business Analyst. We have identified 6 unique customer segments for DHL by using various clustering techniques and validating them. Interactive visual analytics and data mining techniques can empower everyday data analysts to gain insights and make informed decisions. We find that the best combination is an intelligent data mining analyst who has both deep industry knowledge of the field (e.g. global logistics) and a deep understanding of the data mining algorithms and techniques to apply. That way, useful and relevant findings and recommendations can be communicated to the decision makers of the business, realizing the full potential of data mining for business intelligence.