DHL Data Mining Project. Customer Segmentation with Clustering


DHL Data Mining Project
Customer Segmentation with Clustering

Timothy TAN Chee Yong, Aditya Hridaya MISRA, Jeffery JI Jun Yao
3/30/2010

Table of Contents

- Introduction to DHL and the data
- Problem
- Clustering Methods & Motivation
- Initial Manual Logical Segmentation (A & B) using Basic Statistics
- KDD Process, Methodology & Software used
- Data Preparation Data Preprocessing, Integration and Selection
- Selection and Separating Customer Transaction Data into Regular and Irregular Customers for Mining
- Regular Customers: Data Mining using Clustering
- Regular Customers: Cluster Analysis with Interactive Visualization
- Using Parallel Plot to decide Key Cluster Groups Focus for Regular Customers
- Summary Statistics of Cluster Groups
- Key Cluster Analysis for REGULAR Customers using Histogram to present distribution of different Cluster Groups
- Validating Hierarchical Clustering (Key Clusters) with another Clustering technique, K-Means
- Further Analysis to aid interpretation with Tree Map & industry attributes
- Tree Map Analysis using cluster groupings
- Findings & Recommendations for Decision Making for the Regular Customers Segment
- Irregular Customers: Cluster Analysis with Interactive Visualization
- Final Conclusion

Introduction to DHL and the data

Product: Real-world DHL 2008 Malaysia data set

Objective: To use data mining to perform customer segmentation and to find useful patterns in DHL transaction data, in order to make recommendations that can support marketing, pricing and future Business Intelligence decisions.

About DHL

DHL is the global market leader of the international express and logistics industry, specializing in providing innovative and customized solutions from a single source. DHL offers expertise in express, air and ocean freight, overland transport, contract logistics solutions as well as international mail services, combined with worldwide coverage and an in-depth understanding of local markets. DHL's international network links more than 220 countries and territories worldwide. Some 300,000 employees are dedicated to providing fast and reliable services that exceed customers' expectations.

Problem

DHL lacks information about its customer segments and profiles that would help it make marketing, pricing and other business decisions. While DHL has strong domain knowledge and rich past transaction data, it lacks the expertise to mine interesting patterns out of that data.

Clustering Methods & Motivation

We analyzed the customer transaction data of a logistics company. Data mining (clustering in particular) and visualization techniques were used to find meaningful relationships and patterns within the existing data. These techniques enable the company to target its customers more efficiently and improve its marketing processes. They also give the organization much-needed market intelligence, and with it better insight into and visibility of its customers. SAS JMP was used to interactively visualize the data and cluster the customers based on their properties and characteristics.

JMP also enabled us to evaluate transactional patterns and to present and interpret the data through intuitive graphs. Parallel plots were used to study patterns across clusters, and bubble plots enabled us to identify temporal patterns within customer accounts. To support accurate decision making, we followed the knowledge discovery process (KDD), transforming the raw data into useful knowledge. The raw data was preprocessed using SAS tools and explored using SAS JMP. The statistical summary features of JMP, combined with its visualization techniques, increase the potential for business decision making. The customers were broadly segmented into regular and irregular customers. Cluster and temporal analysis supported decision making for all the customer segments.

Initial Manual Logical Segmentation (A & B) using Basic Statistics

For completeness, we first used simple tools such as Excel to explore the data and segment it logically, without using any data exploration or mining software.

Figures: Logical Segment A; Alternate Logical Segment B.

For simplicity, we broke the data into 3 logical buckets to best represent its highly skewed and uneven distribution.

Figure: Rough sketch showing the distribution of the data.

KDD Process, Methodology & Software used

Figure: KDD process (slides taken from the Data Mining class slides).

For Cleaning/Integration, we used SAS Integration Studio to clean the structured data and import it into the SAS Suite.

For Selection & Transformation, SAS Enterprise Guide was used to select individual transaction records and transform them into derived, aggregated monthly and total data per customer, for attributes like total_revenue and total_shipments. The transaction data was also segmented into regular and irregular data in preparation for mining.

For Mining, SAS Enterprise Miner was used initially. However, because of limited hardware resources, the large dataset, and Miner's less friendly user interface, we switched to SAS JMP, a light-weight data mining tool in the SAS Suite, for our mining tasks.

Benefits of SAS JMP over SAS Enterprise Miner
- light-weight, thus faster
- supports dendrograms for clustering
- supports interactive visual analytics

For Pattern Evaluation & Data Presentation, SAS JMP was used to interactively generate intuitive graphs to analyze and interpret the data.

For Knowledge Representation to support DHL decision making, we compiled our findings in a Microsoft Word document.

Data Preparation Data Preprocessing, Integration and Selection

Selection of Tables

As the objective of the data mining project is customer segmentation, only customer transaction information and other tables that may be useful were selected. The following ETL process was then carried out, as shown in the diagrams:
- Phase 1: Useful attributes are selected and combined into a single table via inner joins.
- Phase 2: Aggregation. All transactions that belong to the same customer are aggregated into a single bill_account record.
- Phase 3: Bucketing. The aggregated temporal numerical values are bucketed into 12 separate months, with a 13th attribute holding the total.

The final table, ready for data mining, has a total of 70 columns and 20,386 unique customer accounts/records.
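The aggregation and bucketing phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project used SAS Enterprise Guide, the sample rows and the function name bucket_by_month are hypothetical, and only revenue is bucketed here, whereas the real table carries several measures per month.

```python
from collections import defaultdict

# Hypothetical transaction rows: (bill_account, month 1-12, revenue).
# These values are illustrative and do not come from the DHL data set.
transactions = [
    ("A001", 1, 120.0),
    ("A001", 1, 80.0),
    ("A001", 3, 50.0),
    ("B002", 12, 300.0),
]

def bucket_by_month(rows):
    """Aggregate rows into one record per bill_account with 12 monthly
    buckets plus a 13th attribute holding the yearly total."""
    monthly = defaultdict(lambda: [0.0] * 12)
    for account, month, revenue in rows:
        monthly[account][month - 1] += revenue

    table = {}
    for account, months in monthly.items():
        # One column per month (revenue_1 ... revenue_12) and one total.
        record = {f"revenue_{m}": months[m - 1] for m in range(1, 13)}
        record["total_revenue"] = sum(months)
        table[account] = record
    return table

customer_table = bucket_by_month(transactions)
```

Repeating the same bucketing for each measure yields one wide record per customer account.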

Reduction of dimensions by the selection of key attributes for clustering input

Based on discussion with the client, who has the domain knowledge, 65 attribute columns were selected as clustering inputs to mitigate the curse of dimensionality.

Selection and Separating Customer Transaction Data into Regular and Irregular Customers for Mining

For the purpose of future analysis, reference and targeted marketing for DHL, we saved the records (data points) into separate fact tables for each cluster.

                                         Transactions   Percentage   No. of Customers (N)
No. of Regular Customer Transactions     1,975,         %            4478
No. of Irregular Customer Transactions   531,           %
Total:                                   2,507,         %

Definition and Rationale for splitting into Regular and Irregular Customers

We define regular customers as customers whose bill account shows consistent engagement with DHL, that is, no zero values in any month. All other customers are classified as irregular customers. Some of them have zero transactions in some months in between active months, and some are new customers with zeros in the first months of the year. For simplicity, and for the purpose of clustering, we have kept the two segments separate.
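The regular/irregular rule above reduces to a single test per account. The sketch below assumes each account is represented by a list of 12 monthly transaction counts; the account IDs, counts and function name are hypothetical.

```python
# Hypothetical monthly transaction counts (January to December) per account.
monthly_transactions = {
    "A001": [5, 3, 4, 6, 2, 7, 1, 3, 4, 5, 6, 2],   # active every month
    "B002": [0, 0, 0, 0, 4, 5, 6, 3, 2, 4, 5, 6],   # starts in May
    "C003": [9, 8, 0, 7, 0, 0, 3, 0, 0, 1, 0, 0],   # gaps between months
}

def split_regular_irregular(accounts):
    """Regular = no zero value in any month; everything else is irregular."""
    regular, irregular = {}, {}
    for account, months in accounts.items():
        target = regular if all(count > 0 for count in months) else irregular
        target[account] = months
    return regular, irregular

regular, irregular = split_regular_irregular(monthly_transactions)
```

Under this rule a customer who joined in May (B002) lands in the irregular segment, which is the limitation the report returns to later.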

Regular Customers: Data Mining using Clustering

65 attribute columns were used as inputs. The key attributes are the number of (1) transactions, (2) weight, (3) pieces, (4) shipments and (5) revenue, each recorded for every month together with its aggregated total (5 measures x 13 values = 65 columns). For example, Revenue_1 represents the aggregated revenue in dollars for the month of January.

Hierarchical Clustering Results

Figure: Ward Hierarchical (Cluster Frequencies).

Key Clusters (based on counts):
1) Cluster 1
2) Cluster 11
3) Cluster 19

After analyzing the dendrogram and noting the distinctive distance between cluster 21 and clusters 22 onwards, we decided to prune the tree and keep 21 clusters.
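To make the Ward step concrete, here is a minimal pure-Python sketch of agglomerative clustering with Ward's merge cost (the increase in within-cluster sum of squares). It is not the JMP implementation used in the project; the function name, the toy points and the fixed target of k clusters (standing in for pruning the dendrogram) are assumptions, and the naive search only suits small inputs.

```python
def ward_clusters(points, k):
    """Merge clusters greedily by Ward's criterion until k clusters remain.
    Returns a list of clusters, each a list of indices into points."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged:
        # |a||b| / (|a| + |b|) * squared distance between the centroids.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge.
        _, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical 2-D customers (for example total_revenue, total_shipments).
toy_points = [(1.0, 1.0), (1.2, 0.9), (10.0, 10.0), (10.5, 9.8), (50.0, 50.0)]
toy_clusters = ward_clusters(toy_points, k=3)
```

In practice the inputs would usually be standardized first so that large-valued columns such as revenue do not dominate the distances; whether the project did so is not stated.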

Regular Customers: Cluster Analysis with Interactive Visualization

In this section, having generated the clusters, we use visual analytics such as parallel plots, histograms, tree maps and other graphs to understand, group and interpret the clusters and to find useful patterns that may support decision making.

Together, parallel plots, histograms, tree maps and bubble plots cover almost all the properties of the data that we need in order to interpret it effectively. With parallel plots, every point in the multidimensional data is mapped to a line on a 2-dimensional plane.

Using Parallel Plot to decide Key Cluster Groups Focus for Regular Customers

So as not to use data mining blindly (relying on machine learning alone) with the default K=20 size, we combine data visualization through parallel plots with human interpretation. The parallel plots show a clear general pattern that helps differentiate the cluster groupings (colours) for further analysis. See the zoomed-in view.

Grouping 1 (Total N=3713): Cluster 1 (N=3713)
Grouping 2 (Total N=415): Cluster 11 (N=415)
Grouping 3 (Total N=123): Cluster 19 (N=123)
Grouping 4 (Total N=171): Cluster 6 (N=97), Cluster 16 (N=38), Cluster 2 (N=23), Cluster 5 (N=4), Cluster 21 (N=4), Cluster 4 (N=1), Cluster 18 (N=5)
Grouping 5 (Total N=55): Cluster 7 (N=1), Cluster 8 (N=17), Cluster 14 (N=7), Cluster 10 (N=21), Cluster 13 (N=3), Cluster 9 (N=1), Cluster 3 (N=1), Cluster 20 (N=1), Cluster 12 (N=1), Cluster 17 (N=1), Cluster 15 (N=1)

Analysis: Isolated cluster with very high values for all attributes; Largest no. of customers; Very low no. of customer transactions; Moderately high values for other attributes; 2nd largest no. of customers; Very low no. of customer transactions; High total billed weight; Low pieces and shipments; Very low no. of customer transactions; 4th largest no. of customers; Very low no. of customer transactions; Very low billed weight; Very low to low pieces and shipments; 3rd largest no. of customers; Very low no. of customer transactions; Very low to low total billed weight; Very low pieces and shipments; Smallest size market (55 customers).

Zoomed in view

Cluster Group Parallel Plot Analysis

While clusters 1 and 11 behave similarly across the different attributes, cluster 19 follows a pattern of relatively sharp dips and rises. This can be attributed to the drop in the sum of pieces. Lines of different colors, representing different groupings, follow similar distributions; that is, the clusters can in turn be grouped into groups with similar characteristics.

By zooming in and observing the patterns, we can see similarities and differences between the clusters. Clusters with similar patterns have been grouped into a cluster grouping and given a particular colour to help us distinguish them. For example, cluster 2 and cluster 21 are similar. Both have relatively high sum_of_pieces and sum_of_shipments values compared with their other values, creating a mountain-like shape that is easy to distinguish by eye. This shows how a parallel plot can help us group the clusters effectively and intuitively.

For further analysis, we create a new column, ClusterGrp, and assign each record to a cluster group based on its pattern. We can then generate summary statistics to analyse the cluster groups further.
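The ClusterGrp assignment can be sketched as a lookup from cluster number to grouping, followed by a per-group summary. The mapping follows the five groupings listed for the regular customers; the sample records, the summarize function and the choice of statistics are hypothetical, and the project did this step interactively in JMP.

```python
# Cluster number -> cluster grouping, following the five groupings listed
# for the regular customers.
CLUSTER_TO_GROUP = {1: 1, 11: 2, 19: 3}
CLUSTER_TO_GROUP.update({c: 4 for c in (6, 16, 2, 5, 21, 4, 18)})
CLUSTER_TO_GROUP.update(
    {c: 5 for c in (7, 8, 14, 10, 13, 9, 3, 20, 12, 17, 15)}
)

# Hypothetical customer records; the revenue figures are illustrative only.
records = [
    {"bill_account": "A001", "cluster": 1, "total_revenue": 700.0},
    {"bill_account": "B002", "cluster": 6, "total_revenue": 1500.0},
    {"bill_account": "C003", "cluster": 16, "total_revenue": 2500.0},
    {"bill_account": "D004", "cluster": 19, "total_revenue": 90000.0},
]

def summarize(rows):
    """Add a ClusterGrp column to each row and return per-group statistics."""
    summary = {}
    for row in rows:
        group = CLUSTER_TO_GROUP[row["cluster"]]
        row["ClusterGrp"] = group
        stats = summary.setdefault(group, {"n": 0, "total_revenue": 0.0})
        stats["n"] += 1
        stats["total_revenue"] += row["total_revenue"]
    for stats in summary.values():
        stats["mean_revenue"] = stats["total_revenue"] / stats["n"]
    return summary

group_summary = summarize(records)
```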

Summary Statistics of Cluster Groups

Analysis

Cluster Grp 5 and 3 share the highest revenue per transaction ($400+/trans) and revenue per shipment. They could be grouped further into a single customer segment that makes up a sizeable 38% of the total revenue market share. Cluster Grp 1 has an extremely high revenue per weight volume.

By generating another parallel plot with useful ratios, we can take the analysis further. Note that since shipment and transaction values are almost the same, they are interchangeable.

Cluster Grp 1: Has the highest revenue per weight, the highest share of the total revenue (30% of the market), and a relatively low weight per transaction/shipment. We suspect that most transactions in Cluster Grp 1 are transported by air. However, as the number of customers (N) is extremely high (3713), we cannot conclude that they are all air-based. These shipments could equally be light-weight land or sea shipments that are time-critical, which would explain their high revenue per transaction.

Cluster Grp 3 and 5: They have similar patterns and can be merged to give a sizeable share of 39% of market revenue. Given their low revenue per unit weight and high weight per shipment, they are likely to be served by a high-volume, slow-delivery, container-based service transported by land or ship.

Cluster Grp 2: Seems to be a middle-tier customer segment with moderate revenue per transaction and moderate revenue per unit weight, accounting for 15% of total revenue.

Cluster Grp 4: Has very low revenue per transaction and low weight per shipment/transaction, and comprises 16% of the total revenue market. However, it has the highest revenue per weight.
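The ratios used in this analysis (revenue per transaction, revenue per unit weight, weight per shipment, and share of total revenue) are simple quotients of per-group totals. The sketch below uses hypothetical group names and figures, not the DHL results.

```python
# Hypothetical per-group totals; none of these figures come from the DHL data.
group_totals = {
    "Grp A": {"revenue": 3000.0, "weight": 100.0, "transactions": 60, "shipments": 50},
    "Grp B": {"revenue": 1000.0, "weight": 400.0, "transactions": 10, "shipments": 10},
}

def ratio_profile(totals):
    """Derive the ratios plotted on the second parallel plot for each group."""
    all_revenue = sum(g["revenue"] for g in totals.values())
    profile = {}
    for name, g in totals.items():
        profile[name] = {
            "rev_per_trans": g["revenue"] / g["transactions"],
            "rev_per_weight": g["revenue"] / g["weight"],
            "weight_per_shipment": g["weight"] / g["shipments"],
            "revenue_share_pct": 100.0 * g["revenue"] / all_revenue,
        }
    return profile

profile = ratio_profile(group_totals)
```

In this toy data, Grp A pairs a high revenue per unit weight with a low weight per shipment, the pattern the report reads as light, time-critical shipments, while Grp B shows the heavy, low-yield profile.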

Based on this graph, Cluster Grp 1 (linear) and 5 are important to focus on, as together they make up more than 50% of the total revenue from regular customers. Cluster Grp 5 is interesting because it generates high revenue with just half the number of customer transactions. It would be worth finding out why this is so. Cluster Grp 5 also has the largest share of billed weight.

Key Cluster Analysis for REGULAR Customers using Histogram to present distribution of different Cluster Groups

For the purpose of future analysis, reference and targeted marketing for DHL, we saved the records (data points) into separate fact tables for each cluster.

Group 1, containing Cluster 1 (N=3713): Relatively small Total_Rev but many N. M=734 M=28000. Cluster 1 has too many outliers and needs further classification, possibly with a decision tree.

Group 2, containing Cluster 11 (N=415): Medium Total_Rev. M= M=4116

Group 3, containing Cluster 19 (N=123): Large Total_Rev. M=17100 M=375000

*For the purpose of analysis, we sampled by selecting the largest representative cluster from each group to generate its distribution.

Grouping 4, Cluster 6 (Total N=150): Cluster 6 (N=97)*, Cluster 16 (N=38), Cluster 5 (N=4), Cluster 21 (N=4), Cluster 4 (N=1), Cluster 18 (N=5), Cluster 17 (N=1). M=

Grouping 5 (Total N=54): Cluster 8 (N=17), Cluster 14 (N=7), Cluster 10 (N=21)*, Cluster 13 (N=3), Cluster 9 (N=1), Cluster 3 (N=1), Cluster 20 (N=1), Cluster 12 (N=1), Cluster 3 (N=1), Cluster 17 (N=1). Cluster 10: M=700000

Validating Hierarchical Clustering (Key Clusters) with another Clustering technique, K-Means

Looking at the graph comparison, the results of the two techniques appear similar at first glance. To check whether they really are similar, we look at the actual mapping of each cluster based on its distribution and key attributes.

Based on the graph generated using Graph Builder, the Ward clusters can be mapped to the K-means clusters using the counts:

Ward Cluster 1 -> K-Means Cluster 17
Ward Cluster 11 -> K-Means Cluster 6
Ward Cluster 19 -> K-Means Cluster 9

However, as it would be presumptuous to conclude from counts alone that the two clustering techniques give the same results, we investigate further by generating the distribution for each mapping to see whether the mapped clusters point to the same data records.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 1 VS K-means Cluster 17)

Hierarchical Cluster 1 (N=3713); K-means Cluster 17 (N=3892), M=33100

Ward Cluster 1 -> K-Means Cluster 17: As their distributions and means are similar, we conclude that the two clustering results agree for this cluster.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 11 VS K-means Cluster 6)

Hierarchical Cluster 11 (N=415), M= ; K-means Cluster 6 (N=421), M=

Ward Cluster 11 -> K-Means Cluster 6: As their distributions and means are similar, we conclude that the two clustering results agree for this cluster.

Hierarchical & K-means Mapping Similarity (Hierarchical Cluster 19 VS K-means Cluster 9)

Hierarchical Cluster 19 (N=123); K-means Cluster 9 (N=90), M=

Ward Cluster 19 -> K-Means Cluster 9: As their distributions and means are similar, we conclude that the two clustering results agree for this cluster.

Conclusion: The two clustering techniques give similar results. We can therefore use a single clustering technique to segment and interpret the data.
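Whether two mapped clusters really point to the same data records can also be checked directly by counting shared accounts. The sketch below is an alternative check, not the method used in the report (which compared distributions and means); it matches each Ward cluster to the K-means cluster with the highest Jaccard overlap. The labels and function names are hypothetical.

```python
from collections import defaultdict

def members_by_cluster(labels):
    """Turn {account: cluster_id} into {cluster_id: set of accounts}."""
    groups = defaultdict(set)
    for account, cluster in labels.items():
        groups[cluster].add(account)
    return groups

def match_clusters(ward_labels, kmeans_labels):
    """For each Ward cluster, find the K-means cluster with the highest
    Jaccard overlap (shared accounts / accounts in either cluster)."""
    ward = members_by_cluster(ward_labels)
    kmeans = members_by_cluster(kmeans_labels)
    matches = {}
    for w_id, w_members in ward.items():
        best_id, best_score = None, -1.0
        for k_id, k_members in kmeans.items():
            score = len(w_members & k_members) / len(w_members | k_members)
            if score > best_score:
                best_id, best_score = k_id, score
        matches[w_id] = (best_id, best_score)
    return matches

# Hypothetical labels for five accounts under the two techniques.
ward_labels = {"a": 1, "b": 1, "c": 1, "d": 11, "e": 11}
kmeans_labels = {"a": 17, "b": 17, "c": 17, "d": 6, "e": 17}
matches = match_clusters(ward_labels, kmeans_labels)
```

A score close to 1 means the two clusters contain nearly the same accounts; similar counts with a low score would mean the clusters only look alike in size.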

Further Analysis to aid interpretation with Tree Map & industry attributes

To drill further down into the data set and view the industrial backgrounds of the different cluster groupings, we generated tree maps. The tree maps represent both the Division Description and the Major Group Description as nested rectangles. By joining the industry code directory table with the customer and transaction tables, we can analyse the key clusters further by industry.

Reasons for using tree maps:
- They make very efficient use of space, so we could display many industries simultaneously.
- They reveal different patterns within the data through the correlation between size and color.

Tree Map Analysis using cluster groupings

Figures: Cluster Grouping 1 (size by local_revenue); Cluster Grouping 1 (size by billed_weight).

Figures: Cluster Grouping 2 (size by local_revenue); Cluster Grouping 2 (size by billed_weight); Cluster Grouping 3 (size by local_revenue); Cluster Grouping 3 (size by billed_weight).

Figures: Cluster Grouping 4 (size by local_revenue); Cluster Grouping 4 (size by billed_weight); Cluster Grouping 5 (size by local_revenue); Cluster Grouping 5 (size by billed_weight).

Cluster Grouping Tree Map Analysis

Tree maps are not part of the clustering itself; we used them to understand the grouped clusters by studying their industrial details and backgrounds.

Clusters and cluster groupings show low variability when sized by different attributes, i.e. local revenue and billed weight. For almost all the cluster groupings, local revenue is positively correlated with billed weight, so the contributions of the different clusters and cluster groupings to local revenue and to billed weight are roughly proportional. An exception is cluster 1, where the Electronics industry contributes a major proportion of the billed weight, unlike its contribution to local revenue.

As a data visualization technique, a tree map is rather static in nature. However, by using JMP for both mining and statistical analysis, we were able to filter the data dynamically by cluster grouping and regenerate the tree maps on the fly. Handling tree maps for interactive analytics was therefore something we learnt through this project.

Electronics and General Merchandise span all the segments/cluster groupings. Cluster Grouping 1 focuses more on commodity shipping with a relatively low weight.

Linking it back to our cluster findings

Grp 1 (N=3713)
- Cluster properties: Highest revenue per weight; very high values for all attributes; largest no. of customers
- Industrial details (key contributors): Electronics, Wholesale, General Merchandise
- Revenue %: 30%

Grp 3 and 5 (N=123 and N=55, Total=178)
- Cluster properties: High weight per shipment; low revenue per unit weight
- Industrial details (key contributors): Electronics, Industrial Machinery, General Merchandise
- Revenue %: 39%

Grp 2 (N=415)
- Cluster properties: Moderate revenue per trans; moderate revenue per unit weight; 2nd largest no. of customers
- Industrial details (key contributors): Electronics, Wholesale, General Merchandise
- Revenue %: 15%

Grp 4 (N=171)
- Cluster properties: Very low revenue/transaction; low weight per shipment/transaction; 3rd largest no. of customers
- Industrial details (key contributors): Electronics, Wholesale, General Merchandise, Industrial Machinery
- Revenue %: 16%

Conclusion
- Grp 1 and Grp 3&5 have almost the same industrial background. The major differentiator by industry is Industrial Machinery for grp 2, which, along with a relatively low number of customers, contributes to its high weight per shipment.
- Cluster grp 1: Relatively low weight per shipment due to a very high number of customers.
- Industrial Machinery has a huge proportion (which explains the high weight per shipment).
- Grp 2 and grp 4 have almost the same contribution to revenue.
- Parallel plots: Group 2's revenue/weight is TWICE Group 4's revenue/weight.

Bubble Plot Analysis

Figures: Bubble plots for Cluster Groupings (1->5) and for Clusters (1->21).

To represent the data using the 3 key attributes that contributed to the cluster formation (local revenue, billed weight and shipments), and to embark on a temporal analysis, we used bubble plots.

As an example of trend analysis, take cluster grouping 5. Revenue increases during the first 8 months. It then falls during the later part of the year, with the largest dip observed in December. The size of the bubble (number of shipments) remains the same throughout the year except in December, when the number falls below the average. Redrawing the bubble plot with local revenue as the bubble size gives a similar trend for cluster grouping 5.

Findings & Recommendations for Decision Making for the Regular Customers Segment

Because of our limited domain knowledge of the logistics industry and the business, we found it hard to come up with detailed recommendations.

Irregular Customers: Cluster Analysis with Interactive Visualization

Based on statistical analysis, the irregular customers (those with zero revenue in at least one month) account for 1/3 of the total revenue.

Hierarchical clustering (Ward): deciding on the number of clusters

Number of clusters: 21. The most important clusters are 1 and 21.

Figures: Cluster 1; Cluster 21.

Cluster 1

The mean is 12 while the median is 6, which means the data is skewed. The median is therefore the better measure of position. For the majority of customers in this cluster, the total number of transactions is around 6 and the total revenue per year is only 1436. From these two numbers, we can roughly estimate the average revenue per transaction (1436/6=239.33).
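The effect of skew on the mean can be illustrated with Python's statistics module. The transaction counts below are hypothetical, chosen only so that the mean is 12 and the median is 6 as in Cluster 1; the final line repeats the revenue-per-transaction arithmetic from the text.

```python
import statistics

# Hypothetical yearly transaction counts for eight customers. One very
# active customer pulls the mean far above the typical value.
transaction_counts = [2, 3, 5, 6, 6, 7, 8, 59]

mean_count = statistics.mean(transaction_counts)      # sensitive to the outlier
median_count = statistics.median(transaction_counts)  # robust to the outlier

# Revenue per transaction using the figures quoted for Cluster 1:
# total revenue of 1436 over a median of 6 transactions.
revenue_per_transaction = round(1436 / 6, 2)
```

Here the single customer with 59 transactions doubles the mean while leaving the median at the typical value, which is why the median is preferred for this cluster.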

Cluster 21

Although the customers in this cluster did not transact consistently every month, their total number of transactions and total revenue are still considerable, and we should not ignore them. The difference between the mean and the median is not large for this cluster, so we can take the mean as the measure of position. On average, customers in this cluster make 80 transactions per year, or roughly 6 to 7 transactions per month. We should study the characteristics of this group and find ways to turn these customers into loyal, regular customers.

Data Exploration

To find the distinguishing characteristics of customers whose total number of transactions is above 500, we examine the following subset of records.

Table columns: bill_account trans_1 trans_2 trans_3 trans_4 trans_5 trans_6 trans_7 trans_8 trans_9 trans_10 trans_11 trans_

Observation of the data: Some irregular customers may be new customers who joined DHL during 2008 and should not be categorized as irregular. For example, customer A started transacting with DHL in May and had a consistent number of transactions from May until the end of the year. Even though this customer had no transactions before May, that does not mean the customer is not loyal or consistent. We should try to separate these customers from the other irregular customers.
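One way to separate such new customers from the rest is to look at where the zero months fall. The sketch below is an assumption about how this could be done, not something the project implemented: an account whose zeros all come before its first active month is labelled "new". The labels, function name and sample data are hypothetical.

```python
def classify_account(months):
    """Classify 12 monthly transaction counts as regular, new, irregular,
    or inactive (no transactions at all)."""
    if all(count > 0 for count in months):
        return "regular"
    # Index of the first month with any transaction, if there is one.
    first_active = next((i for i, c in enumerate(months) if c > 0), None)
    if first_active is None:
        return "inactive"
    # New customer: zeros only before the first active month, none after.
    if all(count > 0 for count in months[first_active:]):
        return "new"
    return "irregular"

# Hypothetical customer A: no transactions before May, consistent afterwards.
customer_a = [0, 0, 0, 0, 4, 5, 6, 3, 2, 4, 5, 6]
label_a = classify_account(customer_a)
```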

Another point we consider very important is that many customers have a lot of transactions with DHL in the first few months, after which the number of transactions gradually decreases to 0. We should try to find out the reason behind this. Is it because of the nature of the business (for example, fishing activities are determined by seasons)? Or is it because DHL was losing these customers to competitors?

Limitations of Data Mining (Clustering)

Our analysis shows that there are new customers who transact consistently but began their transactions with DHL after January. For simplicity, we excluded these consistent but late-starting customers from the regular analysis. We analyzed about 4000 customers who have no zero transactions or revenues in any month. Since our focus is on customer segmentation by means of clustering, we find this analysis sufficient to derive distinct clusters for groups of customers (N>100). However, although these customers were excluded from the analysis, they should be included for the purpose of targeted marketing (advertisement).

Further analysis that could be done on the Irregular Customers

New customers (with leading zero values) and customers with decreasing transactions could be mined further to produce recommendations for targeted marketing and for pricing discounts that help keep and win back customers.

Challenges, Limitations and Lessons Learnt

1) Use of Interactive Visual Analytics & Data Mining Techniques

To validate our final result (customer segmentation into 5 cluster groupings, arrived at by a human data analyst), we went back to the Ward clustering dendrogram and set the number of clusters to 5; the results are as shown on the right. By comparing the system-generated results with ours, we realized that we had overlooked 2 outliers, Cluster 3 and Cluster 9, which are the 2 largest customers and which we had already identified at the start in our Initial Manual Logical Segmentation (A & B) using Basic Statistics. To rectify this, we can break our initial 5 cluster groupings into 6 by creating a separate cluster grouping for the two largest customers. The system-generated 5 clusters correctly placed clusters 3 and 9 into a single cluster. However, the rest of its groupings (grouping clusters 1, 11 and 19 into Cluster 1) do not make sense, which shows that a human data mining analyst is still important for coming up with useful and logical findings.

2) Data complexity and understanding, and lack of domain knowledge

3) Doing Clustering with software (SAS Enterprise Miner to JMP)

4) To come up with meaningful and useful patterns for decision making

Final Conclusion

Knowledge discovery with DHL's large data set has enabled us to adopt the roles of a Database Analyst (DBA), a Data Analyst, and in some respects a Business Analyst. We have identified 6 unique customer segments for DHL by using various clustering techniques and validating them. Interactive visual analytics and data mining techniques can empower everyday data analysts to gain insights and make informed decisions.

We find that the best combination is an intelligent data mining analyst who has deep industry knowledge of the field (e.g. global logistics) as well as a deep understanding of the data mining algorithms and techniques to apply. That way, useful and relevant findings and recommendations can be communicated to the decision makers of the business, realizing the full potential of data mining for business intelligence.


More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

Paper 232-2012. Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP

Paper 232-2012. Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP Paper 232-2012 Getting to the Good Part of Data Analysis: Data Access, Manipulation, and Customization Using JMP Audrey Ventura, SAS Institute Inc., Cary, NC ABSTRACT Effective data analysis requires easy

More information
