Data Mining + Business Intelligence Integration, Design and Implementation
ABOUT ME Vijay Kotu Data, Business, Technology, Statistics
BUSINESS INTELLIGENCE
- Result: making data accessible
- Wider distribution
- Dimensional slicing
- Mostly as-is reporting
DATA MINING
- Finding useful patterns in data
- Limited distribution
- Algorithms
- Insights and predictions
DATA MINING
Data mining, in simpler terms, is finding useful patterns in data. It is the non-trivial process of finding useful, valid, novel, and understandable patterns or relationships in data to make important decisions (Fayyad et al., 1996).
Diagram labels: Statistics, Quantitative, Operations Research, Computing, Machine Learning, Data Stores, Computation, Optimization, Algorithms
DATA MINING: MODELS
DATA MINING: TYPES
Tasks (diagram centered on Data Mining): Regression, Classification, Feature Selection, Clustering, Text Mining, Anomaly detection, Time Series, Applications, Association
DATA MINING: TYPES
Tasks and examples:
- Classification: Assigning voters to known buckets by political party, e.g., soccer moms. Bucketing new customers into one of the known customer groups.
- Regression: Predicting the unemployment rate for next year. Estimating an insurance premium.
- Anomaly detection: Detecting fraudulent credit card transactions. Network intrusion detection.
- Time series: Sales forecasting, production forecasting, and virtually any growth phenomenon that needs to be extrapolated.
- Clustering: Finding customer segments in a company based on transaction, web, and customer call data.
- Association analysis: Finding cross-selling opportunities for a retailer based on transaction purchase history.
DATA MINING: TYPES
Tasks and algorithms:
- Classification: Decision trees, neural networks, Bayesian models, rule induction, k-nearest neighbors
- Regression: Linear regression, logistic regression
- Anomaly detection: Distance based, density based, LOF (local outlier factor)
- Time series: Exponential smoothing, ARIMA, regression
- Clustering: k-means, density-based clustering (DBSCAN)
- Association analysis: FP-Growth, Apriori
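As an illustration of one classification algorithm from the list, here is a minimal k-nearest neighbors sketch for the "bucketing new customers into known groups" example. The customer features, segment names, and the function `knn_predict` are hypothetical and chosen only for illustration; a real project would normally use a data mining tool or library.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (monthly_spend, visits_per_month) -> known segment
train = [
    ((20.0, 1.0), "occasional"),
    ((25.0, 2.0), "occasional"),
    ((30.0, 1.0), "occasional"),
    ((200.0, 8.0), "loyal"),
    ((220.0, 10.0), "loyal"),
    ((180.0, 9.0), "loyal"),
]

# A new customer is assigned to the segment of its nearest neighbors
print(knn_predict(train, (210.0, 9.0)))
```

In practice the features would be scaled first, since raw Euclidean distance lets the largest-valued attribute dominate.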
DATA MINING: PROCESS
DATA MINING: PROCESS Data Mining Scoring 625
Data Mining + Business Intelligence
ISSUES: Data Mining and Business Intelligence
- People: Data mining skills and business intelligence skills are mutually exclusive
- Organization: The two functions live in different organizations within an enterprise
- Technology: Minimal overlap in tools, platforms, and technology
- Use cases: Historical reporting vs. prediction and insights
BENEFITS: Data Mining and Business Intelligence
- Distribution: Data mining insights get wider, real-time distribution
- Smarter analytics: History + predictions
- Visual discovery: Common link
- Security: Secure delivery of insights
CLASSIC BI ARCHITECTURE
Diagram components: Security Layer; Extraction, Transformation & Loading; Star Schema; Staging; OLAP; Dashboards, reports, alerts, ad hoc...
ANALYTICAL ARCHITECTURE #1: Data Mining Tool Scoring
Diagram components: Data Mining Tool; Extraction, Transformation & Loading; Star Schema; Staging; OLAP; Dashboards, reports, alerts, ad hoc...
- The data mining tool does the scoring.
- Robust modeling and scoring capabilities.
- The BI tool reports the scores like any other data points.
Limitations:
- New records cannot be scored unless scoring is provided by the DM tool.
- Requires multiple analytical tools.
ANALYTICAL ARCHITECTURE #2: Database Scoring
Diagram components: Extraction, Transformation & Loading; Star Schema; Staging; OLAP; Dashboards, reports, alerts, ad hoc...
- The database does the scoring.
- Can handle large data.
- Model, scoring, and data in one place.
Limitations:
- DB vendors have to provide a full DM suite.
- Analysis skills.
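A minimal sketch of what "the database does the scoring" can look like, assuming a linear model that was fitted elsewhere and is applied as a SQL expression. The table `customer_fact`, its columns, and the coefficients are hypothetical; SQLite stands in for the enterprise warehouse only to keep the example self-contained.

```python
import sqlite3

# Hypothetical linear model: premium = 200 + 15 * age_band + 40 * claims
INTERCEPT, COEF_AGE_BAND, COEF_CLAIMS = 200.0, 15.0, 40.0

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_fact ("
    "customer_id INTEGER PRIMARY KEY, age_band INTEGER, claims INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_fact VALUES (?, ?, ?)",
    [(1, 2, 0), (2, 5, 1), (3, 3, 4)],
)


def score_in_database(connection):
    """Apply the model inside the SQL engine, so no rows leave the database."""
    sql = """
        SELECT customer_id,
               ? + ? * age_band + ? * claims AS predicted_premium
        FROM customer_fact
        ORDER BY customer_id
    """
    params = (INTERCEPT, COEF_AGE_BAND, COEF_CLAIMS)
    return connection.execute(sql, params).fetchall()


for customer_id, premium in score_in_database(conn):
    print(customer_id, premium)
```

The BI layer can then report `predicted_premium` like any other column. More complex models need database-side support, which is the "full DM suite" limitation noted on the slide.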
ANALYTICAL ARCHITECTURE #3: BI Scoring, Native Modeling
Diagram components: Extraction, Transformation & Loading; Star Schema; Staging; OLAP; Dashboards, reports, alerts, ad hoc...
- The BI platform does the scoring.
- Good integration of predictive metrics with BI metrics.
- Security.
- Distribution.
- Real-time scoring.
Limitations:
- Performance.
- Limited functionality.
ANALYTICAL ARCHITECTURE #4: BI Scoring, Data Mining Tool Modeling
Diagram components: Extraction, Transformation & Loading; Star Schema; Data Mining Tool; Staging; OLAP; PMML Model; Dashboards, reports, alerts, ad hoc...
- The BI platform does the scoring.
- The model is built in the DM tool and imported into the BI platform.
- Real-time scoring.
- Supports a wide selection of algorithms.
Limitations:
- Performance.
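To make the "modeled elsewhere, imported as PMML, scored by the consumer" idea concrete, here is a rough sketch that reads a hand-written PMML 4.1 linear regression and scores one record. The PMML fragment, field names, and coefficients are invented for illustration, and the functions `load_linear_model` and `score` are hypothetical; this is not how any particular BI platform implements its PMML import.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed PMML document (illustrative values only)
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age_band" optype="continuous" dataType="double"/>
    <DataField name="claims" optype="continuous" dataType="double"/>
    <DataField name="premium" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age_band"/>
      <MiningField name="claims"/>
      <MiningField name="premium" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="200.0">
      <NumericPredictor name="age_band" coefficient="15.0"/>
      <NumericPredictor name="claims" coefficient="40.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Extract the intercept and coefficients from a PMML RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Score one record; assumes every predictor has the default exponent of 1."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
print(score(model, {"age_band": 5, "claims": 1}))
```

Because the model travels as a document rather than as code, the scoring side only needs a PMML interpreter for the model types it supports.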
ANALYTICAL ARCHITECTURE
- Data Mining Tool Scoring
- Database Scoring
- BI Scoring
  - Native Modeling
  - Data Mining Tool Modeling
USE CASE Association Analysis or Market Basket Analysis
CLICKSTREAM DATA
- Can be generalized to transactions
- Applies to any product purchases in an enterprise
CLICKSTREAM DATA Creation of Association Rules
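Association rules are usually judged by support, confidence, and lift. The sketch below computes those three measures for a single candidate rule over a handful of made-up clickstream sessions. The page names and the function `rule_metrics` are hypothetical, and this is only the evaluation step, not a full Apriori or FP-Growth search.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)
    support = count_both / n                      # share of sessions with both
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical sessions: the set of site sections visited in each session
sessions = [
    {"news", "sports"},
    {"news", "sports", "finance"},
    {"news", "finance"},
    {"sports"},
    {"news", "sports"},
]

support, confidence, lift = rule_metrics(sessions, {"news"}, {"sports"})
print(support, confidence, lift)
```

A lift below 1 means the consequent is slightly less likely given the antecedent than overall, so a rule can have high confidence and still be uninteresting.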
DATA MINING USING BI SYSTEM Model Building in BI MicroStrategy Desktop > Data Mining Services
DATA MINING SERVICE MicroStrategy Desktop > Data Mining Services
MODEL DETAILS MicroStrategy Desktop > Data Mining Services
RESULTS MicroStrategy Desktop > Data Mining Services
PMML MicroStrategy Desktop > Data Mining Services
BI VS. DATA MINING THINKING
BI thinking:
- Number of customers lost last month
- Production downtime report
- ROI for marketing campaigns
- Yesterday's revenue
Data mining thinking:
- Who will most likely churn in the next 10 days
- What part of the process will fail, and how to mitigate it
- What action the prospect will take next
- Tomorrow's
Data Mining + Business Intelligence
RECOMMENDED READING
Advanced Reporting Guide: Enhancing Your Business Intelligence
OPEN SOURCE DATA MINING TOOLS
Data Mining + Business Intelligence Appendix
CLUSTERING
CLUSTERING Data Set
CLUSTERING k-means Clustering
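The slide figures for k-means did not survive extraction, so here is a small sketch of the standard assign-then-update loop on a made-up two-dimensional data set. The points and the functions `assign` and `k_means` are hypothetical; the first k points are used as starting centroids only to keep the run deterministic, whereas real tools typically use random or smarter initialization.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k, iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(points[:k])  # naive deterministic initialization
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical data set with two visually separate groups
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```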
DECISION TREES
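The decision tree figures are not recoverable from the extracted text. As a stand-in, the sketch below shows one common way a tree chooses its first split: pick the attribute with the highest information gain (entropy reduction). The slides do not say which split criterion they use, so this is an assumption; the churn records and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Return the attribute whose split yields the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical customer records with a known churn outcome
rows = [
    {"contract": "monthly", "support_calls": "high", "churn": "yes"},
    {"contract": "monthly", "support_calls": "low", "churn": "yes"},
    {"contract": "annual", "support_calls": "high", "churn": "no"},
    {"contract": "annual", "support_calls": "low", "churn": "no"},
    {"contract": "monthly", "support_calls": "high", "churn": "yes"},
    {"contract": "annual", "support_calls": "low", "churn": "no"},
]

print(best_split(rows, ["contract", "support_calls"], "churn"))
```

A full tree repeats this choice recursively on each resulting subset until the subsets are pure or a stopping rule applies.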