Data Mining Analytics for Business Intelligence and Decision Support

Chid Apte, PhD
Manager, Data Abstraction Research Group
IBM TJ Watson Research Center
apte@us.ibm.com
http://www.research.ibm.com/dar

Overview
Knowledge discovery and data mining (KDD) techniques are used for analyzing and discovering actionable insights from data.
The talk will:
- Provide technical descriptions of the core algorithms that comprise data mining analytics
- Describe some business application scenarios for KDD
- Discuss issues in business intelligence systems
- Map trends in this area

Background
- Widespread and explosive growth in the use and size of databases
- Traditional use: query-based report generation
- Size and volume raise new issues:
  - Will data help the business achieve an advantage?
  - Can data be used to model underlying processes and predict their behavior?
  - Can we understand the data?
- Providing capabilities to support exploration, summarization, and modeling of large databases is the goal of business intelligence systems

From Transactions to Data Warehouses
- Transactional databases: reliable and accurate data capture; logging, book-keeping
- Data warehousing: turning transactional data into a history repository
  - Can be queried for summaries and aggregate reports
  - First step in transforming transactional data (primary purpose: reliable storage) into data whose primary use is business intelligence
  - May require integration of multiple sources of data
  - Dealing with multiple formats; multiple database systems; integrating distributed databases; data cleaning; creating a unified logical view of underlying non-homogeneous data

On-Line Analytical Processing (OLAP)
- Supports query-driven exploration of the data warehouse
- Uses pre-computed aggregates along data dimensions
  - Deciding which aggregates to pre-compute, and how to derive or reliably estimate other aggregates from the pre-computed projections
- Extends the Structured Query Language (SQL) framework to accommodate queries that would otherwise have been computationally infeasible on a relational database management system

Beyond OLAP
- Supporting queries at a much more abstract level than SQL and OLAP
  - Computer-driven exploration of data, as opposed to human analyst-driven exploration
- Facilitating exploration of high-dimensional data
  - Providing solutions when the user cannot describe the goal in terms of a specific query, e.g. discovering fraudulent cases in credit card or telephone use
- Visualizing and understanding massive volumes of high-dimensional data
  - Data sets grow far faster than traditional human analyst techniques can cope with
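The OLAP point about deriving results from pre-computed aggregates can be sketched as follows. This is a minimal illustration, not a description of any particular OLAP product: the fact rows, the dimension names, and the helpers `precompute` and `roll_up` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
SALES = [
    ("east", "widget", "Q1", 100.0),
    ("east", "widget", "Q2", 150.0),
    ("east", "gadget", "Q1", 80.0),
    ("west", "widget", "Q1", 120.0),
    ("west", "gadget", "Q2", 60.0),
]


def precompute(rows, dims):
    """Sum the sales amount over the given dimension indexes (one scan of the rows)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[3]
    return dict(totals)


def roll_up(aggregate, keep):
    """Derive a coarser aggregate from a pre-computed one without rescanning the rows.

    `keep` lists the positions within the aggregate's key to retain.
    """
    totals = defaultdict(float)
    for key, value in aggregate.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


by_region_product = precompute(SALES, dims=(0, 1))  # stored in advance
by_region = roll_up(by_region_product, keep=(0,))   # derived on demand
print(by_region)
```

The design question raised on the slide is which of these aggregates to store: here only the (region, product) totals are materialized, and the per-region totals are derived from them.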

A Definition for Data Mining
Automated search procedures for discovering credible and actionable insights from large volumes of high-dimensional data
- Emphasis on symbolic learning and modeling methods (i.e. techniques that produce interpretable results)
- Data management methods
- Use of techniques from statistics, pattern recognition, and machine learning
  - Machine learning and statistical modeling are also heavily used in vision, speech recognition, image processing, handwriting recognition, natural language understanding, etc.
  - Issues of scalability and automated business intelligence solutions drive much of data mining and differentiate it from the other applications of machine learning and statistical modeling

Machine Learning and Statistical Modeling Serve as an Important Core
[Figure: machine learning and statistical modeling at the core, surrounded by application areas. Labels: Temporal Modeling; Complex Pattern Detection; Systems Performance Management; Data Management; Business Decision Support Systems; Knowledge Discovery and Data Mining; Feature Creation and Analysis; Markov Modeling; Speech Understanding; Handwriting Recognition; Computational Linguistics; Statistical Text Processing; Vision / Image; Knowledge Management; NLP/NLU; Others (Agents, Education, etc.)]

Typical Business Intelligence Applications
- Risk Analysis: given a set of current customers and their finance/insurance history data, build a predictive model that can be used to classify a new customer into a risk category
- Targeted Marketing: given a set of current customers and history on their purchases and their responses to promotions, target new promotions to those most likely to respond
- Customer Retention: given a set of past customers and their behavior prior to leaving, predict who is most likely to leave and take proactive action
- Fraud Detection: detect fraudulent activities either proactively or on-line in real time
- Many other new applications keep surfacing

There's More to It Than Just Data Mining
- The process of identifying valid, novel, potentially useful, and understandable patterns in data requires one or more of:
  - Selecting or sampling data from a data warehouse
  - Cleaning or pre-processing it
  - Transforming or reducing it
  - Applying a data mining component to extract models or patterns
  - Evaluating the derived structure
- The process is also known as KDD (knowledge discovery and data mining)
- Data mining is a key component, concerned with the algorithmic means by which structures are extracted from data while meeting computational efficiency constraints

The KDD Process
[Figure: Identify Business Opportunity; Select; Transform; Mine; Assimilate. Other labels: Data Warehouse; Selected Data; Visualization]
data mining n. The process of extracting valid, previously unknown, and ultimately comprehensible and actionable information from large databases and using it to make crucial business decisions.

Data Mining Techniques
- Predictive Modeling: predict a specific attribute (database field) based upon the other attributes (fields) in the data
- Clustering (Segmentation): group data records into subsets where items in a subset are more similar to each other than to items in other subsets
- Frequent Patterns: find interesting similarities between a few attributes in subsets of the data
- Change & Deviation: detect and account for interesting sequences of information in data records
- Dependencies: estimate the joint probability density function that might have generated the data

Predictive Modeling
- Estimate a function $f$ that maps points from an input space $X$ to an output space $Y$, given only a finite sampling of the mapping
  - Predict the value of a field ($Y$) in a database based on the other fields ($X$)
- Accurately construct an estimator $\hat{f}$ of $f$ from a finite sample known as the training set
  - The sample may be corrupted (i.e. noisy)
- If the predicted quantity is numeric (i.e. $Y = \mathbb{R}$, the real line), the prediction problem is that of regression modeling
- If the predicted quantity is discrete (i.e. $Y = \{c_1, \dots, c_k\}$), the prediction problem is that of classification modeling

Issues in Predictive Modeling
- Transformations on the input space $X$ to improve estimation capability
  - Feature extraction / construction / selection
- Evaluating the estimate $\hat{f}$ in terms of how well it performs on data not present in the training set
- Maximizing prediction accuracy by avoiding under-fitting or over-fitting
  - Trading off model complexity versus model accuracy
  - Bias-variance tradeoff, penalized likelihood, minimum message length (MML) or minimum description length (MDL)

Classification
- Predicting the most likely state of a categorical variable (the class) given the values of other variables
- Density estimation problem: deriving the value of $Y$ given $x \in X$ from the joint density on $Y$ and $X$
  - Kernel density estimators
  - Metric-space based methods (k-nearest neighbor)
- Projection into decision regions: divide the attribute space into decision regions and associate a prediction with each region
  - Linear classifiers, neural networks, decision trees, disjunctive normal form (DNF) rule-based classifiers
- Projection methods are by far the most practical for data mining
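As a small illustration of two points above, a metric-space classifier (k-nearest neighbor) and evaluation on data not present in the training set, here is a minimal sketch. The data points, the class labels "low" and "high", and the function names are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(train, test, k=3):
    """Fraction of held-out points whose predicted class matches the true class."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical training set and held-out test set: (attributes, class).
TRAIN = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
         ((6.0, 6.0), "high"), ((7.0, 5.5), "high"), ((6.5, 7.0), "high")]
TEST = [((1.2, 1.4), "low"), ((6.8, 6.1), "high")]

accuracy = holdout_accuracy(TRAIN, TEST, k=3)
print(accuracy)
```

Every prediction scans the whole training set, which is one reason the slide singles out projection methods as more practical at data mining scale.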

Regression
- Predicting the most likely value of a numerical variable (the target column) given the values of other variables
- Numerical function approximation problem: deriving the value of $Y$ given $x \in X$ from the joint probability distribution on $Y$ and $X$
  - Statistical probability models (e.g. linear regression)
- Projection into decision regions: divide the attribute space into decision regions and associate a constant value with each region
  - Neural networks, decision trees, disjunctive normal form (DNF) rule-based methods
- Hybrid: coupling projection methods with statistical models
- Projection and hybrid methods are by far the most practical for data mining

The Predictive Modeling Process
- Mine historical data to train patterns/models that can predict future behaviors
  - Behaviors: response to direct mail; product quality (defects); declining activity; credit risk; delinquency; likelihood to buy specific products; profitability; etc.
- Score with models to reflect the likelihood of exhibiting the modeled behavior
- Act to optimize business objectives based on these scores

Decision Trees for Predictive Modeling
Tree generation algorithm:
- Beginning with the training set at the root node, recursively split until a stopping criterion is met
- Split using the best test among all possible tests on all attributes
- Prune the tree (MDL, cross-validation, etc.)
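The tree generation algorithm above can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than any specific published algorithm: it handles numerical attributes only, scores tests with the Gini index, stops at pure nodes or a depth limit, and omits pruning. The attribute values and the "yes"/"no" labels are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, attribute index, cut-point) of the best test, or None."""
    best = None
    for a in range(len(rows[0])):
        values = sorted(set(r[a] for r in rows))
        for cut in values[:-1]:  # test "attribute <= cut"; both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[a] <= cut]
            right = [y for r, y in zip(rows, labels) if r[a] > cut]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, a, cut)
    return best


def build(rows, labels, max_depth=3):
    """Recursively split; a leaf is a class label, an inner node is a 4-tuple."""
    split = best_split(rows, labels) if max_depth > 0 and gini(labels) > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, a, cut = split
    left = [i for i, r in enumerate(rows) if r[a] <= cut]
    right = [i for i, r in enumerate(rows) if r[a] > cut]
    return (a, cut,
            build([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
            build([rows[i] for i in right], [labels[i] for i in right], max_depth - 1))


def predict(tree, row):
    """Follow the tests from the root down to a leaf."""
    while isinstance(tree, tuple):
        a, cut, left, right = tree
        tree = left if row[a] <= cut else right
    return tree


# Hypothetical records: (historical sales, credit limit) and a response label.
ROWS = [(3, 1500), (5, 2500), (6, 1800), (9, 2100),
        (12, 2600), (14, 1900), (18, 2300), (20, 3000)]
LABELS = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

tree = build(ROWS, LABELS)
print(tree, predict(tree, (7, 2000)))
```

The search in `best_split` is greedy: it picks the single best test at each node and never revisits the choice, which keeps training cheap but is not guaranteed to find the smallest tree.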

Issues in Decision Tree Building
- Splitting at nodes
  - Greedy search: GINI index or entropy (impurity minimizing), class probability profile difference, log-loss likelihood, etc.
  - Exhaustive search: ReliefF, Contextual Merit, etc.
- Testing of attributes
  - Numerical attributes: inequality tests on cut-points
  - Categorical attributes: subset tests
- Leaf models
  - Piecewise constant
  - Linear
  - Probability functions
- Scalability

A Typical Decision Tree
[Figure: decision tree for Expected Sales Revenue. Tests shown: Historical Sales per Mlg (less than 7; 7-15; greater than 15); Hist Avg Sales per Order (less than 113; greater or equal to 113); Credit Limit (less than 2200; greater or equal to 2200); Climate Indicator (0; 1); Risk Score (less than 687; greater or equal to 687); Outdoor Purchases (less than 3; greater or equal to 3). Leaves: Segment 1 through Segment 8]

Training a Predictive Model
[Figure: Observations; Predictive Model; Predicted Outcomes; Actual Outcomes; Prediction Errors]

Training and Generalization
[Figure: training versus validation fits. Too many segments: over-fit. Too few segments: under-fit. In between: about right.]
[Figure: prediction error plotted against number of rules (degrees of freedom), with curves for training error and validation error and the optimum rule set marked]

Clustering
- Given a finite sampling of points, group them into sets of similar points
- Representing clusters of points with common characteristics
- In predictive modeling, class (or value) membership is known in the training data
- In clustering, this knowledge is not known a priori, and is perhaps being discovered by clustering or segmentation
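The training-versus-validation trade-off shown in the figures above can be reproduced with a small experiment. This sketch makes several assumptions of its own: a one-dimensional piecewise-constant model whose complexity is the number of equal-width segments, synthetic data from a noisy step function, and a fixed random seed.

```python
import random


def segment_of(x, k):
    """Index of the equal-width segment of [0, 1) that contains x."""
    return min(int(x * k), k - 1)


def fit_segments(points, k):
    """Piecewise-constant model: the mean of y within each of k segments."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in points:
        s = segment_of(x, k)
        sums[s] += y
        counts[s] += 1
    overall = sum(y for _, y in points) / len(points)
    # Assumption: a segment with no training points predicts the overall mean.
    return [sums[s] / counts[s] if counts[s] else overall for s in range(k)]


def mse(model, points):
    """Mean squared prediction error of the model on the given points."""
    k = len(model)
    return sum((y - model[segment_of(x, k)]) ** 2 for x, y in points) / len(points)


rng = random.Random(0)


def sample(n):
    """Synthetic data: a step at x = 0.5 plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = (1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.3)
        data.append((x, y))
    return data


train, valid = sample(40), sample(40)
results = []
for k in (1, 2, 4, 8, 16, 32):  # nested partitions of increasing complexity
    model = fit_segments(train, k)
    results.append((k, mse(model, train), mse(model, valid)))

best_k = min(results, key=lambda r: r[2])[0]  # complexity with lowest validation error
for k, train_err, valid_err in results:
    print(k, round(train_err, 4), round(valid_err, 4))
print("chosen number of segments:", best_k)
```

Because each partition refines the previous one, training error can only go down as segments are added, so it cannot be used to choose the model size; the validation error is what identifies the point where extra segments stop helping.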

Techniques for Clustering/Segmentation
- Two-stage approach
  - Outer loop to determine the number of clusters $k$
  - Inner loop to fit points to clusters
- Metric distance-based methods: find the best $k$-way partition so that points in a partition are closer to each other than to points in other partitions
- Model-based methods: a best-fit (very typically probabilistic) model is hypothesized for each cluster
- Partition-based methods: iteratively enumerating and scoring various partition scenarios using heuristic scoring functions

k-means Clustering Algorithm
- Widely used in data mining
- Given $k$ cluster centers $c_{1,j}, c_{2,j}, \dots, c_{k,j}$ at iteration $j$, compute $c_{1,j+1}, c_{2,j+1}, \dots, c_{k,j+1}$
  - Cluster assignment: for each $i = 1, \dots, m$, assign $x_i$ to cluster $l(i)$ such that $c_{l(i),j}$ is nearest to $x_i$
  - Cluster center update: for $l = 1, \dots, k$, set $c_{l,j+1}$ to be the mean of all $x_i$ assigned to $c_{l,j}$
  - Stop when $c_{l,j} = c_{l,j+1}$ for $l = 1, \dots, k$
- Extensions include support for scalability, efficient placement of the initial $k$ means, and (a harder problem) determining the number of clusters $k$

Frequent Patterns
- Extracting compact patterns that describe subsets of data
  - Row-wise patterns
  - Column-wise patterns
- Association rules: detecting combinations of attribute values that occur with a minimum level of frequency (support) and certainty (confidence)
  - Scalable algorithms can find all such rules in linear time under certain conditions of data sparseness
  - Rules are not statements about causal effects amongst attributes, but can still provide useful insights
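The k-means iteration described above (cluster assignment, center update, stop when the centers no longer change) can be written directly. This is a minimal sketch: the points and initial centers are invented, the iteration cap is an added safeguard, and keeping the old center for an empty cluster is an assumption the slide does not address.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Alternate cluster assignment and center update until the centers stop moving."""
    for _ in range(max_iter):
        # Cluster assignment: index of the nearest center for every point.
        assign = [min(range(len(centers)), key=lambda l: math.dist(x, centers[l]))
                  for x in points]
        # Cluster center update: mean of the points assigned to each cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [x for x, a in zip(points, assign) if a == l]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
        if new_centers == centers:  # stopping rule: no center changed
            return centers, assign
        centers = new_centers
    return centers, assign


# Hypothetical data: two well-separated groups and two initial centers.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, assign = kmeans(POINTS, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, assign)
```

The result depends on the initial centers, which is why the slide lists placement of the initial means, and the choice of k itself, as open extensions.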

Change and Deviation
- Detecting sequence information, temporal or otherwise
- Ordering information of transactions (rows) is utilized
- Under certain conditions of data sparseness, sequences with desired levels of frequency and certainty can be computed in linear time

Dependency Modeling
- Detecting causal structure within data
  - Causal models
  - Discovering probabilistic distributions governing the data
  - Discovering functional dependencies between attributes in the data
- Techniques
  - Density estimation methods
  - Expectation maximization
  - Explicit causal modeling
  - Bayesian networks

Applying Data Mining in Business
Goals: profit, customer satisfaction, efficiency
- Who are the best customers to sell my products to?
- What are the most effective market segments for my business?
- How do I increase market share of my products?
- How do I reduce my costs and not impact production?
- How do I optimize my inventory?

Mapping Operations into Applications
- Predictive Modeling: assigning risk levels to new insurance and financial contracts
- Clustering / Segmentation: identifying distinct market groups in a customer population
- Frequent Patterns: market basket analysis (what gets shopped together in a supermarket)
- Change and Deviation: fraud discovery in health claim data; discovering shopping patterns over time

Business Application Opportunities
- Retail/Distribution: category management; merchandise planning; product management; production planning/tracking
- Insurance/Healthcare: claims analysis; provider analysis; managed care; outcomes analysis
- Manufacturing: product costing; manufacturing quality and efficiency; parts analysis
- Utilities: industrial customer profiles; financial analysis; "bulk power" analysis
- Government: budgeting; financial reporting; demographics
- Telecommunications: customer profiles/segmentation; product profitability; demand forecasting; usage analysis
- Cross Industry: market basket analysis; target marketing; customer segmentation; customer service; fraud and abuse; financial performance
- Transportation: yield management; pricing/rate analysis; logistics
- Financial: customer profitability and segmentation; products/portfolio profitability; risk management; cross-sale analysis; branch performance

The Challenge
- Questions: Where is it? What does it mean? How can I get it? What format is it in?
- Obstacles: no single view of data; many interfaces; difficult to access; multiple sources; different formats and platforms; inconsistencies and redundancies
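Market basket analysis rests on the two association-rule measures introduced earlier, support and confidence. The sketch below computes them by brute force for rules between pairs of items; the baskets and thresholds are invented, and this is not one of the scalable algorithms the talk refers to.

```python
from itertools import combinations

# Hypothetical supermarket baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets)


def pair_rules(baskets, min_support, min_confidence):
    """Rules lhs -> rhs between single items that meet both thresholds.

    support    = fraction of baskets containing both items
    confidence = fraction of baskets containing lhs that also contain rhs
    """
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b}, baskets)
        support = both / len(baskets)
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count({lhs}, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


rules = pair_rules(BASKETS, min_support=0.5, min_confidence=0.8)
print(rules)
```

In this toy data every basket with beer also has diapers, so the rule has confidence 1.0; as the earlier slide cautions, that is a statement about co-occurrence, not about cause.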

An Efficient Environment for Data Mining
[Figure labels: Enterprise Data Model; Operational Data; Enterprise Data Warehouse; Global Data Mart; Specialized Analysis Data Marts; External Data; Data Mining Analysis]
- Transaction transformation (legacy system context removal)
- Meta-data definition (e.g. consistent business terms)
- Analytical dimension definition (e.g. time, policyholder)
- Summarization
- Aggregation
- Identification / collection
- Review
- Conversion / reduction / normalization
- Representation

The Business Intelligence Process
[Figure labels: Access; Transform; Distribute; Store; Find & Understand; Operational & External Data; Enhancing; Staging; Relational Data; Summarizing; Aggregating; Data Flow & Process Flow; Joining From Multiple Sources; Populating On-Demand; Automate & Manage; Multiple Platforms & Hardware; Information Catalog; Business Views; Models; Discover, Analyze, Visualize; Query; Interpretation; Multi-Dimensional Analysis; Data Mining; Multi-Vendor Support; Open Interfaces]

Data Mining Marketplace Status
- Enabler for business intelligence systems
  - Data mining algorithm suites
  - Loosely coupled with database technology
  - Emphasis on data warehousing followed by exploratory data mining
  - Typically conducted by consultants or in-house analytic teams
- Issues
  - Data warehouse requirements
  - Sophisticated analytics requirements

Key Challenges and Trends
- Infrastructure: enabling transparent and pervasive usage
- Algorithms: optimized and robust mining
- Solutions: vertically integrated for critical problems
  - Enhanced emphasis on the Internet

Infrastructure: Making Data Mining Transparent
- Database extenders
  - e.g. DB2/UDB User Defined Functions for model training/scoring
  - Sufficient statistics (e.g. histograms, counts, samples, etc.)
- Parallel and distributed data mining
  - Scalability (sampling and parallelization)
- XML-based APIs for database coupling and application embedding
- Interoperability
  - Training and scoring in different environments
- Intelligent or semi-automated data warehousing for mining
  - Industry-specific templates
  - Meta-data mining

Algorithms: Robust and Automated
- Evaluation metrics
- Automated feature extraction / transformation / selection
- Discovering relational and hierarchical structures amongst attributes
- Incorporating prior knowledge to account for costs / benefits / uncertainty / missing values
- Incremental and on-line mining
- Privacy-preserving data mining
- Heterogeneous data mining

Solutions
- Business
  - Risk management
  - Targeted marketing
  - Portfolio management
- Systems
  - Performance management
- Internet
  - Site profiling and performance tuning
  - User personalization

Summary
- Data mining is being embedded in vertical solutions for business intelligence and decision support
  - Data management
  - e-commerce
  - Critical large-scale solutions (CRM, etc.)
- Using data mining should eventually become as easy and pervasive as working with databases and spreadsheets is today

References
- Bradley et al., "Mathematical Programming for Data Mining: Formulations and Challenges", INFORMS Journal on Computing, Volume 11, No. 3, Summer 1999
- KDD Nuggets: http://www.kdnuggets.com
- IBM: http://www.research.ibm.com/compsci/kdd, http://www.research.ibm.com/dar, http://www.ibm.com/bi