# The Predictive Data Mining Revolution in Scorecards:

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms that optimize model accuracy, have revolutionized numerous industries, from manufacturing through marketing to sales. However, credit risk scores in the financial industries are still built using traditional statistical models to derive score-cards where additive factors such as specific age, income, etc., make up the final credit score, and determine the final decision of credit worthiness. This paper describes successful approaches of how modern predictive modeling algorithms can be used to build better credit risk models and scorecards without sacrificing some of the critical benefits from applying traditional scorecards such as reason scores (for denial of credit). How Traditional Scorecards Are Built The process of building a credit scorecard usually follows three steps: Data preparation and predictor coding Model building using logistic regression, and model validation Final deployment These steps are briefly described later. They are also described in some detail at the StatSoft Electronic Statistics Textbook, at A Common Scorecard A final scorecard may have the following structure and look like this:

2 Page 2 of 8 The most important columns in this spreadsheet are the first, second, and last ones. The first column identifies each predictor variable (e.g., Value of Savings). The second column shows the specific interval boundaries or (combined) categories for categorical predictors. For example, Value of Savings < 140, Value of Savings between 140 and 700, and so on. The last column shows the credit score factor to be used for applicants that fall into the respective interval. For example, for an applicant with a Value of Savings between 140 and 700, a score value of 95 is added to the total credit score. If the Amount of credit is between 6,600 and 10,000, a score of 86 is added, and so on. For each credit applicant, a total sum credit score can be computed based on the respective categories that person falls into; critical cutoff values are then used to determine the final credit decision. The analytic workflow to get to the final scorecard follows a logical procedure to generate a table such as that shown above. Data Preparation, Predictor Coding After available data have been retrieved and cleaned (e.g., bad codes have been removed, missing data have been replaced, duplicates were deleted, etc.), the first step is to recode the value ranges for the continuous predictor variables such as age, average account balance, and so on into value ranges.

3 Page 3 of 8 The STATISTICA solution platform provides different methods to accomplish this coding, ranging from manual/visual methods (graphical methods) to fully automated optimal binning to maximize the information of the resulting binned variable for modeling credit default probability. Model Building, Logistic Regression The next step is to build a linear model that relates the coded categories to credit default risk. The statistical method that is used for this purpose is Logistic Linear Regression. The details of this method are described in the StatSoft Electronic Statistics Textbook (

4 Page 4 of 8 In short, a simple best linear equation is estimated and scaled so that, for example, the odds of credit default double for every 10 points' decrease in the resulting predicted score. Final Deployment The final deployment of a scorecard can be accomplished manually by adding the scores for the different categories for each variable describing a credit applicant. Typically, in larger banks the scoring is, of course, done automatically through systems such as STATISTICA Enterprise Server or STATISTICA Live Score for real-time scoring. Also, along with the final credit score, typically a final credit decision is returned by comparing the credit score to appropriately chosen cutoffs. Further, for rejected credit applications, the reasons for the rejection (e.g., those predictors whose respective scores were below the norm or average) can be provided to the applicant.

5 Page 5 of 8 Advantages of Common Scorecards The types of credit scorecards briefly described here have several convenient properties that explain their popularity. Final credit scores are the simple sum of individual factors determined by classifying applicants into specific bins. It is easy to see what specific factors were most beneficial for a favorable credit decision or detrimental and leading to a rejection of a credit application. In addition, because the basic statistical model is linear, it is easy to understand the inner workings or behavior of the credit risk model, in the sense that in most cases simple summaries can be derived such as the higher the education of an applicant, the lower the credit default risk and so on. Predictive Modeling Approaches So why are some progressive financial institutions and banks using other, perhaps more complex, approaches to credit scoring? The answer is simple: predictive modeling methods often result in more accurate predictions of risk. With more accurate predictions of risk, more credit can be extended to more applicants, while reducing or not increasing the overall default rate. Thus, the accuracy of the model to predict credit default has a direct impact on profits. Accuracy of Scorecards, and Profitability Shown below is a summary evaluation of different predictive modeling scorecards compared to the existing traditional scorecard for a large StatSoft customer.

6 Page 6 of 8 This graph (based on actual data) shows the enormous impact that a high-quality, accurate model of credit risk can have on the bottom line: Given the Baseline model, approving approximately 55% of applications for credit resulted in a little above 10% of eventual credit defaults. In this graph, the profitability of that status quo model and solution is pegged at 100% profit. Moving to the right, different cut-off values for credit approval are shown for a powerful ensemble predictive model built via STATISTICA. Depending on where the credit dis/approval threshold is set, given a predicted probability of default, more credit can be approved (e.g., for almost 85% of all applicants in the High Approval Rate model), while either decreasing or maintaining a constant default rate. The effect on the profitability is dramatic: over 80% greater profit resulted for all cutoff values considered. In short, and as expected, the more accurately a creditor can predict the risk of credit default, the more selective that creditor can be to extend credit only to those who will pay back the money with all interest as agreed. How Should Accurate Risk Models Be Built? In short, the answer is: by using algorithms that can find and approximate any systematic relationships between predictor variables and credit default risk, regardless if those relations are nonlinear, highly interactive (e.g., different models should be applied to different age groups), etc. Such algorithms are called general approximators because they can approximate any relationship regardless of how complex it might be and leverage those relationships to make more accurate predictions. Ensemble Models, Voting Models The algorithms that have proven to be very successful and that are refined in the STATISTICA platform are those applying voting (averaging) of predictions from different tree- models, or by boosting those models. Boosting is the process of applying a learning algorithm repeatedly to the poorly predicted observations in a training sample to represent correctly all homogeneous subgroups, groups of different and difficult-to-predict applicants, outliers, etc. For example, in a recent international competition based on a credit risk scoring problem (the international 2010 PAKDD Data Mining Competition), the winning (most accurate) solution was computed using the STATISTICA Boosting Trees (stochastic gradient boosting) algorithm. The paper describing the winning solution is available from StatSoft. The details of effective predictive modeling algorithms are often complex. However, efficient implementations of these algorithms on capable hardware, such as the implementations through the STATISTICA Credit Risk solutions, will solve even difficult problems against large databases in very little time.

7 Page 7 of 8 The Dreaded Black-Box and How to Open It There is little controversy around the notion that predictive modeling techniques as implemented in the STATISTICA solutions will outperform the accuracy of statistical modeling approaches in most cases. However, a common criticism that is raised in this context is about the concern that these models are black-boxes, making it nearly impossible to provide insights into the reasons for specific predictions, or the general mechanisms and relationships that drive the key outcomes such as credit default. On the surface, these concerns are valid and perhaps sufficiently important to continue with the status quo and common scorecard modeling approaches. However, there are good solutions to address this issue of lack-of-interpretability and reason scores, and all of them are easily implemented through the STATISTICA solutions. What-if (scenario) analysis. In most general terms, for every prediction that is made through a black-box predictive model, the computer can run comprehensive what-if analyses to evaluate what the credit default probability prediction would have been, if for each predictor different values had been observed. So for example, if an applicant for credit is 18 years old and credit is denied, the computer can quickly evaluate what the credit decision would have been if the applicant had been 20 years old or 30 years old. Applying this basic approach systematically to every prediction or group of predictions will provide detailed insights into the inner workings of even the most complex models, and into the specific predictors and their interactions that are responsible for a specific credit decision. Two-stage modeling. There are other and perhaps simpler approaches that can also be used to arrive at interpretable models and predictions. One can use the predictions of the most accurate black-box model as a starting point, and drill down into the model to identify the specific interactions and predictor bins that were important and drove the risk prediction (for example, using what-if scenarios or the various standard results around predictor importance available through STATISTICA). With that knowledge, better and more comprehensive common scorecards can then be built that include the relevant predictor interactions, pre-scoring segmentation of homogeneous groups of applicants, specific binning solutions, etc., identified through the first predictive modeling step. Deploying Predictive Models Risk models derived via predictive modeling tools are best deployed in an automated computer scoring solution such as STATISTICA Enterprise Server or STATISTICA Live Score. In fact, the STATISTICA Enterprise Decisioning Platform will manage the entire modeling lifecycle from model development to single-click deployment for real-time or batch scoring with version control and audit logs for regulatory compliance.

8 Page 8 of 8 Conclusion Risk modeling is an area where the application of advanced predictive modeling tools implemented in the STATISTICA solution platforms can produce significant and rapid return on investment. The STATISTICA platform includes algorithms such as boosted trees, tree-nets (or random forests), and automatic neural network ensembles, to mention only a few of the very powerful and proven-to-be-effective implementations available. When combined with the ease and comprehensiveness of features for model lifecycle management and integration with business rules and logic available in the STATISTICA Decisioning Platform, there is little reason not to adopt these methods to realize the revenue associated with more accurate risk prediction and to become more competitive in the complex financial services domain. Further Reading: Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, Giovanni, S., Elder, J. (2010). Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Synthesis Lectures on Data Min. Harańczyk, G. (2010). Predictive model optimization by searching parameter space. Winning entry to the PAKDD 2010 Credit Risk Modeling Competition. StatSoft Inc. Hill, T. & Lewicki, P. (2006). Statistics: Methods and Applications. StatSoft. (also available in electronic form at: Nisbet, R., Elder, J., Miner, G. (2009). Handbook of Statistical Analysis and Data Mining Applications. Elsevier. The Wall Street Journal Market Watch (2012). Danske Bank Implemented STATISTICA to Maximize Efficiency and Minimize Operational Risk. For More Information: StatSoft, Inc East 14 th Street Tulsa, OK USA StatSoft Copyright 2013

### Benchmarking of different classes of models used for credit scoring

Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

June 2010 From: StatSoft Analytics White Papers To: Internal release Re: Performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and

### Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends Spring 2015 Thomas Hill, Ph.D. VP Analytic Solutions Dell Statistica Overview and Agenda Dell Software overview Dell in

### Software Course and the Case Practice Introduction of Credit Risk Data

Software Course and the Case Practice Introduction of Credit Risk Data Cheyu HUNG / 洪哲裕 StatSoft Holdings, Inc., Taiwan Branch November 27, 2013 Making the World More Productive Headquarters: StatSoft,

### STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II)

STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II) With the New Basel Capital Accord of 2001 (BASEL II) the banking industry

### 2015 Workshops for Professors

SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

### Using Data Mining Methods to Optimize Boiler Performance: Successes and Lessons Learned

Using Data Mining Methods to Optimize Boiler Performance: Successes and Lessons Learned Thomas Hill, Ph.D. StatSoft Power Solutions StatSoft Inc. www.statsoft.com www.statsoftpower.com Optimize combustion,

### Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

### Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

### WHITEPAPER. How to Credit Score with Predictive Analytics

WHITEPAPER How to Credit Score with Predictive Analytics Managing Credit Risk Credit scoring and automated rule-based decisioning are the most important tools used by financial services and credit lending

### Some vendors have a big presence in a particular industry; some are geared toward data scientists, others toward business users.

Bonus Chapter Ten Major Predictive Analytics Vendors In This Chapter Angoss FICO IBM RapidMiner Revolution Analytics Salford Systems SAP SAS StatSoft, Inc. TIBCO This chapter highlights ten of the major

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

### EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

### Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

### About Dell Statistica 12.6... 2

Complete Product Name with Trademarks Version Dell TM Statistica TM 12.6 Contents Dell TM Statistica TM... 1 About Dell Statistica 12.6... 2 New Features... 2 Workspace Enhancements: Statistica Enterprise

### Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America

Application of SAS! Enterprise Miner in Credit Risk Analytics Presented by Minakshi Srivastava, VP, Bank of America 1 Table of Contents Credit Risk Analytics Overview Journey from DATA to DECISIONS Exploratory

### not possible or was possible at a high cost for collecting the data.

Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

### ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

### TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP

TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

### Weight of Evidence Module

Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

### Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable

Página 1 de 6 Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable STATISTICA Data Miner includes a complete deployment engine with various options for deploying solutions

### Business Analytics and Credit Scoring

Study Unit 5 Business Analytics and Credit Scoring ANL 309 Business Analytics Applications Introduction Process of credit scoring The role of business analytics in credit scoring Methods of logistic regression

### The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner

Paper 3361-2015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,

### White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

White Paper Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices. Contents Data Management: Why It s So Essential... 1 The Basics of Data Preparation... 1 1: Simplify Access

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### Leveraging Ensemble Models in SAS Enterprise Miner

ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

### Using Adaptive Random Trees (ART) for optimal scorecard segmentation

A FAIR ISAAC WHITE PAPER Using Adaptive Random Trees (ART) for optimal scorecard segmentation By Chris Ralph Analytic Science Director April 2006 Summary Segmented systems of models are widely recognized

Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Easily Identify Your Best Customers

IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

### Contents. Overview. The solid foundation for your entire, enterprise-wide business intelligence system

Data Warehouse The solid foundation for your entire, enterprise-wide business intelligence system The core of the high-performance intelligence delivery infrastructure, designed to meet even the most demanding

### WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

### Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

### Better planning and forecasting with IBM Predictive Analytics

IBM Software Business Analytics SPSS Predictive Analytics Better planning and forecasting with IBM Predictive Analytics Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and

### Select the right configuration management database to establish a platform for effective service management.

Service management solutions Buyer s guide: purchasing criteria Select the right configuration management database to establish a platform for effective service management. All business activities rely

### How to Deploy Models using Statistica SVB Nodes

How to Deploy Models using Statistica SVB Nodes Abstract Dell Statistica is an analytics software package that offers data preparation, statistics, data mining and predictive analytics, machine learning,

### KnowledgeSEEKER POWERFUL SEGMENTATION, STRATEGY DESIGN AND VISUALIZATION SOFTWARE

POWERFUL SEGMENTATION, STRATEGY DESIGN AND VISUALIZATION SOFTWARE Most Effective Modeling Application Designed to Address Business Challenges Applying a predictive strategy to reach a desired business

### CROSS INDUSTRY PegaRULES Process Commander. Bringing Insight and Streamlining Change with the PegaRULES Process Simulator

CROSS INDUSTRY PegaRULES Process Commander Bringing Insight and Streamlining Change with the PegaRULES Process Simulator Executive Summary All enterprises aim to increase revenues and drive down costs.

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### INFORMATION MANAGED. Project Management You Can Build On. Primavera Solutions for Engineering and Construction

INFORMATION MANAGED Project Management You Can Build On Primavera Solutions for Engineering and Construction Improve Project Performance, Profitability, and Your Bottom Line Demanding owners, ineffective

### Overview, Goals, & Introductions

Improving the Retail Experience with Predictive Analytics www.spss.com/perspectives Overview, Goals, & Introductions Goal: To present the Retail Business Maturity Model Equip you with a plan of attack

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

### Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

### Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT

Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### Oracle Real Time Decisions

A Product Review James Taylor CEO CONTENTS Introducing Decision Management Systems Oracle Real Time Decisions Product Architecture Key Features Availability Conclusion Oracle Real Time Decisions (RTD)

### INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. AGENDA Overview/Introduction to Data Mining

### Tax Fraud in Increasing

Preventing Fraud with Through Analytics Satya Bhamidipati Data Scientist Business Analytics Product Group Copyright 2014 Oracle and/or its affiliates. All rights reserved. 2 Tax Fraud in Increasing 27%

### White Paper. Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics

White Paper Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics Contents Self-service data discovery and interactive predictive analytics... 1 What does

### CHAPTER 3 DATA MINING AND CLUSTERING

CHAPTER 3 DATA MINING AND CLUSTERING 3.1 Introduction Nowadays, large quantities of data are being accumulated. The amount of data collected is said to be almost doubled every 9 months. Seeking knowledge

### In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

### A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes

A Cloud Based Solution with IT Convergence for Eliminating Manufacturing Wastes Ravi Anand', Subramaniam Ganesan', and Vijayan Sugumaran 2 ' 3 1 Department of Electrical and Computer Engineering, Oakland

### Data Mining Applications in Higher Education

Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

### What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling)

data analysis data mining quality control web-based analytics What is Data Mining, and How is it Useful for Power Plant Optimization? (and How is it Different from DOE, CFD, Statistical Modeling) StatSoft

### Tree Ensembles: The Power of Post- Processing. December 2012 Dan Steinberg Mikhail Golovnya Salford Systems

Tree Ensembles: The Power of Post- Processing December 2012 Dan Steinberg Mikhail Golovnya Salford Systems Course Outline Salford Systems quick overview Treenet an ensemble of boosted trees GPS modern

### Harnessing the power of advanced analytics with IBM Netezza

IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### Data Mining. SPSS Clementine 12.0. 1. Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Data Mining SPSS 12.0 1. Overview Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Types of Models Interface Projects References Outline Introduction Introduction Three of the common data mining

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### IBM Tivoli Netcool network management solutions for enterprise

IBM Netcool network management solutions for enterprise The big picture view that focuses on optimizing complex enterprise environments Highlights Enhance network functions in support of business goals

### Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

### The Power of Business Intelligence in the Revenue Cycle

The Power of Business Intelligence in the Revenue Cycle Increasing Cash Flow with Actionable Information John Garcia August 4, 2011 Table of Contents Revenue Cycle Challenges... 3 The Goal of Business

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### Auto Days 2011 Predictive Analytics in Auto Finance

Auto Days 2011 Predictive Analytics in Auto Finance Vick Panwar SAS Risk Practice Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Introduction Changing Risk Landscape - Key Drivers and Challenges

### DISCOVER MERCHANT PREDICTOR MODEL

DISCOVER MERCHANT PREDICTOR MODEL A Proactive Approach to Merchant Retention Welcome to Different. A High-Level View of Merchant Attrition It s a well-known axiom of business that it costs a lot more to

### Chapter 7: Data Mining

Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain

### The Total Economic Impact Of SAS Customer Intelligence Solutions Real-Time Decision Manager

A Forrester Total Economic Impact Study Commissioned By SAS Project Director: Dean Davison May 2014 The Total Economic Impact Of SAS Customer Intelligence Solutions Real-Time Decision Manager Table Of

### APPROACHABLE ANALYTICS MAKING SENSE OF DATA

APPROACHABLE ANALYTICS MAKING SENSE OF DATA AGENDA SAS DELIVERS PROVEN SOLUTIONS THAT DRIVE INNOVATION AND IMPROVE PERFORMANCE. About SAS SAS Business Analytics Framework Approachable Analytics SAS for

### How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

### From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

100 001 010 111 From Raw Data to 10011100 Actionable Insights with 00100111 MATLAB Analytics 01011100 11100001 1 Access and Explore Data For scientists the problem is not a lack of available but a deluge.

### High-Performance Analytics

High-Performance Analytics David Pope January 2012 Principal Solutions Architect High Performance Analytics Practice Saturday, April 21, 2012 Agenda Who Is SAS / SAS Technology Evolution Current Trends

### An Introduction to SAS Enterprise Miner and SAS Forecast Server. André de Waal, Ph.D. Analytical Consultant

SAS Analytics Day An Introduction to SAS Enterprise Miner and SAS Forecast Server André de Waal, Ph.D. Analytical Consultant Agenda 1. Introduction to SAS Enterprise Miner 2. Basics 3. Enterprise Miner

### Predictive Analytics Certificate Program

Information Technologies Programs Predictive Analytics Certificate Program Accelerate Your Career Offered in partnership with: University of California, Irvine Extension s professional certificate and

### Manage student performance in real time

Manage student performance in real time Predict better academic outcomes with IBM Predictive Analytics for Student Performance Highlights Primary and secondary school districts nationwide are looking for

### Working with telecommunications

Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature

### Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

### KNIME UGM 2014 Partner Session

KNIME UGM 2014 Partner Session DYMATRIX Stefan Weingaertner DYMATRIX CONSULTING GROUP 1 Agenda 1 Company Introduction 2 DYMATRIX Customer Intelligence Offering 3 PMML2SQL / PMML2SAS Converter 4 Uplift

### ORACLE S PRIMAVERA FEATURES PORTFOLIO MANAGEMENT. Delivers value through a strategy-first approach to selecting the optimum set of investments

ORACLE S PRIMAVERA FEATURES Delivers value through a strategy-first approach to selecting the optimum set of investments Leverages consistent evaluation metrics, user-friendly forms, one click access to

### SAS. Predictive Analytics. Overview. Turning Your Data into Timely Insight for Better, Faster Decision Making. Challenges SOLUTION OVERVIEW

SOLUTION OVERVIEW SAS Predictive Analytics Turning Your Data into Timely Insight for Better, Faster Decision Making Overview Is your organization overflowing with enterprise data but failing to turn it

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

### Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

### Hurwitz ValuePoint: Predixion

Predixion VICTORY INDEX CHALLENGER Marcia Kaufman COO and Principal Analyst Daniel Kirsch Principal Analyst The Hurwitz Victory Index Report Predixion is one of 10 advanced analytics vendors included in

### TDWI Best Practice BI & DW Predictive Analytics & Data Mining

TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost

### Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

### Enterprise Risk Management

Enterprise Risk Management Enterprise Risk Management Understand and manage your enterprise risk to strike the optimal dynamic balance between minimizing exposures and maximizing opportunities. Today s

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### WHITE PAPER. Payment Integrity Trends: What s A Code Worth. A White Paper by Equian

WHITE PAPER Payment Integrity Trends: What s A Code Worth A White Paper by Equian June 2014 To install or not install a pre-payment code edit, that is the question. Not all standard coding rules and edits

### Afni deploys predictive analytics to drive milliondollar financial benefits

Afni deploys predictive analytics to drive milliondollar financial benefits Using a smarter approach to debt recovery to identify the best payers and focus collection efforts Overview The need Afni wanted

### High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve

### BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

### Data Mining: STATISTICA

Data Mining: STATISTICA Outline Prepare the data Classification and regression 1 Prepare the Data Statistica can read from Excel,.txt and many other types of files Compared with WEKA, Statistica is much