How To Make A Credit Risk Model For A Bank Account
- Madlyn Webb
- 3 years ago
1 TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző 15 October 2015
3 CONTENTS
- Introduction (slide 4)
- Random Forest Methodology (slide 6)
- Transactional Data Mining Project (slide 17)
- Conclusions (slide 25)
- Q&A (slide 29)
4 INTRODUCTION
5 INTRODUCTION TEAM STRUCTURE
[Organisation chart, top to bottom]
- CEO: Antonio Horta-Osorio
- Retail Division, Group Operations, Commercial Banking, Group Finance, Group Risk, Insurance, Customer Products & Marketing, Group Digital...
- Group Financial Risk; Retail and Consumer Credit Risk
- Analytics & Modelling; Customer Analytics and Decisions
Analytics & Modelling
Has two group-level responsibilities: Model Validation, and Analytics & Model Development. We are doing the latter, with 3 focuses:
1) Build a centre of excellence for all analytics, linking up analytics teams across LBG, sharing best practices and knowledge around data and modelling techniques;
2) Act as a link to external entities, e.g. universities, bureaus, and analytical software providers, to keep on top of the latest research;
3) Conduct proof-of-concept projects for new data and analytics solutions;
overall, to simplify and develop LBG's analytical capacity to be the best bank for customers.
Customer Analytics and Decisions
Responsible for the development and maintenance of the Retail and Consumer Finance risk and capital models, to support lending decisions in line with our risk appetite and capital management strategy.
6 RANDOM FOREST METHODOLOGY
7 TRANSACTIONAL DATA MINING PROJECT OBJECTIVE
Problem statement
- Most LBG credit risk models are built on traditional credit databases and techniques, such as account and customer characteristics and Logistic Regression based scorecards
- The latest technical advancements in computation and Machine Learning are neither used in most credit models nor considered or evaluated
Objective: a proof-of-concept project, leveraging learnings from a Random Forest based Fraud Model, to assess the potential in
1. Transactional data to enrich credit modelling datasets
2. Random Forests over Logistic Regression for estimating credit risk
8 RANDOM FORESTS OVERVIEW
Methodology cornerstones
- Numerous iterations of a Decision Tree build
- Each Decision Tree is different (trained on different subsets of the data)
- Each Decision Tree can be unstable in itself; still, the Forest formed is proven to be stable and was found to be one of the most accurate methods for prediction
- Hundreds of decision trees: a forest
[Diagram: an example decision tree with default rates per node. Training Sample: 50%; Split 2a: 90%; Split 2b: 20%; Split 3a: 10%; Split 3b: 40%. Below it, six bootstrap samples each grow their own tree (Split 2a / Split 2b). Average of votes: probability of future default.]
9 DECISION TREES OVERVIEW
[Diagram: a Population node divided into Split 2a and Split 2b, then Split 3a and Split 3b; each node is labelled with P(G), P(B) and GI.]
- Aim is to decrease impurity at each split
- One measure is the Gini impurity criterion at each node: $G = 1 - P(G)^2 - P(B)^2$
- The decrease in Gini impurity shows how important a characteristic split is
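The Gini impurity criterion above can be illustrated with a short sketch. It assumes P(G) and P(B) are the shares of good and bad (defaulting) accounts in a node, and that the decrease is measured against the size-weighted impurity of the child nodes, which is the usual convention but is not spelled out on the slide. The function names and the account counts are hypothetical.

```python
def gini_impurity(n_good, n_bad):
    """Gini impurity of a node: G = 1 - P(G)^2 - P(B)^2."""
    total = n_good + n_bad
    if total == 0:
        return 0.0
    p_good = n_good / total
    p_bad = n_bad / total
    return 1.0 - p_good ** 2 - p_bad ** 2


def gini_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the children.

    parent and each child are (n_good, n_bad) tuples.
    """
    n_parent = sum(parent)
    weighted = sum(sum(c) / n_parent * gini_impurity(*c) for c in children)
    return gini_impurity(*parent) - weighted


# Made-up counts: a 50% default-rate node split into 90% and 20% children.
parent = (350, 350)
child_a = (30, 270)    # 90% default rate
child_b = (320, 80)    # 20% default rate

print(gini_impurity(*parent))                                # 0.5
print(round(gini_decrease(parent, [child_a, child_b]), 4))   # 0.24
```

A larger decrease means the split separates goods from bads more cleanly.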
10 DECISION TREES METHODOLOGY
Building Methodology
1. Splitting the population until stop criteria are met:
   - Too few observations in a tree node to split
   - Perfectly pure node
   - The split doesn't improve purity
   - Reached a maximum tree depth / complexity (pre-set)
2. Evaluating the tree on an independent validation data set (not the same as test data or hold-out sample)
3. Prune back the tree until performance is optimal on the independent validation set
4. Assign an outcome probability to each leaf node as per the occurrence of the outcome in that leaf node in the training data: the score
Scoring Methodology
1. Each new observation will fall into one of the leaf nodes, where it will get the score that was assigned to that leaf at training (outcome probability)
Split Search
- Data driven and optimized
- Considers all characteristics and observations in the dataset at each split
- Considers fundamentally all possible splits of each characteristic
- Selects the best local candidate at each splitting
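The split search described above can be sketched as an exhaustive scan. This is a minimal illustration under stated assumptions, not the presenters' implementation: characteristics are numeric, candidate thresholds are midpoints between consecutive distinct values, and purity is measured with Gini impurity. The function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = bad, 0 = good)."""
    if not labels:
        return 0.0
    p_bad = sum(labels) / len(labels)
    return 1.0 - p_bad ** 2 - (1.0 - p_bad) ** 2


def best_split(rows, labels):
    """Try every characteristic and every threshold; keep the best local candidate.

    Returns (gini_decrease, characteristic_index, threshold), or None if no
    characteristic has two distinct values to split on.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
            decrease = parent - weighted
            if best is None or decrease > best[0]:
                best = (decrease, j, threshold)
    return best


# Toy data: characteristic 0 separates goods from bads, characteristic 1 does not.
rows = [[1, 10], [2, 20], [3, 10], [4, 20]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))   # (0.5, 0, 2.5)
```

In a full tree build, this search would be repeated in each child node until one of the stop criteria is met, for example when the best decrease is not positive.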
11 DECISION TREES PROS AND CONS
[Table with PROS and CONS columns; entries not captured in the transcription.]
12 RANDOM FORESTS METHODOLOGY
[Diagram: a bootstrap sample feeding a tree with Split 2a and Split 2b.]
1. Select a random k observations (once per tree)
2. Select a random m characteristics and perform a split
3. Replacement of all characteristics into the bag
4. Select another random m characteristics and perform the next split
13 RANDOM FORESTS METHODOLOGY
[Diagram: six bootstrap samples, each growing its own tree (Split 2a / Split 2b).]
Building Methodology (cont'd)
5. Each tree is trained until a stop criterion is reached
6. No pruning is done
Fundamental Parameters
- k: sample size (bag size)
- m: number of features
- n: number of trees
- d: max depth of tree
Scoring Methodology
1. Each new observation will fall into one of the leaf nodes, and gets a score (same as in decision trees)
2. Each tree will produce a score for each observation, and determine a decision (default / no default)
3. All votes are averaged, providing an outcome probability for the Forest as a whole: the final score
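The building and scoring steps can be put together in a compact, standard-library-only sketch. The parameters k, m, n and d follow the slide; everything else is an assumption made for illustration: the function names, the toy data, Gini impurity as the split measure, drawing the bag with replacement, and turning a leaf score into a default vote at a 0.5 cut-off. The project itself used Random Forest packages in R and Python, not this code.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = default)."""
    if not labels:
        return 0.0
    p_bad = sum(labels) / len(labels)
    return 1.0 - p_bad ** 2 - (1.0 - p_bad) ** 2


def grow_tree(rows, labels, m, d, rng):
    """Grow one unpruned tree; a leaf is its default rate, a node is a dict."""
    score = sum(labels) / len(labels)
    if d == 0 or score in (0.0, 1.0):           # max depth reached or pure node
        return score
    best = None
    for j in rng.sample(range(len(rows[0])), m):    # fresh random m characteristics
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [i for i, row in enumerate(rows) if row[j] <= t]
            right = [i for i, row in enumerate(rows) if row[j] > t]
            impurity = (len(left) * gini([labels[i] for i in left])
                        + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, j, t, left, right)
    if best is None or best[0] >= gini(labels):     # split does not improve purity
        return score
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([rows[i] for i in left], [labels[i] for i in left], m, d - 1, rng),
        "right": grow_tree([rows[i] for i in right], [labels[i] for i in right], m, d - 1, rng),
    }


def tree_score(tree, row):
    """Drop an observation down the tree and return the score of its leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def grow_forest(rows, labels, k, m, n, d, seed=0):
    """Grow n trees, each on its own bag of k observations drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n):
        bag = [rng.randrange(len(rows)) for _ in range(k)]
        forest.append(grow_tree([rows[i] for i in bag], [labels[i] for i in bag], m, d, rng))
    return forest


def forest_score(forest, row):
    """Average of the trees' default / no-default votes: the final score."""
    votes = [1 if tree_score(tree, row) >= 0.5 else 0 for tree in forest]
    return sum(votes) / len(votes)


# Toy data: accounts default when characteristic 0 exceeds 5; characteristic 1 is noise.
rows = [[x, x % 3] for x in range(1, 11)]
labels = [1 if x > 5 else 0 for x in range(1, 11)]
forest = grow_forest(rows, labels, k=10, m=1, n=25, d=3)
print(forest_score(forest, [9, 0]), forest_score(forest, [2, 1]))
```

Because each tree sees a different bag and different candidate characteristics, individual trees disagree; the averaged vote is what stabilises the prediction.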
14 RANDOM FORESTS PROS AND CONS
[Table with PROS and CONS columns; entries not captured in the transcription.]
15 RANDOM FORESTS ANALOGY
Random Forest: an ensemble of Decision Trees
- A Decision Tree is a supervised learning method, capable of observing associations in data
- All decision trees are trained using the same methodology
- All decision trees are trained using slightly different subsets of our data, developing an edge in scoring different types of observations
- Empirical and theoretical evidence shows that the average of a lot of highly trained trees gives a more accurate and stable prediction than using a single model
Board of Medical Experts: an ensemble of Specialists
- A Specialist is capable of learning from experience, books, practice
- All specialists have the same brain structure and learning capabilities
- All specialists specialize in different areas of a subject, and have different degrees and different experiences
- A board of specialists may provide a more balanced and accurate decision than a single generalist
16 RANDOM FORESTS GENERIC FRAMEWORK
Unsupervised Learning methods (no outcome)
- Clustering (k-means, k-medoids, Hierarchical, Density, etc.)
- Association Rules (Market Basket Analysis, Sequence Analysis)
Supervised Learning methods (binary or continuous outcome)
- Simple Classifiers (Logistic Regression, Decision Tree, Support Vector Machines, etc.)
- Ensemble Classifiers (Random Forest, Artificial Neural Networks, Gradient Boosting, etc.)
   - Generally a combination of simple classifier units
   - Type of simple classifiers (uniform, mixed)
   - Diversification logic (subspaces of features, bags of observations, performance of other units, etc.)
   - Voting logic (simple averaging, majority voting, confidence-enhanced voting, etc.)
17 TRANSACTIONAL DATA MINING PROJECT Csaba Főző, Lee Gregory, Sami Niemi, Olga Murumets
19 TRANSACTIONAL DATA MINING SCOPE
In Scope
- Applications to a credit product over 3 months.
- Same characteristics as used in the champion model, plus characteristics derived from transactions.
- Extraction and transformation of new data elements from transactional data sources (Transaction Categorization System).
- The same bad definition will be used as in the champion model.
- Development and out-of-time test samples will be used for validation, aligned with the champion model time windows.
- Comparison against the live scorecard in place, on the basis of performance, transparency, and stability.
- Data sourcing and preliminary feature selection in SQL (for transactional data), and in SAS (for customer data).
- Data preparation and feature selection in SAS, R and Python.
- Model development using a Random Forest package in R and Python.
Not in Scope
- Implementation
- Monitoring
- Governance
20 TRANSACTIONAL DATA MINING PROJECT PHASES
Model progression:
- Baseline: Regression with 12 chars
- Phase 1a: RF on 12 chars
- Phase 1b: RF on 1500 chars
- Phase 2: RF on 1500 chars + transactional chars
- Phase 3: RF on 1500 chars + all transactional chars
- Phase 4: RF on 1500 chars + all transactional chars + complex derived chars
1. Develop Random Forest Methodology for Credit Risk:
   - Build a Random Forests model using the same training and test data as in the champion model (12 characteristic variables).
   - Extend the model build to include the full 1,500 credit risk characteristics considered during development of the champion model.
   - Evaluate results on the hold-out test sample and compare with the champion model to evaluate Random Forests' predictive power.
2. Inclusion of Transactional Data:
   - Perform Transaction Data extraction to create Characteristic variables.
   - Extend the data model to better align with credit risk modelling.
   - Extend the Random Forest and Logistic Regression model builds to include Transaction Data.
   - Evaluate both training and test results across the two Random Forest models and the two Logistic Regression models (two models for each, based on two data sets: the original characteristic variables, and the original data plus transactional data).
3. Extension of Transactional Data from another credit product:
   - Use the data extraction methodology from Phase 2 to create Characteristic Variables based on further transactions.
   - Extend the Random Forest methodology to use the champion model data and all transactions data.
   - Evaluate performance across all existing models.
4. New Transaction Data Characteristics:
   - Design a methodology to apply Association Rule Mining to generate new characteristic variables based on Transaction Data.
   - Test these new variables using both Random Forest and Logistic Regression methodology.
21 TRANSACTIONAL DATA MINING ISSUES
Computational restrictions
- Transactional data storage (wide datasets in volatile, permanent)
- Processing storage (Memory, Spool space)
- Characteristic transformations (SQL/SAS/R)
- Data transfer speed (Different databases, local PC)
- System Reliability
- Modelling tool (SAS, SAS EM, R)
- Processing speed (code optimization, indexing, partitioning, parallel processing)
Transparency and interpretability
- Measurement of variable importance while collinearity is present
- Impact of drivers on outcome probability
Random Forest methodology
- Different implementations and tools (SAS, R, Python, C#)
- Feature selection on 50K+ features, focussed on improving the POS model
- Stability (parameter optimization, independent validation)
22 TRANSACTIONAL DATA MINING TRANSACTIONAL DATA
Characteristic dimensions (Transaction types X Time interval X Measurement):
- Transaction types: 2 different basic types; transaction purposes/channels; MCC code groups; beneficiaries
- Weeks: weeks from time of application (0-6 weeks)
- Months: months from time of application (0-12 months)
- Measurement: Volume; Sum, Avg, Max, Min Value; Avg, Min, Max Balance
Combining transaction type with time interval and measurement yields a multitude of predictors: 40K+ characteristics
Infrastructure necessitated the reduction of these characteristics:
- Information Value macros cannot cope with this number of chars
- Chars populated very sparsely were removed (.01%)
- Chars with very low defaults were removed (125 and 1%)
- Marginal Information Value and Mean Decrease in Gini were used to select final candidate chars
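To see how combining the three dimensions multiplies the number of predictors, here is a small sketch that enumerates characteristic names as a cross product. All names below (the two transaction types, the window labels, the measurement labels and the naming pattern) are hypothetical placeholders, not the project's actual characteristics.

```python
from itertools import product

# Hypothetical dimension values, for illustration only.
transaction_types = ["debit", "credit"]
time_windows = [f"w{i}" for i in range(1, 7)] + [f"m{i}" for i in range(1, 13)]
measurements = [
    "volume",
    "sum_value", "avg_value", "max_value", "min_value",
    "avg_balance", "min_balance", "max_balance",
]

# One characteristic per (type, window, measurement) combination.
characteristics = [
    f"{ttype}_{window}_{measure}"
    for ttype, window, measure in product(transaction_types, time_windows, measurements)
]

print(len(characteristics))    # 2 * 18 * 8 = 288
print(characteristics[0])      # debit_w1_volume
```

Even this toy grid gives 288 candidates from two transaction types; adding purposes, channels, MCC code groups and beneficiaries on the first axis is what pushes the count into the tens of thousands and makes up-front filtering necessary.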
23 TRANSACTIONAL DATA MINING MODEL PERFORMANCE
No. of chars | Model | Characteristics | Hold-out Gini
12 | Logistic Regression | Champion Characteristics | 51%
12 | Random Forest | Champion Characteristics | 54%
170 | Random Forest | All Credit Risk Chars (1.5K+ considered) | 59%
100 | Random Forest | Champion Chars + Transactional Chars (15K+ considered) | 56%
10 | Logistic Regression | All Credit Risk & Transactional Chars | 53%
258 | Random Forest | All Credit Risk & Transactional Chars | 60%
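The slides do not define the hold-out Gini. A common convention in credit scoring, assumed here, is Gini = 2 x AUC - 1, where AUC is the share of bad/good pairs in which the bad account receives the higher risk score (ties count as half). The sketch below uses that convention with made-up scores; the function name and the data are hypothetical.

```python
def gini_coefficient(scores, is_bad):
    """Gini = 2 * AUC - 1, with AUC computed over every bad/good pair.

    scores: predicted default probabilities (higher = riskier).
    is_bad: 1 for accounts that defaulted, 0 otherwise.
    """
    bad = [s for s, b in zip(scores, is_bad) if b]
    good = [s for s, b in zip(scores, is_bad) if not b]
    concordant = 0.0
    for score_bad in bad:
        for score_good in good:
            if score_bad > score_good:
                concordant += 1.0
            elif score_bad == score_good:
                concordant += 0.5
    auc = concordant / (len(bad) * len(good))
    return 2.0 * auc - 1.0


# Made-up hold-out sample: 4 bads and 4 goods.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_bad = [1, 1, 0, 1, 0, 0, 1, 0]
print(gini_coefficient(scores, is_bad))   # 0.5
```

A model that ranks every bad above every good scores 100%, and a random ranking scores about 0%, so under this convention the reported move from 51% to 60% is a gain in rank-ordering power.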
24 TRANSACTIONAL DATA MINING BEST RANDOM FOREST MODEL
[Figure: two charts. "Validation GINI" compares the Random Forest model with a random model. "Cumulative Score Distribution" shows Cuml Good% and Cuml Bad% by Score. Axis labels recovered: Cumulative Good %, Population %, Cumulative Bad %, Score.]
25 CONCLUSION
26 CONCLUSION A CRUDE ESTIMATE OF PROFIT & LOSS IMPACT
- Fixing predicted defaults: 4%+ Extra Lending Potential
- Fixing predicted non-defaults: 12%+ Reduction in Losses
27 CONCLUSION SUMMARY
Positive Results
- Substantial improvement in credit risk model performance, which also translates to business growth / loss mitigation
- Two successful applications of Random Forests in LBG (Fraud and Credit Risk)
- More information can be used in modelling credit risk (# of predictors, interactions, non-linearity)
- Model build is relatively fast
- Importance of model drivers is available
Challenges
- Implementation is difficult with current IT systems, though possible
- Full model interpretation is less straightforward, though possible
- Cultural change is always hard, though competitors give us some push
29 Q&A
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationKnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES
HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationLeveraging Ensemble Models in SAS Enterprise Miner
ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &
More informationJournée Thématique Big Data 13/03/2015
Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationEXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.
EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationFast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationA Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationAn Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationAn Overview and Evaluation of Decision Tree Methodology
An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com
More informationThe Predictive Data Mining Revolution in Scorecards:
January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationApplied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
More informationLavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
More informationTable of Contents. June 2010
June 2010 From: StatSoft Analytics White Papers To: Internal release Re: Performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and
More information2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationWhy Ensembles Win Data Mining Competitions
Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:
More informationEnsemble Data Mining Methods
Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods
More informationA Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND
Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationFine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More informationIn-Database Analytics
Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing
More informationData mining and statistical models in marketing campaigns of BT Retail
Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120
More informationPredictive Modeling of Titanic Survivors: a Learning Competition
SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224
More informationMS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationPredictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
More informationA fast, powerful data mining workbench designed for small to midsize organizations
FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business
More informationData Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationDECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationWebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
More informationA Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
More informationBenchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationBetter credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
More informationData Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
More informationDidacticiel Études de cas
1 Theme Data Mining with R The rattle package. R (http://www.r project.org/) is one of the most exciting free data mining software projects of these last years. Its popularity is completely justified (see
More informationAzure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
More informationUniversité de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr
Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
More informationGeneralizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel
Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision
More informationRisk pricing for Australian Motor Insurance