Data Mining Approaches to Modeling Insurance Risk. Dan Steinberg, Mikhail Golovnya, Scott Cardell. Salford Systems 2009



2 Overview of Topics Covered
Examples in the insurance industry: predicting at the outset of a claim the likelihood of the claim becoming serious (CART example); developing a model of total projected customer value for a health insurer; premium increase optimization (combination with GLM); other examples.
TreeNet tutorial.

3 Predicting at the outset of a claim the likelihood of the claim becoming serious
Consulting project in workers' compensation insurance (with PricewaterhouseCoopers Australia), using Salford Systems CART.
In workers' compensation insurance, serious claims make up a small proportion of all claims by number but the great majority of the incurred cost. In our case study, 14% of reported claims were classified as serious, and those claims accounted for around 90% of total claim cost. In most cases it is not obvious which claims will become serious, because many factors contribute to the outcome.

4 Predicting at the outset of a claim the likelihood of the claim becoming serious (continued)
Results: data mining (CART) identified 19 of the 83 candidate variables as the most important predictors, including categorical variables with hundreds of categories. Every claim was classified as either likely or not likely to become serious.
31% of all claims were classified as serious: 10% of all claims turned out to be serious, and the other 21% were false positives.
69% of all claims were classified as non-serious: 4% of all claims fell in this group and subsequently proved to be serious.
Together, 10% + 4% = 14% of claims were serious, matching the overall rate of serious claims.
Interesting findings: as expected, injury details were important in predicting serious claims; unexpectedly, the claimant's language skills were also important predictors.
Copyright Salford Systems 2009
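As a quick check of the figures above, the sketch below reads every percentage as a share of all claims (our interpretation; it is the reading under which 10% + 4% reproduces the 14% serious rate) and derives the usual classification summaries. Recall is the share of serious claims the model flags; precision is the share of flagged claims that are serious. The function name `summarize` is ours, for illustration only.

```python
# Shares of ALL claims, as read from the slide (our interpretation).
flagged_serious = 0.31   # classified as likely to become serious
true_positive = 0.10     # flagged and actually serious
false_positive = 0.21    # flagged but not serious
false_negative = 0.04    # classified non-serious but turned out serious


def summarize(tp, fp, fn):
    """Return (actual serious rate, recall, precision) from shares of all claims."""
    actual_serious = tp + fn
    recall = tp / actual_serious   # share of serious claims that were flagged
    precision = tp / (tp + fp)     # share of flagged claims that were serious
    return actual_serious, recall, precision


serious_rate, recall, precision = summarize(true_positive, false_positive, false_negative)
print(f"serious rate {serious_rate:.0%}, recall {recall:.0%}, precision {precision:.0%}")
# prints: serious rate 14%, recall 71%, precision 32%
```

Under this reading the model catches roughly 7 in 10 serious claims while flagging 31% of the book, so about one flagged claim in three becomes serious, against a base rate of 14%.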

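5 Predicting at the outset of a claim the likelihood of the claim becoming serious (continued)
Gains chart (reported as a 40% lift): the first 20% of the population, as ranked by the model, contains 60% of the serious claims, 40 percentage points more than the 20% a random ordering would capture.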
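A gains chart plots, for each depth of the ranked population, the share of all positives captured so far. The sketch below computes one point of that curve and the corresponding lift (gain divided by depth); the slide's point of 60% at a depth of 20% corresponds to a lift of 3. The scores and labels are hypothetical, and the helper names are ours.

```python
def cumulative_gain(scores, labels, depth):
    """Share of all positives found in the top `depth` fraction of cases ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = round(depth * len(ranked))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / sum(labels)


def lift(scores, labels, depth):
    """Gain relative to a random ordering, which would capture `depth` of the positives."""
    return cumulative_gain(scores, labels, depth) / depth


# Hypothetical model scores for 10 claims, 4 of which became serious (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(cumulative_gain(scores, labels, 0.2), lift(scores, labels, 0.2))
```

Evaluating `cumulative_gain` at a grid of depths (0.1, 0.2, ..., 1.0) gives the full gains curve.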

6 Developing a model of total projected customer value for a health insurer
Consulting project in health insurance (with PricewaterhouseCoopers Australia), using Salford Systems CART and MARS.
Lifetime customer value is the discounted present value of the income less the expenses associated with a customer.
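The definition of lifetime value above is a present-value formula. A minimal sketch, assuming yearly cash flows that arrive at the end of each period and a constant discount rate (both simplifications of ours; the slide does not specify either), with invented figures for one member:

```python
def lifetime_value(incomes, expenses, rate):
    """Discounted present value of (income - expense) per period.

    Assumes cash flows at the end of periods 1, 2, ..., n and a constant
    per-period discount rate `rate`.
    """
    if len(incomes) != len(expenses):
        raise ValueError("incomes and expenses must cover the same periods")
    return sum(
        (inc - exp) / (1 + rate) ** t
        for t, (inc, exp) in enumerate(zip(incomes, expenses), start=1)
    )


# Hypothetical member: flat premiums, claims and expenses rising with age.
value = lifetime_value([1200, 1200, 1200], [700, 800, 900], rate=0.05)
print(round(value, 2))
```

In the project, the expense stream is what the CART/MARS cost model predicts; the discounting step itself is simple arithmetic.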

7 Developing a model of total projected customer value for a health insurer (continued)
Result: the model predicted the high-cost claimants with a good degree of accuracy. The 15% of members predicted by the data mining model to have the highest cost accounted for 56% of total actual cost; the top 30% accounted for 80%.
Much of the model is accounted for by the dependence of cost on age. Compared with a simple model that depends on age alone, it improves the gains chart by around 10% of total cost in the high-cost part of the population. Details of the resulting model are commercially sensitive, but the results showed that many factors besides age contribute to predicted cost.

8 Developing a model of total projected customer value for a health insurer (continued)
How CART was used: in the exploratory analysis, a preliminary tree model suggested segmenting the customer base into 4 broad groups by age and previous claims experience. A separate CART model was then built for each of the 4 segments. Sometimes the risk drivers were similar across segments, but in many cases there were significant differences.
How MARS was used: each record's CART segment was included as a categorical input variable, together with the variables CART had selected as important predictors. MARS ranked the variables by importance, and the variable representing the CART segment ranked first. Other, mostly continuous, variables also ranked as important, showing that MARS was finding minor linear effects that the decision tree had not picked up.

9 Premium Increase Optimization: TreeNet/GLM Example
Consulting project in home insurance (with Taylor Fry Consulting Actuaries), using Salford Systems TreeNet and GLM. Although this is a home insurance example, the approach also applies to health insurance. Details are camouflaged at the client's request.
Problem: average renewal rates are around 87.5%. The current renewal price is risk cost + expenses + target profit, and any new price must stay within +/- $50 of the current price.
Goal: price insurance products according to what the customer is prepared to pay. What is the best that can be achieved in the trade-off between insurance profit and renewal rate?
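The profit-versus-renewal trade-off can be illustrated for a single hypothetical customer segment. Only the 87.5% base renewal rate and the +/- $50 constraint come from the slide; the logistic demand curve, the $60 margin and the sensitivity of 0.05 per dollar are invented for illustration. In the project, a TreeNet or GLM demand model would supply renewal probabilities per customer instead.

```python
import math

BASE_RENEWAL = 0.875  # average renewal rate quoted on the slide
MARGIN = 60.0         # hypothetical target profit ($) in the current price
SENSITIVITY = 0.05    # hypothetical drop in renewal log-odds per $1 increase
MAX_CHANGE = 50       # the +/- $50 constraint from the slide


def renewal_prob(change):
    """Hypothetical logistic demand curve: renewal probability after a price change."""
    base_logit = math.log(BASE_RENEWAL / (1 - BASE_RENEWAL))
    return 1 / (1 + math.exp(-(base_logit - SENSITIVITY * change)))


def expected_profit(change):
    """Expected profit per policy: renewal probability times the margin after the change."""
    return renewal_prob(change) * (MARGIN + change)


def best_change(step=1):
    """Grid-search whole-dollar price changes within the +/- constraint."""
    return max(range(-MAX_CHANGE, MAX_CHANGE + 1, step), key=expected_profit)


change = best_change()
print(change, round(renewal_prob(change), 3), round(expected_profit(change), 2))
```

With these invented parameters the optimum trades a lower renewal rate for a higher expected profit per policy; a less price-sensitive segment would push the optimum against the +$50 limit, which is why identifying such customers matters.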

10 Premium Increase Optimization: TreeNet/GLM Example (continued)
Approach taken:
First step: use TreeNet to quickly provide a benchmark model, select variables, and identify features.
Second step: build a GLM.
Third step: assess the performance of the GLM against the TreeNet benchmark.
Without data mining techniques, model building would take far longer. The TreeNet model usually performs close to (sometimes better than) the final model adopted for demand modeling.

11 Premium Increase Optimization: TreeNet/GLM Example (continued)
Results: knowledge of the drivers of renewal delivers quick wins. Segment customers into high and low expected renewal; analyze the drivers and levers; manage the low expected renewal segments throughout the renewal process; and identify customers who are less sensitive to price increases.

12 Other Examples
Actuarial case study presentations available (contact Salford Systems):
Combining Linear and Non-Linear Modeling Techniques: Case Study Example from the Insurance Industry
Insurance Fraud Detection
Use of Data Mining (CART) and Rule-based Methods to Infer Claims Payment Policy from the Analysis of Paid Claims
Text Mining Challenges and TreeNet: Impact of Textual Information in Claims Cost Prediction
Statistical Case Estimation Modelling - An Overview of the NSW WorkCover Model
CART Plus Linear Models for Actuarial Applications
Insurance Premium Increase Optimization: Case Study
And more.

13 Other Examples
Marketing case study presentations available (contact Salford Systems):
Marketing Strategies for Retail Customers Based on Predictive Behavior Models
Modeling Effectiveness of Marketing Campaigns
How to Exploit the Homogeneity of Data or the Homogeneity of Past Experience in High Volume Direct Marketing Programs
Predicting Customer Behavior Trends Over Space and Time
CART Analysis to Augment Predefined Market Zones & Boost Response Rates for Direct Mail Campaigns
High-Level Marketing Strategy: Deciding Whether to Customize Product Offerings and Promotion and Whether to Reward Best Customers with Perks
Application of CART in Determining Reserve Levels for Customer Loyalty Points
Achieving Better Insights Into Customer Behavior Through Integrating Market Research with Decision Tree Behavior Modeling

14 Interaction Detection with TreeNet Methodology Example

15 The Challenge of Interaction Detection
Classical statistical modeling focuses primarily on developing linear additive models of the form $Y = A + B_1X_1 + B_2X_2 + \cdots + B_kX_k$.
The predictors or attributes $X_i$ are either raw data columns or data after repairs such as missing value imputation and capping or elimination of extreme values. Nonlinearity is introduced into the models only in limited ways, such as a log transform of $Y$ (for continuous $Y > 0$) and a collection of well-known transforms of the $X_i$. Credit risk scorecard technology introduces transforms derived from binning continuous predictors.

16 Classical Statistical Model Performance
Classical statistical models represent a huge fraction of the real-world models deployed in enterprises worldwide. Such models are popular partly because statisticians are well trained to develop them, and partly because they tend to give good performance however performance is measured (e.g. R-squared, area under the ROC curve, top-decile lift). Such models are so restrictive that they must oversimplify, but they benefit from their stability (low variance).

17 Bias/Variance Trade-Off
The universe of possible models is huge, whereas the subset of linear models is quite small. Forcing a model to conform to a specific mathematical form is likely to introduce distortions, so we are willing to say that classical models are almost certainly biased: as data become more plentiful, they almost certainly converge to an incorrect representation of the data-generating process. In contrast, modern learning machines such as TreeNet and CART converge to a correct, unbiased representation. Classical models have the advantage of lower variance, which may translate into an overall better model in smaller samples.

18 Classical Model Shortcomings: Sources of Bias
The classical statistical model is expected to fall short in two areas: incorrect representation of nonlinearities in individual predictors, and absence of relevant interactions. Interactions will be the key to identifying important data segments that behave differently from the dominant patterns. Specifying a model correctly is the most challenging task facing the modeler, and until now there has been no reliable way for a modeler to determine whether a correct specification has been found.

19 Classical Interaction Detection
A popular method of statistical interaction detection begins by building the best possible additive model. Interaction terms, which look like products of main effects (e.g. $X_iX_j$), are then added to the model and tested for significance. One problem with this method is the number of candidates: the number of pairwise interactions grows with the square of the number of predictors, and once higher-order terms are included the count grows exponentially. All the usual challenges of model construction also apply: $X_iX_j$ may not become visible unless $X_mX_n$ is also included; there may well be higher-order interactions such as $X_iX_jX_k$; and interactions may exist, and be detectable, only in subregions of the predictors, e.g. $(X_i \mid X_i > c) \cdot (X_j \mid X_j < d)$.
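The growth in candidate terms is easy to quantify: $k$ predictors give $\binom{k}{2} = k(k-1)/2$ pairwise products, and $2^k - k - 1$ products of two or more distinct predictors in total. A small sketch (the function name is ours) makes the scale concrete; nine predictors give 36 pairs, the figure quoted for the financial example in this deck.

```python
from math import comb


def count_interactions(k, max_order=None):
    """Number of candidate product terms among k predictors, from 2-way up to max_order.

    With max_order=None all orders up to k are counted, which equals 2**k - k - 1.
    """
    top = k if max_order is None else max_order
    return sum(comb(k, order) for order in range(2, top + 1))


for k in (9, 20, 50):
    print(k, count_interactions(k, max_order=2), count_interactions(k))
```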

20 What Is Needed
The classical approach of building the best possible additive model and then testing for interactions is largely impractical and often infeasible. What is needed is a methodology that can automatically discover the right transformation of each predictor and introduce the right interactions. This methodology must be far more flexible than classical modeling and yet resistant to overfitting. It must be able to tell us whether interactions are present and, if so, identify precisely which ones.

21 TreeNet and Model Development
TreeNet is Jerome Friedman's stochastic gradient boosting (first described in …). Starting from the base of MART (Multiple Additive Regression Trees), TreeNet has evolved over the past decade into a powerful modeling machine. TreeNet can fit a variety of models, including regression and logistic regression. The models are nonparametric and built from hundreds if not thousands of small regression trees. Each tree is designed to learn very little from the data, so many trees are needed to complete the learning process. The model takes the form of a sequence of error-correcting updates.

22 Some Interesting TreeNet Features
Each tree is grown on only a random subset of the training data, typically a random half, so we never train on all the training data at one time. Model updates are small: we do not allow the model to change more than a little in any training cycle. Points that are badly mispredicted (the equivalent of outlying data) are eventually ignored in training cycles, so anomalies are not allowed much influence on the model.
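The ideas on the last two slides (many small trees, error-correcting updates, a random half of the data per tree, small steps) can be seen in a minimal sketch of least-squares stochastic gradient boosting with 2-node trees. This is not TreeNet's implementation: it assumes squared-error loss, a single predictor for brevity, and a learning rate of 0.1 chosen by us.

```python
import random


def fit_stump(xs, residuals):
    """Best single split (a 2-node tree) under squared error.

    Returns (threshold, left_value, right_value); x <= threshold goes left.
    """
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for k in range(n - 1):
        left_sum += pairs[k][1]
        if pairs[k][0] == pairs[k + 1][0]:
            continue  # no threshold separates tied x values
        n_left, n_right = k + 1, n - k - 1
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing within-node squared error.
        score = left_sum ** 2 / n_left + right_sum ** 2 / n_right
        if best is None or score > best[0]:
            threshold = (pairs[k][0] + pairs[k + 1][0]) / 2
            best = (score, threshold, left_sum / n_left, right_sum / n_right)
    if best is None:  # every x identical: fall back to a constant update
        mean = total / n
        return pairs[0][0], mean, mean
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=200, learn_rate=0.1, subsample=0.5, seed=0):
    """Least-squares stochastic gradient boosting with 2-node trees on one predictor."""
    rng = random.Random(seed)
    intercept = sum(ys) / len(ys)
    preds = [intercept] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Each tree is grown on a fresh random subset (here half) of the training rows.
        rows = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        residuals = [ys[i] - preds[i] for i in rows]  # negative gradient of LS loss
        thr, left, right = fit_stump([xs[i] for i in rows], residuals)
        trees.append((thr, left, right))
        # Small, shrunken update: the model changes only a little per cycle.
        for i, x in enumerate(xs):
            preds[i] += learn_rate * (left if x <= thr else right)
    return intercept, learn_rate, trees


def predict(model, x):
    intercept, learn_rate, trees = model
    return intercept + learn_rate * sum(l if x <= t else r for t, l, r in trees)


# Toy step-function data: y jumps from 0 to 5 at x = 1.
xs = [i / 50 for i in range(100)]
ys = [0.0 if x < 1 else 5.0 for x in xs]
model = boost(xs, ys)
print(round(predict(model, 0.2), 3), round(predict(model, 1.8), 3))
```

Each tree fits the current residuals, and shrinking its contribution means no single tree can move the model far, which is the sense in which each tree "learns very little".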

23 A TreeNet Model
For a binary dependent variable, it can be shown that the TreeNet model is a nonparametric logistic regression. For a continuous dependent variable, the TreeNet model is a nonparametric regression fit to minimize one of the following objective functions: least squares (LS) of the residuals; least absolute deviation (LAD); or Huber-M, a hybrid of LS and LAD (LS for small residuals, LAD for large ones).
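In gradient boosting, each new tree is fitted to the negative gradient of the chosen loss (the "pseudo-residuals"). The sketch below lists these for the objectives above plus the logistic case; it shows why Huber-M limits the influence of badly mispredicted points: outside a threshold its gradient is capped. The fixed threshold `delta` is a hypothetical parameter of ours, not TreeNet's setting.

```python
import math


def negative_gradient(loss, y, f, delta=1.0):
    """Pseudo-residual -dL/df that the next tree is fitted to.

    y is the observed target, f the current model prediction (log-odds for "logistic").
    """
    r = y - f
    if loss == "ls":        # least squares, L = r**2 / 2
        return r
    if loss == "lad":       # least absolute deviation, L = |r|
        return (r > 0) - (r < 0)
    if loss == "huber":     # quadratic inside +/- delta, absolute outside
        return r if abs(r) <= delta else delta * math.copysign(1.0, r)
    if loss == "logistic":  # binary y in {0, 1}
        return y - 1 / (1 + math.exp(-f))
    raise ValueError(f"unknown loss: {loss}")


for loss in ("ls", "lad", "huber"):
    print(loss, negative_gradient(loss, 10.0, 0.0))  # a badly mispredicted point
```

Under LS the outlier's pull grows with its residual; under LAD and Huber-M it is bounded, so anomalies cannot dominate the next tree.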

24 Some Key TreeNet Features
Automatic variable selection: TreeNet is a superb ranking machine. It tends to include more predictors than other learning machines, but its effective ranking of predictors allows the modeler to select small subsets of reliable predictors.
TreeNet contains built-in stepwise variable selection (backward, forward, or backward/forward) to search automatically for a best model. Backward stepping: start with all variables in the model; using the variable importance ranking, remove the R least important predictors (typically R = 1); repeat until all predictors have been removed; then identify the best model based on the preferred performance metric.

25 Forward Stepping
Starting with the best model, test every predictor in a specified subset to see which is best to add. Repeat forward stepping for F steps, then identify the best new model, which is a final model candidate.
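The backward and forward stepping procedures above can be sketched generically. This is not TreeNet's built-in code: `importance(subset)` and `score(subset)` are hypothetical callbacks standing in for refitting a model on a subset, reading its variable importance, and measuring holdout performance (higher is better).

```python
def backward_elimination(predictors, importance, score, r=1):
    """Drop the r least important predictors per round; return (best score, subset)."""
    current = list(predictors)
    history = []
    while current:
        history.append((score(current), list(current)))
        ranks = importance(current)
        drop = set(sorted(current, key=lambda p: ranks[p])[:r])  # least important first
        current = [p for p in current if p not in drop]
    return max(history, key=lambda item: item[0])


def forward_steps(start, candidates, score, steps):
    """From the best model, add the best remaining candidate for up to `steps` rounds."""
    current = list(start)
    best_score, best_subset = score(current), list(current)
    for _ in range(steps):
        remaining = [c for c in candidates if c not in current]
        if not remaining:
            break
        current = max((current + [c] for c in remaining), key=score)
        if score(current) > best_score:
            best_score, best_subset = score(current), list(current)
    return best_score, best_subset


# Hypothetical toy problem: each predictor adds its "usefulness" but costs 0.5.
usefulness = {"a": 3.0, "b": 2.0, "c": 0.1, "d": 0.05}
importance = lambda subset: {p: usefulness[p] for p in subset}
score = lambda subset: sum(usefulness[p] for p in subset) - 0.5 * len(subset)

best = backward_elimination(list(usefulness), importance, score)
print(best)
print(forward_steps(best[1], list(usefulness), score, steps=2))
```

In practice each call to `score` means fitting a model, so the number of refits (about one per predictor for backward stepping with R = 1) is what makes R and F worth tuning.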

26 Testing for Interactions
When a TreeNet model is built with only 2-node trees, the model is essentially limited to an additive model: with a single split, a tree cannot capture any interactions. Observe that an additive model can still be highly nonlinear. The model is of the form
$Y = A + F_1(X_1) + F_2(X_2) + \cdots + F_k(X_k)$
where the $F_i$ are the nonlinear functions discovered by TreeNet. Note that there may be many $F_q$ associated with a specific predictor $X_i$; cumulate these into $G_i(X_i) = \sum_q F_q(X_i)$, summing over the trees $q$ that split on $X_i$. If the TreeNet contains only 2-node trees, we can collect all trees associated with a given $X_i$ to arrive at the final model
$Y = A + G_1(X_1) + G_2(X_2) + \cdots + G_k(X_k)$
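The regrouping step, from a list of 2-node trees to one function $G_i$ per predictor, can be sketched as follows. The trees, intercept and variable names are hypothetical; the point is that summing tree by tree and summing the per-variable $G_i$ give the same prediction.

```python
from collections import defaultdict

# Hypothetical 2-node trees: (predictor, threshold, value_if_le, value_if_gt).
trees = [
    ("age", 40, -1.0, 2.0),
    ("income", 50_000, 0.5, -0.5),
    ("age", 60, 0.0, 1.5),
]
intercept = 10.0  # the constant A


def collect_by_predictor(trees):
    """Group 2-node trees by split variable; G[name](x) sums that variable's trees."""
    groups = defaultdict(list)
    for name, thr, lo, hi in trees:
        groups[name].append((thr, lo, hi))

    def make_g(parts):
        return lambda x: sum(lo if x <= thr else hi for thr, lo, hi in parts)

    return {name: make_g(parts) for name, parts in groups.items()}


def predict_trees(row):
    """Prediction as A plus the sum over individual trees F_q."""
    return intercept + sum(lo if row[name] <= thr else hi for name, thr, lo, hi in trees)


def predict_additive(row, G):
    """The same prediction written as A + sum_i G_i(X_i)."""
    return intercept + sum(g(row[name]) for name, g in G.items())


G = collect_by_predictor(trees)
row = {"age": 65, "income": 30_000}
print(predict_trees(row), predict_additive(row, G), G["age"](65))
```

Plotting each $G_i$ over the range of $X_i$ gives the nonlinear main-effect curves of the additive model.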

27 Building 2-Node TreeNets
It is important to keep in mind that when only one split is allowed per tree, the amount of learning that can take place in each tree is severely limited. 2-node TreeNets may require a very large number of trees to extract all the information in the data; some examples we will show use 20,000 trees. Unless the TreeNets are fully grown out, it will not be possible to measure the true predictive power of the additive model.

28 General TreeNet Models
TreeNet models may contain trees of any size: we can grow trees with 2, 3, 4, 5, 6, 12, 50, etc. nodes. We generally favor small trees because we want to limit the amount of learning in any training cycle, so we tend to keep trees to sizes like 6, 9, or 12. However, experimentation is always recommended, and we have encountered data where larger trees perform better.

29 The Default 6-Node Tree
Friedman recommended a default of 6 nodes, and our experiments confirm that this tree size performs well across a broad range of data sets. A 6-node tree clearly permits interactions. A tree with 6 terminal nodes contains 5 internal splits; if it is as close to balanced as possible, it can contain up to 3 different variables along its longest branch. If the tree is maximally unbalanced, it can contain up to 5 different variables along its longest branch. There is no guarantee that different variables will be used as we progress down a tree; the same variable might be used several times. In general, we observe that the 6-node tree should be adequate to uncover 3-way interactions.
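The counts above follow from the shape of a binary tree: $L$ terminal nodes means $L - 1$ splits, the most balanced shape has a longest branch of $\lceil \log_2 L \rceil$ splits, and a maximally unbalanced one puts all $L - 1$ splits on one branch. Since the variables along a branch bound the order of interaction that branch can express, a small helper (our own) shows the range for the tree sizes mentioned:

```python
import math


def branch_length_bounds(terminal_nodes):
    """(min, max) number of splits on the longest root-to-leaf branch of a binary tree."""
    if terminal_nodes < 1:
        raise ValueError("a tree needs at least one terminal node")
    return math.ceil(math.log2(terminal_nodes)), terminal_nodes - 1


for size in (2, 6, 9, 12):
    print(size, branch_length_bounds(size))
```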

30 Global Interaction Test
Compare an unrestricted TreeNet model allowing moderate-sized trees with a restricted TreeNet model confined to 2-node trees. We must take into account the total learning in each model (number of trees × number of nodes per tree); otherwise the 2-node model is at an automatic disadvantage. The simple solution is to allow each model to reach convergence.

31 A Global Interaction Test
Consider a TreeNet model built with 2-node trees and compare it with a 6-node TreeNet model. Recall that both TreeNets must have been allowed to grow out fully to locate the optimal number of trees for prediction. Comparing the restricted 2-node model with the 6-node model gives a definitive test for the presence of interactions of moderate degree: if the larger-tree (unrestricted) model sufficiently outperforms the restricted 2-node model, we have compelling evidence that interactions are present in the data-generating process. At present we do not have a definitive statistic for testing this hypothesis, but classical statistical tests can be developed by comparing the predictions of the constrained and unconstrained models on holdout data.
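One classical option for the holdout comparison mentioned above (our choice, not a statistic from the slides) is a paired test on per-case squared-error differences: a large positive statistic says the unrestricted model is reliably better on the same cases. The sketch assumes independent holdout cases and uses a large-sample normal approximation; the predictions are hypothetical.

```python
import math
import statistics


def paired_loss_test(y, pred_restricted, pred_full):
    """Paired comparison of squared errors on the same holdout cases.

    Returns (mean improvement, z). A positive mean means the unrestricted model
    has lower squared error; z = mean / standard error of the differences.
    """
    diffs = [(yi - a) ** 2 - (yi - b) ** 2 for yi, a, b in zip(y, pred_restricted, pred_full)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        return mean, (math.inf if mean > 0 else -math.inf if mean < 0 else 0.0)
    return mean, mean / se


# Hypothetical holdout targets and predictions from a 2-node and a 6-node TreeNet.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pred_2node = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
pred_6node = [1.5, 2.0, 3.0, 4.5, 5.0, 6.0]
mean, z = paired_loss_test(y, pred_2node, pred_6node)
print(round(mean, 4), round(z, 2))
```

A real holdout set would have far more than six cases; with very few cases a t distribution is the safer reference.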

32 Which Interactions Are Present
Our global test is sufficient to establish the existence of interactions but not to identify which specific interactions are present. All we can conclude is that interactions exist (positive result) or do not exist (negative result). This is nevertheless a major step forward, because the test is fully automatic. We have found evidence that many mainstream consumer risk models are adequately modeled with additive TreeNets. The next step for us is therefore to try to identify specific interactions.

33 Interaction Detection in TreeNet
Work on interaction detection in TreeNet has progressed along two fronts. Friedman suggested one strategy in his paper; at Salford Systems, Cardell, Golovnya, and Steinberg suggested a slightly different method.

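34 Interaction Measurement
From the TreeNet model, extract the function (or smooth)
(1) $Y(X_i, X_j \mid Z)$
which is obtained by averaging the predicted $Y$ associated with each observed $(X_i, X_j)$ pair over all observed $Z$. Now repeat the process for the single dependencies
(2) $Y(X_i \mid X_j, Z)$ and $Y(X_j \mid X_i, Z)$
and compare the predictions derived from (1) and (2) across an appropriate region of the $(X_i, X_j)$ space. If $X_i$ and $X_j$ do not interact, (1) equals the sum of the two functions in (2) up to a constant, so any systematic difference measures the interaction.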
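The comparison of (1) and (2) can be sketched with partial dependence functions. The measure below follows the spirit of Friedman's H-statistic (the share of the centered joint function not explained by the sum of the centered single-variable functions); it is a swapped-in, well-known statistic, not necessarily the exact Salford measure. `model` is any callable on a row dict, and the two toy models are hypothetical.

```python
import statistics


def partial_dependence(model, rows, fixed):
    """Average prediction over all observed rows with the columns in `fixed` overwritten."""
    return statistics.fmean(model({**row, **fixed}) for row in rows)


def interaction_strength(model, rows, i, j):
    """Share of the variance of the joint (Xi, Xj) function not explained additively."""
    pairs = [(row[i], row[j]) for row in rows]
    joint = [partial_dependence(model, rows, {i: a, j: b}) for a, b in pairs]   # (1)
    only_i = [partial_dependence(model, rows, {i: a}) for a, _ in pairs]        # (2)
    only_j = [partial_dependence(model, rows, {j: b}) for _, b in pairs]        # (2)

    def centered(values):
        m = statistics.fmean(values)
        return [v - m for v in values]

    joint, only_i, only_j = centered(joint), centered(only_i), centered(only_j)
    residual = sum((f - a - b) ** 2 for f, a, b in zip(joint, only_i, only_j))
    total = sum(f ** 2 for f in joint)
    return residual / total if total > 0 else 0.0


def additive(r):
    return r["x"] + 2 * r["y"] + r["z"]


def multiplicative(r):
    return r["x"] * r["y"] + r["z"]


rows = [{"x": a, "y": b, "z": 1.0} for a in range(4) for b in range(4)]
print(interaction_strength(additive, rows, "x", "y"))        # essentially 0
print(interaction_strength(multiplicative, rows, "x", "y"))  # clearly positive
```

The cost is quadratic in the number of rows, so in practice the evaluation points are usually a sample or grid rather than every observed pair.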

35 A Simple Example: Financial Market Behavior
Dependent variable: continuous, with a long right-hand tail.
[Figure: distribution of the dependent variable, with the top 15% and bottom 85% of values marked.]

36 Summary Stats: Target and 9 Predictors
The TARGET variable BPS is truly continuous; the others are ideal for a tree model. We randomly set aside 20% of the data for testing.

37 Baseline CART Model: Test Data MSE (Random 20%)
[Table: test MSE with standard errors for the 90-node tree and a second tree size; values not recoverable.]
R² is better than .97 on the test data. Observe that a CART model allows for high-order interactions.

38 Naïve Regression
R² is only .74 on the training data and .68 on the test data.

39 MARS Model Results
[Table: test MSE for MARS models with nonlinear main effects only and with interactions of increasing order, up to 4-way; values not recoverable.]
The MARS main effects and interactions were determined by the 4-way model.

40 TreeNet Results: Test MSE
[Table: test MSE for an unconstrained 6-node TreeNet (3,303 trees) and a 2-node TreeNet (20,000 trees); values not recoverable.]
Although the starting sum of squares in the data is large, the difference between the two models appears substantial. The 2-node TreeNet performs similarly to the MARS model with 2-way interactions, even though it excludes interactions, and it underperforms a single CART tree.

41 TreeNet Interactions Ranking Report
Based on the 6-node TreeNet, we calculate the degree of interaction observed for the most important variables. This report reveals, for example, that AVG_VOL is involved in important interactions.

42 Detailed Interaction Reports
Interactions are measured by comparing the 2D and 3D relationships between the target and the predictors. This allows us to hypothesize which interactions are likely to matter.

43 Top-Rated 2-Way Interactions
Nine predictors permit 36 2-way interactions.
Top-rated 2-way interactions (6 in all): AVG_VOL * AVG_TREND * SPREAD_Q SPREAD_Q * AVG_TREND * AVG_SIZE AVG_TREND * AVG_SIZE AVG_SIZE * MKT_CAP

44 Testing Interactions
Salford Systems has developed an Interaction Control Language (ICL) for TreeNet models. The language allows the modeler to specify precisely the types of interactions that will be permitted in the TreeNet. For example,
ICL ALLOW X1 X2 X3 X4 / 2
specifies that only 2-way interactions are allowed among the listed predictors (X1-X4). The ICL was developed in-house for private clients in 2006 and has been the basis of all our interaction detection work.

45 The ICL Language
The ICL allows a broad range of controls, such as:
ICL ADDITIVE x1 x2 x3
ICL ALLOW x5 x6 x7 x8 x9 / 3
ICL DISALLOW x9 x11 x5 x7 x21 x25 / 4
The ADDITIVE keyword prevents a predictor from interacting with any other variable in the model. In practice this means that if such a predictor is selected to split the root node of a tree, it must be the only predictor anywhere in that tree. TreeNets restricted to ADDITIVE predictors can still contain trees with as many terminal nodes as the modeler prefers, but each tree will be grown using a single predictor.
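To make the ADDITIVE and ALLOW rules concrete, here is a hypothetical emulation of how a tree grower could filter candidate split variables. It is not Salford's ICL implementation, and its reading of the rules is our assumption: variables are tracked over the whole tree (as ADDITIVE requires), and when any ALLOW groups are given, a multi-variable combination must fit inside one group within that group's degree. DISALLOW is not modeled.

```python
def allowed_split_vars(used_vars, candidates, additive=(), allow_groups=()):
    """Hypothetical filter of split candidates under ADDITIVE / ALLOW-style rules.

    used_vars:    variables already used in the tree being grown
    additive:     ADDITIVE predictors, which may not share a tree with another variable
    allow_groups: (set_of_vars, degree) pairs, like "ICL ALLOW ... / degree"
    """
    used = set(used_vars)
    additive = set(additive)
    allowed = []
    for var in candidates:
        combo = used | {var}
        if len(combo) == 1:
            ok = True   # a single variable cannot interact with anything
        elif combo & additive:
            ok = False  # an ADDITIVE variable would share the tree with another one
        else:
            ok = not allow_groups or any(
                combo <= set(group) and len(combo) <= degree
                for group, degree in allow_groups
            )
        if ok:
            allowed.append(var)
    return allowed


group = {"x5", "x6", "x7", "x8", "x9"}
print(allowed_split_vars(["x1"], ["x1", "x2"], additive={"x1"}))
print(allowed_split_vars(["x5", "x6"], ["x7", "x9", "x2"], allow_groups=[(group, 3)]))
print(allowed_split_vars(["x5", "x6", "x7"], ["x8", "x5"], allow_groups=[(group, 3)]))
```

The first call shows the ADDITIVE behavior described above: once x1 splits the root, only x1 may be used again in that tree.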

46 ADDITIVE vs. 2-Node Trees
2-node trees are often thought to guarantee ADDITIVE models, but this is not strictly true. If the training data are complete (no missing values), a 2-node TreeNet does yield a model that is additive in the predictors. However, if missing values are present, TreeNet requires the use of missing value indicator splits of the form
If Xi == MISSING then go LEFT; Else if CONDITION then go RIGHT;
The problem is that TreeNet permits the CONDITION in the ELSE clause to involve a variable other than Xi.

47 2-Node Trees in TreeNet
Although the user may request 2-node trees in a TreeNet model, if missing values are present for all predictors, the smallest tree that can be grown contains at least 3 nodes. Here is an example of such a split:

48 Two-Node Tree with Missings: One Variable in Tree
[Figure: the root asks "Is Xi missing?"; Yes leads to a terminal node; No leads to a split "Is Xi <= c" with two terminal nodes.]
Both internal nodes are split using the same variable Xi.

49 Two-Node Tree with Missings: Two Variables in Tree
[Figure: the root asks "Is Xi missing?"; Yes leads to a terminal node; No leads to a split "Is Xk <= c" with two terminal nodes.]
Xi (root node split) and Xk (internal split) are the two variables in the tree.

50 2-Node Tree Details
A 2-node tree in TreeNet can never contain more than one split on a standard predictor, so if that predictor is never missing, the tree using it will have only two nodes. However, the TreeNet mechanism does not count a split on a missing value indicator as a genuine split. As a result, a 2-node tree may contain any number of missing value indicator splits. The following tree is technically a 2-node tree by TreeNet standards:

51 Allowable 2-Node Tree
[Figure: a chain of two missing value indicator splits, each sending its missing cases to a terminal node, followed by one genuine split into two terminal nodes.]

52 2-Node Trees and Interactions
The important point is that in standard TreeNet we cannot guarantee the complete absence of interactions with 2-node trees. A 2-node tree allows interactions of any degree among missing value indicators, and also an interaction between a single predictor and missing value indicators. To enforce literal non-interaction we must rely on the ICL mechanism: if Xi is specified to enter the TreeNet as ADDITIVE and the root is split on the missing value indicator (MVI) for Xi, any subsequent split can use only Xi.

53 Test MSE Performance: Restricted Models
2-node trees (20,000 trees). As expected, the interaction report shows 0.00 for all interaction scores: the data contain no missing values, so literal 2-node trees are generated for this model. We require 20,000 trees because 2-node trees can learn only a little in each training cycle.

54 Test MSE Performance: Various Interactions Allowed
[Table: test MSE for 2-node trees (20,000 trees); models allowing 2-way, 3-way, 4-way, 5-way, 6-way, and 7-way interactions; and the unconstrained TreeNet. Values not recoverable.]
It is plain that there is a huge difference between an additive and an interaction model.

55 Refining the Model
Our choices include accepting a 2-way interaction model, or searching for an explicit set of interactions to work with:
AVG_DLY_VOL, AVG_TRND, AVG_SIZE /
MKT_CAP, AVG_DLY_VOL, AVG_TRND, AVG_SIZE, SPREAD_Q /
Here we see that just 3 two-way interactions can capture more than 90% of the difference between an additive and a fully unconstrained model.
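The "share of the difference captured" is a simple ratio of test-MSE gaps: how far a candidate model moves from the additive model's error toward the unconstrained model's error. The MSE values below are hypothetical, since the slide's actual figures are not reproduced here.

```python
def share_of_gap_captured(mse_additive, mse_candidate, mse_unconstrained):
    """Fraction of the additive-to-unconstrained test-MSE gap closed by a candidate model."""
    gap = mse_additive - mse_unconstrained
    if gap <= 0:
        raise ValueError("the unconstrained model should have the lower test MSE")
    return (mse_additive - mse_candidate) / gap


# Hypothetical test MSEs for the additive, 3-interaction, and unconstrained models.
print(share_of_gap_captured(mse_additive=100.0, mse_candidate=28.0, mse_unconstrained=20.0))
```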

56 TreeNet with ICL
TreeNet in the PRO EX version contains the ICL language and is available on request from Salford Systems. Contact: David Tolliver. General information

57 Salford Systems
Developer of CART, MARS, TreeNet and RandomForests. Advanced statistical software since 1983. PROC MLOGIT and PROC MPROBIT add-ins for SAS mainframes.
Technical advisers and close collaborators:
Jerome H. Friedman, Stanford University (CART, MARS, TreeNet)
Leo Breiman*, UC Berkeley (RandomForests)
Richard Olshen, Stanford University (Survival CART, Bioinformatics)
Charles Stone, UC Berkeley (CART, MARS large sample theory)
Rob Tibshirani, Stanford (modern statistical methods)
* Leo Breiman passed away in July 2005


More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

WHITEPAPER. How to Credit Score with Predictive Analytics

WHITEPAPER. How to Credit Score with Predictive Analytics WHITEPAPER How to Credit Score with Predictive Analytics Managing Credit Risk Credit scoring and automated rule-based decisioning are the most important tools used by financial services and credit lending

More information

How To Build A Predictive Model In Insurance

How To Build A Predictive Model In Insurance The Do s & Don ts of Building A Predictive Model in Insurance University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D. Agenda Travelers Broad Overview Actuarial & Analytics Career

More information

DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING

DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

Cross Validation. Dr. Thomas Jensen Expedia.com

Cross Validation. Dr. Thomas Jensen Expedia.com Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

More information

Beating the NCAA Football Point Spread

Beating the NCAA Football Point Spread Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over

More information

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

More information

!"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"

!!!#$$%&'()*+$(,%!#$%$&'()*%(+,'-*&./#-$&'(-&(0*.$#-$1(2&.3$'45 !"!!"#$$%&'()*+$(,%!"#$%$&'()*""%(+,'-*&./#-$&'(-&(0*".$#-$1"(2&."3$'45"!"#"$%&#'()*+',$$-.&#',/"-0%.12'32./4'5,5'6/%&)$).2&'7./&)8'5,5'9/2%.%3%&8':")08';:

More information

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Product recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies

Product recommendations and promotions (couponing and discounts) Cross-sell and Upsell strategies WHITEPAPER Today, leading companies are looking to improve business performance via faster, better decision making by applying advanced predictive modeling to their vast and growing volumes of data. Business

More information

How To Make A Credit Risk Model For A Bank Account

How To Make A Credit Risk Model For A Bank Account TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Applying Data Science to Sales Pipelines for Fun and Profit

Applying Data Science to Sales Pipelines for Fun and Profit Applying Data Science to Sales Pipelines for Fun and Profit Andy Twigg, CTO, C9 @lambdatwigg Abstract Machine learning is now routinely applied to many areas of industry. At C9, we apply machine learning

More information

Chapter 7: Data Mining

Chapter 7: Data Mining Chapter 7: Data Mining Overview Topics discussed: The Need for Data Mining and Business Value The Data Mining Process: Define Business Objectives Get Raw Data Identify Relevant Predictive Variables Gain

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Evaluation & Validation: Credibility: Evaluating what has been learned

Evaluation & Validation: Credibility: Evaluating what has been learned Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4. Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

More information

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within

More information

Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM

Paper AA-08-2015. Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Paper AA-08-2015 Get the highest bangs for your marketing bucks using Incremental Response Models in SAS Enterprise Miner TM Delali Agbenyegah, Alliance Data Systems, Columbus, Ohio 0.0 ABSTRACT Traditional

More information

Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Leveraging Ensemble Models in SAS Enterprise Miner

Leveraging Ensemble Models in SAS Enterprise Miner ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

More information

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008

More information

Summary. WHITE PAPER Using Segmented Models for Better Decisions

Summary. WHITE PAPER Using Segmented Models for Better Decisions WHITE PAPER Using Segmented Models for Better Decisions Summary Experienced modelers readily understand the value to be derived from developing multiple models based on population segment splits, rather

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Text mining for insurance claim cost prediction

Text mining for insurance claim cost prediction Text mining for insurance claim cost prediction Prepared by Inna Kolyshkina and Marcel van Rooyen Presented to the Institute of Actuaries of Australia XVth General Insurance Seminar 16-19 October 2005

More information

Compliance. Technology. Process. Using Automated Decisioning and Business Rules to Improve Real-time Risk Management

Compliance. Technology. Process. Using Automated Decisioning and Business Rules to Improve Real-time Risk Management Technology Process Compliance Using Automated Decisioning and Business Rules to Improve Real-time Risk Management Sandeep Gupta, Equifax James Taylor, Smart (enough) Systems August 2008 Equifax is a registered

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

A Better Statistical Method for A/B Testing in Marketing Campaigns

A Better Statistical Method for A/B Testing in Marketing Campaigns A Better Statistical Method for A/B Testing in Marketing Campaigns Scott Burk Marketers are always looking for an advantage, a way to win customers, improve market share, profitability and demonstrate

More information

A Property and Casualty Insurance Predictive Modeling Process in SAS

A Property and Casualty Insurance Predictive Modeling Process in SAS Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly

More information

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

REPORT DOCUMENTATION PAGE

REPORT DOCUMENTATION PAGE REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

More information

Efficiency in Software Development Projects

Efficiency in Software Development Projects Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University aneeshchinubhai@gmail.com Abstract A number of different factors are thought to influence the efficiency of the software

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

A Decision Theoretic Approach to Targeted Advertising

A Decision Theoretic Approach to Targeted Advertising 82 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 A Decision Theoretic Approach to Targeted Advertising David Maxwell Chickering and David Heckerman Microsoft Research Redmond WA, 98052-6399 dmax@microsoft.com

More information

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk

WHITEPAPER. Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk WHITEPAPER Creating and Deploying Predictive Strategies that Drive Customer Value in Marketing, Sales and Risk Overview Angoss is helping its clients achieve significant revenue growth and measurable return

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information

Fraud Detection for Online Retail using Random Forests

Fraud Detection for Online Retail using Random Forests Fraud Detection for Online Retail using Random Forests Eric Altendorf, Peter Brende, Josh Daniel, Laurent Lessard Abstract As online commerce becomes more common, fraud is an increasingly important concern.

More information

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee

UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass

More information

SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting

SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting SOA 2013 Life & Annuity Symposium May 6-7, 2013 Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting Moderator: Barry D. Senensky, FSA, FCIA, MAAA Presenters: Jonathan

More information

Course Syllabus. Purposes of Course:

Course Syllabus. Purposes of Course: Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller

Agenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive

More information

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Nine Common Types of Data Mining Techniques Used in Predictive Analytics 1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better

More information

White Paper. Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics

White Paper. Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics White Paper Redefine Your Analytics Journey With Self-Service Data Discovery and Interactive Predictive Analytics Contents Self-service data discovery and interactive predictive analytics... 1 What does

More information