Predictive Analytics for Life Insurance: How Data and Advanced Analytics are Changing the Business of Life Insurance Seminar May 23, 2012

Size: px
Start display at page:

Download "Predictive Analytics for Life Insurance: How Data and Advanced Analytics are Changing the Business of Life Insurance Seminar May 23, 2012"

Transcription

1 Predictive Analytics for Life Insurance: How Data and Advanced Analytics are Changing the Business of Life Insurance Seminar May 23, 2012 Session 2 How to Build a Risk Based Analytical Model for Life Insurance Presenter James C. Guszcza, FCAS, MAAA

2 Predictive Modeling From A to B Society of Actuaries Los Angeles James Guszcza, PhD, FCAS, MAAA Deloitte Consulting LLP May 23, 2011 Analyzing Analytics Three thoughts capture a major reason why predictive analytics is ubiquitous... The whole of science is a refinement of everyday thinking. Albert Einstein Decision-making is at the heart of administration. Herbert Simon Human judges are not merely worse than optimal regression equations; they are worse than almost any regression equation. Richard Nisbett & Lee Ross Human Inference: Strategies and Shortcomings of Social Judgment 2 Deloitte Analytics Institute 2011 Deloitte LLP 1

3 Agenda Introduction Why Business Analytics is Now Ubiquitous Predictive Modeling: Levels of Discussion Strategy Methodology Technique Why Business Analytics is Increasingly Ubiquitous 2

4 Is Business Analytics New? Not really Easy example: credit scoring is an early example of business analytics and has been around for many decades. 5 Deloitte Analytics Institute 2010 Deloitte LLP Is Business Analytics New? Predictive Modeling in Insurance isn t new but it s rate of adoption continues to increase. Property-Casualty insurers have been intensively using predictive models for ~ 15 years now. 1990s: Personal credit used to price, underwrite personal insurance. 2000s: scoring models used to price, underwriter commercial and professional liability insurance. Today: Life insurers are adopting or preparing to adopt predictive modeling techniques. 6 Deloitte Analytics Institute 2010 Deloitte LLP 3

5 Why Now? (Answer #1) Technology (Moore s Law) Cost of storage and computing power has decreased exponentially Data Third-party data is becoming increasingly available Companies are learning to do more with their internal data Software and algorithms Great analytic ideas keep coming from statistics, economics, machine learning, marketing, Free tools like R 7 Deloitte Analytics Institute 2011 Deloitte LLP Exponential Growth of Data Availability More and more data is gathered, stored, and available to be analyzed Time Challenges Challenging Times for Future Innovation Search 8 Deloitte Analytics Institute 2011 Deloitte LLP 4

6 Exponential growth of Statistical Computing Availability The open-source R statistical computing environment has experienced exponential growth The number of contributed packages has grown exponentially 9 Deloitte Analytics Institute 2011 Deloitte LLP Why Now? (Answer #2) OR: Clinical vs Actuarial Judgment the Motion Picture 10 Deloitte Analytics Institute 2011 Deloitte LLP 5

7 There s Something About Linda To illustrate, think about this person: Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations 11 Deloitte Analytics Institute 2011 Deloitte LLP There s Something About Linda Think about this person: Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations Now rank these possible scenarios in order of probability: Linda is active in the feminist movement Linda is a bank teller Linda is a bank teller and is active in the feminist movement 12 Deloitte Analytics Institute 2011 Deloitte LLP 6

8 There s Something About Linda Daniel Kahneman and Amos Tversky posed precisely this question to a group of people They found that 87% of the people thought that feminist bank teller was more probable than bank teller But this is logically impossible. Mathematical fact: So in particular: Prob(A&B) < Prob(B) Prob(feminist & bank teller) < Prob(bank teller) 13 Deloitte Analytics Institute 2011 Deloitte LLP Analytics Everywhere Neural net models are used to predict movie box-office returns based on features of their scripts Decision tree models are used to help ER doctors better triage patients complaining of chest pain. Predictive models are used to predict the price of different wine vintages based on variables about the growing season. Predictive models to help commercial insurance underwriters better select and price risks. Predict which non-custodial parents are at highest risk of falling into arrears on their child support. Predicting which job candidates will successfully make it through the interviewing / recruiting process and which candidates will subsequently retain and perform well on the job. Predicting which doctors are at highest risk of being sued for malpractice. Predicting the ultimate severity of injury claims. (Deloitte applications in green) 14 Deloitte Analytics Institute 2011 Deloitte LLP 7

9 A Practical Conclusion There is no controversy in social science which shows such a large body of quantitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing over 100 investigations, predicting everything from the outcome of football games to the diagnosis of liver disease, and when you can hardly come up with half a dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion. -- Paul Meehl, Causes and Effects of my Disturbing Little Book 15 Deloitte Analytics Institute 2011 Deloitte LLP Predictive Analytics: Fundamental Concepts 8

10 What Business Analytics Isn t One view of predictive modeling: We assemble a database, explore it, and fit models to it. DATA MODEL Is this a full conception of analytics? 17 Deloitte Analytics Institute 2011 Deloitte LLP What Business Analytics Is No: analytics projects do not begin with data they begin with a strategic issue or problem STRATEGY DATA MODEL 18 Deloitte Analytics Institute 2011 Deloitte LLP 9

11 What Business Analytics Is No: analytics projects do not begin with data they begin with a strategic issue or problem STRATEGY DATA MODEL DECISIONS And they do not end with a model they end with a model being used to make better decisions 19 Deloitte Analytics Institute 2011 Deloitte LLP What Business Analytics Is No: analytics projects do not begin with data they begin with a strategic issue or problem STRATEGY DATA MODEL DECISIONS And they do not end with a model they end with a model being used to make better decisions Therefore, the best analytics projects projects begin with business strategy and end with IT implementation, training, creating analytics-aided decision rules, and change management. 20 Deloitte Analytics Institute 2011 Deloitte LLP 10

12 Predictive Analytics: Three Levels of Discussion Strategy Profitable growth More consistent, accurate and economical underwriting Retaining, i cross-selling to preferred policyholders ld Marketing / prospecting Promoting wellness Methodology Model design Modeling process ( actuarial science ) ( data science ) Technique Supervised and unsupervised GLM vs. decision trees vs. support vector machines 21 Deloitte Analytics Institute 2011 Deloitte LLP Methodology and Technique How does data science need actuarial science? Target variable specification, preparation Model design Domain knowledge Variable creation, variable selection Model evaluation Knowing which ambiguities to focus on and which to ignore How does actuarial science need data science? Advances in computing, modeling techniques Programming with data Exploratory data analysis, data visualization ideas Ideas from other fields can be applied to insurance problems 22 Deloitte Analytics Institute 2011 Deloitte LLP 11

13 Semantics: Data Mining vs Predictive Modeling Data Mining : exploration and discovery Find interesting patterns in databases Open-ended, cast the net wide Data visualization often an important t part Can involve brute-force machine learning algorithms Example: discovery that personal credit correlates with risky insurance behavior. Predictive Modeling : apply statistical techniques in the light of the EDA, knowledge discovery phase Quantify & synthesize relationships found during knowledge discovery Example: use credit information in a traditional predictive model 23 Deloitte Analytics Institute 2011 Deloitte LLP At the Center of It All: Data Science Or: The Collision between Statistics and Computation Today, it is arguably appropriate to call certain aspects of propertycasualty ypractice data science. This often involves more than just knowing how to run regression models. Data science goes beyond: Traditional statistics Business intelligence [BI] Information technology Image borrowed from Drew Conway s blog 24 Deloitte Analytics Institute 2011 Deloitte LLP 12

14 Predictive Analytics: Understanding the Strategy A Great Book About Predictive Analytics 26 Deloitte Analytics Institute 2011 Deloitte LLP 13

15 The Strategic Use of Analytics in Life Insurance Like baseball, insurance is a numbers game. Both fields are awash in data and a lot of money is at stake! As with baseball, so in insurance: All of this data until fairly recently has not been used in strategic ways. Percent Declined 80% 70% 60% 50% 40% 30% Underwriting Model Lift Curve (Hypothetical) Life insurance example: Models can be built to assign probabilities of (e.g.) being declined or (e.g.) being assigned to the best underwriting class based on readily available data. 20% 10% 0% Model Decile (Predicted Value) 27 Deloitte Analytics Institute 2011 Deloitte LLP The Strategic Use of Analytics in Life Insurance Strategic application: data-driven underwriting 80% Underwriting Model Lift Curve (Hypothetical) 70% Models estimate the probabilities that a risk will fall 60% into the best / worst 50% underwriting classes. In this hypothetical example Worst 10% of risks have a >70% decline rate Percent Declined 40% 30% 20% 10% 0% Model Decile (Predicted Value) 28 Deloitte Analytics Institute 2011 Deloitte LLP 14

16 The Strategic Use of Analytics in Life Insurance Strategic application: data-driven underwriting 80% Underwriting Model Lift Curve (Hypothetical) 70% Models estimate the probabilities that a risk will fall 60% into the best / worst 50% underwriting classes. In this hypothetical example Worst 10% of risks have a >70% decline rate Best 10% of risks have <5% decline rate Percent Declined 40% 30% 20% 10% 0% Model Decile (Predicted Value) 29 Deloitte Analytics Institute 2011 Deloitte LLP Predictive Analytics: Understanding the Methodology 15

17 Three Central Concepts of Predictive Analytics Scoring engines aka algorithmic solutions aka predictive models Lift / Gain analysis Lift: how much better/worse than average are the observations with the best/worst scores? Gain: what percent of the outcomes are captures in the 10% of observations with the highest scores Out-of-sample tests Estimate how the model is likely to perform once implemented Unbiased estimate of model s predictive power 31 Deloitte Analytics Institute 2011 Deloitte LLP Scoring Engines Scoring engine: a formula that classifies or separates risks (or accounts or households ) into: Expected longevity Churn likelihoodlih Profitable vs unprofitable Utilization... (non)-linear equation f() of several predictive variables Produces continuous range of scores: Y = f(x 1, X 2,,X N ) Common model form: Y = g -1 ( + 1 X 1, + 2 X 2,+ + N X N ) 32 Deloitte Analytics Institute 2011 Deloitte LLP 16

18 What Powers a Scoring Engine? Scoring engine: Y = f(x 1, X 2,,X N ) The {X 1, X 2,,X N } are at least as 1 2 N important as the f(). Domain expertise, creativity, insight, all important A major part of the modeling process consists of variable creation and selection It is possible to generate many 100s of variables Knowing what is important t to create, how to create them, and what to do with them once they are created is the steepest part of the learning curve Why predictive modeling is an art as well as a science Tacit knowledge vs textbook knowledge 33 Deloitte Analytics Institute 2011 Deloitte LLP Model Evaluation: Lift Curve Analysis Sort the data by the model score Underwriting Model Lift Curve (Hypothetical) Break into 10 equal pieces Best decile : 10% of risks with the best scores Worst decile : 10% of risks with the worst scores Lift: average measured difference in the quantity we are trying to predict or estimate Lift = segmentation power Lift translates into the ROI of the modeling project Unlike such concepts as R 2 or AIC, lift has operational meaning Percent Declined 80% 70% 60% 50% 40% 30% 20% 10% 0% Model Decile (Predicted Value) 34 Deloitte Analytics Institute 2011 Deloitte LLP 17

19 Data Mining and The Curse of Dimensionality Data Mining has both positive and negative connotations. Done well, it can lead us to new insights that result in more powerful models Done naively, it can cause our models to be fooled by randomness The curse of dimensionality the more variables we consider the harder it becomes to search through the space of possible models. As the dimension increases, so does the side-length of the cube needed to cover r% of the data space. In 10 dimensions we need to cover 80% of the range of each coordinate to capture 10% of the data. (From The Elements of Statistical Learning by Hastie, Tibshirani, Friedman) 35 Deloitte Analytics Institute 2011 Deloitte LLP The Bias-Variance Tradeoff and the Problem of Overfitting Predictive modeling techniques cannot distinguish true patterns from random variation in the data. Error measured on the dataset used to fit the model is overly optimistic The Bias-Variance Tradeoff high bias low variance <----- low bias high variance -----> Which model would you choose? error prediction e train data test data Deloitte Analytics Institute complexity (# neural net cycles) 2011 Deloitte LLP 18

20 The Bias-Variance Tradeoff and the Problem of Overfitting Low bias : The model fit is good on the training data. The model s predicted value is close to the data s expected value High variance : The model might not make accurate predictions upon implementation Bias alone is not the name of the game The Bias-Variance Tradeoff high bias low variance <----- low bias high variance -----> game Out of sample validation is a way of guarding against overfitting. prediction error train data test data complexity (# neural net cycles) 37 Deloitte Analytics Institute 2011 Deloitte LLP Out-of-Sample Model Testing / Validation A Classic methodology: Divide the data into 3 pieces Randomly and/or using time Test data: use to measure gain/lift of candidate model Train Test Val Training data: used to fit models E.g. estimate parameters of a Generalized Linear Model or Cox Hazard Model Test data: used to measure gain/lift of candidate models Perform train/test steps iteratively until you have a model you are happy with During this iterative phase, validation is held in cold storage Validation Data: used measure lift on final selected model The test data was implicitly used during model selection process, therefore it is preferable to measure lift on a truly blind test sample Validation data is often an out-of-time sample 38 Deloitte Analytics Institute 2011 Deloitte LLP 19

21 So far Therefore we want to: Build a scoring model that results in a continuous range of scores that Produces valuable segmentation power on a blind-test data sample How do we get there? 39 Deloitte Analytics Institute 2011 Deloitte LLP Predictive Analytics: Understanding the Process 20

22 The High-Level Process Data scrubbing and Variable Creation Data Exploration and Modeling Model Analysis and Implementation The modeling phases we will discuss fit into the Cross-Industry Standard Process for Data Mining [CRISP-DM] 41 Deloitte Analytics Institute 2011 Deloitte LLP Data Scrubbing and Variable Creation Research possible data sources Extract/purchase data Check data for quality (QA) Messy! (still deep in the mines) Create Predictive and Target Variables Opportunity to quantify tribal wisdom and come up with new ideas Can be a very big task! Steepest part of the learning curve 42 Deloitte Analytics Institute 2011 Deloitte LLP 21

23 Types of Predictive Variables Insurance data sources can be viewed in an outward inward hierarchy. Geodemographic Population Density, traffic, weather, crime, education, Policy Characteristics Term, whole-life, Policyholder, Household Characteristics Age, gender, martial status, #kids, income, education Prior History / Behavior Medical history, life events, credit history, education/career choices, 43 Deloitte Analytics Institute 2011 Deloitte LLP Knowing Me, Knowing You Consumer data is rapidly increasing in: Volume Scope Specificity it REALITY The nature of available data is evolving rapidly. A 3-stage evolutionary process 44 Deloitte Analytics Institute 2011 Deloitte LLP 22

24 Data Exploration and Variable Triage The variable creation phase was hypothesis-driven expansion. This phase is contraction : begin the process of analyzing and winnowing the space of candidate variables. Data visualization / Exploratory Data Analysis is a key part of this. Use EDA to: Discard weak or poorly distributed variables Consider how to handle missing variables Transform variables categoricity, linearity, Investigate correlations amongst the candidate predictors Explore dimension reduction techniques (remember curse of dimensionality) Take note of variables relative predictive power 45 Deloitte Analytics Institute 2011 Deloitte LLP Model Design Model design is where actuarial science meets data science. Translate business problem into algorithmic solution. Common considerations: Which target variable to use? Which data points / types of policies to include/exclude? Which data sources / variable types to consider? What is the level of the model (policy? HH for cross-sell?) How should model be evaluated? Lift curves / gains charts Broken down by certain segments of business Compare with existing practice How to split data into train / test / validation samples Is data sufficiently credible? 46 Deloitte Analytics Institute 2011 Deloitte LLP 23

25 Multivariate Modeling Key initial step: appropriate model design Translate ultimate use of the model into appropriate model design With the model design in place, the iterative ti modeling process begins Picks up where EDA process leaves off: process of triaging, transforming, interacting candidate predictors continues. Multiple candidate models are built/tested Sample techniques: Regression / GLM / proportional hazard models CART / MARS / Random Forests, Neural nets, Support Vector Machines 47 Deloitte Analytics Institute 2011 Deloitte LLP Multivariate Modeling: In More Detail 1) Pair down full collection of candidate predictive variables to a more manageable set 2) Iterative modeling process Build candidate models on training data Evaluate on test data Many things to tweak Alternate modeling techniques Hybrid models, ensemble models Different target t variables Different predictive variables Data filtering / sampling criteria Technique-specific modeling options: distributional assumptions, weights and offsets, cost-complexity tuning parameters, neural net topology, decision tree splitting method, multiple random samples, 48 Deloitte Analytics Institute 2011 Deloitte LLP 24

26 Considerations Do signs/magnitudes of parameters make sense? Statistically significant? Sensible from a business point of view? Readily explainable Consistent with corporate philosophy? Is the population reasonably homogenous? If not does the model behave consistently / reasonably on both groups? Does the predictive power hold up within sub-segments Continuity Are there small changes in input values that might produce large swings in scores Make sure that end-users can t game the system 49 Deloitte Analytics Institute 2011 Deloitte LLP Model Analysis and Implementation Perform model analytics Necessary for end-users to gain comfort with the model Calibrate Models Create user-friendly scale client dictates Implement models Technology implementation deep IT skills crucial Business implementation plan use of business rules, reason messages, education Monitor performance Distribution of scores over time, predictiveness, usage of model... Collect reactions, suggestions from end-users Collect list of desired improvements for v2.0 Plan model maintenance schedule 50 Deloitte Analytics Institute 2011 Deloitte LLP 25

27 Predictive Analytics: Understanding the Techniques Supervised vs Unsupervised Learning Supervised Learning Estimate expected value of Y given values of X. Generalized Linear Models (GLM) Generalized Additive Models (GAM) Survival Models (Cox Proportional Hazard) Classification and Regression Trees (CART) Multivariate Adaptive Regression Splines (MARS) Random Forests Support Vector Machines Neural Nets Unsupervised Learning Attempt to find interesting patterns amongst the X. No outcome ( target ) variable Clustering, Self-organizing maps Correlation / Principal Components / Factor Analysis 52 Deloitte Analytics Institute 2011 Deloitte LLP 26

28 Classification vs Regression Classification Problems Goal is to segment the observations of interest into 2 or more distinct categories. Average, above average, below average profitability customers Likely fraudsters vs likely non-fraudsters Likely defectors vs likely non-defectors Regression Problems Goal is to predict a continuous amount. Dollars of loss for a policy Ultimate size of claim Years of retention Days out of work 53 Deloitte Analytics Institute 2011 Deloitte LLP Parametric vs Non-Parametric Parametric Statistics Involves the specification of a probabilistic model governing the process that generated the data. e.g. Poisson Regression: assume that the process generating g the number of claims for a given policy is a Poisson process. The mean ( λ ) of each policy s Poisson distribution is a linear combination of various attributes. Non-Parametric Statistics No probability model specified Purely algebraic e.g. classification trees use brute-force computing to create relatively pure segments of a dataset. No assumptions made about the target variable. 54 Deloitte Analytics Institute 2011 Deloitte LLP 27

29 Regression and its Relations Generalized Linear Models [GLM] Relax assumptions of Ordinary Least Squares [OLS] regression Normality exponential family Linearity linearity on scale link function scale (e.g. log scale) Logistic regression: binary outcomes Poisson regression: count outcomes Multivariate Adaptive Regression Splines [MARS] and Neural Nets Clever automatic ways of transforming and interacting candidate input variables Automates difficult job of selecting and transforming dozens or hundred of variables Account for fact that true relationships aren t necessarily linear Universal approximators: can model any functional form Downside: are black-box models (though MARS is much more interpretable) Classification and Regression Trees [CART] Tree-structured regression Simplification of MARS; easier to interpret 55 Deloitte Analytics Institute 2011 Deloitte LLP This presentation contains general information only and is based on the experiences and research of Deloitte practitioners. Deloitte is not, by means of this presentation, rendering business, financial, investment, or other professional advice or services. This presentation is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte, its affiliates, and related entities shall not be responsible for any loss sustained by any person who relies on this presentation. Copyright 2011 Deloitte Development LLC. All rights reserved. Member of Deloitte Touche Tohmatsu Limited 28

PREDICTIVE MODELLING FOR COMMERCIAL INSURANCE

PREDICTIVE MODELLING FOR COMMERCIAL INSURANCE PREDICTIVE MODELLING FOR COMMERCIAL INSURANCE General Insurance Pricing Seminar 13 June 2008 London James Guszcza, FCAS, MAAA jguszcza@deloitte.com General Themes Predictive modelling: 3 Levels of Discussion

More information

Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

More information

Model Validation Techniques

Model Validation Techniques Model Validation Techniques Kevin Mahoney, FCAS kmahoney@ travelers.com CAS RPM Seminar March 17, 2010 Uses of Statistical Models in P/C Insurance Examples of Applications Determine expected loss cost

More information

Risk pricing for Australian Motor Insurance

Risk pricing for Australian Motor Insurance Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model

More information

SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting

SOA 2013 Life & Annuity Symposium May 6-7, 2013. Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting SOA 2013 Life & Annuity Symposium May 6-7, 2013 Session 30 PD, Predictive Modeling Applications for Life and Annuity Pricing and Underwriting Moderator: Barry D. Senensky, FSA, FCIA, MAAA Presenters: Jonathan

More information

Predictive Modeling and Big Data

Predictive Modeling and Big Data Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Session 62 TS, Predictive Modeling for Actuaries: Predictive Modeling Techniques in Insurance Moderator: Yonasan Schwartz, FSA, MAAA

Session 62 TS, Predictive Modeling for Actuaries: Predictive Modeling Techniques in Insurance Moderator: Yonasan Schwartz, FSA, MAAA Session 62 TS, Predictive Modeling for Actuaries: Predictive Modeling Techniques in Insurance Moderator: Yonasan Schwartz, FSA, MAAA Presenters: Jean-Frederic Breton David A. Moore, FSA, MAAA Session 62:

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Anti-Trust Notice. Agenda. Three-Level Pricing Architect. Personal Lines Pricing. Commercial Lines Pricing. Conclusions Q&A

Anti-Trust Notice. Agenda. Three-Level Pricing Architect. Personal Lines Pricing. Commercial Lines Pricing. Conclusions Q&A Achieving Optimal Insurance Pricing through Class Plan Rating and Underwriting Driven Pricing 2011 CAS Spring Annual Meeting Palm Beach, Florida by Beth Sweeney, FCAS, MAAA American Family Insurance Group

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

How To Build A Predictive Model In Insurance

How To Build A Predictive Model In Insurance The Do s & Don ts of Building A Predictive Model in Insurance University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D. Agenda Travelers Broad Overview Actuarial & Analytics Career

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francis@data-mines.cm

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

How Will Predictive Modeling Change the P&C Industry in the Next 5-10 Years?

How Will Predictive Modeling Change the P&C Industry in the Next 5-10 Years? How Will Predictive Modeling Change the P&C Industry in the Next 5-10 Years? CAS Annual Meeting Seattle November, 2008 James Guszcza, FCAS, MAAA Deloitte Consulting The Problem with Prediction A grand

More information

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis

ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

How To Understand The Theory Of Probability

How To Understand The Theory Of Probability Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

TDWI Best Practice BI & DW Predictive Analytics & Data Mining TDWI Best Practice BI & DW Predictive Analytics & Data Mining Course Length : 9am to 5pm, 2 consecutive days 2012 Dates : Sydney: July 30 & 31 Melbourne: August 2 & 3 Canberra: August 6 & 7 Venue & Cost

More information

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges

A Basic Guide to Modeling Techniques for All Direct Marketing Challenges A Basic Guide to Modeling Techniques for All Direct Marketing Challenges Allison Cornia Database Marketing Manager Microsoft Corporation C. Olivia Rud Executive Vice President Data Square, LLC Overview

More information

Easily Identify Your Best Customers

Easily Identify Your Best Customers IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do

More information

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association

Addressing Analytics Challenges in the Insurance Industry. Noe Tuason California State Automobile Association Addressing Analytics Challenges in the Insurance Industry Noe Tuason California State Automobile Association Overview Two Challenges: 1. Identifying High/Medium Profit who are High/Low Risk of Flight Prospects

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

69 PD Underwriting Issues for Group Life and Disability Insurance. Moderator: Peter A. Heinrichs, FSA, MAAA

69 PD Underwriting Issues for Group Life and Disability Insurance. Moderator: Peter A. Heinrichs, FSA, MAAA 69 PD Underwriting Issues for Group Life and Disability Insurance Moderator: Peter A. Heinrichs, FSA, MAAA Presenters: Susan L. Ebertz Michael F. Vassar Mark R. Yoest, FSA, MAAA Underwriting Trends in

More information

REPORT DOCUMENTATION PAGE

REPORT DOCUMENTATION PAGE REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

The Last Mile Problem

The Last Mile Problem The Last Mile Problem How Data Science and Behavioural Science can Work Together IPAC Toronto March 4, 2015 James Guszcza, PhD, FCAS, MAAA Deloitte Consulting jguszcza@deloitte.com Two overdue sciences

More information

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4. Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

Business Analytics and Data Mining for CRM Business Analytics and Data Mining for CRM: Jumpstart workshop

Business Analytics and Data Mining for CRM Business Analytics and Data Mining for CRM: Jumpstart workshop : Jumpstart workshop Date and Place: Bangalore, Sep 1 st (Sat) and 2 nd (Sun) 2012 Registration Link: http://compegence.com/open-programs.php http://compegence.com/workshop-analytics-for-crm.php Audience:

More information

Data Analytical Framework for Customer Centric Solutions

Data Analytical Framework for Customer Centric Solutions Data Analytical Framework for Customer Centric Solutions Customer Savviness Index Low Medium High Data Management Descriptive Analytics Diagnostic Analytics Predictive Analytics Prescriptive Analytics

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Analytics in Action. What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012

Analytics in Action. What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012 Analytics in Action What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012 University of Cincinnati Tangeman University Center Theater Sponsored by LUCRUM, Inc. ABOUT

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

Data Mining: An Introduction

Data Mining: An Introduction Data Mining: An Introduction Michael J. A. Berry and Gordon A. Linoff. Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition, 2004 Data mining What promotions should be targeted

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

More information

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

SURVEY REPORT DATA SCIENCE SOCIETY 2014

SURVEY REPORT DATA SCIENCE SOCIETY 2014 SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell

THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether

More information

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Actuarial Science as Data Science A vision for actuarial science in the 21 st century

Actuarial Science as Data Science A vision for actuarial science in the 21 st century Actuarial Science as Data Science A vision for actuarial science in the 21 st century California Actuarial Student Congress UC Santa Barbara April 4, 2015 James Guszcza, PhD, FCAS, FSA Deloitte Consulting

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Floyd Ray Martin, FSA, MAAA Thomas A. McInteer, FSA, MAAA Jonathan P. Polon, FSA Dental Insurance Fraud Detection

More information

Business Analytics and Credit Scoring

Business Analytics and Credit Scoring Study Unit 5 Business Analytics and Credit Scoring ANL 309 Business Analytics Applications Introduction Process of credit scoring The role of business analytics in credit scoring Methods of logistic regression

More information

Data Mining Approaches to Modeling Insurance Risk. Dan Steinberg, Mikhail Golovnya, Scott Cardell. Salford Systems 2009

Data Mining Approaches to Modeling Insurance Risk. Dan Steinberg, Mikhail Golovnya, Scott Cardell. Salford Systems 2009 Data Mining Approaches to Modeling Insurance Risk Dan Steinberg, Mikhail Golovnya, Scott Cardell Salford Systems 2009 Overview of Topics Covered Examples in the Insurance Industry Predicting at the outset

More information

Cross Validation. Dr. Thomas Jensen Expedia.com

Cross Validation. Dr. Thomas Jensen Expedia.com Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract

More information

Learning Objectives for Selected Programs Offering Degrees at Two Academic Levels

Learning Objectives for Selected Programs Offering Degrees at Two Academic Levels Learning Objectives for Selected Programs Offering Degrees at Two Academic Levels Discipline Degree Learning Objectives Accounting 1. Students graduating with a in Accounting should be able to understand

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

Customer and Business Analytic

Customer and Business Analytic Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

More information

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Nine Common Types of Data Mining Techniques Used in Predictive Analytics 1 Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better

More information

MHI3000 Big Data Analytics for Health Care Final Project Report

MHI3000 Big Data Analytics for Health Care Final Project Report MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given

More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

A Marketer s Guide to Analytics

A Marketer s Guide to Analytics A Marketer s Guide to Analytics Using Analytics to Make Smarter Marketing Decisions and Maximize Results WHITE PAPER SAS White Paper Table of Contents Executive Summary.... 1 The World of Marketing Is

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

MA2823: Foundations of Machine Learning

MA2823: Foundations of Machine Learning MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr

More information

SAS Fraud Framework for Banking

SAS Fraud Framework for Banking SAS Fraud Framework for Banking Including Social Network Analysis John C. Brocklebank, Ph.D. Vice President, SAS Solutions OnDemand Advanced Analytics Lab SAS Fraud Framework for Banking Agenda Introduction

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

How To Perform An Ensemble Analysis

How To Perform An Ensemble Analysis Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

The Process of Predictive Modeling Dorothy L. Andrews Merlinos & Associates

The Process of Predictive Modeling Dorothy L. Andrews Merlinos & Associates Dorothy L. Andrews Merlinos & Associates Conceive Develop Test Implement Analyze My Background Actuary, ASA, MAAA MA Mathematical Statistics MA Mathematics & Education BA Mathematics Everett Curtis Huntington

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d. EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

More information

Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com

Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com Survival Data Mining Gordon S. Linoff Founder Data Miners, Inc. gordon@data-miners.com What to Expect from this Talk Background on survival analysis from a data miner s perspective Introduction to key

More information

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.1 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Classification vs. Numeric Prediction Prediction Process Data Preparation Comparing Prediction Methods References Classification

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Business Intelligence and Decision Support Systems

Business Intelligence and Decision Support Systems Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley

More information