ROOT: A data mining tool from CERN What can actuaries do with it?



Similar documents
MSCA Introduction to Statistical Concepts

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

The power of IBM SPSS Statistics and R together

Predict the Popularity of YouTube Videos Using Early View Data

Advanced analytics at your hands

Optimization: Optimal Pricing with Elasticity

ANALYTICS CENTER LEARNING PROGRAM

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Exploratory Spatial Data Analysis

Studying Auto Insurance Data

MSCA Introduction to Statistical Concepts

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Customer and Business Analytic

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Descriptive Statistics

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Notes for EER #4 Graph transformations (vertical & horizontal shifts, vertical stretching & compression, and reflections) of basic functions.

The Predictive Data Mining Revolution in Scorecards:

What Does the Normal Distribution Sound Like?

Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization

not possible or was possible at a high cost for collecting the data.

Temperature Scales. The metric system that we are now using includes a unit that is specific for the representation of measured temperatures.

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

5 Systems of Equations

Anti-Trust Notice. Agenda. Three-Level Pricing Architect. Personal Lines Pricing. Commercial Lines Pricing. Conclusions Q&A

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important

IBM SPSS Direct Marketing

Chapter 6. The stacking ensemble approach

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

The Edge Editions of SAP InfiniteInsight Overview

Vectors. Objectives. Assessment. Assessment. Equations. Physics terms 5/15/14. State the definition and give examples of vector and scalar variables.

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Fairfield Public Schools

Chapter 4 Online Appendix: The Mathematics of Utility Functions

Linear Programming. Solving LP Models Using MS Excel, 18

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

II. DISTRIBUTIONS distribution normal distribution. standard scores

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Microeconomics Sept. 16, 2010 NOTES ON CALCULUS AND UTILITY FUNCTIONS

A Property and Casualty Insurance Predictive Modeling Process in SAS

Lesson 4 What Is a Plant s Life Cycle? The Seasons of a Tree

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Diagrams and Graphs of Statistical Data

APPLICATION PROGRAMMING: DATA MINING AND DATA WAREHOUSING

Data Mining: An Introduction

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

Price Optimization. For New Business Profit and Growth

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Exploration Data Visualization

7 Time series analysis

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

STATISTICA Solutions for Financial Risk Management Management and Validated Compliance Solutions for the Banking Industry (Basel II)

TEACHING AGGREGATE PLANNING IN AN OPERATIONS MANAGEMENT COURSE

Modeling Lifetime Value in the Insurance Industry

Data Mining Applications in Higher Education

Principles of Data Mining by Hand&Mannila&Smyth

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Sanjeev Kumar. contribute

Professional Organization Checklist for the Computer Science Curriculum Updates. Association of Computing Machinery Computing Curricula 2008

Managerial Economics Prof. Trupti Mishra S.J.M. School of Management Indian Institute of Technology, Bombay. Lecture - 13 Consumer Behaviour (Contd )

Data, Measurements, Features

Data Mining Analytics for Business Intelligence and Decision Support

Business Intelligence. Data Mining and Optimization for Decision Making

Data Visualization Techniques and Practices Introduction to GIS Technology

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

EdExcel Decision Mathematics 1

MARS STUDENT IMAGING PROJECT

Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Part 1: Background - Graphing

Lesson 4: Solving and Graphing Linear Equations

The Scientific Data Mining Process

Questions: Does it always take the same amount of force to lift a load? Where should you press to lift a load with the least amount of force?

Support Vector Machines Explained

Personal Auto Predictive Modeling Update: What s Next? Roosevelt Mosley, FCAS, MAAA CAS Predictive Modeling Seminar October 6 7, 2008 San Diego, CA

Didacticiel Études de cas

Session 7 Bivariate Data and Analysis

Statistics Graduate Courses

Decision Trees from large Databases: SLIQ

GRAPH MATCHING EQUIPMENT/MATERIALS

Optimization Modeling for Mining Engineers

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

SAS Certificate Applied Statistics and SAS Programming

Charlesworth School Year Group Maths Targets

Transcription:

ROOT: A data mining tool from CERN What can actuaries do with it? Ravi Kumar, Senior Manager, Deloitte Consulting LLP Lucas Lau, Senior Consultant, Deloitte Consulting LLP Southern California Casualty Actuarial Club Fall Meeting Pasadena, California December 3, 2010 1 Background P&C insurance companies are increasingly becoming Data Driven organizations To understand their customers better To improve operational efficiencies To stay profitable To stay ahead of their competitors So, data volumes are growing at an annual rate of 60% or even more. Actuaries are finding it increasingly difficult to manage this data explosion that is happening around them, a data mining tool from CERN, offers a scalable solution that can help actuaries deal with large volumes of data 2 3 Demand for internal data Actuaries are using more detailed internal data in their day to day work Demand for external data Actuaries are constantly looking for new business insights that can be derived from external data Billing Claims Policy & Premium Pharmacy Lexis Nexis House hold information Claimant History Customer Name & Address Financial Census Broker & Vendor Notes (Text Mining) Medical Social Network Data Analysis Analysis Business Insights Business Insights

Data, data everywhere Conclusion 6 7 What is ROOT? Root as a Data Storage Framework Root is an efficient Data Storage framework specifically designed to handle Big Data Root stores data in a compressed columnar structure Root is a Data Visualization Tool with extensive data visualization capabilities Root can store complex objects, structures, arrays, images, in addition to simple data types Design is optimized for fast analysis capabilities Root is a Modeling Tool with extensive set of data fitting, modeling and analysis methods including data simulation For example: Root is a complete C++ programming environment Root comes with full memory management, networking, parallel processing capabilities for increased scalability 8 In this example, Root takes only 18% space compared to the corresponding ASCII file 9 Root as a Data Visualization Tool... Root as a Modeling Tool Root provides convenient graphic tools to visualize and explore even very large datasets Multiple Regression Maximum Likelihood Fitting General purpose function minimizers Can fit any user defined function to the data Neural Networks Decision and Regression Trees Boosting and Bagging Principal Component Analysis Rules based predictive learning Support vector machines Several other Classification/Regression Algorithms Histograms 1-, 2- and 3-Dimensional With interactive capabilities With Dynamic re-binning capabilities Curve Fitting Graphs Pie Charts QQ Plots Parallel Coordinates 10 11

Bundled Services Price Price of Single Service 7 0 0.5 1 1.5 2 Optimal Volume Originated ($B) 12 11 10 9 8 A B Current More about Root Root is available as a free open source software from http://root.cern.ch/drupal/ Root is supported on many operating systems like Windows, Unix, Linux, Mac OS etc. Root is used in many industries High Energy Physics community Wall Street trading companies(merrill Lynch, Renaissance Corp) Flight planning systems (MITRE) Insurance(Nationwide, Deloitte) Defense(USAF, DoD) Oil, geology(haliburton) Medical Imaging(Philips Medical) 12 13 What are you trying to optimize? Price Optimization is the process of calculating optimal set of prices, taking into consideration demand elasticity, customer preferences and cost of delivering products or services. 1. Define the Objective Function Profit = Quantity * (Price Cost) 2. Determine the Constraints Floor/ceiling prices Inter-product relationship rules (equal, greater than) based on attributes 3. Decide on strategy, trade-offs Revenue versus profit versus market share versus 4. Use Calculus methods (Lagrange multipliers, Linear Programming, etc) to solve for price for multiple strategic scenarios Solve for Prices that maximize Objective subject to Constraints 14 15 Putting Optimization Into Practice How do we optimize prices? 1. Profitability function formularization : Most institutions can accurately measure performance from an operational standpoint. However, they cannot accurately estimate the profitability of an individual transaction at the time of origination. This requires robust pro forma models that estimate expected income, costs, and risks over the lifetime of a transaction. Accurate transaction-level estimation of profitability is a pre-requisite for a successful pricing initiative. When implementing optimization, a single measure must be selected at a given point in time. Profit Profit Price Volume Variable Fixed = ( x ) - ( Cost + Cost ) 2. Incorporation of constraints to the benefit function: Once benefit function is estimated, restrictions are incorporated (based on internal policies, standard requirements, etc.) that may condition pricing policy. Max Price Increases/Decreases Regulatory Restrictions Supply Constraints. Pricing/Product Rules Translating this situation into optimization problem formulation, we see that the objective is to find the highest point on the hill. Therefore, objective function is the height achieved by the first boy with respect to his original position. The design variables are longitude and latitude the coordinates, defining position of the boy. The constraints are that the boy has to stay inside the fences. Note here, that in general, the boy may start the search from outside the fences. 3. Benefit optimization: The constrained objective function (e.g. profit, revenue, attrition rate, etc.) is optimized through simulation of multiple scenarios, generating efficient frontiers for a segment, market, product, or a combination thereof. Demand Models Benefits Function Efficient Frontier Revenue / Customer Benefit= f (P, Q, X) + Constraints Win Probability Annual Pre-Tax Interest Income (in $MM per $1B originated) 16 17

Data Exploratory in Price Elasticity Let s look at the traditional way to do it in SAS: proc univariate. Definition of Elasticity: the ratio of percentage change in quantity sold to percentage change in price Elasticity = QSold QSold P P In order to have an accurate estimate of elasticity, we need to examine the outliers of both numerator and denominator How do they distribute graphically? 18 19 Here is the graphical representation of distribution After the log transformation, the frequency of extreme values pops X-Axis: % change in Quantity Sold Y-Axis: Frequency X-Axis: % change in price Y-Axis: Frequency X-Axis: % change in Quantity Sold Y-Axis: Frequency (log scale) X-Axis: % change in price Y-Axis: Frequency(log scale) 20 The absolute count on Y-axis does not tell us about the frequency of extreme cases because they are too small relative to the majority. We need to do some transformation to Y-axis. SAS? or ROOT? 21 Just a right click on your mouse, the Y-axis has been transformed to log scale. This save the hassle of programming the log transformation and running the program in SAS! Modeling in Root Use TMinuitMinimizer to minimize the negative of the objective function in ROOT Root is a complete C++ programming environment, so that you can customize your objective function and make your optimization engine as proprietary as possible Root is an efficient tool in running optimization. We compared our internal optimization engines between R and Root. Root outperformed R significantly in terms of running time. 22 23

Conclusion Root is specifically designed to handle, store, and analyze large amounts of data very efficiently Root comes with many Statistical tools Data mining tools Data visualization tools Connectivity tools Bibliography http://root.cern.ch/drupal/ Main ROOT Website http://www.casact.org/pubs/forum/08wforum/kumar_tripathi.pdf Paper on CAS E-Forum introducing Root to the actuaries http://tmva.sourceforge.net/ Toolkit for Multivariate data Analysis with ROOT Root is suitable for any environment requiring serious and scalable data analysis solution 24 25