Model Efficiency Through Data Compression




Model Efficiency Through Data Compression
Actuaries Club of the Southwest, Houston, TX
November 15, 2012
Trevor Howes, FCIA, FSA, MAAA
VP & Actuary, GGY AXIS

Agenda
- The need for model efficiency
- Types of model efficiency
- Data compression by clustering
- How it works
- Possible refinements to the basic method
- Advantages and disadvantages
- Comparison to traditional grouping

Model Efficiency: The Challenge
The run time vs. resource cost challenge is faced by many applications, especially:
- Existing valuation and required capital calculations for VAs (AG 43, C3 Phase II, FAS 133, SOP 03-1)
- Pending valuation and required capital calculations for life products under PBA
- Economic Capital, MCEV, and future Solvency II
- Hedging (Greek calculation) and simulated hedging within actuarial valuation and projections
  o On projected inforce to simulate a hedging strategy
  o Double nested projections for current valuation/capital
  o Triple nested projections for projected valuation/capital
- Any large-scale, time-consuming modeling application

An IT Manager's View of an Actuarial Model
As a result, actuarial models are rapidly growing in size, complexity and demand for IT resources. "FEED ME!" (Inspiration: Audrey II from Little Shop of Horrors)

Building a Larger Grid
Adding more cores to your farm is not always the best option:
- Expensive: servers, memory, network connections, operating software, power, cooling, security, IT support
- Hardware has a limited lifetime (performance, reliability)
- Scalability may be an issue with some software
- Consolidating results becomes more difficult as grids increase:
  o More nodes feeding results simultaneously into a single DB
  o Stretches the capacities and speed of disk drives
- Disaster recovery implies a duplicate facility built, tested, then kept idle for guaranteed availability within a day

How to Find Greater System Efficiency?
1. Improved performance hardware, new technology
   o May require redesigning software or recoding in a new language
   o May place restrictions on size or design
2. Optimize the software implementation for greater efficiency
   o Should have no impact on results
3. Model efficiency techniques: find ways to do less work while achieving acceptable model accuracy
Note: The Model Efficiency Work Group (MEWG) was created in 2007 by the AAA SVL2 Committee to support the PBA project. Main MEWG web page on the AAA website: www.actuary.org/content/academy%e2%80%99s-model-efficiency-work-group

Preferred Model Efficiency Techniques
Attractive characteristics:
- Greatest efficiency for the least amount of error added
- Both mathematical and intuitive support
- Impact should be fully understood and readily quantifiable
- Easily incorporated into regular production routines without manual intervention
- Flexibility in application: the option to specify a reduction factor and turn it on/off
- Useful throughout projections with nested stochastics, not just for current-date calculations
- Useful for multiple applications and purposes

How to Find Greater System Efficiency?
What types of model efficiency techniques are used?
- Model simplifications and approximations
  o Less frequent time steps
  o Ignoring or simplifying assumptions
- Scenario generation and selecting reduced scenarios
- Compressing asset and liability inforce models
- Hybrid techniques (data and scenario compression)
- All of the above in combination

Model Compression Techniques
Many companies use the traditional grouped-data approach:
- Groups of similar or identical policies are treated as one model point
- Groupings are formed by predefined criteria such as specific values or ranges of values; for example:
  o average issue age reflecting a 5- or 10-year issue-age range
  o average issue date within 3 months or a full calendar year
  o similar plan codes represented by the most common plan code
- A model point is a set of identical policies reflecting the average or most common characteristics of the grouping
- A scaling factor, typically based on policy count, reflects the size of the group
Traditional grouping may still be the simplest and best method in some cases!
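The traditional grouping above can be sketched in a few lines. This is a minimal illustration, not any platform's actual implementation; the field names (`issue_age`, `plan_code`, `face_amount`) are assumptions. Policies are bucketed by issue-age band and plan code, and each bucket becomes one model point scaled by policy count:

```python
from collections import defaultdict

def group_policies(policies, age_band=5):
    """Compress a seriatim file into model points by (issue-age band, plan code).

    policies: list of dicts with 'issue_age', 'plan_code', 'face_amount'.
    Returns one model point per bucket, scaled by policy count.
    """
    buckets = defaultdict(list)
    for p in policies:
        # predefined criteria: 5-year issue-age band and plan code
        buckets[(p["issue_age"] // age_band, p["plan_code"])].append(p)

    model_points = []
    for (_band, plan), members in buckets.items():
        n = len(members)
        model_points.append({
            "issue_age": sum(m["issue_age"] for m in members) / n,    # average issue age
            "plan_code": plan,                                        # shared by construction
            "face_amount": sum(m["face_amount"] for m in members) / n,
            "scaling_factor": n,                                      # policy-count scaling
        })
    return model_points
```

Note that the ranges must be chosen up front; as the later slides discuss, this is one of the constraints clustering removes.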

Model Compression Techniques
Clustering is an advanced compression technique with some attractive features vs. traditional grouping:
- Reported to have potential for significant efficiencies in some lines of business and some applications
- Included in a 2011 SOA-funded research project
  o See the report by Ernst & Young, available from: www.soa.org/research/research-projects/life-insurance/research-2011-11-model-eff.aspx
- Also reported in various Section newsletter articles in recent years; e.g. The Financial Reporter, June 2010 issue: www.soa.org/library/newsletters/financial-reporter/2010/june/frn-2010-iss81.pdf

Clustering Compression Technique
Clustering is a generic technique used in data analysis. For model compression in actuarial applications, an agglomerative clustering algorithm is typically used:
- Each policy starts as its own cluster
- The distance (similarity) between every pair is calculated using chosen location values (characteristics) for each policy
- The least important cluster, by size and distance from its nearest cluster, is merged into that nearest cluster
- The original policy in the cluster becomes the representative policy of the cluster; a scaling factor is calculated to reflect the total volume of the cluster
- The clustering process continues until the target ratio (number of clusters) is met
- Better representative policies may be found when finished
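The steps above can be sketched as a brute-force loop. This is an illustrative sketch only (names and the importance rule are assumptions): importance is taken as size times distance to the nearest cluster, the least important cluster is merged into its nearest neighbour, and the loop stops at the target ratio:

```python
import math

def cluster(policies, sizes, target_ratio):
    """Greedy agglomerative clustering sketch.

    policies: list of normalized location-variable tuples, one per policy
    sizes: importance/size measure per policy (e.g. fund value)
    target_ratio: fraction of points to keep, e.g. 0.2 for 5:1 compression
    Returns (representative index, scaling factor) per surviving cluster.
    """
    # each policy starts as its own cluster
    clusters = {i: {"rep": i, "size": sizes[i]} for i in range(len(policies))}

    def dist(a, b):  # distance between cluster representatives
        return math.dist(policies[clusters[a]["rep"]], policies[clusters[b]["rep"]])

    target = max(1, int(len(policies) * target_ratio))
    while len(clusters) > target:
        # least important cluster = smallest (size * distance to nearest cluster)
        best = None
        for a in clusters:
            near_d, near_b = min((dist(a, b), b) for b in clusters if b != a)
            importance = clusters[a]["size"] * near_d
            if best is None or importance < best[0]:
                best = (importance, a, near_b)
        _, a, b = best
        clusters[b]["size"] += clusters[a]["size"]  # merge a into its nearest cluster b
        del clusters[a]

    # scaling factor inflates the representative to the cluster's total size
    return [(c["rep"], c["size"] / sizes[c["rep"]]) for c in clusters.values()]
```

A production implementation would cache the distance table (the N-squared triangle mentioned later) rather than rescanning all pairs on every merge; this sketch favours clarity over speed.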

Clustering Process Illustration
Step 1: Seriatim data assigned location variables and size
- 20 seriatim policies of varying sizes and characteristics
- Target compression ratio of 20% (5:1)
[Scatter plot of policies by location variables LV X and LV Y]

Clustering Process Illustration
Step 2: Distances between all pairs calculated
- Distances determine the order of clustering, from closest to furthest

Clustering Process Illustration
Step 3: Closest pairs identified
[Chart: the five closest pairs, numbered 1-5]

Clustering Process Illustration
Step 4: Least important policies clustered with their larger adjacent points
[Chart: clustered policies; three clusters of size 2]

Clustering Process Illustration
Step 5: Next closest pairs identified
[Chart: the next closest pairs, numbered 6-10]

Clustering Process Illustration
Step 6: Least important policies clustered with their larger adjacent points
[Chart: clustered policies; five clusters of size 2]

Clustering Process Illustration
Step 7: Remaining closest pairs identified to reach the target
[Chart: remaining pairs, numbered 11-16]

Clustering Process Illustration
Step 8: Final clustering performed
- Target compression level of 20% reached
[Chart: final clusters, with sizes including 7, 8 and 4]

Clustering Process Illustration
Result: Highly compressed model alongside the original model
- 20 original policies compressed into 4 clusters
- Representative policies may need to be reset
[Chart: compressed model vs. original model]

Refinements to Basic Clustering
- The total portfolio is divided into segments; clustering is applied only within each segment
- Different target compression ratios may be set by segment
- Segments with a zero compression target are left as seriatim
- Segment sizes may impact the performance of the clustering step, since the size of the distance table (triangle) grows with N squared
- Multiple clustering processes with different target ratios may be used to generate compressed models for different purposes, or for research into impact; one compression run may generate multiple compressed models

Refinements to Basic Clustering
Choice of distance measure: the measure of similarity that determines the order of clustering
- Distance calculations are based on comparing location variables
- Choices include:
  o Euclidean distance (square root of the sum of squares)
  o Square of the Euclidean distance
  o Sum of the absolute values of the differences
- Different measures may isolate or combine outliers more effectively
- Each location variable will need to be normalized; you may also want to apply weightings
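The three distance choices, with min-max normalization and per-variable weights, might look like the following sketch (function names are illustrative, not from any modeling platform):

```python
def normalize(policies):
    """Min-max normalize each location variable to [0, 1].

    policies: list of equal-length tuples of location-variable values.
    Without this step, variables on large scales (e.g. fund value)
    would dominate variables on small scales (e.g. attained age).
    """
    cols = list(zip(*policies))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid divide-by-zero
    return [tuple((v - l) / s for v, l, s in zip(p, lo, span)) for p in policies]

def euclidean(a, b, w):
    """Square root of the weighted sum of squared differences."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)) ** 0.5

def squared_euclidean(a, b, w):
    """Squared Euclidean distance: penalizes outliers more heavily."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))

def manhattan(a, b, w):
    """Weighted sum of absolute differences: less sensitive to outliers."""
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))
```

Raising a weight on one location variable makes clustering reluctant to merge policies that differ on it, which is one way to protect a variable the results are known to be sensitive to.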

Refinements to Basic Clustering
Choosing the measure of size and importance
- A volume or size measure must be selected and used to accumulate the total size of each cluster and to set the scaling factor applied to the representative policy
  o Usually face amount or fund value, but others are possible
  o Only the measure used for scaling can exactly replicate the portfolio; other measures will be approximate
- A variation would be to allow each representative policy's values to be adjusted to the cluster average
  o Greater effort, increased risk (need to ensure that the adjusted policy is internally consistent, realistic, etc.)
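A small worked example of the replication point above, assuming fund value is chosen as the scaling measure (the two-policy cluster and its numbers are invented for illustration):

```python
def scaling_factor(cluster, measure):
    """Factor that makes the representative reproduce the cluster's total
    for the chosen measure. cluster[0] is assumed to be the representative."""
    total = sum(p[measure] for p in cluster)
    return total / cluster[0][measure]

cluster = [
    {"fund_value": 100.0, "face_amount": 250.0},  # representative policy
    {"fund_value": 300.0, "face_amount": 500.0},  # merged into the cluster
]

factor = scaling_factor(cluster, "fund_value")     # 400 / 100 = 4.0
scaled_fund = factor * cluster[0]["fund_value"]    # 400.0: matches the true total exactly
scaled_face = factor * cluster[0]["face_amount"]   # 1000.0 vs. a true total of 750.0
```

Fund value is replicated exactly because the factor was derived from it; face amount (and any other measure) comes along only approximately, which is why the scaling measure should be the one the application is most sensitive to.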

Refinements to Basic Clustering
Using different linkage rules (how is distance to a cluster defined?)
- Which policy to use?
  o The original representative policy when first merged?
  o The nearest policy, or the largest policy?
  o The policy closest to the centroid?
- How often is the representative policy reset for the cluster?
  o Only after the target ratio is reached?
  o At predefined stages of compression or numbers of steps?
  o Every step?
Note: all relevant distances may have to be recalculated for changed clusters unless the original distance table is retained
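One of the options listed above, resetting the representative to the member closest to the cluster centroid, could be sketched as follows (a hedged illustration; the size-weighted centroid is one plausible choice, not prescribed by the source):

```python
import math

def reset_representative(members, locations, sizes):
    """Pick the cluster member nearest the size-weighted centroid.

    members: indices of the policies in one cluster
    locations: normalized location-variable tuples, indexed by policy
    sizes: size measure per policy, used to weight the centroid
    """
    total = sum(sizes[i] for i in members)
    dims = range(len(locations[members[0]]))
    centroid = [sum(sizes[i] * locations[i][d] for i in members) / total
                for d in dims]
    # representative = actual member closest to the centroid (so it stays a
    # real, internally consistent policy rather than a synthetic average)
    return min(members, key=lambda i: math.dist(locations[i], centroid))
```

Resetting only after the target ratio is reached avoids the repeated distance recalculation the note above warns about, at the cost of clustering decisions being made against stale representatives.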

Clustering vs. Grouping
As in traditional grouping, a central policy represents the whole cluster by scaling up the calculated policy results:
- Grouping typically scales by policy count
- Clustering scales by a user-selected measure
Clustering differs from traditional grouping in that:
- It can work across typical boundaries (sex, underwriting class, plan)
- There is no need to predefine groupings or ranges of ages and issue dates
- A target compression ratio may be set, with compression continuing until it is met
- It is easy to increase or decrease the compression ratio to achieve acceptable distortion vs. full seriatim
- It is easy to generate multiple data models at different compression levels for different purposes in one process

Clustering Case Study in FR Section Newsletter
Variable annuity block with GMAB, GMDB, GMIB, GMWB
- 100,000+ policies, $9.5 billion fund value
- Compression ratios tested: 100%, 5.0%, 2.5%, 1.0%, 0.25%, 0.05%

Clustering Case Study in FR Section Newsletter
Variable annuity block running AG 43
- 100,000+ policies, $9.5 billion fund value
- GMAB, GMDB, GMIB, GMWB
- The case study used location variables calculated across 5 representative scenarios: some easily obtained from in-force files, others requiring some seriatim preprocessing

Clustering Case Study in FR Section Newsletter
Note the varying impact of compression by benefit type

Clustering Investigation or Implementation
- Clustering algorithms are generally available in software used for data analysis and statistics (e.g. MATLAB)
- Available in at least two common modeling platforms
- Preferably it should be easily integrated into your production process and require little manual intervention
- You will need research to discover:
  o the optimal choices of segments, location variables and target ratios for each product portfolio and application
  o the corresponding distortion in application results

Model Efficiency Through Data Compression
THANK YOU!