exSPLINE That: Explaining Geographic Variation in Insurance Pricing
Carol Frigo and Kelsey Osterloo, State Farm Insurance

ABSTRACT

Generalized linear models (GLMs) are commonly used to model rating factors in insurance pricing. The integration of territory rating and geospatial variables poses a unique challenge to the traditional GLM approach. Generalized additive models (GAMs) offer a flexible alternative based on GLM principles, with a relaxation of the linearity assumption. We explore two approaches for incorporating geospatial data in a pricing model using a GAM-based framework. The ability to incorporate new geospatial data and improve traditional approaches to territory ratemaking results in further market segmentation and a better match of price to risk. Our discussion highlights the use of the high-performance GAMPL procedure, which is new in SAS/STAT 14.1 software. With PROC GAMPL, we can incorporate the geographic effects of geospatial variables on target loss outcomes. We illustrate two approaches. In the first approach, we begin by modeling the predictors as regressors in a GLM, and subsequently model the residuals as part of a GAM based on location coordinates. In the second approach, we model all inputs as covariates within a single GAM. Our discussion compares the two approaches and demonstrates visualization of model outputs.

INTRODUCTION

The practice of rating policies based on their geographic territory is part of most insurance rating plans. Traditionally, territories have been built from government boundaries such as county, city, and ZIP code. Because these territories are large geographic areas, much of the high-level variation can be explained, but more information can still be extracted to account for the deeper, granular differences between policyholders. The current approach treats territories as spatial grid boundaries; that is, as square grid cells based on latitude and longitude ranges. To model the observed losses in these grid cells, we currently credibility-weight each cell with its neighboring grid cells. Our proposed methods explore generalized additive models (GAMs) to capture the multi-directional variation and incorporate new location variables to better address the individuality of each policy.

GENERALIZED ADDITIVE MODELS

Generalized linear models (GLMs) relax the assumptions of general linear models by allowing link functions that transform the mean of the response variable, and distributions other than the normal. GAMs [1] go one step further by relaxing the linearity assumption to allow smoothing functions. In a GAM, the response can depend on both parametric and nonparametric predictors. The parametric predictors can be linear effects involving continuous and classification variables, as in GLMs. The nonparametric predictors can be smooth functions of one or more continuous variables. A GAM can be expressed as

   g(μ) = f_0 + f_1(x_1) + f_2(x_2, x_3) + ... + f_p(x_p)

where μ = E(Y), g(.) is a link function applied to the mean, and each f_p(.) characterizes either a linear effect or the dependency of a smoothing function in the neighborhood of x_p. The smoothing functions can take a variety of forms, such as local average, running average, running line, kernel-based, and spline. The increased flexibility of the GAM allows better fits to contoured data that are not easily parameterized in a linear or higher-degree polynomial model.
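As a concrete preview of this form, the single-stage pure premium model developed later in this paper (a Gamma response with a log link, linear temperature and precipitation effects, and a bivariate spatial smooth) can be written as shown below; the coefficient symbols are ours and are used only for illustration:

   log(E[PurePrem]) = β_0 + β_1·Temp + β_2·Precip + f(Latitude, Longitude)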
The spline terms can be univariate or multivariate, where the multivariate splines include the interaction of the variables, thus capturing more of the joint variation in the data.
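As a minimal sketch of this distinction (the dataset and variable names are hypothetical, not the application data), a univariate and a bivariate spline term are requested in PROC GAMPL as follows:

   /* Hypothetical example: spline(x1) requests a univariate smooth of x1, while
      spline(x2 x3) requests a bivariate smooth that captures the joint variation
      of x2 and x3, including their interaction. */
   PROC GAMPL data=example;
      MODEL y = spline(x1) spline(x2 x3);
   RUN;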
GAM MODELING IN SAS: THE GAMPL PROCEDURE

GAMPL, which stands for Generalized Additive Model by Penalized Likelihood estimation, is a new high-performance procedure in SAS/STAT 14.1 software [2]. It is not simply the high-performance version of PROC GAM; the two procedures have different underlying regression equations. PROC GAM uses smoothing splines and is considered better for smaller amounts of data with known degrees of freedom. The GAMPL procedure uses regression splines and is better suited to searching for the optimal degrees of freedom and automatic model building.

In the procedure's MODEL statement, each regressor must be enclosed within SPLINE() or PARAM(). SPLINE() specifies spline functions of continuous regressors, and PARAM() specifies linear effects of continuous or classification regressors.

To begin with a simple example, we can model latitude and longitude against pure premium (dollars of loss per policy, aggregated to the grid-cell level). Latitude and longitude values are rounded to the hundredths place, effectively creating .01 x .01 (or 1 km by 1 km) square grid cells. Our dataset contains over 600,000 observations, each a unique latitude/longitude combination, covering the entire state. The modeling code for this example is:

   PROC GAMPL data=statex_crime;
      MODEL PurePrem = spline(Latitude Longitude / maxdf=200 maxknots=20000);
      OUTPUT out=statex_output pred=predicted;
   RUN;

Here, we have a bivariate spline constructed from latitude and longitude, creating a smoothed rating surface over the geographic space. The procedure's default maximum degrees of freedom for a bivariate spline term is 20 (the default for univariate splines is 10), and the default maximum number of knots is 2,000. Due to the complexity of the data, we need more than the default degrees of freedom and knots to appropriately capture the variation, and we have therefore expanded the search area, requesting that the optimal degrees of freedom be selected between one and two hundred and that the optimal number of knots be at most 20,000.

The degrees of freedom (DF) are a measure of nonlinearity and can be non-integer because of the effects of the spline. DF are reported both for the model as a whole and individually for each spline term. Smaller model DF imply a simpler, easier-to-interpret model. Spline DF close to one indicate that smoothing is not needed; the variable has a linear relationship with the target and should instead be included as a parametric term. The formula found in the Details section [3] of the documentation for the GAMPL procedure provides an intuitive way to parameterize the complexity of the spline model. The formula is a natural extension of the definition of degrees of freedom for linear and generalized linear models, where the degrees of freedom are simply the number of parameters.

While PROC GAMPL is a high-performance procedure that executes very quickly even in a non-distributed environment, it should be noted that computation speed is highly dependent on the complexity of the model: additional multivariate splines and higher degrees of freedom require extra computing time.
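As a hedged sketch of how these reported quantities can be pulled into datasets for comparison across runs (the PERFORMANCE statement options and the FitStatistics ODS table name are assumptions to verify against the documentation or an ODS TRACE listing), the example above can be extended as follows:

   ods trace on;                              /* write the names of the ODS tables the procedure creates to the log */
   PROC GAMPL data=statex_crime;
      MODEL PurePrem = spline(Latitude Longitude / maxdf=200 maxknots=20000);
      PERFORMANCE nthreads=4 details;         /* assumed: request multithreading and a timing breakdown */
      ods output FitStatistics=gampl_fit;     /* assumed ODS table name; captures the fit criteria for this run */
   RUN;
   ods trace off;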
TWO PROPOSED MODELING SOLUTIONS

Now that we have identified the need for an additive model and how best to implement it, we can focus on different methodologies for incorporating additional variables. In this section, we highlight two approaches for modeling the spatial aspect of our dependent variable. To conceptualize the difference between these two approaches, we analyze crime loss data for Homeowners policies in a single state. Using the two methods defined below, we can demonstrate the improvement gained by creating territory rates based on two different implementations of splines, as opposed to a locally weighted average. Before we detail these approaches, we must first define our variables and discuss how they are handled in the model.

MODELING SPECIFICATIONS AND DEFINITION OF INSURANCE TERMS

As previously stated, the model target is pure premium (PurePrem), which is defined as loss per exposure in a grid cell. Latitude and longitude combine to create 1 km by 1 km square grid cells.
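A hedged sketch of how such a grid-cell dataset might be assembled is shown below; the policy-level input dataset policy_losses and its Loss column are hypothetical, while the rounding rule and the pure premium definition follow the text:

   /* Round coordinates to the hundredths place to form .01 x .01 degree (roughly 1 km) cells. */
   data policy_grid;
      set policy_losses;                     /* hypothetical policy-level input */
      Latitude  = round(Latitude, 0.01);
      Longitude = round(Longitude, 0.01);
   run;

   /* Sum losses and exposures within each grid cell. */
   proc summary data=policy_grid nway;
      class Latitude Longitude;
      var Loss LossExpo;
      output out=statex_crime(drop=_type_ _freq_) sum=;
   run;

   /* Pure premium = loss per exposure in the cell. Grid cells with no policyholders are
      added separately (not shown) with missing PurePrem so that predictions are still
      generated for them. */
   data statex_crime;
      set statex_crime;
      if LossExpo > 0 then PurePrem = Loss / LossExpo;
   run;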
Other variables included in the modeling process are defined as follows:

LossExpo: loss exposure. For this Homeowners illustration, we use earned house years. A value of one means the policy was in force for the entire term; fractions reflect the portion earned before the policy lapsed.
Temp: annual temperature range.
Precip: average annual precipitation.

To highlight the differences in the modeling approaches, the same variables are used in both methods, and in the same manner (that is, as linear univariate regressors). The target variable is adjusted to account for all other factors in our rating plan (such as construction type, home security system, and so on). Numerous missing values are present in the target variable, indicating that there are no policyholders at that location. While these grid cells are not used to fit the model, they are included in the dataset so that a predicted value is generated. Due to the right-skewed nature of our continuous target, we use a Gamma distribution with a log link.

METHOD ONE: THE GLM-GAM APPROACH

The GLM-GAM approach is a two-stage modeling process. In the first stage, we fit a GLM to our linear terms. In the second stage, we fit a GAM on the same target using the remaining nonlinear spatial variables, with the score of the first model as an offset. Method one is similar in concept to a spatial error model: the main regressors are modeled first, and a second model is run on the residuals to extract the spatial aspect. The advantage of this two-stage approach is that it separates the conventional GLM and variable-selection process from the spline process, making it easier to implement. The disadvantage is that, in the second model, everything from the first-stage model is included in the offset; all of the regression coefficients are assumed to be fixed. If the offset has a structure to it from the assumed linear model, and if the original model is over-fitting, this introduces an artificial pattern into the rest of the modeling process.

The HPGENSELECT procedure [4] is used for fitting and building our generalized linear model. The SAS code is as follows:

Stage One:

   PROC HPGENSELECT data=StateX_Crime;
      ID Latitude Longitude PurePrem LossExpo;
      MODEL PurePrem = Temp Precip / dist=gamma link=log;
      WEIGHT LossExpo;
      OUTPUT out=stage1output pred=glm_pred;
   RUN;

Stage Two:

   PROC GAMPL data=stage1output;
      MODEL PurePrem = spline(Latitude Longitude / maxdf=200 maxknots=20000) /
            dist=gamma link=log offset=glm_pred;
      WEIGHT LossExpo;
      OUTPUT out=method1output pred=method1_pred;
   RUN;
METHOD TWO: THE SINGLE GAM APPROACH

In contrast to the GLM-GAM approach, a single-stage GAM fits all explanatory variables at once and does not require the non-spatial terms to be linear. The advantage of using a single approach is that all of the variation is captured: the parameter and smoothing-function estimates are calculated jointly rather than fixing the estimates and associated errors at different stages of the modeling process. The disadvantage of this approach is that the increased complexity of the model may make it harder to conceptualize and explain. Spline functions consist of multiple basis functions, each with its own parameter, and spline terms are usually interpreted as a single entity (not as individual spline basis parameters). It should be noted that the procedure generates plots of smoothing components as a visual aid for interpreting the fitted spline terms. The SAS code for the Single GAM approach is:

   PROC GAMPL data=StateX_Crime;
      MODEL PurePrem = spline(Latitude Longitude / maxdf=200 maxknots=20000)
                       param(Temp) param(Precip) / dist=gamma link=log;
      WEIGHT LossExpo;
      OUTPUT out=method2output pred=method2_pred;
   RUN;

METHODOLOGY APPLICATION: TERRITORIAL RATEMAKING

The best way to visualize the results is through heat maps, here created using the new HEATMAPPARM statement in the SGPLOT procedure. The city names and points are superimposed using an annotate dataset created from the SAS predefined map datasets [5]. A wider variety of colors indicates more segmentation: an increased ability to achieve a wider range of rates. As shown in the thermometer to the right of the graphs, dark purple is associated with lower values, and red with higher values. For the visual comparison, the predicted pure premium values for each grid cell are transformed into relativities. This is done by dividing each grid cell's pure premium by the statewide pure premium, so that a relativity of one is the state average, and values below one denote losses lower than the state average. Our goal is to increase the segmentation in our territorial ratemaking and capture more of the granular variation. We can use heat maps along with the model output to assess the improvement of the two methods over the current approach, as well as to see each methodology's impact on the results.
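A hedged sketch of this visualization step follows; it assumes the Method Two output dataset above still carries the coordinate and exposure columns (they can be merged back from the input data if not), and the exposure weighting of the statewide average and the color list are our own illustrative choices:

   /* Convert grid-cell predictions into relativities against an exposure-weighted
      statewide average (the weighting choice is an assumption). PROC SQL remerges
      the summary statistic onto every row, so each cell is divided by the same value. */
   proc sql;
      create table method2_rel as
      select Latitude,
             Longitude,
             method2_pred / (sum(method2_pred * LossExpo) / sum(LossExpo)) as relativity
      from method2output;
   quit;

   /* Draw the heat map: each .01 x .01 grid cell is colored by its relativity. */
   proc sgplot data=method2_rel;
      heatmapparm x=Longitude y=Latitude colorresponse=relativity /
         colormodel=(purple blue green yellow red);   /* purple for low values, red for high */
   run;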
METHOD COMPARISON

To assess the differences in model fit, we use a few different metrics. GCV (generalized cross validation) is a model-fitting criterion used with splines and is similar to AIC: smaller is better. The metric is relative, lending itself to comparisons of various model runs. The roughness penalty is also a relative metric, measuring the penalization from the functional space bounded by the maximum number of DF. In our case, both methods have the same maximum DF. Therefore, the model with the lower roughness penalty will actually have more parameters (as is seen more directly in the model DF value), since it is penalized less for its complexity.

   Fit Statistics                          Method One    Method Two
     AIC                                   204,           ,791
     GCV
     Degrees of Freedom
     Roughness Penalty                     27,071        26,204
   Spline Effects: Latitude & Longitude
     Smoothing Parameter
     Degrees of Freedom
     Roughness Penalty                     27,071        26,204

Table 1. Fit Statistics

The plots below demonstrate a dramatic improvement in addressing the granular variation through both the GLM-GAM and Single GAM approaches. Methods one and two still have pockets of higher relativities around the major cities, but they have different degrees of relativity variation for the surrounding areas, mainly due to the inclusion of additional spatial variables. The difference between the two methods seen in the lower right-hand corner of the graph is attributed to using an offset in the GLM-GAM approach: part of the variation has been fixed.

Output 1. Heat Maps (panels: Current Method; Method One: GLM-GAM Approach; Method Two: Single GAM Approach)

All fit statistics point to the single approach producing the better model. The Single GAM has a lower roughness penalty (lower penalization for its complexity), a smaller GCV criterion (indicating a better fit), and a smaller smoothing parameter (allowing for more granular predictions). The single model is slightly more nonlinear, but it more accurately predicts the pure premium, especially along the coast. The Single GAM model results follow the current methodology, with the added benefit of more granularly modeling the areas between the major cities, as well as within the cities themselves.

CONCLUSION

Generalized additive models offer a flexible way to incorporate smoothing splines for modeling geospatial variables. PROC GAMPL not only aids in the fine-tuning of the generalized additive model, but also seamlessly allows for the inclusion of parametric terms and additional spline terms, something that is not so effortlessly handled with other implementation approaches. Both the single-stage and two-stage approaches produce reasonable results, given the advantages and disadvantages of each methodology. For our particular application, the Single GAM approach best accomplishes our goal: increased segmentation over the current methodology to better capture the low-level variation between policyholders.

REFERENCES

[1] Wood, S. (2006). Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman & Hall/CRC.

[2] SAS Institute Inc. (2015). "The GAMPL Procedure." SAS/STAT 14.1 User's Guide. Cary, NC: SAS Institute Inc.
[3] SAS Institute Inc. (2015). "The GAMPL Procedure." SAS/STAT 14.1 User's Guide. Cary, NC: SAS Institute Inc.

[4] SAS Institute Inc. (2015). "The HPGENSELECT Procedure." SAS/STAT 14.1 User's Guide. Cary, NC: SAS Institute Inc.

[5] SAS Institute Inc. (2015). SAS/GRAPH 9.4: Reference, Fourth Edition. Cary, NC: SAS Institute Inc.

ACKNOWLEDGMENTS

Special thanks to Bob Rodriguez and Weijie Cai of SAS Institute.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Carol Frigo
Property and Casualty Actuarial Department
State Farm Insurance Companies
One State Farm Plaza
Bloomington, Illinois
[email protected]

Kelsey Osterloo
Property and Casualty Actuarial Department
State Farm Insurance Companies
One State Farm Plaza
Bloomington, Illinois
[email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
