EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

E. J. Dadey
SSNIT, Research Department, Accra, Ghana

S. Ankrah, PhD Student
PGIA, University of Peradeniya, Sri Lanka

Abstract

This study investigates which probability distribution best fits the number of insurance claims. In particular, it compares the Poisson and negative binomial distributions to determine which better fits claim data obtained from two insurance companies in Ghana. Data on the number of claims under a funeral policy from 2006 to 2010 were used for the study. Probability distribution models and the parametric bootstrap method were employed in analysing the data. The negative binomial distribution was found to be superior to the Poisson distribution in fitting the claims data. The results also showed no significant difference between the estimates obtained from the probability models and the parametric bootstrap estimates.

Keywords: Poisson, Negative Binomial, Insurance Claims, Parametric Bootstrap

Introduction

Statistical methods are central to insurance because of the risk involved in allocating insurance funds. These funds may be invested in assets such as bonds and equities to raise real investment returns, so that claim payments and other financial obligations can be met. Appropriate statistical estimation is needed to obtain concrete information about the uncertain liabilities. Such estimates support sound decisions on asset allocation, expected monthly claims and payment targets, as well as future insurance pricing. Policyholders expect a cushion in the event of economic loss, as stipulated in the insurance contract, so meeting the payment terms is a matter of much concern to the insurer.
A statistical estimate of the expected claim liabilities of the insurance policies allows decisions on asset allocation and claim payment to be taken with less error. The main objective of the study is to explore probability models that adequately describe the number of claims occurring under funeral insurance policies in Ghana, and to use them to estimate the expected number of claims. The specific objectives include, but are not limited to, the following:

To identify and explain seasonal variation in the number of claims.
To compare the probability distribution estimates of the number of claims.
To derive the probability distribution model that best fits the number of claims for the funeral policies.
To construct bootstrap confidence intervals for the expected number of claims and compare them with the estimates obtained from the models.

METHODOLOGY

Secondary Data

The data were collected from Star Life and Metropolitan Life Insurance Company (referred to below as StarLife and MetLife). The data consist of the monthly recorded number of claims under a Family Funeral Insurance Policy over a period of five years (2006-2010). This policy was chosen because it is one of the most patronised insurance policies in Ghana. The two companies represent a major controlling force in Ghana's insurance sector.

Shapiro-Wilk Test for Normality

The Shapiro-Wilk test, proposed by Shapiro S. S. and Wilk M. B. (1965), calculates a W statistic that tests whether a random sample $x_1, x_2, \ldots, x_n$ comes from a normal distribution. Small values of W are evidence of departure from normality:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{1}$$

where the $x_{(i)}$ are the ordered sample values ($x_{(1)}$ is the smallest) and the $a_i$ are constants generated from the means, variances and covariances of the order statistics of a sample of size $n$ from a normal distribution. The null hypothesis is rejected if the test statistic W is too small, that is, if the p-value is less than the significance level $\alpha$.

The Kolmogorov-Smirnov Two-Sample Test

The two-sided Kolmogorov-Smirnov test tests the null hypothesis that two samples $x_1, x_2, x_3, \ldots$ and $x'_1, x'_2, x'_3, \ldots$, of sizes $n$ and $n'$, come from the same distribution. The test statistic is

$$D_{n,n'} = \sup_x \left| F_{1,n}(x) - F_{2,n'}(x) \right| \tag{2}$$

where $F_{1,n}$ and $F_{2,n'}$ are the empirical distribution functions of the first and second sample respectively. The null hypothesis is rejected at level $\alpha$ if the p-value is less than $\alpha$.

The Poisson Process and Distribution Function

A Poisson process N is a stochastic process, that is, a collection of random variables N(t) for each t in some specified set. More specifically, Poisson processes are counting processes: for each t > 0 we count the number of "events" that happen between time 0 and time t. The kind of event depends on the application.
For example, an event may be an insurance claim filed by a particular driver, a call to a help line, or a retirement from a particular employer. Whatever is meant by an "event", N(t) denotes the number of events that occur after time 0 up to and including time t > 0. The Poisson distribution arises when the inter-arrival times between events are independently and identically exponentially distributed, and is defined as follows. Let X be a random variable with a discrete distribution defined over $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$. X has a Poisson distribution with parameter $\lambda$, written $X \sim \text{Poisson}(\lambda)$, if and only if its probability function is given by

$$P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad \lambda \in \mathbb{R}^{+}, \; k = 0, 1, 2, \ldots \tag{3}$$

The Poisson distribution has expected value $E(X) = \lambda$ and variance $Var(X) = \lambda$.
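As a quick numerical check of the Poisson probability function and its moments, the following Python sketch sums the probabilities for an illustrative rate and confirms that the mean and the variance both equal the rate. The rate value and the function names are assumptions chosen for illustration; they are not quantities from the study.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), evaluated on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def pmf_moments(pmf_values):
    """Mean and variance of a distribution given its pmf on k = 0, 1, 2, ..."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))
    return mean, variance


lam = 4.5  # hypothetical claim rate, not an estimate from the study
# Truncate the support at k = 200; the omitted tail mass is negligible for this rate.
probabilities = [poisson_pmf(k, lam) for k in range(201)]
mean, variance = pmf_moments(probabilities)
print(f"total probability = {sum(probabilities):.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

A sample whose variance is clearly larger than its mean would therefore be at odds with this model.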

The equality of the mean and variance, referred to as equidispersion, is characteristic of the Poisson distribution and serves as the reference point for modelling count data. Modelling count data with the Poisson distribution requires independence (randomness) and homogeneity of the underlying events. If X and Y are independent with $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$, it follows that the random variable $Z = X + Y$ is distributed as $Z \sim \text{Poisson}(\lambda + \mu)$.

Negative Binomial Distribution

Negative binomially distributed counts arise when the generating process exhibits occurrence or duration dependence, or when the rate at which events occur is heterogeneous. A random variable X has a negative binomial distribution with parameters $\alpha > 0$ and $\theta > 0$, written $X \sim \text{Negbin}(\alpha, \theta)$, if the probability function is given by

$$P(X = k) = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\,\Gamma(k + 1)} \left(\frac{1}{1 + \theta}\right)^{\alpha} \left(\frac{\theta}{1 + \theta}\right)^{k}, \qquad k = 0, 1, 2, \ldots \tag{4}$$

where $\Gamma(\cdot)$ denotes the gamma function, $\Gamma(s) = \int_0^{\infty} z^{s-1} e^{-z}\,dz$ for $s > 0$. The probability generating function is

$$P(s) = \left[1 + \theta(1 - s)\right]^{-\alpha} \tag{5}$$

The mean and variance are given by

$$E(X) = \alpha\theta \tag{6}$$

and

$$Var(X) = \alpha\theta(1 + \theta) = E(X)(1 + \theta) \tag{7}$$

The value of $\theta$ is called the dispersion parameter and measures the dispersion in the count data. Since $\theta > 0$, the variance of the negative binomial distribution exceeds its mean; the distribution is therefore overdispersed. If X and Y are independent with $X \sim \text{Negbin}(\lambda, \theta)$ and $Y \sim \text{Negbin}(\mu, \theta)$, it follows that the random variable $Z = X + Y$ is negative binomially distributed, i.e. $Z \sim \text{Negbin}(\lambda + \mu, \theta)$. The negative binomial is preferable to the Poisson distribution in claim modelling because it accommodates overdispersion, which practical experience shows is commonly observed in insurance claim counts.

Goodness of Fit Test

Two statistics employed in assessing the goodness of fit of a given distribution are the scaled deviance and Pearson's chi-square statistic.
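Before turning to the deviance, the negative binomial probability function and its moments given above can be checked numerically. The sketch below evaluates the probabilities in the (α, θ) parametrisation and compares the resulting mean and variance with αθ and αθ(1 + θ). The parameter values and function names are assumptions made for illustration only.

```python
import math


def negbin_pmf(k, alpha, theta):
    """P(X = k) for X ~ Negbin(alpha, theta), with mean alpha * theta."""
    log_p = (
        math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
        - alpha * math.log1p(theta)                  # (1 / (1 + theta)) ** alpha
        + k * (math.log(theta) - math.log1p(theta))  # (theta / (1 + theta)) ** k
    )
    return math.exp(log_p)


def pmf_moments(pmf_values):
    """Mean and variance of a distribution given its pmf on k = 0, 1, 2, ..."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))
    return mean, variance


alpha, theta = 2.0, 3.0  # hypothetical parameters, not estimates from the study
# Truncate the support at k = 2000; the omitted tail mass is negligible here.
probabilities = [negbin_pmf(k, alpha, theta) for k in range(2001)]
mean, variance = pmf_moments(probabilities)
print(f"mean     = {mean:.6f}  (alpha * theta = {alpha * theta})")
print(f"variance = {variance:.6f}  (alpha * theta * (1 + theta) = {alpha * theta * (1 + theta)})")
```

The ratio of variance to mean equals 1 + θ, which is why θ acts as the measure of overdispersion relative to the Poisson case.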
For a fixed value of the dispersion parameter $\theta$, the scaled deviance is defined as twice the difference between the maximum achievable log-likelihood and the log-likelihood at the maximum likelihood estimates of the parameters. If $l(y, \mu)$ is the log-likelihood function expressed as a function of the predicted mean values $\mu$ and the vector $y$ of responses, then the scaled deviance is defined by

$$D^{*}(y, \mu) = 2\left[\,l(y, y) - l(y, \mu)\,\right] \tag{8}$$

which can be expressed for specific distributions as

$$D^{*}(y, \mu) = \frac{D(y, \mu)}{\theta} \tag{9}$$

where $D(y, \mu)$ is the deviance. For the Poisson and negative binomial distributions the deviance is given, respectively, by

$$2\sum_i w_i\left[\,y_i \log\left(\frac{y_i}{\mu_i}\right) - (y_i - \mu_i)\right] \tag{10}$$

and

$$2\sum_i w_i\left[\,y_i \log\left(\frac{y_i}{\mu_i}\right) - \left(y_i + \frac{1}{k}\right)\log\left(\frac{y_i + 1/k}{\mu_i + 1/k}\right)\right] \tag{11}$$

where the $w_i$ are weights and $k$ is the negative binomial dispersion parameter. The scaled deviance is approximately chi-square distributed with $n - 1$ degrees of freedom, where $n$ is the number of observations.

Parametric Bootstrap Estimation

The procedure for bootstrap estimation is outlined as follows:

1. Given a random sample $X = (x_1, \ldots, x_n)$, estimate the appropriate probability distribution and calculate the desired parameter estimate $\hat{\theta}$.
2. Draw a sample of size $n$ from the estimated probability distribution to obtain $X^{b} = (X_1^{b}, \ldots, X_n^{b})$.
3. Calculate the same statistic using the bootstrap sample from step 2 to get $\hat{\theta}^{b}$.
4. Repeat steps 2 through 3 B times (B being the number of resamples desired).
5. Use this estimate of the distribution of $\hat{\theta}$ (i.e., the bootstrap replicates) to obtain the desired characteristics as follows:

$$\hat{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}^{b}, \qquad \widehat{SE}_B(\hat{\theta}) = \sqrt{\frac{1}{B - 1}\sum_{b=1}^{B}\left(\hat{\theta}^{b} - \hat{\theta}^{*}\right)^{2}} \tag{12}$$

$$\widehat{bias}_B = \hat{\theta}^{*} - \hat{\theta}$$

and the bias-corrected estimator is given by

$$\hat{\theta}^{c} = \hat{\theta} - \widehat{bias}_B = 2\hat{\theta} - \hat{\theta}^{*} \tag{13}$$

Results and Conclusions

Seasonal Analysis

A study of the seasonal changes in the number of claims revealed an increasing pattern for both portfolios over the period. The highest numbers of claims were observed in July 2010 for StarLife and March 2010 for MetLife. Several months between 2006 and 2007 recorded no claims, which is attributed to the policy having been introduced in 2005: by the end of 2006 few policies had been sold. Figure 4.1 is a box plot of the monthly recorded number of claims by year for the years under study (2006-2010). The yearly distributions of the number of claims were all tailed and skewed, either to the left or to the right.
The annual distributions of the number of claims for MetLife were skewed to the right, while those of StarLife were skewed to the left in 2010 and to the right in the other years under review (2006-2009). The annual distributions were thus not bell shaped (normally distributed) in any of the years; all were tailed and skewed.

Figure 4.1: Boxplot of Yearly Number of Claims on Funeral Policy from StarLife and MetLife.

Considering the annual distributions during the active years (2008-2010), when most of the policies had been sold, the 2008 distribution of the number of claims from MetLife was heavily tailed, right skewed and platykurtic (fat arched), while that of StarLife was slightly tailed, right skewed and leptokurtic (slender arched). The 2009 distribution was tailed, right skewed and leptokurtic for MetLife, and tailed, right skewed and mesokurtic (normally peaked) for StarLife. Finally, in 2010 the distributions for both portfolios were similar: tailed, asymmetric and leptokurtic. Overall, the number of claims showed an increasing pattern across the years, with rising annual averages (see Table 4.1 for details).

Year        2006    2007    2008    2009    2010
StarLife    0.33    2.08    8.25    7.72    36.67
MetLife     0.33    5.33    22.8    5.83    39.53

Table 4.1: Trend of Annual Averages of Number of Claims from StarLife and MetLife

Figure 4.2: Quantile-Quantile plot of Number of Claims on Funeral Policy from StarLife and MetLife.
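The shape descriptions used above (left or right skewed; platykurtic, mesokurtic or leptokurtic) can be made concrete with moment-based coefficients of skewness and excess kurtosis. The following Python sketch is one possible way to compute and label them. The counts, the function names and the tolerance used to call a distribution symmetric or mesokurtic are assumptions for illustration; the paper does not state which coefficients or cut-offs were used.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness and excess kurtosis (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def describe_shape(data, tol=0.5):
    """Label the skewness and kurtosis; tol is an arbitrary illustrative cut-off."""
    skew, kurt = skewness_and_excess_kurtosis(data)
    if skew > tol:
        skew_label = "right skewed"
    elif skew < -tol:
        skew_label = "left skewed"
    else:
        skew_label = "roughly symmetric"
    if kurt > tol:
        kurt_label = "leptokurtic"
    elif kurt < -tol:
        kurt_label = "platykurtic"
    else:
        kurt_label = "mesokurtic"
    return skew_label, kurt_label


# Hypothetical monthly claim counts for one year, not data from the study.
monthly_counts = [0, 1, 1, 2, 2, 3, 3, 4, 6, 9, 15, 30]
skew, kurt = skewness_and_excess_kurtosis(monthly_counts)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
print(describe_shape(monthly_counts))
```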

Comparison of the Distributions

The preliminary analysis of the number of claims suggests that the distributions of both portfolios were not normal. This was confirmed by the Shapiro-Wilk test, which rejected the null hypothesis of normality with p-values of $.28 \times 10^{-8}$ for StarLife and $.89 \times 10^{-4}$ for MetLife. To examine whether the numbers of claims for the two insurance companies came from the same distribution, the QQ plot displayed in Figure 4.2 was constructed; its fairly linear trend suggests, preliminarily, that they do. Moreover, the p-value of $2.3 \times 10^{-1}$ obtained from the two-sided Kolmogorov-Smirnov test is not statistically significant, so the null hypothesis that the two samples of claim numbers come from the same distribution cannot be rejected.

Fitting Probability Distribution to Claims

The density estimates for the two portfolios, displayed in Figure 4.3, are similar in shape, but the StarLife curve is steeper and more slender than that of MetLife.

Figure 4.3: Density Estimate of Number of Claims on Funeral Policy from StarLife and MetLife.

Figure 4.4: Negative Binomial and Poisson Distribution Fit to the Number of Claims on Funeral Policy from StarLife and MetLife.
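The comparison between the two candidate models can be illustrated with a short Python sketch that fits both distributions to a set of counts and compares their log-likelihoods. To keep the example self-contained, the negative binomial parameters are obtained by the method of moments rather than by maximum likelihood, which is a simplification and not the estimation method reported in the study. The counts and all function names are hypothetical.

```python
import math
import statistics


def poisson_loglik(counts, lam):
    """Log-likelihood of independent Poisson(lam) counts."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, alpha, theta):
    """Log-likelihood of independent Negbin(alpha, theta) counts (mean alpha * theta)."""
    return sum(
        math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
        - alpha * math.log1p(theta)
        + y * (math.log(theta) - math.log1p(theta))
        for y in counts
    )


# Hypothetical monthly claim counts, not data from the study.
monthly_counts = [0, 0, 1, 3, 2, 5, 4, 8, 6, 9, 12, 7,
                  10, 15, 11, 18, 14, 22, 16, 25, 19, 30, 24, 38]

mean = statistics.mean(monthly_counts)
variance = statistics.variance(monthly_counts)

lam_hat = mean                      # the Poisson MLE is the sample mean
theta_hat = variance / mean - 1.0   # from Var(X) = E(X)(1 + theta); needs variance > mean
alpha_hat = mean / theta_hat        # from E(X) = alpha * theta

ll_poisson = poisson_loglik(monthly_counts, lam_hat)
ll_negbin = negbin_loglik(monthly_counts, alpha_hat, theta_hat)
print(f"sample mean = {mean:.3f}, sample variance = {variance:.3f}")
print(f"Poisson log-likelihood           = {ll_poisson:.3f}")
print(f"Negative binomial log-likelihood = {ll_negbin:.3f}")
```

For counts whose variance greatly exceeds the mean, as in this made-up series, the negative binomial log-likelihood is the larger of the two.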

Furthermore, the sample distribution, though unimodal, visibly has several turning points, which is not typical of conventional probability distributions. Ascribing this irregularity to outliers in the data set, smoothing (fitting) is warranted to recover the underlying probability distribution. Distributions that can be fitted to the observed data include the mixed Poisson distributions, starting with the Poisson distribution itself, as discussed in the literature.

Figures 4.4 and 4.5 were produced by fitting the negative binomial and Poisson distributions to the number of claims for StarLife and MetLife. A close look at these charts reveals that the Poisson distribution does not fit the number of claims well for either company, whereas the negative binomial distribution fits the number of claims reasonably well.

The maximum likelihood estimates of the Poisson mean for StarLife and MetLife were $\lambda_s = 3.05$ and $\lambda_m = 6.65$ respectively. The 95% confidence intervals for $\lambda_s$ and $\lambda_m$ were (2.672; 3.9968) and (5.6489; 7.752), and the log-likelihoods were 228.36 and 80.5978. For the negative binomial distribution, the means were estimated as $\lambda_s = 3.0500$ and $\lambda_m = 6.6500$, and the dispersion parameters as 2.066 and .4323 for StarLife and MetLife respectively. The variances of the random effect were estimated as $V_s(\Theta) = 0.49588$ and $V_m(\Theta) = 0.6988$ respectively. The 95% confidence intervals were (.2633; 2.7699) and (0.892; .9734) for the dispersion parameters, and (0.360; 0.796) and (0.5067; .22) for the variances $V(\Theta)$. The log-likelihoods of the negative binomial distribution were 667.5653 and 275.5353 for StarLife and MetLife respectively, considerably higher than those of the Poisson distribution for both companies.
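The goodness-of-fit assessment that follows relies on the deviance introduced in the methodology. As a minimal sketch, assuming the standard forms of the Poisson and negative binomial deviance with unit weights, the two quantities can be computed as below. The counts, the common fitted mean and the value of the dispersion parameter k are hypothetical, and the convention y log(y/μ) = 0 at y = 0 is applied.

```python
import math


def _y_log_ratio(y, mu):
    """y * log(y / mu), taken as 0 when y = 0."""
    return 0.0 if y == 0 else y * math.log(y / mu)


def poisson_deviance(y, mu):
    """Poisson deviance with unit weights: 2 * sum[y log(y/mu) - (y - mu)]."""
    return 2.0 * sum(_y_log_ratio(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))


def negbin_deviance(y, mu, k):
    """Negative binomial deviance with unit weights and dispersion parameter k > 0."""
    inv_k = 1.0 / k
    return 2.0 * sum(
        _y_log_ratio(yi, mi) - (yi + inv_k) * math.log((yi + inv_k) / (mi + inv_k))
        for yi, mi in zip(y, mu)
    )


# Hypothetical counts with a constant fitted mean, not data from the study.
counts = [0, 2, 3, 7, 18]
fitted = [sum(counts) / len(counts)] * len(counts)

dev_poisson = poisson_deviance(counts, fitted)
dev_negbin = negbin_deviance(counts, fitted, k=0.5)
print(f"Poisson deviance           = {dev_poisson:.3f}")
print(f"Negative binomial deviance = {dev_negbin:.3f}")
print(f"degrees of freedom (n - 1) = {len(counts) - 1}")
```

A deviance that is large relative to its degrees of freedom signals lack of fit; for overdispersed counts the Poisson deviance is typically the larger of the two.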
Finally, the confidence interval estimates of the monthly expected number of claims under the negative binomial model were approximately (9; 9) and (2; 23) claims for StarLife and MetLife (see Table 4.2 for details). Figure 4.6 provides summary statistics for the goodness of fit of the negative binomial distribution for both companies. The scaled deviances of 69.3796 and 7.2286 for StarLife and MetLife, compared with the asymptotic chi-square distribution with 59 degrees of freedom, yielded p-values of about 0.5, so the null hypothesis that the specified negative binomial model is the correct model cannot be rejected. The dispersion parameter estimates were again 2.066 and .4323 for StarLife and MetLife respectively.

Figure 4.5: PP Plot of the Distribution of Number of Claims, Negative Binomial and Poisson Distribution.
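The parametric bootstrap procedure outlined in the methodology, whose results are summarised in Table 4.3, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the counts are hypothetical, the negative binomial is fitted by the method of moments, samples are drawn through the gamma-Poisson mixture representation, and the statistic of interest is the mean number of claims. All function names are made up for the example.

```python
import math
import random
import statistics


def fit_negbin_moments(counts):
    """Method-of-moments fit: returns (alpha, theta) with mean alpha * theta."""
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)
    if variance <= mean:
        raise ValueError("moment fit needs overdispersed data (variance > mean)")
    theta = variance / mean - 1.0
    return mean / theta, theta


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def sample_negbin(alpha, theta, rng):
    """Gamma-Poisson mixture: rate ~ Gamma(shape=alpha, scale=theta)."""
    return sample_poisson(rng.gammavariate(alpha, theta), rng)


def parametric_bootstrap_mean(counts, n_boot=200, seed=1):
    rng = random.Random(seed)
    n = len(counts)
    estimate = statistics.mean(counts)             # step 1: statistic on the data
    alpha, theta = fit_negbin_moments(counts)      # step 1: fitted distribution
    replicates = []
    for _ in range(n_boot):                        # step 4: repeat B times
        sample = [sample_negbin(alpha, theta, rng) for _ in range(n)]  # step 2
        replicates.append(statistics.mean(sample))                     # step 3
    boot_mean = statistics.mean(replicates)        # step 5: summarise replicates
    bias = boot_mean - estimate
    return {
        "estimate": estimate,
        "boot_mean": boot_mean,
        "se": statistics.stdev(replicates),        # divisor B - 1
        "bias": bias,
        "bias_corrected": estimate - bias,
    }


# Hypothetical monthly claim counts, not data from the study.
monthly_counts = [0, 0, 1, 3, 2, 5, 4, 8, 6, 9, 12, 7,
                  10, 15, 11, 18, 14, 22, 16, 25, 19, 30, 24, 38]
result = parametric_bootstrap_mean(monthly_counts)
for name, value in result.items():
    print(f"{name:>14} = {value:.4f}")
```

A fixed seed is used so that repeated runs give the same replicates; a larger number of resamples reduces the Monte Carlo error of the bootstrap summaries.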

Company     Poisson Estimate    N. Binomial Estimate
StarLife    2.672; 3.9968       9.0494; 8.896
MetLife     7.752; 7.752        2.2227; 22.6804

Table 4.2: Interval Estimates of the Number of Claims from StarLife and MetLife

The deduction is that the numbers of claims from StarLife and MetLife are not purely independent, as required by the classical Poisson process. The number of claims is duration and occurrence dependent because the size of the portfolio varies constantly (increasing or decreasing) as a result of continuous sales of the policy. The assumption of independence of the random process is therefore violated, hence the inadequacy of the Poisson model. The negative binomial distribution, by contrast, produced considerably good fits, since it is the limiting form of the distributions that arise under occurrence and duration dependence, where it is known as the Pólya-Eggenberger distribution.

Figure 4.6: Criteria for assessing goodness of fit of the Negative Binomial model of the Number of claims.

In addition, the claim-number process is not homogeneous, as required by the Poisson distribution. The occurrence of death, which drives the number of claims, varies across policyholders as a result of unobservable social, moral, economic and health factors. The Poisson assumption of homogeneity is therefore also violated, and the negative binomial distribution, which arises as the limiting distribution under such heterogeneity, is adequate.

Bootstrap Estimates

Company     $\hat{\theta}$    $\hat{\theta}^{*}$    Bias       $\hat{\theta}^{c}$    $SE_B(\hat{\theta})$
StarLife    3.05              2.9978                -0.0522    3.022                 .9923
MetLife     6.65              6.7274                0.0774     6.5726                .783

Table 4.3: Summary statistics of Bootstrap replicates of Number of Claims from StarLife and MetLife.

Table 4.3 shows the summary statistics of the bootstrap estimates from 100 resamples drawn from the estimated probability distribution of the number of claims from StarLife and MetLife. A comparison between the expected monthly number of claims $\hat{\theta}$ obtained by the

probability models and the bias-corrected estimate $\hat{\theta}^{c}$ obtained from the bootstrap method showed no significant variation, as both were set at 3 and 7 claims for StarLife and MetLife respectively.

Conclusion

The negative binomial distribution appears to be superior to the Poisson distribution for fitting insurance claims and therefore provides reasonably reliable estimates for planning, decision making and estimation in insurance administration. The bootstrap estimates did not differ materially from the estimates obtained from the probability models. This research focuses only on choosing between the Poisson and negative binomial distributions for fitting insurance claims and estimating the monthly expected number of claims. Bootstrap estimates should be obtained and compared with the estimates from the probability models to validate them. Further work should consider other models, including mixed Poisson probability models.