ISBA 2000, Proceedings. ISBA and Eurostat, 2001

Bayesian Estimation of Joint Survival Functions in Life Insurance

ARKADY SHEMYAKIN and HEEKYUNG YOUN
University of St. Thomas, Saint Paul, MN, USA

Abstract: The insurance industry has recently experienced high demand for life insurance policies issued to married couples, with the payoff due at the second spouse's death. The fair pricing of such policies is an example of an insurance problem requiring the construction of two- or higher-dimensional survival functions. Unfortunately, while a lot of information is available regarding univariate survival functions, little data can be found that allows estimation of the association between them. The assumption of independent univariate survival functions of the spouses is not supported by empirical data. The parametric copula models currently used for bivariate constructions are based on the spouses' physical ages only. They do not explicitly address such factors as "common disaster" and "broken heart", which are related to real (chronological) time. We suggest a modification of existing copula models using a Bayesian approach, which allows us to incorporate existing information concerning the univariate survival functions. Numerical results are presented for MLE and Bayesian estimation. A direction for further research is suggested.

Keywords: WEIBULL SURVIVAL FUNCTION; HOUGAARD COPULA; JOINT LAST SURVIVOR INSURANCE.

1. JOINT LAST SURVIVOR INSURANCE

In recent years, the insurance industry has experienced increased demand for life insurance policies issued to female-male pairs (mostly married couples) with the benefit payoff due at the death of the second spouse (joint last survivor policy). These policies are generally used by older couples for estate tax purposes and carry large amounts of insurance. The problem of fair pricing for such policies requires the evaluation of a two-dimensional survival function for a married couple.
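To price such insurance, we can use the following formula for the last survivor insurance premium $P_{\overline{x_1 x_2}}$, derived through the calculation of annuity values $\ddot{a}_{\overline{x_1 x_2}}$:

$$P_{\overline{x_1 x_2}} = \frac{1}{\ddot{a}_{\overline{x_1 x_2}}} - \frac{i}{1+i}, \qquad \ddot{a}_{\overline{x_1 x_2}} = \sum_{k=0}^{\infty} (1+i)^{-k}\, {}_kp_{\overline{x_1 x_2}}, \qquad (1)$$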
where $x_1$ and $x_2$ are, respectively, the female and male ages at the policy issue date (entry ages), ${}_kp_{\overline{x_1 x_2}}$ is the probability that the "last survivor status" survives for $k$ years after the issue date, and $i$ is the interest rate.

2. DEFINITIONS AND NOTATION

Let us denote by

$$S(t_1, t_2) = P(T_1 > t_1,\ T_2 > t_2)$$

the joint survival function of the spouses, where $T_1$ is the wife's lifelength (age at death) and $T_2$ is the husband's lifelength (age at death). Let

$${}_tp_{x_j} = P(T_j > x_j + t \mid T_j > x_j), \qquad j = 1, 2,$$

be the female and male conditional survival probabilities for given entry ages $x_j$. The premium computations for joint wife-husband policies (see, e.g., (1) or Bowers et al. (1997)) require the estimation of survival probabilities for the joint last survivor status:

$${}_tp_{\overline{x_1 x_2}} = P(T_1 > x_1 + t \ \text{or}\ T_2 > x_2 + t \mid T_1 > x_1,\ T_2 > x_2). \qquad (2)$$

If we assume independence of the spouses' lifelengths (Model I in Section 6), then we can write

$${}_tp_{\overline{x_1 x_2}} = {}_tp_{x_1} + {}_tp_{x_2} - {}_tp_{x_1}\, {}_tp_{x_2}.$$

In general this is not true. We can, however, express the probabilities in (2) in terms of the joint survival function:

$${}_tp_{\overline{x_1 x_2}} = \frac{S(x_1 + t, x_2) + S(x_1, x_2 + t) - S(x_1 + t, x_2 + t)}{S(x_1, x_2)}. \qquad (3)$$

Bowers et al. (1997) recommend a simpler formula, which is practically applicable assuming partial independence between $T_1$ and $T_2$:

$${}_tp_{\overline{x_1 x_2}} = {}_tp_{x_1} + {}_tp_{x_2} - {}_tp_{x_1 x_2}, \qquad (4)$$

where

$${}_tp_{x_1 x_2} = P(T_1 > x_1 + t,\ T_2 > x_2 + t \mid T_1 > x_1,\ T_2 > x_2). \qquad (5)$$

3. DATA DESCRIPTION

In the following sections we suggest two approaches to the estimation of probabilities (2) and use a numerical example for illustration. The data set we use comes from 14,947 joint last survivor annuity contracts of a large Canadian insurer. The contracts were in payoff status over the observation period December 29, 1988 through December 31, 1993. For each contract, we have information on:
- the date of birth,
- the date of death (if applicable),
- the date of contract initiation (entry age),
- sex of each annuitant (paired data).

Each couple was included in the data set only once; multiple contracts issued to the same couple were eliminated. Information from 11,457 pairs (mostly married couples) was used in the study. Following insurance industry practice, we rounded the entry ages $x_1$ and $x_2$ and the ages at death $T_1$ and $T_2$ to the nearest integer. We should point out that our data are left truncated and right censored, which causes some additional difficulties. Additionally, current female and male mortality tables were provided courtesy of a major Minnesota insurance company. They were used for elicitation of the hyperparameters of the Bayesian models.

4. MLE IN COPULA MODEL

In some recent studies (see Hougaard (1986), Hougaard et al. (1992), Frees et al. (1996)), the method of copula functions was suggested for the construction of joint survival functions. According to this method, the joint survival function of $T_1$ and $T_2$ is represented as

$$S(t_1, t_2) = C\big(P(T_1 > t_1),\ P(T_2 > t_2)\big),$$

where $C(u, v)$ is a copula: a function with special properties that combines the univariate survival functions $u$ and $v$ through an association parameter. Frees et al. (1996) suggest using two-parameter Gompertz or Weibull univariate female and male survival functions and Frank's copula

$$C(u, v) = u + v - 1 + \frac{1}{\alpha} \ln\left(1 + \frac{\big(e^{\alpha(1-u)} - 1\big)\big(e^{\alpha(1-v)} - 1\big)}{e^{\alpha} - 1}\right)$$

with the association parameter $\alpha \neq 0$ ($\alpha \to 0$ for independent univariate lifelengths). Then the maximum likelihood estimator is constructed for the 5-dimensional parameter vector, whose first four components are the parameters of the Gompertz or Weibull univariate survival functions and whose last component is the association parameter $\alpha$. An alternative is Hougaard's copula

$$C(u, v) = \exp\left\{-\left[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}\right]^{1/\alpha}\right\}$$

with the association parameter $\alpha \geq 1$ ($\alpha = 1$ in case of independence). Following the general approach of Frees et al. (1996), we construct a Weibull-Hougaard copula model.
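The two copulas named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `frank_copula` and `hougaard_copula` are ours, Frank's copula is written in the survival form with the sign convention in which the association parameter tends to 0 under independence, and Hougaard's copula uses association at least 1.

```python
import math

def frank_copula(u, v, alpha):
    """Frank's copula in survival form; alpha != 0 (alpha -> 0 is independence)."""
    num = math.expm1(alpha * (1.0 - u)) * math.expm1(alpha * (1.0 - v))
    return u + v - 1.0 + math.log1p(num / math.expm1(alpha)) / alpha

def hougaard_copula(u, v, alpha):
    """Hougaard's copula; alpha >= 1 (alpha == 1 is independence)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-s ** (1.0 / alpha))

# Hypothetical marginal survival probabilities, chosen only for illustration.
u, v = 0.8, 0.6
print(hougaard_copula(u, v, 1.0))    # independence: equals u * v
print(hougaard_copula(u, v, 2.0))    # positive association: larger than u * v
print(hougaard_copula(u, 1.0, 2.0))  # margin preserved: equals u
print(frank_copula(u, v, 1e-6))      # close to u * v
```

Both functions return `u` when the other argument equals 1, which is the margin-preserving property that the discussion of drifting univariate parameters relies on.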
We assume that the joint bivariate survival function $S(t_1, t_2)$ has the form of Hougaard's copula combining two-parameter Weibull univariate female and male survival functions with scale parameters $\theta_j$ and shape parameters $\beta_j$, $j = 1, 2$:

$$S(t_1, t_2) = \exp\left\{-\left[\left(\frac{t_1}{\theta_1}\right)^{\beta_1 \alpha} + \left(\frac{t_2}{\theta_2}\right)^{\beta_2 \alpha}\right]^{1/\alpha}\right\}. \qquad (6)$$

Then we build the maximum likelihood estimates of the five parameters of this model, taking account of the right censoring and left truncation; the estimates are presented in Table 3. Finally, we can evaluate probabilities (2) directly from formula (3).
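As a rough illustration of how formula (3) is evaluated once the five parameters are known, the sketch below codes the Weibull-Hougaard joint survival function and the resulting last-survivor probability. The names `joint_survival` and `last_survivor_prob`, the keyword names `beta1`, `theta1`, `beta2`, `theta2` (shape and scale) and `alpha`, and the entry ages and horizon are our assumptions; the numerical parameter values are the Model ML column of Table 3.

```python
import math

def joint_survival(t1, t2, beta1, theta1, beta2, theta2, alpha):
    """Hougaard copula over two Weibull survival functions (the joint model)."""
    h1 = (t1 / theta1) ** (beta1 * alpha)
    h2 = (t2 / theta2) ** (beta2 * alpha)
    return math.exp(-(h1 + h2) ** (1.0 / alpha))

def last_survivor_prob(t, x1, x2, **params):
    """Formula (3): at least one spouse survives t more years after entry."""
    S = lambda a, b: joint_survival(a, b, **params)
    return (S(x1 + t, x2) + S(x1, x2 + t) - S(x1 + t, x2 + t)) / S(x1, x2)

# Model ML estimates from Table 3; entry ages 65 and 68 and the 20-year
# horizon are hypothetical values used only for illustration.
ml = dict(beta1=9.96, theta1=89.51, beta2=7.65, theta2=85.98, alpha=1.64)
print(last_survivor_prob(20, 65, 68, **ml))
```

With `alpha=1.0` the same function reduces to the independence formula of Section 2, which gives a quick sanity check of an implementation.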
Unfortunately, there are issues that raise a question of the adequacy of either maximum likelihood estimation or the copula approach as it is used in this situation. Let us illustrate these issues using our data set in the following two subsections.

4.1. Age and Chronology: Shape of the Copula Surface

The graphs below depict the surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S(t_1, t_2) / \partial t_1 \partial t_2)$ built according to model (6), with the scale and shape parameters estimated by MLE (see Table 3). The association $\alpha$ is allowed to vary.

Figure 1. Shapes of Copula Surfaces
Figure 1.1. Surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S(t_1, t_2) / \partial t_1 \partial t_2)$ with $\alpha = 1$
Figure 1.2. Surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S(t_1, t_2) / \partial t_1 \partial t_2)$ with $\alpha = 2$
Figure 1.3. Surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S(t_1, t_2) / \partial t_1 \partial t_2)$ with $\alpha = 10$
In the density surfaces of Figure 1 one can observe a "ridge" corresponding to higher values of the joint density, approximately along the diagonal $(t, t - 10)$. The higher the association, the steeper the slope of the ridge. According to copula model (6), the pairs $(t_1, t_2)$ under this ridge are the ages of a higher life hazard for a couple. Thus a higher association means a higher life hazard for a woman when she is approximately 10 years older than her husband was when he died.

However, at least a part of the association between the times of the spouses' deaths is due to common-disaster or broken-heart factors. Therefore, one could expect an increased number of cases in which the spouses' deaths occur closely one after the other in real (chronological) time. This increased mortality corresponds to ages $(t, t + d)$, where $d$ is the actual age difference between the spouses. These points, depending on $d$, lie along different diagonals in the $(t_1, t_2)$ plane, and not directly under the ridge seen in the graphs above. This casts some doubt on whether common-disaster and broken-heart factors are adequately represented by copula model (6).

According to the previous argument, we may suspect that the misrepresentation of the association by the copula model leads to its underestimation by MLE. This should be relatively easy to trace, because there is a direct relationship between the association $\alpha$ in Hougaard's model and Kendall's nonparametric correlation $\tau$, namely $\tau = 1 - 1/\alpha$ (see, e.g., Frees and Valdez (1998) and Youn and Shemyakin (1999)).

4.2. Drifting Univariate Parameters

How would underestimation of $\alpha$ in model (6) affect the MLEs of the shape and scale parameters of the underlying two-parameter Weibull distributions? A property of a copula function $C(u, v)$ is $C(u, 1) = u$, $C(1, v) = v$, so it preserves the marginal survival functions. Therefore one expects that, whatever $\alpha$ is, model (6) will give us accurate estimates of $\beta_j$ and $\theta_j$. However, the following table demonstrates the result of fixing a value of $\alpha$ and then obtaining the MLEs of $\beta_j$ and $\theta_j$.

Table 1.
MLE of Weibull Parameters for Given $\alpha$

                    FEMALE                           MALE
alpha    shape   scale   mean    s.d.     shape   scale   mean    s.d.
1.0       9.95   92.68   88.15   10.65     7.95   86.34   81.29   12.13
1.2      10.27   91.17   86.83   10.19     7.98   86.15   81.12   12.06
1.4      10.25   90.22   85.92   10.10     7.87   86.04   80.96   12.20
1.6      10.02   89.60   85.25   10.23     7.69   85.98   80.81   12.44
1.8       9.68   89.19   84.73   10.51     7.46   85.96   80.67   12.77
2.0       9.30   88.91   84.32   10.86     7.21   85.96   80.53   13.16
3.0       7.36   88.27   82.78   13.28     5.91   85.99   79.71   15.66

There is a clear downward trend in the scale parameter values as $\alpha$ increases, which is detected even more clearly in the mean lifelengths (the "mean" columns). This gives reason to doubt that underestimation of $\alpha$ has indeed no effect on the estimation of $\beta_j$ and $\theta_j$.
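The mean lifelengths in Table 1 follow from the Weibull shape and scale through the standard identity mean = scale x Gamma(1 + 1/shape). The short sketch below applies it to the first row of the table (female, association 1: shape 9.95, scale 92.68, reported mean 88.15); the helper name `weibull_mean_sd` is ours.

```python
import math

def weibull_mean_sd(shape, scale):
    """Mean and standard deviation of a two-parameter Weibull distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 * g1)

# First row of Table 1: female shape 9.95 and scale 92.68.
mean, sd = weibull_mean_sd(9.95, 92.68)
print(round(mean, 2), round(sd, 2))
```

Because the mean depends on both parameters, a drift in the fitted shape and scale with the association shows up in it as a single summary number, which is why the trend is easier to see in the mean columns.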
5. BAYESIAN MODELS

There are at least three reasons for considering a Bayesian approach instead of direct MLE in a full copula model. First, there is a substantial amount of prior knowledge concerning the univariate survival functions (insurance industry experience, census data, etc.). Second, Bayesian methods might help to overcome the issues discussed in Subsections 4.1 and 4.2. Third, Bayesian methods prove to be less sensitive than MLE to left truncation of the data and to possible underreporting of the first death, which seems to be a common problem for last-survivor insurance data.

If formula (4) is used, then the evaluation of ${}_tp_{\overline{x_1 x_2}}$ requires only the knowledge of the univariate survival functions and the conditional probability

$${}_tp_{x_1 x_2} = \frac{P(T_1 > x_1 + t,\ T_2 > x_2 + t)}{P(T_1 > x_1,\ T_2 > x_2)}.$$

Consider a vector observation $y = (t, \delta, x_1, x_2)$, where $x_1$ and $x_2$ are the entry ages of the female and male partners respectively, $\delta$ is the censoring indicator, and $t$ is the termination time. The termination is defined as the first death in a couple, or failure of the joint-life status, if observed ($\delta = 0$, no censoring). Otherwise, $t$ is the end of the observation period ($\delta = 1$). Let us assume that

$${}_tp_{x_1 x_2} = \exp\left\{-\left(w_1^{\alpha} + w_2^{\alpha}\right)^{1/\alpha}\right\}, \qquad (7)$$

where, for $j = 1, 2$,

$$w_j = w_j(t, x_j) = \left(\frac{x_j + t}{\theta_j}\right)^{\beta_j} - \left(\frac{x_j}{\theta_j}\right)^{\beta_j}.$$

For each value of $(x_1, x_2)$, (7) is a Hougaard copula function built on Weibull univariate survival functions conditioned on the entry age. This way we resolve the issue discussed in Subsection 4.1: conditioning on the entry ages $(x_1, x_2)$ eliminates the conflict between physical age and chronological time. Therefore, the vector of parameters is $(\alpha, \beta_1, \theta_1, \beta_2, \theta_2)$, where $\alpha$ is the parameter of association, and the conditional Weibull survival functions have shape parameters $\beta_j$ and scale parameters $\theta_j$.

The following informative priors reflect the substantial information on the univariate survival functions (cf. mortality tables) and little knowledge of $\alpha$. Although $\beta_j$ and $\theta_j$ cannot take on negative values, the assumption of normal priors still seems feasible, since the means are more than ten standard deviations away from zero.
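Model B1: normal priors for the shape parameters $\beta_j$ and for the scale parameters $\theta_j$ (the latter centered at $\varphi_j$), $j = 1, 2$, together with a prior for $\alpha$.

Model B2: the same normal priors for $\beta_j$ and $\theta_j$, $j = 1, 2$, together with an improper prior for $\alpha$.

Values of the hyperparameters are determined by resampling from industry mortality tables. We will use the hyperparameter values listed in Table 2.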
Table 2. Hyperparameter Values

              Female                          Male                   alpha
shape prior     scale prior     shape prior     scale prior
8.535   .454    89.62   .412    7.97    .485    86.96   .422         1   1

If we consider $\delta_i = 0$ for $i = 1, \dots, k$ and $\delta_i = 1$ for $i = k+1, \dots, n$, the likelihood function for the sample $(y_1, \dots, y_n)$ of size $n$ has the form

$$l(\alpha, \beta_1, \theta_1, \beta_2, \theta_2;\ y_1, \dots, y_n) = \exp\left\{-\sum_{i=1}^{n} \Big(\sum_{j=1}^{2} w_{ij}^{\alpha}\Big)^{1/\alpha}\right\} \prod_{i=1}^{k} \left[\sum_{j=1}^{2} \frac{\beta_j}{\theta_j} \Big(\frac{x_{ij} + t_i}{\theta_j}\Big)^{\beta_j - 1} w_{ij}^{\alpha - 1}\right] \Big(\sum_{j=1}^{2} w_{ij}^{\alpha}\Big)^{\frac{1}{\alpha} - 1},$$

where $w_{ij} = w_j(t_i, x_{ij})$.

Estimates of the posterior means were obtained for Model B1 with the help of an MCMC implementation in WinBUGS 1.3 using the popular "ones" trick. However, this trick failed to work for the improper prior in Model B2. In this case an optimization routine was used to estimate the posterior modes of the parameters of interest.

6. COMPARISON OF MODELS

The results of applying the two Bayesian models to the data set described above are presented in Table 3 below. We also include the results from Model ML: maximum likelihood estimation according to the full Weibull-Hougaard copula model (see Section 4), and Model I: no association, with the marginal survival functions estimated separately under an assumption of independence using maximum likelihood estimation for a Weibull parametric model.

Table 3. Parameter Estimates

Parameters             Model B1   Model B2   Model ML   Model I
Female   shape          8.61       8.83       9.96       9.98
         scale          89.        89.68      89.51      92.62
Male     shape          7.20       8.12       7.65       7.94
         scale          87.06      87.1       85.98      86.32
Association alpha       1.81       1.82       1.64       1

Table 3 demonstrates that both Bayesian models suggest a higher value of $\alpha$ than Model ML, which is in tune with the discussion in Section 4. Additionally, Table 4 below provides a more detailed account of the WinBUGS output for Model B1.
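To make the conditional model of Section 5 and the role of the association in Table 3 more concrete, here is a minimal sketch of formula (7), formula (4), and the log of the censored-data likelihood. All names (`w`, `joint_life_survival`, `last_survivor_prob`, `log_likelihood`, `PARAMS`) are ours, the entry ages and observations are hypothetical, and the Model ML estimates from Table 3 are plugged in purely for illustration rather than as the fit the authors obtained for this model.

```python
import math

# Model ML column of Table 3, used here only as illustrative inputs.
PARAMS = dict(beta1=9.96, theta1=89.51, beta2=7.65, theta2=85.98, alpha=1.64)

def w(t, x, beta, theta):
    """Cumulative Weibull hazard accrued between ages x and x + t."""
    return ((x + t) / theta) ** beta - (x / theta) ** beta

def joint_life_survival(t, x1, x2, p):
    """Formula (7): both spouses survive t more years, given the entry ages."""
    w1 = w(t, x1, p["beta1"], p["theta1"])
    w2 = w(t, x2, p["beta2"], p["theta2"])
    return math.exp(-(w1 ** p["alpha"] + w2 ** p["alpha"]) ** (1.0 / p["alpha"]))

def last_survivor_prob(t, x1, x2, p):
    """Formula (4): univariate survival terms minus the joint-life term."""
    p1 = math.exp(-w(t, x1, p["beta1"], p["theta1"]))
    p2 = math.exp(-w(t, x2, p["beta2"], p["theta2"]))
    return p1 + p2 - joint_life_survival(t, x1, x2, p)

def log_likelihood(data, p):
    """Log-likelihood for observations (t, delta, x1, x2) with t > 0.

    delta == 0 marks an observed first death, delta == 1 a censored couple.
    """
    a = p["alpha"]
    total = 0.0
    for t, delta, x1, x2 in data:
        w1 = w(t, x1, p["beta1"], p["theta1"])
        w2 = w(t, x2, p["beta2"], p["theta2"])
        s = w1 ** a + w2 ** a
        total -= s ** (1.0 / a)  # log of the joint-life survival term
        if delta == 0:
            # Derivatives of w1 and w2 with respect to t.
            d1 = p["beta1"] / p["theta1"] * ((x1 + t) / p["theta1"]) ** (p["beta1"] - 1)
            d2 = p["beta2"] / p["theta2"] * ((x2 + t) / p["theta2"]) ** (p["beta2"] - 1)
            total += math.log(d1 * w1 ** (a - 1) + d2 * w2 ** (a - 1))
            total += (1.0 / a - 1.0) * math.log(s)
    return total

# Hypothetical couple: entry ages 65 and 68, 20-year horizon.
for alpha in (1.0, 1.64, 1.81):
    print(alpha, last_survivor_prob(20, 65, 68, dict(PARAMS, alpha=alpha)))

# Two hypothetical observations: one censored, one with an observed first death.
print(log_likelihood([(5.0, 1, 65, 68), (3.0, 0, 70, 72)], PARAMS))
```

With the univariate parameters held fixed, a larger association raises the joint-life survival probability and therefore lowers the last-survivor probability given by formula (4), so the difference between the association estimates in Table 3 feeds directly into the probabilities used for pricing.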
Table 4. MCMC Parameter Estimates

node             mean    sd      MC error   2.5%    median   97.5%   start   sample
alpha            1.812   .7293   .148       1.49    1.612    3.723   41      5999
shape (female)   8.614   .655    .185       7.379   8.61     9.846   41      5999
scale (female)   89.59   .6332   .182       88.34   89.59    90.82   41      5999
shape (male)     7.202   .68     .167       5.847   7.224    8.523   41      5999
scale (male)     87.06   .6324   .1         85.81   87.07    88.29   41      5999

A direction for further research is suggested by using the more general formula (3) instead of (4). It will require a more complicated model than (7). An alternative approach emphasizing the use of the age difference between the spouses is developed in Youn and Shemyakin (1999).

The authors appreciate the support of the Society of Actuaries. Partial support of this work was also provided by an internal research grant of the University of St. Thomas. We also want to thank Tom Louis for valuable comments and the many participants of the poster session of ISBA-2000 for fruitful discussions.

REFERENCES

Anderson, J.E., Louis, T.A., Holm, N.V. and Harvald, B. (1992) Time Dependent Association Measures for Bivariate Survival Functions, J. Amer. Statist. Assoc. 87, 419
Berger, J.O. and Sun, D. (1993) Bayesian Analysis for the Poly-Weibull Distribution, J. Amer. Statist. Assoc. 88, 1412-1418
Bogdanoff, D.A. and Pierce, D.A. (1973) Bayes Fiducial Inference for the Weibull Distribution, J. Amer. Statist. Assoc. 68, 659-664
Bowers, N., Gerber, H., Hickman, J., Jones, D. and Nesbitt, C. (1997) Actuarial Mathematics, Schaumburg, Ill.: Society of Actuaries
Dellaportas, P. and Wright, D.E. (1991) Numerical Prediction for the Two-parameter Weibull Distribution, The Statistician 40, 365-372
Frees, E., Carriere, J. and Valdez, E. (1996) Annuity Valuation with Dependent Mortality, Journal of Risk and Insurance 63, 229
Frees, E. and Valdez, E. (1998) Understanding Relationships Using Copulas, North American Actuarial Journal 2, 1-25
Hougaard, P. (1986) A Class of Multivariate Failure Time Distributions, Biometrika 73, 671-678
Hougaard, P., Harvald, B., and Holm, N.V.
(1992) Measuring the Similarities Between the Lifetimes of Adult Twins Born 1881-1930, J. Amer. Statist. Assoc. 87, 17
Smith, R.L. and Naylor, J.C. (1987) A Comparison of Maximum Likelihood and Bayesian Estimators for the Three-parameter Weibull Distribution, Applied Statistics 36, 358-369
Youn, H. and Shemyakin, A. (1999) Statistical Aspects of Joint Life Insurance Pricing, ASA 1999 Proceedings, Business and Economic Statistics Section, 34-38