Applying Survival Analysis Techniques to Loan Terminations for HUD's Reverse Mortgage Insurance Program - HECM

Ming H. Chow, Edward J. Szymanoski, Theresa R. DiVenti [1]

I. Introduction

"Survival analysis" has been widely used in scientific and engineering fields to study death, termination, and failure events. In this study, survival analysis techniques were applied to determine the factors affecting loan termination probabilities for the reverse mortgage insurance program initiated by the U.S. Department of Housing and Urban Development (HUD) in 1989. A reverse mortgage is a loan, secured by the home equity of an older (age 62 or above) borrower, which does not have to be repaid until the borrower dies or moves out of the home. The borrower may receive loan proceeds either as a lump sum payment at the outset or periodically over the life of the loan.

The program, officially called the Home Equity Conversion Mortgage (HECM), provides mortgage insurance that protects private lenders against non-repayment of the loan, which could occur, absent this protection, if the loan balance grows to exceed the property value when the loan becomes due and payable. The loans become due and payable (terminate) when the borrower dies, moves out of the property, or voluntarily repays the debt. The mortgage insurance, funded by premiums paid to HUD by all borrowers, makes lenders more willing to offer these loans to seniors.

The HECM program, being a new kind of mortgage loan, was launched by HUD in 1989 without the benefit of previous experience on termination rates and mortgage cash flows with which to price the premium for this coverage. Instead, HUD used reasonable but untested assumptions relating to termination rates and cash flows. The many possible factors affecting loan termination include (1) the underlying mortality rates of the borrowers; (2) other borrower characteristics (such as wealth); and (3) the future value of the property.
With almost ten years of data available, this study was undertaken to better understand HECM cash flows by constructing loan survivorship tables and by estimating loan termination probabilities using a multivariate statistical model that relates the above-mentioned factors to observed loan terminations. Using this analysis, the authors were able to test HUD's original assumption (made in 1989 when no program experience was available) that HECM loans would terminate at approximately 1.3 times the age-specific mortality rate for females in the general population. The findings were that the assumption is much too low for younger borrowers, and slightly too high for older borrowers.

[1] This paper reflects the views of the authors and not those of the U.S. Department of Housing and Urban Development, the Office of Federal Housing Enterprise Oversight, or Computer Based Systems, Inc.

II. The Survival Analysis Method and the Hazard Function

In order to choose an appropriate survival analysis method, it is important to distinguish between observations which end in a loan termination and those for which the termination is censored (that is, the termination had not occurred as of the date of the study). In survival analysis theory, right censoring indicates that observation of the individual stopped
before an event of interest occurs. Similarly, left censoring indicates that an event of interest occurred before the observation started. The HECM loan data are an example of right censoring because the time variable is expressed in terms of policy years (or loan age): all loan observations are considered to start when the loan is originated (policy year = 0) regardless of the calendar year during which the origination occurred. Each loan can either be terminated (the event is observed) or be censored (not observed) at the cut-off time of the study. The calendar time of loan origination varies, but the study cut-off time is set to be September 30, 1998. All loans, regardless of the time of origination, which survived to the cut-off date are considered right censored.

The HECM loan survival analysis deals with events that occur stochastically according to some probability distribution. The cumulative distribution function (CDF) gives the probability that the event time T (expressed in policy years, not calendar time) is less than or equal to a given value t. Thus, $F(t) = \Pr[T \le t]$ denotes the CDF of the event time, and the survival function is $S(t) = \Pr[T > t] = 1 - F(t)$. When T is continuous, its distribution is described by the probability density function (PDF), defined as:

$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt},$$

which can also be written as

$$f(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t]}{\Delta t}.$$

Based on continuous survival data, the hazard function is therefore defined as:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t}.$$

The hazard function can also be expressed as:

$$h(t) = \frac{f(t)}{S(t)}.$$
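The step from the limit definition of the hazard to the ratio f(t)/S(t) is short. The following is a brief sketch using only the definitions above, under the assumptions that T is continuous (so that Pr[T >= t] = S(t)) and that S(0) = 1:

```latex
\begin{align*}
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t}
      = \lim_{\Delta t \to 0}
        \frac{\Pr[t \le T < t + \Delta t]}{\Delta t \,\Pr[T \ge t]} \\
     &= \frac{f(t)}{S(t)}
      = -\frac{d}{dt}\ln S(t),
\qquad\text{so that}\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right).
\end{align*}
```

The last identity shows that the hazard function determines the survival function, which is why modeling the hazard is sufficient for estimating termination probabilities.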
For this study, the hazard to be analyzed is the event of loan termination, irrespective of the reason for termination. That is, no distinction was made between loan terminations which occurred because the borrower died and those which occurred because the borrower moved or voluntarily paid off the loan; the available data would not support these distinctions. As mentioned before, the cut-off date for this study was September 30, 1998. Let T be the duration in policy years of a terminated loan and C be the duration of a censored loan, i.e.

    T = CEIL((termination date - origination date)/365.25),
    C = CEIL((September 30, 1998 - origination date)/365.25),

where CEIL is a rounding function which returns the smallest integer that is greater than or equal to its argument. In addition, let's define:

    t(i) = number of loans terminated in the i-th policy year, where t(0) is zero.
    c(i) = number of loans censored in the i-th policy year, where c(0) is zero.
    e(i) = number of loans at risk at the start of the i-th policy year, where e(0) is the number of loans at risk at the very beginning, which is the total number of loans.
    h(i) = hazard rate of the i-th policy year, where h(0) is 0.0000.
    s(i) = survival rate at the end of the i-th policy year, where s(0) is 1.0000.

Thus, the formulas to calculate the number of loans at risk e(i), the hazard rates h(i), and the survival rates s(i) can be expressed as:

    e(i) = e(i-1) - t(i-1) - (1/2)c(i-1) - (1/2)c(i)
    h(i) = t(i) / e(i)
    s(i) = s(i-1)*(1-h(i)), for i = 1 through the last policy year.

III. Comparison of the Assumed HECM Termination Rates and the Calculated Rates

Applying the formulas of the last section to the HUD HECM database of 29,332 loans, the hazard and survival rates were calculated for three age sub-groups (borrowers who entered the program at ages 64-66, 74-76, or 84-86). Table 1 shows the assumed loan termination rates used by HUD in 1989 when it launched the HECM program. They are obtained by multiplying 1.3 by the age-specific female mortality rates from the Dept. of Health and Human Services (HHS) 1979-1981 US Decennial Life Tables. Tables 2-4 compare the calculated hazard and survival rates with the assumed HECM rates. The calculated hazard and survival rates are outputs from SAS PROC LIFETEST using the life table method.

Table 2 shows that the assumed HECM termination rates are generally not within a 95% confidence interval of the observed hazard rates. (The confidence intervals are based on the normal approximation of a binomial distribution with the observed hazard rate as the expected value of the distribution.) In particular, the original HUD assumption of termination rates appears too low for the younger age group (64-66). Note, however, that as the policy year increases, the assumed rates become a better proxy for the observed HECM loan termination rates.
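The life-table recursion for e(i), h(i), and s(i) and the 95% confidence-interval check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are made up (they are not HECM data), the function names are hypothetical, and the interval uses e(i) as the binomial sample size, which is one reading of the normal approximation mentioned in the text and not necessarily the exact computation performed by PROC LIFETEST.

```python
import math

def life_table(terminated, censored, n_loans):
    """Life-table hazard and survival rates by policy year.

    terminated[i-1] and censored[i-1] hold t(i) and c(i) for policy year i.
    Returns one dict per policy year.
    """
    rows = []
    # Starting values e(0), t(0), c(0), s(0) as defined in the text.
    e_prev, t_prev, c_prev, s_prev = float(n_loans), 0, 0, 1.0
    for i, (t_i, c_i) in enumerate(zip(terminated, censored), start=1):
        e_i = e_prev - t_prev - 0.5 * c_prev - 0.5 * c_i
        h_i = t_i / e_i
        s_i = s_prev * (1.0 - h_i)
        # Assumption: normal approximation to a binomial with n = e(i).
        half_width = 1.96 * math.sqrt(h_i * (1.0 - h_i) / e_i)
        rows.append({
            "year": i,
            "at_risk": e_i,
            "hazard": h_i,
            "survival": s_i,
            "ci": (h_i - half_width, h_i + half_width),
        })
        e_prev, t_prev, c_prev, s_prev = e_i, t_i, c_i, s_i
    return rows

def assumed_within_ci(row, assumed_rate):
    """True if an assumed termination rate lies inside the row's 95% interval."""
    low, high = row["ci"]
    return low <= assumed_rate <= high

# Made-up counts for illustration only (not HECM data).
table = life_table(terminated=[20, 45, 40], censored=[100, 80, 60], n_loans=1000)
for row in table:
    print(row["year"], row["at_risk"], round(row["hazard"], 4), round(row["survival"], 4))
```

Half of the loans censored in a policy year are removed from the at-risk count for that year, which treats censoring as occurring, on average, midway through the year.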
For the typical borrower group (ages 74-76), the assumed rates fall inside the calculated confidence interval for policy years 5 to 8 only (see Table 3). The assumed HECM rates fit better than for the younger group but follow a similar trend: the observed termination rates in the early policy years are generally higher than the assumed rates. In contrast, the older borrower group (ages 84-86) has loan termination rates that are at or slightly higher than the assumed rates (see Table 4). One exception is the first policy year, in which the observed termination rates are low. In fact, for all three age groups, the first policy year posts substantially lower rates than subsequent policy years. This could be due to the following reasons: (1) a self-screening effect, in that homeowners with very poor health or a terminal illness will not participate in the HECM program; (2) borrowers will be less inclined to
voluntarily pay off the loan immediately after origination than they might be a year or two later.

The comparison, based on the first nine years of data from the HECM program, shows that younger borrowers in their mid-sixties are paying off the loans much faster than the originally assumed rates, while those in their mid-seventies pay off HECM loans a little faster than assumed. The older borrowers in their mid-eighties are paying off their loans at or slightly less than the assumed rates.

IV. Examine Factors Affecting HECM Loan Termination

The comparison of survival rates in the last section shows that HUD's original actuarial assumption of loan termination rates may be too simple and not sufficient to reflect the actual loan termination experience. With the availability of almost ten years of HECM program data and the SAS procedure GENMOD, a regression model based on the work of British statistician Sir David Cox was developed. Applying Cox's proportional hazards model with a complementary log-log (CLL) link function offers a more promising alternative for estimating loan termination probabilities. The model has a binomial error distribution and uses a maximum likelihood method for the estimation.

As mentioned before, we calculated the "policy year" for each HECM loan termination, which means that many terminations occur at the same points in time (i.e. policy years). The many ties, or discrete data format, of the HECM loan data can cause computational problems for a continuous version of Cox's model (which uses the partial likelihood method for estimation). Instead, we employed a model for discrete data derived from the Cox model for continuous-time data. (That is, we are only interested in the policy year of termination, not a continuous time measure of termination.)
The complementary log-log model we developed is shown below:

$$\log[-\log(1 - P_{it})] = \alpha_t + \beta_1\,\mathrm{DMM1}_t + \beta_2\,\mathrm{MTRY}_{it} + \beta_3\,\mathrm{MTRM}_{it} + \beta_4\,\mathrm{MTRO}_{it} + \beta_5\,\mathrm{APPR}_{it} + \beta_6\,\mathrm{DCOB}_{it} + \beta_7\,\mathrm{DGEN}_{it} + \beta_8\,\mathrm{INC}_{it} + \beta_9\,\mathrm{ASSET}_{it} + \beta_{10}\,\mathrm{YEAR}_{it}$$

where

$P_{it}$ is the conditional probability of loan termination for individual loan i in policy year t (conditional on the loan having survived to the start of policy year t),

$\mathrm{DMM1}_t$ is the dummy variable used to reflect the substantially lower rates posted by the first policy year observations,

$\mathrm{MTRY}_{it}$ is the age-specific female mortality rate (%) for individual i in the younger age group (those who entered the program at age 70 or below) at policy year t,

$\mathrm{MTRM}_{it}$ is the age-specific female mortality rate (%) for individual i in the middle age group (initially 71 to 80) at policy year t,

$\mathrm{MTRO}_{it}$ is the age-specific female mortality rate (%) for individual i in the older age group (initially over 80) at policy year t,

$\mathrm{APPR}_{it}$ is the cumulative home appreciation rate (based on a state-specific house price index series) since policy year 0 for individual i at policy year t,

$\mathrm{DCOB}_{it}$ is a dummy variable indicating that loan i has a "co-borrower" (i.e., a married couple as opposed to a single borrower; the HECM loan is not due until both co-borrowers die or move); note that this dummy remains unchanged for all policy years,

$\mathrm{DGEN}_{it}$ is the dummy variable for "gender" status for individual i at policy year t (note that the gender of the younger of two co-borrowers is used and this status remains unchanged for all policy years),

$\mathrm{INC}_{it}$ is the income variable (at the time of loan origination and expressed in $1,000s) for individual i at policy year t; note that this value remains unchanged for all policy years,

$\mathrm{ASSET}_{it}$ is the borrower asset variable (defined as liquid assets held by the borrower at the time of origination and expressed in $1,000s) for individual i at policy year t; note that this value remains unchanged for all policy years,

$\mathrm{YEAR}_{it}$ is a policy year variable for individual i at policy year t, and its value equals t.

Note that policy year is introduced not as a dummy variable, but rather as an ordered categorical variable which gives the Cox hazard model a baseline hazard (strictly time-dependent) from which the other explanatory variables produce proportional changes.

A Logit model with the assumption that events happen at discrete points in time could have been used, but the CLL model was chosen because the error distribution of the predicted probabilities is better suited to discrete survival analysis, and because a CLL transformation handles potential asymmetries in the conditional probabilities better than a Logit link function. (An example of the asymmetry is when the cumulative distribution function is an S-curve that approaches 1 much more rapidly than it approaches zero.)

The development of the model also considered several levels of pooling of the data.
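The CLL link and its asymmetry relative to the Logit link, discussed above, can be illustrated numerically. The sketch below uses hypothetical values for the linear predictor and a coefficient (they are not estimates from this study) and shows two properties: adding a coefficient beta to the linear predictor multiplies -log(1 - P) by exp(beta), which is the proportional-hazards reading of the model, and the inverse CLL curve is not symmetric around 0.5, unlike the inverse Logit.

```python
import math

def cloglog(p):
    """Complementary log-log transform: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def cloglog_inverse(eta):
    """Termination probability implied by a CLL linear predictor eta."""
    return 1.0 - math.exp(-math.exp(eta))

def logit_inverse(eta):
    """Probability implied by a Logit linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical linear predictor and coefficient, for illustration only.
eta, beta = -2.5, 0.4
p0 = cloglog_inverse(eta)
p1 = cloglog_inverse(eta + beta)

# Ratio of the integrated hazards -log(1 - P); equals exp(beta) under the CLL link.
hazard_ratio = math.log(1.0 - p1) / math.log(1.0 - p0)

# Symmetry check: the Logit curve satisfies P(eta) + P(-eta) = 1; the CLL curve does not.
logit_sum = logit_inverse(2.0) + logit_inverse(-2.0)
cll_sum = cloglog_inverse(2.0) + cloglog_inverse(-2.0)

print(round(hazard_ratio, 6), round(math.exp(beta), 6))
print(round(logit_sum, 6), round(cll_sum, 6))
```

Because the hazard ratio does not depend on the starting value of the linear predictor, the exponentiated coefficients of a CLL model can be read as proportional changes to the baseline hazard.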
It was first believed that three separate models, each corresponding to a selected age group (perhaps defined as the age groups used in the construction of the hazard rates), would be constructed and analyzed separately to reflect the differences by age group. However, with the introduction of mortality-rate variables interacted with an age group dummy variable, we decided to pool all the data and estimate a single model. The interaction variables between age group (ages 62-70, 71-80, and over 80) and mortality rate were generated to better capture the impacts of both age group and age-specific mortality than a single mortality variable would.

The CLL model requires one observation for each policy year in which each loan is observed, in order to properly handle the censoring. These created observations carry information about the censoring of cases and allow conditional probabilities to be estimated. The variable QUIT was created and used as the response variable in the MODEL statement of SAS PROC GENMOD. Another variable, YEAR, equal to the current policy year, was also created and used in the model. The following SAS code shows the creation of the new observations and variables mentioned above.

    DATA NEWHECM;
      SET HECM;
      /* One output record per policy year in which the loan was observed */
      DO YEAR = 1 TO POLYEAR;
        /* QUIT = 1 only in the final policy year of a terminated loan */
        IF (YEAR = POLYEAR AND TERMINATE = 1) THEN QUIT = 1;
        ELSE QUIT = 0;
        N = 1;   /* one trial per loan-year record */
        OUTPUT;  /* write the record; without it only one row per loan is kept */
      END;
    RUN;

The appreciation rates (APPR) were calculated by comparing the appropriate house price index value in the origination month with the index value in the same month of the policy year of termination.

The model was tested for different censoring times from three to nine policy years. Estimated coefficients for two cases (six and nine years) are shown below. They were chosen for the following reasons: (1) they include a sufficient number of years for analysis, and (2) they reflect the impacts of house price appreciation over different lengths of policy years.

    PY = 6
    Variable       β         e^β       Pr > χ²
    Intercept     -2.8715    0.0566    <.0001
    DMM1          -0.2886    0.7493    <.0001
    MTRY           0.0990    1.1041     .0028
    MTRM           0.0684    1.0708    <.0001
    MTRO           0.0854    1.0891    <.0001
    APPR          -0.2167    0.8052     .1719
    DCOB          -0.4852    0.6155    <.0001
    DGEN           0.1658    1.1803    <.0001
    INC            0.0021    1.0021     .0006
    ASSET         -0.0013    0.9987     .0012
    YEAR           0.0525    1.0539     .0009

    PY = 9
    Variable       β         e^β       Pr > χ²
    Intercept     -2.6790    0.0686    <.0001
    DMM1          -0.4018    0.6691    <.0001
    MTRY           0.0943    1.0989     .0023
    MTRM           0.0608    1.0627    <.0001
    MTRO           0.0813    1.0847    <.0001
    APPR           0.3627    1.4372     .0140
    DCOB          -0.4619    0.6301    <.0001
    DGEN           0.1437    1.1545    <.0001
    INC            0.0022    1.0022     .0003
    ASSET         -0.0012    0.9988     .0031
    YEAR           0.0540    1.0555    <.0001

The analysis shows that most explanatory variables behave similarly in both cases, except for the house price appreciation rate. Test cases with policy year less than 6 have negative estimated APPR coefficients, and cases with policy year more than 6 have positive coefficients. With the long-term trend of U.S. housing markets continuously increasing, it is understandable that the longer the HECM borrower stays in the program, the higher the possibility that the owner will terminate the loan.
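Returning to the data preparation step described before the coefficient tables, the expansion of one record per loan into one record per loan per policy year can also be sketched outside SAS. The Python version below mirrors the variable names POLYEAR, TERMINATE, YEAR, QUIT, and N from the DATA step; the LOAN_ID field, the function name, and the two example loans are hypothetical and are included only to make the output easy to read.

```python
def expand_to_loan_years(loans):
    """Expand one record per loan into one record per loan per policy year.

    Each input loan is a dict with POLYEAR (policy years observed) and
    TERMINATE (1 if the loan terminated, 0 if it was censored).
    """
    rows = []
    for loan in loans:
        for year in range(1, loan["POLYEAR"] + 1):
            row = dict(loan)  # copy loan-level fields onto every loan-year record
            row["YEAR"] = year
            # QUIT is 1 only in the last observed year of a terminated loan.
            is_last_year = year == loan["POLYEAR"]
            row["QUIT"] = 1 if (is_last_year and loan["TERMINATE"] == 1) else 0
            row["N"] = 1  # one trial per loan-year record
            rows.append(row)
    return rows

# Two hypothetical loans: A terminated in policy year 3, B censored after 2 years.
loans = [
    {"LOAN_ID": "A", "POLYEAR": 3, "TERMINATE": 1},
    {"LOAN_ID": "B", "POLYEAR": 2, "TERMINATE": 0},
]
loan_years = expand_to_loan_years(loans)
quit_flags = [(r["LOAN_ID"], r["YEAR"], r["QUIT"]) for r in loan_years]
print(quit_flags)
```

A censored loan contributes only QUIT = 0 records, so it adds information about survival through each observed policy year without ever registering a termination.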
High house price appreciation may give the borrower an incentive to voluntarily pay off the loan because the borrower gets to keep any equity in excess of the debt at the time of loan payoff. House price appreciation within shorter periods after origination may not be large enough to encourage borrowers to pay off their loans.

The mortality rates do have positive (increasing) impacts on HECM loan termination. Since the mortality rate of each older group is almost three times that of the next younger age group (i.e. MTRO/MTRM or MTRM/MTRY), the impacts of the mortality rate for a younger group are higher than the estimated coefficients indicate for the older age group. The co-borrower status variable (DCOB) shows that a couple tends to hold a HECM loan longer than a single borrower. The gender variable (DGEN) shows that males have a higher tendency to terminate a HECM loan than females, which is consistent with
the fact that males have higher mortality rates than females.

The data show that borrowers with higher income levels are more likely to terminate HECM loans early, while borrowers with fewer assets are also more likely to terminate HECM loans early. The finding on assets is counterintuitive: one might expect borrowers with more wealth to have more housing choices available and therefore to be more willing to move and terminate the HECM loan. There may be some interaction between wealth and borrower age that explains this finding, but we have no explanation for it at present. The younger borrowers tend to have higher incomes and are terminating HECM loans more rapidly than older borrowers. Poorer borrowers may have shorter lifetimes or be less healthy than wealthier ones.

The negative coefficient on the YEAR variable produces a slightly declining baseline hazard (after a big rise in the second year, due to the negative coefficient on the policy year 1 dummy), i.e. the later policy years have a slightly declining impact on loan termination (see Figure 1). The total hazard will not follow this pattern indefinitely, however, as increases in the mortality rate over time will eventually make the hazard curve rise despite the declining baseline curve.

V. Conclusion

The HECM loan program, introduced and insured by HUD, is still in its early stage of existence. In order to make it a successful program that older citizens can use to access the equity accumulated in their property, the characteristics of the program need to be explored further. The use of survival analysis techniques will facilitate the understanding of HECM loans and their cash flows, enabling the program to grow and to be attractive for lenders and investors to include in their portfolios in the future. In addition, more detailed information regarding reasons for termination, such as move-out or death, will provide new insights for modeling HECM loan terminations.
The SAS survival analysis procedures for handling logistic and other binary response regression analyses have grown substantially for many different applications. SAS procedures such as LIFETEST, LIFEREG, PHREG, LOGISTIC, PROBIT and GENMOD have enabled researchers to look into survival events with reasonable confidence that policies developed based on the research findings would be sound and practical. The use of the GENMOD procedure in this study is simply a starting point for survival analysis of HECM loans and is far from providing a complete understanding of the HUD reverse mortgage program.
REFERENCES

Allison, Paul D. (1999), Logistic Regression Using the SAS System: Theory and Application, SAS Institute.

Allison, Paul D. (1995), Survival Analysis Using the SAS System: A Practical Guide, SAS Institute.

Cox, D. R. (1972), Regression Models and Life Tables, Journal of the Royal Statistical Society, Series B, 34, 187-220.

U.S. Department of Housing and Urban Development (1990), Home Equity Conversion Mortgage Insurance Demonstration: Interim Report to Congress.

U.S. Department of Housing and Urban Development (1992), Preliminary Evaluation of the Home Equity Conversion Mortgage Insurance Demonstration: Report to Congress.

U.S. Department of Housing and Urban Development (1995), Evaluation of the Home Equity Conversion Mortgage Insurance Demonstration: A Report to Congress.

ACKNOWLEDGEMENTS

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries.

CONTACT INFORMATION

Ming H. Chow
Staff Analyst/Economist
Computer Based Systems, Inc. - An AverStar Company
2750 Prosperity Ave, Suite 300
Fairfax, VA 22031
(202) 708-0421 ext. 5882
ming_chow@hud.gov

Edward J. Szymanoski
Senior Economist
Office of Federal Housing Enterprise Oversight
1700 G Street, NW
Washington, DC 20552
(202) 414-3763
eszymanoski@ofheo.gov

Theresa DiVenti
Economist
Financial Institution Regulation Division
US Department of Housing and Urban Development
451 7th Street, SW
Washington, DC 20410
(202) 708-0421 ext. 5883
theresa_r._diventi@hud.gov