An Empirical Investigation on the Prevalence of Manipulation in Nursing Homes Rating System


Xu Han, Niam Yaraghi & Ram Gopal

Abstract: The Nursing Home Compare system supported by the Centers for Medicare & Medicaid Services (CMS) is widely used by patients, doctors and insurance companies. We argue that fraud may exist in the rating procedure, which can lead to biased and misleading ratings. We use five years of CMS rating data, together with the corresponding complaint data reported by the California Department of Public Health (CDPH) and financial data reported by the Office of Statewide Health Planning and Development (OSHPD), for over 1200 nursing homes in California to empirically examine the key factors affecting the star rating of a nursing home. We find a significant association between a nursing home's star rating and its profits, which points to a financial incentive for nursing homes to cheat. We then demonstrate that this incentive does not always lead to legitimate efforts to improve service quality, and can induce cheating in the self-reporting step of the rating procedure. A prediction model is then developed to detect cheating in the suspect population and to estimate the share of nursing homes that are likely cheaters.

Key Words: CMS, Fraud Detection, Nursing Home, Prediction, Rating System

I. Introduction

Nursing home facilities in the United States provide care to 1.5 million residents, of whom 72% are over 65 years old. Each resident spends an average of 835 days at a nursing home. These residents account for 6 percent of the Medicare population but 17 percent of total Medicare spending. The Department of Health and Human Services estimates that 4.1% of the 65+ population lived in nursing homes in 2009. This percentage increases with age, rising from 1.1% and 3.5% in the two younger age brackets to 13.2% for persons 85+.

Given the importance of nursing homes to patients' quality of life and the billions of dollars spent on these facilities, the Centers for Medicare & Medicaid Services (CMS) has designed and implemented a system for evaluating nursing homes. The system assigns each nursing home a star rating on a 1-5 scale based on three domains: on-site inspection, staffing and quality of care. Given the lack of alternative sources of information on nursing homes, the publicly available CMS rating has become the gold standard in the industry since its inception, and is widely used by patients, physicians and insurance companies (Thomas, 2014).

The CMS rating, however, may not always reflect the true service quality of a nursing home. Cases have been reported in which patients' personal experiences differ significantly from the star rating. Some highly rated nursing homes have been sued for substandard care, including cases in which improper medical treatment led to a patient's death (Thomas, 2014).

As we discuss below, the current rating system may be prone to manipulation by nursing homes. CMS first assigns an initial star rating to each nursing home based on the results of the annual on-site inspection performed by licensed inspectors. Self-reported measures from the quality and staffing domains are then used to adjust the initial star rating: 1 star is added if a self-reported domain is excellent (4 or 5 stars for the staffing domain and 5 stars for the quality domain), and 1 star is subtracted if a self-reported domain is rated 1 star.

Figure 1 shows the trend of the ratings over the past five years. The ratings of the self-reported domains consistently shift toward the higher end. In other words, the number of nursing homes that claim high performance in the staffing and quality domains has continuously increased over the past five years, leading to seemingly improved overall ratings. This trend can be interpreted in two ways. On the one hand, supporters can argue that the increased levels of self-reported measures are genuine and represent an honest effort by nursing homes to constantly improve their services. On the other hand, skeptics may argue that the improved ratings are not legitimate but result from nursing homes' success in developing strategies to manipulate the system and inflate their ratings.

The phenomenon leads to the following questions: Is the observed increase in stars legitimate, or does fraud exist in the system? If there is fraud, how severe is it? In this paper, we seek answers to these questions. We develop empirical models to investigate the existence and prevalence of fraud and to identify the incentives behind it.

The paper proceeds as follows. In Section II, the data collection is described and the empirical model is developed. In Section III, the regression results are analyzed and a correlation analysis is conducted to show preliminary evidence of fraud in the system. A complaint-based analysis is then conducted in Section IV, which substantiates the existence of fraud in the system. In Section V, we develop a fraud estimation model to evaluate the severity of cheating in the system and conduct a variable importance analysis to show the key characteristics of the cheaters. Section VI concludes the paper.
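The adjustment rule described in this section can be sketched in a few lines of Python. This is only an illustration of one reading of the text, not CMS's actual implementation: the function name `overall_rating` is hypothetical, each self-reported domain is assumed to contribute its adjustment independently, the result is assumed to stay on the 1-5 scale, and the cap of one added star for a 1-star inspection is inferred from the "Level 2 Level 1" row of Table 1.

```python
def overall_rating(inspection: int, staffing: int, quality: int) -> int:
    """Adjust the inspection-based rating with the two self-reported domains.

    All inputs are star ratings on a 1-5 scale.
    """
    change = 0
    if staffing >= 4:   # staffing counts as excellent at 4 or 5 stars
        change += 1
    if quality == 5:    # quality counts as excellent only at 5 stars
        change += 1
    if staffing == 1:   # a 1-star self-reported domain costs one star
        change -= 1
    if quality == 1:
        change -= 1
    # Assumption inferred from Table 1: a 1-star inspection can gain at most one star.
    if inspection == 1:
        change = min(change, 1)
    # Assumption: the overall rating is kept within the 1-5 scale.
    return max(1, min(5, inspection + change))


# A 3-star inspection with excellent self-reported staffing and quality.
print(overall_rating(3, 5, 5))
```

Under these assumptions, the difference between the returned value and the inspection rating is the quantity the paper later calls StarChange.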

II. Data Collection and Empirical Model

A. Data Collection

Our primary panel data are derived from the star rating records for 1219 California nursing homes. The data consist of CMS's ratings and the nursing homes' basic information, such as location, size, certification and ownership. The pooled data consist of records for all the nursing homes over the 60-month (5-year) period.

The second major dataset is provided by the California Office of Statewide Health Planning and Development (OSHPD). The OSHPD data include details on the nursing homes' financial situation. Nursing homes' income is categorized into a healthcare section and a non-healthcare section. The corresponding revenue and expense details for each section are provided, so profits can be calculated. The healthcare section is further categorized by payment source, e.g., Medicare, Medicaid or self-paying.

We also collected complaint and deficiency data from the California Department of Public Health (CDPH). The CDPH data contain basic information on all California nursing homes as well as detailed complaints, facility self-reported incidents and inspection results. The CDPH complaint data not only cover complaints that CMS has already considered in its rating procedure, but also include complaints and deficiency reports that are recorded only at the state level and are not reported to CMS.

B. Nursing Homes' Financial Incentive

The CMS star rating has significant financial implications for nursing homes, and thus they have an incentive to obtain the highest possible rating. To demonstrate these financial implications, we calculate the average profit per patient per day for nursing homes in each of the five star rating groups, as shown in Table 1. These averages serve as an estimate of the profit that a nursing home can expect given its rating. The difference is significant. For example, a nursing home receiving 3 stars in inspection can expect only the 3-star average profit for treating one patient for one day. If it gains two additional stars after self-reporting and achieves an overall rating of 5 stars, it can instead expect the 5-star average. Figure 2 shows the profit trend for each star rating group over the 5 years.

The results explain, from the financial perspective, nursing homes' incentives to achieve the highest possible ratings, and provide a quantifiable way to measure them. In our model, we define the incentive of a nursing home as the difference between the expected profit at the highest overall rating it can obtain and the expected profit at its inspection rating, as shown in Table 1.

C. Empirical Model

We focus on the change in star rating that results from the self-reported measures. The variable StarChange is equal to the difference between the overall rating and the inspection rating. For example, if a nursing home receives 3 stars in the Medicare on-site inspection but receives a 5-star overall rating after its self-reported measures in the staffing and quality domains are included, then StarChange equals +2. By definition, StarChange can only take the discrete values +2, +1, 0, -1 and -2, so we employ an ordinal logistic regression model. The model investigates how StarChange, taking values in the five levels {-2, ..., +2}, depends on a vector of independent variables X. StarChange is determined by a set of parameters $\alpha_{-2}, \alpha_{-1}, \alpha_{0}, \alpha_{1}$, which define the cut points of the five levels. The model can be written as

$$P(\mathrm{StarChange} \le j) = \frac{\exp(\alpha_j + X\beta)}{1 + \exp(\alpha_j + X\beta)}, \qquad (1)$$

where j = -2, -1, 0, 1, and the vector X includes the following independent variables: Incentive, Competition, ResTotal, ForProfit, Medicare, Medicaid, ResCouncil, FamCouncil.

Among the independent variables, the main effect we consider is the nursing home's financial incentive, denoted by Incentive. The variable Competition is the number of competing nursing homes within a 10-mile radius. The number of residential patients in each nursing home represents its size and is denoted by ResTotal. The variable ForProfit defines a nursing home's ownership type: ForProfit equals one if the nursing home is for-profit and zero otherwise. The variables Medicare and Medicaid define a nursing home's certification: Medicare equals one if the nursing home is Medicare certified, and likewise Medicaid equals one if the nursing home is Medicaid certified.

By law, nursing homes are required to allow councils set up by their residents or by the residents' family members. These resident and family councils can facilitate communication with staff and get problems resolved more efficiently. Since nursing home residents may be more vulnerable than the general population due to their health conditions, the resident council and the family council can function very differently in resolving issues and handling complaints. In our model, the binary variables ResCouncil and FamCouncil denote the council type. A nursing home can have both types of councils.

Note that in our dataset, only StarChange and Incentive change over time; the other variables remain constant over all 60 months.
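To make equation (1) concrete, the following minimal sketch converts a set of cut points and a linear predictor X β into the probability of each StarChange level. The cut-point values and the value of X β are hypothetical and chosen only for illustration; they are not estimates from the paper, and the function names are not taken from any library.

```python
import math


def cumulative_prob(alpha_j: float, xb: float) -> float:
    """P(StarChange <= j) = exp(alpha_j + X*beta) / (1 + exp(alpha_j + X*beta))."""
    z = alpha_j + xb
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))


def level_probs(cut_points, xb):
    """Probability of each level -2..+2, given the ordered cut points
    (alpha_-2, alpha_-1, alpha_0, alpha_1) and the linear predictor X*beta."""
    # Cumulative probabilities for j = -2, -1, 0, 1; P(StarChange <= +2) is 1.
    cum = [cumulative_prob(a, xb) for a in cut_points] + [1.0]
    # Each level's probability is the difference of adjacent cumulative values.
    probs = [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]
    return dict(zip(range(-2, 3), probs))


# Hypothetical cut points and linear predictor, for illustration only.
cut_points = [-3.0, -1.0, 1.0, 3.0]
probs = level_probs(cut_points, xb=0.5)
for level, p in probs.items():
    print(f"StarChange = {level:+d}: {p:.3f}")
```

With the sign convention of equation (1) as written, a larger X β raises every cumulative probability P(StarChange ≤ j); some statistical software parameterizes the model with the opposite sign on X β, so the convention should be checked before interpreting the direction of an estimated coefficient.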

III. Result Analysis

A. Regression Results

We estimate equation (1) using both the fixed effects and the Hausman-Taylor methods, as shown in Table 2. The pooled linear regression result is also listed. Because many variables are time-invariant, their coefficients cannot be directly estimated by the fixed effects model. In all the methods, the main effect Incentive is significant with a positive association, i.e., the bigger the incentive a nursing home has, the more likely its star rating is to increase after self-reporting. The control variable Medicare is also significant in both the pooled linear regression and the Hausman-Taylor method, but Medicaid is not. Given that almost all the nursing homes are Medicaid certified, the results indicate that a nursing home that is only Medicaid certified is more likely to obtain additional stars through self-reporting.

B. Correlation Analysis

The results show a positive association between the financial incentive and the rating change. However, this does not necessarily mean that nursing homes are gaming the system. It is possible that nursing homes gain the star improvement legitimately through genuine effort. To better understand the changes in star ratings, we investigate the correlation between the on-site inspections and the self-reported measures. Two correlations are expected. First, within the same year, if nursing homes are reporting the truth, a positive correlation is expected between the CMS on-site inspection and the nursing homes' self-reported measures. Second, if a nursing home really puts effort into improving its care quality, these efforts should have a lasting effect and lead to better results in the next year's on-site inspection.

The results are shown in Figure 3. The correlation between the on-site inspection and the self-reported measures within the same year is around 0.2, indicating inconsistency between the inspection and self-reporting domains within the same year. The correlation between the self-reported measures and the next year's CMS inspection results is also small, which indicates that the self-reported improvements in quality and staffing have no lasting effect on the next year's on-site inspection.

IV. Complaint-based Analysis

As discussed in the previous section, the inspection results and the self-reported results are inconsistent, which is preliminary evidence of potential fraud. In this section, we conduct an analysis based on the CDPH complaint data to substantiate the existence of fraud. Compared with CMS's complaint data, the CDPH data contain additional complaint records that are recorded only at the state level and not reported to CMS, so they can reflect service quality more accurately. In the following, we first show that the number of complaints is a valid indicator of service quality by conducting a contingency analysis, and then develop a complaint-based method to substantiate the existence of fraud.

A. Contingency Analysis

The CDPH complaint data consist of complaints that can be filed either by patients or by their family members. Unlike the CMS-conducted inspection and the nursing homes' self-reported staffing and quality domain data, the complaints directly reflect the service quality of a nursing home from the customers' point of view. We combine the CDPH complaint data with the CMS star rating data. The combined data have 4378 records in total, i.e., around 900 records for each year.

We assume that the complaints reflect the true performance of a nursing home. To test this assumption, we divide the complaint status into 5 levels, in which level 1 has the fewest complaints and level 5 has the most. A 5 by 5 contingency table can then be created for each of the three domains. In each contingency table, the columns and rows are the ratings and the complaint status, respectively. A chi-square test is then performed to test the independence of the column and row variables. For the inspection, quality and staffing domains, the Pearson chi-square tests each yield a p-value of 0.000, indicating strong dependence between complaints and each of the three domains. In other words, the number of complaints does reflect the performance in all three domains, which supports our assumption. As a result, we can use the complaints as a valid proxy variable when analyzing potential fraud.

B. Evidence on Rating Manipulation

The contingency analysis shows that the number of complaints is directly associated with service quality. If two nursing homes provide a similar quality of service, we assume their complaint distributions are similar. Thus, if the overall rating reflects the true service quality of nursing homes, i.e., there is no fraud, then nursing homes with the same overall rating should have similar distributions of the number of complaints. Table 3 shows the average number of complaints for nursing homes with different inspection and overall ratings. For each overall rating level, the nursing homes are divided into 2 categories: nursing homes whose star ratings increase after self-reporting and nursing homes whose star ratings do not increase after self-reporting. We denote the upper triangle of Table 3 as area I and the lower triangle as area II. We then test the following claims.

Claim 1: Nursing homes with the same overall rating but different inspection ratings have different complaint distributions.

To test this claim, we run ANOVA for each overall rating level across the different inspection ratings. For each overall rating level, two ANOVA results are presented in Table 4. In the first, nursing homes are grouped by whether the star rating increases after self-reporting; in the second, nursing homes are grouped by on-site inspection rating. All the comparisons are significant. Our claim that nursing homes with the same overall rating but different inspection ratings have different complaint distributions is thus supported.

Claim 2: Nursing homes with the same inspection rating but different overall ratings have the same complaint distribution.

To test this claim, we run ANOVA for each inspection rating level across the different overall ratings. For each inspection rating level, two ANOVA results are presented. In the first, nursing homes are grouped by whether the star rating increases after self-reporting; in the second, nursing homes are grouped by overall rating. As shown in Table 5, we do not observe a significant difference in the number of complaints, although the overall ratings can be quite different.

The results provide strong evidence of potential cheating in self-reporting. The self-reported high ratings in the staffing and quality domains, although they yield higher overall ratings, do not lead to any observable improvement in service quality as measured by the number of complaints.
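The F statistics reported for Claims 1 and 2 come from one-way ANOVA on complaint counts. As a minimal sketch of that computation, the function below calculates the F statistic from groups of complaint counts. The complaint counts are hypothetical values made up for illustration, not data from the paper, and the function name is not taken from any library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical complaint counts for nursing homes that share one overall rating
# but received different inspection ratings (illustrative values only).
complaints_by_inspection = {
    5: [1, 2, 3],
    4: [2, 3, 4],
    3: [5, 6, 7],
}
f_stat = one_way_anova_f(list(complaints_by_inspection.values()))
print(f"F = {f_stat:.2f}")
```

A large F statistic relative to the F distribution with (groups - 1, observations - groups) degrees of freedom indicates that the groups' mean complaint counts differ, which is the pattern Claim 1 looks for and Claim 2 expects to be absent.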

V. Fraud Estimation and Variable Importance

As discussed above, the final rating of a cheating nursing home is driven by two components. The first component is the observable characteristics, which are common to both cheating and honest nursing homes, while the second is the unobservable cheating component, which pertains only to the cheating nursing homes. Since the cheating component is unobserved and omitted from our regression model, the estimates for the remaining observed variables will be biased by the omitted cheating variable. The final rating of an honest nursing home, however, is driven only by the observed characteristics. That is, since the cheating component does not exist among the honest nursing homes, our regression estimates for the honest group will not suffer from omitted variable bias.

To develop our fraud detection model, we first divide the nursing homes into two groups: the honest nursing homes and the potential cheaters. A regression is then run for the honest nursing homes. The regression coefficients obtained for the honest group reflect the true association when there is no fraud, and are unbiased. These coefficients are then used to predict the highest possible overall rating for each nursing home in the suspected cheating group. A nursing home is identified as a likely cheater in our estimation if its actual rating is higher than the highest level of the estimated rating.

A. Prediction Model

Since the overall star rating is the variable we want to predict, it is used as the dependent variable and is denoted by OverallRating. Like the variable StarChange in the regression model in Section II, OverallRating is ordinal and takes values in the five levels {1, 2, ..., 5}, so we employ an ordinal logistic regression model. OverallRating is determined by a set of parameters $\gamma_1, \gamma_2, \gamma_3, \gamma_4$, which define the cut points of the five star levels. The model can be written as

$$P(\mathrm{OverallRating} \le k) = \frac{\exp(\gamma_k + X\beta^{P})}{1 + \exp(\gamma_k + X\beta^{P})}, \qquad (2)$$

where k = 1, 2, 3, 4. The independent variables are unchanged and are denoted by the vector X. The coefficients of the prediction model are denoted by $\beta^{P}$.

B. Grouping

Since we use the coefficients of the honest group as the unbiased baseline, we define the members of this group strictly, i.e., we want to make sure there is no evidence that these nursing homes are cheating. A nursing home is classified as honest only if it satisfies both of the following criteria:

1. Its rating does not increase after self-reporting.
2. Its number of complaints is strictly lower than the median of its corresponding self-reporting level.

To define a nursing home's self-reporting level, we combine the two self-reported domains. The median and average of the number of complaints are reported in Table 6. Based on the criteria stated above, the honest (H) group consists of 1345 nursing home records over the 5 consecutive years. The remaining 3033 of the 4378 records are categorized in the potential cheating (PC) group. Note that the PC group consists of both the actual cheaters and the nursing homes that improve their service quality through legitimate efforts. In the following, we estimate the proportion of actual cheaters in the PC population.
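The two grouping criteria can be sketched as a simple filter. In this illustration the record fields (`star_change`, `complaints`, `level`) and the sample records are hypothetical; in particular, the self-reporting level is taken as a given input because the exact way the two self-reported domains are combined into levels is not reproduced here, and the median is assumed to be computed over all records at that level.

```python
from collections import defaultdict
from statistics import median


def split_groups(records):
    """Split records into the honest (H) and potential cheating (PC) groups."""
    # Median number of complaints within each self-reporting level.
    complaints_by_level = defaultdict(list)
    for r in records:
        complaints_by_level[r["level"]].append(r["complaints"])
    medians = {lvl: median(vals) for lvl, vals in complaints_by_level.items()}

    honest, potential_cheaters = [], []
    for r in records:
        no_increase = r["star_change"] <= 0                      # criterion 1
        few_complaints = r["complaints"] < medians[r["level"]]   # criterion 2 (strict)
        if no_increase and few_complaints:
            honest.append(r)
        else:
            potential_cheaters.append(r)
    return honest, potential_cheaters


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "star_change": 0, "complaints": 1, "level": 3},
    {"id": "B", "star_change": 1, "complaints": 0, "level": 3},
    {"id": "C", "star_change": 0, "complaints": 5, "level": 3},
    {"id": "D", "star_change": -1, "complaints": 4, "level": 3},
]
honest, potential_cheaters = split_groups(records)
print([r["id"] for r in honest], [r["id"] for r in potential_cheaters])
```

Requiring both criteria keeps the honest group deliberately small: a record with few complaints still falls into the PC group if its rating rose after self-reporting.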

C. Identification of Cheaters

For the H group, we run the ordinal logistic regression in equation (2) to obtain the estimates of each coefficient, as shown in Table 7. Note that the H group is strictly defined and shows no evidence of cheating. Both the 95% and 90% confidence intervals are reported. The coefficients are assumed to be unbiased. Using the unbiased coefficients and the upper bounds of the confidence intervals, we can predict the highest possible rating for each member of the PC group. A nursing home is identified as a cheater if its actual overall rating is higher than the highest possible rating predicted. Using the 95% confidence interval, we identify 184 of the 3033 nursing homes (6.07%) in the PC group as most likely to be cheaters. Using the 90% confidence interval, we identify 379 of the 3033 nursing homes (12.5%) in the PC group as most likely to be cheaters.

D. Variable Importance

To emphasize the key differences between honest nursing homes and the likely cheaters, a subset of the data is constructed by eliminating nursing homes whose status is difficult to identify, i.e., those that are neither identified as cheaters nor members of the honest group. The remaining dataset consists of 1724 nursing homes, of which 1345 are identified as honest nursing homes and 379 are identified as cheaters. The status of a nursing home is coded as 0 if it is an honest nursing home and 1 if it is a cheater. Because the status is binary, we develop a logit model to perform the variable importance analysis, as shown in equation (3). The probability of being an identified cheater is denoted as λ.

$$\mathrm{logit}(\lambda) = \beta_0 + \beta_1\,\mathrm{Incentive} + \beta_2\,\mathrm{Competition} + \beta_3\,\mathrm{ResTotal} + \beta_4\,\mathrm{ForProfit} + \beta_5\,\mathrm{Medicare} + \beta_6\,\mathrm{Medicaid} + \beta_7\,\mathrm{ResCouncil} + \beta_8\,\mathrm{FamCouncil} + \beta_9\,\mathrm{InspectionRating} + \beta_{10}\,\mathrm{QualityRating} + \beta_{11}\,\mathrm{StaffingRating} + \varepsilon, \qquad (3)$$

where InspectionRating, QualityRating and StaffingRating denote the 5-star ratings for the three domains. The other variables have the same meanings as explained in Section II.

The coefficient estimates and their importance are shown in Table 8. The two self-reported domains, QualityRating and StaffingRating, have the highest variable importance, i.e., they contribute the most to the probability of being a cheater when their values change from their lower bounds to their upper bounds. The result is consistent with our analysis, since untruthfully high self-reporting is the only way for a cheater to achieve a high overall rating. Besides the self-reported domain variables, we find the variable ResTotal to be third in terms of variable importance. The result indicates that as a nursing home's size grows, its probability of gaming the rating system increases significantly. Incentive and ForProfit, which rank 4th and 5th in the variable importance analysis, indicate that the financial incentive of a nursing home plays an important role in being a cheater. For-profit nursing homes are more likely to be cheaters than non-profit ones, and the higher their financial incentives are, the more likely they are to cheat. On the other hand, the probability of being a cheater does not change much with competition or certification status (Medicare or Medicaid certified).

VI. Conclusion

This paper systematically analyzes CMS's nursing home rating system and documents the existence of fraud. We develop a fraud detection model to estimate the proportion of cheating nursing homes. A variable importance analysis is then performed to identify the factors that contribute the most to being a cheater.

Our research makes several contributions. First, to the best of our knowledge, it is the first study that systematically investigates fraud in the CMS nursing home rating system and explains the incentives behind it. This study pinpoints the major flaws in the current rating system and contributes to the improvement of the rating mechanism. Second, the study estimates the proportion of likely cheaters, which serves as an important performance evaluation of the current rating system. These estimates can be used to strategically focus future CMS audits on the nursing homes that are most likely to be cheaters.

Bibliography

Berg, K., et al. (2002). Identification and evaluation of existing nursing homes quality indicators. Health Care Financing Review.
Cabin, W. (2014). For-profit Medicare Home Health Agencies' Costs Appear Higher and Quality Appears Lower Compared to Nonprofit Agencies. Health Affairs.
DellaVigna, S. (2010). Detecting illegal arms trade. American Economic Journal: Economic Policy.
Duggan, M. (2000). Winning isn't everything: Corruption in sumo wrestling. National Bureau of Economic Research.
Engelberg. (2014). Financial Conflicts of Interest in Medicine. SSRN.
Fennell, M. L. (2010). Elderly Hispanics more likely to reside in poor-quality nursing homes. Health Affairs.
Grabowski, D. C. (2004). Medicaid payment and risk-adjusted nursing home quality measures. Health Affairs.
Grabowski, D. C. (2004). Recent trends in state nursing home payment policies. Health Affairs.
Harrington, C. (2001). Does investor ownership of nursing homes compromise the quality of care? American Journal of Public Health.
Jacob, B. A. (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. National Bureau of Economic Research.
Kane, R. A. (2003). Definition, measurement and correlates of quality of life in nursing homes: Toward a reasonable practice, research, and policy agenda. The Gerontologist.
Mayzlin, D. (2012). Promotional reviews: An empirical investigation of online review manipulation. National Bureau of Economic Research.
Mor, V., et al. (2003). The quality of quality measurement in U.S. nursing homes. The Gerontologist.
Smith, D. B. (2007). Separate and unequal: Racial segregation and disparities in quality across U.S. nursing homes. Health Affairs.
Stevenson, D. G. (2003). The rise of nursing home litigation: Findings from a national survey of attorneys. Health Affairs.
Stevenson, D. G. (2008). Private equity investment and nursing home care: Is it a big deal? Health Affairs.
Studdert, D. M. (2011). Relationship between quality of care and negligence litigation in nursing homes. New England Journal of Medicine.
Thomas, K. (2014). Medicare Star Ratings Allow Nursing Homes to Game the System. The New York Times.
Werner, R. E. (2010). Public reporting drove quality gains at nursing homes. Health Affairs.
Werner, R. M. (2010). Advancing nursing home quality through quality improvement itself. Health Affairs.

Figure 1. The Trend of Five-year Ratings: (a) Inspection Rating (b) Quality Measurement Rating (c) Staffing Rating (d) Overall Rating

Figure 2. Profit Trend over the 5 Years: (a) Plotted by Overall Ratings (b) Plotted by Years

Figure 3. The Correlation Analysis: (a) A Positive Correlation Is Expected if the Star Increase Results from Self-improvement Efforts (Red Arrow) (b) A Positive Correlation Is Expected if a Nursing Home's Self-reporting Is in Line with Medicare's Inspection

Table 1. Financial Incentive Definition
Columns: Inspection rating; Expected profit* if inspection rating unchanged; Maximal possible overall rating; Expected profit* if maximal possible overall rating realized; Incentive (profit difference)
Incentive (profit difference), by row:
(Level 5 - Level 5)
(Level 5 - Level 4)
(Level 5 - Level 3)
(Level 4 - Level 2)
(Level 2 - Level 1)
*The expected profit is the average per patient per day profit for the corresponding star rating group.

Table 2. Regression Results
Columns: Variables; Pooled data; Fixed effect; Hausman-Taylor
Incentive *** *** ***
Competition
ResTotal ***
ForProfit *** *
Medicare *** ***
Medicaid *** *
ResCouncil
FamCouncil **

Table 3. Average Complaints
Columns: Overall 1; Overall 2; Overall 3; Overall 4; Overall 5
Rows: one per inspection rating level
Note: The blank cells represent the impossible rating transitions according to CMS's rating system design.

Table 4. F test: Grouped by Inspection Rating
Columns: Overall rating; F statistics: Area I vs Area II; F statistics: Grouped by Inspection
Overall 1 star    **
Overall 2 star    7.43***     6.16***
Overall 3 star    13.05***    5.06***
Overall 4 star    14.22***    8.35***
Overall 5 star    5.27**      5.70***

Table 5. F test: Grouped by Overall Rating
Columns: Inspection rating; F statistics: Area I vs Area II; F statistics: Grouped by overall
Inspection 1 star
Inspection 2 star
Inspection 3 star
Inspection 4 star    5.37**    2.00
Inspection 5 star

Table 6. Combined Self-reporting Levels
Columns: Average Rating of Staffing and QM; Complaint Average; Complaint Median
Level 1:
Level 2:
Level 3:
Level 4:
Level 5:

Table 7. The Unbiased Results for the H Group
Columns: Overall Rating; Coefficients; 95% Confidence Interval; 90% Confidence Interval
Incentive ***
Competition
ResTotal ***
ForProfit ***
Medicare ***
Medicaid *
ResCouncil
FamCouncil 0.483***

Table 8. Variable Importance Analysis
Columns: Overall Rating; Coefficients; Variable Importance
Incentive ***
Competition
ResTotal ***
ForProfit 0.223***
Medicare
Medicaid
ResCouncil **
FamCouncil ***
InspectionRating **
QualityRating ***
StaffingRating ***