A Comparison of the Results from the Old and New Private Sector Sample Designs for the Medical Expenditure Panel Survey-Insurance Component

John P. Sommers, Agency for Healthcare Research and Quality
Anne T. Kearney, U.S. Census Bureau

Abstract

The MEPS-IC is an establishment survey that collects information on employer-sponsored health insurance. In survey year 2004, we changed the sample for private sector establishments considerably. The changes include: strata definitions; allocation per state; the number of states with minimum sample for publication, which grew from 40 to 51; the overall sample size, which grew by 7,000 units; and special sampling rules that limit the sample per firm. The goal of the design was to improve sampling errors for state estimates and to at least maintain the errors at the national level. In this paper we compare errors from the new design to those from the previous design to determine if we met our goals. Since we made changes to the design that could decrease sampling error for certain estimates, we also compare specialized design effects to determine the effects of stratification and over-sampling on final results.

Keywords: Sample Design, Medical Expenditure Panel Survey-Insurance Component

Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau or the Agency for Healthcare Research and Quality.

1.0 Background

The Medical Expenditure Panel Survey (MEPS), Insurance Component (IC), is an annual survey of business establishments and governments with a sample of over 43,000 units. The survey is sponsored by the Agency for Healthcare Research and Quality (AHRQ) and conducted by the U.S. Census Bureau. The survey collects information on employer-sponsored health insurance in the United States.
Among the information collected is whether the employer offers insurance, numbers of employees, numbers of employees eligible and enrolled, premiums, employee contributions to premium costs, whether insurance is offered to retirees and a set of business demographics. (See AHRQ Website #1 for more information on the MEPS-IC.) The survey, which was established in 1997, began by collecting information for the survey year 1996. Since that time a large number of tables of estimates have been published for each survey year. Estimates are made for government and private sector employers and include estimates of the number of establishments that offer health insurance, the percent of employees who work where health insurance is offered, the percent eligible for health insurance, the percent enrolled, and average premiums and contributions per enrolled employee. Private sector estimates, which are the focus of this paper, have been published annually since 1997 for the nation, at least 40 states, firm size groups and major industry groups.

The survey's sample is made up of two parts: the private sector sample, called the list sample, and the sample of state and local governments. Currently, with approximately 41,000 units, the private sector sample is the key component of the two, since the private sector it measures has over five times the number of employees of state and local governments.

2.0 Description of Sample Designs

2.1 Original List Sample Design

The design, size and allocation of the list sample were basically the same from the 1996 through 2002 survey years. The sample was selected from the Census Bureau's Standard Statistical Establishment List, which was recently replaced by an updated type of list, the Census Bureau's Business Register. (From this point forward, when using the word sample, we will be referring to the private sector list sample.)
The original sample design was characterized by a total sample size of about 37,500 sampling units (before nonresponse), with minimum sample sizes for each of the 40 states for which estimates were published. The remaining small states received a much smaller
allocation which was proportional to their actual size and amounted to approximately 5% of the overall sample. Each state sample was allocated across 14 strata. These strata were based upon the size of the firm which controlled the establishment and the employment size of the establishment. The sample for the state was allocated using Neyman allocation (Cochran, 1977) based upon variance components which were an average of the unit variance for estimates of the numbers of establishments that offer health insurance and the unit variance component for numbers of employees enrolled in health insurance. This method effectively gives an allocation which balances the optimal allocation for estimates of numbers of establishments and the optimal allocation for estimates of numbers of persons. The former allocation tends to give good estimates for establishment-based variables, such as the percentage of establishments that offer health insurance. It also supports estimates for small firms, since there are a large number of these firms but they hold a small proportion of total employment. The latter allocation supports better estimates of variables related to employment, such as total enrollment and the average premium per enrolled employee. The unit variances used in the allocation were the same for each state and were a national average of the estimates of unit variance for each state. The numbers of units in each stratum were the actual numbers from the sampling frame for 1996. It was projected that this balanced design would give estimates for each of the two types of variables with errors not substantially different from those that would be obtained if the optimal sample allocation for the individual variables were used (Sommers, 1999).
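The Neyman allocation described above can be sketched in a few lines. The strata, frame counts and averaged unit variances below are purely illustrative placeholders, not the survey's actual values; the point is only the mechanics of allocating a fixed sample in proportion to N_h times S_h.

```python
import math

def neyman_allocation(total_n, strata):
    """Allocate a fixed total sample across strata in proportion to
    N_h * S_h (stratum frame size times unit standard deviation)."""
    weights = {h: N_h * math.sqrt(var_h) for h, (N_h, var_h) in strata.items()}
    total_w = sum(weights.values())
    return {h: round(total_n * w / total_w) for h, w in weights.items()}

# Hypothetical strata: (number of frame units N_h, averaged unit variance).
# MEPS-IC averaged the unit variances of two variables (offer status and
# enrollment) before allocating; here the averaged values are passed directly.
strata = {
    "small firm / small estab": (120_000, 0.25),
    "small firm / large estab": (8_000, 0.21),
    "large firm / small estab": (40_000, 0.16),
    "large firm / large estab": (12_000, 0.09),
}

allocation = neyman_allocation(1_000, strata)
print(allocation)
```

Because the allocation follows N_h * S_h, the numerous small-firm stratum receives the bulk of the sample even though its units are individually small, which is the balancing behavior the text describes.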
One should note that actual error results obtained are also influenced by other factors, such as varying response rates across strata and how far the relative values of a state's true variance components vary from the national average components used in the allocation.

2.2 New Sample Design

Beginning in the 2003 survey year, the MEPS-IC sample design was changed (Sommers and Riesz, 2003). Survey year 2003 was a transition year, and in survey year 2004 the new design was completely implemented. The new design is again state based, with allocations to strata within each state. Allocations are done by state using the numbers of units in each stratum within the state, Neyman allocation and unit variance components which are the same across all the states for the same strata. The strata for the new design are based upon the predicted probability that an establishment would offer health insurance and the predicted enrollment given that the establishment offers health insurance. Although size of establishment and size of firm (i.e., the factors used to build the old sample design strata) correlate well with outcomes, other factors also play a part in determining outcomes for an establishment. For instance, an establishment in a small firm is less likely to offer health insurance. However, even among small firms, those in certain industries and in certain parts of the country which pay higher salaries are more likely to offer insurance and also to have higher enrollment. By using predictions based upon more variables than the size variables alone used in the past, we are able to build strata based upon predicted values that correlate with outcomes better than size alone.
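One way to picture prediction-based stratification is as a cross-classification of the two predicted quantities. The sketch below is a hypothetical illustration, not the survey's actual stratifier: the cutpoints, the number of bins and the example establishments are all invented, and the real design would derive predictions from a fitted model over industry, location, wages and size.

```python
def assign_stratum(p_offer, pred_enroll,
                   offer_cuts=(0.3, 0.6, 0.9),
                   enroll_cuts=(10, 50)):
    """Cross-classify an establishment by its predicted probability of
    offering insurance and its predicted enrollment given an offer."""
    offer_bin = sum(p_offer >= c for c in offer_cuts)        # 0..3
    enroll_bin = sum(pred_enroll >= c for c in enroll_cuts)  # 0..2
    return offer_bin * 3 + enroll_bin                        # 12 possible strata

# Two hypothetical establishments: a low-wage small firm unlikely to offer
# coverage, versus a high-wage establishment in an industry that usually does.
print(assign_stratum(0.25, 4))    # -> stratum 0
print(assign_stratum(0.95, 120))  # -> stratum 11
```

Establishments with similar predicted outcomes land in the same stratum regardless of whether the similarity comes from size, industry or geography, which is what lets the predicted values correlate with outcomes better than size alone.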
The unit variance components used in the Neyman allocation were a weighted average of variance components for three variables: the number of establishments which offer health insurance, the total employees enrolled and the total employee contributions to single coverage. As in the old design, the same sets of variance components were used for each state's allocation. Analysis indicated that the use of these variance components would improve estimates based upon employee and enrollee numbers, such as total enrollment and average premiums and contributions per enrolled employee, more than variables based upon numbers of establishments, such as the percent of establishments that offer health insurance. Although analysis showed that the new design should improve both types of estimates, users indicated that the employee types of estimates were much more important.

Originally the allocations per state were to have been the same as in the old design, but budgets allowed for more sample and the state-by-state allocation was also changed. AHRQ decided that the increase in the private sector sample to 42,000 units was enough to produce estimates for all states. This important set of estimates had been a goal for AHRQ for several years. This meant that all states, even the smallest, would now have minimum sample sizes. However, doing this required reducing the minimum sample size per state for all states, including the 40 states already published, in order to meet the budget. This meant that, without other changes, the state estimates would deteriorate slightly. It also meant that it was possible
that the national estimates could also deteriorate. The latter could occur because the entire increase to the sample was being placed into states with less than 5 percent of the universe, while the sample was actually being reduced in the other states, which represent the vast majority of the population. It was assumed that this sample size reduction would have little effect on the overall error rates. There was hope that using the new design would counteract the effects of the smaller, less efficient allocation of the national sample among states.

3.0 Evaluation of Old and New Sample Designs

The purpose of this paper is to report the results of an evaluation that compared a set of estimates and their errors for a selected set of variables and domains using results from the 2002 MEPS-IC, which used the old sample design, versus the same set of estimates and their errors using results from the 2004 MEPS-IC, which was based upon the new sample design.

3.1 Estimates Analyzed

Because of changes in sample sizes and allocations to states, and in order to have a more valid comparison of results, we created estimates for 8 variables for just the 31 largest states. We did this because sample sizes for these states in 2004 were a set proportion of the sample sizes in 2002, so results were more directly comparable. We chose the 8 variables to reflect a wide assortment of important estimate types. We also created a set of pseudo national estimates from just the 31 largest states. For the 8 different variables we created estimates for each of 9 industry groups using employers for the entire set of 31 states. These states have over 90 percent of the nation's workforce, and these states had their samples changed by the same percentage.
Thus, the quality of estimates for the combination of all these states together was not affected by our budgeting process, and was affected only by the changes in stratification and allocation of the new sample design. Thus, we felt this pseudo national comparison was a more reasonable comparison of the changes in sample design than a comparison which reweighted the sample allocation across all the states for budget reasons.

3.2 Measures of Sample Results

If the distributions and sample sizes were the same in 2002 and 2004, one could compare results by simply comparing standard errors for the same estimates across the two years. However, things changed across the two years which made this direct comparison problematic. Two things happened:

- Sample sizes and the distributions of sample across states changed. We needed to account for these changes in any comparisons.
- The distributions of the variables changed. For instance, average premiums increased over 20 percent between the two years, and as a result the distribution of premiums had a higher standard deviation. An estimate of average premiums with the exact same sample size and sample design would have a higher standard error in 2004 than in 2002. We needed to factor these changes in distributions out of any comparisons.

To give us the best means of assessing the new sample design, we first created three quality measures for each estimate compared. They were:

- The relative standard error (RSE), the ratio of the standard error over the mean. This would generally be the best standard; however, sample sizes changed: the total sample size for these 31 states was 21,900 in 2002 and 20,600 in 2004.
- The square root of the design effect of the sample. The design effect is the estimate of variance for the sample divided by the estimate of the variance if simple random sampling had been used.
- A value we call the unit RSE. This is the RSE multiplied by the square root of the sample size.
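The three measures can be written down directly from their definitions. In this sketch, the standard error, SRS variance and sample size are placeholder values standing in for the SUDAAN-estimated quantities, chosen only to make the arithmetic visible.

```python
import math

def quality_measures(mean, se_complex, var_srs, n):
    """Return (RSE, square root of design effect, unit RSE) for one estimate.

    se_complex : standard error under the actual (complex) design
    var_srs    : estimated variance under simple random sampling
    n          : sample size behind the estimate
    """
    rse = se_complex / mean                            # relative standard error
    root_deff = math.sqrt(se_complex**2 / var_srs)     # sqrt(design effect)
    unit_rse = rse * math.sqrt(n)                      # RSE with sample size factored out
    return rse, root_deff, unit_rse

# Hypothetical premium estimate: mean 3,000, SE 18, SRS variance 400, n = 20,600.
rse, root_deff, unit_rse = quality_measures(3000.0, 18.0, 400.0, 20600)
print(round(rse, 4), round(root_deff, 2), round(unit_rse, 2))
```

Note how the unit RSE rescales the RSE by the square root of the sample size, so two surveys of different sizes can be compared on an equal footing.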
Under the assumption that the standard error is a constant standard deviation divided by the square root of the sample size, the unit RSE would be a comparison that removes the effect of sample size on the results. The first item gives an overall comparison which includes the change in sample size, but as opposed to total error it corrects for the increase in standard deviation caused by the increase in the mean. The second item gives a comparison which to some degree accounts for differences in the distributions and factors out sample size. The final item tries to give a measurement that adjusts for sample size only. Although we believe improvements in these measures indicate a better design, changes in other factors from year to year make it difficult to state with certainty whether a given design is better. We calculated all quality measures using SUDAAN software and the Taylor series method to account for the complex survey design (Research Triangle Institute, 2002).

4.0 Results

4.1 Comparison of National Level Estimates

The first comparison was for 8 national level estimates for the 31 state pseudo national universe. These types of national estimates are key policy level variables from the survey. Table A gives the three quality measures for each of the two years and the key variable types. At first glance one can see that the values are lower in every case for the new design. (One case rounds to an equal value but is actually lower for the new design.) We did not perform statistical tests of significance on these cases individually. Aside from the difficulty of calculating the errors for each of these measures, it is really not the purpose of this evaluation to check individual results. We are trying to compare the two overall designs. In order to do this we chose to consider the results in a non-parametric light. If the results were random, and the new design yielded about the same number of improvements as declines in error, or, worse yet, the old design had generally better results, what would be the chance of the observed number of measurements where the new design had the lower value? If the process were random, one would assume the chance that either design had the lower value in a particular case would be 50%. In that case the probability of all 24 cases being lower would be 0.5 to the 24th power, or less than 1 in 10 million.
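The sign-test arithmetic used in this evaluation is easy to reproduce: under the null hypothesis, each comparison is an independent fair coin flip, so the chance of at least k of n wins for the new design is a binomial tail probability.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n
    comparisons would favor the new design if the designs were equally good."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(24, 24))  # all 24 measures lower: about 6e-8, under 1 in 10 million
print(prob_at_least(8, 8))    # all 8 within one measure: 0.5**8, about 0.0039
print(prob_at_least(21, 24))  # 21 or more of 24 state-level averages: about 0.00014
```

The same function gives the tail probabilities quoted throughout the results sections, simply by changing k and n.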
If we said instead that the results for the three categories were related, so that we just looked at an individual quality measure, the chance of all 8 items being lower for the new design would be 0.5 to the 8th power, or about 0.0039. Either of these two results would be highly significant for a one-tailed test at the 5% level. These results were very good in light of the fact that the new design was based upon limiting the error of only three of the eight variables. The results seem to indicate that the three variables upon which the design was focused relate well to the other variables we used in the evaluation.

4.2 Comparison of State Level Estimates

Table B compares results for the same 8 variables across the 31 largest states. As was noted earlier in the paper, the MEPS-IC sample design in the past allowed for publishable estimates for 40 states, and recently the design was changed to allow for publishable estimates for all the states. Table B shows the average results across the 31 states included in this analysis for the 8 selected variables of interest. As one can see in Table B, the average RSEs for these states under either design were less than 8 percent for all the estimates, and most were less than 5 percent. These results in themselves are very good, and we feel they show that both designs are quite good. When one views these results from the non-parametric point of view used earlier, one can see that 21 of 24 times the new design had a lower average. The probability that 21 or more of 24 averages would be lower for the new design by chance is approximately 0.00014, which is highly unlikely. If one looks at each set of 8 of the same types of measurements separately, one can see that for all 8 the average design effect is lower for the new design, 7 of 8 times the unit RSE is lower and only 6 of 8 times the actual RSE is lower.
The chances individually of having this many or more occurrences, given the random difference hypothesis we are using, are 0.0039, 0.031 and 0.14. The first two would be significant on a one-tailed 5% test but the third would not. Thus, although the RSE is smaller most often, it is not smaller significantly more often than chance would allow. Perhaps this result should have been expected, since the sample sizes per state were slightly smaller in 2004 than in 2002. We should also note that we analyzed the data using comparisons other than the average values shown in the table. For instance, we looked at the percent of the time that, when the two individual values for a state were compared, the 2004 number was lower. Since there are 31 states and 8 variables per state, there are 248 comparisons for each of the 3 types of measure. For the design effects, the new design values were lower 72.2 percent of the time, while for the unit RSE measure, the new design values were better in 58.1 percent of the cases, and for the RSE,
the new design was better 54.4 percent of the time. If one considers these cases as draws under our random difference hypothesis, then, as with the comparisons based upon average values, the design effects would be highly significant, the unit RSEs significant and the RSEs not significant.

4.3 Comparison of Estimates by Firm Size

A third important type of estimate is by firm size. Although these estimates are clearly important, they were not considered directly in the design of the new sample. However, indirectly the stratification, which considers size and other factors in assignment to a stratum, may affect these estimates. Table C gives a glimpse of the effects on the estimates for the 8 variables for two sets of firms: those with less than 50 employees and those with 50 or more employees. As one can see in Table C.1, the averages are not much different between the two years for the small firm estimates, and the averages for the new design are lower for the large firm estimates. Table C.2 shows the proportion of times the 2004 values were less than the 2002 values. The proportions reflect what one can see in the averages. For small business, the number of times the new design produces smaller results is half the time or less. However, none of the individual proportions are small enough to be significant at the 5% level. For the larger firms, two of the three proportions are 1 and are significant under our random difference hypothesis. We speculate that these results occurred because of the higher emphasis in the optimization process on enrollments and contribution totals. In the allocation, we tried to develop an allocation which would show more improvement in these two variables and little improvement in the estimates of total numbers of establishments that offer health insurance.
Since most enrollees, contributions and premiums are concentrated in larger firms, in hindsight it is logical that estimates for these firms would improve, since their values are the basis for the national totals for enrollees, contributions and premiums. However, small firms are where the vast majority of establishments are. This type of estimate was not as important; thus the small firm results are basically of a similar quality to those of the old design. Although one would like to improve all estimates, the results for large and small firms are an acceptable trade-off for improved state and national estimates for the full universe.

4.4 Comparison of Estimates by Industry

Table D shows averages of results for estimates for the 8 selected variables by industry group. These results for industry estimates are mixed. Overall, of the 72 measures in each of the three categories, 63 percent of the design effects were lower in 2004, 51 percent of the unit RSEs were lower in 2004 and 49 percent of the RSEs were lower in 2004. However, as is reflected in the averages shown in the table, the effects of the sample varied by industry. The overwhelming majority of the cases where values of the three measures for year 2004 are higher than for year 2002 are concentrated in the agriculture/forestry/fishing, construction, and other services sectors. As was the case with smaller firms above, these industries generally have lower rates of offering insurance and lower enrollment rates than the other industries (see AHRQ Website #2). It seems that for industry categories, as for firm size categories, the new sample allocation and stratification focus the sample on enrollments and contributions rather than on employment. Results have shifted towards the variables of interest across the industries.

5.0 Comments and Conclusions

In this paper, we attempted to compare the quality of the old and new MEPS-IC sample designs.
Because of vagaries caused by changes in sample allocations and sizes, plus the changes in distributions of data across years, and even changes in patterns of response rates, the authors conclude there is no one measurement that can be used to gain a definitive answer to this type of question. As a result, we compared the results for the two designs using two years of data, 8 variables and multiple domains of the universe. Because of the many comparisons and distributions, we tested the results using a random difference assumption: that for any one comparison, there would be an equal chance of either design yielding the lower value of the measure. Using this assumption and testing, the national and state level estimates based upon the entire sample showed definite overall improvement with the new design. We expected that these national and state estimates would be better because variance of the estimates for the entire universe was the evaluation factor that drove the new design.
However, national estimates by firm size and national estimates by industry group did not show the same consistent improvement. Estimates for smaller firms and several industries showed minor deterioration with the new design. The decline in quality for these cases was not significant and was balanced by improvements in estimates for large firms and for several other industries. We speculate that the changes in the quality of estimates were related to each sub-domain's importance in the entire employer health insurance market. The sub-domains that showed a decline were those which generally have lower offer and enrollment rates of employer-sponsored health insurance and thus are of lesser importance in the overall national estimates. Possibly, the sample allocations favored sub-domains more important to the overall national estimates upon which the sample design was based. The decrease in quality of estimates for certain sub-domains is a useful reminder that one can only expect improvements in results for items considered in the design. Given the complexity of trying to consider multiple criteria when developing a new design, one must be very careful to fully prioritize criteria and estimates in the design process; otherwise, one might have an unexpected surprise when actual results are calculated based on the new design.

6.0 References

AHRQ Website #1: http://www.meps.ahrq.gov/mepsweb/survey_comp/insurance.jsp

AHRQ Website #2: http://www.meps.ahrq.gov/mepsweb/data_stats/quick_tables_results.jsp?component=2&subcomponent=1&year=2004&tableseries=1&tablesubseries=b&searchtext=&searchmethod=1&action=search

Cochran, W.G. Sampling Techniques. New York: John Wiley and Sons; 1977.

Research Triangle Institute (2002). SUDAAN User's Manual, Release 8.0. Research Triangle Park, NC: Research Triangle Institute.

Sommers, J.P. List Sample Design of the 1996 Medical Expenditure Panel Survey Insurance Component.
Rockville (MD): Agency for Health Care Policy and Research; 1999. MEPS Methodology Report No. 6, AHCPR Pub. No. 99-0037.

Sommers, J.P. and Riesz, S. A New Stratification for the Medical Expenditure Panel Survey - Insurance Component. 2003 Proceedings of the Survey Research Methods Section of the American Statistical Association, 3987-3992.

Table A. Comparison of Three Quality Measures for Pseudo National Estimates for Eight Selected Variables for the Years 2002 and 2004: MEPS-IC
(for each measure, the 2002 value is followed by the 2004 value)

Variable                                          RSE              Sq Rt of Design Effect   Unit RSE
Avg Family Contribution                           0.0151  0.0144   0.434   0.333            2.277  2.062
Avg Family Premium                                0.0059  0.0052   0.408   0.249            0.897  0.745
Avg Single Contribution                           0.0155  0.0141   0.489   0.444            2.343  2.007
Avg Single Premium                                0.0061  0.0059   0.514   0.404            0.921  0.837
% Employed Where Insurance Offered                0.0034  0.0034   0.537   0.353            0.505  0.483
% Enrolled Where Insurance Offered                0.0086  0.0074   0.442   0.335            1.268  1.061
% of All Employees Enrolled                       0.0092  0.0079   0.414   0.304            1.358  1.138
% of Establishments That Offer Health Insurance   0.0066  0.0064   1.141   1.021            0.980  0.912
Table B. Comparison of Average Values of Three Quality Measures for State Level Estimates for Eight Variables for the 31 Largest States for the Years 2002 and 2004: MEPS-IC
(for each measure, the 2002 value is followed by the 2004 value)

Variable                                          Average RSE      Avg Sq Rt of Des Eff     Average Unit RSE
Avg Family Contribution                           0.0757  0.0743   0.412   0.358            1.975  1.824
Avg Family Premium                                0.0268  0.0276   0.368   0.305            0.703  0.671
Avg Single Contribution                           0.0792  0.0751   0.474   0.435            2.050  1.834
Avg Single Premium                                0.0285  0.0271   0.478   0.385            0.744  0.671
% Employed Where Insurance Offered                0.0175  0.0193   0.494   0.411            0.453  0.477
% Enrolled Where Insurance Offered                0.0447  0.0424   0.417   0.345            1.149  1.053
% of All Employees Enrolled                       0.0484  0.0461   0.395   0.309            1.246  1.142
% of Establishments That Offer Health Insurance   0.0380  0.0362   1.068   0.957            0.974  0.902

Table C.1. Comparison of Average Values of Three Quality Measures for Eight Variables by Firm Size for the Years 2002 and 2004: MEPS-IC
(for each measure, the 2002 value is followed by the 2004 value)

Firm size                                         Average RSE      Avg Root Design Effect   Average Unit RSE
Firms with less than 50 employees                 0.0155  0.0160   1.077   1.143            1.510  1.510
Firms with 50 or more employees                   0.0095  0.0087   0.550   0.430            1.071  0.913

Table C.2. Proportion of Values of Three Quality Measures for Eight Variables for Which the 2004 Value Was Less than the 2002 Value, by Firm Size: MEPS-IC

Firm size                                         Average RSE      Avg Root Design Effect   Average Unit RSE
Firms with less than 50 employees                 0.250            0.125                    0.500
Firms with 50 or more employees                   0.750            1.00                     1.00

Table D. Comparison of Average Values of Three Quality Measures for Eight Variables by Industry for the Years 2002 and 2004: MEPS-IC
(for each measure, the 2002 value is followed by the 2004 value)

Industry                                          Average RSE      Avg Root Design Effect   Average Unit RSE
Agriculture, Forestry and Fishing                 0.0884  0.1314   0.615   0.691            1.737  2.173
Construction                                      0.0380  0.0383   0.776   0.752            1.473  1.397
Financial Services and Real Estate                0.0191  0.0181   0.651   0.468            0.944  0.959
Manufacturing and Mining                          0.0210  0.0160   0.506   0.412            1.094  0.843
Other Services                                    0.0242  0.0279   0.686   0.714            1.630  1.665
Professional Services                             0.0173  0.0140   0.534   0.386            1.305  1.012
Retail Trade                                      0.0210  0.0214   0.881   0.960            1.209  1.075
Utilities and Transportation                      0.0446  0.0421   0.460   0.453            1.225  1.187
Wholesale Trade                                   0.0292  0.0289   0.832   0.477            1.017  1.018