A 'Virtual Population' Approach To Small Area Estimation




Michael P. Battaglia¹, Martin R. Frankel², Machell Town³ and Lina S. Balluz³
¹ Abt Associates Inc., Cambridge MA 02138
² Baruch College, CUNY, New York City, NY 10010
³ Centers for Disease Control and Prevention, Atlanta GA 30329

Abstract

A small area estimation system is developed for the Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS is a state-based health survey but there is a need for county estimates. The 2005-2009 American Community Survey PUMS is used to create a population of adults for each county. Iterative probability adjustment is used to ensure that the county estimates agree with the direct tabulated official estimates from the survey for key state and sub-state domains.

Key Words: Iterative Probability Adjustment, American Community Survey, Behavioral Risk Factor Surveillance System

1. Introduction and Background

One of the most difficult problems associated with the development of small area estimates is related to the inconsistency of the various small area estimates when compared with those produced by the primary survey from which they are derived. For example, consider a state that is subdivided into 60 counties. Suppose further that the sample design calls for a total sample size of 5000 interviews, with stratification consisting of 7 regions, with a minimum sample size of 500 within each region. The poststratification/weighting process explicitly recognizes these 7 regions, so it is possible to produce region level estimates which are weighted for region specific population characteristics of Age, Gender, and Race. By making use of explicit region level weighting, users are assured that when separate survey estimates are produced for these regions they will be viewed as representative with respect to the characteristics used in the weighting process. Furthermore, when users combine a subset of these regions, direct survey tabulations will be consistent with those obtained by adding up separate region estimates.
This will be true in total and when the tabulation or addition is carried out on the basis of the categories used in the weighting process. For example, estimates for females on a region by region basis will be consistent with estimates produced by tabulation from various aggregations of regions up to and including the full state.

Suppose now that estimates are desired for geographic sub-areas (e.g., counties) within each of the regions used in stratification, sample control and weighting. There are several approaches that may be applied when considering the development of estimates within geographic sub-regions. They are as follows:

A. Direct tabulation (using the existing weights) from the interviews within each of the geographic sub-regions.
B. Re-weighting the interviews within the sub-regions, using some of the region level population estimates as controls.
C. Development of model based estimates within the geographic sub-regions using methods that do not involve direct tabulation of survey observations.
D. Development of estimates based on a weighted combination of survey data (either A or B) and model based estimates.
E. Suppression of codes required for geographic sub-region tabulation. While this does not produce estimates, it in fact prevents these estimates from being produced.

It should be noted that each of the above approaches has obvious and subtle drawbacks. Some of these drawbacks are statistical while others impact the perceived validity and credibility of the survey.

In addition to small sub-region sample size problems, approach A (involving direct tabulation from the weighted data) produces within sub-region estimates that will add up to the total region but may have obvious credibility related flaws (e.g. the weighted survey data for a certain sub-region may have an 80:20 female to male ratio, when it is known that the true gender ratio within the sub-region is close to 50:50). Further, the projected population size within any sub-region may be quite different from the actual population size.

Approach B, which involves re-weighting the sub-regions within each region, will produce estimates that may be credible at the sub-region level, but when these re-weighted estimates from the sub-regions are added together to the region level they will differ from the region estimates using the original weights. Thus there will be different estimates for the total region. While this situation may be statistically acceptable, it will often produce serious questions about overall survey credibility on the part of final data users and analysts. Approach B is also limited by small sub-region sample size problems.

Approach C, which involves the development of model based estimates, will generally result in sub-region estimates that have certain desirable statistical properties, but will often fail the add-up or consistency test. For example, if we produce age by gender sub-region estimates they will not add up to (i.e. be consistent with) the model based sub-region estimates. Furthermore, when these sub-region estimates are added to form a region level estimate, they will generally not agree with the published (i.e., official) estimates produced by direct tabulation. Again, this may be acceptable to technically trained survey statisticians, but will generally produce strong questions about validity on the part of policy makers and subject matter analysts.

Approach D suffers from the same drawbacks as approaches A through C. Approach E avoids these issues, but limits the usefulness and suitability of the survey.

We have developed a method for producing estimates on a sub-region (Small Area) level that addresses some of the drawbacks and limitations outlined above. It allows for the production of estimates for sub-regions (Small Areas) that are both internally consistent and consistent with survey based estimates at the region level. The methodology for producing these estimates involves the development of a virtual population based on an externally available large scale survey and/or census and the use of model based probability estimates that are iteratively adjusted (Iterative Probability Adjustment) to conform with pre-specified control totals obtained from direct survey tabulation. As a result, if users of these small area estimates add them up to areas where direct survey estimates are produced, the two estimates will be in agreement (i.e. they will be consistent).

While the methods used in this system are not new to survey statistics, we believe that the particular combination of these methods in order to produce consistent small area estimates has not been previously published.

In this paper we will first provide a general description of the steps used in developing the estimates. This will be followed by a more detailed explanation of the specific steps followed to produce various county estimates. It is assumed that the basic sample design, sample weighting process and final estimation process have been specified.

Step 1. Determine the geographic level for which small area estimates are desired and specify the structure of these small areas within areas for which direct estimates are deemed appropriate. Determine if modifications to the overall survey weighting process should be undertaken in order to establish appropriate control constraints for aggregations of the small area estimates. For example, within a state it might be reasonable to obtain estimates from direct tabulation for all Metropolitan Areas (and within Metropolitan Area sub-components) where the sample size exceeds 500. If this is the case, then the overall weighting process should be modified to assure that appropriate post-stratification controls are applied at this level. The reason for this step is to make sure that aggregations of the small area estimates to areas where direct tabulations are possible are not inconsistent because of a lack of controls at the level for which direct estimates are to be produced. For example, if a particular estimate is highly correlated with gender, then if small area estimates are to be consistent with tabulated estimates, post-stratification by gender is desirable at the level for which direct estimates are to be produced.

Step 2. Identify a larger data set to be used as the virtual population. While it is possible to develop a large virtual population by simulation, we have found that it is more desirable to make use of a larger data set for which basic demographic and other variables are based on actual observations.
In developing small area estimates of health conditions, we have found that either the decennial census or American Community Survey (ACS) public use micro data (PUMS) are quite suitable to form the basis of a virtual population, which we define as a data set with reasonably large numbers of observations within the small areas for which estimates are to be produced.

Step 3. If appropriate, apply additional weights to the elements of the virtual population in order to assure consistency with the overall and sub-area weighting controls used for the survey. Create simulated sub-areas if necessary.

Step 4. Develop basic prediction models for the desired estimates based on prediction variables that are available in the survey and the virtual population. These might be demographic variables collected on a respondent level or variables available from outside sources. For example, demographic variables might be age, gender, education and marital status, while external variables might be the number of doctors per person within the county. We have restricted our estimates to binary (0,1) variables and have made use of logistic regression for the prediction process.

Step 5. Apply the model to the elements of the virtual population in order to obtain a predicted probability for each element. Aggregate these probabilities to the control levels for which direct estimates are produced and apply iterative probability adjustments to these probabilities. In general this adjustment process will involve the total sample and the various demographic subgroups for which estimates are to be available. For example, we have produced small area estimates of binary health conditions by gender, age group, education and marital status.

Step 6. Using the adjusted probabilities, produce the small area estimates.

County estimates are developed for behavioral risk factors, health conditions, and access to care measures using the 2009 Behavioral Risk Factor Surveillance System. The BRFSS is the largest health survey in the U.S. It is conducted annually in each of the 50 states and the District of Columbia by the Centers for Disease Control and Prevention. This state-based survey is conducted by telephone with a sample of adults (age 18+) using random-digit-dialing. The BRFSS questionnaire consists of a core module that collects basic risk factor and health condition data such as general health, health care coverage, smoking, alcohol use, asthma and BMI, as well as socio-demographic characteristics such as age, gender, race/ethnicity and education. The core section is followed by one or more topic-specific modules.

2. Creating a Unified Weight for the 2009 BRFSS

Purpose: Provide direct sample health risk factor prevalence estimates for all BRFSS domains and subclasses using one set of weights.

The weighting methodology for the BRFSS landline telephone state samples involves the calculation of a design weight for each completed adult interview. The design weight incorporates the selection probability of the telephone number, the number of voice-use landline telephone numbers in the household, and the number of adults in the household. The design weights are then post-stratified to control totals obtained from Claritas Inc. and from the latest 3-year American Community Survey PUMS. The control totals for each state include age by gender, race/ethnicity, marital status, education, landline telephone service interruption, age by race/ethnicity, and gender by race/ethnicity. For those states that are divided into regional strata, additional control totals are employed for region, region by age, region by gender, and region by race/ethnicity (Battaglia et al. 2008).
Category collapsing rules are used to avoid small sample size categories. Raking ratio estimation is used to calculate the final interview weights for each state. The raking algorithm incorporates a weight trimming procedure to prevent extreme high or low weights (Izrael et al. 2009).

The SMART BRFSS identifies counties and metropolitan statistical areas (MSAs) that meet minimum annual sample size criteria. Direct sample estimates are provided for these geographic areas. For our purposes we defined SMART geographic areas as counties or MSAs with a minimum of 500 adult interviews, which is very similar to the definition used by the BRFSS. For an MSA that cuts across state boundaries, the part inside one of the five states included in this study needed to contain at least 500 interviews. The adult interviews in each SMART geographic area are weighted to Claritas Inc. control totals for age, gender and in some cases race/ethnicity using cell poststratification. A separate set of SMART weights is used to provide risk factor and health condition prevalence estimates for each SMART geographic area. Because the weighting methodology for the state BRFSS is different from the weighting methodology for the SMART BRFSS, the prevalence estimates will differ. For example, if a region contained four counties that were all classified as SMART counties, then the regional prevalence estimates from aggregating the four counties would not agree with the region prevalence estimates from the BRFSS state sample weighting.
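The raking ratio estimation referred to above can be illustrated with a minimal sketch. It is not the raking macro of Izrael et al.: the function name `rake`, the toy respondents, the control totals and the relative tolerance are all assumptions made for illustration, and weight trimming and category collapsing are omitted. Every category is assumed to contain at least one respondent.

```python
import numpy as np

def rake(weights, groups, targets, max_iter=25, tol=1e-6):
    """Adjust weights so weighted totals match the control totals on each margin.

    weights : input (design) weights, one per respondent
    groups  : list of integer arrays, one per margin, holding each
              respondent's category code (0, 1, 2, ...) on that margin
    targets : list of control-total arrays, aligned with `groups`
    """
    w = np.asarray(weights, dtype=float).copy()
    targets = [np.asarray(t, dtype=float) for t in targets]
    for _ in range(max_iter):
        # One sweep: ratio adjust to each margin in turn.
        for g, t in zip(groups, targets):
            current = np.bincount(g, weights=w, minlength=len(t))
            w *= (t / current)[g]
        # Largest relative miss over all margins after the sweep.
        worst = max(
            np.max(np.abs(np.bincount(g, weights=w, minlength=len(t)) / t - 1.0))
            for g, t in zip(groups, targets)
        )
        if worst < tol:
            break
    return w

# Hypothetical toy data: six respondents, two margins (gender and age group).
gender = np.array([0, 0, 0, 1, 1, 1])
age = np.array([0, 1, 1, 0, 0, 1])
raked = rake(np.ones(6), [gender, age], [[50.0, 50.0], [40.0, 60.0]])
```

After raking, the weighted totals of `raked` reproduce both the gender and the age control totals. Adding a margin, as the unified weight does for SMART counties and MSAs, amounts to appending one more code array and one more vector of control totals.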

Before developing the small area estimates for each county in a state we first developed a unified BRFSS weighting methodology that extended the raking methodology described above by adding additional margins for:

1) SMART county (with a residual category for the balance of the state),
2) SMART county by age,
3) SMART county by gender,
4) SMART county by race/ethnicity,
5) SMART MSA (with a residual category for the balance of the state),
6) SMART MSA by age,
7) SMART MSA by gender, and
8) SMART MSA by race/ethnicity.

Category collapsing rules are used to avoid small sample size categories. We also used weight trimming during the raking to avoid extremely large or small weights. The weight trimming during the raking (Izrael et al. 2009) involves placing global and individual constraints on the weights:

Global low weight cap value = mean raking input weight times 0.091
Global high weight cap value = mean raking input weight times 11.0
Individual low weight cap value = respondent's weight times 0.2
Individual high weight cap value = respondent's weight times 5.0

We calculated the unified weight for the adult interviews in each of the five states.

3. Creating a Virtual Population of Adults for Each County

Purpose: Create large sample size county data sets of adults for use in small area estimation and ensure that the county and state distributions for key socio-demographic variables align with population control totals.

The 5-year 2005-2009 ACS PUMS provides an extremely large sample of adults living in 9,689,251 households in the 50 states and the District of Columbia. For the five states the sample sizes are:

State             Sample Size of Adults
California        1,266,424
Connecticut         129,485
Minnesota           193,799
North Carolina      339,707
Texas               821,980

We created a virtual population of adults for each county in the five states. The external ACS PUMS does not contain county identifiers. The PUMAs (Public Use Microdata Areas) included in the ACS PUMS were therefore mapped to counties using information available from the Missouri State Data Center website (http://mcdc2.missouri.edu/websas/geocorr2k.html).
They provide population estimates for PUMA/county intersections. Using their information we created a mapping of PUMAs to counties that consisted of four categories:

1. One-to-one mapping between the PUMA and the county,
2. Two or more PUMAs cover the county,
3. One PUMA covers two or more counties, and
4. All remaining relationships between PUMAs and the county.

An example of the fourth category is a county covered by three PUMAs. Two of the PUMAs only cover the county but the third PUMA also covers an adjacent county. For each of the three intersections we have the estimated population from the Missouri State Data Center website.

Using Connecticut as an example we show the size of the virtual population of each county and the mapping category number:

County FIPS Code   Sample Size of Adults   Mapping Category Number
09001              32,085                  2
09003              32,733                  2
09005               8,771                  1
09007               6,059                  1
09009              29,533                  2
09011              10,477                  2
09013               5,212                  1
09015               4,615                  1

4. Reweighting the ACS PUMS

For each county in the U.S. the Census Bureau Population Estimates Program provides annual population estimates by age, gender and race/ethnicity. However, as discussed above, the BRFSS has used Claritas Inc. as the source of its age, gender and race/ethnicity control totals for the state and SMART geography poststratification. These control totals are also available at the county level. We carried out a separate raking for each county in a state to bring the weighted distribution of the virtual population of adults into close agreement with the Claritas Inc. control totals for age by gender and for race/ethnicity, with collapsing of small race/ethnicity categories in a county. For mapping categories 1 to 3 we used the ACS person weight as the raking input weight. For mapping category 4 we first adjusted the ACS person weights using the estimated population of the PUMA/county intersection as a proportional weighting factor, where the proportions sum to one.

For a given county the ACS person weights can exhibit considerable variability due to the ACS sub-county geographic sample allocation procedures. To avoid ending up with adjusted ACS person weights that exhibited considerable variability, we developed a weight trimming approach that involved conducting a minimum of two rakings in each county using the following rules for trimming high weight values:

1. Raking # 0: No weight trimming. Convergence criterion = maximum percentage point difference of 0.1. Calculate Upper half way point 1 = (Maximum weight / Mean weight) / 2.
2. Raking # 1: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point 1. If the raking converges for the county go to step 3. If the raking does not converge within 25 iterations go to step 4.
3. Calculate Upper half way point -1 = Upper half way point 1 / 2. Raking # -1: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point -1. If the raking converges go to step 5. If the raking does not converge within 25 iterations stop and save the raking weight from raking # 1.
4. Calculate Upper half way point 2 = Upper half way point 1 + (Upper half way point 1 / 2). Raking # 2: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point 2. If the raking does not converge within 25 iterations go to step 6. If the raking converges stop and save the raking weight from raking # 2.
5. Calculate Upper half way point -2 = Upper half way point -1 / 2. Raking # -2: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point -2. If the raking converges go to step 7. If the raking does not converge within 25 iterations stop and save the raking weight from raking # -1.
6. Calculate Upper half way point 3 = Upper half way point 2 + (Upper half way point 2 / 2). Raking # 3: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point 3. If the raking does not converge within 25 iterations stop and flag this county as a nonconvergence county. If the raking converges stop and save the raking weight from raking # 3.
7. Calculate Upper half way point -3 = Upper half way point -2 / 2.
Raking # -3: Raking with Global High Weight Cap Value. Global high weight cap value = Upper half way point -3. If the raking converges stop here and save the raking weight from raking # -3. If the raking does not converge within 25 iterations stop and save the raking weight from raking # -2.

After completing all of the county rakings in a state, a state level raking was conducted to obtain a final adjusted ACS person weight for each adult in each county. This ensures that the BRFSS and the ACS PUMS have the same weighted distributions on key socio-demographic variables (Battaglia et al. 2009). Using the adjusted county ACS raked weight as the input weight, we raked to the BRFSS control totals for:

Age by gender
Race/ethnicity
Education
Marital status
Gender by race/ethnicity
Age by race/ethnicity
Region

To maintain the county level controls we also included raking margins for:

County by age by gender
County by race/ethnicity

No weight trimming was employed for the state level raking, because the weighted ACS margins were typically very close to the marginal control totals.

5. Eleven Risk Factor and Health Condition Dependent Variables

Purpose: Develop state level logistic regression models predicting health risk factor dependent variables.

The BRFSS questionnaire includes several key health risk factors, obtains information on access to health care, and also obtains self-reports on specific health conditions. Eleven key variables were selected for the development of county prevalence estimates:

Current smoking
Current asthma
Binge drinking
Obese
Fair/Poor health
Diabetes
No physical activity
No health care coverage
No medical home
Delayed medical care due to cost reasons
No checkup in past 12 months

We created eleven dichotomous dependent variables. The BRFSS has a very low level of missing data on most of the health-related questions in the survey. In order to have the same sample count in a state for all eleven dependent variables we used a single imputation hot deck procedure with imputation cells formed on the basis of age group, gender and race/ethnicity.

Logistic regression predictor variables fell into two categories. The first category included individual level socio-demographic predictors that are available in the BRFSS and in the ACS PUMS. The second category consisted of county level variables obtained from the Area Resource File and from County Business Patterns. The predictor variables are shown in the table below.

Individual level predictors:

Gender: 1 Male; 2 Female
Number of adult males in household: 0 Adult Men; 1 Adult Men; 2+ Adult Men
Number of adult females in the household: 0 Adult Women; 1 Adult Women; 2+ Adult Women
Marital status: 1 Married; 2 Divorced/Separated; 3 Widowed; 4 Never married
Number of children in the household: 1: 0 Children; 2: 1 Child; 3: 2 Children; 4: 3+ Children
Education: 1 Less than High School; 2 High School graduate; 3 Some College; 4 College graduate
Race/ethnicity: White, nonhispanic; Black, nonhispanic; Hispanic; Asian, American Indian, Other, nonhispanic
Region (example of state with 5 regions): Region 1; Region 2; Region 3; Region 4; Region 5
Age Group: 1: 18-24; 2: 25-29; 3: 30-34; 4: 35-39; 5: 40-44; 6: 45-49; 7: 50-54; 8: 55-59; 9: 60-64; 10: 65-69; 11: 70-74; 12: 75-79; 13: 80-99

County level predictors:

Low Education Typology (25 percent or more of residents 25 through 64 years old had neither a high school diploma nor GED): 0 (county does not meet typology); 1 (county meets typology)
Low Employment Typology (less than 65 percent of residents 21 through 64 years old were employed): 0 (county does not meet typology); 1 (county meets typology)
Housing Stress Typology (30 percent or more of households had one or more of these housing conditions: lacked complete plumbing, lacked complete kitchen, paid 30 percent or more of income for owner costs or rent, or had more than 1 person per room): 0 (county does not meet typology); 1 (county meets typology)
Population Loss Typology (number of residents declined both between the 1980 and 1990 censuses and between the 1990 and 2000 censuses): 0 (county does not meet typology); 1 (county meets typology)
County Births: Total births per 100,000 population
County Black Population: Percent of population African-American
County Deaths: Total deaths per 100,000 population
County Dentists: Count of total private practice, non-Federal dentists per 100,000 population
County Emergency Room Visits: Count of emergency room visits at short-term general hospitals per 100,000 population
County Hispanic Population: Percent of population Hispanic/Latino
County Hospital Admissions: Count of admissions at short-term general hospitals per 100,000 population
County Hospital Beds: Count of hospital beds at short-term general hospitals per 100,000 population
County Hospitals: Count of short-term general hospitals per 100,000 population
County MDs: Count of general practice MDs (General Practice and Family Medicine, patient care, office based, non-federal) per 100,000 population
County Poverty: Percent of persons in poverty
County Medical Specialists: Count of medical specialist MDs (Allergy and Immunology, Cardiovascular Disease, Dermatology, Gastroenterology, Internal Medicine, Pediatrics, Pediatric Cardiology and Pulmonary Disease, patient care, office based, non-federal) per 100,000 population
County Liquor Stores: Beer, wine & liquor stores per 100,000 population
County Fitness Establishments: Fitness & recreation sports centers per 100,000 population
County Fast Food Establishments: Limited-service eating places per 100,000 population

The BRFSS state sample sizes are large enough to allow for the fitting of state specific logistic regression models. For each state, main effect weighted logistic regression models were fit for each dependent variable. For the 55 models (eleven dependent variables in each of five states) we found that the individual level predictors were much more likely to be statistically significant than the county level predictors, reflecting the primary importance of individual characteristics in determining risk factors and health conditions.

6. Iterative Probability Adjustment

Purpose: Adjust health risk factor predicted probabilities of adults in each county in a state so that the small area estimates are in agreement with the direct sample estimates for all key domains in that state.

For a given state we can use the coefficients from a logistic regression model to assign predicted probabilities, prob, for that dependent variable to each adult in the virtual population of each county in the state. Because each adult in the virtual population also has an adjusted ACS person weight, WTACS, we can estimate the state prevalence of that risk factor:

$$P_{ACS} = \frac{\sum_{i} WTACS_{i} \cdot prob_{i}}{\sum_{i} WTACS_{i}} = \frac{Y_{ACS}}{X_{ACS}}$$

as well as the county prevalence for county h:

$$P_{ACSh} = \frac{\sum_{i} WTACS_{hi} \cdot prob_{hi}}{\sum_{i} WTACS_{hi}} = \frac{Y_{ACSh}}{X_{ACSh}}$$

Estimates for key domains (e.g., gender) within a county or at the state or region level can also be calculated.

One problem with using the predicted probabilities from the logistic regression models is that the aggregation of the small area estimates will typically not agree with the direct sample estimates. For example, if we aggregate the prevalence estimates for all the counties in the state it is very unlikely that the state prevalence will agree with the direct sample estimate from the BRFSS. Also, if we aggregate the estimates for males within each county, the state level prevalence estimate for males will likely differ from the direct sample estimate for males from the BRFSS. For a single domain such as gender one can calibrate the predicted probabilities to ensure that the prevalence estimates for males and females are in agreement with the direct sample BRFSS estimates. The BRFSS however has several key domains:

SMART counties
SMART MSAs
Regions
Marital status
Education
Race/ethnicity
Gender
Age group

We developed an iterative probability adjustment (IPA) algorithm to calculate adjusted predicted probabilities for the adults in each state for a given risk factor variable. The approach is iterative in that it terminates after 10 iterations or when the maximum difference between an adjusted prevalence estimate and the direct BRFSS prevalence estimate for any domain is 0.1 percentage points or less.

IPA consists of two main steps. First we ratio adjust the predicted probabilities of all adults in a state so that $Y_{ACS}$ equals $Y_{BRFSS}$, and therefore $P_{ACS} = P_{BRFSS}$. If any adults have a predicted probability that is ratio adjusted to a value greater than one, we truncate that adjusted predicted probability to one, and proportionately allocate the reduction in $Y_{ACS}$ to the remaining adults in the state. The adjusted predicted probability of adult $i$ in county $h$ is $pred_{hi0}$.

Second, an iteration is defined as the sequential adjustment of the predicted probabilities in the order of the 8 domain variables listed above.
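Before turning to the domain sweeps, the first step (ratio adjustment to the state total with truncation at one) can be sketched as follows. This is one reading of the description above, not the authors' code: the function name `ratio_adjust`, the `max_rounds` guard and the four-adult example are assumptions for illustration. "Proportionately allocate" is interpreted here as re-scaling the probabilities of the adults that are still below one.

```python
import numpy as np

def ratio_adjust(prob, weight, target_total, max_rounds=100):
    """Scale probabilities so that sum(weight * prob) equals target_total.

    Probabilities pushed above one are truncated to one; the shortfall this
    creates is reallocated by re-scaling the adults still below one.
    """
    p = np.asarray(prob, dtype=float).copy()
    w = np.asarray(weight, dtype=float)
    capped = np.zeros(p.shape, dtype=bool)
    for _ in range(max_rounds):
        free = ~capped
        # Capped adults contribute their full weight to the weighted count.
        remaining = target_total - w[capped].sum()
        base = np.sum(w[free] * p[free])
        if base <= 0:
            break
        p[free] *= remaining / base
        over = free & (p > 1.0)
        if not over.any():
            break
        p[over] = 1.0
        capped |= over
    return p

# Hypothetical example: Y_ACS is 250 before adjustment and Y_BRFSS is 450.
weights = np.array([100.0, 200.0, 300.0, 400.0])
probs = np.array([0.10, 0.20, 0.60, 0.05])
adjusted = ratio_adjust(probs, weights, target_total=450.0)
```

In this example the third adult is first scaled above one, is truncated to one, and the remaining three adults are scaled up again so that the weighted count still equals the target.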
Starting with the adjusted predicted probabilities from step 1, we ratio adjust the predicted probabilities for the adults in each SMART county domain d (plus a residual category for the balance of the state) so that the ACS prevalence estimate is in exact agreement with the BRFSS direct sample prevalence estimates:

\[ pred_{dhi1} = pred_{dhi0} \times \frac{P_{BRFSSd}}{\left( \sum_{i \in d} WTACS_{dhi} \, pred_{dhi0} \right) / \left( \sum_{i \in d} WTACS_{dhi} \right)} \]

We then use these adjusted predicted probabilities as the input probabilities into the ratio adjustment step for the SMART MSAs in the state (plus a residual category for the balance of the state). At this point the ACS prevalence estimates for the SMART MSAs and the balance of the state are in exact agreement with the BRFSS direct sample prevalence estimates. We continue the first iteration by moving on to the region variable and work our way to the end of the first iteration by adjusting the predicted probabilities within each of the age groups. For each domain variable, if any adults have a predicted probability that is ratio adjusted to a value greater than one, we truncate that adjusted predicted probability to one, and proportionately allocate the reduction in $Y_{ACSd}$ to the remaining adults in domain d. We continue this process with iteration two by returning to the SMART county domain variable. The IPA continues until we complete 10 iterations or the maximum difference for any domain is less than 0.1 percentage points.

7. County Prevalence Estimates

The IPA yields an ACS PUMS data set with 11 predicted probability variables for each adult in a county. The ACS PUMS also contains the final adjusted ACS person weight. The county prevalence estimates are calculated as weighted proportions for all adults. Our approach also allows for the calculation of county prevalence estimates for key domains such as age group, gender, race/ethnicity, education and marital status. Using Texas as an example, we first aggregate the county estimates for three of the 11 risk factor variables to each of the eight key domains listed above. We then compare the aggregated estimates with the BRFSS direct sample estimates. As shown in Table 1, all of the differences are zero or very small, indicating that the small area estimates add to the published estimates.
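The two IPA steps and the weighted county proportions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ratio_adjust`, `weighted_prevalence`, `ipa`), the toy weights, and the toy targets are all hypothetical. The sketch also assumes that "proportionately allocate the reduction" can be carried out by rescaling the untruncated adults again until the domain target is met, and it uses only two domain variables rather than the eight BRFSS domains.

```python
from collections import defaultdict


def ratio_adjust(preds, weights, target, max_passes=50):
    """Scale preds so their weighted mean equals target, truncating at 1.

    Assumption: the weighted total lost to truncation is reallocated to the
    untruncated records by rescaling them again on the next pass.
    """
    preds = list(preds)
    capped = [False] * len(preds)
    target_y = target * sum(weights)  # weighted total the domain must reach
    for _ in range(max_passes):
        y_capped = sum(w for w, c in zip(weights, capped) if c)
        y_free = sum(w * p for w, p, c in zip(weights, preds, capped) if not c)
        if y_free <= 0:
            break
        factor = (target_y - y_capped) / y_free
        newly_capped = False
        for k in range(len(preds)):
            if capped[k]:
                continue
            scaled = preds[k] * factor
            if scaled > 1.0:
                preds[k], capped[k], newly_capped = 1.0, True, True
            else:
                preds[k] = scaled
        if not newly_capped:
            break
    return preds


def weighted_prevalence(preds, weights, idx):
    """Weighted proportion over the records whose positions are in idx."""
    total_w = sum(weights[k] for k in idx)
    return sum(weights[k] * preds[k] for k in idx) / total_w


def ipa(preds, weights, domains, targets, state_target, max_iter=10, tol=0.001):
    """Iterative probability adjustment; tol=0.001 is 0.1 percentage points.

    domains: {variable: [category of each adult]}, in adjustment order.
    targets: {variable: {category: direct-sample prevalence as a proportion}}.
    """
    preds = ratio_adjust(preds, weights, state_target)  # step 1: state level
    groups = {}
    for var, labels in domains.items():
        by_cat = defaultdict(list)
        for k, label in enumerate(labels):
            by_cat[label].append(k)
        groups[var] = by_cat
    for iteration in range(1, max_iter + 1):  # step 2: cycle through domains
        for var, by_cat in groups.items():
            for cat, idx in by_cat.items():
                adjusted = ratio_adjust([preds[k] for k in idx],
                                        [weights[k] for k in idx],
                                        targets[var][cat])
                for k, p in zip(idx, adjusted):
                    preds[k] = p
        max_diff = max(
            abs(weighted_prevalence(preds, weights, idx) - targets[var][cat])
            for var, by_cat in groups.items() for cat, idx in by_cat.items())
        if max_diff <= tol:
            break
    return preds, iteration, max_diff


# Toy virtual population of six adults (all values invented for illustration).
weights = [100, 200, 100, 200, 100, 100]
start = [0.30, 0.10, 0.20, 0.05, 0.40, 0.15]
domains = {"gender": ["M", "M", "M", "F", "F", "F"],
           "age": ["young", "old", "young", "old", "young", "old"]}
targets = {"gender": {"M": 0.25, "F": 0.15},
           "age": {"young": 0.30, "old": 0.14}}
adjusted, n_iter, max_diff = ipa(start, weights, domains, targets, 0.20)
print(n_iter, max_diff)
print(weighted_prevalence(adjusted, weights, range(len(weights))))
```

The toy targets are chosen so that each domain variable implies the same state total, as direct estimates from a single weighted sample would. A county estimate is then `weighted_prevalence` applied to the adults in that county.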
Table 1: Difference Between Aggregated County Estimates and Direct Sample Estimates for Key State and Sub-State Domains
(Prevalence estimates are in percent; the difference is the county aggregation minus the BRFSS estimate.)

                                              Current Smoking Prevalence           Current Asthma Prevalence
                                                  County       BRFSS                  County       BRFSS
Domain                                       Aggregation    Estimate  Difference Aggregation    Estimate  Difference
Age:
  18-24                                          20.7574     20.7574    -0.00000      9.7883      9.7883    -0.00000
  25-34                                          22.1642     22.1642     0.00000      5.4996      5.4996    -0.00000
  35-44                                          24.2395     24.2395     0.00000      5.7456      5.7456     0.00000
  45-54                                          21.6495     21.6495     0.00000      6.9607      6.9607    -0.00000
  55-64                                          18.5290     18.5290    -0.00000      8.3375      8.3375     0.00000
  65+                                             9.6464      9.6464     0.00000      7.6868      7.6868     0.00000
Gender:
  Male                                           23.9505     23.9505    -0.00001      5.2661      5.2660     0.00007
  Female                                         16.1329     16.1329     0.00001      8.9270      8.9271    -0.00007
Race/Ethnicity:
  White nonhispanic                              22.2224     22.2228    -0.00041      8.3429      8.3434    -0.00043
  Black nonhispanic                              18.6209     18.6213    -0.00035      7.7140      7.7142    -0.00021
  Hispanic                                       17.5094     17.5098    -0.00034      4.9450      4.9450    -0.00002
  Asian, American Indian, Other nonhispanic      17.0017     17.0020    -0.00033      7.8241      7.8242    -0.00012
Education:
  Less than HS                                   23.4926     23.4893     0.00332      6.8501      6.8506    -0.00050
  High School graduate                           26.4108     26.4072     0.00363      7.5722      7.5721     0.00009
  Some College                                   19.0975     19.0949     0.00264      6.8715      6.8712     0.00027
  College graduate                               10.4191     10.4176     0.00146      7.1478      7.1478    -0.00003
Marital Status:
  Married                                        15.2333     15.3211    -0.08784      5.1489      6.8506    -0.00050
  Divorced/Separated                             32.1284     32.3137    -0.18536     10.9451      7.5721     0.00009
  Widowed                                        14.4539     14.5372    -0.08330      9.7238      6.8712     0.00027
  Never married                                  25.2497     25.3956    -0.14586      8.8001      7.1478    -0.00003
Region:
  1                                              19.4770     19.4773    -0.00024      6.5387      6.5388    -0.00005
  2                                              16.3861     16.3864    -0.00027      7.5073      7.5074    -0.00010
  3                                              17.8142     17.8144    -0.00017      6.9251      6.9251    -0.00002
  4                                              21.2723     21.2724    -0.00011      6.0186      6.0185     0.00016
  5                                              15.4479     15.4479    -0.00001      9.9822      9.9828    -0.00056
  6                                              17.1135     17.1135    -0.00002      6.6098      6.6101    -0.00037
  7                                              13.6567     13.6567    -0.00000      8.4959      8.4960    -0.00006
  8                                              16.2499     16.2499     0.00001      6.3020      6.3027    -0.00068
  9                                              16.3166     16.3166    -0.00003      2.5042      2.5045    -0.00026
  10                                             26.6284     26.6308    -0.00238     13.5830     13.5827     0.00031
  11                                             18.8144     18.8144    -0.00003     13.0704     13.0702     0.00020
  12                                             16.7505     16.7507    -0.00013      5.6429      5.6424     0.00044
  13                                             24.0131     24.0132    -0.00008      6.6902      6.6900     0.00024
  14                                             22.6273     22.6274    -0.00006      8.1496      8.1491     0.00048
SMART MSA:
  Austin, TX MSA                                 16.3294     16.3293     0.00011      7.0376      7.0380    -0.00037
  Dallas, TX MD                                  17.8142     17.8144    -0.00017      6.9251      6.9251    -0.00002
  El Paso, TX MSA                                16.2499     16.2499     0.00001      6.3020      6.3027    -0.00068
  Fort Worth, TX MD                              21.2723     21.2724    -0.00011      6.0186      6.0185     0.00016
  Houston, TX MSA                                19.7179     19.7405    -0.02259      6.5128      6.5014     0.01138
  Lubbock, TX MSA                                23.1242     23.0476     0.07659     14.2194     13.9646     0.25485
  McAllen, TX MSA                                16.3166     16.3166    -0.00003      2.5042      2.5045    -0.00026
  San Antonio, TX MSA                            15.2570     15.1641     0.09283      9.7508      9.9103    -0.15953
  Rest of State                                  24.0046     24.0167    -0.01205      7.5562      7.5290     0.02723
SMART County:
  Bexar County TX                                15.4479     15.4479    -0.00001      9.9822      9.9828    -0.00056
  El Paso County TX                              16.2499     16.2499     0.00001      6.3020      6.3027    -0.00068
  Fort Bend County TX                            16.3861     16.3864    -0.00027      7.5073      7.5074    -0.00010
  Harris County TX                               19.4770     19.4773    -0.00024      6.5387      6.5388    -0.00005
  Hidalgo County TX                              16.3166     16.3166    -0.00003      2.5042      2.5045    -0.00026
  Lubbock County TX                              22.6873     22.7011    -0.01384     14.4114     14.1331     0.27830
  Travis County TX                               17.2574     17.2928    -0.03539      7.6827      8.0817    -0.39905
  Williamson County TX                           13.6567     13.6567    -0.00000      8.4959      8.4960    -0.00006
  Rest of State                                  21.3675     21.3655     0.00204      6.9912      6.9686     0.02259

8. Conclusions

Small area estimation techniques often produce estimates that do not add up to the official direct sample estimates. This may raise validity concerns among the consumers of the county estimates. To address this issue we have integrated two techniques: 1) creating a large sample virtual population using the 2005-2009 ACS PUMS for each county and assigning predicted probabilities to the adults in that sample, and 2) using Iterative Probability Adjustment (IPA) to constrain the predicted probabilities so that the county estimates add to the domain estimates from the state-based sample survey (i.e., the BRFSS). Although these techniques have been used individually, we are not aware of the combined use of these techniques to develop county estimates. Because we are assigning adjusted predicted probabilities to the individuals in the virtual population in each county, our approach also yields county estimates for key domains such as age group, gender, race/ethnicity, education and marital status.

References

Battaglia, M.P., M.R. Frankel, and M.W. Link. 2008. Improving Standard Poststratification Techniques For Random-Digit-Dialing Telephone Surveys. Survey Research Methods. Vol 2, No 1.

Battaglia, M.P., D. Izrael, D.C. Hoaglin, and M.R. Frankel. 2009. Practical Considerations in Raking Survey Data. Survey Practice, June 2009.

Izrael, D., M.P. Battaglia, and M.R. Frankel. 2009. Extreme Survey Weight Adjustment as a Component of Sample Balancing (a.k.a. Raking), 2009 SAS Global Forum, http://www.abtassocates.com/page.cfm?pageid=40858&famlyid=8600.