Population Health Forecasting Model for California Developing a Prototype and its Utility



Population Health Forecasting Model for California: Developing a Prototype and its Utility

Prepared for: The California Endowment

Jonathan E. Fielding, MD, MPH and Gerald Kominski, PhD
UCLA School of Public Health, in collaboration with the Los Angeles Department of Health Services and the California Department of Health Services

UCLA Health Forecasting Team (May 2004)

Executive Summary

Second Report
UCLA Health Forecasting Project

The California Population Health Forecasting Model is designed to improve the health of all Californians by reducing health disparities and supporting better, evidence-based decisions by key local, county and state elected and appointed officials as well as community organizations, advocacy groups, health plans, and other public health practitioners. The need for a comprehensive health forecast is substantiated by the rapidly changing and unique sociodemographic population mix in California and the lack of evidence-based health outcome data on our diverse population. Given these important socioeconomic trends and migration patterns, simple extrapolation from existing trends that does not incorporate these factors is likely to lead to erroneous assumptions and decisions that can have major adverse effects on future health.

The stages of forecasting include formulating the problem, obtaining information, selecting the appropriate method, implementing and evaluating the method, and using forecast results. Population health forecasting uses microsimulation techniques to model individual behavior, lifetime events, and ultimately health outcomes. The model considers the influence of countless events and decisions that occur either simultaneously or in conjunction with one another, and simulates a large population such as a county, state, or country, in order to produce results that can help decision-makers better understand programmatic and policy options to improve health and reduce health disparities.

The model comprises three building blocks or modules: the base population framework, risk factor-disease modules, and the forecasting module.
Together, they will produce estimates of the distributional effects of changes in key health determinants such as population characteristics, individual behaviors like smoking, physical activity levels, and obesity, as well as summary measures of health status such as disease incidence and mortality. Physical activity is being modeled as the first risk factor because of its impact on health outcomes and its increasing importance in health-related decision-making.

Potential users of the California Health Forecasting Model include individuals in various health-related fields who are interested in advocating for public health programs or anticipating the future impact of current decisions on health outcomes. The population health forecasting model will serve as a critical analytic tool to introduce timely and relevant information into policy debates at all levels of government as well as in the private sector. It has the potential to interject new and valuable information about our future health based on existing policies and programs; provide critical information regarding changes in health disparities among subpopulations; and, even more importantly, gauge the future health impacts of possible changes in current programs and policies on both the whole population and specific groups. This real-world application of the model can enhance both the advocacy and planning process for designing and

implementing strategies and interventions that will improve the health of a population at the level of a state, a county or a community.

Part 1: Description of the California Health Forecasting Model

Section 1: Why Forecast?

Introduction

A forecast is a prediction about the future that accounts for complex events and interactions by modeling characteristics and behavior while taking dynamic variables into consideration. Forecasting provides a useful framework that mitigates, to some extent, the uncertainty of the future. It thus allows one to anticipate the future impact of current actions on certain outcomes. This project focuses on development of a model to forecast the effects of specific actions, alone or in combination, on the health of a population and key subgroups.

Both public and private enterprises operate under conditions of uncertainty about the future, but are required to make decisions today based on expectations and beliefs about their future impacts. Forecasting helps to reduce, but not eliminate, uncertainty. Accurate forecasts can help an organization make better decisions for the future by gaining insight into both current and future trends. Forecasts can answer three types of questions about future states:
When might specific outcomes of interest occur under different scenarios?
What are the qualitative characteristics of the outcomes based on current inputs and/or changes in those inputs?
What will be the magnitude of some quantifiable outcome, such as mortality or income, in the future?

As a society, we devote considerable resources to forecasting the economy, weather, agricultural yields, transportation systems, and technology growth, all of which are very difficult and error-prone efforts. We invest in these analyses because the return to various members of society (individuals, government agencies, businesses, and policy makers) is considered sufficient to justify the endeavor.
Each of these groups uses results from the analyses regularly to evaluate alternatives and make better informed decisions. In the health sector, forecasting has been primarily used to project the impact of current or possible changes in federal policies on health care, such as their effect on insurance coverage, utilization rates of specific procedures, federal Medicare and Medicaid costs, and the number, type, and distribution of health care providers. Yet, until now, we have almost completely neglected to forecast our most precious asset: health itself.

Population health may be easier to forecast than most other issues and topics for which we estimate future trends and effects. With current knowledge and computing power, we can forecast the health of our population over several decades with an acceptable level of accuracy. Further, we can forecast the health of key subgroups of the population, such as women, older adults, African-Americans, Latinos or children.

The development of a population health forecasting model has the potential to interject new and valuable information about our future health based on existing policies and programs. It will also provide critical information regarding changes in health disparities among sub-populations. Even more importantly, the model will allow us to gauge the future health impacts of possible changes in current programs and policies on both the whole population and specific groups. Thus, the forecasting model will serve as a critical analytic tool to introduce timely and relevant information into policy debates at all levels of government as well as in the private sector. In this capacity, health forecasting can enhance advocacy efforts as well as the planning process for designing and implementing strategies and interventions that will improve the health of a population.

Health Forecast for California

We are interested in predicting future health outcomes such as mortality, disease incidence, and morbidity for the population of California. We also want to examine how future trends in health outcomes will differ for subgroups of the population. As specific examples, how will changes in rates of obesity affect the health of Latino women over age 65 living in Los Angeles County or African American men between age 25 and 35 years living in Alameda County? Currently, the evidence base is drawn from populations that do not reflect California's diversity. Thus, our health forecasting tool is necessary for advocacy and program planning, as it accounts for changes over time in the demographics of California's population.

The basis for the forecasting model is a very substantial database on the health trends in populations and on characteristics and behaviors that ample epidemiologic evidence has shown to be causally related to health in populations.
For example, African Americans have a lower life expectancy than whites, and Latinos have a higher incidence of Type II diabetes than both African Americans and whites. Also, individuals with lower socioeconomic status (SES) are more likely to die prematurely than individuals with higher SES. Previous research also shows that several risk factors such as physical activity, smoking, alcohol consumption, and diet strongly influence health outcomes. For example, individuals who do not engage in physical activity are more likely to have a heart attack, and those who smoke are more likely than nonsmokers to experience many forms of cancer as well as cardiovascular disease. There is also a strong base of evidence from reviews of epidemiologic studies that specific interventions alter the health outcomes in the overall population and subgroups over time, and that the distribution of these risk factors is not the same across the population but changes over time and with age.

In addition to predicting future trends in health outcomes and disease burdens for the entire population, the California Health Forecast Project uses a microsimulation modeling approach capable of forecasting future health for specific subgroups of the population. Finally, by changing the levels of certain sociodemographic characteristics and risk factors, which is particularly important given California's dynamic population mix and changing characteristics, we are able to see their effects on future health outcomes.

Section 2: Overview of Forecasting and Microsimulation

Stages of Forecasting

1. Formulate Problem

Figure 1 outlines the stages involved in any forecasting effort. The first step involves formulating the problem or research question(s) of interest, such as:
What will the health status of California's population be in 2020?
How will health outcomes vary in 2020 for different groups of people living in California, such as Latinos, Angelenos, or persons over age 65 years?
How will health status differ among California adults if their average level of physical activity increases by 20% between now and 2020?

2. Obtain Information

Once the problem has been formulated, it is necessary to collect the information needed to answer the pertinent questions. Both qualitative and quantitative information should be included if appropriate. For example, to answer the questions posed above, one might need the following information:
Historical demographic data on the population of California
Information on population trends in health status over time
Information about key risk factors that reproducibly affect health outcomes
Information about the relationships among demographic characteristics, risk factors, and health outcomes
Information to project future values of these variables

3. Select Method

The next step is to select the appropriate method for forecasting. A forecasting model is a model developed to produce forecasts (or estimates in unknown situations) [and] may draw upon a variety of measurement models for estimates of key parameters [or] true values of some unknown population value (Armstrong, 2001). Broadly, there are two major types of forecasting: qualitative and quantitative. Qualitative forecasts specify the expected direction of the variable of interest, whereas quantitative methods estimate both the direction and the magnitude. Quantitative approaches to forecasting are classified based on the type of data and the techniques used to analyze the data.
When selecting a quantitative approach, one should make several assessments, including the level of knowledge about relations between variables; amount of change involved; type of data available; need for policy analysis; and the extent of knowledge regarding the particular domain (e.g., health) (see Figure 2).

Aggregate Models

One set of quantitative models uses aggregate level data to produce aggregate level forecasts, which do not provide information about the distribution of risk factors or outcomes (Table 1).

Naïve Models

The simplest of these approaches is the naïve model, which assumes that future values of the outcome variable will be based on current and historical values (generally the most recent value) without incorporating additional information or conducting any type of analysis. Naïve models generate quick and easy results that typically address the short term, usually measured in days, weeks, months, or a single year. They require minimal data and are easy to implement. For example, the base rate (a typical or average behavior for a given population) can be measured with cross-sectional data and serve as a naïve model, which has been referred to as the default forecast. One example of a successful naïve model might be using this year's number of registered nurses employed in a certain area to predict the number of nurses available in the same area the following year. Because it does not account for other factors such as nurse recruitment programs and new nursing school graduates, or consider external issues that might affect a nurse's decision to work, this approach would be considered naïve.

Some observers have argued that naïve models can sometimes outperform other qualitative and quantitative methods in predicting short-term future values of certain variables such as the stock market, gross domestic product (GDP), and inflation (Sherden, 1998). However, naïve models are often of limited usefulness for policymakers. The primary limitation is that naïve models do not produce accurate forecasts when variables are expected to change over time. By definition, they will miss trends such as cyclical or seasonal effects in the data. Further, naïve models do not provide any insight into alternative factors that may induce change in future outcomes. In sum, naïve models are adequate for simple decisions with relatively minor consequences, where the cost of conducting a formal forecast would outweigh any potential benefits.
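The nurse example can be written out as a minimal sketch of a naïve forecast. The function name and the employment counts are hypothetical and serve only to illustrate the idea that the most recent value is carried forward unchanged.

```python
def naive_forecast(history, horizon=1):
    """Carry the most recent observed value forward for each future period."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Hypothetical counts of registered nurses employed in one area, by year.
nurses_by_year = [1180, 1195, 1210]

# The forecast for next year is simply this year's count.
print(naive_forecast(nurses_by_year))
```

Because the forecast ignores every earlier observation, any upward or downward trend in the series is missed, which is the limitation described above.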
When little change is anticipated in the future, and the use of complex forecasting methods is not warranted, policymakers may be inclined to use a naïve approach or rely on their own unaided judgments.

Extrapolation Models

Some demographers and other researchers have extended the naïve model to consider recent trends in an outcome variable when predicting future values. Researchers have published extensively on different methods of demographic forecasting, which include time-series analysis and extrapolation (Lee and Carter, 1992). Extrapolation models allow for patterns and trends (e.g., seasonal or cyclical) by using historical values of a series or cross-sectional data that are assumed to be generalizable to the future. In general, extrapolation is used for decisions about trends that are relatively stable and for which much change is not expected in the future. For example, extrapolation would probably not be useful for assessing the effects of a new intervention or the implications of a change in policy. In assessing the health of populations, there are many known factors of influence and a significant body of literature on specific policies and programs that can alter these factors of influence.

Cell-Based Models

The cell-based model examines associations between risk factors and outcomes without making explicit assumptions regarding causality. Typically, cell-based models employ cross-sectional data and therefore do not allow for a temporal component per se, but do prove to be useful for

following aggregate-level measures over time. This approach to modeling is used often in health-related research, where observational data are readily available and the effects of several independent variables (e.g., individual characteristics and behaviors, interventions, and diseases) on a particular health outcome are of interest. For example, The World Health Report 2002 highlights the relation between twenty risk factors and the global and regional burden of disease (GBD) (World Health Organization, 2002). The report includes predictions about the burden of diseases in different regions of the world that can be attributed to different risk factors. Its analysis creates a hypothetical situation in which the disease or risk factor is reduced or eliminated and compares the results to the current situation or to future projections based on current levels. There is no consideration of the effectiveness of proposed interventions. Instead, the authors assume that rates for risk factors are set to the lowest possible levels in the alternative scenario. Although it represents a different type of analysis, the GBD makes an important methodological contribution to our work, as it standardizes risk factors and outcomes to allow for comparisons across different factors.

Combined Extrapolation and Cell-Based Models

Extrapolation techniques and cell-based analyses can be combined to produce aggregate level forecasts that incorporate a temporal component while assessing relationships among variables. Kenneth Manton at Duke University's Center for Demographic Studies has used this approach in publishing extensively about the impact of reductions in disability in the elderly and the implications for the health care system.
He developed two health forecasting models: one model was designed to analyze discrete health state changes using population and vital statistics data, whereas the other describes both discrete and continuous changes using data from longitudinal community populations (Manton, Stallard, and Singer, 1992). The first model supplements aggregate data with multiple inputs from scientific experts, while the second uses relatively information-rich measurements. Both models can be modified by expert judgment to deal with simulations of multiple possible interventions (Manton, Stallard, and Singer, 1992).

Population Microsimulation Models

Microsimulation models are distinct from the models mentioned above in that they do not examine aggregate level data. Rather, microsimulation techniques operate using unit-level data such as household information, as well as joint distributions of risk factors and outcomes (Figure 2). Doing so allows the technique to consider the composition of the population and subgroups while assessing simultaneous changes in multiple factors, including behaviors and programs or policies, in order to draw conclusions based on outcomes of interest in simulated populations. Population microsimulation models typically assess program or policy changes in light of social and economic conditions. In fact, simple microsimulation models have been used to predict the impact of policy and behavioral changes on specific aspects of the health of populations for more than two decades, but have considered only a small set of risk factors and diseases applied to static populations. Microsimulation models are flexible and can easily incorporate different assumptions and new research findings in an efficient manner. This quality is particularly valuable for assessing different scenarios based on competing programs or alternative interventions in an effort to meet decision maker requests.
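The unit-level idea can be illustrated with a minimal sketch. It is not the California model: the two activity groups, the annual death probabilities, and the population sizes are hypothetical values chosen only to show how individuals are simulated one at a time and how a scenario is run by changing an input.

```python
import random

# Hypothetical annual death probabilities by activity level (illustrative only).
ANNUAL_DEATH_RISK = {"active": 0.005, "inactive": 0.010}


def simulate_deaths(n_people, years, active_share, seed=0):
    """Simulate each person year by year and count deaths by activity group."""
    rng = random.Random(seed)
    deaths = {"active": 0, "inactive": 0}
    for _person in range(n_people):
        # Assign each simulated person to a risk group.
        group = "active" if rng.random() < active_share else "inactive"
        for _year in range(years):
            if rng.random() < ANNUAL_DEATH_RISK[group]:
                deaths[group] += 1
                break  # a person can die only once
    return deaths


# Baseline versus a scenario in which more people are physically active.
baseline = simulate_deaths(20000, years=15, active_share=0.40)
scenario = simulate_deaths(20000, years=15, active_share=0.60)
print("baseline deaths:", sum(baseline.values()))
print("scenario deaths:", sum(scenario.values()))
```

Because results are accumulated per simulated person, they can be tabulated by any characteristic carried on the individual record, which is what allows subgroup and distributional estimates. Results vary with the random seed, so a real analysis would repeat the runs and report the spread.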

There are two major types of microsimulation models, static and dynamic, that are widely used to model potential impacts of social policies (Citro, 1991). Static models operate using cross-sectional databases that provide a snapshot of the population at a point in time (Citro, 1991). In contrast, dynamic models operate using longitudinal databases that contain individual histories (Citro, 1991).

In the late 1980s, Milton Weinstein at Harvard developed the Coronary Heart Disease Policy Model, a static microsimulation model of the effects of policy and technological advances on the incidence, prevalence, and mortality of coronary heart disease (CHD), and on changes in the cost of health care. Lee Goldman at UCSF, Department of General Internal Medicine, continues to use this discrete-time, state-transition model to project future trends and assess the impact of interventions. He used the CHD Policy Model to estimate the effects of investments designed to change coronary risk factors between 1981 and 1990 on the incidence, prevalence, mortality and costs of CHD during this period, and projected the impact of the interventions through 2015 using effects of risk factor reductions that were estimated for 1990. Observed changes in risk factors between 1981 and 1990 resulted in a reduction of CHD deaths by approximately 430,000 and overall deaths by approximately 740,000, with an estimated cost effectiveness of $44,000 per year of life saved. Given that much of the benefit from risk factor reductions is delayed, estimated reductions for the extended, 35-year period of 1981 to 2015 were 3.6 million CHD deaths and 1.2 million non-CHD deaths, which reduced the cost to about $5,400 per year of life saved (Goldman et al., 2001).

More recently, researchers at Statistics Canada have developed a dynamic or continuous-time model, the Population Health Model (POHEM), that combined new and existing models to assess the impact of different policy interventions or technologies on the health of the Canadian population.
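The discrete-time, state-transition idea can be sketched at the cohort level as follows. This is not the CHD Policy Model: the three states, the annual transition probabilities, and the cohort size are hypothetical, and the sketch tracks group counts rather than individuals.

```python
STATES = ("well", "chd", "dead")

# Hypothetical annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "well": {"well": 0.97, "chd": 0.02, "dead": 0.01},
    "chd": {"well": 0.00, "chd": 0.90, "dead": 0.10},
    "dead": {"well": 0.00, "chd": 0.00, "dead": 1.00},
}


def step(cohort, transitions):
    """Move the cohort forward one year by applying the transition table."""
    nxt = {state: 0.0 for state in STATES}
    for src, count in cohort.items():
        for dst, prob in transitions[src].items():
            nxt[dst] += count * prob
    return nxt


def project(cohort, transitions, years):
    """Return the cohort's state counts for year 0 through the final year."""
    path = [dict(cohort)]
    for _ in range(years):
        path.append(step(path[-1], transitions))
    return path


start = {"well": 100000.0, "chd": 0.0, "dead": 0.0}
baseline = project(start, TRANSITIONS, years=10)

# Scenario: an assumed intervention halves the annual incidence of CHD.
scenario_rates = {src: dict(row) for src, row in TRANSITIONS.items()}
scenario_rates["well"].update({"well": 0.98, "chd": 0.01})
scenario = project(start, scenario_rates, years=10)

deaths_averted = baseline[-1]["dead"] - scenario[-1]["dead"]
print(round(deaths_averted))
```

Extending the horizon in `project` shows why benefits from risk factor reduction accumulate with delay: people who avoid disease in early years avoid the higher death rate of the disease state in every later year.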
Infectious Disease Models

Infectious disease models forecast the progression and impact of certain infectious diseases on future health outcomes and, at a very simple level, can be built within the framework previously discussed. Similar techniques that might be used in this situation include modeling known risk factors. For example, many of the world's infectious diseases are known to be highly sensitive to long-term climate and short-term weather changes. The application of environmental data to the study of disease provides the opportunity to forecast the risk of disease outbreaks or epidemics. In fact, existing global systems for epidemic preparedness focus on disease surveillance using either expert knowledge or statistical modeling of disease activity and thresholds to identify time periods and areas of risk (Myers et al., 2000). A 2001 report published by the American Academy of Microbiology summarized the state of the field with regard to modeling the relations between climate and human health through changes in vector-borne and infectious diseases. Because accurate disease forecasting models would markedly improve epidemic prevention and control capabilities, more sophisticated models have been developed to consider individual interactions and contact patterns that can affect outcomes. Researchers at the University of Michigan have developed computer programs to project the incidence rates of infectious disease based on complex patterns of contact, mode of transmission, incubation period, and vectors of disease transmission (Koopman et al., 2002).
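As a point of reference for what a very simple infectious disease model looks like, the sketch below uses the standard susceptible-infected-recovered (SIR) compartmental form with a one-day time step. It is a swapped-in textbook technique, not the contact-pattern programs cited above, and the transmission rate `beta`, recovery rate `gamma`, and population counts are hypothetical.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance susceptible, infected and recovered counts by one day."""
    n = s + i + r
    new_infections = beta * s * i / n  # depends on contact between S and I
    new_recoveries = gamma * i
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def run_sir(s0, i0, r0, beta, gamma, days):
    """Return the daily (S, I, R) path, starting with day 0."""
    path = [(s0, i0, r0)]
    for _ in range(days):
        path.append(sir_step(*path[-1], beta, gamma))
    return path


# Hypothetical outbreak: 10 initial cases in a population of 1,000.
path = run_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(path)), key=lambda day: path[day][1])
print("day of peak infections:", peak_day)
```

The more sophisticated models described above replace the single `beta` term with explicit contact structure between individuals, which is where this aggregate form stops being adequate.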

Model Selection Overview

In general, quantitative methods are preferable to qualitative techniques when sufficient data are available, and quantitative models are typically used for complex problems such as health. Also, econometric models typically perform better for long-term forecasting, whereas extrapolation models are often employed for short-term forecasting projects. When large changes in the outcome measure are expected, causal approaches are typically preferred to naïve approaches. The choice of forecasting method depends on four key issues: 1) forecast accuracy needed, 2) complexity of variable relationships, 3) time allowed for analysis, and 4) balance of forecasting costs relative to benefits. When choosing which forecasting method to use, each of these issues should be considered. Other criteria usually employed in the selection process include convenience, market popularity, structured judgment, statistical analytic needs, relative track records, and experience with alternative approaches from published research. While myriad forecasting methods have been employed for various research problems, the appropriate method depends on the specific situation and study objectives (Figure 3). Selecting a single forecasting model can prove to be difficult; therefore, it may be useful to combine methods or try multiple approaches.

4. Implement and Evaluate Method

The chosen forecasting method(s) will then be implemented to generate projections (i.e., outputs). Implementation requires securing appropriate resources such as a research team, software, and necessary data sets. Implementing more complex methods such as population microsimulation also requires sufficient time for the construction of each model component to ensure that the model will run smoothly. The next critical step is to evaluate the forecast in the situation in which it will be used by asking a variety of questions about model inputs, assumptions and outputs.
Possible questions might include:
- Do reasonable alternatives exist for the base assumptions?
- Do the underlying assumptions about rate of change based on changing inputs make sense?
- Are the data and methods valid and reliable?
- Are the outputs sufficiently robust to answer the original research question(s), and can the analysis be easily replicated?

Forecast evaluation should employ standard scientific principles that would apply to any other academic study. For example, a good evaluation should begin with tests of inputs and outputs, followed by a comparison of the method used to reasonable alternatives. Testing inputs is particularly important for causal modeling, in which model improvement will allow for a better assessment of policy changes. Testing outputs, on the other hand, is clearly important for assessing uncertainty but also for ensuring that the appropriate model was chosen. One method of testing inputs involves testing underlying assumptions with objective measures such as published results. In the absence of objective information, subjective measures such as expert opinion can be used. A method for identifying unreasonable assumptions is testing the construct validity of important model parameters and/or relationships between the parameters as well as other inputs.

The next evaluation step should be testing the data and study methods. Attention to data validity and reliability can greatly impact forecast results and the conclusions drawn from them. In general, the more important the event (i.e., intervention or policy change) under study, the more important data testing becomes. In keeping with good scientific practice, all data used in the forecast should be fully disclosed and accessible where appropriate. Ideally, the completed model should be validated by predicting the most recent health outcomes, for year T, from forecasts based on prior periods, t = 0, ..., T-1, and checking whether the predictions have reasonable predictive value. This requires that the model be developed excluding the most recent years of data, so that the truncated model can be checked against the observed outcomes for those years.

Making forecast data available to other researchers and entities will allow outputs to be replicated. Conducting direct replications of the forecast that generate the same or similar results is an obvious reliability check. Other methods of assessing reliability might include comparing parameter results from different forecasts, applying the same forecast methods to similar data, or extending the study to other situations. In addition, output assessment can include examining face validity (the reasonableness of forecast results) and using appropriate error measurement.

5. Use Forecast Results

If the evaluation identifies gaps in knowledge or deficiencies in the selected method, the decision should be reassessed, a new method should be selected, and the process should be repeated. Once a thorough evaluation provides evidence that the forecast methods are scientifically sound, model results can be used.
In this final step, forecasters must work to ensure that the model gains acceptance among decision-makers, who will in turn use projections from the forecast in the decision-making process. The extent of model use is directly related to the level of confidence placed in the forecast results, which is in large part determined by the level of uncertainty in the forecast. One method of dealing with uncertainty in population forecasts is the variant approach, which uses a combination of assumptions to generate a forecast range bounded by high and low values. Another approach is to use statistical techniques to generate probability distributions. Finally, population forecasters can combine simpler statistical approaches with expert opinion to account for uncommon events that are not likely to be captured in trend data. In sum, approaches to ensure widespread use of forecast results might include:
- Providing a range of estimates using prediction intervals for all projections so that policymakers have a greater comfort level with the accuracy of results;
- Monitoring and reporting forecast accuracy over time;
- Updating the model periodically to ensure that the most recent data are included and current assumptions hold; and
- Actively disseminating overall results and specific outputs relevant to timely policy challenges.
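The hold-out validation described under step 4 (develop the model on periods t = 0, ..., T-1 and check its predictions against the most recent years) can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the series is hypothetical, the two candidate methods (a naïve forecast and a least-squares linear trend) are stand-ins for whatever methods are under comparison, and mean absolute percentage error is only one of several possible error measures.

```python
from statistics import mean

def naive_forecast(history, horizon):
    """Naive approach: hold the last observed value constant."""
    return [history[-1]] * horizon

def linear_trend_forecast(history, horizon):
    """Extrapolation: least-squares line through (t, y), extended forward."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error between observed and forecast values."""
    return 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

def holdout_evaluation(series, holdout, methods):
    """Fit each method on the truncated series and score it on the held-out years."""
    train, test = series[:-holdout], series[-holdout:]
    return {name: mape(test, fn(train, holdout)) for name, fn in methods.items()}

# Hypothetical illustrative series (not real data): an outcome rate per 100,000.
rates = [250.0, 245.0, 240.0, 235.0, 230.0, 225.0, 220.0, 215.0]
scores = holdout_evaluation(
    rates,
    holdout=2,
    methods={"naive": naive_forecast, "linear_trend": linear_trend_forecast},
)
print(scores)
```

Because the hypothetical series declines by a constant amount each year, the linear trend reproduces the held-out years and the naïve forecast does not. With real data the ranking could differ, which is why the comparison is run rather than assumed.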

Table 1: Summary of Characteristics of Different Approaches to Model Data

Aggregate Models

Approach: Naïve
Description and Organization of Data: Aggregate-level outcome data from previous and current periods. Assumes the current value of the outcome measure is held constant in the future (i.e., no change).
Types of Questions Answered: What will be the future value of the outcome variable in the near term if nothing changes?
Outputs: Projected mean values for outcome variable in future (generally short-term) time periods.
Examples of Type of Analysis: Planning; budgeting.
Advantages: Minimal data requirements. Often sufficient if little change is expected during forecast period.
Disadvantages: Does not consider factors associated with changes in outcome variable (i.e., provides no insight into risk factors). Uses population means and assumes an underlying distribution of the data.

Approach: Extrapolation
Description and Organization of Data: Aggregate-level outcome data. Autoregressive integrated moving average (ARIMA) models, time series, and other extrapolation techniques are used to model expected changes in the future values of an outcome based on historic trends.
Types of Questions Answered: What is our best guess of the future value of an outcome measure based on historic trends?
Outputs: Projected mean values for outcome variable in future time periods.
Examples of Type of Analysis: Planning; budgeting; demographic models; actuarial models; macroeconomic models.
Advantages: Useful for outcome variables with distinct patterns that are repeated over time. Limited data requirements. Fairly accurate in projecting trends over long time periods.
Disadvantages: Does not consider factors associated with changes in outcome variable (i.e., provides no insight into these factors). Uses population means and assumes an underlying distribution of the data.

Approach: Cell-Based
Description and Organization of Data: Aggregate-level (typically cross-sectional) data on all risk factors and outcomes. Models associations between a series of risk factors and an outcome variable.
Types of Questions Answered: Which factors are associated with the outcome variable? What is the predicted value of an outcome measure given a certain set of risk factors?
Outputs: Parametric or non-parametric estimates of independent variables at aggregate level. Predicted mean value of outcome variable in aggregate given specific values of independent variables.
Examples of Type of Analysis: Evaluations of interventions; social epidemiology; biostatistics.
Advantages: Allows assessment of associations between risk factors and outcome variables.
Disadvantages: Does not consider impact of time on outcome. Uses population means and assumes an underlying distribution of the data. Greater data requirements. No causal inference.

Table 1 (continued)

Approach: Combined Extrapolation and Cell-Based
Description and Organization of Data: Aggregate-level data on all risk factors and outcome. Models relations between a series of risk factors and an outcome variable and then uses ARIMA, time series, and other extrapolation techniques to incorporate time.
Types of Questions Answered: Given the predicted values for a set of risk factors and historical trends, what is the best guess of the value of an outcome in the future?
Outputs: Projected mean values for outcome variable in future time periods. Predicted mean value of outcome variable in aggregate based on values of independent variables.
Examples of Type of Analysis: Planning; forecasting; evaluation.
Advantages: Allows assessment of relations between risk factors and outcome variables. Fairly accurate in projecting trends over long time periods.
Disadvantages: Uses population means and assumes an underlying distribution of the data. Significant data requirements. Joint distributions of risk factors and outcome measure are required. Assumes proportional and constant hazard function. No causal inference.

Microsimulation

Approach: Discrete Time
Description and Organization of Data: Individual-level data on all risk factors and outcomes. Models a sample population of individuals prospectively to project future distributions of risk factors and outcomes.
Types of Questions Answered: What will the future distribution of risk factors and outcomes be for a sample population in the absence of policy changes? How will individuals be affected by policy changes?
Outputs: Projected distribution of population according to outcome variable in future time period.
Examples of Type of Analysis: Planning; forecasting; evaluation.
Advantages: Provides distribution of individuals according to risk factors and outcome variable. Models changes in multiple risk factors in a single time period. Accommodates age-time interactions, non-proportional and variant hazards, and threshold effects. Allows for causal inference.
Disadvantages: Extensive data requirements. Joint distributions of risk factors and outcome measure are required. Less precise than continuous time microsimulation models. Forecasts might not be more accurate than simple models.

Approach: Continuous Time
Description and Organization of Data: Individual-level data on all risk factors and outcomes. Models a sample population of individuals prospectively to project future distributions of risk factors and outcomes.
Types of Questions Answered: What will the future distribution of risk factors and outcomes be for a sample population in the absence of policy changes? How will individuals be affected by policy changes?
Outputs: Projected distribution of population according to outcome variable in future time period.
Examples of Type of Analysis: Planning; forecasting; evaluation.
Advantages: Provides distribution of individuals according to risk factors and outcome variable. Models changes in multiple risk factors simultaneously. Accommodates age-time interactions, non-proportional and variant hazards, and threshold effects. Allows for causal inference.
Disadvantages: Extensive data requirements. Joint distributions of risk factors and outcome measure are required. Forecasts might not be more accurate than simple models.

Section 3. UCLA Population Health Forecasting

UCLA population health forecasting uses microsimulation techniques to model individual behavior, lifetime events, and ultimately health outcomes. The model considers the influence of countless events and decisions that occur either simultaneously or in conjunction with one another. The model simulates a large population, such as a county, state, or country, in order to produce results that can support conclusions. Through the consultation process with the advisory panel and the team's forecasting methods research, several important methodological challenges related to population health forecasting have been identified:
- Long-term methods: For many health conditions, long-term forecasts are needed to model associations between risk factors and final health outcomes because of the long induction periods for many chronic diseases. For example, some health benefits of reductions in smoking initiation rates may only start to be seen twenty or more years later. However, it is also clear that changes in some risk factors have very rapid health benefits; for example, smoking cessation reduces heart attack rates within the first year.
- Multivariate approach: The complexity of the factors known to influence health outcomes argues for multivariate approaches, including multiple risk factors and multiple health outcomes, for health forecasting.
- Distributional effects: Although policymakers and planners may be interested in knowing whether the population's health will improve or decline globally, they are also interested in identifying the underlying causes of these changes and highlighting distributional effects within subgroups according to race/ethnicity, gender, age, and location.
- Data requirements: In order for the final model to be able to produce detailed distributions of the population, risk factors, disease prevalence, and mortality rates by age, gender, race/ethnicity, and county, extensive data are required for each of the building blocks.

Each of the building blocks requires distinctly different techniques, so each module can be developed independently. However, the form and content of each building block are highly dependent on the others, so it is important to coordinate and integrate the various components and variables in the models. The natural order is to first build the core microsimulation framework while considering the diseases, risk factors, and interventions that will be included in later modules. These factors must be incorporated into the core framework from the outset. Subsequently, the disease module and then the forecasting module will be developed.

Base Population Module

A base population module for California includes socioeconomic and demographic information and computer programming to perform microsimulations of samples that describe the California population. It provides the underlying framework for the population health forecasting model and contains a demographic profile of California's population. This allows for synthetic replications, which will enable analysis of intervention effects. This component does not require assumptions on causal relations or structural forms of disease developments, but rather allows these relations to be modeled through the incorporation of disease modules.

Risk Factor - Disease Modules

Risk factor-disease modules are smaller models that describe linkages between 1) various risk factors, environmental conditions, and population characteristics and 2) morbidity and mortality. These components are crucial to understanding health trends in California's population. Various diseases and risk factors are first modeled separately and then linked together through the microsimulation framework, provided that the risk factor or disease information can be linked to the variables that are available in the simulation. This approach allows the use of research and conceptual models from around the globe, as long as the associated effects are not specific to one locale or a specific population segment. For example, models of heart disease are not particular to a location within the state and could thus serve as useful tools. However, modeling the impact of smog on respiratory diseases in Los Angeles County requires local data, which rules out models developed elsewhere for inclusion in the model.

Forecasting Module

Initially, the microsimulation model needs to use current data on the population of California. In order to make statements about the future, it is also necessary to make projections regarding the model assumptions and population data. These projections may be created using aggregate models of the parameters of interest, such as migration, fertility, and economic developments. These projections could be based on any combination of techniques, such as econometric models, time series models, expert judgments, or the Delphi method.
It is likely that judgmental time series forecasting will be principally used to project variables with large expected changes (Webby, O'Connor, and Lawrence, 2001).

Section 4: Specifications of the California Health Forecasting Model

Our model is composed of three components: a descriptive population framework, risk factor-disease modules, and a forecasting module (Figure 4).

Descriptive Population Framework

Our preferred framework for forecasting is similar to two existing microsimulation models developed by Statistics Canada:

1) LifePath is a dynamic longitudinal microsimulation model of individuals and families. Microsimulation models extend multistate, disaggregated models using individual-level data (Ahlburg, 2001). Using behavioral equations based on historical data, LifePath creates statistically representative samples consisting of complete lifetimes of individuals. It is used for analyzing and developing government policies, particularly those requiring evaluation at the individual or family level, with an essentially longitudinal component. It can also be used to analyze a variety of societal issues of a longitudinal nature, such as intergenerational equity or time allocation over entire lifetimes.

2) POHEM is an extension of LifePath that models health and disease. Using equations and sub-models developed at Statistics Canada and drawn from the medical literature, the model simulates representative populations and allows the rational comparison of competing health intervention alternatives in a framework that captures the effects of disease interactions.

More information on these models can be found at http://www.statcan.ca/english/spsd/. Both models include overlapping birth cohorts and fully simulated person characteristics, replicating actual aggregate numbers. In developing our microsimulation models, one hundred years of vital statistics will be required to simulate a complete cohort. In addition, the model requires detailed mortality data (life tables) and immigration and birth statistics. Furthermore, we include information on gender, education, employment, income, marital status, and geographic location. Finally, the sample cohort needs to be stratified by race and ethnicity so that racial and ethnic differences can be modeled.

Risk Factor - Disease Modules

These modules provide the link between population characteristics and health outcomes and are based on analyses of data sets linking individuals' behavior with morbidity and mortality outcomes. Our primary focus is on characteristics such as socioeconomic factors (i.e., income and education) and behavioral factors. Health-related behaviors will include physical activity and diet/nutrition, as well as risk factors such as smoking and alcohol intake, cholesterol level, and blood pressure. Health outcomes will include diseases such as diabetes, cancer, and coronary heart disease (CHD) that are both leading causes of mortality and affected by public health interventions.
Making the connection between risk factors and health outcomes requires the use of widely accepted, prominent modeling approaches (e.g., the Weinstein model for CHD) in order to increase public acceptance of the population forecasting model results. By leveraging existing models, there will be no need to conduct additional analyses. The ideal method of combining literature sources would be to conduct extensive meta-analyses, but such an approach is also outside the scope of this project. Rather, our approach will be to use the results of published meta-analyses and, where such information is not available, to be somewhat less systematic while still accounting for important methodological concerns such as study design.

The Forecasting Module

This final module enables us to forecast health outcomes and project the results of interventions into the future. Without the forecasting module, we would be able to predict current outcomes under counterfactual conditions, but we would not be able to project future outcomes. Using a variety of forecasting techniques, we will be able to estimate the impact of various interventions not only in today's world, but also in tomorrow's world. In this module, we will take full advantage of the growing results of systematic meta-analyses of interventions designed to improve health at the population level. The principal source of these data is the Guide to Community Preventive Services, but other sources, such as the Cochrane Collaboration, will also be accessed. A mixture of quantitative and qualitative forecasting techniques will be used to develop projections for the relevant variables. As mentioned above, projecting a population's future health status requires considerable data. The process has been facilitated through the collection of trend data and projections of the demographic characteristics of California's population from the Department of Finance.

Data Sources

Table 2 includes a list of data sources by module for each type of information required for the population health forecasting model.

Table 2: Data Summary Sources for Health Forecasting Model

Base Population Module

1. Type of information: Historic and current population counts for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Finance (California)
2. Type of information: Historic and current mortality and fertility rates for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Finance (California)
3. Type of information: Historic and current immigration rates for each county, by country of origin
   Source(s): Department of Finance (California)
4. Type of information: Historic levels of physical activity for Californians, stratified by age, gender, and race/ethnicity
   Source(s): Behavioral Risk Factor Surveillance System (BRFSS), National Health Interview Survey (NHIS), National Health and Nutrition Examination Survey (NHANES), California Health Interview Survey (CHIS)
5. Type of information: Historic mortality rates for CHD, colon cancer, diabetes, and depression for each county, stratified by age, gender, and race/ethnicity
   Source(s): Vital statistics data and assumptions
6. Type of information: Historic levels of educational attainment for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Education (California)

Physical Activity Module

1. Type of information: Relation between physical activity levels and all-cause mortality and CHD, colon cancer, diabetes, and depression mortality, stratified by age and gender
   Source(s): Previous studies in published literature; longitudinal data set such as Alameda County Human Population Laboratory
2. Type of information: Persistence of individual physical activity levels over time, stratified by age, gender, and race/ethnicity
   Source(s): Previous studies in published literature
3. Type of information: Relation between physical activity levels in childhood and adolescence and physical activity levels for adults, stratified by age, gender, and race/ethnicity
   Source(s): Previous studies in published literature; NHIS-linked file, Human Population Laboratory

Forecasting Module

1. Type of information: Projected population count for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Finance (California)
2. Type of information: Projected mortality and fertility rates for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Finance (California)
3. Type of information: Projected immigration rate for each county, by country of origin
   Source(s): Department of Finance (California)
4. Type of information: Projected levels of physical activity for Californians, stratified by age, gender, and race/ethnicity
   Source(s): Forecast based on historical trends from population surveys and assumptions
5. Type of information: Projected mortality rates for CHD, colon cancer, diabetes, and depression for each county, stratified by age, gender, and race/ethnicity
   Source(s): Forecast based on historical levels from vital statistics data and assumptions
6. Type of information: Projected highest level of educational attainment for each county, stratified by age, gender, and race/ethnicity
   Source(s): Department of Education (California) or forecast based on historical levels from population surveys and assumptions

Simulating an Individual

Within the microsimulation framework, the birth and fixed characteristics (gender and race) of an individual are simulated with a probability determined by population tables; the birth date can fall anywhere in the simulated time period from 1870 through 2020. Historical population tables determine the probability of a birth in the past, and projected population tables determine the probability of a birth in the future. It must be noted that increasing the length of base data does not generally improve forecast accuracy (Ahlburg, 2001).

Life Events

Once this simulated individual has been created in the computer model, the life events (starting with the birth) can be generated, with the last life event being death. Life events are conditional on the status of the individual according to gender, age, race, education, and employment status. Some of these variables are static, whereas others are dynamic and change as a result of an event. For example, marriage is an event causing the status to change from single to married. Events can also change the probabilities of other events (e.g., marriage will change the probability of having a child). Behaviors, such as exercise and smoking, and health outcomes can be generated in a similar fashion.

Trends

Generally, trends in important variables such as educational attainment, mortality, and obesity are non-cyclical. For example, each year more people have more education, all-cause mortality rates are declining slightly, and over the past twenty years the percentage of Californians who are overweight has increased steadily. For some of the current trends, there is a theoretical maximum and minimum, but because of uncertainty there need to be alternative estimates for trend changes over time. Time-series and trending methods are used to extrapolate historical trends into the future.
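The individual simulation and life-event generation described above can be sketched as a discrete-time (annual) loop. This is a minimal sketch under loudly stated assumptions: every probability below is a hypothetical placeholder, the birth year is drawn uniformly from 1870 through 2020 rather than from population tables, and only gender, marriage, childbirth, and death are represented. It is meant to show the mechanics (status variables, events that change status, and events that change the probabilities of other events), not the model's actual equations.

```python
import random

# All probabilities below are hypothetical placeholders, not estimates from any data source.
ANNUAL_MARRIAGE_PROB = 0.08  # applied to single adults aged 18 and over

def annual_death_prob(age):
    """Hypothetical mortality schedule that rises with age (capped at 1)."""
    return min(1.0, 0.0005 * 1.09 ** age)

def annual_birth_prob(age, married):
    """Hypothetical fertility schedule; being married raises the chance of a child."""
    if not 18 <= age <= 45:
        return 0.0
    return 0.10 if married else 0.03

def simulate_individual(rng, birth_year_range=(1870, 2020), max_age=110):
    """Simulate one life as a list of (year, event) tuples, from birth to death."""
    # Simplification: uniform birth year; the model would draw it from population tables.
    birth_year = rng.randint(*birth_year_range)
    person = {"gender": rng.choice(["female", "male"]), "married": False, "children": 0}
    events = [(birth_year, "birth")]
    for age in range(max_age + 1):
        year = birth_year + age
        if rng.random() < annual_death_prob(age):
            events.append((year, "death"))  # death is always the last life event
            return person, events
        if age >= 18 and not person["married"] and rng.random() < ANNUAL_MARRIAGE_PROB:
            person["married"] = True  # the event changes a dynamic status variable
            events.append((year, "marriage"))
        if rng.random() < annual_birth_prob(age, person["married"]):
            person["children"] += 1
            events.append((year, "child"))
    events.append((birth_year + max_age, "death"))  # safety net if no death was drawn
    return person, events

rng = random.Random(2004)  # fixed seed so the run is reproducible
person, events = simulate_individual(rng)
print(person, events)
```

Repeating `simulate_individual` many times yields a synthetic population whose event histories can be tabulated by year, which is how a microsimulation produces distributions rather than only mean values.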
The time-series and trending techniques will be supplemented with expert judgment to predict future discontinuities (Webby, O'Connor, and Lawrence, 2001). Although the use of multiple forecasting methods has been shown to produce more accurate forecasts, it is a time-intensive process. Therefore, forecasting methods must be refined over time as input data improve and new data become available.

Uncertainty and Validation

Although it is done infrequently in practice, the results of forecasting models should be evaluated (Armstrong, 2001). Common validation methods can use either observed or projected data and include backcasting, making concurrent predictions using split samples, and calibrating models (Armstrong, 2001). For example, trends in historical data can be used to predict current mortality rates for the population. This is known as backcasting. Calibration is a valuable tool to understand feedback within the model. A natural feedback loop that is not preserved in the proposed microsimulation framework is multigenerational fertility rates; however, this impact can be reflected in model outcomes by recalibrating the model after introducing the impact on fertility rates. Because it often involves post-hoc adjustments to results, calibration needs to be used very carefully to ensure that real effects are not eliminated from the model or smoothed out. If used properly, calibration can provide additional insight into the underlying model.

In assessing outputs from the model, evaluation criteria should be pre-specified and multiple error measures should be used (Armstrong, 2001). Once reasonable models have been identified, sensitivity analyses can be conducted. In order to indicate the level of precision of the forecast, prediction intervals will be provided along with point estimates. The prediction interval formula for the forecast of a regression-dependent variable, conditional upon known future values for the independent variables and normally distributed disturbances, is commonly taught and used by forecasters (Chatfield, 2001). The standard formula, however, can fail when the disturbances are non-normal, with the degree of failure increasing rather than decreasing with sample size. Bootstrapped prediction intervals based on either the percentile principle or the percentile-t principle have been found to perform substantially better (Lam, 2002).

Assumptions

A series of assumptions is required in developing the forecast because:
- A limited number of databases exist that assess a wide range of health determinants acting at the individual and population levels;
- A limited amount of longitudinal data is available to assess disease incidence and health status changes and to capture the lag times between certain determinants and outcomes;
- Defining explicit rules and methods for relating information about different determinants and integrating different data sets into a single data set is challenging; and
- Setting the rules for when causal relations have not been adequately established and creating analytic frameworks that specify the intermediate and distal variables in the causal models is difficult.
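The bootstrapped prediction interval mentioned under Uncertainty and Validation can be illustrated with a percentile-principle sketch. This is an assumption-laden illustration rather than the procedure evaluated in the cited work: it uses a simple linear trend regression, resamples residuals to re-estimate the line, adds a resampled residual as the future disturbance, and reads the interval off the empirical percentiles. The data, the function names, and the choice of 2,000 replicates are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope

def bootstrap_prediction_interval(xs, ys, x_new, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap prediction interval for a new observation at x_new."""
    rng = random.Random(seed)
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    draws = []
    for _ in range(n_boot):
        # Rebuild a pseudo-series from resampled residuals and refit the line.
        y_star = [f + rng.choice(residuals) for f in fitted]
        b0, b1 = fit_line(xs, y_star)
        # Add one more resampled residual to represent the future disturbance.
        draws.append(b0 + b1 * x_new + rng.choice(residuals))
    draws.sort()
    lower = draws[int((1 - level) / 2 * (n_boot - 1))]
    upper = draws[int((1 + level) / 2 * (n_boot - 1))]
    return intercept + slope * x_new, lower, upper

# Hypothetical illustrative data (not real): an indicator observed over ten periods.
years = list(range(10))
values = [50.2, 51.1, 51.9, 53.4, 53.8, 55.1, 56.3, 56.8, 58.2, 59.0]
point, lower, upper = bootstrap_prediction_interval(years, values, x_new=10)
print(point, lower, upper)
```

Because the interval comes from the empirical distribution of resampled residuals, it does not rely on normally distributed disturbances, which is the situation in which the standard formula is reported to fail.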
Running the microsimulation requires distributional assumptions for multiple variables for a number of years. Due to incomplete data, some estimates may initially be largely driven by underlying assumptions. The microsimulation suggested by this framework eliminates some of the feedback that normally is part of a forecasting framework, particularly for birth/fertility rate trends. This issue will be addressed through an iterative calibration process.

Example: Modeling Physical Activity

Rationale

Due to the complexity of the task and resource limitations, the team decided to start with one risk factor and several diseases in the initial model. Three risk factors (smoking, obesity, and physical activity) and one disease (diabetes) were considered as the starting point. Physical activity was selected for several reasons:
- It represents a new and important contribution to the field. Physical activity has not been modeled as extensively as smoking and is easier to model than obesity, which is an intermediate health state;
- It has been shown to significantly impact health outcomes;
- Effort is increasing to develop interventions that may lead to increased physical activity levels in the population; and
- Variation in incidence and mortality from a wide range of conditions, such as coronary heart disease, diabetes, colon cancer, and depression, has been found to be associated with different levels of physical activity.

Additional risk factor modules can be added once the initial model has been developed. Smoking, obesity, and diabetes are potential candidates for future inclusion.

Challenges

Choosing physical activity also poses certain methodological challenges. First, physical activity levels are difficult to measure accurately because the categories frequently used to classify physical activity vary and have changed over time. Physical activity is typically classified according to its purpose: work, leisure time, or transportation. Although initial studies and surveys of physical activity focused on activity within the workplace, more recent work focuses almost exclusively on leisure time activities. As a result, our model initially includes leisure time physical activity.

Second, physical activity is composed of multiple interrelated components, including frequency, duration, intensity, and activity type, which can result in different effects on subsequent health outcomes. Ideally, all of these components would be included in the model; however, these variables are highly correlated and their independent effects are not consistently measured in the literature. Consequently, physical activity in the model is reduced to a single variable that relates energy expenditure in leisure time activities to an individual's metabolic rate at rest: Metabolic Equivalent Hours per week, or METhrs/wk. For example, if an individual walks at a moderate pace while carrying a light object (an activity with a MET value of 4.0) twice a week for 45 minutes, the weekly leisure time physical activity (LTPA) value is 2 x 0.75 x 4.0 = 6.0 METhrs/wk.

Lastly, large national data sets are often cross-sectional, with a limited number of items asking respondents to self-report their physical activity levels during a short period, which typically ranges from one week to one month. Ideally, the model would include physical activity patterns over longer time periods, which would be subject to less bias from seasonal weather patterns and other short-term fluctuations in physical activity behavior, such as New Year's resolutions.

A major barrier to modeling the distribution of health outcomes by race and ethnicity according to levels of physical activity is the limited amount of published research on physical activity patterns among Latinos and on relations between physical activity and morbidity and mortality for Latinos. Few data sets contain both detailed physical activity information and Latino samples that are large enough to analyze as a distinct subgroup. Based on the available published research, it seems reasonable to assume that the health impacts of similar physical activity patterns are similar for Latinos and non-Latino whites. In other words, ethnicity for these two groups does not modify the effect of physical activity on health outcomes. Although the health impacts of