Basic Study Designs in Analytical Epidemiology for Observational Studies
* Cohort
* Case-control
* Hybrid designs (case-cohort, nested case-control)
* Cross-sectional
* Ecologic
OBSERVATIONAL STUDIES (Non-Experimental)
These studies are observational because there is no individual intervention or treatment. Exposures occur in a non-study environment; they are not randomly assigned by the investigator. Individuals can be observed prospectively, retrospectively, or currently.
Observational analytic designs with individuals as the unit of analysis (three basic designs plus hybrids):
* Cohort: can be prospective or retrospective (also called concurrent or non-concurrent), or mixed
* Case-control
* Hybrid designs
* Cross-sectional
DEFINITION OF COHORT STUDY
The analytic method of epidemiological study in which subsets of a defined population are identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome.
Cohort Studies
The prospective cohort study is the gold standard of observational studies because events can be recorded as they occur, rather than reconstructed retrospectively. The cohort is followed over time and outcomes are ascertained (e.g., disease incidence, death, remission).
USUAL FEATURES OF COHORT STUDY
* Observation of large numbers over a long period
* Comparison of incidence rates in groups that differ in exposure levels
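A minimal sketch of comparing incidence rates between exposure groups, assuming incidence is expressed per person-year of follow-up. The counts and person-years are hypothetical, and the names `Group` and `incidence_rate_ratio` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Follow-up summary for one exposure group (hypothetical data)."""
    cases: int            # new cases observed during follow-up
    person_years: float   # total person-time at risk

    @property
    def rate(self) -> float:
        """Incidence rate: new cases per person-year at risk."""
        return self.cases / self.person_years


def incidence_rate_ratio(exposed: Group, unexposed: Group) -> float:
    """Incidence rate in the exposed divided by the rate in the unexposed."""
    return exposed.rate / unexposed.rate


exposed = Group(cases=30, person_years=10_000)
unexposed = Group(cases=15, person_years=12_000)
print(f"Exposed:   {1000 * exposed.rate:.2f} per 1,000 person-years")
print(f"Unexposed: {1000 * unexposed.rate:.2f} per 1,000 person-years")
print(f"Incidence rate ratio: {incidence_rate_ratio(exposed, unexposed):.2f}")
```

With these numbers the exposed group has 3.00 and the unexposed group 1.25 cases per 1,000 person-years, an incidence rate ratio of 2.40.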
CALENDAR TIME ISSUES
* True prospective (concurrent) study: the cohort is constructed in present time; exposures are documented in present time and possibly in the future; the cohort is followed prospectively in future calendar time.
* Historical prospective (non-concurrent) study: the cohort is constructed in past time; exposures are documented in the past; follow-up can extend into the future.
* Mixed: can include both concurrent and non-concurrent exposures.
Advantages/Disadvantages of Concurrent and Non-concurrent Cohort Studies
Concurrent
* Exposures and outcomes can be measured prospectively; biologic measurements can be obtained
* Depending on the outcome of interest, follow-up time may be long, increasing costs
* Participants are easier to trace
Non-concurrent
* Must rely on records or participant recall to measure exposures and outcomes, which is subject to error
* Measures prior exposures; usually cheaper
* Subjects are harder to trace, especially if the last contact or record was many years earlier
TYPES OF COHORTS
* General populations
* Occupational groups
* Memberships of groups (HMO members, graduates of a particular college, Medicare beneficiaries, Vietnam-era veterans, atomic bomb survivors, Framingham residents)
TEMPORAL MEASUREMENT OF EXPOSURES
* At baseline only
* Throughout the study
TYPES OF MEASUREMENT OF EXPOSURES
* Direct measurement (e.g., biologic measurements, air sampling, water sampling)
* Use of surrogate measurements (e.g., work records, questionnaires, medical records)
SOURCES OF MEASUREMENTS
* Direct measurement: subject interviews
* Use of proxies (friends, relatives)
STEPS IN CONDUCTING A COHORT STUDY
Prospective Study
1. Define the cohort and invite subjects to participate.
2. Obtain baseline exposure measurements (interviews, biologic samples, clinical assessments, air/water sampling, etc.).
3. Follow the cohort for disease (match subjects to registry data, survey the cohort for disease, etc.). Additional exposure measurements may be obtained over time (e.g., surveys of dietary intake or postmenopausal hormone use).
4. Analyze disease risk according to exposures.
Retrospective Cohort Study Procedures
1. Define the cohort, which may include deceased individuals (e.g., all employees of Company A who worked 6 months or longer between time X and time Y).
2. Obtain retrospective exposure measurements (e.g., review work records for job descriptions and exposure data; survey subjects regarding work history, smoking history, etc.).
3. Obtain retrospective disease/mortality data for the study period. Usually only mortality data will be available, from death certificates. Some states may have disease registries covering the period under study.
4. Analyze events (e.g., cause of death) by exposure status (relative risks, standardized mortality ratios [SMRs]).
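An SMR compares the deaths observed in the cohort with the deaths expected if the cohort had experienced the age-specific rates of a reference population (indirect standardization). The sketch below assumes three age groups; the reference rates, person-years, and observed deaths are all hypothetical.

```python
# Hypothetical reference (e.g., general-population) death rates by age group,
# in deaths per person-year, and hypothetical person-years observed in the cohort.
reference_rates = {"20-39": 0.001, "40-59": 0.004, "60-79": 0.020}
cohort_person_years = {"20-39": 5_000, "40-59": 3_000, "60-79": 1_000}
observed_deaths = 45  # hypothetical deaths observed in the cohort


def expected_deaths(rates: dict[str, float], person_years: dict[str, float]) -> float:
    """Deaths expected if the cohort had experienced the reference rates."""
    return sum(rates[age] * person_years[age] for age in person_years)


expected = expected_deaths(reference_rates, cohort_person_years)
smr = observed_deaths / expected  # standardized mortality ratio
print(f"Expected deaths: {expected:.1f}")
print(f"SMR = {observed_deaths}/{expected:.1f} = {smr:.2f}")
```

Here 37 deaths are expected and 45 observed, giving an SMR of about 1.22 (often reported multiplied by 100, i.e., 122). An SMR above 1 means more deaths than the reference rates would predict for a population with the cohort's age distribution.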
OUTCOME MEASURES
* Incidence
* Risk ratio (relative risk)
* Odds ratio (relative odds)
* Attributable risks: clinical attributable risk, population attributable risk
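Most of these measures can be computed from a cohort's 2x2 table. A sketch using standard textbook formulas on hypothetical counts; it assumes "attributable risk" is the risk difference among the exposed and "population attributable risk" is the excess risk in the whole population over the unexposed, and does not attempt to reproduce whatever specific definition of "clinical attributable risk" the course uses.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cohort 2x2 table (hypothetical counts)."""
    a: int  # exposed, developed disease
    b: int  # exposed, no disease
    c: int  # unexposed, developed disease
    d: int  # unexposed, no disease

    def measures(self) -> dict[str, float]:
        risk_exp = self.a / (self.a + self.b)        # incidence (risk) in exposed
        risk_unexp = self.c / (self.c + self.d)      # incidence (risk) in unexposed
        n = self.a + self.b + self.c + self.d
        risk_total = (self.a + self.c) / n           # incidence in the whole population
        return {
            "risk_exposed": risk_exp,
            "risk_unexposed": risk_unexp,
            "risk_ratio": risk_exp / risk_unexp,
            "odds_ratio": (self.a * self.d) / (self.b * self.c),
            "attributable_risk": risk_exp - risk_unexp,
            "population_attributable_risk": risk_total - risk_unexp,
            "population_attributable_fraction": (risk_total - risk_unexp) / risk_total,
        }


table = TwoByTwo(a=40, b=960, c=20, d=1980)
for name, value in table.measures().items():
    print(f"{name}: {value:.3f}")
```

With these counts the risk ratio is 4.0 and the odds ratio 4.125; the two are close because the disease is rare in both groups.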
ISSUES THAT CAN IMPACT RATES
* Age effect: most diseases vary by age, so most analytic studies take age into account in the analysis.
* Cohort effect: year of birth can affect exposure and/or disease. For example, stomach cancer rates declined after the 1930s, probably because the advent of refrigeration reduced the need for preserved and smoked foods, which are associated with stomach cancer.
* Period effect: a change in disease risk at some point in calendar time that applies to all ages and all cohorts. Not as important for diseases with cumulative effects (e.g., smoking and lung cancer). The change can be due to a change in exposure (more relevant for infectious disease), a change in treatment (which would have to have a large impact), or improved detection (e.g., the increase in brain tumor rates may be due to better diagnostic tools over the past 20 to 30 years).
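One common way of taking age into account is direct age standardization: each population's age-specific rates are applied to a common standard population. In this sketch (all numbers hypothetical) two populations share identical age-specific rates but differ in age structure, so their crude rates differ while their age-adjusted rates agree.

```python
# Hypothetical populations with identical age-specific rates but different age structures.
populations = {
    "Older population": {"cases": {"<60": 2, "60+": 80}, "persons": {"<60": 2_000, "60+": 8_000}},
    "Younger population": {"cases": {"<60": 8, "60+": 20}, "persons": {"<60": 8_000, "60+": 2_000}},
}
standard = {"<60": 5_000, "60+": 5_000}  # hypothetical standard population


def crude_rate(cases: dict[str, int], persons: dict[str, int]) -> float:
    """Total cases divided by total persons, ignoring age."""
    return sum(cases.values()) / sum(persons.values())


def direct_adjusted_rate(cases, persons, standard_pop) -> float:
    """Apply each age-specific rate to the standard population and re-total."""
    total = sum(standard_pop.values())
    return sum(cases[age] / persons[age] * n for age, n in standard_pop.items()) / total


for name, pop in populations.items():
    crude = crude_rate(pop["cases"], pop["persons"])
    adjusted = direct_adjusted_rate(pop["cases"], pop["persons"], standard)
    print(f"{name}: crude {1000 * crude:.1f}, age-adjusted {1000 * adjusted:.1f} per 1,000")
```

The crude rates are 8.2 and 2.8 per 1,000, but both age-adjusted rates are 5.5 per 1,000: the entire crude difference comes from age structure.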
Case-Control Studies
* Case-based case-control studies
* Nested case-control studies (nested within a cohort)
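DIAGRAM OF CASE-CONTROL STUDY
[Figure: subjects are selected from the population by disease status, then classified by exposure.]
Population
* Disease present (cases): exposed / not exposed
* Disease absent (controls): exposed / not exposed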
STEPS TO CONDUCT A CASE-CONTROL STUDY
1. Select cases (hospitals, registries, other sources).
2. Select controls (random-digit dialing [RDD], Health Care Financing Administration [HCFA] records, friends, etc.).
3. Obtain exposure information (interviews, record reviews, etc.).
4. Analyze the data (odds ratios).
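For the analysis step in an unmatched design, the exposure odds ratio is ad/bc. A sketch that adds a 95% confidence interval using Woolf's log-based method (one common choice, not necessarily the one used in this course); the counts are hypothetical, and the method breaks down if any cell is zero.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Exposure odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


or_, lower, upper = odds_ratio_ci(a=50, b=25, c=50, d=75)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

This prints an OR of 3.00 with a 95% CI of about 1.65 to 5.46.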
WHY DO CASE-CONTROL STUDIES? ADVANTAGES COMPARED TO COHORT STUDIES
* For rare diseases, a case-control study is less costly to conduct (it requires fewer subjects).
* A cohort study of a rare disease (e.g., cancer) requires following a large number of individuals for a long period of time.
* In a cohort study, subjects may drop out or be lost to follow-up.
DISADVANTAGES of Case-Control Studies
* Recall bias: cases and controls may recall prior exposures selectively.
* Retrospective assessment of exposures is subject to error.
* Limited to one disease.
* If a specific, uncommon exposure is of interest (for example, asbestos), the number of cases and controls with this exposure is likely to be low. It is better to conduct a cohort study among exposed and unexposed subjects.
* Cases may be difficult to ascertain.
DESIGN FEATURES OF CASE-CONTROL STUDIES
Cases
Sources of cases
* Population-based registries
* Hospital, HMO, and other health provider records
* Berkson's bias
Definition of cases
* Diagnostic criteria
* Incidence-based cases
* Prevalence-based cases
* Prevalence-incidence bias (Neyman's bias)
AN EXAMPLE OF PREVALENCE-INCIDENCE BIAS USING FRAMINGHAM DATA
DESIGN FEATURES OF CASE-CONTROL STUDIES (CONTINUED)
Controls
Sources of controls
* Institutional controls (hospital, HMO, or other medical care provider sources)
  - Subjects selected among those not having the same disease as the cases
  - Subjects are more accessible and cooperative
  - Subject to similar referral patterns as cases
  - Exposure is easier to measure from records, physical examinations, or laboratory measurements
* Population controls (neighbors, friends, or relatives of cases; random samples via RDD; driver's license records; HCFA)
  - The source population is better defined, and it is easier to ensure that cases and controls come from the same population
  - Exposure measurements are more likely to be representative of the population without disease
  - Problems of overmatching or over-controlling
  - Population controls may have lower response rates, however (RDD, HCFA, driver's license records)
Other Issues
1. Should dead cases be included in your study? (E.g., eligible incident cancer cases who die before they can be interviewed.) Proxies (family, co-workers) can be interviewed, but the data will not be as reliable.
2. Should dead cases be matched to dead controls? If dead cases are included, should only dead controls be used as a comparison group? Rothman: dead controls are not in the source population for controls, since they have no chance of getting the disease. However, if they have the same exposure distribution as the source population, dead controls can reasonably be used (proxy sampling). Some exposures are likely to be more common in the deceased population than in the source population, however (e.g., smoking).
DESIGN FEATURES OF CASE-CONTROL STUDIES (CONTINUED)
Definition of controls
Selection of controls
* Unmatched
* Pair matched
* Frequency matched
Ratio of controls to cases
* Not much gained from more than 3:1 matching
EFFECT OF NUMBER OF CONTROLS PER CASE ON THE RELIABILITY OF THE RESULTING ESTIMATES
Controls per case   Reliability of resulting odds ratio       Incremental gain
                    (relative to one control per case)
1                   1.00                                      -
2                   1.33                                      33%
3                   1.50                                      17%
4                   1.60                                      10%
5                   1.67                                      7%
6                   1.71                                      4%
7                   1.76                                      3%
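The reliability column is consistent with a simple approximation, which we assume here rather than take from the source: for a fixed number of cases, the variance of the log odds ratio is roughly proportional to 1/cases + 1/controls, so k controls per case gives a relative efficiency of 2k/(k + 1) compared with 1:1 matching.

```python
def relative_efficiency(k: int) -> float:
    """Efficiency of k controls per case relative to 1 control per case.

    Assumes Var(ln OR) is roughly proportional to 1/n_cases + 1/n_controls,
    i.e. to (1 + 1/k) for a fixed number of cases.
    """
    return (1 + 1) / (1 + 1 / k)  # simplifies to 2k / (k + 1)


for k in range(1, 8):
    print(f"{k} controls per case: {relative_efficiency(k):.2f}")
```

This reproduces the tabulated values for 1 to 6 controls per case; for 7 it gives 1.75 rather than 1.76. The values approach, but never reach, 2, which is why adding controls beyond a few per case buys little.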
Measurement of Exposure
Measurement of Potential Confounders
Retrospective or concurrent measurement issues in case-control studies:
1. Recall bias
2. Current biologic measurements may not reflect past exposures (e.g., exposures with short half-lives)
3. Refusal rates: do they vary by exposure status?
4. Use of existing records (hospital/medical records, occupational records, etc.): are they accurate?
ANALYSIS ISSUES
1. Analysis must follow the particular design.
2. Major analysis issues for case-control studies:
* Is exposure quantitative or categorical? If categorical, is it measured at more than two levels?
* Is the design pair-matched, as opposed to unmatched or frequency matched?
* Is the main outcome a trend or an overall non-specific association?
We will first discuss analysis for unmatched or frequency-matched studies, and then for pair-matched studies.
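For 1:1 pair-matched designs, the usual matched-pair odds ratio uses only the discordant pairs: pairs in which the case is exposed and the control is not (b), divided by pairs in which the control is exposed and the case is not (c). A sketch on hypothetical pairs, with McNemar's test computed without a continuity correction:

```python
from collections import Counter


def matched_pair_analysis(pairs):
    """Odds ratio and McNemar chi-square for 1:1 pair-matched case-control data.

    Each pair is (case_exposed, control_exposed) with values True/False.
    Concordant pairs carry no information about the odds ratio.
    """
    counts = Counter(pairs)
    b = counts[(True, False)]  # case exposed, control unexposed
    c = counts[(False, True)]  # case unexposed, control exposed
    odds_ratio = b / c
    mcnemar_chi2 = (b - c) ** 2 / (b + c)  # 1 df, no continuity correction
    return odds_ratio, mcnemar_chi2


# Hypothetical study of 100 matched pairs.
pairs = ([(True, False)] * 30 + [(False, True)] * 10
         + [(True, True)] * 20 + [(False, False)] * 40)
print(matched_pair_analysis(pairs))
```

With 30 and 10 discordant pairs this prints `(3.0, 10.0)`; the 60 concordant pairs do not affect either result.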
NESTED CASE-CONTROL STUDIES / CASE-COHORT STUDIES
Hybrid designs that identify cases from a cohort study. Suppose you have identified a number of subjects with disease y in your cohort study. Exposure x has been identified as a potential risk factor for disease y, but you have not measured this exposure on your entire cohort. Because exposure x is expensive to measure (e.g., laboratory analyses of serum collected at baseline, or abstraction of detailed occupational records), you decide to measure it only in the cases and in a sample of the cohort without the disease. Two approaches:
* Case-cohort study: sample controls at baseline.
* Nested case-control study: sample controls at the time each case occurs. This matches cases and controls on duration of follow-up and uses more information.
In both methods, the same subject can serve as both a control and a case. Special statistical techniques are applied in these analyses (not covered in this course). When cases are excluded as controls, the usual case-control analyses can be conducted, with exposure odds ratios calculated.
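The two sampling schemes can be sketched as follows. This is a simplified illustration under stated assumptions: follow-up is summarized by a single exit time per subject, nested controls are drawn at random from everyone still under follow-up when the case occurs (risk-set sampling, with no other matching), and the case-cohort subcohort is a simple random sample at baseline. The cohort and the names `Subject`, `nested_case_control`, and `case_cohort` are hypothetical, and the special analyses mentioned above are not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    sid: int
    exit_time: float  # years from baseline to disease onset or end of follow-up
    is_case: bool     # True if follow-up ended with disease y


def nested_case_control(cohort, controls_per_case, rng):
    """Risk-set sampling: when each case occurs, sample controls from subjects
    still under follow-up at that time (they may become cases later)."""
    matched_sets = []
    cases = sorted((s for s in cohort if s.is_case), key=lambda s: s.exit_time)
    for case in cases:
        risk_set = [s for s in cohort
                    if s.exit_time >= case.exit_time and s.sid != case.sid]
        k = min(controls_per_case, len(risk_set))
        matched_sets.append((case, rng.sample(risk_set, k)))
    return matched_sets


def case_cohort(cohort, subcohort_fraction, rng):
    """Sample a random subcohort at baseline; all cases are included regardless."""
    subcohort = rng.sample(cohort, round(subcohort_fraction * len(cohort)))
    cases = [s for s in cohort if s.is_case]
    return subcohort, cases


# Hypothetical cohort of 8 subjects: (exit time in years, became a case?)
follow_up = [(2, True), (5, False), (3, True), (8, False),
             (6, True), (9, False), (4, False), (7, False)]
cohort = [Subject(i, float(t), c) for i, (t, c) in enumerate(follow_up)]
rng = random.Random(42)  # fixed seed so the sample is reproducible

for case, controls in nested_case_control(cohort, controls_per_case=2, rng=rng):
    print(case.sid, "->", [s.sid for s in controls])
```

Note that the risk set for the earliest case (year 2) includes subjects 2 and 4, who later become cases themselves; this is what allows one subject to serve as both a control and a case.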
Example: Nurses' Health Study. PCBs were analyzed in serum from breast cancer cases and a sample of controls.
CROSS-SECTIONAL STUDIES
A SNAPSHOT of exposure and disease at a SINGLE point in time. Measures PREVALENCE, not incidence.
Examples:
* In cohort studies, baseline measurements of exposure and disease represent cross-sectional data.
* Community surveys of exposure and disease at a single point in time.
* Surveys of current workplace exposures and current disease.
CUMULATIVE PREVALENCE: a survey of individuals that measures the occurrence of any current or prior disease in the person's lifetime.
Measure of association: prevalence rate ratio. Statistical approaches for incidence rate ratios can be used.
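A sketch of the prevalence rate ratio from a cross-sectional survey, alongside the prevalence odds ratio that is often reported instead. The survey counts are hypothetical.

```python
def prevalence(cases: int, n: int) -> float:
    """Proportion of surveyed persons who currently have the disease."""
    return cases / n


def prevalence_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    return prevalence(cases_exp, n_exp) / prevalence(cases_unexp, n_unexp)


def prevalence_odds_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp


# Hypothetical survey: 60 of 400 exposed and 30 of 600 unexposed have the disease.
args = (60, 400, 30, 600)
print(f"Prevalence ratio:      {prevalence_ratio(*args):.2f}")
print(f"Prevalence odds ratio: {prevalence_odds_ratio(*args):.2f}")
```

Here the prevalence ratio is 3.00 and the prevalence odds ratio about 3.35. The odds ratio moves further from 1 than the prevalence ratio when the condition is common, and the two converge when prevalence is low.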
Incidence-Prevalence Bias in Cross-Sectional Studies
Example: a survey of current smoking and emphysema. Emphysema cases who keep smoking after diagnosis have shorter survival, so prevalent cases are less likely to be current smokers, and the risk from current smoking is underestimated.
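The direction of this bias follows from the approximation that prevalence is roughly incidence times mean disease duration, which assumes a stable population and low prevalence. Writing P, I, and D-bar for prevalence, incidence, and mean duration, with subscript 1 for current smokers and 0 for non-smokers, and using hypothetical values (an incidence ratio of 4 and mean durations of 5 versus 10 years):

```latex
P \approx I \times \bar{D}

\frac{P_1}{P_0} \approx \frac{I_1}{I_0} \times \frac{\bar{D}_1}{\bar{D}_0}
  = 4 \times \frac{5}{10} = 2
```

Under these assumptions the cross-sectional prevalence ratio (about 2) understates the true incidence ratio (4) by exactly the ratio of mean durations.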
Ecologic Studies
Use aggregate data; used primarily for hypothesis generation as opposed to hypothesis testing.
Examples of aggregate data:
* Disease rates (incidence, mortality, etc.)
* Birth rates
* Exposure data: smoking rates, geographic residence, air pollution data, mean income, per capita consumption of saturated fats, proximity to nuclear power plants
Ecologic Fallacy
Grouped data do not necessarily represent individual-level data.
Example: Durkheim's classic work Suicide found a correlation between the percentage of the population that was Protestant and suicide rates in the 19th century. This assumes that rates are highest among Protestants, but what if the Catholic minority within majority-Protestant communities had the highest suicide rates because of their social isolation?
Also, information on confounders is usually not available.
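The Durkheim scenario can be made concrete with invented numbers (not Durkheim's data). In the two hypothetical communities below, the more Protestant community has the higher overall suicide rate, yet at the individual level Catholics have the higher rate.

```python
# Hypothetical communities: religion -> (population, suicides)
communities = {
    "A (90% Protestant)": {"Protestant": (9_000, 9), "Catholic": (1_000, 5)},
    "B (10% Protestant)": {"Protestant": (1_000, 1), "Catholic": (9_000, 9)},
}


def per_10k(deaths: int, population: int) -> float:
    return 10_000 * deaths / population


def community_rate(groups) -> float:
    """Ecologic (aggregate) suicide rate for one community."""
    population = sum(pop for pop, _ in groups.values())
    deaths = sum(d for _, d in groups.values())
    return per_10k(deaths, population)


def individual_level_rate(religion: str) -> float:
    """Rate among all individuals of one religion, pooled across communities."""
    population = sum(c[religion][0] for c in communities.values())
    deaths = sum(c[religion][1] for c in communities.values())
    return per_10k(deaths, population)


for name, groups in communities.items():
    print(f"Community {name}: {community_rate(groups):.0f} per 10,000")
for religion in ("Protestant", "Catholic"):
    print(f"All {religion}s: {individual_level_rate(religion):.0f} per 10,000")
```

Community A has 14 and community B 10 suicides per 10,000, so the aggregate data suggest Protestantism raises risk; but pooled over individuals, Protestants have 10 and Catholics 14 per 10,000, driven by the isolated Catholic minority in community A.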
Epidemiology example
Fat intake and breast cancer rates, with countries as the unit of measurement, have consistently been found to be highly correlated. But studies of individuals (cohort and case-control studies) have not found an association with fat intake. Why? Possible reasons:
* Countries with high fat intake are more likely to have other risk factors for breast cancer (e.g., late age at first pregnancy).
* Within-population variability is low, but between-population variability is high. In an extreme example, if everyone in a country had high fat intake, we could not detect any excess risk, because there would be no one with low fat intake to compare them to.
Ecologic studies are useful for generating hypotheses, supporting hypotheses, or intervening at the population level. Examples:
* Rates of stomach cancer declined dramatically after the advent of refrigeration in the 1930s. This supports studies showing that the risk of stomach cancer increases with consumption of nitrates in preserved foods (sausage, lunch meat, etc.).
* Smoking and lung cancer
* Oral cancer and snuff use in the southern US