Research Skills for Non-Researchers: Using Electronic Health Data and Other Existing Data Resources
James Floyd, MD, MS
Sep 17, 2015
UW Hospital Medicine Faculty Development Program

Objectives
Become more familiar with how to:
1) Transform an area of clinical interest into a research question
2) Leverage existing data (including electronic health data) to answer a research question
3) Access appropriate resources to complete a project using existing data

My Background
General internist
T32 in cardiovascular epidemiology
Research interests: drug safety, cardiovascular epidemiology, pharmacogenomics and other omics
Research settings:
  NIH-funded cohort studies: Cardiovascular Health Study
  Large-scale consortia: Cohorts for Heart and Aging Research in Genetic Epidemiology (CHARGE)
  Electronic health data: Group Health, VHA

What do you need for a successful research project?
Mentorship/guidance
Understanding of what is already known (literature review)
Clearly defined question
Access to data
Analytic support: more than just statistics
  Study design (e.g., cohort, case-control, cross-sectional)
  Appropriate controls
  Working with data

Types of Existing Data Resources
Results of published studies
Publicly available datasets
  Drugs: FDA AERS
  Claims: National Inpatient Sample/CHARS, Medicare, Medicaid
  Surveys (CDC): NHANES, BRFSS, NHCS, NHAMCS
Existing data from established studies: RCTs, NIH-funded prospective cohort studies, registries
  Publicly available datasets vs working with study investigators
Electronic health data (administrative data)

Types of Research Using Existing Data
Case reports
Systematic reviews
Replication of other findings
Original clinical or epidemiologic studies

Why Work with Electronic Health Data?
RCTs and prospective observational studies may be unethical or impracticable
Ability to study large populations quickly, with lower costs

Electronic Health Data: Limitations
Study population: well-defined?
Completeness of data
  Exposure: medications purchased out of pocket?
  Confounders: laboratory and diagnostic tests available?
  Outcomes: closed health system?
Accuracy of data
  Validation studies sometimes needed
Suitable for the study question?
  Descriptive (trends, utilization) vs analytic (causal inference)

Example 1: Group Health
Use of administrative data to estimate the incidence of statin-related rhabdomyolysis. JAMA 2012;307:1580-2. PMID 22511681
Difficult to identify rare ADRs, ICD-9 code for rhabdomyolysis in 2006
SEARCH trial: risk of muscle injury from simvastatin 80 mg/day
Aims: (1) evaluate accuracy of ICD-9 code for statin-related rhabdo and (2) estimate incidence from various statins and doses
Study Design: existing data and new data collection
  Statin users with ICD-9 codes for rhabdo, CK level, rhabdo in text of EHR
  Validated cases with review of EHR: CK levels > 10x ULN, no other causes
  Pharmacy records for population exposure to each drug and dose
Results: PPV 8%, 12-fold risk for simvastatin 80 mg vs 20 mg

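The two quantities on this slide, the positive predictive value (PPV) of a diagnosis code and a comparison of incidence between doses, are simple ratios. The sketch below shows the arithmetic only. All counts and person-years are made-up illustrative values, not the study's data, and the function names are hypothetical.

```python
def positive_predictive_value(confirmed_cases: int, flagged_cases: int) -> float:
    """Share of code-flagged cases that were confirmed on chart review."""
    if flagged_cases <= 0:
        raise ValueError("flagged_cases must be positive")
    return confirmed_cases / flagged_cases


def incidence_rate(cases: int, person_years: float, per: float = 10_000) -> float:
    """Validated cases per `per` person-years of drug exposure."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * per


def rate_ratio(cases_a: int, py_a: float, cases_b: int, py_b: float) -> float:
    """Incidence rate in group A divided by incidence rate in group B."""
    return incidence_rate(cases_a, py_a) / incidence_rate(cases_b, py_b)


# Illustrative (hypothetical) inputs: 250 encounters flagged by the
# diagnosis code, of which 20 were confirmed on chart review.
ppv = positive_predictive_value(confirmed_cases=20, flagged_cases=250)
print(f"PPV: {ppv:.1%}")

# Illustrative (hypothetical) exposure: high-dose vs low-dose users.
ratio = rate_ratio(cases_a=6, py_a=20_000, cases_b=4, py_b=80_000)
print(f"Rate ratio, high vs low dose: {ratio:.1f}")
```

A low PPV means most code-flagged encounters are not true cases, which is why the incidence estimate should be built from validated cases rather than raw code counts.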
Example 1: Group Health
IRB approval
Work with programmer
  Estimate person-time for each statin and dose from all GH statin prescriptions
  Identify encounters with ICD-9 codes for rhabdo and muscle injury
  Identify CK levels > 1000 from lab database
Develop case definition/algorithm for validation
Train abstractor
Natural language processing to search EHR text for missed cases
Statistical analysis

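Two of the programming steps above, estimating person-time per drug and dose and screening lab results for CK > 1000, can be sketched as follows. This is a simplified illustration, not the study's actual code: the record layout, field names, and function names are hypothetical, person-time is approximated by summing days supplied (overlapping fills and gaps are ignored), and CK units are assumed to be whatever the lab database reports.

```python
from collections import defaultdict
from dataclasses import dataclass

DAYS_PER_YEAR = 365.25


@dataclass(frozen=True)
class Fill:
    """One pharmacy fill (hypothetical record layout)."""
    patient_id: str
    drug: str
    dose_mg: int
    days_supply: int


def person_years_by_drug_dose(fills):
    """Sum days supplied per (drug, dose) and convert to person-years.

    Simplifying assumption: each day supplied counts as one day of
    exposure; overlapping fills and gaps between fills are ignored.
    """
    totals = defaultdict(float)
    for fill in fills:
        totals[(fill.drug, fill.dose_mg)] += fill.days_supply / DAYS_PER_YEAR
    return dict(totals)


def flag_high_ck(lab_results, threshold=1000.0):
    """Return lab records whose CK value exceeds the screening threshold."""
    return [record for record in lab_results if record["ck"] > threshold]


# Hypothetical example records.
example_fills = [
    Fill("p1", "simvastatin", 80, 90),
    Fill("p2", "simvastatin", 80, 90),
    Fill("p3", "simvastatin", 20, 30),
]
example_labs = [
    {"patient_id": "p1", "ck": 1500.0},
    {"patient_id": "p2", "ck": 200.0},
]

print(person_years_by_drug_dose(example_fills))
print(flag_high_ck(example_labs))
```

Records flagged this way are only candidates; on this workflow they would still go to the abstractor for validation against the case definition.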
Example 1: Group Health
Advantages
  Stable enrollment
  Outpatient lab tests
  Pharmacy database: >95% receive meds from GH pharmacies
  Availability of EHR for efficient validation of cases
Disadvantages
  Hospitalization records not always available (outside hospitals)
  Costs

Example 2: Group Health
Case-control study of second-line therapies for type 2 diabetes in combination with metformin and the comparative risks of myocardial infarction and stroke. Diabetes Obes Metab. Published online July 14, 2015. PMID 26179389
For type 2 DM, CV risks from 2nd-line drugs after metformin unknown
GRADE trial will not be complete until 2020, underpowered
Aim: Evaluate MI and stroke risks for sulfonylureas (SU) vs insulin (INS) among metformin (MET) users
Study Design: existing data from previous study (HVH)
  Ongoing population-based case-control studies of MI and stroke since 1980s
  Information on risk factors, medication use from databases and chart reviews
  Meta-analysis with another observational study to improve precision
Results: RR 0.92 (95% CI 0.69-1.24), no large difference in risk

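To show why pooling with a second study improves precision, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis on the log relative-risk scale. Several assumptions apply: the slide does not say which pooling method the authors used or whether the reported RR is a single-study or pooled estimate, so the reported interval serves only as one input for the arithmetic, and the second study's numbers are invented for illustration. Function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a 95% interval


def log_rr_and_se(rr: float, ci_low: float, ci_high: float, z: float = Z_95):
    """Recover log(RR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(rr), se


def pool_fixed_effect(estimates):
    """Inverse-variance weighted average of (log_rr, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def to_rr_ci(log_rr: float, se: float, z: float = Z_95):
    """Convert a log-scale estimate back to RR with a 95% CI."""
    return (math.exp(log_rr),
            math.exp(log_rr - z * se),
            math.exp(log_rr + z * se))


# Input 1: the estimate reported on the slide.
study_a = log_rr_and_se(0.92, 0.69, 1.24)
# Input 2: a HYPOTHETICAL second study, invented for illustration.
study_b = log_rr_and_se(1.05, 0.80, 1.38)

pooled_log_rr, pooled_se = pool_fixed_effect([study_a, study_b])
rr, low, high = to_rr_ci(pooled_log_rr, pooled_se)
print(f"Pooled RR {rr:.2f} (95% CI {low:.2f}-{high:.2f})")
print(f"SE: study A {study_a[1]:.3f}, study B {study_b[1]:.3f}, pooled {pooled_se:.3f}")
```

The pooled standard error is always smaller than either input's standard error, which is the sense in which adding a second study narrows the confidence interval.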
Example 2: Group Health
IRB approval: study already approved
Work with programmer
  Created new analytic drug variables for DM drugs from pharmacy data
  MI and stroke cases already validated
  Detailed information on confounding variables already collected
  Analytic dataset already available
Statistical analysis

Example 3: Cardiovascular Health Study
Variation in resting heart rate over 4 years and the risks of myocardial infarction and death among older adults. Heart. 2015;101:132-8. PMID 25214500
Resting heart rate (RHR) a prognostic marker for CVD and death
  JAMA 2011;306:2579-2587: 10-year increase in RHR associated with CV disease and death
  Lancet 2010;375:938-948: BP variation important for stroke risk
Study Design: existing study (CHS)
  Population-based prospective cohort study of risk factors for CV disease in the elderly
  5888 community-dwelling adults aged 65 and older recruited from four sites, 1989-1993
  Detailed information on risk factors, standardized measurements, diagnostic tests (ECG), biomarkers, blood samples at baseline and annual study visits
  Adjudication of cardiovascular events and death during follow-up (ongoing)

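With repeated annual measurements, within-person variation in RHR can be summarized in several ways. The sketch below computes three simple per-participant summaries (mean, standard deviation across visits, and first-to-last change). It is an illustration under assumptions: the slide does not state how the published analysis defined variation, the readings are invented, and the function name is hypothetical.

```python
from statistics import mean, stdev


def rhr_summary(readings_bpm):
    """Summarize one participant's resting heart rate across study visits.

    `readings_bpm` is a list of RHR values in beats per minute, ordered
    by visit. At least two readings are needed to compute a standard
    deviation.
    """
    if len(readings_bpm) < 2:
        raise ValueError("need at least two visits to measure variation")
    return {
        "mean_bpm": mean(readings_bpm),
        "sd_bpm": stdev(readings_bpm),  # sample SD across visits
        "change_bpm": readings_bpm[-1] - readings_bpm[0],
    }


# Hypothetical readings: baseline plus four annual visits.
summary = rhr_summary([60, 64, 62, 66, 68])
print(summary)
```

Separating the average level from the visit-to-visit spread matters because two participants with the same mean RHR can differ widely in how much their readings fluctuate.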
Example 3: Cardiovascular Health Study
Mar 2012: submitted proposal to CHS P&P Committee
Jun 2012: revised proposal approved
Aug 2012: analytic dataset received
Sep 2012 to Jan 2013: conducted analyses, feedback on results
Early to mid 2013: paper drafted, revised, approved by P&P
Late 2013: rejected by 3 journals
Mar 2014: rejected by 4th journal
  "Unfortunately, a substantively similar analysis performed by another group was recently published in the European Journal of Preventive Cardiology. Although the current analysis provides some additional depth, the main findings are essentially identical, and we are therefore unable to consider your manuscript for publication in the Journal."

Example 3: Cardiovascular Health Study
Advantages
  High-quality data collected for research purposes
  Long-term follow-up with adjudicated outcomes
  Opportunity to work with network of experienced investigators
  Availability of analytic datasets
Disadvantages
  If data are publicly available, others can conduct similar studies

Feel free to contact me with questions: jfloyd@uw.edu