Research Opportunities using the PaTH Network
DBMI Colloquium
Chuck Borromeo
Oct. 30, 2015
PaTH is funded through the Patient-Centered Outcomes Research Institute (PCORI)
PaTH Network Goals
- Build a network connecting EMR data from 6 sites
- Develop computable phenotypes
- Improve clinical outcomes through patient-centered research
- Extend an investigator's analytic power
Analytic Power
Analytic power: the amount of data one can access* to answer a research question.
* Access means research-ready data at minimal additional cost.
Analytic Power
Katie is a researcher. She has a new research grant to analyze EMR data. She does not have the expertise or contacts to collect the necessary EMR data at her local institution.
EMR data negatively affect budgets: you must spend money on data acquisition, data storage, data harmonization, etc.
Can you analyze a large dataset without sacrificing your grant budget?
Analytic Power
[Figure: Katie and the PaTH sites: Utah, Geisinger, UPMC, PSU, Temple, JHU]
She finds the EMR data at her institution. Analytic Power = institutional (max n=2m)
PaTH allows her to compare her data with similar data found at 5 other institutions. Analytic Power = network (max n=11m)
Analytic Power
PaTH is part of a larger ecosystem called PCORI. PCORI is a nationwide research organization. PCORI allows Katie to conduct her research using data from across the US.
Analytic Power = national (max n=90m)
Achievements
- Develop a new research network: PaTH
- Create regional research network (JHU, Temple, UPMC, PSU, Utah*, Geisinger*)
- Connect regional network to national network (PCORnet)
- Establish a data exchange methodology through Common Data Elements
- Computable phenotypes to assist with patient accrual
- Deployed Patient Reported Outcome (PRO) surveys using Epic and REDCap
* Phase II sites
Aggressive Timeline
Start: Mar 2014. End: Sep 2015. Phase 1: 18 months.
Research Areas
- Idiopathic pulmonary fibrosis (IPF)
- Atrial Fibrillation
- Weight
Cross Institutional Research
- Deploy a regional network
- Supply data to a national network
- Deploy PROs
Transform EMR data
- Meaningful Use 2
- Common Data Elements
Current Timeline
Start: Sept 2015. End: Sept 2018. Phase 2: 36 months.
Answer Research Questions
- Answer queries from PCORI
- Maintain data for use by national network
Develop Self-sufficient Network
- Recruit new studies
- Create funding model
Data Quality
- Improve data quality
Tale of Two Networks
Regional Research Network
CDRN #11: Geisinger, JHU, PSU, Temple, UPMC, Utah
- Facilitate grant applications
- Govern access to PaTH Network
- Network infrastructure: i2b2/SHRINE
Nationwide Research Networks
[Figure: the CDRNs of the national network, numbered up to #13]
- Conduct large-scale studies
- Fund studies and sustainability
- Network infrastructure: PopMedNet
Computable Phenotypes
Computable phenotypes allow researchers to target patients for research projects.
Partial example:
Nichols GA, Desai J, Elston Lafata J, et al. Construction of a Multisite DataLink Using Electronic Health Records for the Identification, Surveillance, Prevention, and Management of Diabetes Mellitus: The SUPREME-DM Project. Preventing Chronic Disease. 2012;9:E110. doi:10.5888/pcd9.110311.
Problems = Research Opportunities
- Computable Phenotypes
- Data Loss
- Data Interpretation
- Data Quality
- Organizational Challenges
- Mapping Issues
Issues with Computable Phenotypes
Clinician: I want all the diabetes patients. Here is a computable phenotype with 3 criteria: diagnoses, medications, and lab results.
Data Analyst: I extracted the data. There are between 9,441 and 49,613 patients.
Clinician: What do you mean by "between"?
Richesson RL, Rusincovitch SA, Wixted D, et al. A comparison of phenotype definitions for diabetes mellitus. Journal of the American Medical Informatics Association: JAMIA. 2013;20(e2):e319-e326. doi:10.1136/amiajnl-2013-001952.
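One way a range like the analyst's can arise is that the same three criteria can be combined strictly (all must hold) or loosely (any one suffices). The sketch below illustrates that idea on made-up patients. The record layout, the function names, and the "minimum number of criteria" rule are assumptions for illustration, not the definitions compared in the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientEvidence:
    """Hypothetical per-patient flags, one per phenotype criterion."""
    patient_id: str
    has_diagnosis: bool
    has_medication: bool
    has_abnormal_lab: bool


def criteria_met(p: PatientEvidence) -> int:
    # Count how many of the three criteria this patient satisfies.
    return sum([p.has_diagnosis, p.has_medication, p.has_abnormal_lab])


def cohort(patients, min_criteria: int) -> set:
    # Keep patients who satisfy at least `min_criteria` of the criteria.
    return {p.patient_id for p in patients if criteria_met(p) >= min_criteria}


# Made-up example data (not real patients).
patients = [
    PatientEvidence("A", True, True, True),
    PatientEvidence("B", True, False, False),
    PatientEvidence("C", False, True, False),
    PatientEvidence("D", False, False, False),
    PatientEvidence("E", True, False, True),
]

strict = cohort(patients, min_criteria=3)  # all criteria required
loose = cohort(patients, min_criteria=1)   # any single criterion suffices
print(len(strict), "to", len(loose), "patients")
```

The same extract yields a small cohort under the strict reading and a much larger one under the loose reading, so the clinician has to say which combination rule is intended before a single count can be reported.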
Issues with Computable Phenotypes
- Clinicians do not understand data representation limitations in EMRs
- Clinicians do not understand the data quality issues in the EMR
- Different contexts: retrospective (deceased/alive) vs. prospective (alive only)
Ethical Issues
- False positives send clinical trial notices to people who do not have a disease
- False negatives fail to recruit everyone who has a disease
Data Loss
Visit Date: 4/5/2006
HISTORY OF PRESENT ILLNESS: Mr. Smith is a 66-year-old gentleman with hypertension [ICD9:401.9] and hypercholesterolemia [ICD9:272.0]. He reports he is overall doing well. He is not having any trouble with his medications. His blood pressures [LOINC:35094-2] generally have been running around 140/90 at home. He has been having low back pain [ICD9:724.2] for the past 6 months. It is more on the right side and occurs intermittently throughout the day. No known initial precipitating event. He denies any radiation down his legs, any fevers, night sweats, weight loss, or bowel or bladder problems. Of note, he did have significant radiation [CPT:77401] in this area many years ago for treatment for skin cancer [ICD9:173.51]. He is not getting much exercise.
Source: http://www.med.unc.edu/medselect/resources/sample notes/sample chronic issues note 2
Data Loss: Not Bidirectional
Clinical Narrative
Visit Date: 4/5/2006
HISTORY OF PRESENT ILLNESS: Mr. Smith is a 66-year-old gentleman with hypertension and hypercholesterolemia. He reports he is overall doing well. He is not having any trouble with his medications. His blood pressures generally have been running around 140/90 at home. He has been having low back pain for the past 6 months. It is more on the right side and occurs intermittently throughout the day. No known initial precipitating event. He denies any radiation down his legs, any fevers, night sweats, weight loss, or bowel or bladder problems. Of note, he did have significant radiation in this area many years ago for treatment for skin cancer. He is not getting much exercise.
Delta
EMR Representation
ICD9:401.9, ICD9:272.0, LOINC:35094-2, ICD9:724.2, CPT:77401, ICD9:173.51
Source: http://www.med.unc.edu/medselect/resources/sample notes/sample chronic issues note 2
Data Interpretation
Is ICD9 410.9 = ICD9 410.9? (ICD9 410.9 is Acute Myocardial Infarction.) Context matters.
Found within Patient 123's EMR: ICD9 410.9 on the Problem List
- Scratchpad for clinician
- Suspected diagnoses
- Not always cleaned up
Found within Patient 456's EMR: ICD9 410.9 as a Billing Code
- Payment to the hospital
- Upcoding
- Unbundling
Data Quality
- PaTH operates at large scale
- Difficult to manage data quality without a specific disease question
- Only apply generalizable rules independent of a specific disease (ex: blood pressure between 0 and 300)
- Phase II budget includes chart reviews
- Overall problem: clinicians are not required to correct erroneous EMR data
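A disease-independent rule such as the blood pressure example can be sketched as a simple range check. The function name, the inclusive bounds, and the sample readings below are assumptions for illustration. The slide does not say whether 0 and 300 themselves count as valid.

```python
def out_of_range(readings, low=0, high=300):
    """Return (index, value) pairs for readings outside [low, high].

    Bounds are treated as inclusive here; that choice is an assumption.
    """
    return [(i, v) for i, v in enumerate(readings) if not (low <= v <= high)]


# Made-up systolic readings, two of them implausible.
systolic = [120, 140, -5, 310, 90, 300]
flagged = out_of_range(systolic)
print(flagged)
```

A check like this only flags values for review. Deciding whether a flagged value is a typo, a unit error, or a real measurement still needs a chart review.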
Michael Kahn's Data Quality Framework
1. Attribute domain constraints: Data value anomalies for individual variables, including distributions, units, and missingness. These checks identify values and distributions inconsistent with expectations (e.g., a high proportion of individuals over 120 y old).
2. Relational integrity rules: Compare elements from one data table to related elements in another data table (e.g., every person identifier in the visit table must have a record in the demographic table).
3. Historical data rules: Temporal relationships and trend visualizations to identify data gaps, unusual patterns, and dependencies across multiple data values and variables (e.g., utilization trends can identify shifts in data capture).
Slide courtesy of Shyam Visweswaran
Michael Kahn's Data Quality Framework
4. State-dependent objects rules: Extends temporal data assessment to include logical consistency (e.g., a series of prenatal ultrasounds should precede a pregnancy outcome).
5. Attribute dependency rules: Examine conditional dependencies based on knowledge of a clinical scenario (e.g., women should not have a diagnosis of prostate cancer).
Slide courtesy of Shyam Visweswaran
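Three of the five rule types translate directly into small checks. The sketch below covers rules 1, 2, and 5 on made-up tables. The table layout, field names, and function names are hypothetical, and the age check flags individual records rather than the proportion the framework describes. ICD9 185 (malignant neoplasm of prostate) stands in for "a diagnosis of prostate cancer".

```python
# Made-up demographic and visit tables (hypothetical layout).
demographics = {
    "p1": {"age": 66, "sex": "M"},
    "p2": {"age": 134, "sex": "F"},
    "p3": {"age": 51, "sex": "F"},
}
visits = [
    {"person_id": "p1", "dx": "401.9"},
    {"person_id": "p3", "dx": "185"},
    {"person_id": "p9", "dx": "272.0"},
]


def implausible_ages(demo, max_age=120):
    # Rule 1 (attribute domain): ages beyond a plausible maximum.
    return sorted(pid for pid, row in demo.items() if row["age"] > max_age)


def orphan_visits(visit_rows, demo):
    # Rule 2 (relational integrity): visits whose person has no demographic row.
    return [v for v in visit_rows if v["person_id"] not in demo]


def sex_dx_conflicts(visit_rows, demo, male_only_dx=frozenset({"185"})):
    # Rule 5 (attribute dependency): female patients with a male-only diagnosis.
    conflicts = []
    for v in visit_rows:
        person = demo.get(v["person_id"])
        if person and person["sex"] == "F" and v["dx"] in male_only_dx:
            conflicts.append(v["person_id"])
    return conflicts


print(implausible_ages(demographics))
print([v["person_id"] for v in orphan_visits(visits, demographics)])
print(sex_dx_conflicts(visits, demographics))
```

Rules 3 and 4 depend on event ordering over time, so they need dated records and are left out of this sketch.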
Mapping Issues
- Granularity differences
- Incomplete coverage
- Different authors: clinicians, informaticians
- Different purposes: billing vs. research
- Temporal issues (ex: NDC recycles codes over time)
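The temporal issue implies that a code alone is not enough to identify a concept: the lookup also needs the date the code was recorded. A minimal sketch of a date-aware mapping table follows. The code value, the drug names, and the validity dates are invented placeholders, not real NDC assignments.

```python
from datetime import date

# Hypothetical mapping rows: (code, valid_from, valid_to or None, concept).
MAPPINGS = [
    ("00000-0000-01", date(2001, 1, 1), date(2008, 12, 31), "drug A"),
    ("00000-0000-01", date(2010, 1, 1), None, "drug B"),
]


def resolve(code, recorded_on, mappings=MAPPINGS):
    """Return the concept the code meant on `recorded_on`, or None if unmapped."""
    for mapped_code, start, end, concept in mappings:
        in_window = start <= recorded_on and (end is None or recorded_on <= end)
        if mapped_code == code and in_window:
            return concept
    return None


print(resolve("00000-0000-01", date(2005, 6, 1)))
print(resolve("00000-0000-01", date(2012, 6, 1)))
print(resolve("00000-0000-01", date(2009, 6, 1)))
```

Returning None for the gap between the two validity windows keeps an unmapped record visible instead of silently assigning it to the wrong drug.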
Organizational Challenges
- Competing priorities: at each site, PCORI priorities
- Develop artifacts (documents/software)
- Arrive at decisions
Current Large Scale PCORI Research Projects
- Aspirin Dosing: A Patient-Centric Trial Assessing Benefits and Long-term Effectiveness (ADAPTABLE)*: Duke
- PCORnet Bariatric Study*: Group Health
- PCORnet Obesity Observational Study: Short- and Long-term Effects of Antibiotics on Childhood Growth: Harvard Pilgrim
* PaTH is participating in these studies
Summary
Goals of Research Networks:
- Give researchers access to the data they need in a cost-effective manner (analytic power)
- Recruit patients for clinical trials
Challenges of Research Networks:
- Relying on data of questionable quality
- Converting data into usable formats
Team
Research Areas
Categories: Clinical, Technical
- Clinical Research Questions
- Patient Reported Outcomes
- NLP
- Data Quality
- Terminology Mapping
- Clinical Decision Support*
- Collaborative Science*
- Data Visualization*
- Software
* Not directly researched in PaTH