Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems




Josh Denny, MD, MS
josh.denny@vanderbilt.edu
Vanderbilt University, Nashville, Tennessee, USA
2/12/2015

EHR data are dense
196,693 individuals in an EHR DNA biobank (BioVU)
Mean follow-up: 5.7 yrs
Distinct ICD9 codes: 19 million
Labs: 121 million
Distinct labs: 5,948
Avg labs/patient: 662
Drugs: 122 million
Notes: 26 million (average 132 notes/individual)
Radiology tests: 2 million

Approach to EHR phenotyping
1. Identify phenotype of interest
2. Case & control algorithm development and refinement
3. Manual review; assess precision
   PPV < 95%: return to step 2
   PPV ≥ 95%: continue to step 4
4. Deploy at site 1
5. Validate at other sites
6. Genetic association tests (using extant genotypes); replicate
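The PPV gate in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function names, the 50-chart review sample, and its results are hypothetical. Only the 95% threshold comes from the slide.

```python
def positive_predictive_value(review_labels):
    """PPV = confirmed cases / all algorithm-flagged charts that were reviewed."""
    if not review_labels:
        raise ValueError("need at least one reviewed chart")
    return sum(review_labels) / len(review_labels)


def next_step(review_labels, threshold=0.95):
    """Return the workflow step implied by the manual-review result."""
    ppv = positive_predictive_value(review_labels)
    return "deploy" if ppv >= threshold else "refine"


# Hypothetical review: 50 charts flagged by the algorithm, 48 confirmed by a reviewer.
reviewed = [True] * 48 + [False] * 2
ppv = positive_predictive_value(reviewed)
decision = next_step(reviewed)
print(f"PPV = {ppv:.2f}, next step: {decision}")
```

A result below the threshold sends the algorithm back to refinement, and the review is repeated on a fresh sample of flagged charts.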

What we've learned: finding phenotypes in the EMR
True cases are found by combining four data sources:
Billing codes: ICD9 & CPT
Clinical notes: NLP (natural language processing)
Medications: e-prescribing & NLP
Labs & test results: NLP

Finding cases: Rheumatoid Arthritis
Definite cases (algorithm-defined)
Possible cases (require manual review)
Excluded (algorithm-defined)
Controls (algorithm-defined)
Counts shown on the slide: 255, 507, 7121, 1184
Optional manual review, then analysis
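The four-way split above can be illustrated with a small rule-based classifier. The slide does not give the actual rheumatoid arthritis criteria, so every rule here is an assumption chosen only to show the shape of such an algorithm: the inputs (`n_ra_codes`, `on_ra_medication`, `has_exclusion_code`) and the thresholds are hypothetical.

```python
from collections import Counter


def classify_patient(n_ra_codes, on_ra_medication, has_exclusion_code):
    """Assign one of the four tiers using illustrative (assumed) rules."""
    if has_exclusion_code:
        return "excluded"   # algorithm-defined exclusion
    if n_ra_codes == 0 and not on_ra_medication:
        return "control"    # no evidence of disease
    if n_ra_codes >= 2 and on_ra_medication:
        return "definite"   # strong, concordant evidence
    return "possible"       # ambiguous evidence: route to manual review


# Hypothetical patients: (RA billing codes, on RA medication, exclusion code present)
patients = [(3, True, False), (1, False, False), (0, False, False), (4, True, True)]
tiers = [classify_patient(*p) for p in patients]
tier_counts = Counter(tiers)
print(tier_counts)
```

Only the "possible" tier needs chart review, which keeps the manual effort limited to the ambiguous patients.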

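Replicating known studies in the EHR
Disease               Marker      Gene / region
Atrial fibrillation   rs2200733   Chr. 4q25
Atrial fibrillation   rs10033464  Chr. 4q25
Crohn's disease       rs11805303  IL23R
Crohn's disease       rs17234657  Chr. 5
Crohn's disease       rs1000113   Chr. 5
Crohn's disease       rs17221417  NOD2
Crohn's disease       rs2542151   PTPN22
Multiple sclerosis    rs3135388   DRB1*1501
Multiple sclerosis    rs2104286   IL2RA
Multiple sclerosis    rs6897932   IL7RA
Rheumatoid arthritis  rs6457617   Chr. 6
Rheumatoid arthritis  rs6679677   RSBN1
Rheumatoid arthritis  rs2476601   PTPN22
Type 2 diabetes       rs4506565   TCF7L2
Type 2 diabetes       rs12255372  TCF7L2
Type 2 diabetes       rs12243326  TCF7L2
Type 2 diabetes       rs10811661  CDKN2B
Type 2 diabetes       rs8050136   FTO
Type 2 diabetes       rs5219      KCNJ11
Type 2 diabetes       rs5215      KCNJ11
Type 2 diabetes       rs4402960   IGF2BP2
[Figure: forest plot comparing published and observed odds ratios for each marker; x-axis: odds ratio, ticks at 0.5, 1.0, 2.0, 5.0]
Am J Hum Genet. 2010;86:560-72.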
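The comparison in this figure rests on odds ratios. As a worked sketch, an odds ratio and its 95% confidence interval (Woolf's log method) can be computed from a 2x2 table of allele carriers among cases and controls. The counts and the "published" value below are hypothetical and are not taken from the cited paper; the source does not say how its intervals were computed.

```python
import math


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval."""
    cells = (case_exp, case_unexp, ctrl_exp, ctrl_unexp)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log = math.sqrt(sum(1 / c for c in cells))  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical EHR-derived counts: carriers / non-carriers among cases and controls.
observed_or, ci_low, ci_high = odds_ratio_ci(60, 140, 100, 400)
published_or = 1.7  # hypothetical literature value
consistent = ci_low <= published_or <= ci_high
print(f"observed OR = {observed_or:.2f} ({ci_low:.2f} to {ci_high:.2f})")
```

Checking whether the published estimate falls inside the observed interval is one simple way to read a plot like this. It is not a formal replication test.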

Discovery science in eMERGE
Algorithms can be deployed across multiple EMRs
Analyses can be performed using extant data
Am J Hum Genet. 2011;89:529-42

Completed eMERGE GWAS
(bold = GWAS completed with significant results)
Diseases: Dementia, Cataracts, Autoimmune Hypothyroidism, Diverticulosis/diverticulitis, Type 2 Diabetes, Diabetic retinopathy, Herpes zoster, PheWAS, Peripheral Arterial Disease, Venous Thromboembolism, Glaucoma, Ocular hypertension, Abdominal Aortic Aneurysm, Colon polyps
Endophenotypes: PR Duration, QRS Duration, HDL/LDL, height, white blood cell counts, red blood cell counts, Cardiorespiratory Fitness, ESR levels, Platelet levels
Pharmacogenomic phenotypes: ACE inhibitor cough, Heparin-induced thrombocytopenia, Resistant hypertension, Drug-Induced Liver Injury, C. difficile colitis
Selected consortia contributions: Height, QTc, Rheum. Arthritis, Myocardial Infarction Genetics Consortium, Intl. Mult Sclerosis Genet. Consort., Genomic Investigation of Statin Therapy

85 phenotypes from eMERGE, PGRN, PCORnet
47 have validation data
118 total implementations

Hypothyroidism algorithm

Performance of 88 Phenotype Algorithms in PheKB
[Figure: positive predictive value (0% to 100%) of site implementations at primary and secondary sites, with the median marked; drug-induced liver injury is labeled on the plot]

The genome-wide association study
Target: phenotype
Plot: association P value vs. chromosomal location
The phenome-wide association study
Target: genotype
Plot: association P value vs. diagnosis code
Example: new PheWAS associations for IRF4 (known: hair, skin, eye color)
PheWAS requirement: a large cohort of patients with genotype data and many diagnoses
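A PheWAS inverts the GWAS loop: one genotype is held fixed and every diagnosis code is tested against it. The sketch below shows that loop with a plain 2x2 chi-square test (1 degree of freedom) and a Bonferroni threshold. It is a simplified illustration under stated assumptions: published analyses typically use regression with covariates, and the patient IDs, code names, and toy cohort are hypothetical.

```python
import math


def chi_square_p(a, b, c, d):
    """P value of a 1-df Pearson chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df


def phewas(carriers, diagnoses_by_patient, alpha=0.05):
    """Test one genotype against every diagnosis code; Bonferroni-correct."""
    codes = sorted(set().union(*diagnoses_by_patient.values()))
    if not codes:
        return []
    threshold = alpha / len(codes)
    results = []
    for code in codes:
        a = b = c = d = 0
        for pid, dx in diagnoses_by_patient.items():
            carrier, has_code = pid in carriers, code in dx
            if carrier and has_code:
                a += 1
            elif carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        p = chi_square_p(a, b, c, d)
        results.append((code, p, p < threshold))
    return sorted(results, key=lambda r: r[1])


# Toy cohort of 40 patients; patients 0-19 carry the variant (hypothetical data).
carriers = set(range(20))
cohort = {pid: set() for pid in range(40)}
for pid in list(range(18)) + [20, 21]:
    cohort[pid].add("code_A")  # enriched in carriers
for pid in range(0, 40, 2):
    cohort[pid].add("code_B")  # evenly spread across both groups
results = phewas(carriers, cohort)
for code, p, significant in results:
    print(code, f"{p:.2e}", significant)
```

The Bonferroni divisor grows with the number of codes tested, which is one reason the slide lists a large cohort with many diagnoses as the requirement.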

Studying drug responses with GWAS
Only about 120,000 samples at time of study; underpowered for many rare outcomes
90% participated in >1 study
Bowton et al., Sci Transl Med. 2014
Phenotype                                                Cases   Controls
Clopidogrel in CV disease                                225     468
Warfarin stable dose                                     1,167   N/A
Early Repolarization                                     544     2,609
Vancomycin stable dose                                   1,067   N/A
C. difficile colitis                                     941     1,710
Anthracycline cardiomyopathy                             528     N/A
Guillain-Barre Syndrome                                  97      6,536
Heart Transplant                                         181     N/A
Kidney transplant                                        1,078   N/A
Clopidogrel in strokes/TIAs                              6       123
Statin-related myopathy                                  11      4,342
Heparin-induced thrombocytopenia                         73      2,300
CV events with COX2 therapy                              85      395
Serious bleeding during warfarin                         259     276
Amiodarone toxicity (lung, thyroid)                      97      343
Chronic inflammatory polyneuropathy                      12      14,000*
Rheumatic Heart Disease                                  108     3,464
ACEi cough                                               1,174   978
Fluoroquinolones and tendinopathy                        87      537
Warfarin stable dose in children                         92      N/A
Metformin efficacy                                       80      N/A
Metformin and cancer                                     619     421
Bisphosphonates and Atypical Fracture/Jaw Osteonecrosis  16      1,454
Wolff-Parkinson-White                                    197     5,551
Steroid-induced Osteonecrosis                            83      352
Shellfish Anaphylaxis                                    157     14,000*
Aspirin Anaphylaxis                                      101     4,334
Bell's Palsy #                                           577     14,000*

Strengths
Rich, longitudinal data stores
Ability to go back to the chart to find out more
Research-quality phenotypes available via algorithms
Potential for closed-loop discovery and implementation
Expensive testing available for free
Ability to explore rare, detailed, drug response, and mortal phenotypes
Samples easily reused for many studies

Challenges
Developing algorithms takes time and people, and implementation then requires local expertise
EHR data can be inaccurate, heterogeneous, unavailable, or poorly organized, and can be stored in different structures
Fragmentation between healthcare systems
Mining EHR data is not trivial (though improving): text data, duration, and temporality

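How do you share genetic data?
Option 1: every site shares directly with every other site (Sites 1 to 5, fully connected)
Edges (unique data use agreements, DUAs): n(n-1)/2 = 10 for n = 5
Option 2: every site shares through a Coordinating Center (Sites 1 to 5, each linked to the center)
Edges: n = 5
10 sites = 45 vs. 10
20 sites = 190 vs. 20
30 sites = 435 vs. 30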
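The edge counts on this slide follow directly from the two formulas, and a short check makes the scaling explicit. The function names below are illustrative, not from the source.

```python
def pairwise_agreements(n_sites):
    """Unique DUAs when every site contracts with every other site: n(n-1)/2."""
    return n_sites * (n_sites - 1) // 2


def hub_agreements(n_sites):
    """DUAs when every site contracts only with a coordinating center: n."""
    return n_sites


for n in (5, 10, 20, 30):
    print(f"{n} sites: {pairwise_agreements(n)} pairwise vs. {hub_agreements(n)} via hub")
```

The pairwise count grows quadratically and the hub count grows linearly, so the gap between the two widens quickly as sites are added.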

Network                  DNA samples  GWAS
eMERGE                   361k         51k (100k)
Million Veteran Program  350k         200k
Kaiser Permanente        300k         100k
Total                    >1 million   >351k
[Map legend: pediatric sites; Coordinating Center]