Electronic Phenotyping and Genomic Research
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics
Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute
National Institutes of Health
U.S. Department of Health and Human Services
September 23, 2011
EMR Phenotyping and Genomic Research
- Why is NHGRI involved in EMRs?
- Electronic phenotyping efforts
  - eMERGE Network
  - Other EMR systems
- Potential relationship to pharmacologic research
- Privacy and confidentiality concerns with EMRs
NHGRI and EMR Why Now? Larson, G. The Complete Far Side. 2003.
Genomic Medicine: On the Threshold?
- Making genomics-based diagnostics routine
- Defining genetic components of disease
- Characterization of cancer genomes
- Practical systems for clinical genomic informatics
- Role of microbiome in health and disease
Green ED, Guyer MS. Nature 2011; 470:204-13.
Five Domains of Genomic Research
- Understanding the Structure of Genomes
- Understanding the Biology of Genomes
- Understanding the Biology of Disease
- Advancing the Science of Medicine
- Improving the Effectiveness of Healthcare?
Green ED, Guyer MS. Nature 2011; 470:204-13.
- Numerous biorepositories springing up in hospital systems across the country
- Unclear if samples, data, and consent are adequate for genomic research and data sharing
- No community-wide standards or best practices for conducting such research
- No genomic or genetic research using EMR-defined phenotypes: is this even possible?
- Develop approaches for potential large-scale US study using EMR-defined outcomes
[Figure: phenotypes arranged around the Coordinating Center. Labels: Dementia; HDL-C, Cataracts; Obesity Complications Renal Disease; Peripheral Arterial Disease; T2DM; Hypothyroidism; Resistant Hypertension; QRS Duration. Courtesy, R. Li]
EMRs and Repositories at 5 eMERGE I Sites

Institution | Primary Phenotype | Repository Size; Ethnicity | EMR Description | Phenotyping Methods
Group Health | Dementia | ~4000; >96% EA | Vendor-based EMR since 2004 | SDE, mining free text, manual review
Marshfield Clinic | Cataracts, HDL-C | ~20,000; 98% EA | Internal EMR since 1985 | SDE, NLP, ICR
Mayo Clinic | PAD | 3,500; >96% EA | Internal EMR since 1995 | SDE, NLP
Northwestern University | T2DM | 9,200; 12% AA 8% HA | Vendor-based EMR since 2000 | SDE, free-text searches
Vanderbilt University | QRS Duration | 100,000; 11% AA | Internal EMR since 2000 | SDE, NLP

ICD9 = Ninth International Classification of Diseases
ICR = Intelligent Character Recognition
NLP = Natural Language Processing
SDE = Structured data extraction = retrieving data that have been stored in a predefined format
Kho AN et al., Sci Transl Med 2011; 3:79re1.
AMIA Annu Symp Proc 2009; 497-501. Arthritis Care Res 2010; 62:1120-7. Nat Rev Genet 2010; 12:417-28. Clin Pharm Ther 2011; 89:379-86. Psychol Med 2011; 20:1-10.
Identification of Asthma Cases and Controls
Pacheco J et al., AMIA Annu Symp Proc 2009; 497-501.
[Flowchart]
- NUgene Population: N = 7,970
- Asthma Dx on > 1 visit: N = 521 (6.5%)
- > 1 visit with any Dx & Rx on diff. visit: N = 6,137
- > 2 different visits with any Dx: N = 251
- No Dx but any Rx on > 2 diff. visits: N = 469
- Asthma Rx on > 1 other visit: N = 452 (5.7%)
- Asthma Dx on > 1 other visit: N = 12 (0.2%)
- No Dx for any respiratory disease or specific cancers: N = 4,620 (53.5%)
- No other chronic lung disease Dx on > 2 visits: N = 389 (4.9%)
- No reported smoking hx > 10 pk-yrs: N = 338 (4.2%) (Asthma Cases)
- No prescribed Rx for asthma/COPD or immunosuppressant meds: N = 3,398 (42.6%)
- No reported smoking hx > 10 pk-yrs: N = 2,908 (36.5%) (Asthma Controls)
Clinical vs EMR Phenotyping
- Clinical standard: One data category
  - Diabetes by lab tests alone
  - PAD by single radiology test
- EMR standard: Multiple categories
  - Diagnostic information
  - Medication
  - Laboratory tests
  - Covariates and exclusion criteria
Kho AN et al., Sci Transl Med 2011; 3:79re1.
Data Categories for Clinical and EMR Phenotyping

Primary Phenotype | Clinical Gold Standard | EMR Derived Phenotype | Covariates and Exclusions
Dementia | Demographics, clinician exam, histopathology | Diagnoses, meds | Demographics, lab tests, radiology reports
Cataracts | Clinician exam notes | Diagnoses, prcd codes | Demographics, medications
PAD | Vascular test results (ABI, arteriography) | Diagnoses, prcd codes, meds, vascular results | Demographics
T2DM | Lab tests | Diagnoses, lab tests, meds | Demographics, lab tests, ht/wt, family hx, smoking hx
QRS Duration | ECG measurements | ECG report results | Demographics, diagnoses, prcd codes, meds, lab

Kho AN et al., Sci Transl Med 2011; 3:79re1.
Data Completeness by Type and Site
[Bar chart: percent (%) completeness of Diagnoses, Meds, Allergies, and Family Hx at GHC, MCRF, Mayo, NU, and VU]
Kho AN et al., Sci Transl Med 2011; 3:79re1.
eMERGE Electronic Phenotype Development Process
- Clinician versed in disease classification and informatics works with disease specialists to define clinically relevant features
- Clinician works with IT staff to translate features into extractable data elements, identifying:
  - Where elements are stored, supplementing if needed
  - How they are coded, particularly in standard ontologies
- Extract set of cases and review by clinician
- Refine algorithm, re-extract, re-review, repeat
- Transport code and pseudocode to other sites
Hypothyroidism: Initial Algorithm
Cases and controls:
- No thyroid-altering medications (e.g., Phenytoin, Lithium)
- 2+ non-acute visits in 3yrs
Cases (Case 1, Case 2):
- ICD-9s for Hypothyroidism
- Abnl TSH or FT4
- Thyroid replacement meds
- Antibodies to TTG or TPO
- No secondary causes (e.g., pregnancy, ablation)
Controls:
- No ICD-9s for Hypothyroidism
- No Abnl TSH or FT4
- No thyroid replacement meds
- No Antibodies for TTG/TPO
- No hx of myasthenia gravis
Courtesy, J Denny
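The criteria on this slide can be read as boolean rules over data already extracted from the EMR. The following is a minimal Python sketch under stated assumptions: the `PatientFlags` record and its field names are hypothetical, and the way the criteria are combined (ICD-9 code OR abnormal lab, AND replacement medication for cases; absence of every signal for controls) is one plausible reading of the flowchart, not the published eMERGE pseudocode. It does not attempt to separate Case 1 from Case 2, since the slide text does not say how they differ.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    """Hypothetical pre-extracted EMR flags for one patient (illustrative names)."""
    non_acute_visits_3yr: int
    on_thyroid_altering_meds: bool   # e.g., phenytoin, lithium
    has_hypothyroid_icd9: bool
    abnormal_tsh_or_ft4: bool
    on_thyroid_replacement: bool
    has_thyroid_antibodies: bool
    has_secondary_cause: bool        # e.g., pregnancy, ablation
    has_myasthenia_gravis: bool


def classify(p: PatientFlags) -> str:
    """Return 'case', 'control', or 'excluded' (assumed combination of criteria)."""
    # Entry criteria assumed to apply to both cases and controls.
    if p.non_acute_visits_3yr < 2 or p.on_thyroid_altering_meds:
        return "excluded"

    # Assumed case rule: diagnostic evidence plus treatment, no secondary cause.
    evidence = p.has_hypothyroid_icd9 or p.abnormal_tsh_or_ft4
    if evidence and p.on_thyroid_replacement and not p.has_secondary_cause:
        return "case"

    # Assumed control rule: no hypothyroidism signal of any kind.
    any_signal = (
        p.has_hypothyroid_icd9
        or p.abnormal_tsh_or_ft4
        or p.on_thyroid_replacement
        or p.has_thyroid_antibodies
    )
    if not any_signal and not p.has_myasthenia_gravis:
        return "control"

    # Everyone else is ambiguous and left out of the analysis.
    return "excluded"


example = PatientFlags(
    non_acute_visits_3yr=3,
    on_thyroid_altering_meds=False,
    has_hypothyroid_icd9=True,
    abnormal_tsh_or_ft4=True,
    on_thyroid_replacement=True,
    has_thyroid_antibodies=False,
    has_secondary_cause=False,
    has_myasthenia_gravis=False,
)
print(classify(example))  # case
```

Keeping an explicit "excluded" outcome reflects the design of the flowchart: patients with partial evidence are assigned to neither group, which trades sample size for cleaner cases and controls.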
Hypothyroidism Algorithm
- Implement algorithm at all sites concurrently
- Sites share implementation strategies, refine
- Sites evaluate algorithm through physician and/or trained chart abstractor review of random sample
- Iteratively refine until PPV > 95%
Courtesy, J Denny
Hypothyroidism: Validation Methods
- Chart review instrument
  - Iteratively developed
  - Incorrectly scored some true cases as false positives since many Hashimoto patients have thyroid enlargement at active stages of disease
  - Applied locally
- Random sample of charts
  - 50 cases
  - 50 controls
- Gold standard used to calculate PPVs
Courtesy, J Denny
Hypothyroidism: Validation
PPV (Case) and NPV (Control) from Chart Review

Site | EMR-based Cases/Controls | No. Sampled Cases/Controls | PPV | NPV
Group Health | 310/1,223 | 50/50 | 92% | 92%
Marshfield | 592/649 | 50/50 | 88% | 96%
Mayo Clinic | 293/2,905 | 100/100 | 76% | 94%
Northwestern | 97/516 | 50/50 | 88% | 100%
Vanderbilt | 185/1,476 | 50/50 | 90% | 98%
All sites | 1,477/6,769 | 300/300 | 87% | 95%

Courtesy, J Denny
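For reference, PPV and NPV follow directly from the chart-review counts: PPV is the fraction of algorithm-defined cases confirmed by review, and NPV is the fraction of algorithm-defined controls confirmed as true controls. The sketch below uses the standard definitions; the counts in the usage lines are illustrative assumptions, since the slides report only percentages, and the function names are hypothetical.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: confirmed cases / all algorithm-defined cases reviewed."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: confirmed controls / all algorithm-defined controls reviewed."""
    return true_neg / (true_neg + false_neg)


def meets_target(value: float, target: float = 0.95) -> bool:
    """Stopping rule for refinement, read literally from 'PPV > 95%'."""
    return value > target


# Illustrative counts only: 44 of 50 sampled cases and 48 of 50 sampled
# controls confirmed by chart review.
case_ppv = ppv(true_pos=44, false_pos=6)
control_npv = npv(true_neg=48, false_neg=2)
print(f"PPV {case_ppv:.0%}, NPV {control_npv:.0%}")  # PPV 88%, NPV 96%
print(meets_target(case_ppv))                        # False
```

With only 50 charts sampled per group, each misclassified chart moves the estimate by two percentage points, so small differences between sites should be read with that granularity in mind.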
Hypothyroidism: Revised Algorithm
Cases and controls:
- No thyroid-altering medications (e.g., Phenytoin, Lithium)
- 2+ non-acute visits in 3yrs
Cases (Case 1, Case 2):
- ICD-9s for Hypothyroidism
- Abnl TSH or FT4
- Thyroid replacement meds > 3 months
- Antibodies to TTG or TPO
- No secondary causes (e.g., Graves, ablation)
Controls:
- No ICD-9s for Hypothyroidism
- > 1 Nl TSH, No Abnl TFT (replaces: No Abnl TSH or FT4)
- No thyroid replacement meds
- No Antibodies for TTG/TPO
- No hx of myasthenia gravis
Courtesy, J Denny
Primary Hypothyroidism: Algorithm Description
- Case Definition: inclusion/exclusion codes
- Time-dependent exclusion codes
- Case lab names, values
- Case medications
- Control Definition: inclusion/exclusion codes
- Control lab names, values
- Pregnancy exclusion codes
- Radiation exposure
- Thyroidectomy codes
- Thyroid medication codes
https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/library_of_Phenotype_Algorithms
Hypothyroidism Pseudocode: 5 Pages
https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/library_of_Phenotype_Algorithms
Revised Hypothyroidism Algorithm
PPV (Case) and NPV (Control) from Chart Review

Site | EMR-based Cases/Controls | No. Sampled Cases/Controls | PPV (%) Old | PPV (%) New | NPV (%) Old | NPV (%) New
Group Health | 430/1,188 | 50/50 | 92 | 98 | 92 | 100
Marshfield | 509/1,193 | 50/50 | 88 | 91 | 96 | 100
Mayo Clinic | 250/2,145 | 100/100 | 76 | 97 | 94 | 100
Northwestern | 103/516 | 50/50 | 88 | 98 | 100 | 100
Vanderbilt | 184/1,344 | 50/50 | 90 | 98 | 98 | 100
All sites | 1,421/6,362 | 300/300 | 87 | 96 | 95 | 100

Courtesy, J Denny
GWA Scan of EMR-Defined Hypothyroidism
[Figure: phenotype labels: Thyroiditis; Nontoxic nodular goiter; Pernicious anemia; Atrial flutter; Nutritional deficiency anemia; Hemorrhoids; Nontoxic multinodular goiter; Simple goiter; Hashimoto's thyroiditis; Thyroid cancer; Abnormal TFTs; Thyrotoxicosis; Deficiency of B-vitamins; Iatrogenic hypothyroidism; Benign thyroid neoplasm; Graves' disease]
Nat Genet 2009; 41:460-4.
Denny J et al., Am J Hum Genet, in press.
EMR Phenotyping and PGx Research?
- Billing dx code: major depressive disorder
- Classified for current mood state
- 34 NLP terms for depressed (stress, recurrent, mood anxious)
- Used to classify as rx-resistant vs rx-responsive
Perlis RH et al., Psychol Med 2011 Jun 20; 1-20 [Epub ahead of print].
NLP vs Billing Codes for Depressed Mood Perlis RH et al., Psychol Med 2011 Jun 20; 1-20 [Epub ahead of print].
Mayo Clinic Psychiatric Pharmacogenetics http://www.assurerxhealth.com/genesightrx
EMR-Defined Phenotypes for Adverse Reactions to Drugs
- Thrombocytopenia
  - > 2 platelet counts < 150K/ul
  - Suspect drugs received within 30d before or 3d after low counts
  - Drugs: anticoagulant, antiplatelet, statin
- Neutropenia
  - ANC < 1500-1000-500 cells/ul
  - Drugs: antithyroid, macrolides
- Drug-Induced Liver Injury
  - Under development
Courtesy, C. Chute
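The thrombocytopenia rule above can be sketched as a small screening function. This is a minimal illustration in Python, assuming lab results arrive as (date, platelets per ul) pairs and suspect-drug exposures as a list of dates; those input shapes and all function names are hypothetical. The slide's "> 2 platelet counts" is read literally as more than two low counts, and a patient qualifies if any exposure falls in the window around any low count; both readings are assumptions.

```python
from datetime import date, timedelta

LOW_PLATELET_PER_UL = 150_000  # "< 150K/ul" from the slide


def low_count_dates(platelet_results):
    """Dates of platelet counts below the threshold, in chronological order."""
    return sorted(d for d, count in platelet_results if count < LOW_PLATELET_PER_UL)


def drug_in_window(low_dates, drug_dates, days_before=30, days_after=3):
    """True if any suspect-drug date is within 30d before or 3d after a low count."""
    for low in low_dates:
        start = low - timedelta(days=days_before)
        end = low + timedelta(days=days_after)
        if any(start <= d <= end for d in drug_dates):
            return True
    return False


def is_candidate(platelet_results, drug_dates):
    """Assumed screening rule: more than two low counts plus a timed exposure."""
    lows = low_count_dates(platelet_results)
    return len(lows) > 2 and drug_in_window(lows, drug_dates)


labs = [
    (date(2011, 1, 10), 120_000),
    (date(2011, 1, 12), 95_000),
    (date(2011, 1, 15), 140_000),
    (date(2011, 2, 20), 210_000),
]
print(is_candidate(labs, [date(2011, 1, 1)]))   # True: drug given 9 days before first low count
print(is_candidate(labs, [date(2010, 11, 1)]))  # False: exposure is outside every window
```

A rule like this only flags candidates; as with the other phenotypes in this talk, flagged records would still need chart review before being treated as adverse drug reactions.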
Privacy Threats in EMR Data
- Unusual diagnosis codes or combinations of codes may permit an attacker with access to an individual's EMR to identify their genomic data
- Mitigate risk by modifying codes so clinical profiles can only be linked to groups of pts:
  - Generalizing codes within ICD coding hierarchy (cancer of head, body or tail of pancreas) while maintaining utility of data
  - Suppressing codes that can't be generalized
- Limiting to disease codes useful for GWA studies reduces risk further, though can't foresee all GWA uses
Loukides G et al., PNAS 2010; 107:7898-903.
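The two mitigation steps, generalizing and suppressing, can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of Loukides et al.: it generalizes every ICD-9 code to its three-digit category and then suppresses any generalized code shared by fewer than `k` patients, protecting single codes only rather than combinations of codes. The function names, the `k` parameter, and the toy patient profiles are assumptions for illustration.

```python
from collections import Counter


def generalize(code: str) -> str:
    """Map an ICD-9 code to its three-digit category, e.g. 157.0 -> 157."""
    return code.split(".")[0]


def anonymize(profiles, k):
    """Generalize all codes, then suppress codes held by fewer than k patients.

    profiles: dict mapping a patient id to a set of ICD-9 code strings.
    Returns a dict mapping each patient id to a frozenset of retained codes.
    """
    generalized = {
        pid: {generalize(code) for code in codes}
        for pid, codes in profiles.items()
    }
    # Count how many patients carry each generalized code.
    carriers = Counter(code for codes in generalized.values() for code in codes)
    return {
        pid: frozenset(code for code in codes if carriers[code] >= k)
        for pid, codes in generalized.items()
    }


# Toy data: 157.0, 157.1, and 157.2 are pancreatic cancer of the head, body,
# and tail; each is unique here until generalized to category 157.
profiles = {
    "p1": {"157.0"},
    "p2": {"157.1"},
    "p3": {"157.2", "758.0"},
}
released = anonymize(profiles, k=3)
print(sorted(released["p3"]))  # ['157']
```

In this toy example the three pancreatic cancer codes survive as a shared category, while the code carried by a single patient is dropped, which mirrors the slide's trade-off between linking profiles only to groups and maintaining data utility.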
Phenome-Wide Scanning with EMR Data Denny J et al., Bioinformatics, 2010; Mar 24.
Defining Phenotypes from EMR Data Ritchie M et al., Am J Hum Genet 2010; 86:560-72.