Next-generation Phenotyping Using Interoperable Big Data

Next-generation Phenotyping Using Interoperable Big Data
George Hripcsak, Chunhua Weng
Columbia University Medical Center, in collaboration with Mount Sinai Medical Center
Biomedical Informatics: discovery and impact

Introducing OHDSI
Observational Health Data Sciences and Informatics (OHDSI) is an international network of researchers and observational health databases with a central coordinating center housed at Columbia University.
Mission: large-scale analysis of observational health databases for population-level estimation and patient-level predictions.
Vision: patients and clinicians use OHDSI tools every day to access evidence based on 1 billion patients.
http://ohdsi.org
[Diagram layers: clinical researcher, provider, patient; tools and algorithms; data nodes; infrastructure, models, ontologies]

OHDSI's global research community
>120 collaborators from 11 different countries
Experts in informatics, statistics, epidemiology, clinical sciences
Active participation from academia, government, industry, providers
http://ohdsi.org/who-we-are/collaborators/

Global reach of ohdsi.org
>4600 distinct users from 96 countries in 2015

Why large-scale analysis is needed in healthcare
[Figure labels: all health outcomes of interest; all drugs]

What is large-scale?
Millions of observations: need for performance in handling relational structure with millions of patients and billions of clinical observations, with a focus on optimization for analytical use cases.
Millions of covariates: no analytics software in the world can fit a regression with >1m observations and >1m covariates on typical hardware, but CYCLOPS can!
Millions of questions: systematic solutions with massive parallelization should be designed to run efficiently for one-at-a-time AND all-by-all.

[Diagram: OMOP Common Data Model tables, grouped by section]
Use cases: drug safety surveillance, device safety surveillance, vaccine safety surveillance, comparative effectiveness, health economics, quality of care
Standardized clinical data: Person, Observation_period, Specimen, Death, Visit_occurrence, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Observation, Fact_relationship
Standardized health system data: Location, Care_site, Provider
Standardized meta-data: CDM_source
Standardized health economics: Payer_plan_period, Procedure_cost, Visit_cost, Drug_cost, Device_cost
Standardized derived elements: Cohort, Cohort_attribute, Condition_era, Drug_era, Dose_era
Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength, Cohort_definition, Attribute_definition

Preparing your data for analysis
Workflow: patient-level data in source system/schema, then ETL design, ETL implement, ETL test, ending with patient-level data in OMOP CDM
OHDSI tools built to help (http://github.com/ohdsi):
WhiteRabbit: profile your source data
RabbitInAHat: map your source structure to CDM tables and fields
ATHENA: standardized vocabularies for all CDM domains
Usagi: map your source codes to CDM vocabulary
CDM: DDL, index, constraints for Oracle, SQL Server, PostgreSQL; vocabulary tables with loading scripts
ACHILLES: profile your CDM data; review data quality assessment; explore population-level summaries
OHDSI Forums: public discussions for OMOP CDM implementers/developers

Data to evidence: sharing paradigms
All paradigms start from patient-level data in OMOP CDM and end in evidence, ranging from one-time to repeated use.
Single study: write protocol, develop code, execute analysis, compile result
Real-time query: develop app, design query, submit job, review result
Large-scale analytics: develop app, execute script, explore results

Standardized large-scale analytics tools under development within OHDSI (operating on patient-level data in OMOP CDM)
ACHILLES: database profiling
CIRCE: cohort definition
HERMES: vocabulary exploration
HERACLES: cohort characterization
CALYPSO: feasibility assessment
OHDSI Methods Library: CYCLOPS, CohortMethod, SelfControlledCaseSeries, SelfControlledCohort, TemporalPatternDiscovery, Empirical Calibration
LAERTES: drug-AE evidence base
PLATO: patient-level predictive modeling
HOMER: population-level causality assessment
http://github.com/ohdsi

CIRCE for cohort definition
CIRCE (Cohort Inclusion and Restriction Criteria Expression) is a user interface to define and review cohort definitions.
A COHORT is a set of persons satisfying one or more criteria for a duration of time.
A disease phenotype is a typical use case for cohort definition.
The interface translates a human-readable form into a standardized JSON representation for interoperability in network-based analysis, and compiles the JSON into a platform-specific SQL dialect for direct execution against any OMOP CDM-compliant dataset.
Open-source, freely available source code: https://github.com/ohdsi/circe

One interface allows definition of criteria across all tables and all fields of the OMOP Common Data Model. The user interface translates this human-readable form into JSON, which is compiled into SQL dialects for 5 platforms.

Each expression can be defined by one or more standard concept sets, using OHDSI's standardized vocabularies.

HERMES for vocabulary exploration
OHDSI standardized vocabularies allow consistent definitions to be applied across disparate source vocabularies: selecting the descendants of the SNOMED concept "Attention deficit hyperactivity disorder" maps all ICD9, ICD10, and READ codes, so the analysis can be executed across OHDSI's international data network.
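The descendant selection described above can be sketched as a query over the vocabulary tables named in the CDM diagram (Concept, Concept_ancestor, Concept_relationship). This is a minimal sketch, not HERMES itself: it uses an in-memory SQLite database, and every concept ID, concept name, and source code in the toy data is invented for illustration. The column names follow the OMOP vocabulary tables, and the toy data assumes each concept is listed as its own ancestor.

```python
import sqlite3


def descendant_source_codes(conn, ancestor_concept_id):
    """Return (vocabulary_id, concept_code) pairs for every source code that
    maps to the given standard concept or to any of its descendants."""
    sql = """
        SELECT DISTINCT src.vocabulary_id, src.concept_code
        FROM concept_ancestor ca
        JOIN concept_relationship cr
          ON cr.concept_id_2 = ca.descendant_concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept src
          ON src.concept_id = cr.concept_id_1
        WHERE ca.ancestor_concept_id = ?
        ORDER BY src.vocabulary_id, src.concept_code
    """
    return conn.execute(sql, (ancestor_concept_id,)).fetchall()


def build_toy_vocabulary():
    """Create a tiny vocabulary with invented IDs and codes (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_name TEXT, vocabulary_id TEXT, concept_code TEXT);
        CREATE TABLE concept_ancestor (
            ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
        (1, "Toy parent disorder", "SNOMED", "S-PARENT"),
        (2, "Toy child disorder", "SNOMED", "S-CHILD"),
        (3, "Toy unrelated disorder", "SNOMED", "S-OTHER"),
        (10, "Toy ICD9 code", "ICD9CM", "X1"),
        (11, "Toy ICD10 code", "ICD10CM", "Y1"),
        (12, "Toy Read code", "Read", "Z1"),
        (13, "Toy unrelated ICD9 code", "ICD9CM", "Q9"),
    ])
    # Each standard concept is its own ancestor; concept 2 descends from 1.
    conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2), (3, 3)])
    # Source codes point at standard concepts through 'Maps to'.
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", [
        (10, 1, "Maps to"), (11, 2, "Maps to"),
        (12, 2, "Maps to"), (13, 3, "Maps to"),
    ])
    return conn


if __name__ == "__main__":
    for vocabulary_id, code in descendant_source_codes(build_toy_vocabulary(), 1):
        print(vocabulary_id, code)
```

Querying by the parent concept picks up codes mapped to the child as well, which is what lets one concept set stand in for code lists in several source vocabularies.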

Concept sets can define one or more entities. Here, the PheKB list of ADHD inclusionary medications has been represented by 21 RxNorm ingredient concepts; all brands, doses, and forms are subsumed.

The human-readable Expression form is translated into JSON in real time. This JSON object can be shared across partners to materialize the definition consistently and reproducibly without any programming required.

Each expression is compiled into SQL. OHDSI supports rendering SQL into platform-specific dialects for SQL Server, Oracle, Postgres, RedShift, MS APS. This code can be copied and executed in your favorite SQL UI tool, or.

Patient-level observational databases that are converted to the OMOP Common Data Model and exposed to the OHDSI WebAPI (either a local install or any public network version) can have the cohort definition executed directly within the database to produce a COHORT. The COHORT is then available for all subsequent research within the OHDSI environment.
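The path from a shareable JSON definition to SQL to a materialized cohort can be illustrated with a toy compiler. This is a sketch under stated assumptions: the JSON layout below is hypothetical and much simpler than the real CIRCE representation, only one SQL dialect (SQLite) is produced, and the concept IDs and patient rows are invented. The table and column names follow the OMOP CDM.

```python
import json
import sqlite3

# Hypothetical, simplified cohort expression; NOT the real CIRCE JSON schema.
EXPRESSION_JSON = """
{
  "name": "toy cohort",
  "table": "condition_occurrence",
  "concept_ids": [101, 102]
}
"""

# OMOP CDM tables this toy compiler knows: (concept column, start-date column).
KNOWN_TABLES = {
    "condition_occurrence": ("condition_concept_id", "condition_start_date"),
    "drug_exposure": ("drug_concept_id", "drug_exposure_start_date"),
}


def compile_to_sql(expression):
    """Compile the toy expression into (sql, parameters).

    A person enters the cohort on the earliest record whose concept is in
    the expression's concept set."""
    table = expression["table"]
    if table not in KNOWN_TABLES:
        raise ValueError(f"unsupported table: {table}")
    concept_col, date_col = KNOWN_TABLES[table]
    concept_ids = [int(c) for c in expression["concept_ids"]]
    placeholders = ", ".join("?" for _ in concept_ids)
    sql = (
        f"SELECT person_id, MIN({date_col}) AS cohort_start_date "
        f"FROM {table} "
        f"WHERE {concept_col} IN ({placeholders}) "
        f"GROUP BY person_id ORDER BY person_id"
    )
    return sql, concept_ids


def build_toy_cdm():
    """Create a one-table CDM fragment with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER, "
        "condition_start_date TEXT)"
    )
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", [
        (1, 101, "2010-03-01"),
        (1, 102, "2009-05-01"),
        (2, 999, "2011-07-04"),  # concept outside the concept set
        (3, 102, "2012-01-15"),
    ])
    return conn


def materialize_cohort(conn, expression_json):
    """Parse the JSON, compile it, and run it against the database."""
    sql, params = compile_to_sql(json.loads(expression_json))
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    for row in materialize_cohort(build_toy_cdm(), EXPRESSION_JSON):
        print(row)
```

Because the definition is plain JSON, the same text can be sent to another site and compiled there, which is the property the slides rely on for network studies.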

Try it yourself http://www.ohdsi.org/web/circe/#/146

Proof of concept
Treatment pathways around the world: diabetes, hypertension, depression (submitted to PNAS)

Cohort

Databases (255M) and definitions

Diabetes

Opportunities for collaboration
Implement the PheKB library in CIRCE, so that all organizations with patient-level data (translated to the OMOP common data model) can take the work from eMERGE, directly apply the logic to their own data, and participate in eMERGE's research.

Phenotyping hard challenges
Quality of the data
Ambiguous or unknown meaning
Accuracy: 50-100% accuracy [Hogan JAMIA 1997]
Completeness: mostly missing
Complexity: disease ontologies
Bias

[Diagram: from truth to computable representation, with error introduced at each step]
Truth: health status of the patient
Observe & interpret (error): clinician's or patient's conception (concept)
Author / record (error): EHR/PHR
Read (error): implicit 2nd clinician's conception of the patient (or self, lawyer, compliance, ...)
Process: model (computable representation)

Biased
[Diagram nodes: environment, patient state, therapy, care team, objective tests, electronic health record]

Inpatient mortality for community acquired pneumonia
[Figure: mortality (%) by Fine class (1 to 5) for the 18715 cohort, the 1935 cohort, and Fine. Source: Hripcsak... Comput Biol Med 2007;37:296-304]
18715 cohort: +CXR +fdg -recent pneu -recent visit
1935 cohort: above plus +DSUM exist +ICD9 (pneu not sepsis)

EHR-derived phenotype
A clinically relevant feature derived from the EHR, for example:
Patient has (a diagnosis of) type II diabetes
Recent rash and fever
Drug-induced liver injury
Then use the phenotype in correlation studies, etc.
[Diagram: raw data, query, phenotype, experiment]

Physics of the medical record
1. Study the EHR as if it were a natural object: use the EHR to learn about the EHR; not studying the patient, but the recording of the patient
2. Aggregate across units and model
3. Borrow methods from non-linear time series

Glucose by Δt and tau
[Figure: MI for glucose as a function of delta-t (days) and tau. Source: Albers... Translational Bioinformatics 2009]
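A time-delayed dependence measure of the kind plotted above can be sketched with a simple histogram estimator. The assumptions here are loud ones: "MI" is read as mutual information, the series is evenly sampled (real laboratory data are not), the estimator is a basic plug-in estimate with equal-width bins, and the data are a simulated autoregressive series rather than glucose values. None of this is taken from the cited paper's method.

```python
import math
import random
from collections import Counter


def _bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(xs, ys, bins=8):
    """Plug-in estimate (in nats) of the mutual information between two
    equal-length sequences, using equal-width histogram bins."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    pairs = [(_bin_index(x, x_lo, x_hi, bins), _bin_index(y, y_lo, y_hi, bins))
             for x, y in zip(xs, ys)]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(i for i, _ in pairs)
    py = Counter(j for _, j in pairs)
    # sum over occupied cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def lagged_mutual_information(series, lag, bins=8):
    """Mutual information between series[t] and series[t + lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return mutual_information(series[:-lag], series[lag:], bins=bins)


def simulate_ar1(n, phi, seed):
    """Simulated first-order autoregressive series (stand-in for lab data)."""
    rng = random.Random(seed)
    value, series = 0.0, []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        series.append(value)
    return series


if __name__ == "__main__":
    series = simulate_ar1(5000, 0.9, seed=0)
    for lag in (1, 2, 5, 10, 50):
        print(lag, round(lagged_mutual_information(series, lag), 3))
```

For a series with memory, the estimate falls as the lag grows; how quickly it falls is the kind of property the slide proposes to study in the record itself.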

Correlate lab tests and concepts
22 years of data on 3 million patients
21 laboratory tests; listed: sodium, potassium, bicarbonate, creatinine, urea nitrogen, glucose, and hemoglobin
60 concepts derived from signout notes, which are written by residents caring for inpatients to facilitate the transfer of care for overnight coverage
Concepts likely to have an association, plus controls
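Correlating a lab test with a concept at different time offsets can be sketched as a lagged Pearson correlation. This is an illustrative sketch only: it assumes one value per day for both series, a binary indicator for whether the concept was mentioned, and simulated data in which the lab value rises three days after a mention. The function names and the simulation are hypothetical and are not the published method.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(concept, lab, lag):
    """Correlation between concept[t] and lab[t + lag].

    A positive lag pairs a mention with later lab values; a negative lag
    pairs it with earlier lab values."""
    n = len(concept)
    if len(lab) != n or abs(lag) >= n - 1:
        raise ValueError("series must match in length and exceed the lag")
    if lag >= 0:
        return pearson(concept[: n - lag], lab[lag:])
    return pearson(concept[-lag:], lab[: n + lag])


def simulate(n_days=400, true_lag=3, seed=1):
    """Simulated daily data: the lab value rises true_lag days after a mention."""
    rng = random.Random(seed)
    concept = [1 if rng.random() < 0.2 else 0 for _ in range(n_days)]
    lab = []
    for t in range(n_days):
        effect = concept[t - true_lag] if t >= true_lag else 0
        lab.append(4.0 + 1.0 * effect + rng.gauss(0.0, 0.1))
    return concept, lab


if __name__ == "__main__":
    concept, lab = simulate()
    profile = {lag: lagged_correlation(concept, lab, lag) for lag in range(-10, 11)}
    best = max(profile, key=profile.get)
    print("lag with strongest correlation:", best)
```

The sign of the lag at which the correlation peaks is what separates a concept that precedes a lab change from one that follows it.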

Intentional and physiologic associations
[Figure: potassium, with curves for aldactone, dialysis, hyperkalemia, hypokalemia, and hypomagnesemia over a horizontal axis from -60 to 60]

Timing of cause in disease vs. treatment
[Figure: glucose, with curves for hyperglycemia, hypernatremia, hypoglycemia, insulin, metformin, and pancreatitis]

Specificity of the concept
[Figure: creatinine, with curves for aldactone, dialysis, diarrhea, diuretic, hctz, hyperglycemia, hypernatremia, and vomiting over a horizontal axis from -60 to 60]

Health care process model (Hripcsak... JAMIA 2013)

Hripcsak... JAMIA 2013

inpatient admit ambulatory surgery

Interpreting time (Hripcsak JAMIA 2009)

Deviation by stated unit
[Figure: number of occurrences against proportional deviation (-1 to 1) between stated time and now, by stated unit: day, week, month, year]

Interpreting time
Variable | Definition | Coefficient | Significance
value | stated numeric value in the temporal assertion (1 to 30 in this sample) | 0.0414 | <0.001
round number | true if value is a multiple of 5 (any unit) or 6 (with months) | 0.0218 | 0.002
ln(duration) | logarithm of stated duration in days, which equals the product of unit and value | 0.150 | 0.023
gt 18 years | true if duration > 18 years, so the event should not be in the database | 0.816 | <0.001
intercept | | 0.406 | 0.416

Patient variability and sampling

Parameterizing Time

Parameterizing Time (Non-stationarity)
[Figure: coefficient of variation of the rate of change for creatinine, glucose, sodium, and potassium under clock, warped, and sequence parameterizations. Source: Hripcsak JAMIA 2015]

Parameterizing Time

Vector autoregression to decipher associations

Noisy training sets with Nigam Shah; David Sontag

Summary
OHDSI international collaboration could dovetail with eMERGE
Next-generation phenotyping requires understanding the EHR