How to extract transform and load observational data?



Similar documents
Learning from observational databases: Lessons from OMOP and OHDSI

Open-Source Big Data Analytics in Healthcare

Next-generation Phenotyping Using Interoperable Big Data

Achilles a platform for exploring and visualizing clinical data summary statistics

Asian Data Resources. October 24, :30-12:30 Using pharmacoepidemiology database resources to address drug safety research

Environmental Health Science. Brian S. Schwartz, MD, MS

Meaningful use. Meaningful data. Meaningful care. The 3M Healthcare Data Dictionary: Standardizing lab data to LOINC for meaningful use

Disclosures. Real World Data Sources

Find the signal in the noise

HL7 and Meaningful Use

Secondary Uses of Data for Comparative Effectiveness Research

From Fishing to Attracting Chicks

Standardized Terminologies Used in the Learning Health System

Eligible Professionals please see the document: MEDITECH Prepares You for Stage 2 of Meaningful Use: Eligible Professionals.

Problem-Centered Care Delivery

IBM Watson and Medical Records Text Analytics HIMSS Presentation

Health Care Information System Standards

MED 2400 MEDICAL INFORMATICS FUNDAMENTALS

Clinical Decision Support Systems An Open Source Perspective

Meaningful Use Stage 2 Update: Deploying SNOMED CT to provide decision support in the EHR

HIMSS Electronic Health Record Definitional Model Version 1.0

ICD-9-CM to MedDRA Mapping How Well Do the. Disclaimer

AHIMA Curriculum Map Health Information Management Baccalaureate Degree Approved by AHIMA Education Strategy Committee February 2011

Electronic Health Records - An Overview - Martin C. Were, MD MS March 24, 2010

Beacon User Stories Version 1.0

Electronic Health Record. Standards, Coding Systems, Frameworks, and Infrastructures

TEST INSTRUCTIONS FOR CROSS VENDOR EXCHANGE TABLE OF CONTENTS

Meaningful use. Meaningful data. Meaningful care. The 3M Healthcare Data Dictionary: Enabling effective health information exchange

Bench to Bedside Clinical Decision Support:

Briefing on ehr Content Hong Kong Clinical Terminology Table (HKCTT) By Maggie LAU Health Informatician, ehriso

ehr Sharable Data Vicky Fung Senior Health Informatician ehr Information Standards Office

3M Health Information Systems

HL7 and Meaningful Use

A leader in the development and application of information technology to prevent and treat disease.

Meaningful Use Stage 2 Certification: A Guide for EHR Product Managers

Healthcare Data: Secondary Use through Interoperability

Using Health Information Technology to Improve Quality of Care: Clinical Decision Support

What is your level of coding experience?

Terminology Services in Support of Healthcare Interoperability

The ecosystem of the OpenClinic GA open source hospital information management software

Custom Report Data Elements: 2012 IT Database Fields. Source: American Hospital Association IT Survey

AHIMA Curriculum Map Health Information Management Associate Degree Approved by AHIMA Education Strategy Committee February 2011

A.4.2. Challenges in the Deployment of Healthcare Information Systems and Technology

Overview of FDA s active surveillance programs and epidemiologic studies for vaccines

MEANINGFUL USE. Community Center Readiness Guide Additional Resource #13 Meaningful Use Implementation Tracking Tool (Template) CONTENTS:

IMPROPER USE OF MEDICAL INFORMATION

HL7 & Meaningful Use. Charles Jaffe, MD, PhD CEO Health Level Seven International. HIMSS 11 Orlando February 23, 2011

Utility of Common Data Models for EHR and Registry Data Integration: Use in Automated Surveillance

Singapore s National Electronic Health Record

Health Informatics Development in the Hospital Authority

Guide To Meaningful Use

Understanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification

Automated Problem List Generation from Electronic Medical Records in IBM Watson

Meaningful Use. Michael L. Brody, DPM FACFAOM CCHIT Ambulatory Workgroup HITSP Physician Perspective Technical Committee NYeHC

CHAPTER 2 Functional EHR Systems

The FDA s Mini- Sen*nel Program and the Learning Health System

Results of a Peer Mentoring Intervention in Older Patients with Diabetes: The Care Companion Program

Employing SNOMED CT and LOINC to make EHR data sensible and interoperable for clinical research

Electronic Health Record (EHR) Standards Survey

Patient Similarity-guided Decision Support

Mona Osman MD, MPH, MBA

BI en Salud: Registro de Salud Electrónico, Estado del Arte!

ADVANCING MEASUREMENT OF PATIENT- CENTERED OUTCOMES AND QUALITY METRICS WITH ELECTRONIC HEALTH RECORDS

International HL7 Interoperability Conference - IHIC 2010

Medicare and Medicaid Programs; EHR Incentive Programs

Electronic health records to study population health: opportunities and challenges

Electronic Health Record Sharing Benefits and Challenges

THE STIMULUS AND STANDARDS. John D. Halamka MD

Georgia Department of Education Career Pathway Descriptions

University of Minnesota Health Informatics Graduate Program: Education, Research and Practice

AHIMA Curriculum Map Health Information Management Associate Degree Approved by AHIMA Education Strategy Committee February 2011

Coding Certificate Program Competencies

A Semantic Foundation for Achieving HIE Interoperability

Medication Therapy Management Services Clinical Documentation: Using a Structured Coding System SNOMED CT

Qualifying for Medicare Incentive Payments with Crystal Practice Management. Version

DENOMINATOR: All female patients aged 65 years and older with a diagnosis of urinary incontinence

The Big Data Dividend

Knowledge Clusters (Curriculum Components)

Transcription:

How to extract transform and load observational data? Martijn Schuemie Janssen Research & Development Department of Pharmacology & Pharmacy, The University of Hong Kong

Outline Observational data & research networks OMOP Common Data Model (CDM) Transforming data to the CDM Available tools for the CDM

Observational data & research networks

Observational health data Subjective: Complaint and symptoms Medical history Objective: Observations Measurements vital signs, laboratory tests, radiology/ pathology findings Assessment and Plan: Diagnosis Treatment

Observational health data All captured in the medical record Medical records are increasingly being captured in EHR systems Data within the EHR are becoming increasingly structured

The average primary care physician sees 20 different patients a day care data

that s 100 visits in a week care data

which can reflect a panel of over 2,000 patients in a year care data

Databases for research Diseases Health care provided Evidence generation Electronic Health Records Insurance claims Registries Effects of treatments Differences between patients Personalized medicine

Key point Existing observational health care data such as Electronic Health Records and insurance claims databases have great potential for research 现 有 的 卫 生 保 健 观 测 数 据, 例 如 电 子 健 康 记 录 保 险 索 赔 数 据 库 等 具 有 巨 大 的 潜 力 为 研 究 所 用

Research networks Data may be at different sites Sites often cannot share data at the patient level Data can be in very different formats Patient level, identifiable information Practice Hospital Claims Registry

ETL ETL ETL ETL Standardized analytics Aggregate summary statistics Firewall Common Data Model Extract, Transform, Load (ETL) Patient level, identifiable information Practice Hospital Claims Registry

ETL ETL ETL ETL Count people on drug A and B, and the number of outcomes X Firewall Common Data Model Extract, Transform, Load (ETL) Patient level, identifiable information Practice Hospital Claims Registry

ETL ETL ETL ETL Count people on drug A and B, and the number of outcomes X Firewall Common Data Model Practice: 100 patients on A, 1 has X 200 patients on B, 4 have X Extract, Transform, Load (ETL) Hospital: 1000 patients Patient on A, level, 10 have identifiable X 2000 patients on B, 40 have information X Aggregated data Etc. Practice No privacy concerns Hospital Claims Registry

Distributed research networks Europe United States Asia AsPEN Global

Key point Observational data can often not be shared. Using a common data model allows analysis programs to visit the data instead. 观 测 数 据 通 常 不 是 共 享 的 而 使 用 公 共 数 据 模 型 可 以 实 现 分 析 程 序 对 不 同 来 源 数 据 的 统 一 访 问

OMOP Common Data Model (CDM)

A common structure Person - person_id - year_of_birth - month_of_birth - day_of_birth - gender_concept_id A common vocabulary Common Data Model How do we store gender? - M, F, U - 0, 1, 2-8507 (male), 8532 (female), 8851 (unknown gender), 8570 (ambiguous gender)

OMOP Common Data Model Designed for various types of data (EHR, insurance claims) in various countries Developed by the OMOP / OHDSI community Currently on 5 th version Easy to get data in, easy to get data out

Standardized clinical data Drug safety surveillance Device safety surveillance Vaccine safety surveillance Comparative effectiveness One model, multiple use cases Health economics Quality of care Clinical research Person Observation_period Specimen Death Standardized health system data Location Care_site Provider Standardized meta-data CDM_source Concept Visit_occurrence Procedure_occurrence Drug_exposure Device_exposure Condition_occurrence Measurement Note Observation Fact_relationship Payer_plan_period Procedure_cost Drug_era Visit_cost Drug_cost Device_cost Cohort Cohort_attribute Condition_era Dose_era Standardized health economics Standardized derived elements Vocabulary Domain Concept_class Concept_relationship Relationship Concept_synonym Concept_ancestor Source_to_concept_map Drug_strength Cohort_definition Attribute_definition Standardized vocabularies

Person centric CDM principles Data is split into domains (e.g. conditions, drugs, procedures) Preserve source values and map to standard values Store the source of the data Separate verbatim data from inferred data

OMOP Vocabulary All codes are mapped to standard coding systems Drugs: RxNorm Conditions: SNOMED Measurements: LOINC Procedures: ICD9Proc, HCPCS, CPT-4

Vocabulary MEDDRA Liver disorder (10024670) SNOMED Injury of liver (39400004) Biliary cirrhosis (1761006) Primary biliary cirrhosis (31712002) Source codes Biliary cirrhosis of children (J616200) CPRD Primary biliary cirrhosis (K74.3) AUSOM

Key point The OMOP Common Data Model provides a common structure and a common vocabulary. Different levels of precision in different coding systems are resolved using a concept hierarchy. OMOP 公 共 数 据 模 型 对 数 据 采 用 通 用 结 构 和 通 用 词 汇 ; 利 用 统 一 的 概 念 等 级 解 决 不 同 编 码 系 统 的 各 个 级 别 的 精 确 度 问 题

Transforming data to the CDM

Process to create an ETL 1. Data experts and CDM experts together design the ETL 2. People with medical knowledge create the code mappings 3. A technical person implements the ETL 4. All are involved in quality control

Tools to support the process 1. Data experts and CDM experts together design the ETL WHITE RABBIT RABBIT IN A HAT 2. People with medical knowledge create the code mappings 3. A technical person implements the ETL? USAGI 4. All are involved in quality control ACHILLES

Run Step 1. Designing the ETL WHITE RABBIT Scans the source database Creates a report describing o Tables o Fields in the tables o Values in the fields o Frequency of values

Step 1. Designing the ETL Run WHITE RABBIT In an interactive session with both data experts and CDM experts, run RABBIT IN A HAT

RABBIT IN A HAT

Step 1. Designing the ETL Run WHITE RABBIT In an interactive session with both data experts and CDM experts, run RABBIT IN A HAT Output document becomes basis of ETL specification

Step 2. Create code mappings Need to map codes to one of the OMOP CDM standards: Drugs: RxNorm Conditions: SNOMED Lab values: LOINC Specialties: CMS

Prepare data Step 2. Create code mappings Codes Descriptions (translated to English if needed) Frequencies Use existing mapping information if available If ATC codes are available, use to select subsets Use (partial) mappings in UMLS Run USAGI

USAGI

Step 3. Implement the ETL No standard. Depends on the expertise of the person doing the work: SQL SAS C++ (CDMBuilder) Java (JCDMBuilder) Kettle

Step 4. Quality control Not yet fully defined, but we currently use: Manually compare source and CDM information on a sample of persons Use Achilles software Replicate a study already performed on the source data

Key point Transforming data to a common data model requires an interdisciplinary team, including clinicians, informatics specialists, and people that know the data. 实 现 数 据 到 公 共 数 据 模 型 的 转 换 需 要 一 个 跨 学 科 领 域 团 队 的 合 作, 包 括 临 床 医 生 信 息 学 专 家 以 及 了 解 数 据 内 容 的 人 员

Available tools for the CDM

Introducting OHDSI International collaborative ohdsi.org

Introducting OHDSI International collaborative Coordinating center in Columbia University Academia and industry Data network Research (clinical + methodology) Open source software ohdsi.org

ACHILLES Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems Data in CDM Aggregate statistics: - Persons - Conditions - Drugs - Lab results - Explore statistics in a web browser

ACHILLES >12 databases from 5 countries across 3 different platforms: Janssen (Truven, Optum, Premier, CPRD, NHANES, HCUP) Columbia University Regenstrief Institute Ajou University IMEDS Lab (Truven, GE) UPMC Nursing Home Erasmus MC Cegedim

Lists data quality issues Achilles Heel

Achilles Statistics can be explored for patterns indicating quality issues

Open source tools Support for ETL White Rabbit + Rabbit in a Hat to help design an ETL Usagi to help create code mappings Data exploration / study feasibility HERMES to explore the vocabulary ACHILLES to explore a database CIRCE to create cohort definitions HERACLES to explore cohort characteristics Orange: Under development Methods for performing studies Cohort method for new user cohort studies using large-scale propensity scores IC Temporal pattern discovery Self-Controlled Case Series Korean signal detection algorithms Knowledge base MedlineXmlToDatabase for loading Medline into a local database LAERTES for combining evidence from literature, labels, ontologies, etc. Black: Complete

Key point OHDSI is a research community that develops open source tools that run on the Common Data Model. OHDSI 是 一 个 研 究 团 体, 主 要 致 力 于 开 发 在 公 共 数 据 模 型 上 运 行 分 析 的 开 源 工 具

Concluding thoughts Wealth of observational data to generate wealth of evidence for health care ETL into Common Data Model can be large initial investment Common Data Model allows sharing of results, methods, software tools

Thank you! 谢 谢!