Preparing Electronic Health Records for Multi-Site CER Studies
Michael G. Kahn 1,3,4, Lisa Schilling 2
1 Department of Pediatrics, University of Colorado, Denver
2 Department of Medicine, University of Colorado, Denver
3 Colorado Clinical and Translational Sciences Institute
4 Department of Clinical Informatics, Children's Hospital Colorado
AcademyHealth Annual Research Meeting
Building a Data Infrastructure for Multi-stakeholder Comparative Effectiveness Research
26 June 2012
Michael.Kahn@ucdenver.edu
Funding provided by AHRQ 1R01HS019908 (Scalable Architecture for Federated Translational Inquiries Network)
Setting the context: AHRQ Distributed Research Networks
AHRQ ARRA OS: Recovery Act 2009: Scalable Distributed Research Networks for Comparative Effectiveness Research (R01)
Goal: enhance the capability and capacity of electronic health networks designed for distributed research to conduct prospective, comparative effectiveness research on outcomes of clinical interventions.
AHRQ Distributed Research Networks: Funded Projects
SAFTINet: Scalable Architecture for Federated Therapeutic Inquiries Network. Lisa M. Schilling, University of Colorado Denver (R01 HS19908-01)
SCANNER: Scalable National Network for Effectiveness Research. Lucila Ohno-Machado, University of California San Diego (R01 HS19913-01)
SPAN: Scalable PArtnering Network for CER: Across Lifespan, Conditions, and Settings. John F. Steiner, Kaiser Foundation Research Institute (R01 HS19912-01)
SAFTINet Partners
Clinical partners
  Colorado Community Managed Care Network and the Colorado Associated Community Health Information Enterprise
  Colorado Federally Qualified Health Centers
  Denver Health and Hospital Authority
  Cherokee Health Systems, Tennessee
Technology partners
  University of Utah, Center for High Performance Computing
  QED Clinical, Inc., d/b/a CINA
Medicaid partners
  Colorado Health Care Policy & Financing
  Utah Department of Public Health (partnership in development)
  TennCare and Tennessee managed care organizations (partnership in development)
Leadership
  University of Colorado Denver
  American Academy of Family Physicians, National Research Network
Key Differences between EHR and CER data
EHR data | CER data | EHR->CER task
Fully identified | LDS or de-identified | Strip identifiers; keep mappings?
Local codes and values | Standardized codes and values | Terminology and value set mapping (manual!)
Broad data domains | Focused data domains | Filtering by patient, encounter, date, facility
Variable data quality; high level of missingness | Substantial data quality processes applied | Data profiling; iterative investigations
Lots of free text | Fully coded data only | NLP or ignore free text
Local access only | Shared access | Distributed or centralized data access
Single data source | Multiple data sources | Record linkage
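The "terminology and value set mapping" task from the table above can be sketched in a few lines. This is only an illustration, not SAFTINet's mapping tooling: the function `map_codes`, the record layout, and the codes in `LOCAL_TO_STANDARD` are all hypothetical placeholders. The sketch reports unmapped local codes by frequency because the mapping is manual, so a curator needs to see what is still missing.

```python
from collections import Counter

# Hypothetical hand-curated map from one site's local codes to standard codes.
# The codes on both sides are placeholders, not real terminology entries.
LOCAL_TO_STANDARD = {
    "GLU-SER": "STD-GLUCOSE",
    "HBA1C": "STD-HBA1C",
}


def map_codes(records, code_map):
    """Attach a standard code to each record; tally local codes with no mapping.

    Returns (mapped_records, unmapped_counts). Records whose local code is
    not in code_map are left out of mapped_records and counted instead.
    """
    mapped = []
    unmapped = Counter()
    for rec in records:
        local = rec["local_code"]
        standard = code_map.get(local)
        if standard is None:
            unmapped[local] += 1
            continue
        # Copy the record so the source extract is not modified in place.
        mapped.append({**rec, "standard_code": standard})
    return mapped, unmapped


if __name__ == "__main__":
    extract = [
        {"local_code": "GLU-SER", "value": 98},
        {"local_code": "XYZ", "value": 1},
        {"local_code": "XYZ", "value": 2},
    ]
    mapped, unmapped = map_codes(extract, LOCAL_TO_STANDARD)
    print(len(mapped), "mapped;", dict(unmapped), "still need manual mapping")
```

Sorting the unmapped counts by frequency gives a natural work queue for the manual mapping step.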
A common data model is critical!
[Figure. Labels: source systems (Other EHR, shown three times; CINA CDR; Local Data Warehouse; Existing Clinical Registries); three boxes each labeled Limited Data Set / Common Data Model / Common Terminology; "Crossing the CER chasm!!"; CER Common Query Interface.]
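One way to read this slide: once every site exposes the same data model and terminology, a single query can be run unchanged at each site and only aggregate results need to leave the site. The sketch below illustrates that idea under stated assumptions; `run_local_query`, `federated_count`, the row layout, and the code values are hypothetical and are not the SAFTINet query interface.

```python
def run_local_query(rows, concept_code):
    """Runs inside one site: count distinct persons having the given code.

    Only this count leaves the site; row-level data stays local.
    """
    return len({row["person_id"] for row in rows
                if row["standard_code"] == concept_code})


def federated_count(sites, concept_code):
    """Send the same query to every site and combine the aggregate answers.

    `sites` maps a site name to that site's rows. In a real network each
    call would go over the grid rather than touch the rows directly.
    """
    per_site = {name: run_local_query(rows, concept_code)
                for name, rows in sites.items()}
    return per_site, sum(per_site.values())


if __name__ == "__main__":
    sites = {
        "site_a": [
            {"person_id": 1, "standard_code": "X"},
            {"person_id": 1, "standard_code": "X"},  # same person, second record
            {"person_id": 2, "standard_code": "Y"},
        ],
        "site_b": [{"person_id": 7, "standard_code": "X"}],
    }
    print(federated_count(sites, "X"))
```

Summing per-site counts assumes no patient appears at more than one site; where patients overlap, the total is an upper bound unless records are linked across sites.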
ROSITA-GRID-PORTAL
Grid Portal
Why ROSITA?
ROSITA: Reusable OMOP and SAFTINet Interface Adaptor
ROSITA: The only bilingual Muppet
Converts EHR data into research limited data set
1. Replaces local codes with standardized codes
2. Replaces direct identifiers with random identifiers
3. Supports clear-text and encrypted record linkage
4. Provides data quality metrics
5. Pushes data sets to grid node for distributed queries
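Items 2 to 4 of the list above can be illustrated with a short sketch. It is not ROSITA's implementation: the class `Pseudonymizer` and the functions `linkage_token` and `missingness` are hypothetical names, and the keyed hash (HMAC-SHA256 over normalized fields) is just one simple, exact-match form of encrypted linkage, assumed here for illustration.

```python
import hashlib
import hmac
import secrets


class Pseudonymizer:
    """Replace direct identifiers with random identifiers (item 2).

    The mapping is kept in memory so the same patient always receives the
    same random identifier; whether to retain it is a policy decision.
    """

    def __init__(self):
        self._mapping = {}

    def pseudonym(self, direct_id):
        if direct_id not in self._mapping:
            # Random, so the identifier cannot be derived from the source ID.
            self._mapping[direct_id] = secrets.token_hex(8)
        return self._mapping[direct_id]


def linkage_token(first, last, dob, key):
    """Keyed hash of normalized demographics for exact-match linkage (item 3).

    Sites sharing the secret `key` produce equal tokens for equal inputs
    without exchanging the clear-text values.
    """
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def missingness(records, field):
    """Fraction of records where `field` is absent, None, or empty (item 4)."""
    if not records:
        return 0.0
    missing = sum(1 for rec in records if rec.get(field) in (None, ""))
    return missing / len(records)


if __name__ == "__main__":
    p = Pseudonymizer()
    print(p.pseudonym("MRN-001") == p.pseudonym("MRN-001"))
    key = b"shared-secret-for-illustration-only"
    print(linkage_token(" Ann", "SMITH", "1970-01-01", key)
          == linkage_token("ann", "smith ", "1970-01-01", key))
    print(missingness([{"a": 1}, {"a": None}, {"a": ""}, {}], "a"))
```

Exact-match hashing fails on typos and name variants, which is one reason clear-text linkage is also worth supporting where it is permitted.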
ROSITA: transforming EHR data for comparative effectiveness research
[Figure. Labels: Client CDW; Medicaid; ETL; XML (twice, one per source); ROSITA; JDBC (twice); OMOP CDM V3; Grid Data Service; SAFTINet Data Quality Data Service.]
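The labels suggest that source extracts arrive as XML and end up in database tables. As a rough illustration of that kind of step only, the sketch below parses a small XML extract and loads it into a table. Everything specific is an assumption: the XML layout in `SAMPLE_XML` is invented, `load_extract` is a hypothetical helper, SQLite stands in for whatever database sits behind JDBC, and the `person` table is a simplified stand-in rather than the actual OMOP CDM definition.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented extract layout for illustration; not the SAFTINet ETL specification.
SAMPLE_XML = """
<extract>
  <patient id="p1" birth_year="1980" gender="F"/>
  <patient id="p2" birth_year="1975" gender="M"/>
</extract>
"""


def load_extract(xml_text, conn):
    """Parse patient elements from the extract and insert them as rows.

    Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        "person_source_value TEXT PRIMARY KEY, "
        "year_of_birth INTEGER, "
        "gender_source_value TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (p.get("id"), int(p.get("birth_year")), p.get("gender"))
        for p in root.iter("patient")
    ]
    # Parameterized inserts keep source values from being interpreted as SQL.
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_extract(SAMPLE_XML, conn), "rows loaded")
    print(conn.execute("SELECT * FROM person ORDER BY 1").fetchall())
```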
SAFTINet ETL specifications
Transforming EHR Data: What does ROSITA do?
Open issue: the Medicaid data pathway is not yet figured out
ROSITA Security Discussion Framework
ROSITA: Current Status
Software development underway
  In Phase 1: 16-week development; clinical data only; no Medicaid
  Phase 2: Medicaid + record linkage
OMOP data model V4 finalized!
  Clinical & financial extensions
All SAFTINet partners have begun ETL activities
  Two sites have provided full ETL extracts for development and testing
Everything is/will be available
Questions?
Michael.Kahn@ucdenver.edu