Opportunities and Limitations of Big Data to Address Diversity. Shawn Murphy MD, Ph.D.



Similar documents
Instrumenting the Healthcare Enterprise for Big Data Research. Shawn Murphy MD, Ph.D. Leicester, October

Use of the Research Patient Data Registry at Partners Healthcare, Boston

The i2b2 Hive and the Clinical Research Chart

i2b2 Clinical Research Chart

i2b2 Clinical Research Chart

i2b2 Clinical Research Chart

Medical Informatics: A Fishing Story

Balancing Big Data for Security, Collaboration and Performance

Employing SNOMED CT and LOINC to make EHR data sensible and interoperable for clinical research

Informatics for Integrating Biology and the Bedside. i2b2 User Guide. Import Data View. Document Version: I2b2 Software Release: 1.

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime

A Commercial Approach to De-Identification Dan Wasserstrom, Founder and Chairman De-ID Data Corp, LLC

@mandl. Bringing Big Data to the Point of Care by Creating the App Store for Health

EMR Systems and the Conduct of Clinical Research. Daniel E Ford, MD, MPH Vice Dean for Clinical Investigation Johns Hopkins School of Medicine

Informatics Resources for Clinical Research. Jim Cimino, Geoff Gordon, Matt Wyatt and James Willig UAB Informatics Institute April 6, 2016

Enhancing Functionality of EHRs for Genomic Research, Including E- Phenotying, Integrating Genomic Data, Transportable CDS, Privacy Threats

GENETIC DATA ANALYSIS

How Can Institutions Foster OMICS Research While Protecting Patients?

Apps to display patient data, making SMART available in the i2b2 platform

From Fishing to Attracting Chicks

NIH s Genomic Data Sharing Policy

Environmental Health Science. Brian S. Schwartz, MD, MS

EHR OpenNESS: Innovation and Uptake

Factors for success in big data science

Predictive Analytics Drives Population Health Management

Meaningful Use and Beyond. John D. Halamka MD CIO, Beth Israel Deaconess Medical Center and Harvard Medical School

Enabling the Health Continuum with Informatics. Jeroen Tas Healthcare Informatics.Solutions.Services

NIH Genomic Data Sharing (GDS) Policy Guidance Memo #2 1

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences

Clinical and research data integration: the i2b2 FSM experience

A leader in the development and application of information technology to prevent and treat disease.

Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE

8/27/2014. Office of Research Informatics(ORI) CORI. Introduction- The Office of Research Informatics (ORI)?

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects

Big Data for Population Health

An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives

EDITORIAL MINING FOR GOLD : CAPITALISING ON DATA TO TRANSFORM DRUG DEVELOPMENT. A Changing Industry. What Is Big Data?

Smarter Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research

Data Repository (CRC) Cell

Using population health data analytics to optimize medical device investment decisions

Biobanks: DNA and Research

Ganzheitliches Datenmanagement

BRISSkit: Biomedical Research Infrastructure Software Service kit. Jonathan Tedds. #brisskit #umfcloud

BRISSkit: Biomedical Research Infrastructure Software Service kit

Records and Clinical Trials

Big Data for Population Health and Personalised Medicine through EMR Linkages

Visual Analytics to Enhance Personalized Healthcare Delivery

ECRIN (European Clinical Research Infrastructures Network)

Transla6ng from Clinical Care to Research: Integra6ng i2b2 and OpenClinica

Interoperability and Analytics February 29, 2016

SINTERO SERVER. Simplifying interoperability for distributed collaborative health care

Biomedical Big Data and Precision Medicine

MemorialCare Health System: Steven Beal, VP Information Services

IMPROPER USE OF MEDICAL INFORMATION

Kick off Meeting November 11 13, MERCY CLINIC EAST COMMUNITIES Management of Patients with Heart Failure (HF)

Moffitt Cancer Center, M2Gen and ConvergeHEALTH Collaboration

I. INTRODUCTION DEFINITIONS AND GENERAL PRINCIPLE

Center for. Clinical Informatics. Clinical Informatics. Mission Statement

An Easily Accessed Clinical Research Database from your Epic EMR

Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider

Accessing Data for Clinical Researchers: The Boston Medical Center Clinical Data Warehouse

Overview of an Enterprise HIE at Virtua Health

Integration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes

Cerner i2b2 User s s Guide and Frequently Asked Questions. v1.3

Informatics Domain Task Force (idtf) CTSA PI Meeting 02/04/2015

Delivering the power of the world s most successful genomics platform

Data-driven Medicine in the Age of Genomics Overcoming the Challenge With Advanced Molecular Analytics

Incorporating Research Into Sight (IRIS) Essentia Rural Health Institute Marshfield Clinic Penn State University

Use Cases for Argonaut Project. Version 1.1

Transcription:

Opportunities and Limitations of Big Data to Address Diversity Shawn Murphy MD, Ph.D.

The Research Data Warehouse at Partners Healthcare Partners Enterprise Research Patient Data Registry Multiple Systems at Partners: Billing Data Epic Data Research Data (consent to contact) Specimen Data Laboratory Data visit_dimension patient_dimension PK Encounter_Num 1 PK Patient_Num observation_fact 1 Start_Date PK Patient_Num Birth_Date End_Date PK Encounter_Num Death_Date PK Concept_CD Active_Status_CD Vital_Status_CD Location_CD* PK Observer_CD Age_Num* PK Start_Date Gender_CD* PK Modifier_CD Race_CD* observer_dimension PK Instance_Num Ethnicity_CD* PK Observer_Path End_Date ValType_CD Observer_CD TVal_Char Name_Char NVal_Num concept_dimension ValueFlag_CD PK Concept_Path Observation_Blob modifier_dimension PK Modifier_Path Concept_CD Name_Char Modifier_CD Name_Char

Research Patient Data Registry (RPDR) at Partners Healthcare to find patient cohorts for clinical research 1) Queries for aggregate patient numbers - Warehouse of in & outpatient clinical data - 6.5 million Partners Healthcare patients - 2.2 billion diagnoses, medications, procedures, laboratories, & physical findings coupled to demographic & visit data - Authorized use by faculty status - Clinicians can construct complex queries - Queries cannot identify individuals, internally can produce identifiers for (2) Query construction in web tool Encrypted identifiers Z731984X Z74902XX...... Deidentified Data Warehouse 2) Returns detailed patient data - Start with list of specific patients, usually from (1) - Authorized use by IRB Protocol - Returns contact and PCP information, demographics, providers, visits, diagnoses, medications, procedures, laboratories, microbiology, reports (discharge, LMR, operative, radiology, pathology, cardiology, pulmonary, endoscopy), and images into a Microsoft Access database and text files. 0000004 2185793...... OR Real identifiers 0000004 2185793......

2014 s usage of RPDR > 5100 registered users over 12 years, 655 new in 2014 583 teams/year gathering data for research studies 2634 detailed patient data sets returned to these teams in 2014, containing data of 24.7 million patient records. From a survey of 153 teams Importance of the data received from the RPDR was evaluated in relation to the study it was supporting. $94-136 million total research support critically dependent on RPDR from patient data received throughout life of funding. Usefulness of Detailed Data 106 Total Responses Not Useful 15% Useful 42% Critical 43% ~300 data marts were created to support hospital operations, representing about 80 million patient records

Security and Patient Confidentiality of Step 1 All patients at Partners are added when seen in Affiliated Covered Entity HIPAA notification that their data may be used for research and operations upon registration. RPDR data is anonymized at the Query Tool. Aggregated numbers are obfuscated to prevent identification of individuals; automatic lock out occurs if pattern suggests identification of an individual is being attempted. Queries done in Query Tool available for review by RPDR team, a user lock out will specifically direct a review. Concept of established medical investigator is promoted by classification as a faculty member or designated list from IRB.

Security and Patient Confidentiality of Step 2 Only studies approved by designated Directors of Operation at Hospitals or the Institutional Review Board (IRB) are allowed to receive detailed data. Queries may be set up by a workgroup member, but faculty sponsor on Approval / IRB protocol must directly approve all queries that return identified data. Special controls exist when distributing data regarding HIV antibody and antigen test results, substance abuse rehab programs, and genetic data, due to specific state and federal laws.

FINDING PATIENTS Query items Person who is using tool Query construction Results - broken down by number distinct of patients

MATCHING PATIENTS Previous query items Case set construction Control set construction Estimate set size and run program

Identified data is gathered Output files placed in special directory Data is gathered from RPDR and other Partners sources Files include a Microsoft Access Database

Creating Enterprise Wide Query and Analysis System for Big Data RPDR Redcap Survey Data Public Health (CMS) i2b2 Navigator Imaging DICOM Partners Biobank Genomics Sets Notes / Text Repository Big Data Commons Integrates disparate islands of patient data (clinical and research data)onto a Common Big Data Platform 29

The Partners Biobank Samples Consent The Partners Biobank provides samples (plasma, serum, and DNA) collected from consented patients. Data Research Discoveries 31,000 patients have consented to date! Samples are available for distribution to Partners investigators* to help identify novel Personalized Medicine opportunities that reduce cost and provide better care *with required approval from the Partners Institutional Review Board (IRB). Improved Clinical Care for All Patients

Biobank Integrative Genomics Strategy Partners BioBank Samples (Whole Blood Extracted DNA/RNA) Genotyping Transcriptome Epigenome Profiling Illumina MEGArray: Multi-Ethnic GWAS/Exome SNP Array Whole Transcriptome Analysis: RNA-seq Methylation Analysis: HumanM450K Array Array Cost: $59/ sample Array Cost: $40-50/sample Array Cost: $150/sample Genome/Transcriptome Analysis: ~$100/sample Genome/Transcriptome/Epigenome Analysis: ~$260/sample

Curating a Disease Algorithm 1. Create a gold standard training set. 3. Develop the classification algorithm. Using the data analysis file and the training set from step 1, assess the frequency of each variable. Remove variables with low prevalence. Apply adaptive LASSO penalized logistic regression to identify highly predictive variables for the algorithm 2. Create a comprehensive list of features (concepts/variables) that describe the phenotype of interest 4. Apply the algorithm to all subjects in the superset and assign each subject a probability of having the phenotype 95% Specificity 32

Biobank Portal Curated Diseases Validated Phenotype Count* Predictive Positive Value Bipolar Disease 71 89% Congestive Heart Failure 387 90% Coronary Artery Disease 2,420 97% Crohn s Disease 453 90% Multiple Sclerosis 94 90% Rheumatoid Arthritis 550 90% Type 2 Diabetes Mellitus 1,887 97% Ulcerative Colitis 330 90% Healthy Controls based on Charlson Index * Based on 15,880 patients ** Based on 21,300 patients Count** 0 10-year survival probability is >98.3% 2,206 1 10-year survival probability is >95.87% 4,343 2 10-year survival probability is >90.15% 6,545

Biobank Portal Run a Query Step 1: Run a query in the Biobank Portal Two criteria: a. The patient is a smoker (from the Health Information Survey) b. The patient agrees to be re-contacted

Partners Biobank Portal Using Curated Phenotypes

Partners Biobank Portal Download De-Identified Data

Partners Biobank Portal Specify Data for Download

Download table and select patients

Partners Biobank Portal Request Genetic Data

SCILHS Clinical Data Research Network Partners HealthCare System Boston Children s Hospital BIDMC Boston Health Net (BMC and Community Health Centers) University of California, Davis Columbia U. Medical Center and New York Presbyterian Hospital Washington University in St. Louis Wake Forest Baptist Medical Center Morehouse/Grady/RCMI U Texas Health Science Center/Houston

Distributed Query System

Workflow at the sites to find patients for a clinical trial: After a query is run across the SHRINE network, the query is automatically saved at every site The query saved at each site is transformed into a patient set The patient set is studied and sorted for the specific patients eligible for the Clinical Trial

Run Query Using SHRINE

I2b2 Workbench works directly on new database created from patient set LOCALLY. At the Local site, one will: View a patient list summary with matched criteria Review a patient s health data one at a time Designate suitable patients Export to Excel / CSV format

Review de-identified Patients

Review identified Patients

A Criteria Matcher App can help assess candidate patients using the clinical trial metadata

Patient Data Registries Innovative Tools & IT solutions Research Partners Biobank Quality Patient Care

Using Big Data to Find Similar People To Help Predict Outcomes and Support Medical Decisions Analysis Pipeline To Learn About what Differences are Important for Predicting Disease Public Data Sets To Understand the Disease Traits Caused by a Gene Variant Database Schemas Image Analysis To Help Interpret Features in Medical Images and Tissues ML Cluster Analysis

Deciding to Perform a Genetic Test Inherited Heart Disease - Cardiomyopathy

Health Innovation Platform Architecture

Treating IHD Patient with Precision Medicine Partners Big Data Commons

SMART Apps link i2b2 to the EMR Substitutable Medical Application and Reusable Technology Commissioned form the Office of the National Coordinator Allows Big Data from i2b2 to integrate Clinical Apps into the Epic EMR. Paradigm is similar to Mobile Apps with a proposed standard interface using FHIR sponsored by the National Argonaut EMR Project.

New network storage in the cloud Cloud API

delivered as a Service Self-Service in 2016

Presenting the researcher desktop Robust desktop capabilities for the Partners researcher Analysis Pipeline Public Data Sets Database Schemas Image Analysis ML Cluster Analysis

Enabling Innovation to reach into EMR SMART Apps run in EMR Got Statins? BP Centiles Cancer Risk SMART-Enabled DW - I2b2/RPDR Partners Big Data Commons SMART-Enabled Document/Pivotal SMART-Enabled EMR - Epic Big Data Analytics performed with High Performance Computing

Tribute to RPDR/I2b2 Core Team Shawn Murphy Christopher Herrick Mariah Mitchell Vivian Gainer Alyssa Goodson Martin Rees Charles Wang Laurie Bogosian Stacey Duey Andrew Cagan Michael Mendis Lori Phillips Janice Donahoe Nich Wattanasin David Wang Biobank Team Natalie Boutin Scott Weiss Vivian Gainer HPC and Cloud Team Brent Richter Jon Jackson Alan Harris Genomics Innovation Team Sandy Aronson Heidi Rehm Calum MacRea

I2b2 and SMART Information and Software on the Web i2b2 Homepage (https://www.i2b2.org) i2b2 Software (https://www.i2b2.org/software) i2b2 Community Site (https://community.i2b2.org) SMART Platforms Homepage (http://smarthealtit.org)