UC s Research Enterprise, Emerging Trends / Opportunities Required Cyber Infrastructures Faculty Perspective 3/23/15 Health Science Cyberinfrastructure: Privacy Protection and High-Performance Computing with Big Human Subjects Data Lucila Ohno-Machado, MD, PhD University of California San Diego The analyses upon which this publication is based were performed under Contract Number HHSM-500-2009-00046C sponsored by the Center for Medicare and Medicaid Services, Department of Health and Human Services.
Decision Support Tools Guidelines, Alert & Reminders Patient Interaction Data Collection Tools Clinical Data Warehouse Communication Strategies Consumer Health Informatics Medical Education Predictive Modeling Evaluation Methods Data De- Identification Privacy Technology Data Integration Genomics Proteomics Sensors Data Analysis Statistics Machine Learning Data Structuring Natural Language Processing Data Modeling
slide from Dr. Xiaoqian Jiang 3 Medicine in the Era of Big Data Biomedical science is moving towards datadriven approaches 60+ genetic variants are adopted in clinical practice Sequencing is getting cheaper Family history data is consider as "the best genetic test available
Big Data Announcement 4
Types of Health Sciences Big Data Several terabytes of data are being generated every day and samples are being collected Genomes Images Body sensors Environmental sensors Electronic Health Records Personal Health Records Pharmacy Social media
Precision Medicine Workshop at NIH EHR Group recommendations Patient participation: Blue button rights (Synch for Science) Patient portals (MyChart): patientreported outcomes, research match UC could build a large portion of the 1M people cohort Cutting edge research attracts patients Strengths in cancer, genomics, analytics, robotics, imaging EHR data already harmonized to national standards Point of care contact via EHR, mhealth, sensors Health gaming, cloud, privacy Biorepositories
Learning Healthcare System Secure HIPAA cloud storage & compute for big data Predictive analytics, machine learning Natural language processing (extract data from clinical notes, specialty reports) Human-computer interaction expertise Consent management system, biorepository Benchmarking Integration at the point of care 7
Integrating Different Types of Data Genotype genome transcription RNA transcriptome translation Protein proteome Phenotype physical exam, imaging, monitoring systems Physiology tests Metabolites laboratory
9
Specific Information Can Support Clinical Decisions This patient has DPYD deficiency 5-FU/Capecitabine toxicity is increased prescribe Raltitrexed instead
Precision Medicine Genotype-phenotype correlations Sensor data, mhealth Clinical Applications Pharmacogenomics Targeted prevention
Health Insurance Portability & Accountability Act HIPAA Security Risk of disclosure Liability Privacy Risk of re-identification Informed consent Presidential Commission for the Study of Bioethical Issues. Privacy and progress in whole genome sequencing. http://www.bioethics.gov covers genome sequencing, exome sequencing, genomewide SNV analysis, and data from large scale genomic studies 12 HIPAA De-identified data removal of 18 identifiers, such as dates, biometrics, names, etc. expert certification of low risk of re-identification Limited data sets have dates
13
What is the problem? Imagine a public data set of genomes about a certain disease (e.g., Alzheimer s) Can you find out if your neighbor has Alzheimer s? Same problem happens with clinical data, which can be re-identified to a target patient Dates of visits Combination of patient characteristics 05/6/2015 Supported by the NIH Grant U54 HL108460 to 14
Our Needs Supported by the NIH Grant U54 HL108460 to the University of California, San Diego 15 Share access to data and computation Train the new generation of data scientists Innovative software, platform, and infrastructure Privacy protection Algorithms Tools Infrastructure Policies Federatio n Develop. Platform Research Privacy Knowledg Knowledge e & Tools & Tools CI Policie s Sharing Consent Data Services Hosting Aggreg. Sensors Clinical Genomic Exec. WWW Apps
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165) Supported by the NIH Grant U54 HL108460 to the University of California, San Diego Big Human Subjects Data Cyberinfrastructure commons differential privacy homomorphic encryption secure multiparty computation
Shifting control Patient Interface Consent Management System Healthcare Institutions Do I wish to disclose data D to U? Yes Trusted broker Patient I Sharing Look-up Data use agreements Study registry User U requests Data D on individual I I can check that U looked at my data D 05/6/2015
informed CONsent for Clinical data Use for Research iconcur Patient I Consent Management System Sharing Look-up Registry Connecting Software Trusted Entity Clinical Data Warehouse Concierge or Automated Services Query Results User U Electronic Health Record 18 Healthcare Institution
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165) Supported by the NIH Grant U54 HL108460 to the University of California, San Diego Big Human Subjects Data Cyberinfrastructure commons differential privacy homomorphic encryption secure multiparty computation
Demand Resources On-demand Virtualized Elastic Resilient Compute And Storage Technology AUTOMATE Compute D nodes Memory Disk storage Networking Powered by VMware compute request, direct upload & download of proprietary data, tool, recipe Data Tools Recipes upload & download data middleware and HIPAA security developed by idash Safe HIPAA-compliant Annotated Data deposit box Environment 05/6/2015 HIPAA and non-public data Powere d by MIDAS public data, tools, recipes
HIPAA-Private Cloud 3 computation tiers 3 storage tiers 10GbE throughout Full redundancy RSA Two Factor Auth. Remote data replication 800+ cores 7TB+ RAM 600TB+ storage More planned: 600+ cores 4TB+ RAM 200TB+ storage
Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165) Supported by the NIH Grant U54 HL108460 to the University of California, San Diego Big Human Subjects Data Cyberinfrastructure commons differential privacy homomorphic encryption secure multiparty computation
UC Clinical Data Warehouses Funded by the UC Office of the President to the NIH-funded CTSAs UL1TR000100 Integration of Clinical Data Warehouses from 5 University of California Medical Centers and affiliated institutions (>12 million patients) Aggregate results Individual-level patient data accessible according to data use agreements and, if required, IRB approval Objectives Monitor patient safety Improve outcomes Promote research
UC-Research exchange Researcher wants data about population Registries, other DBs UCSD (Epic) UC Irvine (Allscripts) Community Partners UC Davis (Epic) UCSF (Epic) Data Matching Function: Map D onto data dictionaries, use Data Warehouses derived from EHR Request for Data on Cohort Results on Harmonized Data Even if data are stored in the same Electronic Health Record system, they need to be harmonized
Big Healthcare Data, Big Clinical Sequencing Data User requests data for Quality Improvement or Research Trusted Broker(s) Identity & Trust Management Policy enforcement AHRQ R01HS19913 Clinical Data Network Diverse Healthcare Entities in 3 different states (federal, state, private) How many patients over 65 are on Warfarin or Dabigatran? What are the major and minor bleeding rates for patients on these drugs, adjusted for co-morbidities? Which CYP2C9 variants are most influential? Kim K, et al. Development of a Privacy and Security Policy Framework for a Multi-State Comparative Effectiveness Research Network. Medical Care 2013 Jiang X et al. Privacy Technology to Support Data Sharing for Comparative Effectiveness Research: A Systematic Review. Medical Care 2013 Kim KK et al. Data Governance Requirements for Distributed Clinical Research Networks: Triangulating Perspectives of Diverse Stakeholders. JAMIA 2014
Privacy-preserving distributed computing User requests data for Quality Improvement or Research Trusted Broker(s) Identity & Trust Management Policy enforcement Security Entity Meeker et al. A System to Build Distributed Multivariate Models and Manage Disparate Data Sharing Policies: Implementation in the Scalable National Network for Effectiveness Research. JAMIA (accepted) Kim K et al. Comparison of Consumers Views on Electronic Data Sharing for Healthcare and Research. JAMIA (accepted) AHRQ R01HS19913 / EDM forum Scalable National Network for Comparative Effectiveness Research
Privacy-preserving distributed computing User requests data for Quality Improvement or Research Diverse Healthcare Entities in 3 different states (federal, state, private) Trusted Broker(s) Identity & Trust Management Policy enforcement Security Entity AHRQ R01HS19913 / EDM forum Scalable National Network for Comparative Effectiveness Research Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA, 2012 Wang S et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013 Jiang W et al.. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014 Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015
pscanner patient-centered SCAlable National Network for Effectiveness Research 21 M people: 5 UCs, National VA (VINCI resource), 3 Federally Qualified Health Systems in LA (USC), RAND 3 Cohorts: Kawasaki Disease, Congestive Heart Failure, Obesity New Partners (~9 M people) Cedars-Sinai USC Keck / LA Children s San Mateo Med Ctr Intermountain Healthcare U Colorado U Washington-led WWAMI practice-based network SAFTiNet U Vermont FQHCs TCC: The Children s Clinic VM: Virtual Machine SFGH: San Francisco General Hospital OMOP: Observational Medical Outcomes Partnership
Acknowledgements Vineet Bafna Tyler Bath Jane Burns Michele Day Robert El-Kareh Claudiu Farcas Olivier Harismendy Chun-nan Hsu Zhanglong Ji Jihoon Kim Eric Levy Kevin Patrick Elena Martinez Greg Talavera Staal Vinterbo Shuang Wang Wei Wei Zia Agha Lori Daniels Jason Doctor Scott Duvall Fern Fitzhenry Pietro Galassetti Michael Hogarth Kathy Kim Cleo Maehara Michael Matheny Daniella Meeker Jonathan Nebeker Dena Rifkin Carl Stepnowski Howard Taras Adriana Tremoulet Mary Whooley UC-ReX Nick Anderson Lattice Armstead Doug Bell Lisa Dahm Leslie Yuan iconcur Elizabeth Bell Dexter Friedman Rita Germann-Kurtz Xiaoqian Jiang Ian Komenaka Paulina Paul Joe Ramsdell Richard Schwab Amy Sitapati CTRi Tony Chen Daniel Clark Jim Graczik Mike Kalichman Antonios Koures Narimene Lekmine Carol Johnson Gary Firestein PhenDISCO Son Doan Hyeon-eui Kim Ko-wei Lin Hua Xu UCSD Health System Wolf Dillmann Cleo Maehara Maryann Martone Hua Xu Cui Tao Todd Johnson Peter Rose Ricky Taira NIH U54HL108460 UL1TR000100 UH3HL108785 R01LM011392 R21LM012060 K99HG008175 T15LM011271 PCORI CDRN-1306-04819 AHRQ R01HS019913 UCOP
2008-2009 2010-2011 2012-2013 2014-2015 Clinical Data Research Data Applications Integration Personnel Systems Active Directory Electronic Health Record System Epic & Clarity HIPAA Healthcare Clinical Data Clinical Data Warehouse for Research Query Tools UC-ReX Explorer Privacy Technology UCSF Davis Irvine UCLA USC/LAC Recruitment Consent tools Custom Apps pscanner PCORI CDRN Cedars Sinai Epic CDDS New modules San Mateo Other Systems PACS, lab, etc Clinical Research Data RedCAP Velos Other DBs Scalable Network (Distributed Analytics Tools) idash HIPAA/ FISMA OVERCAST idash, CTRI, School of Medicine VA LA Clinics External data (patient reported data) UCSD s Journey idash HIPAA SHADE Images, human genomes, etc iconcur Text Mining / NLP Cancer Genomics Analytical Tools UC-ReX idash Cloud PCORI contracts Accrual for Clinical Trials FISMA De-ID Tools SCANNER K22, K99 pscanner BRIGHT PhenDISCO NLM Training Grant biocaddie