1R01HG0007078: Privacy-Preserving Sharing and Analysis of Human Genomic Data. XiaoFeng Wang and Haixu Tang, IUB


Project Objectives
- Study of scalable, privacy-preserving data analysis techniques, particularly those for public clouds
- Study of privacy-preserving data dissemination techniques for data sharing through NIH data centers

Project Progress Summary
- Secure Data Analysis
  - Improving our privacy-preserving read-mapping techniques
  - Exploring use of the technique for microbial filtering
- Secure Data Sharing
  - Development of privacy-preserving data selection techniques
  - Identification of the most promising way for supporting privacy-preserving GWAS studies
  - Organization of the first Critical Assessment of Data Privacy and Protection (CADPP)
- Others
  - Evaluating privacy risks in releasing clinical proteomic data
  - Survey study on human genome privacy and building of a portal to support follow-up research

Hybrid-cloud Secure Data Analysis
- Partitions the computation between the private and public clouds to support secure computation outsourcing
- [Figure: Private Cloud and Public Cloud]
- For microbial filtering, we found that the computation can be performed almost entirely in the public cloud on encrypted data.
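To make the private/public split concrete, here is a minimal sketch of one generic way to partition a sequence-matching task. It is an illustration only and is not claimed to be the project's read-mapping or microbial-filtering system. The assumptions are: the private cloud holds a secret key and replaces every k-mer with a keyed HMAC digest, and the public cloud only compares digests. The function names `keyed_seeds` and `public_match`, the key, and the toy sequences are all hypothetical.

```python
import hashlib
import hmac


def keyed_seeds(sequence, k, key):
    """Private-cloud step: replace each k-mer with a keyed HMAC-SHA256 digest."""
    return [
        hmac.new(key, sequence[i:i + k].encode(), hashlib.sha256).hexdigest()
        for i in range(len(sequence) - k + 1)
    ]


def public_match(read_digests, reference_index):
    """Public-cloud step: look up digests only; returns -1 where nothing matches."""
    return [reference_index.get(d, -1) for d in read_digests]


key = b"hypothetical-secret-key"  # assumption: known only to the private cloud
reference = "ACGTACGGTCA"         # toy reference sequence
k = 4

# The private cloud builds a digest -> position index (first occurrence wins)
# and ships only this index to the public cloud.
reference_index = {}
for pos, digest in enumerate(keyed_seeds(reference, k, key)):
    reference_index.setdefault(digest, pos)

read = "TACGG"  # toy sequencing read
hits = public_match(keyed_seeds(read, k, key), reference_index)
print(hits)
```

In this sketch the public side performs all of the lookups but never sees a plaintext k-mer; extending candidate positions into full alignments would remain a private-side step.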

Data Analysis and Dissemination Services Programs Results

Technical Highlights
- LD-based noise adding technique
  - Dimension reduction
  - Noise adding that preserves utility in pilot data
- Utility assessment mechanism
  - For identification of the data source most likely to have usable data
- Evaluation
  - Use 4 popular association tests
  - For releasing a locus with 180 SNPs
  - In the vast majority of cases, the user can find the most useful dataset without leaking any information
- Product
  - A paper is being considered by JAMIA (revision submitted).
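The interplay between noise adding and an association test can be sketched as follows. This is a simplified stand-in, not the LD-based technique named above: it assumes plain Laplace noise added to minor-allele counts and uses the allelic chi-square test as one example of an association test. The counts, the noise scale, and the helper names are hypothetical.

```python
import random


def allelic_chi_square(case_minor, case_total, ctrl_minor, ctrl_total):
    """Pearson chi-square statistic (1 d.o.f.) for a 2x2 allele-count table."""
    table = [
        [case_minor, case_total - case_minor],
        [ctrl_minor, ctrl_total - ctrl_minor],
    ]
    n = case_total + ctrl_total
    row_sums = [case_total, ctrl_total]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, total, scale, rng):
    """Perturb a count and clamp it to the valid range [0, total]."""
    return min(max(count + laplace_noise(scale, rng), 0.0), float(total))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical allele counts: 60/200 minor alleles in cases, 40/200 in controls.
exact = allelic_chi_square(60, 200, 40, 200)
noisy = allelic_chi_square(
    noisy_count(60, 200, scale=5.0, rng=rng), 200,
    noisy_count(40, 200, scale=5.0, rng=rng), 200,
)
print(f"exact statistic: {exact:.3f}, statistic on noisy counts: {noisy:.3f}")
```

Comparing the two statistics over many SNPs and noise draws gives a rough sense of how much utility a given noise scale costs.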

The CADPP Competition
- Evaluate how effective the best security technologies could be in protecting patient privacy and preserving data utility
- The first challenge focuses on the tasks for sharing aggregate SNP data (allele frequencies) for GWAS studies
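For readers unfamiliar with the aggregate data involved, the sketch below shows how per-SNP allele frequencies can be derived from individual genotypes. It assumes the common 0/1/2 coding (number of minor-allele copies per diploid individual); the function name and the toy genotype matrix are hypothetical.

```python
def allele_frequencies(genotypes):
    """Per-SNP minor-allele frequency from rows of 0/1/2 genotype codes."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    # Each diploid individual contributes two alleles per SNP.
    return [
        sum(person[j] for person in genotypes) / (2 * n_individuals)
        for j in range(n_snps)
    ]


# Hypothetical case group: 4 individuals, 3 SNPs.
cases = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 0],
    [1, 2, 0],
]
freqs = allele_frequencies(cases)
print(freqs)
```

Only the frequency vector, not the genotype rows, would be shared in the aggregate-data setting.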

Research Tasks
- Development of techniques for analyzing the level of information leaks from computation results
- Preliminary exploration of approaches for secure data dissemination

Teams and Tasks
- 6 teams: U. Oklahoma, UT Dallas, McGill University, CMU, UT Austin, IU (Baseline)
- Scenarios: Privacy Protection for GWAS
  - Task 1: raw data sharing
  - Task 2: outcome release

What has been Learnt
- Task 1
  - It remains a challenge to share aggregate human genomic data in a privacy-preserving way while maintaining its utility in genome-wide association studies (GWAS).
  - Even for a single genomic locus involving a few hundred SNPs, the utility of the data was largely damaged after noise was added to ensure privacy protection.
  - It is unlikely that current privacy-preserving techniques will scale well for sharing whole human genomic data.
- Task 2
  - Privacy-preserving techniques work surprisingly well for publishing outcomes of GWAS-like analyses.
  - High accuracy can be achieved when, from the users' perspective, only a small number of the most significant SNPs are of concern.
  - This task is well aligned with the centralized data/computing model: the centralized data/computing center will host human genomic data as well as services for customized analyses on these data, and will only release the results of these analyses to users.
  - We encourage the community to improve the approaches to this task!
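The outcome-release setting of Task 2 can be illustrated with a short sketch in which a data center releases only the identifiers of the top-k SNPs after perturbing their test statistics. This is an assumed, simplified scheme and not any competing team's method; it makes no claim of a formal privacy guarantee, since calibrating the noise scale to the sensitivity of the statistic is omitted. The SNP identifiers, scores, and scale are hypothetical.

```python
import random


def noisy_top_k(scores, k, scale, rng):
    """Return the k SNP ids with the largest Laplace-perturbed scores."""
    noisy = {
        snp: score + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for snp, score in scores.items()
    }
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


# Hypothetical association statistics held by the data center.
scores = {"snp1": 30.0, "snp2": 25.0, "snp3": 2.0, "snp4": 1.0, "snp5": 0.5}

released = noisy_top_k(scores, k=2, scale=2.0, rng=random.Random(0))
print(released)
```

When the most significant SNPs are well separated from the rest, as in this toy example, moderate noise rarely changes which identifiers are released, which is consistent with the observation that accuracy is high when only a few top SNPs matter.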

Other Outcomes
- BMC special issue on Human Genome Privacy
- Development of a web service for automatic evaluation of privacy-preserving GWAS techniques (https://humangenomeprivacy.ucsd-dbmi.org/)
- Competition results are made public: http://www.humangenomeprivacy.org/

Risks in releasing clinical proteomic data
- Presence of identifiable information in clinical proteomic data
- The risks could be mitigated through pre-processing that removes such data
- Further study is needed to understand the privacy/utility balance

Dissemination of Research Outcomes
- Organization of the 1st Workshop on Genome Privacy (GenoPri): July 15, Amsterdam
- Review paper on Privacy Risk and Mitigation Techniques on Genomic Research: http://arxiv.org/abs/1405.1891

Next Steps
- Privacy-preserving data analysis
  - Complete the development of the read-mapping system
  - Further study on microbial filtering
  - Research on other crypto techniques for genomic data analysis on the public cloud
- Privacy-preserving data sharing
  - Improve the scalability of data-selection techniques
  - Analyze risks of information leaks on the data center
- Organization of the 2nd CADPP competition with iDASH
  - A tentative topic is to evaluate the performance of techniques for data analysis on encrypted data (e.g., homomorphic encryption, secure multi-party computation)
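As a small illustration of the secure multi-party computation idea mentioned above, the sketch below uses additive secret sharing so that several sites can compute a combined allele count without any party seeing another site's individual count. It is a textbook construction under a semi-honest assumption, not a description of any system planned for the competition; the modulus, the site counts, and the function names are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumption: any modulus larger than the true total works


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any proper subset alone is uniformly random."""
    return sum(shares) % MODULUS


# Hypothetical minor-allele counts held by three sites.
site_counts = [120, 75, 310]
n_parties = 3

# Each site splits its count and sends one share to each computing party.
all_shares = [share(count, n_parties) for count in site_counts]

# Each party adds the shares it received; it never sees a raw count.
partial_sums = [
    sum(site_shares[p] for site_shares in all_shares) % MODULUS
    for p in range(n_parties)
]

total = reconstruct(partial_sums)
print(total)
```

Addition is the easy case for secret sharing; the performance question raised for the competition arises mainly for richer analyses that also need multiplications or comparisons.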