BiobankCloud: a PaaS for Biobanking




The PaaS for Biobanking: A Platform-as-a-Service for Biobanking
www.biobankcloud.com

Presented by: Paulo Esteves Veríssimo, Univ. de Lisboa, Faculdade de Ciências (FCUL), LaSIGE, Portugal, pjv@di.fc.ul.pt, http://www.di.fc.ul.pt/~pjv
64th IFIP WG 10.4 Meeting, Visegrad, Hungary, June 27-30, 2013
Financed by the European Commission 7th Framework Programme.

BiobankCloud: a PaaS for Biobanking
Duration: Dec 2012-2015
Budget: ~3 million euros
Goal: Provide a viable platform-as-a-service (PaaS) for biobanking to support the secure storage, analysis, and sharing of genomic data.
Research challenges: Design and build services for cloud platforms that securely store and analyze sensitive personal data, while conforming to a regulatory framework.

Genomics and Big Data
We will soon be generating more genomic data than we can currently store and process securely. Handling terabytes of information has become the norm for genomics, and the trend is to scale up. The use of large genomic data sets raises systems research challenges concerning the secure and efficient storage and analysis of such data.

Genomics Big Data: a scent of future trends.

The cost per genome is bound to decrease
[Figure: cost per genome vs. Moore's Law]

Growth in NGS capability vs. Kryder's law
[Figure: growth in disk storage capacity vs. growth in NGS data generation capability]
The current reality is already significant.

Increasing number of research databases
LifeGene: ~500,000 people

Sweden has 73 opt-out Quality Registries
http://www.kvalitetsregister.se/om_kvalitetsregister/quality_registries

A Coalition of the Willing
In Europe, BBMRI is a transnational network of biobanks with the goal of sharing study data for research.

Storing and computing with this data
Scalable storage on commodity hardware: many genomics research groups still run expensive SANs (storage area networks) or file systems that require expensive interconnects.
[Figure labels: "My data please"; Reliable Storage Service; SAN; Unreliable Commodity Servers]

[Figure: BiobankCloud PaaS, with immutable genomic data and mutable metadata]
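The split between immutable genomic data and mutable metadata can be illustrated with a small content-addressed store. This is only a sketch under stated assumptions: the class `GenomicStore`, its method names, and the use of SHA-256 digests as keys are hypothetical and are not taken from the BiobankCloud design.

```python
import hashlib


class GenomicStore:
    """Hypothetical store: write-once data blobs, freely editable metadata."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes, never overwritten
        self._meta = {}   # digest -> dict, may change over time

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so a blob cannot be
        # modified in place; different content yields a different key.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._meta.setdefault(digest, {})
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def annotate(self, digest: str, **fields) -> None:
        # Metadata is mutable: fields can be added or updated later.
        if digest not in self._blobs:
            raise KeyError(digest)
        self._meta[digest].update(fields)

    def metadata(self, digest: str) -> dict:
        return dict(self._meta[digest])
```

Because the data is never rewritten, it can be replicated or cached without coordination, while only the small metadata records need consistent updates.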

But we can't let the data leave our site.
You buy the NGS machines. You might want to use affordable compute and storage, but you have to keep your genomic data on-site.
[Figure labels: ACME Web Services; European Data Protection Directive. Taken from Illumina BaseSpace]

What is the real privacy?

Data Leakage
Ponemon Institute, 3rd Annual Benchmark Study on Patient Privacy and Data Security: 80 health care organisations

Proposed solutions

Overbank
[Figure: Biobank A, Biobank B, and Biobank C over Rackspace Files, Google Cloud Storage, Windows Azure Blob, and Amazon S3]
Dec. 3-4, 2012

Six Research Ideas
Wide-area state machine replication
Cloud-of-clouds storage
Multi-cloud MapReduce
Network coding
Software-defined networking
Secure two-party computation
[Figure labels: Consolidated; Pure Research]

Wide-Area Replication
Consistent and secure replication across datacenters.

Cloud-of-Clouds Storage
Not everything can go to public clouds. Even if we encrypt and disperse the data, there are legal limitations. But a lot of data can!
TClouds work: DepSky, C2FS (file system)
Can be improved for our workloads
Database-like system
[Figure: Biobank C over Google Cloud Storage, Rackspace Files, Amazon S3, and Windows Azure Blob]
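The idea of dispersing data so that no single provider can read it can be sketched with XOR secret sharing. This is a deliberately simplified illustration, not the DepSky protocol: it needs every share to reconstruct the data, so it provides confidentiality but no fault tolerance. The `CLOUDS` dictionaries are hypothetical in-memory stand-ins for the four providers named on the slides, and all function names are assumptions.

```python
import secrets
from functools import reduce

# Hypothetical in-memory stand-ins for the providers; no real cloud API is used.
CLOUDS = {name: {} for name in
          ("amazon-s3", "google-cloud-storage", "rackspace-files", "azure-blob")}


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n XOR shares; any n-1 of them look like random bytes."""
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last share is the data XORed with all the random shares.
    last = reduce(xor_bytes, shares, data)
    return shares + [last]


def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)


def put(key: str, data: bytes) -> None:
    # One share per cloud, so no single provider holds readable data.
    for store, share in zip(CLOUDS.values(), split(data, len(CLOUDS))):
        store[key] = share


def get(key: str) -> bytes:
    return combine([store[key] for store in CLOUDS.values()])
```

A usage example: `put("sample-42", b"ACGT")` followed by `get("sample-42")` returns `b"ACGT"`. A practical system would additionally need to tolerate unavailable or faulty providers, which this sketch does not attempt.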

Multi-Cloud MapReduce
What if you want to do some analysis but you need data from other biobanks?
1. Not allowed!
2. Moving data to the processing nodes is against what made MapReduce so cool!

Multi-Cloud MapReduce
Move computation, not data, without privacy violations: sandboxing, a wide-area job tracker, and a certified library of map and reduce functions.
[Figure: a wide-area JobTracker dispatching certified map (M) and reduce (R) functions to sandboxes at each site]
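The "move computation, not data" idea can be sketched as follows: each biobank runs only functions from a certified library on its own records, and only aggregated results leave the site. Everything here is an assumption made for illustration: the registry dictionaries, the `Biobank` class, the `wide_area_job` function, and the record layout are hypothetical. No real sandboxing is implemented, and the cross-site merge is only correct for associative reducers such as a sum.

```python
# Hypothetical certified library: only functions registered here may run.
CERTIFIED_MAP = {}
CERTIFIED_REDUCE = {}


def certified_map(fn):
    CERTIFIED_MAP[fn.__name__] = fn
    return fn


def certified_reduce(fn):
    CERTIFIED_REDUCE[fn.__name__] = fn
    return fn


@certified_map
def count_variants(record):
    # Emits only (variant, 1); the sample identifier never leaves the site.
    yield record["variant"], 1


@certified_reduce
def sum_counts(key, values):
    return key, sum(values)


class Biobank:
    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw data stays on-site

    def run_local(self, map_name, reduce_name):
        # Uncertified function names raise KeyError and are never executed.
        map_fn = CERTIFIED_MAP[map_name]
        reduce_fn = CERTIFIED_REDUCE[reduce_name]
        groups = {}
        for record in self._records:
            for key, value in map_fn(record):
                groups.setdefault(key, []).append(value)
        return dict(reduce_fn(k, vs) for k, vs in groups.items())


def wide_area_job(sites, map_name, reduce_name):
    """Stand-in for a wide-area job tracker: merges per-site partial results."""
    reduce_fn = CERTIFIED_REDUCE[reduce_name]
    partials = {}
    for site in sites:
        for key, value in site.run_local(map_name, reduce_name).items():
            partials.setdefault(key, []).append(value)
    return dict(reduce_fn(k, vs) for k, vs in partials.items())
```

In this sketch the job tracker only ever sees per-site aggregates, which is the property the slide argues for; whether an aggregate is itself privacy-preserving depends on the certified functions.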

Thank you! Tack! Danke! Obrigado!