Steven Newhouse, Head of Technical Services




Challenges at EMBL-EBI
Steven Newhouse, Head of Technical Services

European Bioinformatics Institute
- Outstation of the European Molecular Biology Laboratory
- International organisation created by treaty (cf. CERN, ESA)
- 20-year history of service provision and scientific excellence
- EMBL-EBI has 550+ staff & 50 Million budget
- Provides services to a wide range of users using an easy-as-possible usage model
  - Thin-client model: web browser & web services
  - Equivalent to SaaS

The Challenge Facing Bioinformatics
- Volume and variety of genomic data expanding
  - Data at EBI doubling every year - replication is challenging
  - >45,000 cores & 46PB (but need more!)
- EMBL-EBI provides
  - Access to both public and managed access data sets
  - Web and programmatic access to services (3M unique users)
- Challenge is to support complex analysis
  - Bespoke workflows and tools across a variety of domains
  - Increasingly issues with moving data
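The "doubling every year" claim can be turned into a rough capacity estimate. The sketch below is only an illustration: the function name and the idea of a fixed doubling period are assumptions, and it is not the model behind the projections shown elsewhere in the talk.

```python
def projected_capacity_pb(current_pb: float, years: float,
                          doubling_period_years: float = 1.0) -> float:
    """Capacity needed after `years` if demand doubles every
    `doubling_period_years` (assumed constant, which is a simplification)."""
    return current_pb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Starting point taken from the slide: 46PB of storage today.
    for year in range(4):
        print(f"year +{year}: {projected_capacity_pb(46, year):.0f} PB")
```

Changing `doubling_period_years` shows how sensitive the estimate is to the growth assumption.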

Projected Hardware Requirements and Cost (M)
[Chart: cost axis 0-10, time axis 1-7; series: Expected Growth, DCC, Baseline Spend]
In 2020: Storage: 1,920PB; Cores: 230,000

Impact on EMBL-EBI's Infrastructure
- Grow the capacity of the current data centres
  - Commodity infrastructure: blades and NAS (50 100 racks)
  - RDBMS and SAN for high throughput transaction processing
  - Tape backup is no longer feasible
- Provide a resilient topology by geographical separation
  - Against local & regional disaster in the UK
  - Against national disaster through international collaboration
- How to enable science on big data?

What are the Challenges?
- Storing the data
- Analysing the data

Overview: EMBL-EBI IT infrastructure
[Diagram: four sites behind a Global Server Load Balancer]
- Hinxton Production centre: Production Area and Staging Area (to be released), each with COMP, DBs, SAN storage, NAS storage, LAN network; WEB
- Flint Cross Disaster Recovery centre: standby DBs, mirrored servers, COMP, SAN storage, NAS storage, LAN network
- Power Gate Tier III London data centre: Published DBs, WEB, COMP, SAN storage, NAS storage, LAN network
- Oliver's Yard Tier III London data centre: Published DBs, WEB, COMP, SAN storage, NAS storage, LAN network
- Data centre virtualised throughout with VMware

Centralization & specialization
- Data is submitted to specialized centralized repositories.
- Current situation.
[Diagram labels: production, centralization]

Federation
- If data gets bigger, the data might have to stay where it is produced.
- We might have to provision data producers with storage and computation.
- Data might be pulled instead of pushed into centralized repositories.
[Diagram labels: production, centralization]

So what does such a change mean?
- Data volumes prohibit casual download
  - Difficult to replicate data for local workflows
  - Need to move computation to where the data is located
- All data is not going to be available centrally
  - Need to federate data to get a global view
  - Need to move computation to where the data is located
- Computational capacity may not be near the data
  - Move the data to where there is computational capacity
  - Policy driven data replication
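The trade-off above (move the computation to the data, or replicate the data to where capacity exists) can be sketched as a simple placement policy. Everything in this example is hypothetical: the `Site` class, the `choose_placement` function and the `max_replication_tb` threshold are illustrative names, not part of any EMBL-EBI system.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_cores: int


def choose_placement(dataset_tb: float, cores_needed: int,
                     data_site: Site, compute_site: Site,
                     max_replication_tb: float = 10.0) -> str:
    """Pick where an analysis should run under a toy replication policy."""
    # Preferred option: run next to the data if that site has capacity.
    if data_site.free_cores >= cores_needed:
        return f"run at {data_site.name} (move computation to the data)"
    # Otherwise replicate, but only if the policy allows a copy this large.
    if (compute_site.free_cores >= cores_needed
            and dataset_tb <= max_replication_tb):
        return f"replicate to {compute_site.name} (move data to the computation)"
    return "defer (no site satisfies the policy)"


if __name__ == "__main__":
    archive = Site("data-site", free_cores=10)
    cloud = Site("compute-site", free_cores=1000)
    print(choose_placement(0.2, 100, archive, cloud))   # small data set
    print(choose_placement(60.0, 100, archive, cloud))  # too large to copy
```

A real policy would also weigh link bandwidth, access restrictions on managed data sets and cost; the single size threshold here only stands in for those rules.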

EMBL-EBI Embassy Cloud
- Pilot service hosted at EMBL-EBI data centres
  - Logically isolated outside EMBL-EBI's LANs
  - Secure flexible infrastructure for both tenant and host
- File based access to EMBL-EBI's data sets
  - Currently, only the 1000 Genomes dataset exposed
- Academic and commercial users of EMBL-EBI's big data
  - Undertaking their analysis with their data
- Resources exposed using VMware's vCloud Director
  - Provides IaaS web management interface for tenants

Why Embassy Cloud?
- An embassy is sovereign territory in a host country
  - Host Country: EMBL-EBI Data Centre
  - Sovereign Territory: Host Country not allowed to enter
- Virtualisation provides the protection for tenant and host
  - Host puts boundaries in place to protect it from the tenant
  - Tenant has freedom and control within those boundaries
- Added value from EMBL-EBI over other clouds:
  - Machines and data hosted in known jurisdiction
  - File access to hosted data sets (public & managed access)
  - Direct network access to public EMBL-EBI services

Embassy Cloud
[Diagram: Internet; EMBL-EBI Firewall; Global Load Balancer; EBI Services & Databases; Embassy Cloud; Exposed Resources]

Moving Bytes
- Needed to:
  - Move data between sites
  - Move virtual machine images to the data
- Exploring the use of Globus Online
  - GridFTP at EMBL-EBI and CSC
  - Exploit existing light path
  - Expose public and private data for download
- Issues: None at the moment

Enlighten Your Research (GÉANT)
- Explore cross-site VM operation using light-paths
  - Sites in NL, UK & FI
  - Provision networks on demand
- Use Case: Analysis needs significant resources and data
  - Moving beyond the scope of local clusters
- Goal: Distribute analysis and data over multiple clouds
- Activity since November 2013:
  - Liaising with sites and NRENs for bandwidth on demand
  - CSC & EMBL-EBI using existing light-path and different data movement protocols

Cross site VM Operation
[Diagram: VMs and analysis tools distributed across sites linked by three 1GB lightpaths]
- Sites: CSC, EMBL-EBI, NBIC, University of Groningen
- Networks: Janet, Funet, SURFnet
- Labels: ENA 3.2PB; Computation; Analysis tools; Chipster 200GB; Galaxy 50GB; GoNL 60TB
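A back-of-the-envelope calculation shows why the sizes on this diagram matter. The sketch below assumes that "1GB lightpath" means a 1 Gbit/s link, that sizes are decimal units, and that the link is fully available to a single transfer; all three are assumptions, and real throughput would be lower.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Idealised time to push `size_bytes` over a link.

    `efficiency` (0-1] is a hypothetical factor for protocol overhead
    and sharing; 1.0 means the full line rate is achieved.
    """
    return size_bytes * 8 / (link_bits_per_s * efficiency)


GB, TB, PB = 1e9, 1e12, 1e15
LINK = 1e9  # assumed: 1 Gbit/s lightpath

if __name__ == "__main__":
    # Sizes as they appear on the slide.
    for label, size in [("Galaxy 50GB", 50 * GB), ("Chipster 200GB", 200 * GB),
                        ("GoNL 60TB", 60 * TB), ("ENA 3.2PB", 3.2 * PB)]:
        hours = transfer_seconds(size, LINK) / 3600
        print(f"{label}: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```

Under these assumptions the tool images move in minutes, the 60TB data set takes several days, and the 3.2PB archive would take the better part of a year, which is the case for moving virtual machines to the data.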

Other Cloud Activity at EMBL-EBI
- Use Amazon to provide geographical distribution
  - Direct link to globally replicate databases
- HelixNebula
  - Integration of commercial cloud providers with big research
  - Benefit of additional security assurances
    - For use by pharmaceutical companies
    - For on-demand personalised medicine
- Explore using IaaS to supplement/replace data centres
  - Put DC on cloud, scale out services (service + database), etc.

The Future
[Diagram: layered architecture, top to bottom]
- Private Analysis; Public Service
- Integrating Platform (deal with discovery, provision & placement)
- EMBL-EBI IT (Services, Research, Clusters); Elixir Community Services; Elixir Service, each on Virtualised Infrastructure
- Cloud Providers; EMBL-EBI Data Centre; Infrastructure Provider; Data Provider, each offering Storage and Compute
- GÉANT Network

The Future: Exploitation by Elixir
- An e-infrastructure for Life Science
- Understand key issues
- Replicate datasets
- GÉANT, DANTE, EGI.eu, PRACE, etc.
- Portable VMI: repository and execution
- Providing secure isolated IaaS
- Federating IaaS resources from Elixir, EGI, HN

Any questions?

Contact Points
- steven.newhouse@ebi.ac.uk
- embassycloud@ebi.ac.uk

Acknowledgements
- Andy Cafferkey
- Rafael Jimenez
- Pete Jokinen
- EMBL-EBI Systems Team