Challenges at EMBL-EBI
Steven Newhouse, Head of Technical Services
European Bioinformatics Institute
- Outstation of the European Molecular Biology Laboratory
  - International organisation created by treaty (cf. CERN, ESA)
  - 20 year history of service provision and scientific excellence
- EMBL-EBI has 550+ staff and a 50 million budget
- Provides services to a wide range of users using an easy-as-possible usage model
  - Thin-client model
  - Web browser & web services
  - Equivalent to SaaS
The Challenge Facing Bioinformatics
- Volume and variety of genomic data expanding
  - Data at EBI doubling every year; replication is challenging
  - >45,000 cores & 46PB (but need more!)
- EMBL-EBI provides
  - Access to both public and managed access data sets
  - Web and programmatic access to services (3M unique users)
- Challenge is to support complex analysis
  - Bespoke workflows and tools across a variety of domains
  - Increasingly issues with moving data
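To make the "doubling every year" claim concrete, here is a minimal Python sketch of compound growth starting from the 46PB figure on this slide. The function name `projected_capacity`, the choice of horizon, and the assumption that doubling continues unchanged are illustrative only; the sketch does not attempt to reproduce the 2020 figures on the projection chart, whose baseline year is not stated.

```python
def projected_capacity(current, years, doubling_period_years=1.0):
    """Capacity needed after `years` if demand doubles every
    `doubling_period_years` (assumed constant, which is a simplification)."""
    return current * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    storage_pb = 46  # current storage quoted on the slide, in PB
    for year in range(0, 6):
        # Hypothetical horizon of five years, for illustration only.
        print(f"+{year}y: {projected_capacity(storage_pb, year):,.0f} PB")
```

Under these assumptions, three further doublings already take 46PB to 368PB, which is why replicating the full data set becomes impractical.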
Projected Hardware Requirements and Cost (M)
[Chart. Series: Expected Growth; DCC Baseline Spend. Annotation: In 2020: Storage: 1,920PB; Cores: 230,000.]
Impact on EMBL-EBI's Infrastructure
- Grow the capacity of the current data centres
  - Commodity infrastructure: blades and NAS (50-100 racks)
  - RDBMS and SAN for high throughput transaction processing
  - Tape backup is no longer feasible
- Provide a resilient topology by geographical separation
  - Against local & regional disaster in the UK
  - Against national disaster through international collaboration
- How to enable science on big data?
What are the Challenges?
- Storing the data
- Analysing the data
Overview: EMBL-EBI IT infrastructure
[Diagram. Sites: Flint Cross Disaster Recovery centre; Hinxton Production centre (Production Area; Staging Area, to be released); Power Gate Tier III London centre; Oliver's Yard Tier III London centre. Component labels: WEB, COMP, DBs (published, standby), mirrored servers, SAN storage, NAS storage, LAN network, Global Server Load Balancer. Virtualised throughout with VMware.]
Centralization & specialization
- Data is submitted to specialized centralized repositories.
- Current situation.
[Diagram labels: production, centralization]
Federation
- If data gets bigger, the data might have to stay where it is produced.
- We might have to provision data producers with storage and computation.
- Data might be pulled instead of pushed into centralized repositories.
[Diagram labels: production, centralization]
So what does such a change mean?
- Data volumes prohibit casual download
  - Difficult to replicate data for local workflows
  - Need to move computation to where the data is located
- All data is not going to be available centrally
  - Need to federate data to get a global view
  - Need to move computation to where the data is located
- Computational capacity may not be near the data
  - Move the data to where there is computational capacity
  - Policy driven data replication
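The trade-off between moving computation to the data and moving data to the computation can be sketched as a simple placement rule. The Python below is a hedged illustration, not EMBL-EBI's actual scheduler: the function names, the 80% link efficiency, the 24 hour transfer budget, and the reading of a light path as 1 Gbit/s are all assumptions made for the example.

```python
def transfer_hours(size_tb, link_gbit_per_s, efficiency=0.8):
    """Estimated hours to move `size_tb` terabytes over a link.

    `efficiency` is an assumed fraction of nominal bandwidth actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


def choose_placement(size_tb, link_gbit_per_s, cores_needed,
                     data_site_free_cores, compute_site_free_cores,
                     max_transfer_hours=24.0):
    """Pick one of the three options listed on the slide (hypothetical rule)."""
    # Prefer running next to the data when the data site has capacity.
    if data_site_free_cores >= cores_needed:
        return "move computation to data"
    # Otherwise ship the data, but only if the transfer fits the time budget.
    if (compute_site_free_cores >= cores_needed
            and transfer_hours(size_tb, link_gbit_per_s) <= max_transfer_hours):
        return "move data to computation"
    # Too large to move on demand: replicate ahead of time under a policy.
    return "replicate data by policy"


if __name__ == "__main__":
    # Illustrative inputs: a 60TB data set and an assumed 1 Gbit/s link.
    print(f"{transfer_hours(60, 1):.0f} hours to move 60TB")
    print(choose_placement(60, 1, cores_needed=8,
                           data_site_free_cores=0,
                           compute_site_free_cores=64))
```

With these assumed numbers a 60TB transfer takes roughly a week, so on-demand copying is ruled out and the rule falls back to planned replication, while a 50GB data set moves in minutes.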
EMBL-EBI Embassy Cloud
- Pilot service hosted at EMBL-EBI data centres
  - Logically isolated outside EMBL-EBI's LANs
  - Secure flexible infrastructure for both tenant and host
- File based access to EMBL-EBI's data sets
  - Currently, only the 1000 Genomes dataset exposed
- Academic and commercial users of EMBL-EBI's big data
  - Undertaking their analysis with their data
- Resources exposed using VMware's vCloud Director
  - Provides IaaS web management interface for tenants
Why Embassy Cloud?
- An embassy is sovereign territory in a host country
  - Host Country: EMBL-EBI Centre
  - Sovereign Territory: Host Country not allowed to enter
- Virtualisation provides the protection for tenant and host
  - Host puts boundaries in place to protect it from the tenant
  - Tenant has freedom and control within those boundaries
- Added value from EMBL-EBI over other clouds:
  - Machines and data hosted in known jurisdiction
  - File access to hosted data sets (public & managed access)
  - Direct network access to public EMBL-EBI services
Embassy Cloud
[Diagram labels: Internet; EMBL-EBI Firewall; Global Load Balancer; EBI Services & Databases; Embassy Cloud; Exposed Resources]
Moving Bytes
- Needed to:
  - Move data between sites
  - Move virtual machine images to the data
- Exploring the use of Globus Online
  - GridFTP at EMBL-EBI and CSC
  - Exploit existing light path
  - Expose public and private data for download
- Issues:
  - None at the moment
Enlighten Your Research (GÉANT)
- Explore cross-site VM operation using light-paths
  - Sites in NL, UK & FI
  - Provision networks on demand
- Use Case:
  - Analysis needs significant resources and data
  - Moving beyond the scope of local clusters
- Goal:
  - Distribute analysis and data over multiple clouds
- Activity since November 2013:
  - Liaising with sites and NRENs for bandwidth on demand
  - CSC & EMBL-EBI using existing light-path and different data movement protocols
Cross site VM Operation
[Diagram. Sites: CSC; EMBL-EBI; NBIC; University of Groningen. Networks: Janet; Funet; SURFnet, joined by 1GB lightpaths. Labels: ENA 3.2PB; VM; Computation; Analysis tools; Chipster 200GB; Galaxy 50GB; GoNL 60TB.]
Other Cloud Activity at EMBL-EBI
- Use Amazon to provide geographical distribution
  - Direct link to globally replicate databases
- Helix Nebula
  - Integration of commercial cloud providers with big research
  - Benefit of additional security assurances
    - For use by pharmaceutical companies
    - For on-demand personalised medicine
- Explore using IaaS to supplement/replace data centres
  - Put DC on cloud, scale out services (service + database), etc.
The Future
[Diagram labels: Private Analysis; Public Service; Integrating Platform (deal with discovery, provision & placement); EMBL-EBI IT (Services, Research, Clusters); Elixir Community Services; Elixir Service; Virtualised Infrastructure; Storage; Compute; Cloud Providers; EMBL-EBI Centre; Infrastructure Provider; GÉANT Network]
The Future: Exploitation by Elixir
- An e-infrastructure for Life Science
  - Understand key issues
  - Replicate datasets
  - GÉANT, DANTE, EGI.eu, PRACE, etc.
- Portable VMI
  - Repository and execution
- Providing secure isolated IaaS
  - Federating IaaS resources from Elixir, EGI, HN, ...
Any questions?
- Contact Points
  - steven.newhouse@ebi.ac.uk
  - embassycloud@ebi.ac.uk
- Acknowledgements
  - Andy Cafferkey
  - Rafael Jimenez
  - Pete Jokinen
  - EMBL-EBI Systems Team