Vision and Implementation of a Federated Cloud Infrastructure for Research
Matteo Turilli matteo.turilli@oerc.ox.ac.uk
David Wallom david.wallom@oerc.ox.ac.uk
Projects: NGS EPW2 (EPSRC funded, 2 years, Oxford node); FleSSR (JISC funded, 10 months); MyTrustedCloud (EPSRC funded, 6 months)
The Oxford e-Research Centre: 5 years old, grown from 3 to >50 staff; building on the UK e-Science programme; hosting both research projects and infrastructure services. Projects and applications in: astrophysical simulations, biochemistry, computational biology, Classics digital assets, climate system simulation, computational finance, energy systems management, environment and climate science, medical data sharing, neuroscience data management, social science data analysis... Images courtesy of Steve Rawlings, Myles Allen, Mike Brady, Nic Smith, Mike Giles and Mark Sansom
Research Computing Services within the Oxford e-Research Centre:
- Visualisation
- Data: 0.6PB of storage
- Training: HPC, Matlab, CUDA, computational chemistry
- Grid computing: NGS, EGI, Campus Grid, volunteer computing (Climate Prediction)
- High performance computing: clusters, shared memory
- Cloud computing: Eucalyptus clouds
The UK-NGS: to enable coherent electronic access for all UK researchers to all computational and data-based resources and facilities required to carry out their research, independent of resource or researcher location.
UK e-Infrastructure: regional and campus grids, HPCx + HECToR, HEIs, community grids, the LHC, ISIS, TS2, and VREs/VLEs/IEs. Users get common access, tools, information and nationally supported services through the NGS, integrated internationally.
Digital Research 2010: Extracting Knowledge from the Data Deluge. Example: La(2-x)Sr(x)NiO4 (H. Woo et al., Phys. Rev. B 72, 064437 (2005)).
The Infrastructure-User Relationship (diagram): end-users and community technology experts (national, European & global collaborations) interacting with an infrastructure provider (research & commercial; national, European & global). With thanks to Steven Newhouse.
The Infrastructure-User Relationship: The Problem (diagram): many independent communities, each with its own end-users and technology experts, face multiple independent infrastructure providers, each supporting its own mix of communities, coordinated by an infrastructure co-ordinator (research & commercial; national, European & global).
Explosion of User Communities on e-Infrastructure: increasing the diversity of users means providing more diverse services for more users and scaling out the support model, demanding an increase in the flexibility of the infrastructure. Different communities have different technological demands, yet are supported by a limited set of physical infrastructure and technology experts, with a limited set of common interfaces. Virtualisation must have a key role to play.
A Virtualised Future? (diagram, built up over several slides): end-users (national, European & global collaborations) are served by community-specific operations staff and experts who communicate between users and providers; community-specific applications are packaged into community-specific appliances; standards-based interfaces connect these to infrastructure providers (research & commercial; national, European & global). With thanks to Steven Newhouse.
What does this mean? Movement of current services to VMs: no big-bang migration, but gradual change transparent to the end-user. Clearly identifying the role of the expert, already residing in the communities. Increasing the flexibility of the infrastructure: allowing experts to configure resources to meet the immediate needs of their users. Supporting interdisciplinary tool usage: experts from other communities have access, demonstrating resources accessible to all.
Cloud Infrastructure for Research: Centralisation vs Federation. Centralisation: one large, dedicated datacentre serving the national HEI demand. Federation: a heterogeneous set of local infrastructures coordinated nationally to satisfy the HEI demand. Evaluation criteria: funding, scalability, flexibility, maintenance, support, accountability, obsolescence, competitiveness, security.
UK Federated Cloud System: central core services (registration, authentication & authorisation, accounting, monitoring, service discovery); standard cloud interfaces, currently de facto through a single IaaS provider; a meta layer to abstract the exact cloud away from the user, aiming for seamless usage.
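The meta layer's role can be sketched in miniature: a thin dispatch shim that hides which federated site actually serves a request. This is an illustrative stand-in, not FleSSR/NGS code; the class names, API shape and scheduling policy (least-loaded site) are all assumptions made for the sketch.

```python
# Minimal sketch of a federation meta layer: the user calls one API,
# and the shim routes the request to a registered provider site.

class Provider:
    """Uniform interface every federated cloud site must expose (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.instances = []

    def run_instance(self, image_id):
        # image_id would select a VM image; unused in this stub.
        instance_id = f"{self.name}-i-{len(self.instances)}"
        self.instances.append(instance_id)
        return instance_id


class MetaLayer:
    """Routes requests so the user never touches a site's native API."""
    def __init__(self):
        self.providers = {}

    def register(self, provider):
        self.providers[provider.name] = provider

    def run_instance(self, image_id, site=None):
        # Honour an explicit site choice, else pick the least-loaded site.
        if site is None:
            provider = min(self.providers.values(),
                           key=lambda p: len(p.instances))
        else:
            provider = self.providers[site]
        return provider.run_instance(image_id)


meta = MetaLayer()
meta.register(Provider("oxford"))
meta.register(Provider("edinburgh"))

iid = meta.run_instance("emi-12345678")   # scheduled transparently
print(iid)                                 # oxford-i-0 (first registered wins the tie)
```

The point of the design is that "seamless in usage" reduces to one stable interface (`run_instance`) regardless of which IaaS stack each site runs underneath.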
Resource providers: Eucalyptus instances at the University of Oxford (NGS/FleSSR), University of Edinburgh (NGS/FleSSR), University of Reading (FleSSR), Imperial College, Manchester University and Eduserv (commercial/charity); NGS core services; EoverI meta layer; distributed storage to allow sharing of instances and data services.
Resource Centre View (diagram): experts and end-users reach the resource centre through wide-area access control and management control; a management layer runs VMs on compute, storage and network resources, supported by information services, a VM image repository and a wide-area message bus. (Cloudscape III - EGI Use Case)
Resource Centre View, annotated with candidate standards (diagram): SAML for access control, GLUE 2.0 for information services, OCCI for management control, OVF for VM images, SRM and CDMI for storage resources, JMX over the wide-area message bus, UsageRecord for accounting, and xftp for data transfer over network resources.
IaaS Cloud Interoperability Profile (IaaSCIP), David Wallom and Steven Newhouse, Open Grid Forum, 2009.
Oxford e-Research Centre NGS Cloud Activities
NGS Cloud Activities: NGS Agile Deployment Environments, EPSRC funded, 2 years. Staff: David Wallom (OeRC, Oxford); David Fergusson (NeSC, Edinburgh); Steve Thorn (NeSC, Edinburgh); Matteo Turilli (OeRC, Oxford). Goals: an EC2-compatible, open-source solution; development of a dedicated pool of images; collecting data about feasibility, costs and stability; identifying use cases and gathering further requirements.
Eucalyptus vs Nimbus, OpenNebula, OpenStack. Eucalyptus pros: very good implementation of the EC2 and EBS APIs; enterprise support offered by Canonical through UEC; dedicated installation in UEC; modular design; Xen- and KVM-compatible; open source and commercial. Eucalyptus cons: design limitations; AAA. The others: limited EC2 API implementation; no native support for EBS; Globus WS4 (Nimbus); early development stage; slow development. To keep an eye on: OpenNebula 2.2 (to be tested); OpenStack Compute and OpenStack Object Storage.
Eucalyptus Architecture http://open.eucalyptus.com/wiki/eucalyptusinstall_v2.0
Eucalyptus Network Architecture (diagram): users reach the cloud over the Internet on the cloud/cluster controller's public interface (eth1, 129.67.2.254; 129.67.2.1 on the public network); its private interface (eth0, 192.168.2.1) faces the node controllers, where the hypervisor's virtual bridge gives each VM a private address (e.g. eth0, 192.168.2.2).
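A topology like the one above is driven by the VNET_* settings in eucalyptus.conf on the cloud/cluster controller. The fragment below is an illustrative sketch only: the variable names are real Eucalyptus 1.6/2.0 configuration keys, but the mode choice, address ranges and pool sizes are assumptions matched to the addresses in the diagram, not the actual NGS configuration.

```
# Illustrative eucalyptus.conf networking fragment (assumed values)
VNET_MODE="MANAGED-NOVLAN"
VNET_PUBINTERFACE="eth1"                      # public side, 129.67.2.254
VNET_PRIVINTERFACE="eth0"                     # private side, 192.168.2.1
VNET_SUBNET="192.168.2.0"                     # private VM network
VNET_NETMASK="255.255.255.0"
VNET_ADDRSPERNET="32"
VNET_DNS="129.67.2.1"
VNET_PUBLICIPS="129.67.2.200-129.67.2.250"    # pool mapped onto VMs via NAT
```

In MANAGED modes the cluster controller owns this NAT mapping, which is why all VM traffic in the diagram flows through the cloud/cluster controller host.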
Eucalyptus Data Architecture: two data storage systems. Walrus (S3): enables the creation of private buckets, repositories of OS, kernel and initrd images; users can create them via euca-tools, s3curl, s3cmd and s3fs. Storage Controller (SC): enables the creation of EBS volumes; an EBS volume is attached to a VM as a persistent raw volume and, if properly unmounted, persists across attachments (to different VMs) and VM reboots. (Diagram: Walrus copies images to the node controller via scp; the SC exports volumes over iSCSI/AoE; the VM attaches them through the acpiphp/pci_hotplug modules.)
Eucalyptus Authorisation and Authentication. Users: login/password, where users connect via HTTPS to the cloud controller web interface, register, get approved, log in and download a zipped x509 certificate; x509 credentials, used to interrogate the cloud controller about zones, images, instances, storage blocks, etc.; a key pair, generated by the user to access running instances, with the key injected into the instance at creation time. Cluster/node controllers: each NC is registered with the CC via a key pair; the key is created on the CC and synchronised via scp.
Client Tools: command-line interface. Euca-tools, a clone of the Amazon WS tools; available for Linux, Windows and OSX.
Client Tools: Hybridfox. Relatively intuitive GUI; a Firefox extension based on ElasticFox, hence multiplatform; specifically tailored for Eucalyptus; accepts multiple identities, so it doubles as a management tool for cloud administrators; a CLI is still required to package and upload OS images.
Client Tools: RightScale's RightAws gems. A Ruby-language interface; open source; not specifically tailored for Eucalyptus; works with EC2 and EBS, still to be tested with the S3 clone; flaky with the authorisation of security groups, so Hybridfox or euca-tools may still be required.
NGS Cloud Prototypes: Oxford III. 6 x 2 AMD 2-core, 8GB RAM; 1 x 4 AMD 2-core, 32GB RAM. CentOS 5.4; Eucalyptus 1.6.2 installed from RPM repositories; Ganglia and Nagios monitoring systems; 5 default VM templates = 44/44/22/22/11 VMs (editable); 2TB EBS, 80GB Walrus.
NGS Cloud Prototypes: Oxford IV. 3 x 4 Xeon 6-core, 48GB RAM; 2 x 1 Xeon 2-core, 32GB RAM. Ubuntu 10.10; Ubuntu Enterprise Cloud; 2+2 bonded public NICs on the CC; 12TB EBS, 12TB Walrus on SED disks; a TPM on every motherboard.
NGS Cloud Prototypes: Edinburgh II. 32 x Sun Fire X4100, dual-core 2.8GHz Opteron, 8GB RAM, 70GB RAID1; 64 cores in total. 1 headnode (cloud and cluster controllers); 31 nodes (node controllers). Max 2 VMs per core: 124 slots (2GB RAM each); VLANs for VM isolation.
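The Edinburgh II capacity figures on the slide are internally consistent, which is easy to check: 31 node controllers, each with two cores and at most two VMs per core, give 124 slots, and four 2GB slots exactly fill each node's 8GB of RAM.

```python
# Worked check of the Edinburgh II slot arithmetic from the slide.
nodes = 31            # node controllers (the headnode runs CC/CLC, no VMs)
cores_per_node = 2    # dual-core Opterons
vms_per_core = 2      # configured limit per the slide
ram_per_node_gb = 8
ram_per_vm_gb = 2

slots = nodes * cores_per_node * vms_per_core
vms_per_node = cores_per_node * vms_per_core

print(slots)                                        # 124, matching the slide
print(vms_per_node * ram_per_vm_gb == ram_per_node_gb)  # True: RAM fully backed
```

So the 2GB slot size is the natural choice here: any larger template would leave cores idle, any smaller would oversubscribe memory.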
Managing and Monitoring Tools: Hybridfox + euca-tools for overall cloud usage, status and testing; Landscape, Canonical's (not open-source) management solution for UEC; RightScale was not tried as it is fairly expensive and hosted; Linux CLI, with dedicated scripts to monitor logs and daemon status. Issues: public IP database corruption (addressed in version 2); no user quotas in the open-source version of Eucalyptus; no accounting in the open-source version of Eucalyptus; very verbose, non-persistent logs; lack of error feedback in some conditions.
User Support Tools: a ticketing system, a web-based platform (Footprints) that handled around 200 tickets in 1 year; a web site with subscription instructions, links to the Eucalyptus documentation and to the support e-mail; a mailing list used mainly to announce new services, scheduled or unscheduled downtime, and planned upgrades. Common issues: access through institutional firewalls via proxy; available resources (a limitation of the Eucalyptus design); instructions on how to build a dedicated image; almost no issues about research and cloud computing.
NGS Cloud Usage 2010/2011: 106 registered users, with uptake fast and constant throughout the whole testing period; 26 institutions (23 HEIs, both universities and colleges, and 3 companies); 30 projects; 10 research areas: physics, ecology, geography, medicine, teaching, life sciences, social science, engineering, cloud R&D, mathematics.
Exemplar Case Studies. Evolutionary genomics: "analysis and information management of Next Generation Sequencing (NGS) of genomic data poses many challenges in terms of time and size. We are exploring the translation of high-quality NGS scientific analysis pipelines to make best use of cloud infrastructure." Geospatial science: "geospatial data is a mix of raster and vector data. As rasterising is a CPU-hungry process, and all maps displayed on the screen of the final user are rasters, it is more efficient to do the processing on the server side. I am investigating how this process can be dispersed across many, if not unlimited, instances in a cloud." Agent-based modelling of crime: "at the moment I have a Tomcat server that hosts some web services used to run a social simulation model; it needs access to the file system to run Fortran scripts, create files, etc. There are loads of problems with running our own server at uni, and I think a virtual machine that I could have control over would be much better."
Oxford e-Research Centre Research and Development in Cloud Computing
Flexible Services for the Support of Research (FleSSR): 6 partners, academic and industrial; 3 cloud infrastructures. Goals: building a federated cloud infrastructure, extending the use of NGS central services with cloud brokering and accounting. Use cases: multi-platform software development; on-demand research data storage.
FleSSR Architecture (diagram): the Zeel/i broker federating the Oxford, Reading and Eduserv clouds, with an accounting database at STFC/NGS.
FleSSR Infrastructure. Local/global: services depend on either local or global access; cloud brokering is not mandatory for AWS-like service access. Multiple identities: every user may have multiple identities, both local and global. Only personal identities: group identities are not implemented, and the management of every single identity is left to the legally responsible user. Multiple AA technologies: AA may differ depending on local and global policies/technologies. Multiple accounting: every single identity is accounted for its usage, so an individual may receive multiple invoices.
FleSSR Use Case: Multi-Platform Software Development (diagram): through the Zeel/i broker, an instance configuration manager and a build manager drive builds from a CVS/SVN repository across multiple build instances (1-5) in the FleSSR cloud.
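The shape of this use case is a fan-out: one build per platform image, run in parallel on separate cloud instances. The sketch below illustrates only that control flow; the platform names, the `build_on` stand-in and the revision label are invented for the example, and the real Zeel/i API is not shown.

```python
# Illustrative fan-out for a multi-platform build manager (not FleSSR code).
from concurrent.futures import ThreadPoolExecutor

# Hypothetical platform images, one build instance booted per entry.
PLATFORMS = ["centos-5", "ubuntu-10.04", "fedora-14", "debian-6", "sles-11"]

def build_on(platform, revision):
    # Stand-in for: boot a build instance from the platform's image,
    # check `revision` out of CVS/SVN, run the build, collect results.
    return (platform, revision, "ok")

def fan_out(revision):
    """Run one build per platform concurrently and gather the results."""
    with ThreadPoolExecutor(max_workers=len(PLATFORMS)) as pool:
        return list(pool.map(lambda p: build_on(p, revision), PLATFORMS))

results = fan_out("r1024")
print(len(results))                                       # 5, one per platform
print(all(status == "ok" for _, _, status in results))    # True
```

The appeal of the cloud here is that the five build environments exist only for the duration of `fan_out`: there are no permanently maintained build hosts.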
FleSSR Use Case: On-Demand Research Data Storage (diagram): through the Zeel/i broker, a volume manager creates an EBS volume in the FleSSR cloud and attaches it to a VM exposing an EBS interface.
FleSSR Output. Code: an instance configuration and build manager, a Perl command-line utility plus a Java client utilising the Zeel/i API; a personal EBS volume manager, a web-based Java client for EBS volume handling plus a tailored VM image with multiple data interfaces (SFTP, WebDAV, GlusterFS, rsync, ssh); a Eucalyptus open-source accounting system, with Perl aggregators and parsers for standard Eucalyptus open-source log files, a MySQL accounting database and a PHP accounting client. Use cases: SKA community testing of use case 1; GlusterFS community testing of use case 2.
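The accounting pipeline (aggregators parsing Eucalyptus logs into a usage database) can be sketched in miniature. Note the log format below is invented for illustration; the real Eucalyptus cloud-output.log entries look different, and the actual FleSSR tooling is Perl, not Python.

```python
# Sketch of an accounting aggregator: pair instance start/stop events
# from a (hypothetical) log format and charge instance-hours per user.
import re
from collections import defaultdict
from datetime import datetime

LOG = """\
2011-03-01 09:00:00 INSTANCE i-3F2A user=alice state=RUNNING
2011-03-01 11:30:00 INSTANCE i-3F2A user=alice state=TERMINATED
2011-03-01 10:00:00 INSTANCE i-9B41 user=bob state=RUNNING
2011-03-01 10:45:00 INSTANCE i-9B41 user=bob state=TERMINATED
"""

LINE = re.compile(
    r"(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) INSTANCE (?P<iid>\S+) "
    r"user=(?P<user>\S+) state=(?P<state>\S+)")

def aggregate(log):
    """Return instance-hours per user by pairing RUNNING/TERMINATED events."""
    started, hours = {}, defaultdict(float)
    for m in map(LINE.match, log.splitlines()):
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["state"] == "RUNNING":
            started[m["iid"]] = (m["user"], ts)
        else:
            user, t0 = started.pop(m["iid"])
            hours[user] += (ts - t0).total_seconds() / 3600

    return dict(hours)

print(aggregate(LOG))   # {'alice': 2.5, 'bob': 0.75}
```

In the real system the aggregated records would be written to the MySQL accounting database and exposed through the PHP client rather than printed.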
MyTrustedCloud: 4 partners, academic and industrial; 1 cloud infrastructure. Goals: integrating Trusted Computing technologies into Eucalyptus so as to guarantee and enforce the expected behaviour of data interfaces and reliable data handling. Use cases: reliable and accountable data exchange; reliable forecast of an infrastructure's state.
MyTrustedCloud Architecture (diagram): hardware TPMs on the cluster controller/storage controller and on each node controller; trusted VMs running on the node controllers; a SAN with SED disks for EBS volumes and S3 storage.
Conclusions: utilisation of virtual infrastructure is the only scalable method to support a large number of disparate user communities; federation is a robust and scalable model for a national/European cloud infrastructure for research; federation is made possible only by the availability of standard interfaces; very successful pilot tests of multiple cloud infrastructure prototypes; a crucial role is played by research & development in customising open-source cloud infrastructure solutions to the specific needs of academic research.
Thank You Matteo Turilli matteo.turilli@oerc.ox.ac.uk David Wallom david.wallom@oerc.ox.ac.uk COPYRIGHT DISCLAIMER Texts, marks, logos, names, graphics, images, photographs, illustrations, artwork, audio clips, video clips, and software copyrighted by their respective owners are used on these slides for non-commercial, educational and personal purposes only. Use of any copyrighted material is not authorized without the written consent of the copyright holder. Every effort has been made to respect the copyrights of other parties. If you believe that your copyright has been misused, please direct your correspondence to: matteo.turilli@oerc.ox.ac.uk and/or david.wallom@oerc.ox.ac.uk stating your position and we shall endeavour to correct any misuse as early as possible.