Zentrale Informatik Introducing ScienceCloud Sergio Maffioletti IS/Cloud S3IT: Service and Support for Science IT Zurich, 10.03.2015
What are we going to talk about today? 1. Why are we building ScienceCloud? 2. From use cases to services 3. Architectural & technical layout
Why are we building ScienceCloud? Provide a research infrastructure to store, access and process research data. Consolidate research infrastructure solutions. Facilitate UZH research infrastructure investment planning. Pursue the S3IT research infrastructure strategy: High Performance Computing Cluster Computing Server Computing Storage
What criteria are driving this? Flexibility for end-users. Self-provisioning / elasticity of VMs, Storage and Network (ultimately of Services). Scalability and extensibility of the underlying infrastructure. Reliability and availability of the services.
Where does it come from? ScienceCloud is v2.0 of Hobbes. 2+ years of operational and user-porting experience (started with GC3). Several exploratory national projects: Academic Cloud (together with ETH) and SwissACC (with more than 12 partner institutions).
From Use Cases to Services
high-throughput data analysis Predictive Ecology trajectory tracking (IEU/UZH) 20 000 videos per experiment, grouped into trajectories (10-20 videos each) image pipeline written in R using an ImageJ plugin 4 cores, 8 GB RAM, 12-36 hours walltime per trajectory Total requirement of 288 000 CPU hours
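The 288 000 CPU-hour figure can be reproduced with back-of-the-envelope arithmetic, assuming the worst case of the stated ranges (10 videos per trajectory and 36 hours walltime; these bounds are taken from the slide, the calculation itself is a sketch):

```python
# Back-of-the-envelope check of the stated 288,000 CPU-hour requirement,
# assuming the worst case: 10 videos per trajectory, 36 h walltime each.
videos = 20_000          # videos per experiment
videos_per_traj = 10     # lower bound of the 10-20 range
cores = 4                # cores per trajectory job
walltime_h = 36          # upper bound of the 12-36 h range

trajectories = videos // videos_per_traj          # 2,000 trajectories
cpu_hours = trajectories * cores * walltime_h
print(cpu_hours)  # 288000
```

With average-case figures (15 videos, 24 h) the total would be roughly half that, so the slide's number is a worst-case capacity requirement.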
instrument-based data analysis pipelines CyTOF workflow (ZMB/UZH) 1M cells/hour measured: 100 MB/hour of data Data imported into OpenBIS Workflow submission based on selection of data and parameters Retrieval of results through OpenBIS and/or RStudio
interactive processing nodes Windows-based interactive image processing (ZMB/UZH) Access images produced by a microscope Interactively tag/manipulate images Images further processed by automatic pipelines
research data access RStudio server (Business/UZH) Web-based interface to manipulate data and run R scripts Single point of access for the whole group or individual users
research data access NFS and/or CIFS exports (GEO/UZH) Export stored data to researchers' personal PCs Allows researchers to prepare input data for processing Data remains accessible by the computing infrastructure
Current requests ratio
Why a cloud infrastructure then? Self-provisioning and elasticity of resources: end-users can allocate and release resources when needed. Customization and control of the environment: end-users can tailor the research infrastructure to their specific needs. Network API: to programmatically create and control their own research infrastructure, and to build services on top.
ScienceCloud is a virtualization infrastructure based on OpenStack Infrastructure Cloud Service Multi-Tenancy Compute Storage Network Self-provisioning of VMs, storage and network Elastic allocation of resources on demand Multi-tenancy
Why OpenStack Open-source project Network APIs API-compatible with the Amazon compute and storage clouds Bindings for many languages Foundation supported by more than 200 companies: https://www.openstack.org/foundation/companies/
Ceph as Storage backend Infrastructure Cloud Service Multi-Tenancy Compute Storage Network RBD RadosGW
Why Ceph Unified distributed storage (Object + Block + File) No Single Point of Failure Dynamic Data Placement Fully integrates with OpenStack. Open source project.
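"Dynamic data placement" means Ceph clients compute where an object lives instead of asking a central lookup table, which is what removes the single point of failure. A toy analogue using stable hashing (a sketch only; Ceph's real algorithm is CRUSH, which also accounts for failure domains and weights):

```python
import hashlib

# Toy analogue of deterministic data placement: every client computes an
# object's location from its name, so no central lookup table is needed.
# This is a simplification of Ceph's CRUSH algorithm, not its implementation.
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
REPLICAS = 3

def place(obj_name, osds=OSDS, replicas=REPLICAS):
    """Return the ordered list of OSDs holding replicas of obj_name."""
    h = int.from_bytes(hashlib.sha256(obj_name.encode()).digest()[:8], "big")
    start = h % len(osds)
    # take `replicas` distinct OSDs starting from the hashed position
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

# Placement is deterministic: every client computes the same answer.
assert place("experiment-42/video-0007") == place("experiment-42/video-0007")
```

Because placement is a pure function of the object name and the cluster map, adding or removing OSDs only requires distributing a new map, not rebuilding an index.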
S3IT-enhanced platform Batch cluster Workflow Automated deployment and configuration Infrastructure Cloud Service Multi-Tenancy Compute Storage Network RBD RadosGW
ScienceCloud web interface Log in with your UZH webpass.
ScienceCloud layout (diagram): S3IT router; UZH routable private network (VLAN 842, 172.23.0.0/16); UZH public network (VLAN 24, 130.60.24.0/24); VMs on compute nodes; cloud controllers; RadosGW; network and load-balancing nodes; internal network (VLAN 840, 192.168.160.0/22); Ceph cluster (3 monitors, OSDs); replication network (VLAN 619, 10.130.0.0/16)
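A quick sanity check of the addressing plan in the layout: the four subnets must be disjoint, which the standard library can verify (a sketch using the CIDR ranges from the slide):

```python
import ipaddress

# Sanity check of the VLAN addressing plan from the layout slide:
# the four subnets must not overlap.
subnets = {
    "UZH routable private (VLAN 842)": "172.23.0.0/16",
    "UZH public (VLAN 24)": "130.60.24.0/24",
    "internal (VLAN 840)": "192.168.160.0/22",
    "Ceph replication (VLAN 619)": "10.130.0.0/16",
}
nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
names = list(nets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not nets[a].overlaps(nets[b]), f"{a} overlaps {b}"
print("no overlapping subnets")
```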
How big is ScienceCloud? 3400 compute cores (for hosting VMs) 3.4 PB of raw storage (planned, for capacity) 250 TB of SSD storage (planned, for intensive I/O) 10 Gb/s non-blocking between any compute blade and storage node
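Raw storage is not the same as usable storage: with N-way replication, usable space is roughly raw/N. The 3x factor below is an assumption (a common Ceph default), not a number stated on the slide:

```python
# Rough usable-capacity estimate for the Ceph cluster. The 3x replication
# factor is an assumption (a common Ceph default), not stated in the slides.
raw_pb = 3.4
replication = 3
usable_pb = raw_pb / replication
print(f"~{usable_pb:.2f} PB usable")  # ~1.13 PB usable
```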
Key dates Dec 2014: Framework agreement and 1st batch hardware purchased. March-April 2015: Deployment of new hardware and start integration. Q2 2015: Pre-production and migration of users from Hobbes. Summer 2015: ScienceCloud in production.
Want to know more? Visit us at http://www.s3it.uzh.ch or contact us at contact@s3it.uzh.ch
Backup slides
OpenStack logical view: keystone provides the authentication service; nova provides compute services; neutron provides network services; glance provides the image store; cinder provides persistent block storage; swift provides persistent object storage; horizon provides the web user interface.
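The component-to-service mapping above can be captured as a small lookup table, e.g. handy when tagging monitoring alerts by service (a sketch; the keys are the OpenStack project names from the slides, the helper function is hypothetical):

```python
# Component-to-service mapping of the OpenStack projects from the slides.
OPENSTACK_SERVICES = {
    "keystone": "authentication",
    "nova": "compute",
    "neutron": "networking",
    "glance": "image store",
    "cinder": "persistent block storage",
    "swift": "persistent object storage",
    "horizon": "web user interface",
}

def describe(component):
    """Return a one-line description of an OpenStack component."""
    return f"{component} provides the {OPENSTACK_SERVICES[component]} service"

print(describe("nova"))  # nova provides the compute service
```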
Ceph