wu.cloud: Insights Gained from Operating a Private Cloud System
Stefan Theußl, Institute for Statistics and Mathematics
WU Wirtschaftsuniversität Wien
March 23, 2011

Introduction
In statistics we increasingly face the following challenges:
- (1) more accurate and time-consuming models,
- (2) computationally intensive applications, and/or
- (3) large datasets.
Thus, one could
- just wait (1+2),
- reduce the problem size (3),
- (A) run similar tasks on independent processors in parallel,
- (B) load data onto multiple machines that work together in parallel,
- (C) outsource computation.
In this talk we focus on option C: outsourcing computation.
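
Option (A), running similar tasks on independent processors, can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names `simulate_chunk` and `combine`, the choice of estimating pi, and the seeding scheme (one integer seed per task) are assumptions made for illustration only.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(seed, n_draws):
    """One independent Monte Carlo task: count points falling inside the unit quarter circle."""
    rng = random.Random(seed)  # each task owns its generator, so tasks share no state
    hits = 0
    for _ in range(n_draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def combine(hit_counts, n_draws):
    """Merge the per-task hit counts into a single estimate of pi."""
    return 4.0 * sum(hit_counts) / (len(hit_counts) * n_draws)


if __name__ == "__main__":
    seeds = list(range(8))  # one seed per task (illustrative choice)
    n_draws = 100_000       # draws per task (illustrative choice)
    # Each task runs in its own worker process; results come back in seed order.
    with ProcessPoolExecutor() as pool:
        counts = list(pool.map(simulate_chunk, seeds, [n_draws] * len(seeds)))
    print(combine(counts, n_draws))
```

Because the tasks never communicate, the same pattern carries over from local cores to several cloud instances; only the mechanism that distributes the seeds and collects the counts changes.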

Requirements (Applications)
Statisticians need/want to
- run computationally intensive applications,
- process large data sets,
- run memory-demanding applications.
For example:
- Bayesian statistics (Gibbs sampling)
- Complex optimization problems
- Investigation of CDS/Bond quote/trade via database backend
- MC simulation: hedging of options with Lévy processes
- Text mining on large data sets
- Topic models

Requirements (Software)
However, the scientific software employed is usually rather heterogeneous:
- R: want to use the current version and a complete development environment
- Compilers: GNU Compiler Collection, Intel Compiler, etc.
- Mathematica and gridMathematica
- Matlab
- Optimization: want to use state-of-the-art optimizers like CPLEX, GLPK, KNITRO, MOSEK, etc.
ideally
- on different platforms: Linux and Windows-based systems (32 and 64 bit)
- using various editors: Emacs, RStudio, WinEdt, nano, vi, etc.

Outsourcing Computation
If we want to outsource our daily (scientific) computation, at WU we can either
- buy new equipment to run the given application
  - not recommended; only if unavoidable
- use appropriately configured workstations/virtual (Xen) instances for different types of problems
  - almost perfect solution for a given application
  - not scalable
- use a 520-core cluster of workstations called cluster@wu
  - very scalable
  - however, applications must meet certain requirements, like running on a specific OS as a batch job
- or move computations to the cloud
  - on-demand network access to a shared pool of configurable computing resources
The latter seems to be becoming a very popular vehicle for outsourcing computational tasks, as we will show in this talk.

Private Clouds
Why run a private cloud system?
- It emulates a public cloud on (existing) private resources,
- and thus provides the benefits of clouds (elasticity, dynamic provisioning, multi-OS/arch operation, etc.),
- while maintaining control of resources.
- Moreover, there is always the option to scale out to the public cloud (going hybrid).

wu.cloud
From the NIST Definition of Cloud Computing (see http://csrc.nist.gov/groups/sns/cloud-computing/) we derived the following cloud model for wu.cloud:
- private cloud, operated solely for WU members and projects; thus, network access only via Intranet/VPN,
- on-demand self-service,
- resource pooling via virtualization,
- extensibility/elasticity,
- Infrastructure as a Service (IaaS),
- Platform as a Service (PaaS).

wu.cloud
- wu.cloud is a private cloud system based on the open source software package Eucalyptus (see http://open.eucalyptus.com/).
- It is accessible via http://cloud.wu.ac.at/.
- It consists of a frontend system (website, management software) and a backend system (providing resources).

wu.cloud Hardware
Backend system:
- 2x IBM X3850 X5
- 8x8 (64) core Intel Xeon CPUs, 2.26 GHz
- 1 TB RAM
- EMC² Storage Area Network: 7 TB fast + 4 TB slow disks
- Suse Linux Enterprise Server 11 SP1
- Xen 4.0.1
- Eucalyptus backend components (cluster, storage, node controller)
Frontend system:
- Virtual (Xen) instance
- Apache Webserver
- Eucalyptus frontend components (cloud controller, walrus)
[Figure: photo of the backend server. (c) 2010 IBM Corporation, from Datasheet XSD03054-USEN-05]

wu.cloud Characteristics
wu.cloud aims at scaling in three different dimensions:
- Compute nodes: number of cloud instances and cores employed
- Memory: amount of memory requested per instance
- Software: Windows vs. Linux and software packages installed
[Figure: example instance types placed along the axes CPU (0 to 35) and RAM [GB] per instance (1 to 256): Debian/gridMathematica virtual cluster, Windows/R high CPU instance, Debian/R high memory instance. Software labels shown: Linux base system; R/Mathematica/Matlab; R dev environment; GUI based customized system; R dev environment; Matlab/PASW/Stata; Windows base system.]

wu.cloud User Interface
Amazon EC2 API
- allows for using tools like ec2/euca2ools, hybridfox, etc., primarily designed for EC2
- transparent use of wu.cloud and EC2/S3 side by side
Remote connections to cloud instances can be established by
- Secure shell (ssh), PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/)
- VNC (Linux)
- Remote Desktop (Windows)
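
As a rough illustration of this workflow, the Python sketch below only assembles the command lines a user might run: one to start an instance with euca2ools and one to connect to it via ssh. Nothing is executed. The helper names, the image ID `emi-00000000`, the key name `mykey`, the instance type `m1.small`, the login user `root`, and the address `192.0.2.10` are all placeholders and assumptions, not wu.cloud settings.

```python
import shlex


def build_run_instances_cmd(image_id, key_name, instance_type="m1.small", count=1):
    """Assemble a euca-run-instances call (keypair, instance type, count, image ID)."""
    return ["euca-run-instances", "-k", key_name, "-t", instance_type,
            "-n", str(count), image_id]


def build_ssh_cmd(host, key_file, user="root"):
    """Assemble an ssh call that logs in with the private key of the keypair."""
    return ["ssh", "-i", key_file, f"{user}@{host}"]


if __name__ == "__main__":
    # Placeholder values only; replace them with real image IDs, keys and addresses.
    print(shlex.join(build_run_instances_cmd("emi-00000000", "mykey")))
    print(shlex.join(build_ssh_cmd("192.0.2.10", "mykey.pem")))
```

Since the tools speak the EC2 API, the same two steps should apply to a public EC2 instance once the tools are pointed at the corresponding endpoint and credentials.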

Insights Gained and Outlook
Operating a private cloud environment is recommended under the following conditions:
- one wants to benefit from the cloud model (elasticity, dynamic provisioning, multi-OS/arch operation, etc.) while maintaining control of resources,
- appropriate hardware is available/affordable,
- the pay-per-use model of public clouds cannot be considered.
The wu.cloud idea is easily advertised to researchers using the three dimensions of scalability: computing resources, RAM, and software packages employed.
Some users prefer to have full control over a given system (i.e., being root) rather than just outsourcing computations to a homogeneous system.
Nevertheless, it is very important to guide users into the cloud (manuals, lectures, etc.), and considerable resources have to be invested in order to provide several base images (Linux/Windows, R, Matlab, etc.).

Contact
Stefan Theußl
Institute for Statistics and Mathematics
email: cloud@wu.ac.at or Stefan.Theussl@wu.ac.at
URL: http://statmath.wu.ac.at/~theussl
WU Vienna, Augasse 2-6, A-1090 Wien