
MAGELLAN: Exploring Cloud Computing for DOE's Scientific Mission

Cloud computing is gaining traction in the commercial world, with companies like Amazon, Google, and Yahoo offering pay-to-play cycles to help organizations meet cyclical demands for extra computing power. But can such an approach also meet the computing and data storage demands of the nation's scientific community?

A new $32 million program funded by the American Recovery and Reinvestment Act through the U.S. Department of Energy (DOE) will examine cloud computing as a cost-effective and energy-efficient computing paradigm for mid-range science users to accelerate discoveries in a variety of disciplines, including analysis of scientific datasets in biology, climate change, and physics.

DOE is a world leader in providing high-performance computing resources for science. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL) supports the high-end computing needs of over 3,000 DOE Office of Science researchers, while the Leadership Computing Facilities at Argonne and Oak Ridge National Laboratories serve the largest-scale computing projects across the broader science community through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. The focus of these facilities is on providing access to some of the world's most powerful supercomputing systems, which are specifically designed for high-end scientific computing.

Interestingly, some of the science demands for DOE computing resources do not require the scale of these well-balanced petascale machines. A great deal of computational science today is conducted on personal laptops or desktop computers, or on small private computing clusters set up by individual researchers or small collaborations at their home institutions. Local clusters have also been ideal for researchers who co-design complex problem-solving software infrastructures for the platforms in addition to running their simulations. Users with computational needs that fall between desktop and petascale systems are often referred to as mid-range, and they are the target users for the Magellan cloud project.

In the past, mid-range users were enticed to set up their own purpose-built clusters for developing codes, running custom software, or solving computationally inexpensive problems because hardware has been relatively cheap. However, the costs of ownership, including ever-rising energy bills, space constraints for hardware, ongoing software maintenance, security, operations, and a variety of other expenses, are forcing mid-range researchers and their funders to look for more cost-efficient alternatives. Some experts suspect that cloud computing may be a viable solution.

Cloud computing refers to a flexible model for on-demand access to a shared pool of configurable computing resources (such as networks, servers, storage, applications, services, and software) that can be easily provisioned as needed. Cloud computing centralizes the resources to gain efficiency of scale and permits scientists to scale up to solve larger science problems, while still allowing the system software to be configured as needed for individual application requirements.
To test cloud computing for scientific capability, NERSC and the Argonne Leadership Computing Facility (ALCF) will install similar mid-range computing hardware but will offer different computing environments (figure 1). The systems will create a cloud test bed that scientists can use for their computations while also testing the effectiveness of cloud computing for their particular research problems.

Figure 1. Cloud control. The Magellan management and network control racks at NERSC. To test cloud computing for scientific capability, NERSC and the Argonne Leadership Computing Facility (ALCF) installed purpose-built test beds for running scientific applications on the IBM iDataPlex cluster. (Photo: R. Kaltschmidt, LBNL)

Since the project is exploratory, it has been named Magellan in honor of the Portuguese explorer who led the first effort to sail around the globe. It is also named for the Magellanic Clouds, the two closest galaxies to our Milky Way, which are visible from the Southern Hemisphere.

What is Cloud Computing?

In the report Above the Clouds: A Berkeley View of Cloud Computing (see Further Reading), a team of luminaries from the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley noted that cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as software as a service (SaaS). The datacenter hardware and software is referred to as a cloud.

When a cloud is made available in a pay-as-you-go manner to the general public, it is a public cloud; the service being sold is utility computing. Current examples of public utility computing include Amazon Web Services (AWS), Google AppEngine, and Microsoft Azure. As a successful example, Elastic Compute Cloud (EC2) from AWS sells 1.0 GHz x86 ISA slices, or instances, for $0.10 per hour, and a new instance can be added in two to five minutes. An instance is the allocated memory and collection of processes running on the server. Meanwhile, Amazon's Simple Storage Service (S3) charges $0.12 to $0.15 per gigabyte-month, with additional bandwidth charges of $0.10 to $0.15 per gigabyte to move data into and out of AWS over the Internet.
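These published rates make back-of-the-envelope cost estimates straightforward. The short Python sketch below works through one such estimate at the prices quoted above; the workload itself (instance count, hours, and data volumes) is a hypothetical example for illustration, not a Magellan or DOE figure.

```python
# Back-of-the-envelope cloud cost estimate using the 2009-era prices
# quoted in the article. All workload numbers below are hypothetical.

EC2_PRICE_PER_INSTANCE_HOUR = 0.10   # USD, small EC2 instance
S3_PRICE_PER_GB_MONTH       = 0.15   # USD, upper end of quoted range
TRANSFER_PRICE_PER_GB       = 0.10   # USD, lower end of quoted range

# Hypothetical mid-range workload (assumptions, not Magellan figures):
instances      = 64          # concurrent instances
hours_per_run  = 12          # wall-clock hours per production run
runs_per_month = 20
stored_gb      = 2_000       # data kept in S3 for the month
moved_gb       = 500         # data moved in and out of AWS per month

compute_cost  = instances * hours_per_run * runs_per_month * EC2_PRICE_PER_INSTANCE_HOUR
storage_cost  = stored_gb * S3_PRICE_PER_GB_MONTH
transfer_cost = moved_gb * TRANSFER_PRICE_PER_GB

print(f"Compute : ${compute_cost:,.2f}/month")
print(f"Storage : ${storage_cost:,.2f}/month")
print(f"Transfer: ${transfer_cost:,.2f}/month")
print(f"Total   : ${compute_cost + storage_cost + transfer_cost:,.2f}/month")
```

At these rates the example workload comes to roughly $1,900 per month, the kind of number a mid-range group would weigh against the ownership costs described earlier.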

The advantages of SaaS to both end users and service providers are well understood. Service providers enjoy greatly simplified software installation and maintenance and centralized control over versioning; end users can access the service anytime and anywhere, share data and collaborate more easily, and keep their data stored safely in the infrastructure. Cloud computing does not change these arguments, but it does give more application providers the choice of deploying their product as SaaS without provisioning a datacenter: just as the emergence of semiconductor foundries gave chip companies the opportunity to design and sell chips without owning a fabrication plant, cloud computing allows providers to deploy SaaS and scale on demand without building or provisioning a datacenter.

Mid-Range Users on a Cloud

Realizing that not all research applications require petascale computing power, the Magellan project will explore several areas:

- Understanding which science applications and user communities are best suited for cloud computing (sidebar "Metagenomics on a Cloud?")
- Understanding the deployment and support issues required to build large science clouds. Is it cost-effective and practical to operate science clouds? How could commercial clouds be leveraged?
- How does existing cloud software meet the needs of science, and could extending or enhancing current cloud software improve its utility?
- How well does cloud computing support data-intensive scientific applications?
- What are the challenges to addressing security in a virtualized cloud environment?

Magellan Hardware

This purpose-built test bed for running scientific applications will be built on the IBM iDataPlex chassis and based on InfiniBand technology. The system will offer high density with front-access cabling and will be liquid-cooled using rear-door heat exchangers (figure 2). Total computing performance across both sites will be on the order of 100 teraflop/s.

The NERSC portion of the system will include:
- 61.5 teraflop/s peak performance
- 720 compute nodes (5,760 cores) with Intel Nehalem quad-core processors
- 21.1 TB DDR3 memory
- QDR InfiniBand fabric

Meanwhile, the Argonne system will have:
- 43 teraflop/s peak performance
- 504 compute nodes (4,032 cores) with Intel Nehalem quad-core processors
- 12 TB memory
- QDR InfiniBand fabric

Figure 2. Staying cool. By building the Magellan test bed at NERSC on IBM's iDataPlex chassis, the facility can take advantage of the machine's innovative half-depth design and liquid-cooled door, reduce cooling costs by as much as half, and reduce floor space requirements by 30%. The orange tubes in the picture will carry coolant to chill the system. (Photo: R. Kaltschmidt, LBNL)

By installing the Magellan systems (sidebar "Magellan Hardware") at two of DOE's leading computing centers, the project will leverage staff experience and expertise as users put the cloud systems through their paces. The Magellan test bed will comprise cluster hardware built on IBM's iDataPlex chassis and based on Intel's Nehalem CPUs and a QDR InfiniBand interconnect (figure 3). Total computing performance across both sites will be on the order of 100 teraflop/s. Researchers at ALCF and NERSC will look into the Eucalyptus toolkit, an open-source package that is compatible with Amazon Web Services, as a potential tool for allocating Linux virtual machine images. In addition, the teams researching Magellan's suitability will also investigate the performance of Apache's Hadoop and Google's MapReduce, two related software frameworks that deal with large distributed datasets. Currently, one of the challenges in building a private cloud is the lack of software standards. Although these frameworks are not widely supported at traditional supercomputing facilities, large distributed datasets are a common feature of many scientific codes and a natural fit for cloud computing. The team will also be experimenting with other commercial cloud offerings such as those from Amazon, Google, and Microsoft.
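To make the MapReduce programming model concrete, the sketch below shows the shape of job that frameworks such as Hadoop are built to run: a mapper and a reducer that count word occurrences across a large, distributed input. It is a generic Hadoop Streaming example in Python rather than anything specific to Magellan, and the jar path and input/output locations in the comment are placeholders.

```python
#!/usr/bin/env python
# wordcount.py -- minimal Hadoop Streaming job (illustrative sketch only).
# The same script acts as the mapper or the reducer depending on its argument.
#
# Hypothetical invocation (jar path and HDFS locations are placeholders):
#   hadoop jar /path/to/hadoop-streaming.jar \
#       -files wordcount.py \
#       -mapper  "python wordcount.py map" \
#       -reducer "python wordcount.py reduce" \
#       -input  /data/input -output /data/output
import sys


def mapper():
    # Emit "word<TAB>1" for every word on every line of standard input.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reducer():
    # Hadoop sorts mapper output by key, so identical words arrive together;
    # sum the counts for each run of identical keys.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Because the mappers run independently and data is exchanged only during the framework's sort-and-shuffle phase, jobs of this shape tolerate commodity cloud networking far better than tightly coupled MPI simulations do.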

Figure 3. Magellan systems at both NERSC and the ALCF will be built using QDR InfiniBand fabric like the one pictured here. (Photo: R. Kaltschmidt, LBNL)

Metagenomics on a Cloud?

One goal of the Magellan project is to understand which science applications and user communities are best suited for cloud computing, but some DOE researchers have already given public clouds a whirl. For example, Jared Wilkening, a software developer at Argonne National Laboratory, recently tested the feasibility of employing Amazon EC2 to run a BLAST-based metagenomics application.

Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples. By identifying and understanding bacterial species based on sequence similarity, some researchers hope to put microbial communities to work mitigating global warming and cleaning up toxic waste sites, among other tasks. BLAST is the community standard for sequence comparison: it enables researchers to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.

Wilkening notes that BLAST-based codes, like the one he ran on Amazon EC2, are perfect for cloud computing because they require little internal synchronization and therefore do not rely on high-performance interconnects. Nevertheless, the study concluded that Amazon is significantly more expensive than locally owned clusters, due mainly to EC2's inferior CPU hardware and the premium cost associated with on-demand access, although increased demand for compute-intensive workloads could change that.

Wilkening's paper was published at Cluster 2009, and slides are available at: http://www.cluster2009.org/9.pdf

Figure 4. Metagenomics is the study of genetic material recovered directly from environmental samples. (Image: J. Wilkening, ANL)
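What makes BLAST-style searches such a good match for on-demand cloud instances is that each batch of query sequences can be searched independently against a read-only database, with no communication between workers. The sketch below illustrates that pattern in Python; it is a hypothetical example, not Wilkening's code, and it assumes the NCBI BLAST+ blastn binary is installed and that the queries.fasta file and env_db database names exist (both are made up for illustration).

```python
# Embarrassingly parallel BLAST fan-out (illustrative sketch, not the ANL code).
# Assumes NCBI BLAST+ is installed ("blastn" on PATH) and that "env_db" is a
# pre-formatted nucleotide database; queries.fasta is a hypothetical file of
# metagenomic reads.
import subprocess
from multiprocessing import Pool

QUERY_FILE = "queries.fasta"   # hypothetical input
DATABASE   = "env_db"          # hypothetical BLAST database
CHUNK_SIZE = 1000              # sequences per independent work unit


def split_fasta(path, chunk_size):
    """Yield lists of FASTA records, chunk_size sequences at a time."""
    chunk, record = [], []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">") and record:
                chunk.append("".join(record))
                record = []
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
            record.append(line)
    if record:
        chunk.append("".join(record))
    if chunk:
        yield chunk


def run_blast(args):
    """Run one independent BLAST search; no coordination with other workers."""
    index, records = args
    chunk_path = f"chunk_{index}.fasta"
    with open(chunk_path, "w") as fh:
        fh.writelines(records)
    out_path = f"chunk_{index}.blast.tsv"
    subprocess.run(
        ["blastn", "-query", chunk_path, "-db", DATABASE,
         "-outfmt", "6", "-out", out_path],
        check=True,
    )
    return out_path


if __name__ == "__main__":
    chunks = enumerate(split_fasta(QUERY_FILE, CHUNK_SIZE))
    with Pool() as pool:                       # one worker per local core;
        results = pool.map(run_blast, chunks)  # on a cloud, one per instance
    print("wrote:", results)
```

Swapping the local process pool for a set of cloud instances changes where the chunks run but not the structure of the computation, which is why the lack of a fast interconnect matters so little for this class of workload.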

By making Magellan available to a wide range of DOE science users, the researchers will be able to analyze the suitability of a cloud model across the broad spectrum of the DOE science workload. They will also use performance-monitoring software to analyze what kinds of science applications are being run on the system and how well they perform on a cloud. The science users will play a key role in this evaluation, as they bring a very broad scientific workload into the equation and will help the researchers learn which features are important to the scientific community.

Data Storage and Networking

To address the challenge of analyzing the massive amounts of data produced by scientific instruments, ranging from powerful telescopes photographing the Universe to gene sequencers unraveling the genetic code of life, the Magellan test bed will also provide a storage cloud with a little over a petabyte of capacity. The NERSC Global Filesystem (NGF) will meet most of the storage needs of projects running on the NERSC portion of the Magellan system; approximately 1 PB of storage and 25 gigabits per second (Gbps) of bandwidth have been added to support the test bed. Archival storage needs will be satisfied by NERSC's High Performance Storage System (HPSS) archive, whose capacity is being increased by 15 PB. Meanwhile, the Magellan system at ALCF will have 250 TB of local disk storage on the compute nodes and an additional 25 TB of global disk storage on the GPFS system.

NERSC will make the Magellan storage available to science communities through a set of servers and software called Science Gateways, and will also experiment with flash memory technology to provide fast random-access storage for some of the more data-intensive problems. Approximately 10 TB will be deployed in NGF for a high-bandwidth, low-latency storage class and for metadata acceleration. Around 16 TB will be deployed as local SSD in one SU for data analytics, local read-only data, and local temporary storage. Approximately 2 TB will be deployed in HPSS. The ALCF will provide active storage, using Hadoop over PVFS, on approximately 100 compute/storage nodes. This active storage will add approximately 30 TF of compute power to the ALCF Magellan system, along with approximately 500 TB of local disk storage and 10 TB of local SSD.

The NERSC and ALCF facilities will be linked by a groundbreaking 100 Gbps network, developed by DOE's Energy Sciences Network (ESnet) with funding from the American Recovery and Reinvestment Act. Such high bandwidth will facilitate rapid transfer of data between geographically dispersed clouds and enable scientists to use available computing resources regardless of location.

The Magellan program will run for two years, and the initial clusters will be installed in the next few months. At NERSC, installation was slated to begin in November 2009, with early users getting access in December; the NERSC system (figures 5 and 6) was slated to go into production use in mid-January 2010. At ALCF, installation was planned to begin in January 2010, with early users gaining access in February and the system opening up for full access in March.

Figure 5. Main system console for Magellan at NERSC. (Photo: R. Kaltschmidt, LBNL)

Figure 6. Networking. When completed, the Magellan system at NERSC will be interconnected using QDR InfiniBand, 10 Gbps Ethernet, multiple 1 Gbps Ethernet, and 8 Gbps Fibre Channel SAN. (Photo: R. Kaltschmidt, LBNL)

Contributors
Horst Simon, Kathy Yelick, Jeff Broughton, Brent Draney, Jon Bashor, David Paul, and Linda Vu from NERSC at LBNL; Pete Beckman, Susan Coghlan, and Eleanor Taylor from ALCF at Argonne National Laboratory.

Further Reading
Above the Clouds: A Berkeley View of Cloud Computing
http://www.eecs.berkeley.edu/pubs/techrpts/2009/EECS-2009-28.pdf