Developing High-Performance, Scalable, Cost-Effective Storage Solutions with Intel Cloud Edition for Lustre* and Amazon Web Services




Reference Architecture

Developing Storage Solutions with Intel Cloud Edition for Lustre* and Amazon Web Services

Authors:
Gabriele Paciucci, Intel High Performance Data Division Solution Architect
Steve Paper, Intel Technical Account Manager
Ian Meyers, Amazon Principal Solution Architect
Dougal Ballantyne, Amazon HPC Solution Lead Architect

Designed specifically for high performance computing, the open source Lustre parallel file system is one of the most popular, powerful and scalable data storage systems currently available. It is in widespread use today in supercomputing scenarios where high performance and enormous storage capacity are required. 60% of the Top 100 clusters in the world [1] are currently running Lustre.

Intel Cloud Edition for Lustre* Software provides a high performance Lustre file system on Amazon Web Services (AWS) using AWS resources. The full package includes CentOS, Lustre, Ganglia, and the Lustre Monitoring Tool (LMT). The product is delivered in the form of an Amazon Machine Image (AMI) available on the AWS Marketplace.

AWS is a leading provider of cloud computing infrastructure that allows scientists and engineers to solve complex problems requiring very fast computation coupled with high-bandwidth, low-latency networking.

Table of Contents

HPC File System Options
Lustre Architecture
Intel Cloud Edition for Lustre
How to create a Lustre file system
Building a compute cluster with cfncluster
Instance types and performance
Summary
More Information

HPC File System Options

Scale-up storage solutions and other traditional network file systems (like NFSv3) work by designating a single node to function as the I/O server for the storage cluster. All I/O data reads and writes go through that single node. Figure 1 shows a typical NFS configuration.

While this system is relatively simple to manage in a single cluster deployment, pushing all of an enterprise's I/O through one server node quickly presents a bottleneck for data-intensive workloads and for workloads that need a high number of threads/processes.

When scaling up an NFS-based environment, each of the disparate NFS clusters must be managed individually, adding to data bottlenecks as well as management overhead and costs.

Figure 1: A typical NFS configuration.

Lustre Architecture

Lustre is a POSIX-compliant, object-based file system that splits file metadata (such as the file system namespace, file ownership and access permissions) from the actual file data and stores them on different servers. File metadata is stored on a Metadata Server (MDS), and the file data is split into multiple objects and stored in parallel across several Object Storage Targets (OSTs). Figure 2 shows a typical Lustre file system configuration.

The same Lustre file system can run on different heterogeneous networks thanks to Lustre Networking (LNET), a powerful and fast abstraction layer that provides the communications infrastructure required by the Lustre file system. LNET enables highly available cluster communication across a variety of networking technologies and supports transparent recovery during failures.

Lustre is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O. Users can create a single POSIX name space of up to 512 PByte and very large files of up to 32 PByte. Several sites are currently in production with a Lustre cluster scaling beyond 1 TByte/sec [2] and a very high metadata operation rate (800K stats/sec).

Figure 2: A typical Lustre file system configuration.
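The way file data is spread over several OSTs can be illustrated with a small calculation. The following is a minimal sketch of round-robin striping, not Lustre's actual implementation: the function name, the 1 MiB stripe size and the stripe count of 4 are assumptions chosen for illustration, and a real file's layout is assigned by the file system itself.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class StripeLocation:
    ost_index: int      # 0-based position of the object within the file's layout
    object_offset: int  # byte offset inside that object


def locate(offset: int, stripe_size: int = MIB, stripe_count: int = 4) -> StripeLocation:
    """Map a file byte offset to an object under round-robin striping.

    stripe_size and stripe_count are illustrative defaults (assumptions).
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    stripe_number = offset // stripe_size        # which stripe-sized chunk of the file
    ost_index = stripe_number % stripe_count     # chunks rotate across the objects
    full_rounds = stripe_number // stripe_count  # complete rotations before this chunk
    object_offset = full_rounds * stripe_size + offset % stripe_size
    return StripeLocation(ost_index, object_offset)


if __name__ == "__main__":
    for off in (0, MIB, 4 * MIB + 10):
        print(off, locate(off))
```

Because consecutive chunks land on different objects, a large sequential read or write can be served by several storage servers at once, which is where the parallel bandwidth comes from.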

Intel Cloud Edition for Lustre

Intel Cloud Edition for Lustre* software is a product available through the AWS Marketplace. This product provides a high performance Lustre file system on the AWS cloud using AWS compute, storage and I/O resources, supported by Intel. The full package includes CentOS, Lustre, Ganglia, and the Lustre Monitoring Tool (LMT). The product is delivered in the form of an AMI available on the AWS Marketplace.

Intel Cloud Edition for Lustre is intended to be used as the working file system for a High Performance Compute cluster or other I/O-intensive workload. It is not intended to be used as long-term storage or as an alternative to cloud storage options such as Amazon S3. We recommend that Amazon S3 in particular be used for long-term data storage on AWS, and Lustre used wherever a high-performance shared file system is required. With the latest release of Intel Cloud Edition for Lustre, Amazon S3 storage can be used to import data into the Lustre file system.

Intel Cloud Edition for Lustre versions

There are currently three product versions available on AWS. Figure 3 lists the versions as well as the associated features.

Features              Community Version   Global Support   Global Support (HVM)
Intel Support         no                  yes              yes
Instance Types        M1                  M1, M3           C3, C4
Ephemeral Storage     yes                 yes              no
EBS Storage           no                  yes              yes
High Availability     no                  yes              yes
VPC                   no                  yes              yes
Enhanced Networking   no                  no               yes
Raw Initialization    no                  yes              yes

Figure 3: Intel Cloud Edition for Lustre versions and features.

Intel Cloud Edition for Lustre supports several advanced capabilities of AWS. The Amazon Virtual Private Cloud (VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. It allows for full control over addressing and access; Amazon VPC is now the default mode of networking in AWS deployments.
In addition to running in an Amazon VPC, the Lustre high availability solution automatically configures Amazon EC2 Auto Scaling, adding support for restarting unhealthy Amazon EC2 instances. If an instance becomes unhealthy, the pre-configured Auto Scaling feature will detect the failure and start a new instance. Once the new instance is online, it will reattach the orphaned target's resources (network interface and Amazon Elastic Block Store [EBS] volumes) and restart the target.

Global Support (HVM)

Hardware-assisted virtualization (HVM) instances, together with Enhanced Networking support on the C3 and C4 compute-optimized instance types, provide high performance on AWS and are recommended for Intel Cloud Edition for Lustre. Enhanced Networking uses SR-IOV, which allows a physical device to be virtualized and connected directly to a virtual machine, providing lower latency and more consistent performance.

Global Support

The Global Support offering is intended for use with HPC workloads. With High Availability and storage on Amazon EBS volumes, Lustre can recover the file system after a Lustre server is terminated. In addition to supporting Amazon EBS for more resilient storage, this version also supports higher-end instance types (M3), which enable a higher performance file system.

Community Version

The Community Version is an entry-level product which can be used for proof-of-concept, development and testing. The storage templates offered for this product use local disk, so our recommendation is to limit use to scratch file systems; data will be lost if one of the Lustre servers is terminated. Support is not included with this version, so we recommend moving to a product which includes Global Support once you have completed evaluation.

How to create an Intel Cloud Edition Lustre cluster on AWS step by step

Intel Cloud Edition for Lustre is designed to create a scalable and very fast parallel Lustre file system to be attached to an external cluster of compute nodes. During the creation of the Lustre cluster, a single client will be created. This client is for testing purposes only. The compute cluster can be created using a variety of cluster managers, and AWS has simplified this process with an easy-to-use tool called cfncluster (https://github.com/awslabs/cfncluster).

Step 1

After deciding on the version of Lustre that is right for your requirements, you will need to subscribe to the specific product. To do this, either locate the version on the AWS Marketplace (Figure 5) or use the hyperlinks on the Intel web page https://wiki.hpdd.intel.com/display/pub/intel+cloud+edition+for+lustre*+software (Figure 4).

Figure 4: Intel web page with links to the product versions.

Figure 5: Product listing on the AWS Marketplace.

Step 2

Once you have subscribed and received the automated confirmation emails, you are ready to use the pre-built templates to create your cluster. To start this process, click on the correct link from the Intel web page. Each Lustre product has several templates to choose from. Select a cluster configuration based on your requirements. A specific template has been created for each of the following AWS Regions: US East, US West (Oregon, N. California), Asia Pacific (Tokyo, Singapore, Sydney), South America, and EU (Ireland) (Figure 6). Amazon VPC and HA templates require parameters to be passed as described in the VPC Templates section.

Figure 6

Select the template for your preferred Availability Zone (AZ); your Lustre cluster will be created in this AZ.

Step 3

The template files can be opened, for example by selecting vpc-c4 (to deploy on the C4 instance type) in the Template column (Figure 6), then modified and saved to a location of your choice. This gives you the flexibility to customize your cluster; for example, you can define the instance types you prefer to use or include Amazon VPC settings. If you have your own modified version, select Choose File and browse to your template location. Figure 7 and Figure 8 detail the parameters required to build a Lustre file system cluster using the templates available in the Global Support HVM VPC version. AWS CloudFormation templates are stored on Amazon S3, and the path is auto-populated when you press Launch Stack in Step 2.

Figure 7

Select Next to continue.

Step 4

You will need to pass the name of the Amazon EC2 key pair that is used for SSH connections (Figure 8). The key pair must have been created before you use the templates (http://docs.aws.amazon.com/awsec2/latest/userguide/ec2-key-pairs.html). At this stage you can change a number of parameters, including the number of Object Storage Servers (default is 4).

Figure 8
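The same parameters can in principle be supplied programmatically instead of through the console form. The sketch below assumes the boto3 library and an AWS CloudFormation client created with it; the parameter keys "KeyName" and "OssCount" and both function names are hypothetical placeholders, since the real keys are defined by whichever template you selected, and the "CAPABILITY_IAM" acknowledgement is an assumption about what the template needs.

```python
from typing import Dict, List


def build_stack_parameters(key_name: str, oss_count: int = 4) -> List[Dict[str, str]]:
    """Build the Parameters list in the shape CloudFormation expects.

    "KeyName" and "OssCount" are hypothetical keys; check the template
    you are launching for the real parameter names.
    """
    if not key_name:
        raise ValueError("key_name must name an existing EC2 key pair")
    if oss_count < 1:
        raise ValueError("oss_count must be at least 1")
    return [
        {"ParameterKey": "KeyName", "ParameterValue": key_name},
        {"ParameterKey": "OssCount", "ParameterValue": str(oss_count)},
    ]


def launch_stack(cfn_client, stack_name: str, template_url: str,
                 parameters: List[Dict[str, str]]) -> str:
    """Start stack creation and return the new stack's ID.

    cfn_client is expected to behave like boto3.client("cloudformation").
    Billing for created resources begins once the stack is running.
    """
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=parameters,
        Capabilities=["CAPABILITY_IAM"],  # assumption: template creates IAM resources
    )
    return response["StackId"]
```

Passing the client in as an argument keeps the helper free of credentials handling and makes it easy to exercise with a stand-in object before touching a real account.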

Step 5

Review and acknowledge the selections at the foot of the page. When ready, click Create and the AWS CloudFormation Stack process will begin. You can check the creation status using the AWS CloudFormation console (Figure 9).

Important Note: Once the AWS CloudFormation Stack completes, Amazon EC2 resources will be running, Lustre resources will have automatically started, the Lustre file system may be mounted by Lustre clients, and billing for the use of newly created resources will have begun.

Figure 9
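As an alternative to watching the console, the creation status can be polled from a script. This is a minimal sketch assuming a client that behaves like boto3's CloudFormation client; the function name, polling interval and poll limit are illustrative choices, not part of the product.

```python
import time
from typing import Callable


def wait_for_stack(cfn_client, stack_name: str, poll_seconds: float = 30.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll describe_stacks until the stack reports CREATE_COMPLETE.

    Raises RuntimeError if the stack leaves CREATE_IN_PROGRESS for any
    other state (for example a rollback), and TimeoutError if it is still
    in progress after max_polls checks.
    """
    for _ in range(max_polls):
        stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
        status = stacks[0]["StackStatus"]
        if status == "CREATE_COMPLETE":
            return status
        if status != "CREATE_IN_PROGRESS":
            raise RuntimeError(f"stack {stack_name} entered state {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"stack {stack_name} still in progress after {max_polls} polls")
```

With a real client this would be called as wait_for_stack(boto3.client("cloudformation"), "my-lustre-stack"), where the stack name is whatever you entered in the template form.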

Building a High Performance Compute Cluster using a Lustre File System on AWS using cfncluster

Intel Cloud Edition for Lustre is designed to create storage nodes, but not compute nodes. Fortunately, AWS cfncluster can be used to create HPC compute nodes tailored to MPI-based applications in AWS. It is agnostic to what the cluster is used for and can easily be extended to support different frameworks. The CLI is stateless, and all operations are done using AWS CloudFormation or other services within AWS. A full description of the tool is posted at http://cfncluster.readthedocs.org/en/latest.

cfncluster includes a Lustre client. If you use a different AMI, make certain to verify that a compatible Lustre client is available in it.

After reading the Getting Started documentation, you will need to install cfncluster. Follow the instructions posted at http://cfncluster.readthedocs.org/en/latest/getting_started.html#installing-cfncluster.

After installation, but before using cfncluster, edit the .cfncluster/config file. The config file is divided into several sections that enable you to customize your cluster with details such as Amazon EC2 instance types (default is t2.micro) and the initial number of compute nodes to create (default is 2). In the Amazon VPC settings section, use the same settings as in Step 4. This is critical for connecting the Lustre file system and the Lustre clients available on AWS cfncluster. The online cfncluster documentation specifies the high-level network configurations that are supported.

As a minimum requirement, you will need to update the following sections in the .cfncluster/config file.

    [aws]
    # This is the AWS credentials section (required).
    # These settings apply to all clusters.
    # Replace these with your AWS keys.
    # If not defined, boto will attempt to use a) environment
    # or b) EC2 IAM role.
    aws_access_key_id = enter your key
    aws_secret_access_key = enter your key

    [cluster default]
    # Name of an existing EC2 KeyPair to enable SSH access to the instances.
    # Replace with your key name.
    key_name = bill

    ## VPC Settings
    [vpc public]
    # ID of the VPC you want to provision cluster into.
    # Replace with your VPC ID.
    vpc_id = vpc-f250ce97
    # ID of the Subnet you want to provision the Master server into.
    # Replace with your subnet ID.
    master_subnet_id = subnet-83be78a8
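If the same settings have to be produced for several environments, the three sections shown above can be generated rather than edited by hand. The following is a minimal sketch using Python's standard configparser module; the function name is hypothetical, only the keys shown in this document are written, and a complete cfncluster configuration may need further sections that are not covered here.

```python
import configparser
import io


def render_cfncluster_config(access_key: str, secret_key: str, key_name: str,
                             vpc_id: str, master_subnet_id: str) -> str:
    """Return INI text with the [aws], [cluster default] and [vpc public] sections."""
    # interpolation=None so that a '%' in a credential is stored literally
    config = configparser.ConfigParser(interpolation=None)
    config["aws"] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    config["cluster default"] = {"key_name": key_name}
    config["vpc public"] = {
        "vpc_id": vpc_id,
        "master_subnet_id": master_subnet_id,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values taken from the example above; replace with your own.
    print(render_cfncluster_config("enter your key", "enter your key", "bill",
                                   "vpc-f250ce97", "subnet-83be78a8"))
```

Writing the result to the .cfncluster/config file is left to the caller so that an existing, hand-tuned file is never overwritten by accident.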

Important Note: Once the parameters in the file have been updated, create the cluster by following the installation instructions. After completion, Amazon EC2 resources in the new compute cluster will be running and billing will have begun.

Figure 10

Figure 11 shows the resulting client cluster and Lustre cluster model.

Figure 11

Using the Amazon EC2 Management Console, you can easily obtain the public IP address of your Master Server and the private IP address of the MGT instance. Using an SSH connection, you can connect to the Master Server and, from the Master Server, mount the Lustre file system on all of the compute nodes. The Lustre client is already available on all the compute nodes, so it is only necessary to run the following command as root:

    # mount -t lustre <Private IP of MGT>@tcp0:/scratch /mnt/lustrefs

To simplify the administration of all the compute nodes, we recommend using pdsh as ec2-user:

    $ pdsh -g clients sudo -u root mount -t lustre <Private IP of MGT>@tcp0:/scratch /mnt/lustrefs

OpenMPI libraries are also included in the AWS cfncluster software stack, so MPI-based applications can be easily rebuilt to run in this environment. We compiled IOR (https://github.com/chaos/ior) version 2.10 with the MPI libraries to measure the I/O performance of our cluster.

Instance type and performance measured

As an example, we created a Lustre file system using 8x c4.4xlarge instances as OSS servers and 16x c4.4xlarge instances as compute clients. To establish the I/O performance of the file system we used IOR. IOR is a parallel file system test developed by the Scalable I/O Project (SIOP) at LLNL (https://computing.llnl.gov/?set=code&page=sio). This program performs parallel writes and reads to and from a file using MPI-IO and reports the throughput rates. MPI is used for process synchronization.

We ran IOR using 256 threads across 16 nodes with a transfer size (xfersize) of 1 MiB, a block size of 4 GiB per thread, and an aggregate file size of 1024 GiB, which resulted in (Figure 12):

    Max Write: 1818.42 MiB/sec (1906.75 MB/sec)*
    Max Read:  1810.10 MiB/sec (1898.03 MB/sec)*

The Lustre Monitoring Tool (LMT) and Ganglia are included and installed in Intel Cloud Edition for Lustre. We used LTOP to record the activity during the IOR experiment (Figure 12).
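The reported IOR figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this section; the helper name is illustrative, and the per-OSS averages are a derived quantity that assumes load was spread evenly over the 8 OSS instances.

```python
MIB = 1024 ** 2   # bytes in one mebibyte
MB = 1000 ** 2    # bytes in one megabyte


def mib_to_mb_per_sec(rate_mib: float) -> float:
    """Convert a throughput in MiB/sec to MB/sec."""
    return rate_mib * MIB / MB


THREADS = 256
NODES = 16
BLOCK_GIB_PER_THREAD = 4
OSS_COUNT = 8
MAX_WRITE_MIB = 1818.42
MAX_READ_MIB = 1810.10

aggregate_gib = THREADS * BLOCK_GIB_PER_THREAD   # total data written by all threads
threads_per_node = THREADS // NODES
write_mb = mib_to_mb_per_sec(MAX_WRITE_MIB)
read_mb = mib_to_mb_per_sec(MAX_READ_MIB)
write_per_oss_mib = MAX_WRITE_MIB / OSS_COUNT    # assumes an even spread across OSS
read_per_oss_mib = MAX_READ_MIB / OSS_COUNT

if __name__ == "__main__":
    print(f"aggregate file size: {aggregate_gib} GiB, {threads_per_node} threads per node")
    print(f"write: {write_mb:.2f} MB/sec, about {write_per_oss_mib:.2f} MiB/sec per OSS")
    print(f"read:  {read_mb:.2f} MB/sec, about {read_per_oss_mib:.2f} MiB/sec per OSS")
```

The conversion reproduces the MB/sec values quoted in parentheses, and 256 threads at 4 GiB each account for the 1024 GiB aggregate file size.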

Figure 12

Further testing using the same parameters was performed to show the effect of scaling the Lustre file system by increasing the number of Amazon EC2 OSS instances. Figure 13 shows the results obtained.

Figure 13: OSS Write / Read Scaling. Write and read throughput (MB/s) for 4, 8, and 16 OSS instances.

As the number of Object Storage Server Amazon EC2 instances in the cluster increased, both read and write performance increased at a near-linear rate.

Summary

Purpose-built and using dynamic computing resources available from AWS, the Intel Lustre solution is a fast, massively scalable storage platform positioned to accelerate application performance, even with complex workloads. A driving force behind the continued development of Lustre, Intel is committed to storage for high performance workloads. Intel Cloud Edition for Lustre software is an ideal foundation for dynamic AWS-based workloads demanding fast, scalable, and cost-effective storage. Using the resources and templates described in this document, you can innovate on your problem, not your infrastructure.

For more information

For detailed information about Amazon Web Services instance types: http://aws.amazon.com/ec2/
For detailed information about Intel Cloud Edition for Lustre: https://wiki.hpdd.intel.com/display/pub/hpdd+wiki+front+page
cfncluster getting started: http://cfncluster.readthedocs.org/en/latest/getting_started.html
Configuring cfncluster: http://cfncluster.readthedocs.org/en/latest/getting_started.html#configuring-cfncluster
cfncluster supported network configurations: http://cfncluster.readthedocs.org/en/latest/networking.html
IOR HPC benchmark source: https://github.com/chaos/ior

[1] www.top500.org
[2] http://zfsonlinux.org/docs/lug12_zfs_lustre_for_sequoia.pdf, results in presentation by LLNL at Lustre User Group 2012, April 23, 2012

Legal Notices & Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at Intel.com, or from the OEM or retailer.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.

Test and System Configurations: All tests were performed by Intel using cfncluster Version 18 to create the compute stack. Version 1.1 (01-29-2015) of the Intel Cloud Edition for Lustre Global Support (HVM) was used to create the Lustre stack. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

For more information on performance tests and on the performance of Intel products, reference www.intel.com/procs/perf/limits.htm and any Intel source materials such as performance briefs or white papers.

Copyright 2015 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.