Building a Parallel Cloud Storage System using OpenStack s Swift Object Store and Transformative Parallel I/O



Similar documents
Performance And Scalability In Oracle9i And SQL Server 2000

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Performance Tuning and Optimizing SQL Databases 2016

SUSE Cloud Installation: Best Practices Using a SMT, Xen and Ceph Storage Environment

CS 6343: CLOUD COMPUTING Term Project

POSIX and Object Distributed Storage Systems

Performance in a Gluster System. Versions 3.1.x

Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012

SMB in the Cloud David Disseldorp

StorReduce Technical White Paper Cloud-based Data Deduplication

Moving Virtual Storage to the Cloud. Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage

Table of Contents. Overview... 1 Introduction... 2 Common Architectures Technical Challenges with Magento ChinaNetCloud's Experience...

UBUNTU DISK IO BENCHMARK TEST RESULTS

A Performance Analysis of Distributed Indexing using Terrier

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

Moving Virtual Storage to the Cloud

Use of Hadoop File System for Nuclear Physics Analyses in STAR

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010

Storage as a Service: Leverage the benefits of scalability and elasticity with Storage as a Service

Performance Testing. Why is important? An introduction. Why is important? Delivering Excellence in Software Engineering

SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform

BASICS OF SCALING: LOAD BALANCERS

Amazon Cloud Storage Options

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

Scaling Analysis Services in the Cloud

Performance Analysis of Mixed Distributed Filesystem Workloads

Microsoft Analytics Platform System. Solution Brief

Data Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information

A Survey on Cloud Storage Systems

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

James Serra Sr BI Architect

Benchmarking Amazon s EC2 Cloud Platform

HP reference configuration for entry-level SAS Grid Manager solutions

Diablo and VMware TM powering SQL Server TM in Virtual SAN TM. A Diablo Technologies Whitepaper. May 2015

SDFS Overview. By Sam Silverberg

A Complete Open Cloud Storage, Virt, IaaS, PaaS. Dave Neary Open Source and Standards, Red Hat

PostgreSQL Backup Strategies

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

A Performance Engineering Story

SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V

DELL s Oracle Database Advisor

MICROSOFT HYPER-V SCALABILITY WITH EMC SYMMETRIX VMAX

SUSE Cloud Installation: Best Practices Using an Existing SMT and KVM Environment

Evaluating HDFS I/O Performance on Virtualized Systems

PARALLELS CLOUD STORAGE

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

Scaling Database Performance in Azure

Automating Big Data Benchmarking for Different Architectures with ALOJA

Mixing Hadoop and HPC Workloads on Parallel Filesystems

Evaluation of Cloud ONTAP and AltaVault using AWS

Shared Parallel File System

Kentico CMS 6.0 Performance Test Report. Kentico CMS 6.0. Performance Test Report February 2012 ANOTHER SUBTITLE

Performance Testing Process

Performance Test Report KENTICO CMS 5.5. Prepared by Kentico Software in July 2010

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari

Investigating Private Cloud Storage Deployment using Cumulus, Walrus, and OpenStack/Swift

SQL Server Performance Tuning and Optimization

Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack

Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

Common Server Setups For Your Web Application - Part II

Oracle Database 11g: RAC Administration Release 2

MAGENTO HOSTING Progressive Server Performance Improvements

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

Protect Data... in the Cloud

The State of Cloud Storage

NoSQL and Hadoop Technologies On Oracle Cloud

QoS-Aware Storage Virtualization for Cloud File Systems. Christoph Kleineweber (Speaker) Alexander Reinefeld Thorsten Schütt. Zuse Institute Berlin

Aspera Direct-to-Cloud Storage WHITE PAPER

SafePeak Case Study: Large Microsoft SharePoint with SafePeak

VBLOCK SOLUTION FOR SAP: SAP APPLICATION AND DATABASE PERFORMANCE IN PHYSICAL AND VIRTUAL ENVIRONMENTS

Investigation of storage options for scientific computing on Grid and Cloud facilities

Load balancing MySQL with HaProxy. Peter Boros Percona 4/23/13 Santa Clara, CA

Hard Disk Drive vs. Kingston SSDNow V+ 200 Series 240GB: Comparative Test

Ubuntu OpenStack Fundamentals Training

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

DOVECOT Overview. Timo Sirainen Chief Architect Co-Founder

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

BMC Recovery Manager for Databases: Benchmark Study Performed at Sun Laboratories

Microsoft SQL Server: MS Performance Tuning and Optimization Digital

Quantum StorNext. Product Brief: Distributed LAN Client

Performance Testing of a Cloud Service

Accelerating Web-Based SQL Server Applications with SafePeak Plug and Play Dynamic Database Caching

Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

ECLIPSE Performance Benchmarks and Profiling. January 2009

Recommended hardware system configurations for ANSYS users

Samba's AD DC: Samba 4.2 and Beyond. Presented by Andrew Bartlett of Catalyst //

Transcription:

Building a Parallel Cloud Storage System using OpenStack s Swift Object Store and Transformative Parallel I/O or Parallel Cloud Storage as an Alternative Archive Solution Kaleb Lora Andrew AJ Burns Martel Shorter Esteban Martinez LA-UR-12-23631

Overview Our project consists of bleeding-edge research into replacing the traditional storage archives with a parallel, cloud-based storage solution. Used OpenStack s Swift Object Store cloud software. Benchmarked Swift for write speed and scalability. Our project is unique: Swift is typically used for reads We are mostly concerned with write speeds

Tools/Software S3QL PLFS SWIFT Swift FUSE S3QL PLFS

Typical Swift Setup Auth Node Proxy node

Swift Component Servers Swift-proxy Serves as the proxy server to the actual storage node. Ties all components together. Swift-object Read, write, delete blobs of data (objects). Swift-container Lists and specifies which objects belong to which containers. Swift-account Lists the containers of Swift.

S3QL Full-featured Unix filesystem. E.g.: /mnt/s3ql_filesystem/ Stores data online using backends: Google Storage Amazon S3(Simple Storage Service) OpenStack Favors simplicity. Dynamic capacity.

Parallelization via N-N and N-1-N PLFS is LANL s own approach to parallelized data storage. Appears as an N-1 write(left), but actually is an N-1-N write(right). N-N N-1-N

How the Four Applications Interact Application PLFS FUSE FUSE FUSE S3QL S3QL S3QL Swift

Baseline Performance Testing Single Node Tests

Baseline Test Setup Wrote a script to write various block and file sizes Wrote 1GB, 2GB, and 4GB files Tested multiple configurations single write to a single file system single write to single PLFS mounted file system 3 separate writes to 3 file systems simultaneously Graphed the results to watch trends

Found Ideal Block Size FUSE S3QL Swift

Discovered FUSE Limitations FUSE PLFS FUSE S3QL Swift

Local Parallelization Increased Performance

Baseline Performance Testing was Successful We found an ideal block size. Single node parallelization is efficient FUSE is a limiter in our setup Single write performance was in line with normal cloud storage performance (~25-30MB/s)

Target Performance Testing Parallelization Benchmarking and Scalability

Target Performance Testing Used Multiple Nodes Used Open MPI for parallelizing tests across the whole cluster. Tested performance scaling from 1 to 5 hosts. We were able to get 40 processes running at once because each host contained 8 cores.

N to N Write Tests had Interesting Results Immediate performance improvement with adding nodes even with a small number of processors per node Also noticed spikes of increased performance at each number of processes that was a multiple of the number of hosts we were using Stable, didn't break the S3QL mounts to the Swift containers

2-3 Host Test Results Host 1 1GigE Open MPI Host 2 1GigE Host 1 Host 2 Host 3 1GigE 1GigE 1GigE Open MPI

4-5 Host Test Results Host 1 Host 2 Host 3 Host 4 1GigE 1GigE 1GigE 1GigE Open MPI Host 1 Host 2 Host 3 Host 4 Host 5 1GigE 1GigE 1GigE 1GigE 1GigE Open MPI

Our Tests Show Cloud Storage Scales Well Performance scales linearly as you increase the number of hosts being used for MPI

Read speeds are fast but don't tell the whole story Incredibly fast due to caching Scales very well as you increase the number of hosts being used

More work needs to be done with PLFS and S3QL PLFS performance results were similar to N to N performance results but added enough instability to the S3QL mounts that many failures prevented a complete set of tests

Cloud Storage is a Viable Option for Archiving Parallel cloud storage is possible and has good scalability in the N to N case. Linear as nodes were added More work will need to be done to get PLFS working without breaking the S3QL mounts.

Future Work and Conclusion Further research possibilities of cloud parallelization

Future Testing Test write performance impacts of increased S3QL cache sizes. Test CPU load impact of S3QL uncompressed vs the default LZMA compression Test swift tuning parameters to handle concurrent access for added stability of PLFS testing.

Other File Systems That Could Be Tested Test GlusterFS and Ceph as alternative cloud solutions to swift

Why is Cloud Storage a Viable Archive Solution Container management for larger parallel archives might ease the migration workload.. Many tools that are written for cloud storage could be utilized for local archive. Current large cloud storage practices in industry could be utilized to manage a scalable archive solution.

Acknowledgements LANL Dane Gardner (New Mexico Consortium) H.B. Chen, Benjamin McCleland, David Sherill, Alfred Torrez, Pamela Smith, and Parks Fields (High Performance Computing Division)

Questions?