
Storage Evaluation on FG, FC, and GPCF
Gabriele Garzoglio, Doug Strain
Computing Division, Fermilab
Jan 18, 2011

Overview: Context; Test Bed; Lustre Evaluation (standard benchmarks, fault tolerance, application-based benchmark); HEPiX Storage Group report.

Context
Goal: evaluation of storage technologies for the use case of data-intensive Grid jobs.
Technologies considered: Lustre, Hadoop Distributed File System (HDFS), BlueArc (BA), OrangeFS (new request).
Targeted infrastructures: FermiGrid, FermiCloud, and the General Physics Computing Farm (GPCF).
Collaboration: FermiGrid / FermiCloud, OSG Storage, DMS, REX.

Evaluation Method
Set the scale: measure storage metrics from running experiments to set the scale on expected bandwidth, typical file size, number of clients, etc.
  http://home.fnal.gov/~garzogli/storage/dzero-sam-file-access.html
  http://home.fnal.gov/~garzogli/storage/cdf-sam-file-access-perapp-family.html
Measure performance:
  - run standard benchmarks on the storage installations
  - study the response of each technology to real-life application access patterns (ROOT-based)
  - use the HEPiX storage group infrastructure to characterize the response to IF applications
Fault tolerance: simulate faults and study the reactions.
Operations: comment on potential operational issues.

Lustre Test Bed: ITB Bare Metal
[Diagram] ITB Lustre server: 2 OST & 1 MDT on 0.4 TB (1 disk); Dom0 with 8 CPU, 16 GB RAM. FG ITB clients (7 nodes, 21 VMs) reach it over an Ethernet Lustre mount and also have a BA mount.
NOTE: similar setup to the DMS Lustre evaluation: same servers, but 2 OSTs here vs. 3 OSTs for DMS.

Lustre Test Bed: FCL Bare Metal
[Diagram] FCL Lustre server: 3 OST & 1 MDT on 2 TB (6 disks); Dom0 with 8 CPU, 24 GB RAM. FG ITB clients (7 nodes, 21 VMs) reach it over an Ethernet Lustre mount and also have a BA mount.
Test: ITB clients vs. Lustre bare-metal server.

Lustre Test Bed: FCL Virtual Server
[Diagram] FCL Lustre: 3 OST & 1 MDT on 2 TB (6 disks); Dom0 with 8 CPU, 24 GB RAM hosting a Lustre server VM and 7 Lustre client VMs. FG ITB clients (7 nodes, 21 VMs) reach it over an Ethernet Lustre mount and also have a BA mount.
Tests: ITB clients vs. Lustre virtual server; FCL clients vs. Lustre virtual server; FCL + ITB clients vs. Lustre virtual server.

Machine Specifications
FCL client / server machines:
  - Lustre 1.8.3, set up with 3 OSS (different striping)
  - CPU: dual quad-core Xeon E5640 @ 2.67 GHz with 12 MB cache; 24 GB RAM
  - Disk: 6 SATA disks in RAID 5 for 2 TB, plus 2 system disks (hdparm: 376.94 MB/s)
  - 1 Gb Ethernet + InfiniBand cards
ITB client / server machines:
  - Lustre 1.8.3, striped across 2 OSS, 1 MB blocks
  - CPU: dual quad-core Xeon X5355 @ 2.66 GHz with 4 MB cache; 16 GB RAM
  - Disk: single 500 GB disk (hdparm: 76.42 MB/s)
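
The per-disk figures quoted above come from hdparm's buffered read test. A minimal sketch of how such a number can be collected follows; the device path and the averaging over three runs are assumptions, not details given in the slides.

```python
import re
import subprocess

def hdparm_read_mb_per_sec(device="/dev/sda", runs=3):
    """Run 'hdparm -t' a few times and return the mean buffered-read rate in MB/s.

    hdparm -t prints a line such as:
      Timing buffered disk reads:  1132 MB in  3.00 seconds = 376.94 MB/sec
    """
    rates = []
    for _ in range(runs):
        out = subprocess.run(["hdparm", "-t", device],
                             capture_output=True, text=True, check=True).stdout
        match = re.search(r"=\s*([\d.]+)\s*MB/sec", out)
        if match:
            rates.append(float(match.group(1)))
    if not rates:
        raise RuntimeError("could not parse hdparm output")
    return sum(rates) / len(rates)

if __name__ == "__main__":
    print(f"buffered read: {hdparm_read_mb_per_sec():.2f} MB/s")  # needs root
```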

Metadata Tests: mdtest
mdtest reports metadata rates from multiple clients: file/directory creation, stat, and deletion.
Setup: 48 clients on 6 VMs on 6 different nodes.
Configurations compared: ITB clients vs. ITB bare-metal server, ITB clients vs. FCL bare-metal server, and DMS results.
[Chart: ops/sec for directory creation, stat, removal and file creation, stat, removal; series: ITB, DMS, and FC, each in unique, single, and shared directory modes.]
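
For reference, a sketch of how mdtest is typically launched for a run like this (48 MPI tasks, each working in its own directory). The flags shown are common mdtest options and the paths are assumptions; the slides do not give the exact command line used.

```python
import subprocess

# Hypothetical mdtest launch approximating the 48-client setup above.
# -d: test directory on the Lustre mount, -n: items per task,
# -i: iterations, -u: each task works in a unique subdirectory.
cmd = [
    "mpirun", "-np", "48", "--hostfile", "clients.txt",   # assumed host file
    "mdtest",
    "-d", "/mnt/lustre/mdtest",                           # assumed mount point
    "-n", "1000",
    "-i", "3",
    "-u",
]
subprocess.run(cmd, check=True)
```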

Metadata Tests: fileop
fileop is IOzone's metadata test; it measures the rates of mkdir, chdir, open, close, etc. (single client).
Configurations compared: 1 ITB client vs. ITB bare-metal server, 1 ITB client vs. FCL bare-metal server, and DMS results.
Note: the MDS should be configured as RAID 10 rather than RAID 5; is the observed effect due to this?
[Chart: ops/sec for mkdir, rmdir, create, close, stat, chmod, readdir, link, unlink, delete; series: ITB, FermiCloud, DMS.]
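
To make concrete what a fileop-style metadata rate means, here is a minimal, self-contained sketch that times file create/stat/unlink on a chosen directory and reports ops/sec. It only illustrates the kind of measurement; it is not the IOzone fileop code, and the test directory is an assumption.

```python
import os
import time

def metadata_ops_per_sec(test_dir, n_files=5000):
    """Time create, stat, and unlink over n_files files; return ops/sec per phase."""
    os.makedirs(test_dir, exist_ok=True)
    paths = [os.path.join(test_dir, f"f{i:06d}") for i in range(n_files)]
    rates = {}

    t0 = time.time()
    for p in paths:                      # create phase
        open(p, "w").close()
    rates["create"] = n_files / (time.time() - t0)

    t0 = time.time()
    for p in paths:                      # stat phase
        os.stat(p)
    rates["stat"] = n_files / (time.time() - t0)

    t0 = time.time()
    for p in paths:                      # unlink phase
        os.unlink(p)
    rates["unlink"] = n_files / (time.time() - t0)
    return rates

if __name__ == "__main__":
    # Point this at a directory on the mounted file system under test.
    for op, rate in metadata_ops_per_sec("/mnt/lustre/fileop-test").items():
        print(f"{op}: {rate:.0f} ops/sec")
```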

Data Access Tests: IOzone
IOzone writes a 2 GB file from each client and performs read/write tests (a simple throughput sketch follows this slide).
Setup: 3-48 clients on 3 VMs / nodes.
Tests performed:
  - ITB clients vs. FCL bare-metal Lustre
  - ITB clients vs. virtual Lustre (virtual vs. bare-metal server):
      read vs. different types of disk and net drivers for the virtual server;
      read and write vs. number of virtual-server CPUs
  - FCL clients vs. virtual Lustre (on-board vs. remote I/O):
      read and write vs. number of idle VMs on the server;
      read and write with and without data striping
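
As a rough illustration of the per-client streaming measurement IOzone performs, here is a minimal sketch that writes and then re-reads a file and reports MB/s. The 2 GB size matches the slide; the path and record size are assumptions, and this is not the IOzone tool itself.

```python
import os
import time

BLOCK = 1024 * 1024          # 1 MB records (assumed)
SIZE_MB = 2048               # 2 GB per client, as in the IOzone setup above

def write_read_mb_per_sec(path, size_mb=SIZE_MB):
    """Stream-write then stream-read a file and return (write MB/s, read MB/s)."""
    data = os.urandom(BLOCK)

    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())             # make sure data reaches the servers
    write_rate = size_mb / (time.time() - t0)

    # NOTE: a real measurement must avoid the client page cache (e.g. by dropping
    # caches or using files larger than client RAM); IOzone's multi-client runs
    # take care of this, the sketch does not.
    t0 = time.time()
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass
    read_rate = size_mb / (time.time() - t0)

    os.unlink(path)
    return write_rate, read_rate

if __name__ == "__main__":
    w, r = write_read_mb_per_sec("/mnt/lustre/iozone-like.dat")  # assumed mount point
    print(f"write: {w:.0f} MB/s, read: {r:.0f} MB/s")
```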

ITB Clients vs. FCL Bare-Metal Lustre
Results: 350 MB/s read, 250 MB/s write. This is our baseline.

ITB Clients vs. FCL Virtual-Server Lustre
Changing the disk and net drivers on the Lustre server VM.
Results: 350 MB/s read, 70 MB/s write (vs. 250 MB/s write on bare metal). Use virtio drivers for the network.
[Charts: write and read I/O rates for bare metal, virtio for disk and net, virtio for disk with the default net driver, and default drivers for disk and net.]

ITB Clients vs. FCL Virtual-Server Lustre
Trying to improve write I/O by changing the number of CPUs on the Lustre server VM.
Results: 350 MB/s read, 70 MB/s write; NO DIFFERENCE. Write I/O does NOT depend on the number of CPUs: 1 or 8 CPUs (3 GB RAM) are equivalent at this scale.

ITB & FCL Clients vs. FCL Virtual-Server Lustre
FCL clients vs. the FCL virtual server, compared to ITB clients vs. the FCL virtual server, with and without idle client VMs.
Results: 350 MB/s read, 70 MB/s write (250 MB/s write on bare metal). FCL clients are 15% slower than ITB clients: not significant.
[Charts: read and write rates for FCL clients and ITB clients.]

ITB & FCL Clients vs. Striped Virtual Server
What effect does striping have on bandwidth?
Writes are the same for FCL & ITB clients with and without striping. Reads with striping: FCL clients 5% faster, ITB clients 5% slower. Not significant.
[Charts: read and write rates, with and without striping.]

Fault Tolerance
Basic fault-tolerance tests of ITB clients vs. the FCL Lustre virtual server: read/write rates during IOzone tests when turning off 1, 2, or 3 OSTs or the MDT for 10 sec or 2 min.
Two modes: fail-over vs. fail-out. Fail-out did not work.
Graceful degradation: if an OST is down, access is suspended; if the MDT is down, ongoing access is NOT affected.

Application-based Tests
Focusing on ROOT-based applications:
  - Nova: ana framework, simulating a skim application; reads a large fraction of all events and either disregards all (read-only) or writes all.
  - Minos: loon framework, simulating a skim application; data is compressed, so access is CPU bound (does NOT stress storage).
Tests performed:
  - Nova, ITB clients vs. bare-metal Lustre: write and read-only
  - Minos, ITB clients vs. bare-metal Lustre: diversification of applications
  - Nova, ITB clients vs. virtual Lustre: virtual vs. bare-metal server
  - Nova, FCL clients vs. virtual Lustre: on-board vs. remote I/O
  - Nova, FCL / ITB clients vs. striped virtual Lustre: effect of striping
  - Nova, FCL + ITB clients vs. virtual Lustre: bandwidth saturation

1 Nova ITB Client vs. Bare Metal
Read: BW = 15.6 ± 0.2 MB/s.
Read & write: BW(read) = 2.63 ± 0.02 MB/s, BW(write) = 3.25 ± 0.02 MB/s.
Write is always CPU bound; it does NOT stress storage.
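
The bandwidth figures on this and the following slides are quoted as a mean with an uncertainty. A minimal sketch of that bookkeeping is below, assuming the uncertainty is the standard error of the mean over repeated measurements (the slides do not state how the error is computed); the sample values are hypothetical.

```python
import math

def mean_and_stderr(samples):
    """Return (mean, standard error of the mean) for a list of bandwidth samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)   # sample variance
    return mean, math.sqrt(var / n)

# Hypothetical per-interval read bandwidths (MB/s) from one client's log:
readings = [15.4, 15.8, 15.7, 15.3, 15.9, 15.5]
m, err = mean_and_stderr(readings)
print(f"BW = {m:.1f} +/- {err:.1f} MB/s")
```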

1 Nova ITB / FCL Client vs. Virtual Server
1 ITB client, read: BW = 15.3 ± 0.1 MB/s (bare metal: 15.6 ± 0.2 MB/s). The virtual server is as fast as bare metal for read.
1 FCL client, read: BW = 14.9 ± 0.2 MB/s (bare metal: 15.6 ± 0.2 MB/s); with default disk and net drivers: BW = 14.4 ± 0.1 MB/s. The on-board client is almost as fast as a remote client.

21 Nova Clients vs. Bare-Metal & Virtual Server
Read, ITB vs. bare metal: BW = 12.55 ± 0.06 MB/s (1 client vs. bare metal: 15.6 ± 0.2 MB/s).
Read, ITB vs. virtual server: BW = 12.27 ± 0.08 MB/s (1 ITB client: 15.3 ± 0.1 MB/s).
Read, FCL vs. virtual server: BW = 13.02 ± 0.05 MB/s (1 FCL client: 14.4 ± 0.1 MB/s).
The virtual server is almost as fast as bare metal for read. Virtual clients on-board (on the same machine as the virtual server) are as fast as bare metal for read.

49 Nova ITB / FCL Clients vs. Virtual Server
49 clients (1 job / VM / core) saturate the bandwidth to the server. Is the distribution of the bandwidth fair?
Minimum processing time for 10 files (1.5 GB each) = 1268 s; client processing time ranges up to 177% of the minimum. Clients do NOT all get the same share of the bandwidth (within 20%).
ITB clients: average time = 141 ± 4% of minimum, average BW = 9.0 ± 0.2 MB/s.
FCL clients: average time = 148 ± 3% of minimum, average BW = 9.3 ± 0.1 MB/s.
No difference in bandwidth between ITB and FCL clients.
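
A small sketch of the fairness bookkeeping above: each client processes 10 files of 1.5 GB, so its average bandwidth is roughly 15,360 MB divided by its wall-clock time, and its "share" is that time relative to the fastest client. The per-client times below are hypothetical and serve only to illustrate the calculation.

```python
# Each client reads 10 files of 1.5 GB = 15 * 1024 MB.
DATA_MB = 10 * 1.5 * 1024

# Hypothetical per-client wall-clock times in seconds (the real run had 49 clients).
client_times = [1268, 1510, 1790, 1655, 2244, 1402, 1930]

t_min = min(client_times)
for t in client_times:
    rel = 100.0 * t / t_min          # processing time as % of the fastest client
    bw = DATA_MB / t                 # average bandwidth seen by this client
    print(f"time = {t:5d} s ({rel:5.1f}% of min), BW = {bw:5.1f} MB/s")
```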

21 Nova ITB / FCL Clients vs. Striped Virtual Server
What effect does striping have on bandwidth?
Striped (4 MB stripes on 3 OSTs): read FCL vs. virtual server BW = 13.71 ± 0.03 MB/s; read ITB vs. virtual server BW = 12.81 ± 0.01 MB/s. More consistent bandwidth and slightly better bandwidth on the OSTs.
Non-striped: read FCL vs. virtual server BW = 13.02 ± 0.05 MB/s; read ITB vs. virtual server BW = 12.27 ± 0.08 MB/s.
[Charts: MDT and OST bandwidth (around 14 MB/s), striped vs. non-striped.]
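
For reference, a striping layout like the "4 MB stripes on 3 OSTs" configuration is set on a directory with Lustre's lfs tool; new files created there inherit it. The sketch below uses Lustre 1.8-era option spelling (-s for stripe size; newer releases use -S) and an assumed mount point.

```python
import subprocess

# Set a 4 MB stripe size over 3 OSTs on a test directory
# (directory path is an assumption).
subprocess.run(
    ["lfs", "setstripe", "-s", "4M", "-c", "3", "/mnt/lustre/striped-data"],
    check=True,
)

# Files created under the directory inherit the striping; verify with:
subprocess.run(["lfs", "getstripe", "/mnt/lustre/striped-data"], check=True)
```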

Minos: 21 Clients
Minos application (loon) skimming, with random access to 1400 files.
Loon is CPU bound; it does NOT stress storage.

HEPiX Storage Group
Collaboration with Andrei Maslennikov. The Nova offline skim application is used to characterize storage solutions.
Lustre with an AFS front-end for caching has the best performance (AFS/VILU).

Conclusions
Performance:
  - The Lustre virtual server writes 3 times slower than bare metal. Use of virtio drivers is necessary but not sufficient.
  - The HEP applications tested do NOT have high demands for write bandwidth, so a virtual server may be valuable for them.
  - Using VM clients on the Lustre VM server gives the same performance as external clients (within 15%).
  - Data striping has minimal (5%) impact on read bandwidth and none on write.
  - Fairness of bandwidth distribution is within 20%.
  - More data will be coming through the HEPiX Storage tests.
Fault tolerance:
  - Fail-out mode did NOT work.
  - Fail-over tests show graceful degradation.
General operations:
  - Managed to destroy data with a change of the fault-tolerance configuration; could NOT recover from an MDT vs. OST de-synchronization.
  - Some errors are easy to understand, some very hard.
  - The configuration is stored on the Lustre partition and needs special commands to access it (a hedged example follows). Difficult to diagnose and debug.
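
The slides do not name the "special commands"; on Lustre of this era the on-disk configuration is typically inspected and regenerated with tunefs.lustre, so the sketch below is an assumption about the tooling meant, not a documented step from the evaluation. The device path is also an assumption.

```python
import subprocess

# Inspect the configuration parameters stored on a Lustre target device.
subprocess.run(["tunefs.lustre", "--print", "/dev/sdb1"], check=True)

# Regenerating the configuration logs (e.g. after an MDT/OST mismatch) is done
# with --writeconf on every target while the file system is stopped. This is
# exactly the kind of intrusive step the slide warns about, so it is left
# commented out and shown only as an illustration:
# subprocess.run(["tunefs.lustre", "--writeconf", "/dev/sdb1"], check=True)
```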

Status and Future Work
Storage evaluation project status:
  - Initial study of the data access model: DONE
  - Deploy test-bed infrastructure: DONE
  - Benchmarks commissioning: DONE
  - Lustre evaluation: DONE
  - Hadoop evaluation: STARTED
  - OrangeFS and BlueArc evaluations: TODO
  - Prepare final report: STARTED
Current completion estimate is May 2011.

Acknowledgements
Ted Hesselroth: IOzone performance measurements.
Andrei Maslennikov: HEPiX storage group.
Andrew Norman, Denis Perevalov: Nova framework for the storage benchmarks and HEPiX work.
Robert Hatcher, Art Kreymer: Minos framework for the storage benchmarks and HEPiX work.
Steve Timm, Neha Sharma: FermiCloud support.
Alex Kulyavtsev, Amitoj Singh: consulting.

Any questions?

EXTRA SLIDES

Storage Evaluation Metrics
Metrics from Stu, Gabriele, and DMS (Lustre evaluation):
  - Cost
  - Data volume
  - Data volatility (permanent, semi-permanent, temporary)
  - Access modes (local, remote)
  - Access patterns (random, sequential, batch, interactive, short, long, CPU intensive, I/O intensive)
  - Number of simultaneous client processes
  - Acceptable latency requirements (e.g. for batch vs. interactive)
  - Required per-process I/O rates
  - Required aggregate I/O rates
  - File size requirements
  - Reliability / redundancy / data integrity
  - Need for tape storage, either hierarchical or backup
  - Authentication (e.g. Kerberos, X.509, UID/GID, AFS token) / authorization (e.g. Unix permissions, ACLs)
  - User & group quotas / allocation / auditing
  - Namespace performance ("file system as catalog")
  - Supported platforms and systems
  - Usability: maintenance, troubleshooting, problem isolation
  - Data storage functionality and scalability