
Storage at BNL
Maurice Askinazi, Ofer Rind, Tony Wong
HEPiX @ Cornell, Nov. 2, 2010

Traditional Storage
Dedicated compute nodes with NFS over SAN storage. Simple and effective, but SAN storage became very expensive as the facility grew; not a scalable solution.
[Diagram: NFS servers, SAN switches, Fibre Channel disk storage]

A New Model
Having thousands of compute nodes in our facility provided an opportunity to install large disks internally, an inexpensive way to add data storage capacity. The task was to figure out how to use this storage for production: the STAR collaboration accesses it with Xrootd, and PHENIX uses dCache.
Having reduced the facility's demand for network storage, NFS use became more focused, and spending a little more to improve its performance became possible. NFS now serves home directories (user environment), high-value files that are backed up, and work group scratch space.

Linux Farm
RHIC: 1200 (+279) machines, 7500 (+2232) cores, 2.25 (+1.6) PB storage (per-node estimate sketched below)
ATLAS: 900 machines, 5800 cores
RHIC production uses local farm storage: STAR accesses this storage with Xrootd, PHENIX uses dCache. ATLAS dCache uses dedicated storage servers.
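As a rough sense of scale for the "large disks installed internally" mentioned on the previous slide, the RHIC figures above work out to a few terabytes of local disk per farm node. A back-of-the-envelope sketch, assuming the parenthesized values are recent additions on top of the base numbers:

```python
# Back-of-the-envelope per-node storage for the RHIC farm.
# Assumption: "(+x)" values on the slide are recent additions to the base figures.
machines = 1200 + 279            # RHIC farm nodes
storage_pb = 2.25 + 1.6          # total local farm storage, PB

tb_per_node = storage_pb * 1000 / machines
print(f"{machines} nodes, {storage_pb:.2f} PB total -> ~{tb_per_node:.1f} TB per node")
```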

Centralized NFS Storage

BlueArc (hardware NFS)
As the large demand for production storage was redirected to distributed local storage, the need for other NFS filesystems, smaller in size and more targeted in use, became evident. The requirements for any new NFS solution were the ability to handle many concurrent connections and to back up large amounts of data to tape. Our first purchase, two Titan 2200 servers in 2007, was quickly upgraded to three Titan 3200s the next year.

Titan 3000 Blades
The BlueArc server model passes data through boards with specialized processors, rather than expecting a server CPU to do it all.

All the Performance Money Can Buy?
The Titan was working well, but could we get by with a smaller server? BlueArc list prices:
- Titan 3100: $97,500
- Titan 3210: $150,000
- Mercury 50: $45,000
- Mercury 100: $65,000

Mercury Architecture
Main Mother Board (MMB): standard x86 dual-core 3 GHz running 64-bit Linux, 2 x 2.5" HDD (SW RAID 1), 10/100/1000 Ethernet, 8 GB memory (Mercury 100)
Main FPGA Board (MFB): single custom PCB similar in size to the MMB, PCIe (4-lane) connection to the MMB, 6 FPGAs (replacing 13 in the Titan), 24 GB memory (Mercury 100)
A hybrid solution with HW acceleration for filesystem data movement and metadata processing.

Current Usage
RHIC: 2 clusters (3 x Titan 3210 each)*
- 20 LSI storage arrays: 17 x RC16TB controllers with 300 GB FC disk (RAID5), 1 x RC16SA controller with 1 TB SATA disk, 2 x RC12 controllers with 2 TB NL-SAS disk
- 937 raw TB
ATLAS: 1 cluster (2 x Mercury 100)
- 4 LSI storage arrays: 2 x RC12 controllers with 450 GB SAS disk, 2 x RC12 controllers with 1 TB NL-SAS disk
- 128 raw TB
* Second cluster necessitated by the 128 LUN limit

Titan Performance at RACF
As an example of performance, these SNMP graphs show a Titan server sustaining 9 Gb/s bandwidth and 12,000 ops/sec while system load stays below 30%.

Mercury Performance at RACF
As an example of performance, these SNMP graphs show a Mercury server sustaining 2 Gb/s bandwidth and 60,000 ops/sec while system load stays below 40%.
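The bandwidth and ops/sec values in graphs like these are typically derived by differencing monotonically increasing SNMP counters between polls. A minimal sketch of that arithmetic, using hypothetical counter values chosen so the output reproduces the Titan figures above; the real data comes from the BlueArc SNMP MIBs, not from these names or numbers.

```python
# Turn two samples of a monotonically increasing SNMP counter into a rate.
def rate_per_sec(old: int, new: int, interval_s: float) -> float:
    return (new - old) / interval_s

POLL_S = 300  # 5-minute polling interval (assumed)

# Hypothetical interface octet counters sampled 300 s apart -> Gb/s
octets_old, octets_new = 1_000_000_000_000, 1_337_500_000_000
gbps = rate_per_sec(octets_old, octets_new, POLL_S) * 8 / 1e9   # bytes -> bits -> Gb/s
print(f"~{gbps:.0f} Gb/s average over the interval")            # ~9 Gb/s

# Hypothetical NFS operation counters over the same interval -> ops/sec
ops_old, ops_new = 5_000_000, 8_600_000
print(f"~{rate_per_sec(ops_old, ops_new, POLL_S):,.0f} ops/sec")  # ~12,000 ops/sec
```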

Titan Cluster Uptime
Our three-server Titan cluster has provided uninterrupted service since installation. Performance graphs go back to March 2009, when our SNMP database was reset.

Mercury Cluster Uptime
Our two-server Mercury cluster has had uninterrupted service for the six months since installation.

Future
Though the Titan server is built to deliver greater performance, the current BlueArc operating system limits the amount of storage a cluster can mount to 128 LUNs. Without being able to add more storage to the cluster, some of the server performance may go unused. If Mercury performance meets expectations for serving the desired amount of storage, then it is the better buy. BlueArc will be providing 512 LUN support in November; that will expand the capacity of the Mercury 100 from 2 PB to 8 PB, and of the Titan 3210 from 4 PB to 16 PB.
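The quoted capacities are consistent with a fixed per-LUN size that differs by platform. A small sketch of that back-calculation, assuming the per-LUN sizes are simply inferred from the 128 LUN figures (not taken from BlueArc documentation) and that capacity scales linearly with the mountable LUN count:

```python
# Back-calculate implied per-LUN size and project cluster capacity at 512 LUNs.
def capacity_pb(luns: int, lun_tb: float) -> float:
    return luns * lun_tb / 1000.0

mercury_lun_tb = 2000 / 128   # "2 PB at 128 LUNs" -> ~15.6 TB per LUN (inferred)
titan_lun_tb = 4000 / 128     # "4 PB at 128 LUNs" -> ~31 TB per LUN (inferred)

for name, lun_tb in (("Mercury 100", mercury_lun_tb), ("Titan 3210", titan_lun_tb)):
    print(f"{name}: 128 LUNs -> {capacity_pb(128, lun_tb):.0f} PB, "
          f"512 LUNs -> {capacity_pb(512, lun_tb):.0f} PB")
```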

Distributed Data Storage

Distributed Storage: Overview
US ATLAS dCache storage element:
- 4.5 PB on ~100 Sun/Nexsan servers (up to 120 TB behind one server)
- 2 PB of DDN storage served by 4 Dell R710s
- Recently added 1.3 PB on 19 IBM x3650 M2 servers with Nexsan storage (Solaris 10 + ZFS); choice driven by cost and some hardware concerns
PHENIX dCache: 1.1 PB local storage (up to 8 TB/server) plus some dedicated storage servers (Aberdeen, etc.); soon to be 2 PB
STAR Xrootd: 1 PB local storage, moving toward 2 PB soon
Daya Bay, LBNE, ATLAS analysis: small-scale Xrootd deployments on local storage

DDN Storage
- 1200 x 2 TB SATA (7200 RPM) drives, RAID6 (8+2); 120 x 14.5 TB LUNs (arithmetic checked in the sketch below)
- Dual S2A9900 controllers, 12 GB/s
- 4 Dell R710 servers: 48 GB RAM, RHEL 5.5 + XFS, mirrored SSD for the journal log, Myricom 10G NIC
- 5 x 88 TB dCache pools per server; up to 1.5 GB/s measured throughput per server
- Read-only (data migrated)
[Graph: example of dCache throughput after the addition of DDN storage]
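A quick consistency check of these numbers, assuming the 14.5 TB LUN size is the usable capacity (in TiB) of one 8+2 RAID6 group built from 2 TB drives:

```python
# RAID6 (8+2) layout check for the DDN configuration above.
drives, drive_bytes = 1200, 2e12        # 1200 x 2 TB SATA drives
data_disks, parity_disks = 8, 2

luns = drives // (data_disks + parity_disks)       # 120 RAID6 groups / LUNs
lun_tib = data_disks * drive_bytes / 2**40         # ~14.5 TiB usable per LUN

servers, pools_per_server = 4, 5
pool_tib = luns * lun_tib / (servers * pools_per_server)   # ~87 TiB per dCache pool

print(f"{luns} LUNs of ~{lun_tib:.1f} TiB each; "
      f"~{pool_tib:.0f} TiB per pool ({pools_per_server} pools x {servers} servers)")
```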

RHIC Data Volumes
[Charts: PHENIX DST volume (terabytes) and raw data collected in RHIC runs by experiment (BRAHMS, PHENIX, PHOBOS, STAR), Run 2 through Run 10]
- PB-sized raw data sets in Run 10; this should now be the norm
- Reconstructed data: 700+ TB still to do
- File aggregation: now 9 GB file sizes
- Weekly passes over the full data sets in the Analysis Train model (jobs split and grouped by data subset for local I/O); see the grouping sketch below
(C. Pinkenburg)
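To make "jobs split and grouped by data subset for local I/O" concrete, here is a hypothetical sketch of the grouping step: one train job is formed per storage node, and every analysis module reads only files already local to that node. File names, node names, and module names are invented for illustration; the real train is driven by the PHENIX file catalog.

```python
# Hypothetical analysis-train grouping: one train job per node, every module
# runs over that node's local files, so all I/O stays on local disk.
from collections import defaultdict

# (file, hosting node) pairs as a file catalog might report them (invented)
file_locations = [
    ("DST_run10_0001.root", "rcas2101"),
    ("DST_run10_0002.root", "rcas2101"),
    ("DST_run10_0003.root", "rcas2145"),
    ("DST_run10_0004.root", "rcas2145"),
]
analysis_modules = ["dilepton_pairs", "jet_spectra", "flow_v2"]   # invented names

jobs = defaultdict(list)
for fname, node in file_locations:
    jobs[node].append(fname)

for node, files in sorted(jobs.items()):
    print(f"train job on {node}: {len(files)} local files x {len(analysis_modules)} modules")
```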

The PHENIX Computing Model (c/o Carla Vale)
[Diagram: data flow among HPSS (fed from PHENIX), dCache write pools, dCache regular pools, dCache scratch, the thumpers, and the BlueArc working group areas; stages include copying, reading/writing, weekly reconstruction passes, the analysis train, and aggregation 1-2x per run]
Observed peak rate > 5 GB/sec in and out (C. Pinkenburg)

Conclusions and Plans
- Distributed storage is performing well within current computing models
- Deploying alternatives to Sun/Oracle storage solutions for ATLAS
- PHENIX and STAR expanding within the farm hybrid model: at what point does increasing core density break the scaling? When does local storage become the bottleneck?
- Increasing interactive end-user analysis; growing use of the PROOF farm for ATLAS
- Expanding storage solutions for other experiments

Questions?

Extra Slides


Mercury Benchmarking Data

ATLAS dCache Architecture
[Diagram: dCache core servers (PNFS, admin, pool manager, etc.); ERADAT and Oracle DB; HPSS; HotDisk pools (Sun/Nexsan, DDN); DCap doors used by the Linux farm (dccp or wget, LSM + dccp wrapper, lcg-cp to space token areas); SRM and GridFTP doors with dual 10 Gb NICs facing the WAN through the BNL firewall; Priority Stager and Panda Mover, with a path to the Panda server]
- HotDisk: resilient dCache, 3-5 replicas
- Hopping: tape tokens; Migration: disk tokens
- Priority Stager: PNFS ID translation, CacheInfo, priority + bulk staging, callback, deletion (illustrated in the sketch below)
- Oracle: state of prestage requests, PNFS ID cache
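As a rough illustration of the "priority + bulk staging" and "callback" roles listed for the Priority Stager, here is a hypothetical sketch of a priority queue that batches stage requests into bulk tape recalls. All class, method, and ID names are invented; the real component translates PNFS IDs, tracks request state in Oracle, and drives HPSS.

```python
# Hypothetical priority + bulk staging queue (not the actual BNL implementation).
import heapq

class PriorityStager:
    def __init__(self, bulk_size: int = 3):
        self._queue = []           # min-heap of (priority, pnfs_id); 0 = most urgent
        self.bulk_size = bulk_size

    def request(self, pnfs_id: str, priority: int) -> None:
        """Queue a stage request; lower numbers are recalled sooner."""
        heapq.heappush(self._queue, (priority, pnfs_id))

    def next_bulk(self) -> list:
        """Pop up to bulk_size of the most urgent requests for one bulk recall."""
        n = min(self.bulk_size, len(self._queue))
        return [heapq.heappop(self._queue)[1] for _ in range(n)]

stager = PriorityStager()
for i, prio in enumerate([5, 1, 5, 2, 5]):          # two urgent requests among bulk ones
    stager.request(f"000A1B2C3D4E5F6{i:04d}", prio)

print("first bulk recall:", stager.next_bulk())     # urgent requests come out first
```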