GPFS in data taking and analysis for new light source (X-Ray) experiments @DESY


GPFS in data taking and analysis for new light source (X-Ray) experiments @DESY
formerly known as "Large Data Ingest Architecture"
Martin Gasthuber, Stefan Dietrich, Marco Strutz, Manuela Kuhn, Uwe Ensslin, Steve Aplin, Volker Guelzow / DESY
Ulf Troppens, Olaf Weiser / IBM
GPFS-UG Austin, November 15, 2015

DESY - where & what & why
> founded December 1959
> research topics
  Accelerator Science (>5 accelerators already built)
  Particle Physics (HEP), e.g. Higgs@LHC, Gluon@PETRA
  Photon Science (physics with X-rays), e.g. X-ray crystallography, looking at everything from viruses to van Gogh paintings and fuel cells
  Astroparticle Physics
> 2 sites
  Hamburg, Germany (main site)
  Zeuthen, Germany (close to Berlin)
> ~2300 employees, ~3000 guest scientists annually

PETRA III
> storage ring accelerator, 2.3 km circumference
> X-ray radiation source
> since 2009: 14 beamlines in operation
> 2016: 10 additional beamlines (extension) start operations
[figure: sample raw file from beamline P11, Bio-Imaging and Diffraction]

why build a new online storage and analysis system
> massive increase in terms of #files and #bytes (orders of magnitude)
> dramatic increase in data complexity: more IO & CPU
  from crystal-bound targets to liquid-bound targets

becomes less precise, more complex, more iterative

user perspective
> run startbeamtime <beamtime-id> (a sketch of such a helper follows below)
  from relational DB get [PI, participants, access rights, nodes, …]
  create filesets, create default dirs + ACLs, set up policy runs, quota settings, default XATTR settings, create shares for NFS+SMB, start ZeroMQ endpoints
> take data
  mostly detector specific
  mostly done through NFS & SMB today; ZeroMQ ("vacuum cleaner") usage growing
  experimenters not present on site can start accessing the data remotely: portal (HTTP), FTP, Globus Online (also used for remote data ingest)
> run stopbeamtime <beamtime-id>
  verify all files copied
  unlink fileset (beamline fs), unshare exports, stop ZeroMQ endpoints
  update XATTR to reflect state
> users continue data analysis on the core filesystem & via remote access
  GPFS native, NFS, SMB
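
A minimal sketch of what such a startbeamtime helper could look like, in Python. This is not the DESY tool: the fileset naming scheme, quota limits, paths and XATTR keys are assumptions, and only a subset of the listed steps (fileset creation, quota, default XATTRs) is shown. mmcrfileset, mmlinkfileset and mmsetquota are standard GPFS administration commands, but the exact option syntax depends on the Spectrum Scale release.

#!/usr/bin/env python3
"""Hypothetical 'startbeamtime' sketch: fileset, quota and default XATTRs.

Not the DESY tool -- ACL setup, NFS/SMB exports, policy runs and the ZeroMQ
endpoints from the slide are only hinted at in comments."""
import os
import subprocess


def run(cmd):
    # Run a GPFS admin command and fail loudly if it returns non-zero.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def start_beamtime(beamtime_id, device="beamlinefs", base="/beamline"):
    fileset = "bt_" + beamtime_id                  # hypothetical naming scheme
    junction = os.path.join(base, fileset)

    # fileset per beamtime (standard GPFS commands; options vary per release)
    run(["mmcrfileset", device, fileset])
    run(["mmlinkfileset", device, fileset, "-J", junction])

    # quota for the beamtime fileset (illustrative limits)
    run(["mmsetquota", device + ":" + fileset, "--block", "20T:25T"])

    # default extended attributes recording the beamtime state
    os.setxattr(junction, "user.beamtime_id", beamtime_id.encode())
    os.setxattr(junction, "user.state", b"data-taking")

    # not shown: ACLs from the beamtime DB (mmputacl), NFS/SMB shares,
    # policy runs and starting the ZeroMQ endpoints


if __name__ == "__main__":
    start_beamtime("11001234")                     # made-up beamtime id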

ZeroMQ usage
Detectors act as source (POSIX IO / detector specific) and feed a ZeroMQ entrance node; a ZeroMQ fan-out distributes the data (ZeroMQ/IP protocol) to the data analysis workers; a ZeroMQ fan-in collects the results at the exit node, which writes them to GPFS as sink (GPFS IO / POSIX). More patterns are expected. A minimal worker sketch follows below.
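
A minimal worker sketch for the fan-out/fan-in pipeline above, using pyzmq PUSH/PULL sockets. The endpoint addresses, message format and the analyze() step are assumptions; only the general pattern (entrance fans work out to analysis workers, exit fans results back in towards GPFS) is taken from the slide.

"""Analysis worker in the ZeroMQ fan-out/fan-in pipeline (pyzmq sketch)."""
import zmq


def analyze(frame):
    # Placeholder for the detector-specific analysis code.
    return frame[:16]


def worker(entrance="tcp://entrance-host:5557", exit_addr="tcp://exit-host:5558"):
    ctx = zmq.Context()

    # fan-out: workers PULL work items from the entrance node
    source = ctx.socket(zmq.PULL)
    source.connect(entrance)

    # fan-in: results are PUSHed towards the exit node, which writes to GPFS
    sink = ctx.socket(zmq.PUSH)
    sink.connect(exit_addr)

    while True:
        frame = source.recv()        # e.g. one detector image / event blob
        sink.send(analyze(frame))


if __name__ == "__main__":
    worker()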

from the cradle to the grave - the overall dataflow
> detectors write via NFS, SMB or 0MQ into the beamline filesystem (sandbox per beamline); a copy of the data is kept there for several days/weeks
> after roughly 3 minutes the data is available on the core filesystem, where a copy is kept for several months (>10)
> analysis CPUs (Linux and Windows) access the data via NSD, NFS and SMB
> copies go to tape (~30 min delay) and to the γ-portal (~25 min delay)

Beamline Filesystem
> "wild-west" area for the beamline
> only host-based authentication, no ACLs
> access through NFSv3, SMB or ZeroMQ
> optimized for performance
  1 MiB filesystem blocksize
  NFSv3: ~60 MB/s before tuning, ~600 MB/s after tuning!
  SMB: ~300-600 MB/s
  'carve out': low capacity & large number of spindles
> tiered storage
  Tier 0: SSDs from GS1 (~10 TB)
  migration to Tier 1 after a short period of time via GPFS policy run (see the sketch below)
  Tier 1: ~80 TB capacity on GL4/GL6
[diagram: detector via proprietary link to control server and beamline nodes; access through NFS/SMB/0MQ via proxy nodes; beamline filesystem on GS1 (NSD) and GL6/GL4 (NSD, MD); migration between GPFS pools via policy run]
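
The Tier 0 to Tier 1 migration can be pictured as a GPFS policy run like the following sketch. The pool names ('ssd', 'data'), the age threshold and the wrapper script are illustrative, not the actual DESY rules; mmapplypolicy and the MIGRATE rule are standard GPFS ILM features.

"""Tier 0 (SSD) -> Tier 1 (disk) migration via a GPFS policy run (sketch)."""
import subprocess
import tempfile

POLICY = """
/* move files off the SSD pool after a short period of time */
RULE 'ssd_to_disk'
  MIGRATE FROM POOL 'ssd'
  TO POOL 'data'
  WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) > INTERVAL '1' HOURS
"""


def run_migration(device="beamlinefs"):
    with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
        f.write(POLICY)
        policy_file = f.name
    # '-I yes' executes the rules instead of only testing them
    subprocess.run(["mmapplypolicy", device, "-P", policy_file, "-I", "yes"],
                   check=True)


if __name__ == "__main__":
    run_migration()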

Core Filesystem
> settled world
> full user authentication
> NFSv4 ACLs (only)
> access through NFSv3, SMB or native GPFS (dominant)
> GPFS policy runs copy the data from the beamline to the core filesystem (a per-file sketch follows below)
  single UID/GID (many-to-one mapping)
  ACLs inherited
  raw data set to immutable
> 8 MiB filesystem blocksize
> fileset per beamtime
[diagram: analysis nodes and proxy nodes; beamline filesystem (GS1 NSD, GL6/GL4 NSD, MD); GPFS policy run creates the copy from the beamline to the core filesystem (GL6/GL4 NSD for data, GS1 NSD for MD)]
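
Per file, the policy-driven copy from the beamline to the core filesystem has to do roughly the following. This is an illustrative sketch only: the UID/GID, paths and the use of mmchattr for the immutable flag are assumptions (immutability enforcement also depends on the fileset configuration), not the actual DESY implementation.

"""Per-file view of the beamline -> core copy (illustrative sketch only)."""
import os
import shutil
import subprocess

CORE_UID = 4711          # the "many to one" mapping target (made up)
CORE_GID = 4711


def ingest_file(src, dst):
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)                    # copy data, preserve timestamps
    os.chown(dst, CORE_UID, CORE_GID)         # single UID/GID on the core fs
    # mark raw data immutable (mmchattr sets GPFS file attributes)
    subprocess.run(["mmchattr", "-i", "yes", dst], check=True)


if __name__ == "__main__":
    ingest_file("/beamline/bt_11001234/raw/frame_0001.h5",
                "/core/bt_11001234/raw/frame_0001.h5")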

boxes linked together
> up to 24 detectors, connected via proprietary links to their control servers; total: 160 Gb/s (320 max.)
> beamline nodes and proxy nodes in the P3 hall, connected through a 10GE access layer
> P3 datacenter ~1 km away, RTT ~0.24 ms
> InfiniBand (56 Gbit/s FDR), 2 switches, 2 fabrics
> storage: ESS GS1 (NSD) and ESS GL4/GL6 (NSD)
> initially 7 proxy nodes
> analysis cluster attached via the DESY network

structured view - key components
> 4x GPFS clusters
  looks like one cluster from the user point of view (see the multi-cluster sketch below)
  provides separation for admin purposes
> 2x GPFS file systems
  provide isolation between data taking (beamline file system) and long-term analysis (core file system)
> 2x IBM ESS GS1
  metadata for the core file system
  metadata for the beamline file system
  Tier 0 of the beamline file system
  having two ESS improves availability for data taking
> 2x IBM ESS GL4/GL6
  Tier 1 of the beamline file system
  data for the core file system
> additional nodes in the access cluster and the beamline cluster
> analysis cluster already exists
> all nodes run GPFS
> all nodes are connected via InfiniBand (FDR)
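
The "4 clusters look like one" setup relies on GPFS multi-cluster remote mounts. The sketch below shows the general shape of such a remote mount; cluster, node, key-file and device names are made up, and the authentication keys (exchanged with mmauth) are assumed to be in place already.

"""Remote-mount a file system owned by another GPFS cluster (sketch)."""
import subprocess


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def mount_remote_fs():
    # register the owning cluster, its contact nodes and its public key
    run(["mmremotecluster", "add", "core.desy.example",
         "-n", "corensd01,corensd02",
         "-k", "/var/mmfs/ssl/core.desy.example.pub"])
    # make the remote file system known locally and mount it
    run(["mmremotefs", "add", "corefs", "-f", "corefs",
         "-C", "core.desy.example", "-T", "/core"])
    run(["mmmount", "corefs"])


if __name__ == "__main__":
    mount_remote_fs()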

monitoring
> data tapped from the ZIMon collector -> ElasticSearch (a minimal indexing sketch follows below)
> ESS GUI (used supplementary to the others)
> three areas to cover
  dashboards (view only): targeted at beamline, operating and public areas; data from ElasticSearch presented with Kibana
  expert: ElasticSearch data accessed through Kibana, hunting for hidden correlations, logfile analysis (Kibana)
  operating: alert system, Nagios/Icinga and RZ-Monitor (home grown) checks
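
Shipping one performance sample into ElasticSearch can look like the following sketch. The index name, document fields, host and the way the sample is obtained from ZIMon are assumptions; only the "collector -> ElasticSearch -> Kibana" flow is taken from the slide.

"""Ship one ZIMon-style performance sample into ElasticSearch (sketch)."""
from datetime import datetime, timezone

from elasticsearch import Elasticsearch      # pip install elasticsearch

es = Elasticsearch(["http://monitoring-host:9200"])   # hypothetical host


def ship_sample(node, metric, value):
    doc = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "node": node,
        "metric": metric,
        "value": value,
    }
    # classic index() call; argument names differ between client releases
    es.index(index="gpfs-perf", body=doc)


if __name__ == "__main__":
    ship_sample("proxy01", "gpfs_read_bytes_per_sec", 512.0e6)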

primary civil duty - tuning, tuning, tuning
> several tuning sessions (multiple days each) gave impressive speed-ups for the effort invested
> getting the right expert on hand (for days) is the key; the know-how is there
> changes with the biggest impact (illustrated below):
  GPFS: blocksize (1M, 8M), prefetch aggressiveness (nfs-prefetch-strategy), log buffer size (journal), pagepool (prefetch percentage)
  NFS: max r/w size
> tuning continues to be an important, permanent topic!
> finding the right expert resource @IBM is possible, but not always easy
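
For illustration, the GPFS-side knobs named above would be set with mmchconfig; a sketch under the assumption of a 'proxyNodes' node class follows. The parameter values are examples, not the DESY settings, and the filesystem blocksize itself is fixed at mmcrfs time (-B 1M / -B 8M) rather than via mmchconfig.

"""Apply GPFS tunables named on the slide via mmchconfig (sketch)."""
import subprocess

TUNABLES = {
    "pagepool": "32G",               # larger page pool for prefetching
    "nfsPrefetchStrategy": "4",      # the 'nfs-prefetch-strategy' knob
}


def apply_tunables(nodes="proxyNodes"):      # hypothetical node class
    args = ",".join(k + "=" + v for k, v in TUNABLES.items())
    # '-i' makes the change immediate and persistent where supported
    subprocess.run(["mmchconfig", args, "-i", "-N", nodes], check=True)
    subprocess.run(["mmlsconfig"], check=True)   # verify the active settings


if __name__ == "__main__":
    apply_tunables()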

things we hope(d) would work, but
> the current architecture is the result of an iterative process over the last months
> detectors as native GPFS clients
  old operating systems (RHEL 4, SuSE 10, etc.)
  inhomogeneous network for GPFS: InfiniBand and 10G Ethernet
> Windows as native GPFS client
  more or less working, but a source of pain
> Active File Management (AFM) as the copy process
  no control during file transfer
  not supported with the native GPFS Windows client
  cache behavior not optimal for this use case
> self-made SSD burst buffer
  most SSDs died very quickly; the others (NVMe devices) survived, but are by far too expensive
> remote cluster UID mapping

usage

preparing for
> next generation detectors @PETRA III: 6 GB/s (expect 4-6 of them next year)
> next 'customer': XFEL (http://www.xfel.eu), a free-electron laser, starting in 2016/2017
  same blueprint, but larger: 100 PB per year
  and faster: 10+10+30 GB/s (initial three experiments)

summary
> key points for going with GPFS as the core component
  the creator knows the business
  mature, proven technology - storage is conservative
  fast & reliable & scalable (to a large extent ;-)
  GNR usage is a must
  XATTR, NFSv4 ACLs (no more POSIX ACLs!)
  federated (concurrent) access via NFS, SMB and the native GPFS protocol
  policy runs drive all the automatic data flow (ILM)
  distinguished data management operations at large scale
  integration into (our weird) environments/infrastructure (auth, directory services, …)
> future stuff: burst buffer, CES (incl. Ganesha), QoS, …
  continuing: expect even higher bandwidth with Ganesha (~800 MB/s single thread)
  Ganesha running NFSv4.1 (including pNFS ;-) - our long-term agenda point