February, 2015 Bill Loewe
Agenda
- System Metadata, a growing issue
- Parallel System - Lustre Overview
- Metadata and Distributed Namespace
- Test setup and implementation for metadata testing
- Scaling Metadata Servers
- High Availability
Seagate Confidential
Metadata Performance
- System performance is typically viewed in terms of bandwidth.
- The bandwidth problem is largely addressed, but metadata is a growing issue.
- We see this in workloads with high numbers of files to access and process:
  - Genome processing
  - CPU chip manufacturing
  - Video compositing/rendering
Lustre Parallel File System
- Lustre is an open source, distributed parallel file system.
- Object-based design provides extreme scalability.
- Compute clients interact directly with storage servers.
- Composed of:
  - Clients
  - Metadata Servers and Targets
  - Storage Servers and Targets
Lustre Distributed NamespacE (DNE)
- Distributed NamespacE (DNE) is a new feature available in Lustre 2.5 that allows multiple MDS / MDT components to participate in a single file system.
- DNE allows the namespace to be divided across multiple metadata servers.
- This lets the size of the namespace and the metadata throughput scale with the number of servers.
- The Lustre DNE project consists of two phases.
Phase 1, Lustre 2.5 Release
- Remote Directories: Lustre sub-directories are distributed over multiple metadata targets (MDTs).
- Sub-directory distribution is defined by an administrator.
[Diagram: Remote Directories. Root; dir a, dir b, dir c, dir d, dir e; dir b2, dir c2, dir d2, dir e2]
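As a rough illustration of administrator-defined placement, the sketch below builds the `lfs mkdir -i <mdt_index> <path>` commands that create each directory on a chosen MDT. The mount point `/mnt/lustre`, the directory names, and the MDT indices are assumptions made up for this example, and the commands are only printed, not executed.

```python
import shlex


def remote_dir_commands(mount_point, placement):
    """Build one `lfs mkdir -i <mdt_index> <path>` command per directory.

    `placement` maps a directory name to the MDT index that should hold it.
    """
    commands = []
    for dirname, mdt_index in sorted(placement.items()):
        if mdt_index < 0:
            raise ValueError(f"MDT index must be non-negative: {dirname}")
        path = f"{mount_point.rstrip('/')}/{dirname}"
        commands.append(["lfs", "mkdir", "-i", str(mdt_index), path])
    return commands


# Hypothetical placement: one top-level directory per MDT (assumption).
placement = {"dir_a": 0, "dir_b": 1, "dir_c": 2, "dir_d": 3, "dir_e": 4}

for cmd in remote_dir_commands("/mnt/lustre", placement):
    # Printed for review only; an administrator would run these on a client.
    print(shlex.join(cmd))
```

Keeping the placement in a single mapping makes the administrator's choice explicit and easy to review before any directory is created.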
Phase 2, Lustre 2.7
- Striped Directories: the contents of a given directory are distributed over multiple MDTs.
[Diagram: Striped Directories. Striped Directory; dir c2, dir e2]
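The idea of spreading one directory's entries over several MDTs can be sketched with a simple hash of the entry name. This is only a conceptual model: CRC32 is a stand-in chosen for the example, not the hash Lustre uses, and the file names and stripe count of 4 are assumptions.

```python
import zlib
from collections import Counter


def mdt_for_entry(name, stripe_count):
    """Pick an MDT stripe for a directory entry by hashing its name."""
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    # CRC32 is a stand-in hash for illustration only.
    return zlib.crc32(name.encode("utf-8")) % stripe_count


def distribute(names, stripe_count):
    """Count how many entries land on each MDT stripe."""
    return Counter(mdt_for_entry(name, stripe_count) for name in names)


# Hypothetical directory contents (assumption).
names = [f"file.{i}" for i in range(10000)]
counts = distribute(names, 4)

for mdt in sorted(counts):
    print(f"MDT stripe {mdt}: {counts[mdt]} entries")
```

Because the stripe is derived from the name alone, any client can compute which MDT holds an entry without consulting a central table.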
Engineered Storage Solutions for HPC, Big Data & Cloud
- High speed networking (IB/40GbE)
- Parallel file system/object
- Data protection
- High availability
- Flash optimization
- File system (Ext4)
- Linux OS
- BIOS/IPMI
- ClusterStor GEM diagnostics
- Custom X86 embedded server
- Seagate storage platforms
- Seagate Storage Devices
- Architected, Integrated, Optimized, Qualified, Supported
Lustre Components
- Clients to MDS: directory operations, file open/close, metadata, and concurrency
- Clients to OSS: file I/O and locking
- MDS to OSS: file creation, file status, and recovery
[Diagram: Clients, MDS, and three OSS nodes]
ClusterStor Management Unit (CMU): Management and Metadata (MDS/MDT)
- CSM Manager and MDS/MGS nodes: 2RU 4-node Sandy Bridge servers
  - Server 1: CSM Mgmt
  - Server 2: Boot
  - Server 3: MGS
  - Server 4: MDS
- Fault tolerance (active/passive)
- Serviceability
- 2U24 JBOD MDT: SAS JBOD for MDS/MGS/Management
- Disk configuration:
  - Qty 4: Lustre Management (MGS)
  - Qty 4: ClusterStor Management and NFS
  - Qty 2: Global hot spares
  - Qty 14: Drives for MDT
Scalable Storage Unit (SSU)
- SSU 5U84 enclosure
- 2 Object Storage Servers (OSSs) per SSU
- Two (2) trays of 42 HDDs each for Object Storage Targets
- H/A on each SSU
- InfiniBand QDR/FDR and 40Gb Ethernet data network connectivity
ClusterStor & Lustre 2.5 DNE Hardware
- DNE is available in ClusterStor v2.0.
- MDT0 is the master and the default in a DNE environment.
- DNE servers are configured in active / active pairs.
- Seagate 2U24 with 2 MDS embedded server modules.
- Scale metadata capacity / performance with DNE server pairs.
[Diagram: Base MDS holding Root and dir a; dir b, dir c, dir d, dir e; dir b2, dir c2, dir d2, dir e2]
ClusterStor Hardware and the Lustre File System
- Metadata and Management Servers: 2U x 4 servers
- Metadata Target: Seagate 2U24 JBOD
- Object Storage Server: Seagate Embedded Application Server
- Object Storage Target: Seagate 5U84 Storage Bay Bridge Enclosure
- I/O path:
  1) Client asks where the file is.
  2) Metadata server answers with the file's location.
  3) Single file (3,072 KB)
  4) File is broken into block stripe segments (1,024 KB)
  5a) Block stripe 1 of 3 (1,024 KB)
  5b) Block stripe 2 of 3 (1,024 KB)
  5c) Block stripe 3 of 3 (1,024 KB)
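The striping step can be sketched as simple arithmetic: a 3,072 KB file cut into 1,024 KB segments and placed round-robin across three object storage targets. The function name, the tuple layout, and the round-robin starting point at target 0 are assumptions for illustration, not a description of Lustre's allocator.

```python
def stripe_layout(file_size_kb, stripe_size_kb, ost_count):
    """Return (ost_index, offset_kb, length_kb) for each stripe segment.

    Segments are assigned round-robin, starting at OST 0 (an assumption).
    """
    if file_size_kb < 0 or stripe_size_kb <= 0 or ost_count <= 0:
        raise ValueError("sizes and OST count must be positive")
    segments = []
    offset = 0
    index = 0
    while offset < file_size_kb:
        # The last segment may be shorter than a full stripe.
        length = min(stripe_size_kb, file_size_kb - offset)
        segments.append((index % ost_count, offset, length))
        offset += length
        index += 1
    return segments


# Values from the slide: one 3,072 KB file, 1,024 KB stripes, 3 targets.
for ost, offset, length in stripe_layout(3072, 1024, 3):
    print(f"OST {ost}: offset {offset} KB, length {length} KB")
```

With these numbers each of the three targets receives exactly one 1,024 KB segment, so a client can read or write all three in parallel.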
Scaling MDS and DNEs
- Configuration: MDS + 4 DNE servers (2 ADUs)
- Benchmark: mdtest create/stat/del, mean of 5 iterations
[Chart: "mdtest scaling MDS + 4 DNEs". Y axis: Op/s, 0 to 600,000. Series: Mean Create, Mean Stat, Mean Remove]
Metadata High Availability
- MDT failover keeps the Lustre file system available when an MDS node fails.
- Based on the existing OSS pair failover model.
- Failover is graceful, quick, and non-disruptive.
- Failback is automatic and non-disruptive.
[Chart: "High Availability and Performance". Y axis: Op/s, 0 to 140,000. X axis: Before Failover, Failed over, After Failover. Series: Mean Create, Mean Stat, Mean Remove]
Green Machine: Environmentally-Aware Cold Storage Solution
- Space
  - Lightweight
  - Small footprint
  - Cold storage optimized design
- Cooling
  - Zero heat emission
  - Ambient cooling/no fans
  - High operating temperature tolerant HDDs
- Power
  - Dynamic power management
  - Low power servers
  - Aggressive TCO goals
- Green
  - Recyclable chassis
  - Reduced metal
  - Responsible disposal of old chassis
- Lowest operating cost
- Reduced carbon footprint
- Best for the planet
Typical Use Cases
- Retrieve content, photographs, etc. from deep archive while maintaining a consistent user experience
- Online pictures/social media store use cases
  - Pictures >45 days in cold storage
- Retrieve MRIs/X-rays of a patient
- Use cases leveraging tape-based solutions
Thank you!