
www.thinkparq.com www.beegfs.com

KEY ASPECTS

Maximum Flexibility

BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a wide range of Linux kernels from 2.6.16 up to the latest vanilla. The storage servers run on top of an existing local file system using the normal POSIX interface, and clients and servers can be added to an existing system without downtime. BeeGFS supports multiple networks and dynamic failover in case one of the network connections goes down. Multiple BeeGFS services can run on the same physical machine, i.e. any combination of clients and servers. Thus, BeeGFS can turn a compute rack in which the nodes are equipped with internal drives into a powerful combined data processing and shared HPC storage unit, eliminating the need for external storage resources and providing a very cost-efficient solution with simplified management.

Maximum Scalability

BeeGFS offers maximum scalability on various levels. It supports distributed file contents with flexible striping across the storage servers on a per-file or per-directory basis (illustrated in the sketch below), as well as distributed metadata. BeeGFS was initially optimized especially for use in HPC and provides:

- Best in class client throughput: 3.5 GB/s write and 4.0 GB/s read with a single I/O stream on FDR InfiniBand
- Best in class metadata performance: linear scalability through dynamic metadata namespace partitioning
- Best in class storage throughput: BeeGFS servers allow a flexible choice of the underlying file system to perfectly fit the given storage hardware
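To make the striping idea concrete, here is a minimal, purely illustrative Python sketch (not BeeGFS code or its API) of how a file can be split into fixed-size chunks that are assigned round-robin to a set of storage targets; the chunk size and target names are invented example values.

```python
# Purely illustrative sketch of round-robin file striping across storage
# targets; chunk size and target names are example values, not BeeGFS defaults.

CHUNK_SIZE = 512 * 1024  # 512 KiB per stripe chunk (example value)
STORAGE_TARGETS = ["storage01", "storage02", "storage03", "storage04"]


def stripe_layout(file_size: int):
    """Map each chunk of a file of 'file_size' bytes to a storage target."""
    layout = []
    for chunk_index in range((file_size + CHUNK_SIZE - 1) // CHUNK_SIZE):
        target = STORAGE_TARGETS[chunk_index % len(STORAGE_TARGETS)]
        offset = chunk_index * CHUNK_SIZE
        length = min(CHUNK_SIZE, file_size - offset)
        layout.append((target, offset, length))
    return layout


# A 2 MiB file ends up spread over all four targets, so clients can access
# the chunks in parallel and aggregate the servers' bandwidth.
for target, offset, length in stripe_layout(2 * 1024 * 1024):
    print(f"{target}: offset={offset}, bytes={length}")
```

With a per-directory striping configuration, new files in a directory would simply inherit such a layout.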

System Architecture

Maximum Usability

The BeeGFS servers are userspace daemons and do not require special administration rights. The client is a kernel module that does not require any patches to the kernel itself. For setup and updates, rpm/deb packages are available; for the startup mechanism, easy-to-use init scripts are provided. BeeGFS was also designed with easy administration in mind: the graphical administration and monitoring system lets users handle typical management tasks in a simple and intuitive way, including cluster installation, storage service management, live throughput and health monitoring, file browsing, striping configuration, and more. Excellent documentation and next-business-day user support help to get the whole system up and running within an hour.

TOOLS & FEATURES

Features

We have already talked about BeeGFS' key aspects - scalability, flexibility and usability - and what's behind them. But there are many more features in BeeGFS:

- Data and metadata for each file are separated into stripes and distributed across the existing servers to aggregate the server speed.
- Secure authentication between client and server to protect sensitive data.
- Fair I/O option on user level to prevent a single user with multiple requests from stalling other users' requests (see the illustrative sketch after this section).
- Storage high availability based on replication.
- Automatic network failover, i.e. if InfiniBand is down, BeeGFS automatically uses Ethernet (if available).
- Optional server failover via external shared storage resources.
- Runs on non-standard architectures, e.g. PowerPC, ARM, or Intel Xeon Phi.

Not in the list? Just get in touch and we're happy to discuss all your questions.

Helpful Tools

BeeGFS comes with built-in benchmark tools for the system's performance. No additional installations are required; simply run them and see the results for a quick check or a deep analysis.

BeeGFS features easy-to-use graphical and command line tools which allow flexible adaptation of the file system to your specific requirements, while our monitoring tools deliver precise feedback on the system status at any given moment. Beyond the overall system status, the monitoring tools can show live statistics for individual users.

Dedicated file system checks go beyond local checks and take typical sources of error in a parallel file system into consideration. Due to a strict separation of data gathering and data checking, there is no load on the servers during the checking phase, so a long-running check does not affect the servers. Most importantly, checks and repairs can be performed while the file system is up, so there is no downtime for the users.

There's much more to discover, just get in touch with us!
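As an illustration of the fair I/O idea mentioned above (not the actual BeeGFS scheduler), the following Python sketch services per-user request queues round-robin, so one user submitting many requests cannot starve the others; the user names and requests are invented for the example.

```python
from collections import OrderedDict, deque

# Illustrative round-robin scheduling of I/O requests per user (not the
# actual BeeGFS implementation); users and requests are invented examples.

queues = OrderedDict()  # user -> deque of pending requests


def submit(user: str, request: str) -> None:
    """Enqueue a request for a user."""
    queues.setdefault(user, deque()).append(request)


def next_request():
    """Pick the next request, cycling fairly over users with pending work."""
    for user in list(queues):
        queue = queues[user]
        if queue:
            request = queue.popleft()
            queues.move_to_end(user)  # this user goes to the back of the cycle
            return user, request
    return None


# One heavy user and one light user: their requests are still interleaved.
for i in range(5):
    submit("alice", f"read chunk {i}")
submit("bob", "read chunk 0")

while (item := next_request()) is not None:
    print(item)
```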

BEEGFS ON DEMAND

BeeOND

BeeOND (BeeGFS on demand) allows on-the-fly creation of a complete parallel file system instance on a given set of hosts with just one single command line.

BeeOND was originally designed to integrate with a cluster batch system to create temporary parallel file system instances on a per-job basis on the internal SSDs of the compute nodes that are part of a compute job (a job-prologue sketch follows at the end of this section). Such BeeOND instances not only provide a very fast and easy-to-use temporary buffer, but also keep a lot of I/O load for temporary files away from the global cluster storage. Other use cases for the tool are manifold and also include cloud computing or quick and easy temporary setups for testing purposes. Get in touch to find out how Fraunhofer and customers are taking advantage of this exciting feature!

Diagram: Node #01, Node #02, Node #03, ... Node #n - user-controlled data staging between the compute nodes and the storage system.

FUTURE ROADMAP

Exascale Plans

Exascale is a topic that challenges the HPC community on all levels, from energy-efficient chips and new programming methods down to I/O and system integrity.

BeeGFS plays an important part in the European Union's DEEP-ER project* (Dynamic Exascale Entry Platform - Extended Reach), which addresses the growing gap between compute speed and I/O bandwidth as well as system resiliency for large-scale systems. An exascale development storage lab at Fraunhofer deals with the bandwidth challenge by using three different storage tiers for different purposes:

- Compute nodes (1-2 TB NVRAM), BeeOND used as cache (POSIX + Ext) - I/O rate: 50x+
- Storage system (HDD), BeeGFS used as global cluster storage (POSIX + Ext) - I/O rate: 10x
- Archive (tape) - I/O rate: 1x

Resiliency is approached with various strategies and on different levels: RAID level and local file system, server failover with third-party software, and replication - a distributed, synchronous mirror of directories is built into BeeGFS and covers RAID and server failures.

* More on the DEEP-ER project at http://www.deep-er.eu
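As a rough illustration of the per-job integration described above, the sketch below shows how a batch-job prologue/epilogue might start and stop a BeeOND instance over the job's node list. It assumes the beeond helper script is installed and accepts a node file (-n), a storage path on each node's local SSD (-d) and a client mount point (-c); treat the exact command-line flags, paths and node-file location as assumptions to be checked against the BeeOND documentation for your release.

```python
import subprocess

# Hypothetical per-job prologue/epilogue around a BeeOND instance.
# The flags (-n, -d, -c) and paths below are assumptions; consult the
# BeeOND documentation of your BeeGFS release before using them.

NODEFILE = "/tmp/job_nodes.txt"   # one hostname per line, from the batch system
LOCAL_SSD_PATH = "/data/beeond"   # scratch directory on each node's internal SSD
MOUNT_POINT = "/mnt/beeond"       # where the temporary file system appears


def start_beeond() -> None:
    """Create a temporary BeeGFS instance across the job's nodes."""
    subprocess.run(
        ["beeond", "start", "-n", NODEFILE, "-d", LOCAL_SSD_PATH, "-c", MOUNT_POINT],
        check=True,
    )


def stop_beeond() -> None:
    """Tear the instance down again when the job finishes."""
    subprocess.run(["beeond", "stop", "-n", NODEFILE], check=True)


if __name__ == "__main__":
    start_beeond()
    try:
        # ... run the job's I/O-heavy work against MOUNT_POINT here ...
        pass
    finally:
        stop_beeond()
```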

BENCHMARKS

Metadata Operations

BeeGFS was designed for extreme scalability. In a testbed¹ with 20 servers and up to 640 client processes (32x the number of metadata servers), BeeGFS delivers a sustained file creation rate of more than 500,000 creates per second, making it possible to create one billion files in as little as about 30 minutes.

Throughput Scalability

In the same testbed with 20 servers - each delivering a single-node local performance of 1332 MB/s (write) and 1317 MB/s (read) - and 160 client processes, BeeGFS scales to a sustained throughput of 25 GB/s, which is 94.7 percent of the maximum theoretical local write throughput and 94.1 percent of the maximum theoretical local read throughput (the arithmetic behind these figures is shown below).

Charts: thousand file creates per second vs. number of metadata servers (1-20), reaching 539.7 thousand creates/s; read and write throughput in GB/s vs. number of storage servers (1-20), reaching 25.2 GB/s (write) and 24.8 GB/s (read).

¹ Benchmark system: 20 servers with 2x Intel Xeon X5660 @ 2.8 GHz and 48 GB RAM, running Scientific Linux 6.3 with kernel 2.6.32-279. Each server is equipped with 4x Intel 510 Series SSD (RAID 0) running ext4 as well as QDR InfiniBand. Tests performed using FhGFS version 2012.10.
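For reference, the quoted figures can be reproduced from the per-server numbers in the footnote. Using the rounded chart peaks of 25.2 GB/s (write) and 24.8 GB/s (read) gives values within a rounding step of the quoted 94.7 and 94.1 percent, and the creation rate yields the quoted roughly 30 minutes for one billion files:

```latex
% Efficiency relative to the aggregate theoretical maxima of 20 servers
\[
\frac{25.2\ \mathrm{GB/s}}{20 \times 1.332\ \mathrm{GB/s}} \approx 94.6\%\ \text{(write)},
\qquad
\frac{24.8\ \mathrm{GB/s}}{20 \times 1.317\ \mathrm{GB/s}} \approx 94.1\%\ \text{(read)}
\]
% Time to create one billion files at 500,000 creates per second
\[
\frac{10^{9}\ \text{files}}{5 \times 10^{5}\ \text{creates/s}} = 2000\ \mathrm{s} \approx 33\ \text{minutes}
\]
```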

LICENSING MODEL

Free to use

BeeGFS is free to use for self-supporting end users and can be downloaded directly from the website (address on the back of this brochure).

Professional Support

If you are already using BeeGFS but don't have professional support yet, here are a few reasons to rethink this.

Free Edition:
- Community mailing list
- Free updates
- Client source code
- Documentation

With Support (in addition):
- Next business day service level agreements for support
- Early updates and hotfixes
- Direct contact to the file system developers
- Enterprise features (high availability, quota, ACLs)
- Server source code (under NDA)
- Customer section: HowTos and more

Customers Comments

"After many unplanned downtimes with our previous parallel filesystem, we moved to BeeGFS more than two years ago. Since then we didn't have any unplanned downtime for the filesystem anymore." - Michael Hengst, University of Halle, Germany

"We are extremely happy with our 1.2 PB BeeGFS installation on 30 servers. It is rock-solid." - Rune Møllegaard Friborg, University of Aarhus, Denmark

"Now under heavy load, our large BeeGFS system is performing well - bioinfo users are seeing >2x speedup in their apps, with no more hotspots when hitting common index files. Unlike the previous system, BeeGFS does not lock up under heavy load, and even under heavy load interactive use remains zippy. The only complaint is how long it's taking us to move data off the old system." - Harry Mangalam, UC Irvine, USA

"The performance is on the same level as physical servers with an attached RAID." - Genomatix, Germany

HAPPY USERS

Scientific Computing

BeeGFS is widely popular among universities and the research community and is used all over the world wherever scientific computing is done.

Life Sciences

BeeGFS is also becoming the parallel file system of choice in the life sciences. The fast-growing amount of sequencing data to store and analyze makes BeeGFS, together with other Fraunhofer solutions, the first choice for our users.

Oil & Gas

Searching for hydrocarbons is one of the traditional fields of High-Performance Computing, and BeeGFS, with its high throughput and bandwidth, has catered to the industry since it became publicly available.

ThinkParQ GmbH
Corporate Office: Trippstadter Str. 110, 67663 Kaiserslautern, Germany
Sales & Consulting: Marco Merkel, Phone: +49 170 9283245, sales@thinkparq.com

www.thinkparq.com www.beegfs.com