KEY ASPECTS

Maximum Flexibility
BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSUSE, or Debian/Ubuntu, as well as a wide range of Linux kernels from 2.6.16 up to the latest vanilla kernel. The storage servers run on top of an existing local file system using the normal POSIX interface, and clients and servers can be added to an existing system without downtime. BeeGFS supports multiple networks and dynamic failover in case one of the network connections goes down. Multiple BeeGFS services can run on the same physical machine, i.e. any combination of client and servers. Thus, BeeGFS can turn a compute rack in which the nodes are equipped with internal drives into a powerful combined data processing and HPC shared storage unit, eliminating the need for external storage resources and providing a very cost-efficient solution with simplified management.

Maximum Scalability
BeeGFS offers maximum scalability on various levels. It supports distributed file contents with flexible striping across the storage servers on a per-file or per-directory basis, as well as distributed metadata. BeeGFS was initially optimized especially for use in HPC and provides:
- Best in class client throughput: 3.5 GB/s write and 4.0 GB/s read with a single I/O stream on FDR InfiniBand
- Best in class metadata performance: linear scalability through dynamic metadata namespace partitioning
- Best in class storage throughput: BeeGFS servers allow a flexible choice of the underlying file system to perfectly fit the given storage hardware
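Striping can be tuned per directory at runtime with the beegfs-ctl command line tool. The following is a minimal sketch, assuming a file system mounted under an example path; the directory name and stripe values are illustrative, and the exact option set can vary between BeeGFS releases:

    # Minimal sketch: set a per-directory striping pattern via beegfs-ctl.
    # The directory path and stripe values are illustrative examples only.
    import subprocess

    def set_stripe_pattern(directory: str, num_targets: int, chunk_size: str) -> None:
        """Stripe new files in `directory` across `num_targets` storage
        targets using chunks of `chunk_size` (e.g. "1m")."""
        subprocess.run(
            ["beegfs-ctl", "--setpattern",
             f"--numtargets={num_targets}",
             f"--chunksize={chunk_size}",
             directory],
            check=True,  # raise if beegfs-ctl reports an error
        )

    # Example: stripe large result files across 4 targets in 1 MiB chunks.
    set_stripe_pattern("/mnt/beegfs/scratch/results", 4, "1m")

Files created in the directory afterwards inherit the new pattern; existing files keep the pattern they were written with.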
System Architecture
[Diagram: BeeGFS system architecture]

Maximum Usability
The BeeGFS servers are userspace daemons and do not require special administration rights. The client is a kernel module that does not require any patches to the kernel itself. For setup and updates, rpm/deb packages are available; for the startup mechanism, easy-to-use init scripts are provided. BeeGFS was also designed with easy administration in mind: the graphical administration and monitoring system enables users to deal with typical management tasks in a simple and intuitive way, covering cluster installation, storage service management, live throughput and health monitoring, file browsing, striping configuration, and more. Excellent documentation and next-business-day user support help get the whole system up and running within an hour.
TOOLS & FEATURES

Features
We have already talked about BeeGFS' key aspects, scalability, flexibility, and usability, and what's behind them. But there are many more features in BeeGFS:
- Built-in benchmark tools for the system's performance. No additional installations are required; simply run them and see the results, for a quick check or a deep analysis (see the sketch after this list).
- Data and metadata for each file are separated into stripes and distributed across the existing servers to aggregate the server speed.
- Secure authentication between client and server to protect sensitive data.
- Fair I/O option on user level to prevent a single user with multiple requests from stalling other users' requests.
- Storage high availability based on replication.
- Automatic network failover, i.e. if InfiniBand is down, BeeGFS automatically uses Ethernet (if available).
- Optional server failover for external shared storage resources.
- Runs on non-standard architectures, e.g. PowerPC, ARM, or Intel Xeon Phi.
Not in the list? Just get in touch and we're happy to discuss all your questions.

Helpful Tools
BeeGFS features easy-to-use graphical and command line tools which allow flexible adaptation of the file system to your specific requirements, while our monitoring tools deliver precise feedback on the system status at any given moment. Beyond the overall system status, our monitoring tools can show live statistics for individual users.
Dedicated file system checks go beyond local checks and take typical sources of error in a parallel file system into consideration. Due to a strict separation of data gathering and data checking, there is no load on the servers during the checking phase, so even a long-running check does not affect the servers. Most importantly, checks and repairs can be performed while the file system is up, so there is no downtime for the users.
There's much more to discover; just get in touch with us!
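As an illustration of the built-in benchmark tools, the sketch below drives the storage benchmark mode of beegfs-ctl. The block size, file size, and thread count are example values, and the available options may differ between releases:

    # Sketch: run the built-in BeeGFS storage benchmark through beegfs-ctl.
    # All sizes and the thread count below are example values.
    import subprocess

    def run_ctl(*args: str) -> str:
        """Invoke beegfs-ctl and return its standard output."""
        result = subprocess.run(["beegfs-ctl", *args],
                                check=True, capture_output=True, text=True)
        return result.stdout

    # Start a write benchmark on all storage targets.
    print(run_ctl("--storagebench", "--alltargets", "--write",
                  "--blocksize=512K", "--size=20G", "--threads=4"))

    # Poll the benchmark status and results while it runs.
    print(run_ctl("--storagebench", "--alltargets", "--status"))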
BEEGFS ON DEMAND

BeeOND
BeeOND (BeeGFS on demand) allows the on-the-fly creation of a complete parallel file system instance on a given set of hosts with just a single command line (see the usage sketch at the end of this section). BeeOND was originally designed to integrate with a cluster batch system to create temporary parallel file system instances on a per-job basis on the internal SSDs of the compute nodes that are part of a compute job. Such BeeOND instances not only provide a very fast and easy-to-use temporary buffer, but can also keep a lot of I/O load for temporary files away from the global cluster storage. Other use cases for the tool are manifold and also include cloud computing or quick and easy temporary setups for testing purposes. Get in touch to find out how Fraunhofer and customers are taking advantage of this exciting feature!
[Diagram: BeeOND instance spanning Node #01 through Node #n, with user-controlled data staging to the global storage system]

FUTURE ROADMAP

Exascale Plans
Exascale is a topic that challenges the HPC community on all levels, from energy-efficient chips and new programming methods down to I/O and system integrity. BeeGFS plays an important part in the European Union's DEEP-ER project* (Dynamic Exascale Entry Platform - Extended Reach), which addresses the growing gap between compute speed and I/O bandwidth, as well as system resiliency for large-scale systems. An exascale development storage lab at Fraunhofer deals with the bandwidth challenge by using three different storage tiers for different purposes:
- Compute nodes (1-2 TB NVRAM), with user-controlled data staging: BeeOND used as cache (POSIX + Ext), I/O rate 50x+
- Storage system (HDD): BeeGFS used as global cluster storage (POSIX + Ext), I/O rate 10x
- Archive (tape): I/O rate 1x
Resiliency is approached with various strategies and on different levels: RAID level and the local file system; server failover with third-party software; and replication: a distributed, synchronous mirror of directories built into BeeGFS, covering RAID and server failures.

* More on the DEEP-ER project at http://www.deep-er.eu
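To make the BeeOND workflow above concrete, here is a minimal sketch of the per-job lifecycle. The node file and directory paths are placeholders, and the exact beeond options may differ between versions:

    # Sketch: create and tear down a per-job BeeOND instance.
    # NODEFILE, DATA_DIR, and MOUNT_DIR are placeholders for a real batch job.
    import subprocess

    NODEFILE = "/tmp/job_nodes.txt"  # one hostname per line, e.g. from the batch system
    DATA_DIR = "/ssd/beeond"         # local SSD path used by the temporary servers
    MOUNT_DIR = "/mnt/beeond"        # mount point of the temporary file system

    # One command creates a complete parallel file system across the job's nodes.
    subprocess.run(["beeond", "start", "-n", NODEFILE,
                    "-d", DATA_DIR, "-c", MOUNT_DIR], check=True)

    # ... the job performs its temporary I/O against MOUNT_DIR here ...

    # Tear the instance down when the job finishes (some versions offer
    # additional flags to also delete the data and log files).
    subprocess.run(["beeond", "stop", "-n", NODEFILE], check=True)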
BENCHMARKS

Metadata Operations
BeeGFS was designed for extreme scalability. In a testbed¹ with 20 servers and up to 640 client processes (32x the number of metadata servers), BeeGFS delivers a sustained file creation rate of more than 500,000 creates per second, making it possible to create one billion files in as little as about 30 minutes.
[Chart: file creates per second vs. number of metadata servers (1-20), scaling to 539.7 thousand creates per second]

Throughput Scalability
In the same testbed system with 20 servers - each with a single-node local performance of 1332 MB/s (write) and 1317 MB/s (read) - and 160 client processes, BeeGFS scales to a sustained throughput of 25 GB/s, which is 94.7 percent of the maximum theoretical aggregate write throughput and 94.1 percent of the maximum theoretical aggregate read throughput.
[Chart: read/write throughput in GB/s vs. number of storage servers (1-20), reaching 25.2 GB/s write and 24.8 GB/s read]

¹ Benchmark system: 20 servers, each with 2x Intel Xeon X5660 @ 2.8 GHz and 48 GB RAM, running Scientific Linux 6.3 with kernel 2.6.32-279. Each server is equipped with 4x Intel 510 Series SSDs (RAID 0) formatted with ext4, as well as QDR InfiniBand. Tests performed using FhGFS version 2012.10.
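The quoted efficiency figures follow directly from the per-server numbers; a short sketch of the arithmetic (the brochure's 94.7 and 94.1 percent come from unrounded measurements):

    # Arithmetic behind the quoted scaling figures (numbers from the text above).
    servers = 20
    write_local_mb, read_local_mb = 1332, 1317  # single-server local MB/s
    write_agg_gb, read_agg_gb = 25.2, 24.8      # measured aggregate GB/s

    max_write_gb = servers * write_local_mb / 1000  # 26.64 GB/s theoretical maximum
    max_read_gb = servers * read_local_mb / 1000    # 26.34 GB/s theoretical maximum

    print(f"write efficiency: {write_agg_gb / max_write_gb:.1%}")  # 94.6% from these rounded inputs
    print(f"read efficiency:  {read_agg_gb / max_read_gb:.1%}")    # 94.2% from these rounded inputs

    # At the measured peak of 539,700 creates per second, one billion files
    # take about half an hour:
    print(f"{1e9 / 539_700 / 60:.0f} minutes")  # prints 31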
LICENSING MODEL

Free to Use
BeeGFS is free to use for self-supporting end users and can be downloaded directly from the website (address on the back of this brochure).

Professional Support
If you are already using BeeGFS but don't have professional support yet, here are a few reasons to rethink this.

Free Edition:
- Community mailing list
- Free updates
- Client source code

With Support:
- Next-business-day service level agreements for support
- Early updates and hotfixes
- Direct contact to the file system developers
- Enterprise features (high availability, quota, ACLs)
- Server source code (under NDA)
- Customer section: HowTos and more documentation

Customers' Comments
"After many unplanned downtimes with our previous parallel filesystem, we moved to BeeGFS more than two years ago. Since then we didn't have any unplanned downtime for the filesystem anymore." - Michael Hengst, University of Halle, Germany

"We are extremely happy with our 1.2 PB BeeGFS installation on 30 servers. It is rock-solid." - Rune Møllegaard Friborg, University of Aarhus, Denmark

"Now under heavy load, our large BeeGFS system is performing well - bioinfo users are seeing >2x speedup in their apps, with no more hotspots when hitting common index files. Unlike the previous system, BeeGFS does not lock up under heavy load, and even under heavy load interactive use remains zippy. The only complaint is how long it's taking us to move data off the old system." - Harry Mangalam, UC Irvine, USA

"The performance is on the same level as physical servers with an attached RAID." - Genomatix, Germany
HAPPY USERS

Scientific Computing
BeeGFS is widely popular among universities and the research community and is used all over the world, wherever scientific computing is done.

Life Sciences
BeeGFS is also becoming the parallel file system of choice in the life sciences. The fast-growing amount of sequencing data to store and analyze makes BeeGFS, together with other Fraunhofer solutions, the first choice for our users.

Oil & Gas
Searching for hydrocarbons is one of the traditional fields of High-Performance Computing, and BeeGFS, with its high throughput and bandwidth, has catered to the industry since it became publicly available.

ThinkParQ GmbH
Corporate Office: Trippstadter Str. 110, 67663 Kaiserslautern, Germany
Sales & Consulting: Marco Merkel, Phone: +49 170 9283245, sales@thinkparq.com
www.thinkparq.com | www.beegfs.com