Fujitsu HPC Cluster Suite




Webinar: Fujitsu HPC Cluster Suite, 29 May 2013. Павел Борох (Pavel Borokh)

HPC: the full range of offerings from Fujitsu
[Portfolio diagram: PRIMERGY Server, Workstation; Cluster Management & Operation; ISV and Research Partnerships; HPC Cluster Suite; FEFS; ETERNUS Storage; CentOS; Gateway; PreDiCT Initiative; Open Petascale Libraries; Network; Sizing, design; Consulting and Integration Services; Proof of concept; Integration into customer environment; Certified system and production environment; Complete assembly, pre-installation and quality assurance; Ready-to-Go: ready to operate at delivery]

HPC and the required software
- Cluster architecture and usage
- When an integrated software package is preferable

Typical HPC cluster architecture
[Figure: HPC cluster. End-user workstations (pre/post processing); the user launches a job here. Head node 1 runs the workload manager; the job queue is here. Head node 2 (fail-over) shares a disk with head node 1. Compute nodes: jobs (Job A, Job B) run here. InfiniBand (ib0): inter-process communication (MPI) and PFS data traffic to the parallel file system (PFS). Ethernet (eth0, eth1): management (job start/stop, NFS of /home).]

Characteristics of the HPC environment
- Job-oriented: computations run as jobs ("batch mode") on a set of compute nodes (from tens to thousands). Both serial (single-process) and parallel (multi-process) jobs are possible.
- Non-interactive: with rare exceptions, work on an HPC system is not interactive.
- Concurrency: many jobs can run on the cluster at the same time.
- Workload diversity: workloads (number of cores per job) differ depending on the application (its level of parallelism).
- Large data volumes: many applications produce and consume large volumes of data within short time intervals.
- Inter-node communication: parallel applications require a high-speed inter-node interconnect (e.g. InfiniBand), both for data transfer and for communication between processes.

Why an integrated solution?
The HPC software stack must be installed on tens to thousands of nodes:
- Same OS installed with the same options on every node
- Resource manager, MPI libraries, scientific libraries
Systems must conform to a standard basic set of operating conditions:
- uid, gid and password exactly the same across all nodes
- Shared home file system across all nodes
- Common temporary storage across all nodes
- Password-less access for user sessions across all nodes
Set-up is time-consuming, tedious and error-prone:
- Operating-condition items do not scale beyond a few nodes when direct human action is needed
The software choices for any individual option are daunting:
- Compilers: GNU, Intel, PGI, Absoft
- Resource managers: LSF, PBSPro, Torque, Moab, SGE, SLURM, CONDOR
- MPI: OpenMPI, MPICH, MVAPICH, Intel MPI, Platform MPI
- Software choices and drivers need to be validated
From complex and difficult to simple and validated: improved TCO, improved quality, reduced IT cost, shortened delivery.
Integrated stack: workload manager, libraries, cluster deployment and management, web-based end-user interface, operating system.
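The requirement that uid and gid match on every node is easy to state but tedious to verify by hand. The sketch below is one possible way to check it, assuming the /etc/passwd content of each node has already been collected into a dictionary; the function names and the sample data are hypothetical and are not part of HCS or CDM.

```python
def parse_passwd(text):
    """Parse /etc/passwd-style text into {username: (uid, gid)}."""
    accounts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        accounts[fields[0]] = (int(fields[2]), int(fields[3]))
    return accounts


def find_inconsistent_accounts(passwd_by_node):
    """Return {username: {node: (uid, gid) or None}} for accounts that differ.

    A value of None means the account is missing on that node.
    """
    parsed = {node: parse_passwd(text) for node, text in passwd_by_node.items()}
    all_users = set()
    for accounts in parsed.values():
        all_users.update(accounts)
    problems = {}
    for user in sorted(all_users):
        seen = {node: accounts.get(user) for node, accounts in parsed.items()}
        if len(set(seen.values())) > 1:
            problems[user] = seen
    return problems


if __name__ == "__main__":
    # Hypothetical sample: bob has a different uid on node02.
    sample = {
        "node01": "alice:x:1001:100::/home/alice:/bin/bash\n"
                  "bob:x:1002:100::/home/bob:/bin/bash\n",
        "node02": "alice:x:1001:100::/home/alice:/bin/bash\n"
                  "bob:x:1003:100::/home/bob:/bin/bash\n",
    }
    for user, per_node in find_inconsistent_accounts(sample).items():
        print(user, per_node)
```

A check of this kind only reports drift; an integrated deployment tool avoids the drift in the first place by configuring all nodes from one place.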

HPC software stack: typical landscape
A mature software stack provides:
- Commissioning of nodes and administration of software packages ("images")
- A workload manager for managing jobs and resources
- A parallel processing environment with the necessary libraries
- Tools for developers
- Data storage options (NFS, PFS)
These components are in principle always the same; only the specific products used differ.
[Stack diagram, Fujitsu PRIMERGY HPC Clusters: Application programs; Graphical end-user interface; Middleware; Scientific libraries; Compilers, performance and profiling tools; Workload manager (management of cluster resources, manage serial and parallel jobs, fair-share usage between users); Cluster deployment and management (automated installation and configuration, administrator interface, operation and monitoring, user environment management, cluster checker); File system; GPGPU and Xeon Phi software support; Operating system (RedHat Linux, CentOS, OS drivers)]

Fujitsu Software HPC Cluster Suite (HCS): features and editions

HPC Cluster Suite: positioning of the editions
Open edition:
- Limited budget
- Open-source components are sufficient
- In-house experience in configuring and operating a cluster (e.g. academia)
- Access to updates is not critical
Basic edition:
- Support is required (e.g. industrial users)
- Advanced scheduling features are not required
- Relatively small clusters (SMB, development groups)
Advanced edition:
- Advanced job scheduling functionality is required
- Full support for the resource manager
- More customizable HPC Gateway
- Rules for processing the job workflow need to be developed
Note: Editions are not field upgradeable.

Description: Open / Basic / Advanced
Main features, listed per edition in the order Open Edition / Basic Edition / Advanced Edition:
- Easy-to-use and scalable cluster deployment and management: CDM, Intel Cluster Checker / CDM, Intel Cluster Checker / CDM, Intel Cluster Checker
- Workload managers: Torque, SGE and SLURM / Torque, SGE and SLURM / Altair PBS Professional
- File system: No / FEFS / FEFS
- General HPC open source software components (MPI, parallel libraries, compilers, BMT tools): Yes / Yes / Yes
- Graphical end-user interface (Gateway with various ISV application catalogs): Gateway Demo / Gateway Basic / Gateway Advanced
- Command-line administrator interface: Yes / Yes / Yes
- Monitoring and alerting: open source, proprietary (planned) / open source, proprietary (planned) / open source, proprietary (planned)
- Development environment: GNU, Intel Cluster Studio XE / GNU, Intel Cluster Studio XE / GNU, Intel Cluster Studio XE
- Intel Cluster Ready: Yes / Yes / Yes
- Recommended cluster size: up to 128 nodes / up to 128 nodes / up to 1024 nodes
- High Availability (HA): No / No / Yes
- Support, maintenance and upgrade: No, perpetual / Yes (9hx5), 1/3/5-year subscription / Yes (9hx5), 1/3/5-year subscription

Fujitsu HCS: operating system support
- HCS version: 1.0
- Hardware platform: PRIMERGY SandyBridge RX / CX
- RHEL: RHEL 5.8, RHEL 6.3
- SUSE: -
- CentOS: -, CentOS 6.3 (compute node only)

HPC Cluster Suite: SKU categories
SKU planning is by feature (Basic / Advanced / Open edition), customer segment (academic / commercial) and cluster size. Licensing is per node.
- Open Edition: perpetual, 1-128 nodes
- Basic and Advanced: each offered for Academic + Research and for Commercial, with a 1-, 3- or 5-year subscription
- Cluster size (a license for every node within the category): up to 16 nodes (management + compute), up to 64 nodes (management + compute), 65+ nodes (management + compute)
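The SKU dimensions on this slide (edition, customer segment, subscription term, cluster-size band) can be sketched as a small lookup. The function names and the returned dictionary are illustrative assumptions only; they do not reproduce actual Fujitsu SKU codes or pricing rules.

```python
# Size bands as listed on the slide; management and compute nodes both count.
SIZE_BANDS = ((16, "up to 16 nodes"), (64, "up to 64 nodes"))
SEGMENTS = ("academic+research", "commercial")
TERMS_YEARS = (1, 3, 5)


def size_band(node_count):
    """Map a total node count to the size band named on the slide."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for limit, label in SIZE_BANDS:
        if node_count <= limit:
            return label
    return "65+ nodes"


def describe_sku(edition, node_count, segment=None, term_years=None):
    """Return a hypothetical description of the SKU category for a cluster."""
    edition = edition.lower()
    if edition == "open":
        if not 1 <= node_count <= 128:
            raise ValueError("Open Edition is listed for 1-128 nodes")
        return {"edition": "open", "subscription": "perpetual",
                "size": "1-128 nodes", "licenses": node_count}
    if edition not in ("basic", "advanced"):
        raise ValueError("unknown edition: " + edition)
    if segment not in SEGMENTS:
        raise ValueError("segment must be one of " + ", ".join(SEGMENTS))
    if term_years not in TERMS_YEARS:
        raise ValueError("subscription term must be 1, 3 or 5 years")
    return {"edition": edition, "segment": segment,
            "subscription": "%d-year" % term_years,
            "size": size_band(node_count),
            "licenses": node_count}  # per-node licensing


if __name__ == "__main__":
    print(describe_sku("Basic", 40, "commercial", 3))
```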

Fujitsu HCS components: description of the suite components

The Fujitsu HPC Cluster Suite (HCS)
A full-featured package for managing clusters based on Fujitsu PRIMERGY:
- Easy-to-use cluster management
- Popular workload managers
- General HPC open source software
- Highly scalable parallel file system
- Graphical end-user interface for simplified usage
- Alliance with leading developers
- Fully tested solution for HPC

Software stack components: operating systems and drivers
- Importance: essential
- Why needed: core software supporting the hardware platform; enables support for hardware with no standard OS drivers (IB, 10GbE, disk controllers)
- Availability for HCS: RedHat EL 5.x/6.x, CentOS EL 5.x/6.x
- Value add: drivers are integrated into the HCS repository for simple cluster deployment

Software stack components: co-processor support
- Importance: essential, depending on hardware configuration
- Why needed: to support clusters with co-processor nodes
- Availability for HCS: GPGPU (CUDA with OpenCL, drivers and development tools); Xeon Phi (Intel Manycore Platform Software Stack, MPSS)
- Value add: easily installable add-on packages for GPGPU and Xeon Phi

Software stack components: cluster deployment and management
- Importance: essential
- Why needed: bare-metal deployment of nodes; cluster configuration management; monitoring of cluster health
- Availability for HCS: Cluster Deployment Manager (CDM, a Fujitsu-developed product); Intel Cluster Checker for validation; Nagios/Ganglia for monitoring and alerting (now); AdminGUI (codename), a graphical interface for management and monitoring (future)
- Value add: comprehensive deployment tool for small or large clusters; single graphical web-based interface for all activities

Software stack components: workload managers
- Importance: essential
- Why needed: enables sharing of all cluster resources between various users; manages policies that determine the order of resource usage
- Availability for HCS: open source choices (TORQUE, SGE, SLURM); commercial (PBS Professional, for the Advanced edition)
- Value add: the variety makes it possible to meet the needs of many customers; PBSPro can meet the needs of the most demanding customers and systems
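The idea of a policy that determines the order of resource usage can be illustrated with a deliberately simplified fair-share ordering: pending jobs from users with less recorded usage go first, and jobs from the same user keep their submission order. This is an assumption-laden sketch, not the algorithm used by TORQUE, SGE, SLURM or PBS Professional, whose priority formulas are considerably more elaborate; the job dictionary layout is hypothetical.

```python
def fair_share_order(pending_jobs, usage_by_user):
    """Order pending jobs by the owner's recorded usage, then by submit time.

    pending_jobs: list of dicts with the keys "id", "user" and "submitted".
    usage_by_user: dict mapping user name to consumed core-hours.
    Users with no recorded usage are treated as having consumed nothing.
    """
    return sorted(
        pending_jobs,
        key=lambda job: (usage_by_user.get(job["user"], 0.0), job["submitted"]),
    )


if __name__ == "__main__":
    jobs = [
        {"id": 1, "user": "alice", "submitted": 1},
        {"id": 2, "user": "bob", "submitted": 2},
        {"id": 3, "user": "alice", "submitted": 3},
    ]
    usage = {"alice": 120.0, "bob": 10.0}
    # bob has used less of the cluster, so his job moves ahead of alice's.
    print([job["id"] for job in fair_share_order(jobs, usage)])
```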

Software stack components: middleware
- Importance: essential for parallel applications running across multiple nodes
- Why needed: provides the software layer needed for inter-node process communication
- Availability for HCS: open source (OpenMPI, MPICH, MVAPICH); commercial (Intel MPI)
- Value add: the variety makes it possible to bid to many customers; some customers need multiple options due to application dependencies

Software stack components: scientific libraries
- Importance: needed for some applications
- Why needed: used most often for in-house code development; sometimes needed by ISVs
- Availability for HCS: LAPACK, ScaLAPACK, BLAS, netcdf, netcdf-devel, hdf5, fftw, fftw-devel, atlas, atlas-devel, GMP, Global Arrays, MKL
- Value add: meets the demands of many customers; some customers need multiple options due to application dependencies

Software stack components: compilers, performance and profiling tools
- Importance: needed for software development
- Why needed: used to compile applications and to provide tools for optimizing application performance
- Availability for HCS: compilers (GNU c, c++, gfort; Open64 (PathScale compiler); Intel Cluster Studio); profiling tools (Intel Cluster Studio, Allinea DDT); performance tools (Intel VTune, PAPI, TAU)
- Value add: can meet the demands of many customers with both open source and commercial offerings

Software stack components: file system
- Importance: needed for demanding I/O requirements
- Why needed: usually essential for large clusters (>64 nodes); can be used on smaller clusters if the I/O load is expected to be high
- Availability for HCS: Fujitsu Exabyte File System (FEFS), developed and maintained by Fujitsu
- Value add: originally developed for the demands of the K computer; inherits the reliability and performance enhancements of that system; updates are passed back to the community
Note: NFS can be used for small clusters or clusters with low I/O demand. In these cases either storage from the head node or a specified NAS server is used.
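The rule of thumb on this slide (a parallel file system for clusters above 64 nodes or with a high expected I/O load, NFS otherwise) can be written down as a tiny helper. The function name, its parameters and the returned strings are assumptions made for illustration; a real sizing exercise would weigh throughput targets, capacity and the number of concurrent jobs as well.

```python
LARGE_CLUSTER_THRESHOLD = 64  # nodes, taken from the ">64 nodes" guideline


def suggest_file_system(node_count, high_io_load):
    """Suggest a file system type from cluster size and expected I/O load."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count > LARGE_CLUSTER_THRESHOLD or high_io_load:
        return "parallel file system (FEFS)"
    return "NFS (head-node storage or NAS server)"


if __name__ == "__main__":
    print(suggest_file_system(32, high_io_load=False))
    print(suggest_file_system(32, high_io_load=True))
    print(suggest_file_system(256, high_io_load=False))
```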

Software stack components: graphical end-user interface
- Importance: attractive to end users
- Why needed: simplifies the usage of HPC for end users; enables sharing of results and data between team members; can be used from remote locations
- Availability for HCS: HPC Gateway
- Value add: provides pre-packaged solutions for running applications; enables non-HPC specialists to use an HPC cluster

Fujitsu HPC Cluster Suite: V1.0 release (Open Edition / Basic Edition / Advanced Edition)
- Deployment: CDM + SVIM (SVIM used for the installer node)
- Cluster management: Intel Cluster Checker *1 (includes iozone, streams, HPL); ServerView
- Workload manager: Torque (default) *1; PBS Pro
- Co-processor support: -
- Scientific libraries: Intel MKL *2
- Libraries: Open MPI *1, Intel MPI *2
- Compilers: GNU *1, Intel Cluster Studio XE *2
- Performance and profiling tools: GNU (c, c++, g77, debug and profiler) *1, Intel Cluster Studio XE *2
- /Shared file system: NAS; -
- Cloud interface: - / - / -
- End-user interface: HPC Gateway Entry / HPC Gateway Basic / HPC Gateway Advanced
- Other: recommended to 128 nodes / recommended to 128 nodes / HA feature, up to 1024 nodes, > 1024 as project bid
*1 Only installation support; does not include any technical support or fixes
*2 Must be purchased separately

Cluster Deployment Manager: managing the cluster and configuration

CDM: easy-to-use cluster management
A powerful tool used to improve productivity by reducing TCO. It leverages know-how from high-end HPC (K computer).
CDM automates compute node installation and cluster configuration:
- Deployment of the operating system and all HPC software components together with their configuration (including PRIMERGY-specific drivers)
- Ability to add, modify or remove additional software components and their configuration on all nodes via a single command
Installation process: SVIM installs the OS on the installer or head node of the cluster:
- Automatic hardware detection and application of the proper drivers

CDM: overview of operation
Management from the head (installer) node. Operations can be carried out from the head node (no changes on individual nodes):
- Modification of configuration files
- Copying files to nodes
- Installing software components
- Adding new users to the system
- Adding, removing or replacing nodes of the cluster
- A shell can be used to execute commands across the whole cluster
A variety of node types can be deployed:
- Multiple node groups can be used: head, compute, login, I/O, ftp, compilation
Different OSs can be used:
- A separate repository is used to manage each OS
- Node groups use software from one of the repositories
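The relationship described here, where each node group draws its software from exactly one repository and one repository exists per OS, can be modelled as a small data structure. This is a hypothetical model for illustration only: the class, the repository names and the node names are assumptions and do not reflect CDM's actual configuration format or commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str          # e.g. head, compute, login, I/O
    repository: str    # the single repository (one per OS) this group uses
    nodes: tuple       # host names belonging to the group


def nodes_by_repository(groups):
    """Return {repository: sorted list of nodes installed from it}.

    Raises ValueError if a node appears in more than one group, since each
    node is assumed to belong to exactly one node group.
    """
    owner = {}
    result = {}
    for group in groups:
        for node in group.nodes:
            if node in owner:
                raise ValueError(
                    "%s is in both %s and %s" % (node, owner[node], group.name))
            owner[node] = group.name
            result.setdefault(group.repository, []).append(node)
    return {repo: sorted(nodes) for repo, nodes in result.items()}


if __name__ == "__main__":
    groups = [
        NodeGroup("head", "rhel-6.3", ("head1",)),
        NodeGroup("login", "rhel-6.3", ("login1",)),
        NodeGroup("compute", "centos-6.3", ("cn02", "cn01")),
    ]
    for repo, nodes in nodes_by_repository(groups).items():
        print(repo, nodes)
```

Grouping nodes this way is what lets a single change to a repository or group definition reach every affected node without touching the nodes individually.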

CDM-based cluster architecture: SME use case
[Figure: an installer node (management node, installer node group) runs Fujitsu CDM and holds the CDM repository. It connects to the public network (external DNS server, external NTP server) and, through the provisioning network and the management/data network, to the compute node group (compute nodes #1 to #n). Interconnect: Ethernet (or IB). Storage: DX80.]

CDM-based cluster architecture: medium/large user
[Figure: installer node group with head node 1 and head node 2 (each running Fujitsu CDM and the batch server, in fail-over) sharing the CDM repository on DX80 storage. Login node group with four login nodes. Public network with external DNS and NTP servers. Provisioning network and management/data network connect to the Compute1 node group (compute nodes #X1 to #XX), the Compute2 node group (compute nodes #Y1 to #YY) and the IO node group (IO nodes #Z1 to #ZZ). Interconnects: Ethernet and InfiniBand.]

Fujitsu PRIMERGY HPC Gateway: a portal to the HPC workplace, integrated in the HPC Cluster Suite

HPC Gateway: an integrated web environment
- Built on Liferay Portal and the Tomcat application server
- All tools accessible from a desktop browser
- HPC resources used as an extension of the desktop (Process Manager)
- Share, exchange and track activity across the team (Wiki, Documents, Calendar, Forum, KnowledgeBase)
- Application-aware, using application catalogue templates

Gateway architecture
[Figure: end-user workstations (pre/post processing) use the Gateway web interface. The head node runs the Tomcat application server, the Liferay portal and the HPC Gateway portlet; the Gateway submits jobs to the workload manager, where jobs are queued. Jobs (Job A, Job B) run on the compute nodes. InfiniBand (ib0): inter-process communication (MPI) and PFS data traffic to the parallel file system (PFS) disks. Ethernet (eth0): management (job start/stop, NFS of /home).]

Gateway differentiation for HCS versions (Open / Basic / Advanced)
- Run, monitor, view results of application jobs: Yes / Yes / Yes
- Run legacy job scripts: Yes / Yes / Yes
- On-boarding new applications (creating an application template): Yes / Yes / Yes
- Import templates from the application catalogue (Fujitsu download): Payable / Payable / Yes
- Import workflow (own or 3rd-party processes): Yes / Yes / Yes
- Graphical desktop administration interface: No / No / Yes
- Workflow editor: No / No / Yes
- Collaboration (Wiki, Documents, Calendar, Forum, KnowledgeBase): Yes / Yes / Yes
- Multiple business projects: No / No / Yes
- Customizable security model: No / No / Yes
- Access multiple clusters in one site: No / No / Yes
- Number of concurrent users: 2 / 100 / 400
- Support: No / Yes / Yes

File System: Fujitsu's Exabyte File System (FEFS)

Common file system types
NAS (clients reach a single NAS server over Ethernet):
- Normally accessed via NFS
- Simple set-up but limited performance
- More scalable versions require proprietary client modules
Clustered (clients reach I/O servers and an MDS server over Ethernet or IB):
- Multiple I/O servers, each with access to all of the file system
- Clients and servers normally on the same network
- Bottleneck for large numbers of clients or heavy I/O
Parallel (clients reach I/O servers and an MDS server over IB or Ethernet; metadata and user data are kept separately):
- Multiple I/O servers, each with a part of the total file system
- Clients and servers normally on the same network
- IB used for high-speed access
- Very scalable (just add more I/O servers)
- Performs well for large-block I/O
Distributed (Ethernet or IB locally):
- Data can exist over different sites
- Emphasis on data accessibility, duplication, reliability
- Performance can vary due to network bandwidth when data is not local
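When each I/O server holds only a part of the file system, file data is typically striped across the servers. The sketch below shows generic round-robin (RAID-0 style) striping arithmetic: given a byte offset in a file, it returns which server holds that byte and where it sits within that server's object. It illustrates the general technique only; the function name and parameters are assumptions, and no claim is made about the exact layout used by FEFS, Lustre or GPFS.

```python
def locate_stripe(offset, stripe_size, stripe_count):
    """Map a file offset to (server_index, offset_within_server_object).

    offset:       byte offset within the file (>= 0)
    stripe_size:  bytes written to one server before moving to the next
    stripe_count: number of I/O servers the file is striped over
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    stripe_number = offset // stripe_size           # which stripe overall
    server_index = stripe_number % stripe_count     # round-robin placement
    # Full stripes already stored on this server, plus position in this one.
    object_offset = (stripe_number // stripe_count) * stripe_size \
        + offset % stripe_size
    return server_index, object_offset


if __name__ == "__main__":
    MIB = 1024 * 1024
    for off in (0, MIB, 4 * MIB + 5, 5 * MIB + 10):
        print(off, locate_stripe(off, stripe_size=MIB, stripe_count=4))
```

Because consecutive stripes land on different servers, a large sequential read or write is spread across all of them, which is why this layout suits large-block I/O and scales by adding servers.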

HPC file systems (temporary storage)
Main usage:
- Temporary job run-time data
- Permanent storage is also needed (not discussed)
Applicable file system types:
- NAS file system (NFS)
- Parallel file system (GPFS, Lustre, FEFS)
File system needs:
- Global name space
- Locking at different levels: file/block/byte
- Security: global authentication/authorization
- Reliability: no single point of failure
- Availability: add nodes/capacity without downtime
- Scalability: capacity/number of files
- Standards: IEEE POSIX
- High performance: bandwidth, throughput
Aspects affecting file system choice:
- Total throughput requirements (if known)
- Size of the cluster (number of file system client nodes)
- Size of the file system to be used
- Whether applications are I/O-bound or compute-only
- Number of concurrent jobs
- Whether the application is I/O-intensive (e.g. Nastran)

FEFS characteristics
- Extremely large capacity: extra-large volume (100PB~1EB); massive number of clients (100k~1M) and I/O servers (1k~10k)
- High I/O performance: throughput of single-stream (~GB/s) and parallel I/O (~TB/s); reduced file open latency (~10k ops)
- High reliability and high availability: file service continues even if a component failure occurs
- I/O usage management: fair-share QoS, best-effort QoS
FEFS is optimized for maximizing hardware performance while minimizing file I/O overhead.
[Figure: client nodes access metadata through the Meta Data Server (MDS) and file data through Object Storage Servers (OSS) and Object Storage Targets (OST).]

Specification of FEFS and Lustre (FEFS / current Lustre)
System limits:
- Max file system size: 8EB / 64PB
- Max file size: 8EB / 320TB
- Max #files: 8E / 4G
- Max OST size: 1PB / 16TB
- Max stripe count: 20k / 160
- Max ACL entries: 8191 / 32
Node scalability:
- Max #OSTs: 20k / 8150
- Max #clients: 1M / 128K
Usability:
- QoS: Yes / No
- Directory quota: Yes / No
- InfiniBand multi-rail: Yes / No
- Block size (backend file system): ~512KB / 4KB

FEFS typical configuration
[Figure: several compute clusters and login nodes connect through an IB switch network to the FEFS servers. MDS: a fail-over pair of PRIMERGY RX300 servers (IB, FC) attached to an ETERNUS DX80 (dual CM). OSS: fail-over pairs 1 to n, each a pair of PRIMERGY RX300 servers (OSS1/OSS2 ... OSS x/OSS y; IB, FC) attached to ETERNUS DX80 units #1 to #8 (dual CM) holding the OSTs.]
Note: All FEFS servers are configured with fail-over.

Support matrix
- FEFS version: V1
- Supported OS: MDS/OSS: RedHat EL 5.8; client: RedHat EL 5.8/6.3
- Supported PRIMERGY servers: all PRIMERGY servers supported by the HPC Cluster Suite
- Usable storage units: ETERNUS DX80S2/90S2, DX410S2/DX440S2; DDN *1 SFA12K
*1: usage of DDN is on a project bid basis only


Overview of competitors (Fujitsu / BCM / Stack IQ / IBM / HP / DELL)
- Cluster management (deployment, monitoring): X; CDM / X; BCM / X; Rocks+ / X; PCM / xcat / X; CMU / X; resell
- Workload manager: X; resell & OSS / X; resell & OSS / X; resell & OSS / X; sell LSF / X; resell & OSS / X; resell
- OSS integration: X / X; BCM / X / X / X / X; resell
- ISV integration: X / - / - / X / - / -
- Graphical administrator interface: - (planned) / X; BCM / X / X / X / X; resell
- Graphical end-user interface: X; Gateway / - / - / X; PAC / - / -
- Cloud integration: - (planned) / X; BCM / X / X / - / X; resell
- HW integration (validation, HW monitoring, BIOS setting): X / - * / - * / X / X / X
- Application template: X; Gateway / - / - / X; PAC / - / X; resell
- Process integration: X; Gateway / - / - / - / - / -
- HW portfolio: X; PRIMERGY / - / - / X / X / X
- Global support: X; (planned) / - / - / X / X / X
- PFS integration: X; FEFS / - / - / X; GPFS / X; (HP SFS/Lustre) / X; Lustre
* Rely on HW vendor