Webinar: Fujitsu HPC Cluster Suite, 29th May 2013, Павел Борох
HPC: the full range of offerings from Fujitsu
- PRIMERGY Server, Workstation
- Cluster Management & Operation: HPC Cluster Suite, FEFS, ETERNUS Storage, CentOS, Gateway
- ISV and Research Partnerships: PreDiCT Initiative, Open Petascale Libraries Network
- Consulting and Integration Services: sizing and design, proof of concept, integration into the customer environment, certified system and production environment
- Ready-to-Go: complete assembly, pre-installation and quality assurance; ready to operate at delivery

HPC and the software it requires
- Cluster architecture and usage
- When an integrated software package is preferable

Typical HPC cluster architecture
Diagram of an HPC cluster with these elements:
- End-user workstations (pre/post processing)
- Head node 1 running the workload manager: the user submits a job here and jobs are queued here
- Head node 2 (fail-over), sharing a disk with head node 1
- Compute nodes: jobs (Job A, Job B) run here
- Parallel File System (PFS)
- InfiniBand (ib0): inter-process communication (MPI) and PFS data traffic
- Ethernet (eth0): management (job start/stop, NFS of /home)

Characteristics of the HPC environment
Characteristic: Description
- Job-oriented: Computations run as jobs ("batch mode") on a set of compute nodes (from tens to thousands). Both serial (single-process) and parallel (multi-process) jobs are possible.
- Non-interactive: With rare exceptions, work on an HPC system is not interactive.
- Concurrency: Many jobs can run on the cluster at the same time.
- Workload diversity: Workloads (number of cores per job) vary with the application and its level of parallelism.
- Large data volumes: Many applications produce and consume large volumes of data within short time intervals.
- Inter-node communication: Parallel applications require a high-speed inter-node interconnect (e.g. InfiniBand), both for data transfer and for communication between processes.

Why an integrated solution?
The HPC software stack must be installed on tens to thousands of nodes
- Same OS installed with the same options on every node
- Resource manager / MPI libraries / scientific libraries
Systems must conform to a standard basic set of operating conditions
- uid, gid and password exactly the same across all nodes
- Shared home file system across all nodes
- Common temporary storage across all nodes
- Password-less access for user sessions across all nodes
Set-up is time-consuming, tedious and error-prone
- Operating-condition items do not scale beyond a few nodes when direct human action is needed
The range of software choices for any individual option is daunting
- Compilers: GNU, Intel, PGI, Absoft
- Resource managers: LSF, PBSPro, Torque, Moab, SGE, SLURM, CONDOR
- MPI: OpenMPI, MPICH, MVAPICH, Intel MPI, Platform MPI
- Software choices and drivers need to be validated
From "complex and difficult" to "simple and validated": improved TCO, improved quality, reduced IT cost, shortened delivery
Diagram: operating system, cluster deployment & management, workload manager, libraries, web-based end-user interface

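One of the operating conditions above, identical uid and gid on every node, is easy to check mechanically. The following is a minimal sketch, assuming the passwd-format text of each node has already been collected into a dictionary; the node names, sample users and function names are hypothetical and are not part of HCS or CDM.

```python
from collections import defaultdict


def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def find_mismatches(passwd_by_node):
    """Return {user: {node: (uid, gid) or None}} for every user whose
    ids differ between nodes or who is missing on some node."""
    seen = defaultdict(dict)
    for node, text in passwd_by_node.items():
        for user, ids in parse_passwd(text).items():
            seen[user][node] = ids
    mismatches = {}
    for user, per_node in seen.items():
        full = {node: per_node.get(node) for node in passwd_by_node}
        if len(set(full.values())) > 1:
            mismatches[user] = full
    return mismatches


# Hypothetical sample data: bob has a different uid on node02.
sample = {
    "node01": "alice:x:1001:100::/home/alice:/bin/bash\n"
              "bob:x:1002:100::/home/bob:/bin/bash\n",
    "node02": "alice:x:1001:100::/home/alice:/bin/bash\n"
              "bob:x:1003:100::/home/bob:/bin/bash\n",
}
print(find_mismatches(sample))
```

Run by hand on a few nodes this is manageable; across hundreds of nodes it is the kind of repeated check that the slide argues should be automated by the deployment tool.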
HPC software stack: the typical landscape
A mature software stack provides:
- Node provisioning and administration of software packages ("images")
- A workload manager for managing jobs and resources
- A parallel processing environment with the necessary libraries
- Tools for developers
- Data storage options (NFS, PFS)
These components are essentially the same everywhere; only the specific products used differ.
Stack diagram, top to bottom:
- Application programs
- Graphical end-user interface
- Middleware; scientific libraries; compilers, performance and profiling tools
- Workload manager: management of cluster resources, serial and parallel jobs, fair-share usage between users
- Cluster deployment and management: automated installation and configuration, administrator interface, operation and monitoring, user environment management, cluster checker
- GPGPU and Xeon Phi software support; file system
- Operating system: RedHat Linux, CentOS, OS drivers
- Fujitsu PRIMERGY HPC Clusters

Fujitsu Software HPC Cluster Suite (HCS): features and editions
HPC Cluster Suite: positioning of the editions
Open edition
- Limited budget
- Open-source components are sufficient
- In-house experience in configuring and operating a cluster (e.g. academic sites)
- Access to updates is not critical
Basic edition
- Support is required (e.g. industrial users)
- Advanced scheduling features are not required
- Relatively small clusters (SMB, development groups)
Advanced edition
- Advanced job-scheduling functionality is required
- Full support for the resource manager
- More customizable HPC Gateway
- Rules for processing the job workflow need to be developed
Note: Editions are not field upgradeable

Description: Open / Basic / Advanced
Main features: Open Edition | Basic Edition | Advanced Edition
- Easy-to-use and scalable cluster deployment and management: CDM, Intel Cluster Checker | CDM, Intel Cluster Checker | CDM, Intel Cluster Checker
- Workload managers: Torque, SGE and SLURM | Torque, SGE and SLURM | Altair PBS Professional
- Parallel file system: No | FEFS | FEFS
- General HPC open source software components: MPI, parallel libraries, compilers, BMT tools
- Graphical end-user interface (Gateway with various ISV application catalogs): Yes, Gateway Demo | Yes, Gateway Basic | Yes, Gateway Advanced
- Command-line administrator interface: Yes | Yes | Yes
- Monitoring and alerting: open source, proprietary (planned) | open source, proprietary (planned) | open source, proprietary (planned)
- Development environment: GNU, Intel Cluster Studio XE | GNU, Intel Cluster Studio XE | GNU, Intel Cluster Studio XE
- Intel Cluster Ready: Yes | Yes | Yes
- Recommended cluster size: up to 128 nodes | up to 128 nodes | up to 1024 nodes
- High Availability (HA): No | No | Yes
- Support, maintenance and upgrade: No, perpetual | Yes (9hx5), 1/3/5 year subscription | Yes (9hx5), 1/3/5 year subscription

Fujitsu HCS: OS support
- HCS version: 1.0
- Hardware platform: PRIMERGY Sandy Bridge RX / CX
- RHEL: RHEL 5.8, RHEL 6.3
- SUSE: -
- CentOS: -; CentOS 6.3 (compute node only)

HPC Cluster Suite: SKU categories
SKU planning: feature (basic / advanced / open edition), customer segment (academic / commercial) and cluster size
Per-node licensing
HPC Cluster Suite SKUs
- Open Edition: perpetual, 1-128 nodes
- Basic: Academic + Research (1Y, 3Y, 5Y subscription) or Commercial (1Y, 3Y, 5Y subscription)
- Advanced: Academic + Research (1Y, 3Y, 5Y subscription) or Commercial (1Y, 3Y, 5Y subscription)
Cluster size (a license for every node within the category):
- Up to 16 nodes (management + compute)
- Up to 64 nodes (management + compute)
- 65+ nodes (management + compute)

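The SKU scheme above combines edition, customer segment, subscription term and a size category counted over management plus compute nodes. A minimal sketch of that lookup follows; the function names, dictionary keys and segment labels are hypothetical illustrations, not Fujitsu order codes, and it assumes the Open Edition node range is counted the same way.

```python
def size_category(total_nodes):
    """Map a node count (management + compute) to the slide's size category."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if total_nodes <= 16:
        return "up to 16 nodes"
    if total_nodes <= 64:
        return "up to 64 nodes"
    return "65+ nodes"


def sku_category(edition, management_nodes, compute_nodes, segment=None, years=None):
    """Describe the SKU category for a cluster; licensing is per node."""
    total = management_nodes + compute_nodes
    edition = edition.lower()
    if edition == "open":
        # Open Edition: perpetual, listed for 1-128 nodes.
        if not 1 <= total <= 128:
            raise ValueError("the Open Edition is listed for 1-128 nodes")
        return {"edition": "open", "term": "perpetual",
                "size": "1-128 nodes", "licenses": total}
    if edition not in ("basic", "advanced"):
        raise ValueError("unknown edition: " + edition)
    if segment not in ("academic+research", "commercial"):
        raise ValueError("segment must be 'academic+research' or 'commercial'")
    if years not in (1, 3, 5):
        raise ValueError("subscription term must be 1, 3 or 5 years")
    return {"edition": edition, "segment": segment, "term": f"{years}Y",
            "size": size_category(total), "licenses": total}


print(sku_category("basic", 2, 30, segment="commercial", years=3))
```

Head nodes count towards the size category, so a cluster of 2 management and 15 compute nodes already falls into the "up to 64 nodes" category.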
Fujitsu HCS components: description of the suite's components
The Fujitsu HPC Cluster Suite (HCS)
A full-featured package for managing clusters based on Fujitsu PRIMERGY
- Easy-to-use cluster management
- Popular workload managers
- General HPC open source software
- Highly scalable parallel file system
- Graphical end-user interface for simplified usage
Alliance with leading software developers
A fully tested solution for HPC
Diagram: the HCS software stack, from Fujitsu PRIMERGY HPC clusters and the operating system up to the graphical end-user interface and application programs

Software stack components: operating systems + drivers
Importance: Essential
Why needed:
- Core software supporting the hardware platform
- Enables support for hardware with no standard OS drivers (IB, 10GbE, disk controllers)
Availability for HCS:
- RedHat EL 5.x/6.x
- CentOS EL 5.x/6.x
Value add:
- Drivers are integrated into the HCS repository for simple cluster deployment

Software stack components: co-processor support
Importance: Essential, depending on hardware configuration
Why needed:
- To support clusters with co-processor nodes
Availability for HCS:
- GPGPU: CUDA with OpenCL, drivers and development tools
- Xeon Phi: Intel Manycore Platform Software Stack (MPSS)
Value add:
- Easily installable add-on packages for GPGPU and Xeon Phi

Software stack components: cluster deployment and management
Importance: Essential
Why needed:
- Bare-metal deployment of nodes
- Cluster configuration management
- Monitoring of cluster health
Availability for HCS:
- Cluster Deployment Manager (CDM, a Fujitsu-developed product)
- Intel Cluster Checker for validation
- Nagios/Ganglia for monitoring and alerting (now)
- AdminGUI (codename): graphical interface for management and monitoring (future)
Value add:
- Comprehensive deployment tool for small or large clusters
- Single graphical web-based interface for all activities

Software stack components: workload managers
Importance: Essential
Why needed:
- Enables sharing of all cluster resources between various users
- Manages policies that determine the order of resource usage
Availability for HCS:
- Open source choices: TORQUE, SGE, SLURM
- Commercial: PBS Professional for the Advanced edition
Value add:
- The variety makes it possible to meet the needs of many customers
- PBSPro can meet the needs of the most demanding customers and systems

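To illustrate what "policies that determine the order of resource usage" can mean, here is a deliberately simplified fair-share ordering: pending jobs of users who have consumed less, relative to their share, are served first. It is a sketch under stated assumptions (one static usage figure per user, hypothetical job and user names) and does not reproduce the priority formula of TORQUE, SGE, SLURM or PBS Professional.

```python
def fair_share_order(pending_jobs, usage_by_user, shares_by_user=None):
    """Order pending jobs by ascending usage/share of their owner.

    pending_jobs:   list of (job_id, user) in submission order
    usage_by_user:  resource usage already consumed, e.g. core-hours
    shares_by_user: relative entitlement per user (defaults to 1 each)
    Ties are broken by submission order.
    """
    shares_by_user = shares_by_user or {}

    def key(indexed_job):
        index, (_job_id, user) = indexed_job
        usage = usage_by_user.get(user, 0.0)
        share = shares_by_user.get(user, 1.0)
        return (usage / share, index)

    ranked = sorted(enumerate(pending_jobs), key=key)
    return [job_id for _index, (job_id, _user) in ranked]


# Hypothetical queue: alice has used the most, carol nothing yet.
jobs = [("j1", "alice"), ("j2", "bob"), ("j3", "alice"), ("j4", "carol")]
usage = {"alice": 120.0, "bob": 30.0}
print(fair_share_order(jobs, usage))
```

Production schedulers typically also decay old usage over time and combine fair share with other factors such as queue priority and job size; this sketch leaves all of that out.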
Software stack components: middleware
Importance: Essential for parallel applications running across multiple nodes
Why needed:
- Provides the software layer needed for inter-node process communication
Availability for HCS:
- Open source: OpenMPI, MPICH, MVAPICH
- Commercial: Intel MPI
Value add:
- The variety makes it possible to bid to many customers
- Some customers need multiple options due to application dependencies

Software stack components: scientific libraries
Importance: Needed for some applications
Why needed:
- Used most often for in-house code development
- Sometimes needed by ISVs
Availability for HCS:
- LAPACK, ScaLAPACK
- BLAS
- netcdf, netcdf-devel
- hdf5
- fftw, fftw-devel
- atlas, atlas-devel
- GMP
- Global Arrays
- MKL
Value add:
- Meets the demands of many customers
- Some customers need multiple options due to application dependencies

Software stack components: compilers, performance and profiling tools
Importance: Needed for software development
Why needed:
- Used to compile applications and to provide tools for optimizing application performance
Availability for HCS:
- Compilers: GNU C, C++, gfortran; Open64 (PathScale compiler); Intel Cluster Studio
- Profiling tools: Intel Cluster Studio, Allinea DDT
- Performance tools: Intel VTune, PAPI, TAU
Value add:
- Can meet the demands of many customers with both open source and commercial offerings

Software stack components: parallel file system
Importance: Needed for demanding I/O requirements
Why needed:
- Usually essential for large clusters (>64 nodes)
- Can be used on smaller clusters if the I/O load is expected to be high
Availability for HCS:
- Fujitsu Exabyte File System (FEFS), developed and maintained by Fujitsu
Value add:
- Originally developed for the demands of the K computer
- Inherits the reliability and performance enhancements of that system
- Updates are passed back to the community
Note: NFS can be used for small clusters or clusters with low I/O demands. In these cases either storage from the head node or a specified NAS server is used.

Software stack components: graphical end-user interface
Importance: Attractive to end-users
Why needed:
- Simplifies the usage of HPC for end-users
- Enables sharing of results and data between team members
- Can be used from remote locations
Availability for HCS:
- HPC Gateway
Value add:
- Used to provide pre-packaged solutions for running applications
- Enables non-HPC specialists to use an HPC cluster

Fujitsu HPC Cluster Suite - V1.0 release
Editions: Open Edition, Basic Edition, Advanced Edition
- Deployment: CDM + SVIM (SVIM used for the installer node)
- Cluster management: Intel Cluster Checker *1 (includes: iozone, streams, HPL); ServerView
- Workload manager: Torque (default) *1; PBS Pro
- Co-processor support: -
- Scientific libraries: Intel MKL *2
- Libraries: Open MPI *1, Intel MPI *2
- Compilers: GNU *1, Intel Cluster Studio XE *2
- Performance and profiling tools: GNU (c, c++, g77, debug and profiler) *1, Intel Cluster Studio XE *2
- Parallel/Shared file system: NAS; -
- Cloud interface: - | - | -
- End-user interface: HPC Gateway Entry | HPC Gateway Basic | HPC Gateway Advanced
- Other: recommended to 128 nodes | recommended to 128 nodes | HA feature, up to 1024 nodes, > 1024 as project bid
*1 Only installation support, does not include any technical support or fixes
*2 Must be purchased separately

Cluster Deployment Manager: managing the cluster and configuration
CDM - Easy-to-use cluster management
A powerful tool that improves productivity and reduces TCO. Leverages know-how from high-end HPC (K computer).
CDM automates compute node installation and cluster configuration
- Deployment of the operating system and all HPC software components as well as their related configuration (including PRIMERGY-specific drivers)
- Ability to add/modify/remove additional software components and their configuration for all nodes via a single command
Installation process
SVIM installs the OS on the installer or head node of the cluster
- Automatic hardware detection and application of the proper drivers

CDM overview of operation
Management from the head (installer) node
- Operations can be carried out from the head node (no changes on individual nodes):
  - Modification of configuration files
  - Copying files to nodes
  - Installing software components
  - Adding new users to the system
  - Adding/removing/replacing nodes of the cluster
- A parallel shell can be used to execute commands across the whole cluster
A variety of node types can be deployed
- Multiple node groups can be used: head, compute, login, I/O, ftp, compilation
Different OSs can be used
- A separate repository is used to manage each OS
- Node groups use software from one of the repositories

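The relationship described above (one repository per OS, each node group bound to exactly one repository) can be pictured as a small data model. This is only an illustrative sketch: the class names, fields and sample values are hypothetical and say nothing about how CDM stores its configuration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Repository:
    name: str
    os_release: str  # one repository per operating system


@dataclass
class NodeGroup:
    name: str
    repository: Repository  # a group takes its software from one repository
    nodes: list = field(default_factory=list)


def nodes_by_repository(groups):
    """Return {repository name: sorted node names}; reject duplicate nodes."""
    owner = {}
    result = {}
    for group in groups:
        for node in group.nodes:
            if node in owner:
                raise ValueError(
                    f"{node} is in both {owner[node]} and {group.name}")
            owner[node] = group.name
            result.setdefault(group.repository.name, []).append(node)
    return {repo: sorted(nodes) for repo, nodes in result.items()}


# Hypothetical cluster: head and login nodes on one OS, compute nodes on another.
rhel = Repository("repo-rhel", "RHEL 6.3")
centos = Repository("repo-centos", "CentOS 6.3")
groups = [
    NodeGroup("head", rhel, ["head1", "head2"]),
    NodeGroup("login", rhel, ["login1"]),
    NodeGroup("compute", centos, ["cn002", "cn001"]),
]
print(nodes_by_repository(groups))
```

Grouping nodes this way means a software change is made once per repository or node group from the head node, rather than once per node.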
CDM-based cluster architecture - SME use case
Diagram with these elements:
- External DNS server and external NTP server on the public network
- Installer node group: installer node (management node) running Fujitsu CDM with the CDM repository
- Provisioning network and management/data network
- Compute node group: compute nodes #1 to #n
- Interconnect: Ethernet (or IB)
- DX80 storage

CDM-based cluster architecture - Medium/large user
Diagram with these elements:
- External DNS server and external NTP server on the public network
- Installer node group: head node 1 and head node 2 (fail-over), each running Fujitsu CDM and the batch server, with the CDM repository
- Login node group: four login nodes
- Provisioning network and management/data network
- Compute1 node group (compute nodes #X1 to #XX) and Compute2 node group (compute nodes #Y1 to #YY)
- IO node group (IO nodes #Z1 to #ZZ)
- Interconnects: Ethernet and InfiniBand
- DX80 storage

Fujitsu PRIMERGY HPC Gateway: a portal to the HPC workplace, integrated in the HPC Cluster Suite
HPC Gateway
- An integrated web environment, built on Liferay Portal and the Tomcat application server
- All tools accessible from a desktop browser
- HPC resources used as an extension of the desktop (Process Manager)
- Share, exchange and track activity across the team (Wiki, Documents, Calendar, Forum, KnowledgeBase)
- Application-aware, using application catalogue templates

Gateway architecture
Diagram with these elements:
- End-user workstations (pre/post processing) using the Gateway web interface
- Head node: Tomcat application server, Liferay portal, HPC Gateway portlet; the Gateway submits jobs
- Workload manager: jobs are queued here
- Compute nodes: jobs (Job A, Job B) run here
- Parallel File System (PFS) and disks
- InfiniBand (ib0): inter-process communication (MPI) and PFS data traffic
- Ethernet (eth0): management (job start/stop, NFS of /home)

Gateway differentiation for HCS versions
Feature: Open | Basic | Advanced
- Run, monitor, view results of application jobs: Yes | Yes | Yes
- Run legacy job scripts: Yes | Yes | Yes
- On-boarding new applications (creating an application template): Yes | Yes | Yes
- Import templates from the application catalogue (Fujitsu download): Payable | Payable | Yes
- Import workflow (own or 3rd party processes): Yes | Yes | Yes
- Graphical desktop administration interface: No | No | Yes
- Workflow editor: No | No | Yes
- Collaboration (Wiki, Documents, Calendar, Forum, KnowledgeBase): Yes | Yes | Yes
- Multiple business projects: No | No | Yes
- Customizable security model: No | No | Yes
- Access to multiple clusters in one site: No | No | Yes
- Number of concurrent users: 2 | 100 | 400
- Support: No | Yes | Yes

Parallel File System: Fujitsu's Exabyte File System (FEFS)
Common file system types
NAS (clients reach a NAS server over Ethernet)
- Normally accessed via NFS
- Simple set-up but limited performance
- More scalable versions require proprietary client modules
Clustered (clients reach I/O servers and an MDS server over Ethernet or IB)
- Multiple I/O servers, each with access to all of the file system
- Clients and servers normally on the same network
- Bottleneck for large numbers of clients or heavy I/O
Parallel (clients reach I/O servers and an MDS server over IB or Ethernet; metadata and user data are separated)
- Multiple I/O servers, each with a part of the total file system
- Clients and servers normally on the same network
- IB used for high-speed access
- Very scalable (just add more I/O servers)
- Performs well for large-block I/O
Distributed (Ethernet or IB locally)
- Data can exist across different sites
- Emphasis on data accessibility, duplication, reliability
- Performance can vary due to network bandwidth when data is not local

HPC file systems (temporary storage)
Main usage
- Temporary job run-time data
- Permanent storage is also needed (not discussed)
Applicable file system types
- NAS file system (NFS)
- Parallel file system (GPFS, Lustre, FEFS)
File system needs
- Global name space
- Different locking: file/block/byte
- Security: global authentication/authorization
- Reliability: no single point of failure
- Availability: add nodes/capacity without downtime
- Scalability: capacity/number of files
- Standards: IEEE POSIX
- High performance: bandwidth, throughput
Aspects affecting file system choice
- Total throughput requirements (if known)
- Size of the cluster (number of file system client nodes)
- Size of the file system to be used
- Whether applications are I/O bound or compute only
- Number of concurrent jobs
- Whether the application is I/O intensive (e.g. Nastran)

FEFS characteristics
Extremely large capacity
- Extra-large volume (100PB~1EB)
- Massive number of clients (100k~1M) and I/O servers (1k~10k)
High I/O performance
- Throughput of single-stream (~GB/s) and parallel I/O (~TB/s)
- Reduced file open latency (~10k ops)
High reliability and high availability
- File service continues even if a component fails
I/O usage management
- Fair-share QoS
- Best-effort QoS
FEFS is optimized for maximizing hardware performance while minimizing file I/O overhead
Diagram: client nodes exchange metadata with the Meta Data Server (MDS) and file data with the Object Storage Servers (OSS), which store it on Object Storage Targets (OST)

Specification of FEFS and Lustre
Feature: FEFS | Current Lustre
System limits
- Max file system size: 8EB | 64PB
- Max file size: 8EB | 320TB
- Max #files: 8E | 4G
- Max OST size: 1PB | 16TB
- Max stripe count: 20k | 160
- Max ACL entries: 8191 | 32
Node scalability
- Max #OSTs: 20k | 8150
- Max #clients: 1M | 128K
Usability
- QoS: Yes | No
- Directory quota: Yes | No
- InfiniBand multi-rail: Yes | No
Block size (backend file system): ~512KB | 4KB

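The "max stripe count" row refers to the number of OSTs a single file can be spread over. As a rough illustration of why a higher stripe count helps parallel I/O, the sketch below maps a byte offset in a file to the stripe (and hence OST object) that holds it, assuming the round-robin, RAID-0 style striping used by Lustre. The function name and the 1 MiB stripe size are assumptions chosen for the example; the slides do not describe FEFS layout internals.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a file byte offset to (stripe index, offset inside that object).

    Assumes round-robin striping: consecutive chunks of `stripe_size`
    bytes go to stripe 0, 1, ..., stripe_count - 1, then wrap around.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    chunk = offset // stripe_size              # which chunk of the file
    stripe_index = chunk % stripe_count        # which object holds it
    rounds = chunk // stripe_count             # full passes over all stripes
    object_offset = rounds * stripe_size + offset % stripe_size
    return stripe_index, object_offset


MIB = 1024 * 1024
# With 4 stripes of 1 MiB, the sixth chunk lands on stripe 1 again.
print(locate(5 * MIB + 10, stripe_size=MIB, stripe_count=4))
```

With this layout a large sequential read or write touches all stripes in turn, so the achievable bandwidth grows with the number of OSTs, and therefore OSS servers, behind the file.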
FEFS typical configuration
Diagram with these elements:
- Compute clusters and login nodes connected through an IB switch network
- MDS (fail-over pair)
- OSS fail-over pairs 1 to n, each built from two PRIMERGY RX300 servers (OSS1/OSS2 to OSS x/OSS y) with IB and FC ports
- ETERNUS DX80 storage units #1 to #8 per OSS pair, each with two controller modules (CM), holding the OSTs
Note: All FEFS servers are configured with fail-over

Support matrix
- FEFS version: V1
- Supported OS: MDS/OSS: RedHat EL 5.8; Client: RedHat EL 5.8/6.3
- Supported PRIMERGY servers: all PRIMERGY servers supported by the HPC Cluster Suite
- Usable storage units: ETERNUS DX80S2/90S2, DX410S2/DX440S2; DDN *1 SFA12K
*1: usage of DDN is on a project bid basis only

Overview of competitors
Feature: Fujitsu | BCM | StackIQ | IBM | HP | DELL
- Cluster management (deployment, monitoring): X; CDM | X; BCM | X; Rocks+ | X; PCM / xcat | X; CMU | X; resell
- Workload manager: X; resell & OSS | X; resell & OSS | X; resell & OSS | X; sell LSF | X; resell & OSS | X; resell
- OSS integration: X | X; BCM | X | X | X | X; resell
- ISV integration: X | - | - | X | - | -
- Graphical administrator interface: - (planned) | X; BCM | X | X | X | X; resell
- Graphical end-user interface: X; Gateway | - | - | X; PAC | - | -
- Cloud integration: - (planned) | X; BCM | X | X | - | X; resell
- HW integration (validation, HW monitoring, BIOS setting): X | - * | - * | X | X | X
- Application template: X; Gateway | - | - | X; PAC | - | X; resell
- Process integration: X; Gateway | - | - | - | - | -
- HW portfolio: X; PRIMERGY | - | - | X | X | X
- Global support: X; (planned) | - | - | X | X | X
- PFS integration: X; FEFS | - | - | X; GPFS | X; (HP SFS/Lustre) | X; Lustre
* Rely on HW vendor