Cluster Implementation and Management; Scheduling

CPS343 Parallel and High Performance Computing, Spring 2013

Outline
1. Cluster components
   - Nodes
   - Interconnect
   - Storage and file systems
   - Software
2. Node provisioning, resource management, and job scheduling
   - Provisioning nodes
   - Resource management and job scheduling

Acknowledgements
Some material used in creating these slides comes from gigabit_ethernet_ready_for_hpc.html

Cluster components
A typical cluster consists of the following components:
- master/login nodes (1 or more)
- compute nodes (many)
- interconnect (1 or more)
- storage system
- system software
- development tools
- runtime system

Master/login nodes
- Master (or service) nodes run the resource manager and job scheduler.
- Login nodes handle interactive user logins, software development, submission of jobs, and pre- and post-processing of data.
- On small clusters a single node is both the master and login node.
- Larger clusters have multiple master nodes for high availability (HA) and multiple separate login nodes.

Compute nodes
Compute node configuration depends on the applications the cluster is designed to support. Important factors to consider are:
- number of processors and number of cores per processor
- amount of RAM and FSB speed
- GPU or other accelerator, local storage, ...

Interconnect
- The network that connects the compute nodes to each other and to the master/login nodes is called an interconnect fabric or just interconnect.
- As in the case of compute nodes, the type of interconnect chosen depends on the applications the cluster is designed to run.
- Key parameters are latency and bandwidth.
- A scalable, low-latency, high-bandwidth interconnect is desirable for the tightly coupled tasks typical in HPC.
- Cost of the interconnect can be a significant portion of the overall cluster cost.

Interconnect options
Two main options: Ethernet or InfiniBand.

Ethernet
- Gigabit Ethernet (GigE), available since the early 2000s, is now the Ethernet standard for general use.
- 10-Gigabit Ethernet (10-GigE) became available in the late 2000s.
- The names refer to the supplied bandwidth: 1 Gigabit/s is 125 MB/s, while 10 Gigabit/s is 1.25 GB/s.
- Typical GigE latency is 20 µsec. Low-latency 10-GigE latency can be around 4 to 5 µsec.
- In many HPC applications low latency is more important than bandwidth, because many short messages are sent between tightly coupled processes.
- Unlike fast Ethernet and GigE, 10-GigE is full-duplex and is a switched network fabric (no hubs).
- Still somewhat expensive: adapters $300 to $600, switches $1,000 to $10,000.
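The claim that latency matters more than bandwidth for short messages can be checked with a simple cost model in which sending a message takes latency plus size divided by bandwidth. The sketch below is a rough illustration only: it uses the nominal figures quoted above (20 µsec and 125 MB/s for GigE, 5 µsec and 1.25 GB/s for 10-GigE), ignores protocol overhead and contention, and its function and variable names are made up for this example.

```python
# Simple cost model: time = latency + size / bandwidth.
# Values are the nominal figures from the slide, not measurements.

def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to send one message over a link."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

GIGE = {"latency_s": 20e-6, "bandwidth_bytes_per_s": 125e6}      # 1 Gbit/s = 125 MB/s
TEN_GIGE = {"latency_s": 5e-6, "bandwidth_bytes_per_s": 1.25e9}  # 10 Gbit/s = 1.25 GB/s

def latency_fraction(message_bytes, link):
    """Fraction of the total transfer time that is pure latency."""
    return link["latency_s"] / transfer_time(message_bytes, **link)

if __name__ == "__main__":
    for size in (64, 1024, 1_000_000):
        for name, link in (("GigE", GIGE), ("10-GigE", TEN_GIGE)):
            t = transfer_time(size, **link)
            share = latency_fraction(size, link)
            print(f"{name:8s} {size:>9d} B: {t * 1e6:10.2f} us, latency share {share:.0%}")
```

Under this model a 64-byte message on GigE spends nearly all of its time in latency, while a 1 MB message is dominated by bandwidth, which is why tightly coupled codes that exchange many short messages benefit most from a low-latency fabric.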

InfiniBand
- InfiniBand (IB) is a switched network fabric.
- Very low latency: 1 to 3 µsec.
- Bandwidth is comparable to or higher than 10-GigE; InfiniBand QDR 12x bandwidth is 12 GB/s.
- New InfiniBand EDR technology is pushing 36 GB/s.
- Cost is comparable to 10-GigE, but an IB network usually must be augmented by an Ethernet network.
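The 12x link figures can be reproduced from the lane count, the per-lane signaling rate, and the line-encoding overhead. The sketch below assumes the standard values of 10 Gbit/s per lane with 8b/10b encoding for QDR and 25.78125 Gbit/s per lane with 64b/66b encoding for EDR; the function name is made up for this example, and the results are theoretical data rates, not measured throughput.

```python
def ib_data_rate_gbytes(lanes, signal_gbit_per_lane, payload_bits, coded_bits):
    """Theoretical one-direction data rate of an InfiniBand link in GB/s."""
    data_gbit = lanes * signal_gbit_per_lane * payload_bits / coded_bits
    return data_gbit / 8  # 8 bits per byte

# QDR: 10 Gbit/s signaling per lane, 8b/10b encoding.
QDR_12X = ib_data_rate_gbytes(12, 10.0, 8, 10)

# EDR: 25.78125 Gbit/s signaling per lane, 64b/66b encoding.
EDR_12X = ib_data_rate_gbytes(12, 25.78125, 64, 66)

if __name__ == "__main__":
    print(f"QDR 12x: {QDR_12X:.1f} GB/s")
    print(f"EDR 12x: {EDR_12X:.1f} GB/s")
```

The QDR result is 12 GB/s, and the EDR result is in the high 30s of GB/s, in line with the "pushing 36" figure.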

Other/hybrid
- Medium to large clusters often have multiple network interconnects.
- IB or 10-GigE is used for the compute node interconnect fabric, giving low latency and high bandwidth.
- This interconnect may also connect to the storage subsystem, or a separate IB or 10-GigE network may be used for access to storage and the master/login node(s).
- In some clusters IB is used to interconnect groups of compute nodes, and 10-GigE or even GigE is used to connect the groups of nodes to each other (a compromise to reduce cost).

Storage and file systems
- In small clusters, disks in the master/login node provide primary shared storage. Compute nodes may have disks for scratch space.
- In larger clusters a separate storage area network (SAN) is used to provide storage to the cluster.
- Usually a distributed file system (DFS) is used to make the storage network appear transparently as a disk or disks to the cluster nodes.
- Currently Lustre is a popular DFS option; others include NFS, GPFS, and FhGFS.
- Desired goal: provide concurrent, high-speed access to applications executing on multiple nodes.

NFS
- NFS stands for Network File System.
- Developed by Sun Microsystems in the early 1980s.
- Open source implementations exist for most systems.
- Still in wide use; NFS v4 is the current standard, with performance and security enhancements over previous versions.

GPFS
- This is IBM's General Parallel File System.
- Used on some computers in the Top500 list and in many commercial clusters.
- First appeared in the late 1990s.
- Distributed metadata: there is no single metadata controller, which eliminates a bottleneck.
- Depends on RAID for redundancy and protection from loss of data.

Lustre
- Open source; name derived from "Linux Cluster".
- Used by Titan and 5 others of the top 10 computers in the Top500 list.
- The Lustre system has three main components:
  1. An MDS (metadata server) and associated MDTs (metadata targets; one per Lustre file system)
  2. One or more OSSes (object storage servers) that interact with OSTs (object storage targets: disks, SAN, etc.)
  3. Clients: cluster nodes, workstations, archival storage systems, etc.
- Designed for high availability and scalability.
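The division of labor between the three components can be illustrated with a toy model. This is not Lustre's API or on-disk layout: the class names, the tiny stripe size, and the one-object-per-chunk round-robin placement are all simplifying assumptions, and the OSS and OST roles are merged into a single class. The point is only that the metadata server records where data lives, while the data itself is spread over several storage targets that clients can access directly.

```python
class ObjectStorageTarget:
    """Toy OST: holds data chunks keyed by object id."""
    def __init__(self):
        self.objects = {}

class MetadataServer:
    """Toy MDS: records which OST objects make up each file; stores no file data."""
    def __init__(self):
        self.layouts = {}

class Client:
    """Toy client: asks the MDS for a layout, then talks to OSTs directly."""
    def __init__(self, mds, osts, stripe_size=4):
        self.mds = mds
        self.osts = osts
        self.stripe_size = stripe_size  # unrealistically small, for illustration

    def write(self, name, data):
        layout = []
        for offset in range(0, len(data), self.stripe_size):
            chunk_no = offset // self.stripe_size
            ost_index = chunk_no % len(self.osts)  # round-robin placement
            object_id = f"{name}#{chunk_no}"
            self.osts[ost_index].objects[object_id] = data[offset:offset + self.stripe_size]
            layout.append((ost_index, object_id))
        self.mds.layouts[name] = layout

    def read(self, name):
        layout = self.mds.layouts[name]
        return b"".join(self.osts[i].objects[oid] for i, oid in layout)

if __name__ == "__main__":
    mds = MetadataServer()
    osts = [ObjectStorageTarget() for _ in range(3)]
    client = Client(mds, osts)
    client.write("results.dat", b"hello lustre!")
    print(client.read("results.dat"))
    print([len(ost.objects) for ost in osts])
```

Because chunks land on different targets, several clients (or one client reading several chunks) can be served by different storage servers at the same time, which is the source of the concurrent, high-speed access the storage slides ask for.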

HPC software stack
[Figure: one vendor's software stack diagram; image not preserved]

HPC software
- Operating system
  - Most clusters today run some version of Linux; Red Hat and CentOS (both RPM based) are most popular.
  - Some vendors (e.g. Cray) have customized versions of Linux.
- Cluster management and control
  - provision compute nodes
  - schedule jobs
- HPC development tools
  - Compilers
  - Debuggers and profiling tools
  - MPI libraries and runtime system

The need for diskless provisioning
- Original Beowulf clusters consisted of individual, stand-alone computers connected by a network.
  - Each node has a disk with the OS and other software.
  - Our workstation cluster follows this model.
- It is untenable, however, for medium or large clusters to be configured like this:
  - each node would have to be installed individually
  - software upgrades would be a huge headache
- The solution is to configure the nodes when they boot using a centralized system image.

PXE
- The PXE (Preboot eXecution Environment) system uses DHCP and TFTP (Trivial File Transfer Protocol) to assign a network address and distribute an OS image and RAM disk to a node when it boots.
- Nodes are not required to have disks (but may, for scratch work).
- Only one OS image and RAM disk need be maintained for each type of node.
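The last point, one image per node type rather than per node, can be sketched as a small script that renders a boot entry for each node from its type. Everything here is an assumption made for illustration: the slides do not name a boot loader, so a PXELINUX-style entry (`DEFAULT`, `LABEL`, `KERNEL`, `APPEND initrd=`) is used as one common choice, and the node names and image file names are hypothetical.

```python
# Hypothetical image names: one kernel and one RAM disk per node type.
NODE_IMAGES = {
    "compute": {"kernel": "vmlinuz-compute", "ramdisk": "initrd-compute.img"},
    "gpu": {"kernel": "vmlinuz-gpu", "ramdisk": "initrd-gpu.img"},
}

# Hypothetical node inventory: node name -> node type.
NODES = {
    "node001": "compute",
    "node002": "compute",
    "node003": "compute",
    "node004": "gpu",
}

def boot_config(node_type):
    """Render a PXELINUX-style boot entry for one node type."""
    image = NODE_IMAGES[node_type]
    return "\n".join([
        f"DEFAULT {node_type}",
        f"LABEL {node_type}",
        f"  KERNEL {image['kernel']}",
        f"  APPEND initrd={image['ramdisk']}",
    ])

def configs_for_cluster(nodes):
    """Map every node to the boot entry for its type."""
    return {name: boot_config(node_type) for name, node_type in nodes.items()}

if __name__ == "__main__":
    configs = configs_for_cluster(NODES)
    print(configs["node004"])
    print(f"{len(NODES)} nodes share {len(set(configs.values()))} distinct boot entries")
```

Adding a hundred more compute nodes changes only the inventory; upgrading software means rebuilding one image per node type, which is the maintenance advantage over installing each node individually.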

Resource management
- A cluster resource management system provides much of the same functionality for the cluster that the OS provides for an individual system.
- The most important resources in a cluster are the compute nodes.
- Nodes may not all be equivalent: some may have more memory, a scratch disk, one or more accelerators (GPU, Xeon Phi), and/or share a faster interconnect with certain other nodes.
- The resource management system is responsible for controlling the allocation of resources to jobs on the cluster.
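A minimal sketch of what "controlling the allocation of resources" involves is a table of nodes with differing attributes and a function that hands free, suitable nodes to a job. This is an illustration under assumptions, not how any particular resource manager works: the `Node` fields, the feature tags, and the first-fit selection are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    memory_gb: int
    features: frozenset = frozenset()   # e.g. {"gpu", "scratch"}
    allocated_to: Optional[str] = None  # job name, or None if free

def allocate(nodes, job_name, count, min_memory_gb=0, features=()):
    """Give `count` free nodes meeting the requirements to a job, or return None."""
    required = frozenset(features)
    candidates = [
        n for n in nodes
        if n.allocated_to is None
        and n.memory_gb >= min_memory_gb
        and required <= n.features
    ]
    if len(candidates) < count:
        return None  # not enough suitable nodes right now; the job must wait
    chosen = candidates[:count]  # first fit; a real system would be smarter
    for node in chosen:
        node.allocated_to = job_name
    return chosen

def release(nodes, job_name):
    """Free every node held by a finished job."""
    for node in nodes:
        if node.allocated_to == job_name:
            node.allocated_to = None

if __name__ == "__main__":
    cluster = [
        Node("n1", 32),
        Node("n2", 32),
        Node("n3", 128, frozenset({"gpu"})),
        Node("n4", 128, frozenset({"gpu", "scratch"})),
    ]
    print(allocate(cluster, "jobA", 2, features={"gpu"}))
    print(allocate(cluster, "jobB", 1, min_memory_gb=64))  # None: big nodes are busy
    release(cluster, "jobA")
    print(allocate(cluster, "jobB", 1, min_memory_gb=64))
```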

Job scheduler
- The job scheduler uses information supplied by the resource manager to determine the best match between job requirements and available resources.
- It then provides this information to the resource manager, which starts jobs as the necessary resources become available.
- Multiple scheduling algorithms exist, including:
  - FCFS: first come, first served
  - FIFO: first in, first out
  - RR: round robin
  - SJF: shortest job first
  - LJF: longest job first
- The algorithm chosen reflects the desired scheduling policy.
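How the choice of algorithm reflects policy can be seen by ordering the same queue three ways and comparing the average time jobs wait before starting. The sketch below makes strong simplifying assumptions: all jobs are already queued at time zero, they run one at a time on a single resource, and runtimes are known exactly. Round robin is omitted because it time-slices jobs rather than ordering them. The job names and runtimes are invented.

```python
from collections import namedtuple

Job = namedtuple("Job", "name runtime")  # runtime in minutes

def order_jobs(queue, policy):
    """Return the queue in the order a given policy would run it."""
    if policy in ("FCFS", "FIFO"):
        return list(queue)  # arrival order
    if policy == "SJF":
        return sorted(queue, key=lambda job: job.runtime)
    if policy == "LJF":
        return sorted(queue, key=lambda job: job.runtime, reverse=True)
    raise ValueError(f"unknown policy: {policy}")

def average_wait(ordered):
    """Average time jobs spend waiting before they start."""
    elapsed = 0
    total_wait = 0
    for job in ordered:
        total_wait += elapsed
        elapsed += job.runtime
    return total_wait / len(ordered)

if __name__ == "__main__":
    queue = [Job("A", 30), Job("B", 5), Job("C", 10)]  # in arrival order
    for policy in ("FCFS", "SJF", "LJF"):
        ordered = order_jobs(queue, policy)
        names = " ".join(job.name for job in ordered)
        print(f"{policy}: {names}  average wait {average_wait(ordered):.2f} min")
```

In this example SJF gives the lowest average wait because short jobs never queue behind long ones, while LJF favors large jobs at the cost of turnaround time for everyone else.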

Fair share
- Job schedulers often make adjustments to rigid scheduling decisions based on use history.
- For example, during daytime hours an SJF policy may be enforced, giving preference to jobs with quick turn-around time.
- Suppose Susan keeps submitting jobs that take 10 minutes to run but Bob needs to run a 15-minute job. Using strict SJF, Susan's jobs will always run before Bob's.
- If the scheduler keeps track of the number of jobs run for each user, it would eventually decide that Susan has had more than her fair share of the cluster nodes, and Bob's job would be run.
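The Susan and Bob scenario can be simulated in a few lines. The adjustment rule used here is an assumption chosen for simplicity: each job's effective priority is its runtime plus a fixed penalty for every job its owner has already run. Real schedulers use more elaborate usage accounting, and the penalty value of 2 minutes is arbitrary.

```python
def run_schedule(rounds, penalty_per_job):
    """Simulate `rounds` scheduling decisions; return the list of users served.

    Susan always has a 10-minute job waiting; Bob has a single 15-minute job.
    The job with the smallest (runtime + penalty * jobs already run by its
    owner) goes next. A penalty of 0 is strict shortest job first.
    """
    jobs_run = {"Susan": 0, "Bob": 0}
    bob_waiting = True
    history = []
    for _ in range(rounds):
        candidates = [("Susan", 10)]
        if bob_waiting:
            candidates.append(("Bob", 15))
        user, _runtime = min(
            candidates,
            key=lambda c: c[1] + penalty_per_job * jobs_run[c[0]],
        )
        jobs_run[user] += 1
        if user == "Bob":
            bob_waiting = False
        history.append(user)
    return history

if __name__ == "__main__":
    print("strict SJF:", run_schedule(6, penalty_per_job=0))
    print("fair share:", run_schedule(6, penalty_per_job=2))
```

With no penalty Bob's job never starts, however long the simulation runs. With a 2-minute penalty per completed job, Susan's effective priority reaches 16 after three of her jobs have run, so Bob's 15-minute job goes fourth.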

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

New Storage System Solutions

New Storage System Solutions New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems

More information

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC

Agenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical

More information

Sun Constellation System: The Open Petascale Computing Architecture

Sun Constellation System: The Open Petascale Computing Architecture CAS2K7 13 September, 2007 Sun Constellation System: The Open Petascale Computing Architecture John Fragalla Senior HPC Technical Specialist Global Systems Practice Sun Microsystems, Inc. 25 Years of Technical

More information

HPC Software Requirements to Support an HPC Cluster Supercomputer

HPC Software Requirements to Support an HPC Cluster Supercomputer HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417

More information

Cray DVS: Data Virtualization Service

Cray DVS: Data Virtualization Service Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with

More information

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007 Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms Cray User Group Meeting June 2007 Cray s Storage Strategy Background Broad range of HPC requirements

More information

Hadoop on the Gordon Data Intensive Cluster

Hadoop on the Gordon Data Intensive Cluster Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,

More information

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC The Right Data, in the Right Place, at the Right Time José Martins Storage Practice Sun Microsystems 1 Agenda Sun s strategy and commitment to the HPC or technical

More information

Highly-Available Distributed Storage. UF HPC Center Research Computing University of Florida

Highly-Available Distributed Storage. UF HPC Center Research Computing University of Florida Highly-Available Distributed Storage UF HPC Center Research Computing University of Florida Storage is Boring Slow, troublesome, albatross around the neck of high-performance computing UF Research Computing

More information

Building Clusters for Gromacs and other HPC applications

Building Clusters for Gromacs and other HPC applications Building Clusters for Gromacs and other HPC applications Erik Lindahl lindahl@cbr.su.se CBR Outline: Clusters Clusters vs. small networks of machines Why do YOU need a cluster? Computer hardware Network

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

High Performance Computing OpenStack Options. September 22, 2015

High Performance Computing OpenStack Options. September 22, 2015 High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,

More information

Logically a Linux cluster looks something like the following: Compute Nodes. user Head node. network

Logically a Linux cluster looks something like the following: Compute Nodes. user Head node. network A typical Linux cluster consists of a group of compute nodes for executing parallel jobs and a head node to which users connect to build and launch their jobs. Often the compute nodes are connected to

More information

System Software for High Performance Computing. Joe Izraelevitz

System Software for High Performance Computing. Joe Izraelevitz System Software for High Performance Computing Joe Izraelevitz Agenda Overview of Supercomputers Blue Gene/Q System LoadLeveler Job Scheduler General Parallel File System HPC at UR What is a Supercomputer?

More information

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes Anthony Kenisky, VP of North America Sales About Appro Over 20 Years of Experience 1991 2000 OEM Server Manufacturer 2001-2007

More information

FLOW-3D Performance Benchmark and Profiling. September 2012

FLOW-3D Performance Benchmark and Profiling. September 2012 FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute

More information

Kriterien für ein PetaFlop System

Kriterien für ein PetaFlop System Kriterien für ein PetaFlop System Rainer Keller, HLRS :: :: :: Context: Organizational HLRS is one of the three national supercomputing centers in Germany. The national supercomputing centers are working

More information

Simple Introduction to Clusters

Simple Introduction to Clusters Simple Introduction to Clusters Cluster Concepts Cluster is a widely used term meaning independent computers combined into a unified system through software and networking. At the most fundamental level,

More information

Lessons learned from parallel file system operation

Lessons learned from parallel file system operation Lessons learned from parallel file system operation Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

More information

POWER ALL GLOBAL FILE SYSTEM (PGFS)

POWER ALL GLOBAL FILE SYSTEM (PGFS) POWER ALL GLOBAL FILE SYSTEM (PGFS) Defining next generation of global storage grid Power All Networks Ltd. Technical Whitepaper April 2008, version 1.01 Table of Content 1. Introduction.. 3 2. Paradigm

More information

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver

The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver 1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution

More information

Building a Scalable Storage with InfiniBand

Building a Scalable Storage with InfiniBand WHITE PAPER Building a Scalable Storage with InfiniBand The Problem...1 Traditional Solutions and their Inherent Problems...2 InfiniBand as a Key Advantage...3 VSA Enables Solutions from a Core Technology...5

More information

Lustre failover experience

Lustre failover experience Lustre failover experience Lustre Administrators and Developers Workshop Paris 1 September 25, 2012 TOC Who we are Our Lustre experience: the environment Deployment Benchmarks What's next 2 Who we are

More information

Current Status of FEFS for the K computer

Current Status of FEFS for the K computer Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system

More information

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer

Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Cluster Scalability of ANSYS FLUENT 12 for a Large Aerodynamics Case on the Darwin Supercomputer Stan Posey, MSc and Bill Loewe, PhD Panasas Inc., Fremont, CA, USA Paul Calleja, PhD University of Cambridge,

More information

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution Arista 10 Gigabit Ethernet Switch Lab-Tested with Panasas ActiveStor Parallel Storage System Delivers Best Results for High-Performance and Low Latency for Scale-Out Cloud Storage Applications Introduction

More information

Latency Considerations for 10GBase-T PHYs

Latency Considerations for 10GBase-T PHYs Latency Considerations for PHYs Shimon Muller Sun Microsystems, Inc. March 16, 2004 Orlando, FL Outline Introduction Issues and non-issues PHY Latency in The Big Picture Observations Summary and Recommendations

More information

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of

More information

HPC Update: Engagement Model

HPC Update: Engagement Model HPC Update: Engagement Model MIKE VILDIBILL Director, Strategic Engagements Sun Microsystems mikev@sun.com Our Strategy Building a Comprehensive HPC Portfolio that Delivers Differentiated Customer Value

More information

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services

ALPS Supercomputing System A Scalable Supercomputer with Flexible Services ALPS Supercomputing System A Scalable Supercomputer with Flexible Services 1 Abstract Supercomputing is moving from the realm of abstract to mainstream with more and more applications and research being

More information

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems

David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems David Rioja Redondo Telecommunication Engineer Englobe Technologies and Systems About me David Rioja Redondo Telecommunication Engineer - Universidad de Alcalá >2 years building and managing clusters UPM

More information

THE CLOUD STORAGE ARGUMENT

THE CLOUD STORAGE ARGUMENT THE CLOUD STORAGE ARGUMENT The argument over the right type of storage for data center applications is an ongoing battle. This argument gets amplified when discussing cloud architectures both private and

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

BRIDGING EMC ISILON NAS ON IP TO INFINIBAND NETWORKS WITH MELLANOX SWITCHX

BRIDGING EMC ISILON NAS ON IP TO INFINIBAND NETWORKS WITH MELLANOX SWITCHX White Paper BRIDGING EMC ISILON NAS ON IP TO INFINIBAND NETWORKS WITH Abstract This white paper explains how to configure a Mellanox SwitchX Series switch to bridge the external network of an EMC Isilon

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

High Performance Computing in CST STUDIO SUITE

High Performance Computing in CST STUDIO SUITE High Performance Computing in CST STUDIO SUITE Felix Wolfheimer GPU Computing Performance Speedup 18 16 14 12 10 8 6 4 2 0 Promo offer for EUC participants: 25% discount for K40 cards Speedup of Solver

More information

159.735. Final Report. Cluster Scheduling. Submitted by: Priti Lohani 04244354

159.735. Final Report. Cluster Scheduling. Submitted by: Priti Lohani 04244354 159.735 Final Report Cluster Scheduling Submitted by: Priti Lohani 04244354 1 Table of contents: 159.735... 1 Final Report... 1 Cluster Scheduling... 1 Table of contents:... 2 1. Introduction:... 3 1.1

More information

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data

More information

Virtual Compute Appliance Frequently Asked Questions

Virtual Compute Appliance Frequently Asked Questions General Overview What is Oracle s Virtual Compute Appliance? Oracle s Virtual Compute Appliance is an integrated, wire once, software-defined infrastructure system designed for rapid deployment of both

More information

Availability Digest. Penguin Computing Offers Beowulf Clustering on Linux January 2007

Availability Digest. Penguin Computing Offers Beowulf Clustering on Linux January 2007 the Availability Digest Penguin Computing Offers Beowulf Clustering on Linux January 2007 Clustering can provide high availability and superr-scalable high-performance computing at commodity prices. The

More information

Integrated Application and Data Protection. NEC ExpressCluster White Paper

Integrated Application and Data Protection. NEC ExpressCluster White Paper Integrated Application and Data Protection NEC ExpressCluster White Paper Introduction Critical business processes and operations depend on real-time access to IT systems that consist of applications and

More information

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance Hybrid Storage Performance Gains for IOPS and Bandwidth Utilizing Colfax Servers and Enmotus FuzeDrive Software NVMe Hybrid

More information

HPC @ CRIBI. Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012

HPC @ CRIBI. Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012 HPC @ CRIBI Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012 what is exact? experience on advanced computational technologies a company lead by IT experts with a strong background

More information

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Advanced Techniques with Newton Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Workshop Goals Gain independence Executing your work Finding Information Fixing Problems Optimizing Effectiveness

More information

Recommended hardware system configurations for ANSYS users

Recommended hardware system configurations for ANSYS users Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range

More information

Stovepipes to Clouds. Rick Reid Principal Engineer SGI Federal. 2013 by SGI Federal. Published by The Aerospace Corporation with permission.

Stovepipes to Clouds. Rick Reid Principal Engineer SGI Federal. 2013 by SGI Federal. Published by The Aerospace Corporation with permission. Stovepipes to Clouds Rick Reid Principal Engineer SGI Federal 2013 by SGI Federal. Published by The Aerospace Corporation with permission. Agenda Stovepipe Characteristics Why we Built Stovepipes Cluster

More information

Scaling Across the Supercomputer Performance Spectrum

Scaling Across the Supercomputer Performance Spectrum Scaling Across the Supercomputer Performance Spectrum Cray s XC40 system leverages the combined advantages of next-generation Aries interconnect and Dragonfly network topology, Intel Xeon processors, integrated

More information

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN 1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

More information
