Cluster Implementation and Management; Scheduling
- Meagan Stokes
CPS343 Parallel and High Performance Computing, Spring 2013

Outline
1. Cluster components
   - Nodes
   - Interconnect
   - Storage and file systems
   - Software
2. Node provisioning, resource management, and job scheduling
   - Provisioning nodes
   - Resource management and job scheduling

Acknowledgements
Some material used in creating these slides comes from gigabit_ethernet_ready_for_hpc.html

Cluster components
A typical cluster consists of the following components:
- master/login nodes (1 or more)
- compute nodes (many)
- interconnect (1 or more)
- storage system
- system software
- development tools
- runtime system

Master/login nodes
- Master (or service) nodes run the resource manager and job scheduler.
- Login nodes handle interactive user logins, software development, submission of jobs, and pre- and post-processing of data.
- On small clusters a single node is both the master and login node.
- Larger clusters have multiple master nodes for high availability (HA) and multiple separate login nodes.

Compute nodes
Compute node configuration depends on the applications the cluster is designed to support. Important factors to consider are:
- number of processors and number of cores per processor
- amount of RAM and FSB speed
- GPU or other accelerator
- local storage, ...

Interconnect
- The network that connects the compute nodes to each other and to the master/login nodes is called an interconnect fabric or just interconnect.
- As in the case of compute nodes, the type of interconnect chosen depends on the applications the cluster is designed to run.
- Key parameters are latency and bandwidth.
- A scalable, low-latency, high-bandwidth interconnect is desirable for the tightly coupled tasks typical in HPC.
- Cost of the interconnect can be a significant portion of the overall cluster cost.

Interconnect options
Two main options: Ethernet or InfiniBand.

Ethernet
- Gigabit Ethernet (GigE), available since the early 2000s, is now the Ethernet standard for general use.
- 10-Gigabit Ethernet (10-GigE) became available in the late 2000s.
- The names refer to the supplied bandwidth; 1 Gigabit/s is 125 MB/s while 10 Gigabit/s is 1.25 GB/s.
- Typical GigE latency is 20 µsec. Low-latency 10-GigE latency can be around 4 to 5 µsec.
- In many HPC applications low latency is more important than bandwidth, because many short messages are sent between tightly coupled processes.
- Unlike fast Ethernet and GigE, 10-GigE is full-duplex and is a switched network fabric (no hubs).
- Still somewhat expensive: adapters $300 to $600, switches $1,000 to $10,000.

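The claim that latency matters more than bandwidth for short messages can be checked with a simple cost model, sketched below. It assumes that the one-way time for a message is the link latency plus the message size divided by the bandwidth; that model and the function names are illustrative assumptions, while the latency and bandwidth figures are the ones quoted on the slide.

```python
# Simple (assumed) cost model: time = latency + size / bandwidth.
# Latency and bandwidth figures are the ones quoted on the slide.

GIGE = {"latency_us": 20.0, "bandwidth_bytes_per_s": 125e6}      # 1 Gbit/s
TEN_GIGE = {"latency_us": 5.0, "bandwidth_bytes_per_s": 1.25e9}  # 10 Gbit/s


def transfer_time_us(message_bytes, latency_us, bandwidth_bytes_per_s):
    """Estimated one-way transfer time in microseconds."""
    return latency_us + message_bytes / bandwidth_bytes_per_s * 1e6


def latency_fraction(message_bytes, latency_us, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is spent on latency."""
    total = transfer_time_us(message_bytes, latency_us, bandwidth_bytes_per_s)
    return latency_us / total


if __name__ == "__main__":
    for size in (64, 1024, 1_000_000):
        for name, link in (("GigE", GIGE), ("10-GigE", TEN_GIGE)):
            t = transfer_time_us(size, **link)
            f = latency_fraction(size, **link)
            print(f"{name:8s} {size:>9d} B  {t:10.2f} us  latency share {f:.1%}")
```

Under this model a 64-byte message on GigE spends almost all of its time in latency, while a 1 MB message is dominated by bandwidth, which is why tightly coupled codes that exchange many small messages benefit most from a low-latency fabric.
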
InfiniBand
- InfiniBand (IB) is a switched network fabric.
- Very low latency, 1 to 3 µsec.
- Bandwidth comparable to 10-GigE; InfiniBand QDR 12x bandwidth is 12 GB/s.
- New InfiniBand EDR technology is pushing 36 GB/s.
- Cost is comparable to 10-GigE but usually must be augmented by an Ethernet network.

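The QDR 12x figure can be sanity-checked from per-lane rates. The sketch below assumes a QDR signalling rate of 10 Gbit/s per lane and 8b/10b line encoding; neither number appears on the slide, so treat them as stated assumptions. With them, a 12-lane link works out to about 12 GB/s of usable data bandwidth.

```python
def link_data_rate_gbytes(lanes, signal_gbits_per_lane, encoding_efficiency):
    """Usable data bandwidth of a multi-lane link in GB/s.

    signal_gbits_per_lane is the raw signalling rate per lane and
    encoding_efficiency is the fraction of bits that carry data
    (0.8 for 8b/10b encoding).
    """
    data_gbits = lanes * signal_gbits_per_lane * encoding_efficiency
    return data_gbits / 8  # 8 bits per byte


# Assumed values for InfiniBand QDR: 10 Gbit/s per lane, 8b/10b encoding.
qdr_12x = link_data_rate_gbytes(lanes=12, signal_gbits_per_lane=10.0,
                                encoding_efficiency=8 / 10)

# For comparison, 10-GigE as quoted on the Ethernet slide: 10 Gbit/s = 1.25 GB/s.
ten_gige = 10.0 / 8

if __name__ == "__main__":
    print(f"QDR 12x : {qdr_12x:.2f} GB/s")
    print(f"10-GigE : {ten_gige:.2f} GB/s")
```
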
Other/hybrid
- Medium to large clusters often have multiple network interconnects.
- IB or 10-GigE is used for the compute node interconnect fabric, giving low latency and high bandwidth.
- This interconnect may also connect to the storage subsystem, or a separate IB or 10-GigE network may be used for access to storage and the master/login node(s).
- In some clusters IB is used to interconnect groups of compute nodes, and 10-GigE or even GigE is used to connect the groups of nodes to each other (a compromise to reduce cost).

Storage and file systems
- In small clusters, disks in the master/login node provide primary shared storage. Compute nodes may have disks for scratch space.
- In larger clusters a separate storage area network (SAN) is used to provide storage to the cluster.
- Usually a distributed file system (DFS) is used to make the storage network appear transparently as a disk or disks to the cluster nodes.
- Currently Lustre is a popular DFS option; others include NFS, GPFS, and FhGFS.
- Desired goal: provide concurrent, high-speed access to applications executing on multiple nodes.

NFS
- NFS stands for Network File System.
- Developed by Sun Microsystems in the early 1980s.
- Open source implementations exist for most systems.
- Still in wide use; NFS v4 is the current standard, with performance and security enhancements over previous versions.

GPFS
- This is IBM's General Parallel File System.
- Used on some computers in the Top500 list and in many commercial clusters.
- First appeared in the late 1990s.
- Distributed metadata: there is no single controller, which eliminates that bottleneck.
- Depends on RAID for redundancy and protection from loss of data.

Lustre
- Open source; name derived from Linux Cluster.
- Used by Titan and 5 others of the top 10 computers in the Top500 list.
- The Lustre system has three main components:
  1. An MDS (metadata server) and associated MDTs (metadata targets; one per Lustre file system)
  2. One or more OSSes (object storage servers) that interact with OSTs (object storage targets: disks, SAN, etc.)
  3. Clients: cluster nodes, workstations, archival storage systems, etc.
- Designed for high availability and scalability.

HPC software stack
[Figure: one vendor's software stack diagram]

HPC software
- Operating system
  - Most clusters today run some version of Linux; Red Hat and CentOS (both RPM based) are most popular.
  - Some vendors (e.g. Cray) have customized versions of Linux.
- Cluster management and control
  - provision compute nodes
  - schedule jobs
- HPC development tools
  - Compilers
  - Debuggers and profiling tools
  - MPI libraries and runtime system

The need for diskless provisioning
- Original Beowulf clusters consisted of individual, stand-alone computers connected by a network.
  - Each node has a disk with the OS and other software.
  - Our workstation cluster follows this model.
- It is untenable, however, for medium or large clusters to be configured like this, as
  - each node would have to be installed individually
  - software upgrades would be a huge headache
- The solution is to configure the nodes when they boot using a centralized system image.

PXE
- The PXE (Preboot eXecution Environment) system uses DHCP and TFTP (Trivial File Transfer Protocol) to assign a network address and distribute an OS image and RAM disk to a node when it boots.
- Nodes are not required to have disks (but may, for scratch work).
- Only one OS image and RAM disk need be maintained for each type of node.

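The idea of one image per node type can be sketched as a small generator of per-node boot configurations. The sketch assumes the PXELINUX boot loader, which the slides do not name: PXELINUX looks for a per-node configuration file named after the node's MAC address (prefixed with the ARP type `01`, lower-case, dash-separated) and reads `DEFAULT`, `LABEL`, `KERNEL`, and `APPEND` directives from it. The node types and image file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootImage:
    kernel: str  # kernel file served over TFTP (hypothetical name)
    initrd: str  # RAM disk image served over TFTP (hypothetical name)


# One image per *type* of node, not per node (hypothetical types and files).
IMAGES = {
    "compute": BootImage("vmlinuz-compute", "initrd-compute.img"),
    "gpu": BootImage("vmlinuz-gpu", "initrd-gpu.img"),
}


def pxelinux_filename(mac):
    """Per-node config file name: '01-' plus the MAC, lower case, with dashes."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(image):
    """Minimal PXELINUX config text that boots the given kernel and RAM disk."""
    lines = [
        "DEFAULT node",
        "LABEL node",
        f"  KERNEL {image.kernel}",
        f"  APPEND initrd={image.initrd}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inventory mapping MAC address to node type.
    inventory = {"88:99:AA:BB:CC:01": "compute", "88:99:AA:BB:CC:02": "gpu"}
    for mac, node_type in inventory.items():
        print("pxelinux.cfg/" + pxelinux_filename(mac))
        print(pxelinux_config(IMAGES[node_type]))
```

Adding a node of an existing type then only requires a new inventory entry; the image itself is shared.
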
Resource management
- A cluster resource management system provides much of the same functionality for the cluster that the OS provides for an individual system.
- The most important resources in a cluster are the compute nodes.
- Nodes may not all be equivalent: some may have more memory, a scratch disk, one or more accelerators (GPU, Xeon Phi), and/or share a faster interconnect with certain other nodes.
- The resource management system is responsible for controlling the allocation of resources to jobs on the cluster.

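A resource manager's core bookkeeping can be illustrated with a toy allocator that tracks heterogeneous nodes and hands out the ones that satisfy a job's requirements. This is a minimal sketch, not how any particular resource manager works; the `Node` and `JobRequest` fields and the "least capable node first" preference are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    mem_gb: int
    gpus: int = 0
    busy: bool = False


@dataclass
class JobRequest:
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int
    gpus_per_node: int = 0


def satisfies(node, req):
    """True if a single node meets the per-node requirements of the request."""
    return (node.cores >= req.cores_per_node
            and node.mem_gb >= req.mem_gb_per_node
            and node.gpus >= req.gpus_per_node)


def allocate(nodes, req):
    """Mark suitable free nodes busy and return their names, or None.

    Prefers the least capable suitable nodes so that large-memory and GPU
    nodes stay free for jobs that need them (an illustrative policy choice).
    """
    candidates = [n for n in nodes if not n.busy and satisfies(n, req)]
    if len(candidates) < req.nodes:
        return None  # not enough resources right now; the job must wait
    candidates.sort(key=lambda n: (n.gpus, n.mem_gb, n.cores))
    chosen = candidates[:req.nodes]
    for n in chosen:
        n.busy = True
    return [n.name for n in chosen]
```

For example, with a 64 GB node, a 128 GB node, and a GPU node, a small CPU-only job lands on the 64 GB node, leaving the GPU node available for a later job that requests an accelerator.
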
Job scheduler
- The job scheduler uses information supplied by the resource manager to determine the best match between job requirements and available resources.
- It then provides this information to the resource manager, which starts jobs as the necessary resources become available.
- Multiple scheduling algorithms exist, including
  - FCFS: first come, first served
  - FIFO: first in, first out
  - RR: round robin
  - SJF: shortest job first
  - LJF: longest job first
- The algorithm chosen reflects the desired scheduling policy.

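Three of the policies listed above differ only in how they order the waiting queue, which the following sketch makes concrete. The `Job` fields are hypothetical, and a real scheduler would also account for resource availability, which is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submit_time: int   # arrival order or timestamp
    est_minutes: int   # user-supplied run-time estimate


def fcfs(queue):
    """First come, first served: order by submission time."""
    return sorted(queue, key=lambda j: j.submit_time)


def sjf(queue):
    """Shortest job first; ties broken by submission time."""
    return sorted(queue, key=lambda j: (j.est_minutes, j.submit_time))


def ljf(queue):
    """Longest job first; ties broken by submission time."""
    return sorted(queue, key=lambda j: (-j.est_minutes, j.submit_time))


if __name__ == "__main__":
    queue = [Job("a", 0, 30), Job("b", 1, 10), Job("c", 2, 60)]
    for policy in (fcfs, sjf, ljf):
        print(policy.__name__, [j.name for j in policy(queue)])
```
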
Fair share
- Job schedulers often make adjustments to rigid scheduling decisions based on use history.
- For example, during daytime hours an SJF policy may be enforced, giving preference to jobs with quick turn-around time.
- Suppose Susan keeps submitting jobs that take 10 minutes to run but Bob needs to run a 15 minute job. Using strict SJF, Susan's jobs will always run before Bob's.
- If the scheduler keeps track of the number of jobs run for each user, it would eventually decide that Susan has had more than her fair share of the cluster nodes and Bob's job would be run.

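The Susan and Bob scenario can be simulated with a small fair-share adjustment on top of SJF. This is a sketch under stated assumptions: each job a user has already run adds a fixed penalty (`penalty_minutes`, a made-up parameter) to the effective length of that user's next job. Real schedulers use more elaborate usage-decay formulas.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    user: str
    minutes: int


def pick_next(queue, jobs_run, penalty_minutes):
    """SJF with a fair-share adjustment.

    Each job a user has already run adds penalty_minutes to the effective
    length of that user's queued jobs. penalty_minutes=0 is strict SJF.
    """
    return min(queue, key=lambda j: j.minutes + penalty_minutes * jobs_run[j.user])


def simulate(rounds, penalty_minutes):
    """Susan always has a 10-minute job queued; Bob has one 15-minute job."""
    jobs_run = Counter()
    bob_waiting = True
    order = []
    for _ in range(rounds):
        queue = [Job("susan", 10)]
        if bob_waiting:
            queue.append(Job("bob", 15))
        job = pick_next(queue, jobs_run, penalty_minutes)
        jobs_run[job.user] += 1
        if job.user == "bob":
            bob_waiting = False
        order.append(job.user)
    return order


if __name__ == "__main__":
    print("strict SJF:", simulate(5, penalty_minutes=0))
    print("fair share:", simulate(5, penalty_minutes=2))
```

With a penalty of 0 Bob never runs in the simulated window; with a penalty of 2 minutes per completed job, Susan's effective job length passes 15 minutes after three runs and Bob's job is picked fourth.
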
More informationStateless Compute Cluster
5th Black Forest Grid Workshop 23rd April 2009 Stateless Compute Cluster Fast Deployment and Switching of Cluster Computing Nodes for easier Administration and better Fulfilment of Different Demands Dirk
More informationDepartment of Computer Sciences University of Salzburg. HPC In The Cloud? Seminar aus Informatik SS 2011/2012. July 16, 2012
Department of Computer Sciences University of Salzburg HPC In The Cloud? Seminar aus Informatik SS 2011/2012 July 16, 2012 Michael Kleber, mkleber@cosy.sbg.ac.at Contents 1 Introduction...................................
More informationAchieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks. An Oracle White Paper April 2003
Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building Blocks An Oracle White Paper April 2003 Achieving Mainframe-Class Performance on Intel Servers Using InfiniBand Building
More informationWhite Paper Solarflare High-Performance Computing (HPC) Applications
Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationThe Ultimate in Scale-Out Storage for HPC and Big Data
Node Inventory Health and Active Filesystem Throughput Monitoring Asset Utilization and Capacity Statistics Manager brings to life powerful, intuitive, context-aware real-time monitoring and proactive
More informationwww.thinkparq.com www.beegfs.com
www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a
More informationFujitsu HPC Cluster Suite
Webinar Fujitsu HPC Cluster Suite 29 th May 2013 Павел Борох 0 HPC: полный спектр предложений от Fujitsu PRIMERGY Server, Workstation Cluster Management & Operation ISV and Research Partnerships HPC Cluster
More informationMicrosoft Technical Computing The Advancement of Parallelism. Tom Quinn, Technical Computing Partner Manager
Presented at the COMSOL Conference 2010 Boston Microsoft Technical Computing The Advancement of Parallelism Tom Quinn, Technical Computing Partner Manager 21 1.2 x 10 New Bytes of Information in 2010 Source:
More informationUnisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise
Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise Introducing Unisys All in One software based weather platform designed to reduce server space, streamline operations, consolidate
More informationNetApp High-Performance Computing Solution for Lustre: Solution Guide
Technical Report NetApp High-Performance Computing Solution for Lustre: Solution Guide Robert Lai, NetApp August 2012 TR-3997 TABLE OF CONTENTS 1 Introduction... 5 1.1 NetApp HPC Solution for Lustre Introduction...5
More informationCMS Tier-3 cluster at NISER. Dr. Tania Moulik
CMS Tier-3 cluster at NISER Dr. Tania Moulik What and why? Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach common goal. Grids tend
More information(AS ON 07.08.2015) A. Original tender document page no: 2 1. TENDER NOTICE
AMENDMENTS TO TENDER REFERENCE NO - AU/CPC-RCC/HPC/2015-16 TENDER DOCUMENT FOR SUPPLY, INSTALLATION AND COMMISSIONING OF HIGHPERFORMANCE COMPUTING (HPC) HYBRID SYSTEM A. Original tender document page no:
More informationWorkshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, 25 May 2012
Scientific Application Performance on HPC, Private and Public Cloud Resources: A Case Study Using Climate, Cardiac Model Codes and the NPB Benchmark Suite Peter Strazdins (Research School of Computer Science),
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationPost-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000
Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000 Dot Hill Systems introduction 1 INTRODUCTION Dot Hill Systems offers high performance network storage products
More informationHeadline in Arial Bold 30pt. The Need For Speed. Rick Reid Principal Engineer SGI
Headline in Arial Bold 30pt The Need For Speed Rick Reid Principal Engineer SGI Commodity Systems Linux Red Hat SUSE SE-Linux X86-64 Intel Xeon AMD Scalable Programming Model MPI Global Data Access NFS
More informationDesigned for Maximum Accelerator Performance
Designed for Maximum Accelerator Performance A dense, GPU-accelerated cluster supercomputer that delivers up to 329 double-precision GPU teraflops in one rack. This power- and spaceefficient system can
More informationAre Blade Servers Right For HEP?
Are Blade Servers Right For HEP? Rochelle Lauer Yale University Physics Department rochelle.lauer@yale.edu c 2002 Rochelle Lauer:1 Outline Blade Server Evaluation Why and How The HP BL Blade Servers The
More informationGPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"
GPFS Storage Server Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " Agenda" GPFS Overview" Classical versus GSS I/O Solution" GPFS Storage Server (GSS)" GPFS Native RAID
More informationNFS SERVER WITH 10 GIGABIT ETHERNET NETWORKS
NFS SERVER WITH 1 GIGABIT ETHERNET NETWORKS A Dell Technical White Paper DEPLOYING 1GigE NETWORK FOR HIGH PERFORMANCE CLUSTERS By Li Ou Massive Scale-Out Systems delltechcenter.com Deploying NFS Server
More informationArchitecting a High Performance Storage System
WHITE PAPER Intel Enterprise Edition for Lustre* Software High Performance Data Division Architecting a High Performance Storage System January 2014 Contents Introduction... 1 A Systematic Approach to
More informationUpgrading Small Business Client and Server Infrastructure E-LEET Solutions. E-LEET Solutions is an information technology consulting firm
Thank you for considering E-LEET Solutions! E-LEET Solutions is an information technology consulting firm that specializes in low-cost high-performance computing solutions. This document was written as
More information