Fine-grained File System Monitoring with Lustre Jobstat
1 Fine-grained File System Monitoring with Lustre Jobstat
Presenters: Daniel Rodwell, Patrick Fitzhenry
2 Agenda
- What is NCI
- Petascale HPC at NCI (Raijin)
- Lustre at NCI: Lustre on Raijin; site-wide Lustre
- Job Aware Filesystem Monitoring: scheduler, workloads, challenges, monitoring, linking IO activity to specific jobs
3 WHAT IS NCI
4 In case you're wondering where we are located: in the Nation's capital, at its National University.
5 NCI: an overview
- NCI is Australia's national high-performance computing service: a comprehensive, vertically-integrated research service providing national access on priority and merit, driven by research objectives.
- Operates as a formal collaboration of ANU, CSIRO, the Australian Bureau of Meteorology and Geoscience Australia.
- A partnership with a number of research-intensive universities, supported by the Australian Research Council.
6 NCI: an overview
Our mission is to foster ambitious and aspirational research objectives and to enable their realisation, in the Australian context, through world-class, high-end computing services.
[Diagram: Research Objectives lead to Research Outcomes, supported by Communities and Institutions / Access and Services, Expertise / Support and Development, Data Intensive Services / Virtual Laboratories, and Compute (HPC/Cloud) and Data Infrastructure.]
7 NCI: an overview
NCI is:
- Driven by research objectives,
- An integral component of the Commonwealth's research infrastructure program,
- A comprehensive, vertically-integrated research service,
- Engaging with, and embedded in, research communities, high-impact centres, and institutions,
- Fostering aspiration in computational and data-intensive research, and increasing ambition in the use of HPC to enhance research impact,
- Delivering innovative virtual laboratories,
- Providing national access on priority and merit, and
- Built on, and sustained by, a collaboration of national organisations and research-intensive universities.
[Diagram repeats the Research Objectives / Research Outcomes capability stack from the previous slide.]
8 Engagement: Research Communities
Research focus areas:
- Climate Science and Earth System Science
- Astronomy (optical and theoretical)
- Geosciences: Geophysics, Earth Observation
- Biosciences & Bioinformatics
- Computational Sciences
- Engineering
- Chemistry
- Physics
- Social Sciences
Growing emphasis on data-intensive computation: Cloud Services, Earth System Grid.
9 PETASCALE HPC AT NCI: RAIJIN
10 Raijin Petascale Supercomputer
Raijin, a Fujitsu Primergy cluster, June 2013:
- 57,472 cores (Intel Xeon Sandy Bridge, 2.6 GHz) in 3592 compute nodes;
- 157 TBytes of main memory;
- Infiniband FDR interconnect; and
- 7.6 PBytes of usable fast filesystem (for short-term scratch space).
- 24th fastest in the world on debut (November 2012); first petaflop system in Australia: 1195 Tflops, 1,400,000 SPECFPrate.
- Custom monitoring and deployment; custom kernel, CentOS 6.4 Linux; highly customised PBS Pro scheduler.
- FDR interconnects by Mellanox; ~52 km of IB cabling.
- 1.5 MW power; 100 tonnes of water in cooling.
11 LUSTRE AT NCI
12 Storage Overview
[Diagram: 10 GigE links connect the Internet, NCI data movers, VMware cloud, and the Raijin login and data-mover nodes; Raijin compute sits on a 56Gb FDR IB fabric, with a separate 56Gb FDR IB fabric for /g/data (linked to the Huxley DC over 10 GigE). Storage systems: Massdata CXFS/DMF (1.0PB cache, 12.3PB tape); /g/data Lustre (/g/data1 ~6.1PB, /g/data2 ~4.5PB); Raijin Lustre filesystems (/short 7.6PB; /home, /system, /images, /apps).]
13 Raijin:/short
Filesystem: raijin:/short (HPC short-term storage)
- Type: Lustre v2.4.2 parallel distributed file system
- Purpose: high-performance short-term storage for the peak HPC system
- Capacity: 7.6 PB
- Throughput: max 150GB/sec aggregate sequential write; avg 600MB/sec for an individual file
- Connectivity: 56 Gbit FDR Infiniband (Raijin compute nodes); 10 Gbit Ethernet (Raijin login nodes)
- Access protocols: native Lustre mount (Raijin compute nodes); SFTP/SCP/rsync-over-ssh (Raijin login nodes)
- Backup & recovery: no backup or data recovery
14 Lustre on Raijin: Building Blocks and Composition
- Lustre servers are Fujitsu Primergy RX300 S7: dual 2.6GHz Xeon (Sandy Bridge) 8-core CPUs, 128/256GB DDR3 RAM.
- 6 MDS (3 HA pairs), 40 OSS (20 HA pairs).
- All Lustre servers are diskless. Current image is CentOS 6.4, Mellanox OFED 2.0, Lustre v2.4.2, corosync/pacemaker (the image was updated in January 2014, which simply required a reboot into the new image).
- HA configuration needs to be regenerated whenever a HA pair is rebooted.
- 5 Lustre file systems:
  /short - scratch file system (rw)
  /images - images for root over Lustre used by compute nodes (ro)
  /apps - user application software (ro)
  /home - home directories (rw)
  /system - critical backups, benchmarking, rw-templates (rw)
15 Storage Architecture of Raijin
Storage for Raijin (HPC) is provided by DDN SFA block appliances: 5 storage building blocks of SFA12K40-IB with 10 x SS8460 84-bay disk enclosures.
Each building block:
- 2 x SFA12K40-IB controllers
- 72 x RAID6 (8+2) 3TB 7.2k SAS pools
- 16 x RAID1 (1+1) 3TB 7.2k SAS pools
- 32 x RAID1 (1+1) 900GB 10k SAS pools
- 12 x 3TB 7.2k SAS hot spares
- 8 x 900GB 10k SAS hot spares
- 8 x OSS servers
Building blocks scale diagonally with both capacity and performance.
16 Lustre on Raijin: Metadata Building Blocks
- Metadata storage is based on the DDN EF3015 storage platform.
- Each metadata storage block has 12 RAID1 (1+1) 300GB 15k SAS pools.
- There are 2/4 storage blocks for each MDS.
- Fully redundant direct-attached FC8 fabric.
- Fully redundant FDR IB uplinks to the main cluster IB fabric.
17 /g/data
Filesystem: /g/data (NCI Global Data)
- Type: Lustre parallel distributed filesystem
- Purpose: high-performance filesystem available across all NCI systems
- Capacity: 6.1 PB /g/data1; 4.5 PB /g/data2
- Throughput: /g/data1 21GB/sec, /g/data2 45GB/sec aggregate sequential write; avg 500MB/sec for an individual file
- Connectivity: 56 Gbit FDR Infiniband (Raijin compute nodes); 10 Gbit Ethernet (NCI datamovers, /g/data NFS servers)
- Access protocols: native Lustre mount via LNET (Raijin compute nodes); SFTP/SCP/rsync-over-ssh (NCI datamover nodes); NFSv3
- Backup & recovery: Lustre HSM, DMF backed with dual-site tape copy (Q3 2014, Lustre 2.5.x)
18 Site-wide Lustre Functional Composition
- 9 Mellanox FDR IB switches in a non-blocking fat-tree fabric
- 2 x management servers: oneSIS, DNS, DHCP etc.
- 4 x NFS servers
- 21 x LNET routers connected to Raijin
- 2 x MDSs, 72 x OSSes
- 10 x SGI arrays hosting OSTs; 2 x DDN SFA12K hosting OSTs; SGI IS5000 array hosting 4 MDTs
- Capacity: /g/data1 ~6.1PB, /g/data2 ~4.5PB
[Diagram legend: 8Gb MM FC, 10 GigE, 56Gb FDR IB links; located in the NCI Data Centre.]
19 JOB AWARE LUSTRE MONITORING
20 Job Aware Monitoring: Scheduler
- Raijin uses a highly customised version of Altair's PBS Professional scheduler.
- Typically a large number of jobs are running and queued concurrently.
- Walltimes range from 100s of seconds to 100s of hours.
21 Job Aware Monitoring: Workload
- Significant growth in highly scaling application codes. Largest: 40,000 cores; many 1,000-core tasks.
- Heterogeneous memory distribution: 2395 x 32GB nodes, 1125 x 64GB nodes, 72 x 128GB nodes.
- High variability in workload: applications and disciplines, cores, memory, code optimisation level.
- Mix of optimised and suboptimal IO patterns.
[Chart: OSS activity.]
22 Job Aware Monitoring: Problem
Problem jobs are a needle in a haystack:
- Large number of concurrent jobs running.
- Not all application codes are properly optimised.
- A few badly behaved jobs can impact overall filesystem performance, degrading IO for all jobs.
- Compute allocation != IOPS consumption.
- Massive amounts of small IO are often detrimental to Lustre performance, particularly MDS load.
[Chart: /short active MDS 1-minute load average spiking to 226, against a long-term load average of ~20.]
23 Job Aware Monitoring
DDN has a current project for Job Aware Lustre Monitoring (Shuichi Ihara, Xi Li, Patrick Fitzhenry) with NCI as an active collaborator (Daniel Rodwell, Javed Shaikh). The project aims to develop a centralised Lustre statistics monitoring and analysis tool that can:
- Collect near real-time data (at a minimum every 1 sec) and visualise it
- Collect all Lustre statistical information
- Support Lustre v1.8.x, v2.x and beyond
- Provide application-aware monitoring (job statistics)
- Let administrators make custom graphs within a web browser
- Offer an easy-to-use dashboard
- Be scalable and lightweight, without performance impact
24 Job Aware Monitoring
Lustre keeps a lot of statistical information in Linux's /proc, and it can be quite helpful for debugging and I/O analysis. Since Lustre v2.3, we can collect Lustre I/O statistics per JOBID. Administrators can see very detailed, application-aware I/O statistics as well as cluster-wide I/O statistics.
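As a minimal sketch of what these per-job statistics look like in practice, the Python snippet below reads the job_stats entries exposed on an OSS and totals read/write bytes per job ID. The /proc path, the assumption that job stats are already enabled, and the exact line layout are based on Lustre 2.4-era output and are assumptions rather than something taken from the slides.

  #!/usr/bin/env python
  """Sketch: aggregate per-job I/O from Lustre OST job_stats.

  Assumptions (not from the slides): the script runs on an OSS, job stats
  are enabled, and stats are readable under /proc/fs/lustre/obdfilter/*/job_stats.
  """
  import glob
  import re
  from collections import defaultdict

  JOB_RE = re.compile(r'^- job_id:\s*(\S+)')
  BYTES_RE = re.compile(r'^\s*(read_bytes|write_bytes):.*sum:\s*(\d+)')

  def collect(pattern='/proc/fs/lustre/obdfilter/*/job_stats'):
      """Return {job_id: {'read_bytes': n, 'write_bytes': n}} summed over OSTs."""
      totals = defaultdict(lambda: defaultdict(int))
      for path in glob.glob(pattern):
          job = None
          with open(path) as f:
              for line in f:
                  m = JOB_RE.match(line)
                  if m:
                      job = m.group(1)
                      continue
                  m = BYTES_RE.match(line)
                  if m and job is not None:
                      totals[job][m.group(1)] += int(m.group(2))
      return totals

  if __name__ == '__main__':
      for job, counters in sorted(collect().items()):
          print('%-20s read=%d write=%d' % (
              job, counters['read_bytes'], counters['write_bytes']))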
25 Monitoring System Architecture
[Diagram: monitoring agents on the MDSs, OSSs and Lustre clients send statistical data to a central monitoring server at short intervals; statistics are analysed and visualised as graphs in a web browser.]
26 Software Stack
[Diagram: on each OSS/MDS, collectd with the DDN Lustre plugin and a graphite plugin collects Lustre statistics and forwards them to the monitoring server as small UDP(TCP)/IP text messages; on the monitoring server, collectd and its graphite plugin feed Graphite and its DB, and graph views/definitions and dashboards are served over HTTP to a PC web browser.]
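The "small text message transfer" in the diagram is, in this stack, handled by the collectd graphite plugin, which speaks Carbon's plaintext line protocol ("metric.path value timestamp"). The sketch below only illustrates that protocol by hand; the host, port and metric name are assumptions, not the plugin's actual configuration.

  #!/usr/bin/env python
  """Sketch of Carbon's plaintext protocol: 'metric value timestamp\n' over TCP."""
  import socket
  import time

  CARBON_HOST = 'monitor.example.com'   # assumed monitoring server
  CARBON_PORT = 2003                    # Carbon's default plaintext port

  def send_metric(path, value, timestamp=None, host=CARBON_HOST, port=CARBON_PORT):
      """Send one 'path value timestamp' line to Carbon."""
      if timestamp is None:
          timestamp = int(time.time())
      line = '%s %s %d\n' % (path, value, timestamp)
      sock = socket.create_connection((host, port), timeout=5)
      try:
          sock.sendall(line.encode('ascii'))
      finally:
          sock.close()

  if __name__ == '__main__':
      # e.g. write_bytes for one job on one OST (illustrative metric name)
      send_metric('lustre.short.ost0001.jobstats.12345.write_bytes', 1048576)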
27 Collectd
We chose collectd because it:
- Runs on many Enterprise/HPC systems
- Is written in C for performance and portability
- Includes optimisations and features to handle 10,000s of data sets
- Comes with over 90 plugins which range from standard to very specialised and advanced topics
- Provides powerful networking features and is very extensible
- Is actively developed and well documented
The Lustre plugin extends collectd to collect Lustre statistics while inheriting these advantages. It is possible to port the Lustre plugin to a different framework if necessary.
28 XML Definition of Lustre's /proc Information
Provides tree-structured descriptions of how to collect statistics from Lustre /proc entries.
- Modular: a hierarchical framework comprised of a core logic layer (the Lustre plugin) and a statistics definition layer (XML files); extendable without the need to update any source code of the Lustre plugin; easy to maintain the stability of the core logic.
- Centralised: a single XML file for all definitions of Lustre data collection; no need to maintain massive, error-prone scripts; easy to verify correctness; easy to support multiple versions of Lustre and update to new ones.
29 XML Definition of Lustre's /proc Information
- Precise: strict rules using regular expressions can be configured to filter out all but exactly what we want; locations to save collected statistics are explicitly defined and configurable.
- Powerful: any statistic can be collected as long as there is a proper regular expression to match it.
- Extensible: any newly required statistic can be collected rapidly by adding definitions to the XML file.
- Efficient: no matter how many definitions are predefined in the XML file, only in-use definitions are traversed at run-time.
A toy sketch of this regex-driven, XML-defined approach follows below.
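To illustrate the idea of keeping collection rules declarative while the core logic stays generic, here is a toy Python sketch: an XML definition names a /proc entry and a regular expression with named groups, and the core simply walks the definitions and emits whatever the patterns capture. The element and attribute names are invented for illustration and are not the DDN plugin's actual schema.

  #!/usr/bin/env python
  """Toy sketch of XML-defined, regex-driven /proc collection.

  The XML schema here (<entry path=... pattern=...>) is invented for
  illustration; the real DDN Lustre plugin uses its own definition format.
  """
  import re
  import xml.etree.ElementTree as ET

  DEFINITIONS = """
  <definitions>
    <entry name="ost_jobstats"
           path="/proc/fs/lustre/obdfilter/lustre-OST0000/job_stats"
           pattern="- job_id:\\s+(?P&lt;job_id&gt;\\S+)"/>
  </definitions>
  """

  def collect(xml_text):
      """Apply each entry's regex to its /proc file and yield the named groups."""
      root = ET.fromstring(xml_text)
      for entry in root.findall('entry'):
          regex = re.compile(entry.get('pattern'))
          try:
              with open(entry.get('path')) as f:
                  text = f.read()
          except IOError:
              continue  # entry not present on this node
          for match in regex.finditer(text):
              yield entry.get('name'), match.groupdict()

  if __name__ == '__main__':
      for name, fields in collect(DEFINITIONS):
          print(name, fields)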
30 Lustre Plugin Configuration
Structured configuration about which statistics to collect:

  LoadPlugin lustre
  <Plugin "lustre">
    <Common>
      # The location of the definition file
      DefinitionFile "/etc/lustre_definition.xml"
    </Common>
    <Item>
      # A unique item type defined in the XML file which describes how to collect
      Type "ost_jobstats"
      # A rule to determine whether to collect or not
      <Rule>
        # One of the fields of collected data defined in the XML file
        Field "job_id"
        # Any regular expression which determines whether the field matches
        Match "dd_[[:digit:]][.][[:digit:]]+"
      </Rule>
      # Could configure multiple rules for logical conjunction
    </Item>
  </Plugin>
31 Graphite: Analysis of Statistical Information
We chose Graphite/Carbon, an open-source package:
- Any graph can be made from the collected statistics.
- Each statistic is kept in its own binary file: file system (cluster-wide), OST pool, per-OST/MDT statistics, etc.; JOBID, UID/GID, aggregation of an application's statistics, etc.
- Data is archived by policy (e.g. 1 sec intervals for one week, 10 sec for a month, 1 min for a year).
- It can export data to CSV, MySQL, etc.
- A hierarchical architecture is possible for large scale.
A sketch of querying these series directly follows below.
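Because every statistic ends up as its own Graphite series, ad-hoc queries are possible without the dashboard. The hedged sketch below pulls one job's write-bytes series from Graphite's standard render API as JSON; the server URL and the metric path are assumptions about how such a deployment might name its series.

  #!/usr/bin/env python
  """Sketch: pull one series from Graphite's render API as JSON (Python 3)."""
  import json
  from urllib.request import urlopen
  from urllib.parse import urlencode

  GRAPHITE_URL = 'http://monitor.example.com/render'   # assumed server

  def fetch(target, frm='-1h'):
      """Return [(timestamp, value), ...] for one Graphite target."""
      query = urlencode({'target': target, 'from': frm, 'format': 'json'})
      with urlopen('%s?%s' % (GRAPHITE_URL, query)) as resp:
          series = json.loads(resp.read().decode('utf-8'))
      if not series:
          return []
      # Graphite returns datapoints as [value, timestamp] pairs
      return [(ts, val) for val, ts in series[0]['datapoints']]

  if __name__ == '__main__':
      for ts, val in fetch('lustre.short.jobstats.12345.write_bytes'):
          print(ts, val)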
32 User and Job Aware Monitoring
33 Where To From Here?
We have seen that we can monitor the I/O of specific applications using Lustre JobStats. In the NCI context, the next step is to integrate the PBS Pro resource manager with this monitoring system. This will allow faster identification of user jobs with pathological I/O behaviour.
NCI has a broader aim of developing a highly integrated monitoring system across the whole facility, encompassing the high-speed interconnects, the compute and the storage subsystems. Other goals are alerting and automated suspension and quarantine of problematic jobs; a purely illustrative sketch of such a check follows below.
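The snippet below is only a sketch of what the "alerting and automated suspension" goal could look like, not an existing NCI tool: it flags any job whose metadata operation rate exceeds a threshold and would place it on hold with the PBS Professional qhold command. The threshold, the get_job_md_rates() data source and the choice of a hold rather than a suspend are all assumptions.

  #!/usr/bin/env python
  """Illustrative sketch only: flag jobs with pathological metadata rates.

  Assumptions (not from the slides): per-job open-op rates are already
  available from the monitoring system via get_job_md_rates(), and the
  policy is simply to place offending jobs on hold with PBS 'qhold'.
  """
  import subprocess

  OPEN_OPS_PER_SEC_LIMIT = 5000   # assumed threshold, tune per file system

  def get_job_md_rates():
      """Placeholder: in a real deployment this would query Graphite or the
      monitoring database for per-job metadata operation rates."""
      return {}                    # {job_id: ops_per_second}

  def enforce(dry_run=True):
      for job_id, rate in get_job_md_rates().items():
          if rate <= OPEN_OPS_PER_SEC_LIMIT:
              continue
          print('job %s: %.0f open ops/s exceeds limit %d'
                % (job_id, rate, OPEN_OPS_PER_SEC_LIMIT))
          if not dry_run:
              # Hold the job via the PBS Professional CLI.
              subprocess.call(['qhold', job_id])

  if __name__ == '__main__':
      enforce(dry_run=True)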
34 Questions? Thank You!