THE EXPAND PARALLEL FILE SYSTEM: A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez, ARCOS Group, University Carlos III of Madrid

2 Contents: The ARCOS Group. Expand motivation. Expand design. Expand evaluation. Conclusions. Ongoing work.

3 University Carlos III of Madrid: Founded in 1989. Three faculties: Faculty of Social Sciences and Law; Faculty of Humanities, Documentation and Communication; Higher Technical School.

4 The ARCOS Group: The Computer Architecture, Communications and Systems Group is part of the Department of Computer Science. 20 full-time members: 9 PhDs (2 full professors + 4 associate professors + 3 visiting professors) and 11 PhD students.

5 Research lines: Data management in Grid environments. Parallel file systems. Optimization of irregular applications. Operating systems for wireless sensor networks. Real-time systems.

6 Some products: Expand: a parallel file system for cluster and grid environments. WinPFS: Windows Parallel File System. MiMPI: an MPI implementation for heterogeneous cluster environments.

8 Trends in the supercomputing environment: Clusters in top500.org. 75% of the supercomputers in the Top500 list are clusters. [Figure: clusters in top500.org over time, June 1993 to October 2006.]

9 Trends in the supercomputing environment: The number of transistors per chip is still doubling every 1.5 years. This does not mean doubling frequency or performance. More space means more cores per chip.

10 Trends in the supercomputing environment: Grid computing: interconnecting supercomputers to aggregate geographically distributed resources. Applications are deployed somewhere in the grid. Applications read input data and produce output data.

11 Trends in the supercomputing environment: Clusters are becoming the preferred option for supercomputing. Processors have increasing capacity. Grid computing uses clusters as a building block. I/O will remain a major bottleneck.

12 Typical storage system architecture: [Figure: clients, communication network, storage network, I/O server.]

13 Problems with storage architectures: [Figure: aggregated bandwidth (MB/s) versus clients for a NAS server.]

14 Solution: Parallelism. Exploit parallelism at multiple layers: parallel applications, parallel computers, parallel file systems, parallel devices.

15 Parallel file system architecture: [Figure: computing nodes running applications and a client, a communication network, and several I/O servers holding File 1 and File 2.]

16 Parallel file system architecture (GPFS): [Figure: computing nodes running applications and a client, a communication network, several I/O servers, and a storage network holding File 1 and File 2.]

17 Parallel file system architecture (GPFS): [Figure: computing nodes running applications and a client, connected to a storage network holding File 1 and File 2.]

18 Expand Parallel File System: Motivation. Provide a high-performance storage system by using standard protocols and servers. Easy integration of heterogeneous systems. Reuse and aggregation of existing resources. Parallel data access.

19 Why Expand? A standard data server already includes almost all the needed functionality (reuse). Standard protocols and servers make resources universally available (easy to deploy). Independence of the underlying storage infrastructure (portability).

20 Objective: Offer a new approach to building parallel file systems (PFS) for cluster and grid environments by using standard data servers. Advantages: No server change needed. Operations at the client side. Independence of client and server operating systems. Operations through standard protocols. Simplified PFS construction. Takes advantage of high-performance mechanisms already implemented in the servers. Allows mixing servers with different platforms and operating systems. Easy installation and configuration.

22 Architecture: [Figure: a computing node offers POSIX and MPI-IO interfaces on top of Expand; Expand uses NFS, GridFTP, RNS-WS or local access as server protocols for parallel access to a distributed partition holding File 1 and File 2.]

23 File structure: An Expand file consists of a metadata sub-file and several data sub-files. Data is distributed across several servers. Flexible file-to-server mapping policy. [Figure: an Expand file split into sub-files stored on the servers.]
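
As an illustration of how the blocks of an Expand file could be spread over the data sub-files, the following Python sketch maps a byte offset of the logical file to a server and to an offset inside that server's sub-file. It assumes fixed-size blocks placed round-robin. The slide only says that the mapping policy is flexible, so this policy, the block size and the names locate and BlockLocation are assumptions made for the example, not Expand's actual code.

```python
from typing import NamedTuple


class BlockLocation(NamedTuple):
    server: int          # index of the server that stores the sub-file
    subfile_offset: int  # byte offset inside that server's sub-file


def locate(offset: int, block_size: int, num_servers: int,
           first_server: int = 0) -> BlockLocation:
    """Map a logical file offset to a sub-file position (assumed round-robin policy)."""
    block = offset // block_size                    # logical block number
    server = (first_server + block) % num_servers   # blocks rotate over the servers
    local_block = block // num_servers              # full rounds already placed
    return BlockLocation(server, local_block * block_size + offset % block_size)
```

With three servers and 8 KB blocks, logical block 0 lands on server 0, block 1 on server 1, block 2 on server 2, and block 3 returns to server 0 as the second block of its sub-file. The first_server parameter lets different files start on different servers, which is one simple way to spread load.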

24 Directory structure: [Figure: mapping between the logical view and the physical view. Logical view: /Expand with Dir1, Dir2, Dir3, Dir4 and FileA. Physical view: Server 1 (/export1), Server 2 (/export2) and Server 3 (/export3), each with its own Dir1, Dir2, Dir3, Dir4 and FileA.]
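
The directory mapping in the figure can be sketched as follows, assuming that every logical directory is replicated under each server's export directory and that a logical file corresponds to one sub-file per server. The function name physical_paths, the dictionary layout and the example path are hypothetical and only serve the illustration.

```python
import posixpath


def physical_paths(logical_path: str, mount_point: str, exports: dict) -> dict:
    """Return, for each server, the physical path that mirrors a logical path."""
    # Path of the file relative to the logical root, e.g. "Dir1/FileA".
    relative = posixpath.relpath(logical_path, mount_point)
    # The same relative path is looked up under every export directory.
    return {server: posixpath.join(export, relative)
            for server, export in exports.items()}


# Export directories taken from the slide; the file location is illustrative.
EXPORTS = {"Server 1": "/export1", "Server 2": "/export2", "Server 3": "/export3"}
paths = physical_paths("/Expand/Dir1/FileA", "/Expand", EXPORTS)
```

Keeping the same relative path on every server means the client can derive all physical names from the logical one without consulting a central directory service.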

25 Metadata management: Distributed metadata management. Two levels. Without locking. No metadata manager. Metadata is distributed across servers. Master node selected by hashing on the file name. Load balancing. [Figure: the blocks of an Expand file and its metadata on the servers.]

26 Metadata management (continued): [Figure: the metadata of FileA, FileB, FileC, FileD, FileE and FileF is spread over Server 1, Server 2 and Server 3.]
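
A minimal sketch of selecting a master node by hashing on the file name is shown below. The slides do not give the hash function, so SHA-256 reduced modulo the number of servers is an assumption chosen only because it is deterministic across runs; master_server is a hypothetical name.

```python
import hashlib
from collections import Counter


def master_server(file_name: str, num_servers: int) -> int:
    """Pick the server that holds a file's metadata from the file name alone."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # to a server index in the range [0, num_servers).
    return int.from_bytes(digest[:8], "big") % num_servers


# Count how many of the six example files get each server as master.
names = ["FileA", "FileB", "FileC", "FileD", "FileE", "FileF"]
load = Counter(master_server(name, 3) for name in names)
```

Because every client computes the same function, no metadata manager has to be contacted to find the master node, and a well-mixed hash tends to balance metadata load across servers.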

27 Parallel access: [Figure: a read(fd, buffer, count) call; Expand threads move the data blocks between the servers and the buffer.]
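
The idea of serving one read call with several threads can be sketched in Python as below. The sub-files are simulated with in-memory byte strings, blocks are assumed to be striped round-robin, and the names stripe, split_request and parallel_read are hypothetical; the real client would issue requests to the servers through NFS or another protocol instead of slicing byte strings.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, block_size: int, num_servers: int) -> list:
    """Build simulated sub-files by placing blocks round-robin on the servers."""
    subfiles = [bytearray() for _ in range(num_servers)]
    for start in range(0, len(data), block_size):
        subfiles[(start // block_size) % num_servers] += data[start:start + block_size]
    return [bytes(s) for s in subfiles]


def split_request(offset: int, count: int, block_size: int, num_servers: int):
    """Yield (server, subfile_offset, length) pieces covering the byte range."""
    end = offset + count
    while offset < end:
        block, within = divmod(offset, block_size)
        length = min(block_size - within, end - offset)
        yield block % num_servers, (block // num_servers) * block_size + within, length
        offset += length


def parallel_read(subfiles: list, offset: int, count: int, block_size: int) -> bytes:
    """Fetch all pieces with a thread pool and reassemble them in order."""
    pieces = list(split_request(offset, count, block_size, len(subfiles)))

    def fetch(piece):
        server, sub_offset, length = piece
        return subfiles[server][sub_offset:sub_offset + length]

    with ThreadPoolExecutor(max_workers=len(subfiles)) as pool:
        # map() returns results in request order, so the buffer is contiguous.
        return b"".join(pool.map(fetch, pieces))
```

Reading 1000 bytes at offset 100 from a file striped in 64-byte blocks over three simulated servers returns the same bytes as slicing the original data, while the individual block requests are handled by different worker threads.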

28 MPI-IO interface using ROMIO: [Figure: MPI-IO on top of ADIO, with ADIO implementations for Unix, NFS, PVFS, Expand, IBM PIOFS and SGI XFS; Expand accesses the distributed partition.]

29 Dynamic partition reconfiguration: Instantaneous. Deferred. [Figure: Server 1, Server 2, Server 3 and a further server; hash(file) = server 3.]

30 Dynamic partition reconfiguration: [Figure: Server 1, Server 2, Server 3 and a further server.]
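
To illustrate why a partition has to be reconfigured when a server is added, the sketch below lists the files whose hash-selected server changes when the server count grows. It assumes the server is chosen as a hash of the file name modulo the number of servers, which the slides do not confirm, and the names server_for and files_to_move are hypothetical. How Expand schedules the actual data movement (the slide names an instantaneous and a deferred option) is not modelled here.

```python
import hashlib


def server_for(name: str, num_servers: int) -> int:
    """Assumed placement rule: hash of the file name modulo the server count."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def files_to_move(names: list, old_servers: int, new_servers: int) -> list:
    """Return the files whose hash-selected server differs after the change."""
    return [name for name in names
            if server_for(name, old_servers) != server_for(name, new_servers)]


# Hypothetical file names; going from 3 to 4 servers relocates a subset.
names = [f"file{i:03d}" for i in range(200)]
moved = files_to_move(names, 3, 4)
```

Under a plain modulo rule a large share of files changes server when the partition grows, which is one reason a system may prefer to defer the relocation instead of performing it all at once.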

31 Expand cluster versions: Linux/NFS and Java/NFS. [Figure: each version accesses a distributed partition made of NFS servers.]

32 Contents: The ARCOS Group. Expand motivation. Expand design. Expand adaptation for Grid computing. Expand evaluation. Conclusions. Ongoing work.

33 Requirements for a Grid file system: Hierarchical logical name space: Resource Namespace Service (RNS). Standard access interface: POSIX and MPI-IO. Data access: GridFTP. Security: Grid Security Infrastructure (GSI). Performance optimization and improvement: parallel I/O.

34 Adapting Expand to Grid environments: [Figure: computing nodes offer POSIX and MPI-IO on top of Expand, which uses NFS, GridFTP, RNS-WS or local access; an RNS service; NFS servers forming a distributed partition; GridFTP servers at Site 1, Site 2, Site 3 and Site 4, reached over the Internet with GSI, forming a distributed partition.]

36 Evaluation: How does Expand behave compared to other existing solutions? Cluster: PVFS, GPFS. Grid: Globus Grid services.

37 Cluster environment: 8 dual-processor nodes (Pentium IV, 3.2 GHz). 2 GB RAM per node. Network: Gigabit Ethernet. File systems: Expand, PVFS, GPFS.

38 Cluster benchmarking: High performance: parallel access to a file (IOR benchmark); FLASH I/O benchmark. Metadata operations. High throughput: image processing. Dynamic partition reconfiguration.

39 High performance: Parallel access to a file. A parallel program (IOR) makes interleaved writes and reads to a single file with different access sizes, using the MPI-IO interface. [Figure: Process 1 and Process 2 accessing one file.]
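
The interleaved access pattern can be sketched by computing the file offsets each process touches. The layout below, in which consecutive accesses of equal size rotate over the processes, is an assumption about what interleaved means on the slide, and interleaved_offsets is a hypothetical helper rather than part of IOR.

```python
def interleaved_offsets(rank: int, num_processes: int, access_size: int,
                        accesses_per_process: int) -> list:
    """File offsets accessed by one process under an interleaved layout."""
    # In round i, process `rank` touches the rank-th slot of that round.
    return [(i * num_processes + rank) * access_size
            for i in range(accesses_per_process)]


# Two processes, 8 KB accesses, three accesses each.
p0 = interleaved_offsets(0, 2, 8192, 3)
p1 = interleaved_offsets(1, 2, 8192, 3)
```

Here process 0 touches offsets 0, 16384 and 32768, while process 1 touches 8192, 24576 and 40960, so the two processes together cover the file without gaps or overlaps. Smaller access sizes produce more, finer-grained requests for the same amount of data.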

40 High performance: Parallel access to a file for writing. [Figure: bandwidth (MB/s) versus access size when writing, for XPN, PVFS and GPFS.]

41 High performance: Parallel access to a file for reading. [Figure: bandwidth (MB/s) versus access size when reading, for XPN, PVFS and GPFS.]

42 High performance: Parallel access to a file for writing. [Figure: parallel access writing (8 KB): bandwidth (MB/s) versus number of processes, for XPN, PVFS and GPFS.]

43 High performance: Parallel access to a file for writing. [Figure: parallel access writing (256 KB): bandwidth (MB/s) versus number of processes, for XPN, PVFS and GPFS.]

44 High performance: FLASH-IO. FLASH is a parallel application that simulates thermonuclear flashes. FLASH-IO simulates the I/O operations performed by FLASH. Data size is proportional to the number of running processes. 1 process: [value missing] MB. 16 processes: 1.16 GB.

45 High performance: FLASH-IO. [Figure: FLASH-IO benchmark: bandwidth (MB/s) versus number of processes, for XPN, PVFS and GPFS.]

46 Metadata: Creating empty files. [Figure: file creation (empty files): files/second per file system (XPN, PVFS, GPFS), with one series per number of processes, one of them 4 processes.]

47 Metadata: Creating small files. [Figure: file creation (small files): files/second per file system (XPN, PVFS, GPFS), with one series per number of processes, one of them 4 processes.]

48 High throughput: A parallel application processes a set of 256 images. Each process works on a subset of the images independently. No concurrent access to a file. Sizes: image file 5 MB; full dataset 2.5 GB. The process applies a fixed bitmask to each image file to generate a new image file.
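
The workload described above can be sketched in a few lines of Python. The slide does not say which bitwise operation or mask value is used, nor how images are assigned to processes, so the bitwise AND, the cyclic assignment and the names apply_mask and partition are assumptions made for illustration.

```python
def apply_mask(image: bytes, mask: int) -> bytes:
    """Produce a new image by combining every byte with a fixed mask (assumed AND)."""
    return bytes(b & mask for b in image)


def partition(items: list, num_processes: int) -> list:
    """Give each process its own disjoint subset of the items (assumed cyclic split)."""
    return [items[rank::num_processes] for rank in range(num_processes)]


# 256 image indices split over 4 processes; no image is shared between processes.
subsets = partition(list(range(256)), 4)
masked = apply_mask(b"\xff\x0f\xa5", 0xF0)
```

Because each process reads and writes only its own image files, the benchmark stresses aggregate throughput over many independent files rather than concurrent access to a single shared file.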

49 High throughput: Image processing in C. [Figure: image processing (C application): time (s) versus number of processes, for XPN, PVFS and GPFS.]

50 High throughput: Image processing in Java. [Figure: image processing (Java application): time (s) versus number of processes, for XPN, PVFS and GPFS.]

51 Dynamic partition reconfiguration: Adding new nodes. [Figure: bandwidth (MB/s) and reconstruction time (min) per reconstruction model.]

52 Grid evaluation environment: Evaluation for high throughput. Perform 500 jobs. Each job randomly selects a file (among 200) to access. File size is 200 MB.

53 Testbed environment: [Figure: testbed nodes and GridFTP servers: Intel Pentium [clock rate missing] GHz; dual-processor Intel Xeon 2.4 GHz; Intel Pentium 4 2.40 GHz; Intel Pentium 4 2.80 GHz; AMD Athlon.]

54 Evaluated scenarios: Typical Grid: the file is transferred completely to the local node; processing starts after the transfer finishes; Globus services are used for the transfer (globus-url-copy). Expand: direct remote access to the file; no previous transfer to the node is needed.

55 Model 1 / Scenario 1: 1 server. Complete transfer to the local node. The application accesses the local copy. [Figure: one GridFTP server holding the files.]

56 Model 2 / Scenario 1: 4 servers. Distributed files: each server stores 50 files. Complete transfer to the local node. The application accesses the local copy. [Figure: four GridFTP servers holding the files.]

57 Scenario 2 (Expand): Expand with 1, 2 and 4 servers. The local node remotely accesses the data it needs. No previous transfer needed. [Figure: a computing node with POSIX and MPI-IO on top of Expand (NFS, GridFTP, RNS-WS, local), using RNS and the Internet with GSI to reach GridFTP servers at Site 1, Site 2, Site 3 and Site 4, which form a distributed partition.]

58 Grid evaluation: [Figure: time (min) per model: 1 site; 4 sites (distributed files); Expand (1 server); Expand (2 servers); Expand (4 servers).]

60 Conclusions: It is feasible to build a parallel file system by using standard protocols and servers. Our solution is easily adaptable to different environments and situations (cluster and grid are examples). Performance results are comparable to those of other solutions (even commercial ones).

62 Ongoing work: Add new protocols (e.g. Web Services). Evaluation in large clusters and grid environments. Use Expand to improve performance when accessing replicated data.

63 Ongoing work: Use Expand as an intermediate file system in large clusters. [Figure: applications on compute nodes use Expand for parallel access over the network to I/O nodes running a cluster file system (PVFS, GPFS, etc.).]


More information

File System Suite of Benchmarks

File System Suite of Benchmarks File System Suite of Benchmarks John Corbin President EP Network Storage Performance Lab jcorbin@nsplab.com Page 1 of Overview File System Benchmark Types File System Suite of Benchmarks NFS Client Benchmark

More information

vpfs: Bandwidth Virtualization of Parallel Storage Systems

vpfs: Bandwidth Virtualization of Parallel Storage Systems vpfs: Bandwidth Virtualization of Parallel Storage Systems Yiqi Xu, Dulcardo Arteaga, Ming Zhao Florida International University {yxu6,darte3,ming}@cs.fiu.edu Yonggang Liu, Renato Figueiredo University

More information

SERVER CLUSTERING TECHNOLOGY & CONCEPT

SERVER CLUSTERING TECHNOLOGY & CONCEPT SERVER CLUSTERING TECHNOLOGY & CONCEPT M00383937, Computer Network, Middlesex University, E mail: vaibhav.mathur2007@gmail.com Abstract Server Cluster is one of the clustering technologies; it is use for

More information

MOSIX: High performance Linux farm

MOSIX: High performance Linux farm MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm

More information

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN 1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Influence of Virtualization on Process of Grid Application Deployment

Influence of Virtualization on Process of Grid Application Deployment Influence of Virtualization on Process of Grid Application Deployment CCM case study Distributed Systems Research Group Department of Computer Science AGH-UST Cracow, Poland Krzysztof Zieliński, Background

More information

Study of Load Balancing of Resource Namespace Service

Study of Load Balancing of Resource Namespace Service Study of Load Balancing of Resource Namespace Service Masahiro Nakamura, Osamu Tatebe University of Tsukuba Background Resource Namespace Service (RNS) is published as GDF.101 by OGF RNS is intended to

More information

Fixed Price Website Load Testing

Fixed Price Website Load Testing Fixed Price Website Load Testing Can your website handle the load? Don t be the last one to know. For as low as $4,500, and in many cases within one week, we can remotely load test your website and report

More information

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007 Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms Cray User Group Meeting June 2007 Cray s Storage Strategy Background Broad range of HPC requirements

More information

Module 2: "Parallel Computer Architecture: Today and Tomorrow" Lecture 4: "Shared Memory Multiprocessors" The Lecture Contains: Technology trends

Module 2: Parallel Computer Architecture: Today and Tomorrow Lecture 4: Shared Memory Multiprocessors The Lecture Contains: Technology trends The Lecture Contains: Technology trends Architectural trends Exploiting TLP: NOW Supercomputers Exploiting TLP: Shared memory Shared memory MPs Bus-based MPs Scaling: DSMs On-chip TLP Economics Summary

More information

Improving Grid Processing Efficiency through Compute-Data Confluence

Improving Grid Processing Efficiency through Compute-Data Confluence Solution Brief GemFire* Symphony* Intel Xeon processor Improving Grid Processing Efficiency through Compute-Data Confluence A benchmark report featuring GemStone Systems, Intel Corporation and Platform

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

Scaling in a Hypervisor Environment

Scaling in a Hypervisor Environment Scaling in a Hypervisor Environment Richard McDougall Chief Performance Architect VMware VMware ESX Hypervisor Architecture Guest Monitor Guest TCP/IP Monitor (BT, HW, PV) File System CPU is controlled

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

WAN Transfer Acceleration

WAN Transfer Acceleration WAN Transfer Acceleration Product Description Functionality Interfaces Specifications Index 1 Functionality... 3 2 Integration... 3 3 Interfaces... 4 3.1 Physical Interfaces...5 3.1.1 Ethernet Network...5

More information

SALSA Flash-Optimized Software-Defined Storage

SALSA Flash-Optimized Software-Defined Storage Flash-Optimized Software-Defined Storage Nikolas Ioannou, Ioannis Koltsidas, Roman Pletka, Sasa Tomic,Thomas Weigold IBM Research Zurich 1 New Market Category of Big Data Flash Multiple workloads don t

More information

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams Neptune A Domain Specific Language for Deploying HPC Software on Cloud Platforms Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams ScienceCloud 2011 @ San Jose, CA June 8, 2011 Cloud Computing Three

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture

More information

Clusters: Mainstream Technology for CAE

Clusters: Mainstream Technology for CAE Clusters: Mainstream Technology for CAE Alanna Dwyer HPC Division, HP Linux and Clusters Sparked a Revolution in High Performance Computing! Supercomputing performance now affordable and accessible Linux

More information

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata Implementing Network Attached Storage Ken Fallon Bill Bullers Impactdata Abstract The Network Peripheral Adapter (NPA) is an intelligent controller and optimized file server that enables network-attached

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wüerthwein 1 1 University of California, San

More information

Comparing the Network Performance of Windows File Sharing Environments

Comparing the Network Performance of Windows File Sharing Environments Technical Report Comparing the Network Performance of Windows File Sharing Environments Dan Chilton, Srinivas Addanki, NetApp September 2010 TR-3869 EXECUTIVE SUMMARY This technical report presents the

More information

Building a Top500-class Supercomputing Cluster at LNS-BUAP

Building a Top500-class Supercomputing Cluster at LNS-BUAP Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad

More information

Efficient Data Management Support for Virtualized Service Providers

Efficient Data Management Support for Virtualized Service Providers Efficient Data Management Support for Virtualized Service Providers Íñigo Goiri, Ferran Julià and Jordi Guitart Barcelona Supercomputing Center - Technical University of Catalonia Jordi Girona 31, 834

More information

AIX NFS Client Performance Improvements for Databases on NAS

AIX NFS Client Performance Improvements for Databases on NAS AIX NFS Client Performance Improvements for Databases on NAS October 20, 2005 Sanjay Gulabani Sr. Performance Engineer Network Appliance, Inc. gulabani@netapp.com Diane Flemming Advisory Software Engineer

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010

Best Practices for Data Sharing in a Grid Distributed SAS Environment. Updated July 2010 Best Practices for Data Sharing in a Grid Distributed SAS Environment Updated July 2010 B E S T P R A C T I C E D O C U M E N T Table of Contents 1 Abstract... 2 1.1 Storage performance is critical...

More information

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer

More information

Big data management with IBM General Parallel File System

Big data management with IBM General Parallel File System Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers

More information

HP reference configuration for entry-level SAS Grid Manager solutions

HP reference configuration for entry-level SAS Grid Manager solutions HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2

More information

EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE

EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE EXPLOITING SHARED MEMORY TO IMPROVE PARALLEL I/O PERFORMANCE Andrew B. Hastings Sun Microsystems, Inc. Alok Choudhary Northwestern University September 19, 2006 This material is based on work supported

More information

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari

Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari Testing of several distributed file-system (HadoopFS, CEPH and GlusterFS) for supporting the HEP experiments analisys. Giacinto DONVITO INFN-Bari 1 Agenda Introduction on the objective of the test activities

More information

DMF & Tiering Update. Kirill Malkin Director of Storage Engineering. September 2015

DMF & Tiering Update. Kirill Malkin Director of Storage Engineering. September 2015 DMF & Tiering Update Kirill Malkin Director of Storage Engineering September 2015 1 What s New in DMF? Data Migration Facility (DMF) Data management solution for HPC/HPDA Over 20 years of data management

More information

Windows Compute Cluster Server 2003. Miron Krokhmal CTO

Windows Compute Cluster Server 2003. Miron Krokhmal CTO Windows Compute Cluster Server 2003 Miron Krokhmal CTO Agenda The Windows compute cluster architecture o Hardware and software requirements o Supported network topologies o Deployment strategies, including

More information

Gfarm: Present Status and Future Evolution

Gfarm: Present Status and Future Evolution OpenSFS APAC Lustre User Group 2013 Tokyo October 17, 2013 Gfarm: Present Status and Future Evolution Osamu Tatebe University of Tsukuba Gfarm file system Award-winning file system since 2000 Distributed

More information

2. COMPUTER SYSTEM. 2.1 Introduction

2. COMPUTER SYSTEM. 2.1 Introduction 2. COMPUTER SYSTEM 2.1 Introduction The computer system at the Japan Meteorological Agency (JMA) has been repeatedly upgraded since IBM 704 was firstly installed in 1959. The current system has been completed

More information