A RAM-disk provisioning service for high performance data analysis

1 A RAM-disk provisioning service for high performance data analysis Allan Espinosa Mentors: M. Woitaszek and J. Dennis University of Chicago, National Center for Atmospheric Research July 29, / 64

2 Outline 1 Motivation: data analysis 2 Approach and challenges 3 Implementation 4 Target applications 5 Conclusions 2 / 64

3 Motivation: data-intensive post-processing Computing center Simulation results Analysis cluster Transfer nodes Spinning disk-based parallel file system 3 / 64

4 Motivation: data-intensive post-processing Computing center Simulation results Analysis cluster Transfer nodes Analysis 1 Spinning disk-based parallel file system Tape Archive 4 / 64

6 Motivation: data-intensive post-processing Computing center Simulation results Analysis cluster Transfer nodes Analysis 1 Analysis 2 Spinning disk-based parallel file system Tape Archive 6 / 64

7 Motivation: data-intensive post-processing Computing center Simulation results Analysis cluster Transfer nodes Analysis 1 Analysis 2 Analysis n... Spinning disk-based parallel file system Tape Archive 7 / 64

8 Motivation: data-intensive post-processing Computing center Simulation results Analysis cluster Transfer nodes Analysis 1 Analysis 2 Analysis n... Spinning disk-based parallel file system Tape Archive Multiple trips to disk are slow 8 / 64

9 Approach: Run analysis on RAM Fast I/O access 9 / 64

10 Approach: Run analysis on RAM Fast I/O access tmpfs or formatted /dev/ram Analysis node CPU CPU RAM-based disk Problem: Restricted parallelism 10 / 64

11 Approach: Run analysis on RAM Fast I/O access tmpfs or formatted /dev/ram NFS-exported RAM CPU CPU RAM-based disk CPU CPU Problem: Restricted data size 11 / 64

12 Approach: Run analysis on RAM Fast I/O access tmpfs or formatted /dev/ram NFS-exported RAM Split data over multiple nodes CPU CPU RAM-based disk CPU CPU RAM-based disk Problem: Requires thorough I/O management 12 / 64

13 Approach: Run analysis on RAM CPU CPU CPU CPU Fast I/O access tmpfs or formatted /dev/ram NFS-exported RAM Split data over multiple nodes Lustre parallel RAM file system Lustre parallel RAM file system CPU CPU CPU CPU 13 / 64
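
The two single-node options named on this slide (tmpfs, or a formatted /dev/ram device) could be set up roughly as in the sketch below. The size, mount point, and device name are illustrative assumptions and do not come from the talk. The commands need root, so setting RUN=echo prints them instead of running them.

```shell
#!/bin/sh
# Sketch only: size, mount point and device name are illustrative assumptions.
# Set RUN=echo to print the commands instead of executing them (they need root).

# Option 1: tmpfs, a memory-backed file system that needs no formatting.
make_tmpfs_disk() {
    size=$1; mountpoint=$2
    ${RUN:-} mkdir -p "$mountpoint" &&
    ${RUN:-} mount -t tmpfs -o "size=$size" tmpfs "$mountpoint"
}

# Option 2: format a RAM block device (here with XFS) and mount it.
make_xfs_ram_disk() {
    device=$1; mountpoint=$2
    ${RUN:-} mkdir -p "$mountpoint" &&
    ${RUN:-} mkfs.xfs -f "$device" &&
    ${RUN:-} mount "$device" "$mountpoint"
}
```

For a dry run, call `RUN=echo make_tmpfs_disk 25G /mnt/ramdisk` or `RUN=echo make_xfs_ram_disk /dev/ram0 /mnt/ramdisk`. Neither option addresses the parallelism and data-size limits listed on the slide, which is why the talk moves on to a Lustre parallel RAM file system.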

14 Solution: Automatically-provisioned parallel file system Polynya analysis cluster User Client Submit jobs Scheduler 14 / 64

15 Solution: Automatically-provisioned parallel file system Polynya analysis cluster User Client Submit jobs Scheduler Control Node Parallel RAM file system 15 / 64

16 Solution: Automatically-provisioned parallel file system Kraken Polynya analysis cluster File system Transfer Node User Client Submit jobs Scheduler WAN Control Node Transfer Node Parallel RAM file system 16 / 64

17 Solution: Automatically-provisioned parallel file system Kraken Polynya analysis cluster File system Transfer Node User Client Submit jobs Scheduler WAN Control Node Transfer Node Analysis Nodes Parallel RAM file system 17 / 64

18 Solution: Automatically-provisioned parallel file system Kraken Polynya analysis cluster File system Transfer Node User Client Submit jobs Scheduler WAN Control Node Transfer Node Analysis Nodes Archive Node Parallel RAM file system Tape Archive 18 / 64

19 Remote triggering the workflow Kraken Simulation finishes Trigger workflow Polynya Workflow 19 / 64

20 Remote triggering the workflow Kraken Simulation finishes Trigger workflow Polynya Workflow Request space 20 / 64

21 Remote triggering the workflow Kraken Simulation finishes Trigger workflow Polynya Workflow Request space Transfer datasets 21 / 64

22 Remote triggering the workflow Kraken Simulation finishes Trigger workflow Polynya Workflow Archive datasets Request space Transfer datasets Run analysis 22 / 64

23 Remote triggering the workflow Kraken Simulation finishes Trigger workflow Polynya Workflow Archive datasets Request space Transfer datasets Trigger cleanup Run analysis 23 / 64
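
The workflow steps on this slide can be sketched as a small driver script. The step names mirror the slide, but the function bodies are placeholders. The ordering is also an assumption (archive after analysis, cleanup last), since the slide does not say whether archiving overlaps with the analysis runs.

```shell
#!/bin/sh
# Sketch of the workflow order; step bodies are placeholders, not the real service.
request_space()     { echo "request space"; }
transfer_datasets() { echo "transfer datasets"; }
run_analysis()      { echo "run analysis"; }
archive_datasets()  { echo "archive datasets"; }
trigger_cleanup()   { echo "trigger cleanup"; }

run_workflow() {
    for step in request_space transfer_datasets run_analysis archive_datasets; do
        if ! "$step"; then
            echo "step failed: $step" >&2
            trigger_cleanup      # release the RAM-disk space even on failure
            return 1
        fi
    done
    trigger_cleanup
}
```

Running cleanup on the failure path as well is a design choice of this sketch: RAM is the scarce resource here, so a failed transfer or analysis should not leave the allocation held.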

24 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource 24 / 64

25 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space #PBS -W #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail sleep 3h 25 / 64

26 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation #PBS -W #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail sleep 3h 26 / 64

27 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation 1 Route to control node #PBS -W x="gres:ramdisk@25" #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail user@cluster... sleep 3h 27 / 64

28 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation 1 Route to control node 2 Prepare space #PBS -W x="gres:ramdisk@25" #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail user@cluster... sleep 3h 28 / 64

29 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation 1 Route to control node 2 Prepare space 3 Sleep until allocation expiration #PBS -W x="gres:ramdisk@25" #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail user@cluster... sleep 3h 29 / 64

30 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation 1 Route to control node 2 Prepare space 3 Sleep until allocation expiration 4 notice before expiration #PBS -W x="gres:ramdisk@25" #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail user@cluster... sleep 3h 30 / 64

31 Requesting RAM-based disk space Implementation: PBS Torque+Maui scheduler generic resource Parameters: amount of space duration of allocation 1 Route to control node 2 Prepare space 3 Sleep until allocation expiration 4 notice before expiration 5 Clean up space #PBS -W x="gres:ramdisk@25" #PBS -l walltime="48:00:00" #PBS -q ramdisk_service #PBS -l prologue=allocate.sh #PBS -l epilogue=cleanup.sh sleep 45h mail user@cluster... sleep 3h 31 / 64
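
The body of the placeholder job (sleep, mail, sleep) amounts to splitting the walltime into a hold phase and a notice phase. The sketch below makes that arithmetic explicit for the 48-hour example. The function names and the recipient argument are hypothetical, and the `h` suffix for `sleep` assumes GNU coreutils, as the slide's own script does.

```shell
#!/bin/sh
# Sketch of the placeholder job body. Set RUN=echo to print instead of execute.

# Split the walltime (hours) into "hold" and "notice" phases, e.g. 48 h = 45 h + 3 h.
split_walltime() {
    walltime_h=$1; notice_h=$2
    if [ "$notice_h" -ge "$walltime_h" ]; then
        echo "notice period must be shorter than the walltime" >&2
        return 1
    fi
    echo "$((walltime_h - notice_h)) $notice_h"
}

# Hold the allocation, warn the user, then hold until the walltime expires.
# Arguments: walltime in hours, notice period in hours, recipient address.
hold_allocation() {
    phases=$(split_walltime "$1" "$2") || return 1
    hold_h=${phases% *}
    notice_h=${phases#* }
    ${RUN:-} sleep "${hold_h}h"
    ${RUN:-} mail -s "RAM-disk allocation expires in ${notice_h}h" "$3" </dev/null
    ${RUN:-} sleep "${notice_h}h"
}
```

`RUN=echo hold_allocation 48 3 someone@example.org` prints the three commands without waiting. In the real job, the prologue and epilogue scripts named in the `#PBS` directives do the allocation and cleanup, and the job body only keeps the reservation alive.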

32 Transferring datasets Implementation: Route request to transfer nodes Striped GridFTP data nodes 32 / 64

33 Transferring datasets Implementation: Route request to transfer nodes Striped GridFTP data nodes Co-located as RAM-based disk space provider 33 / 64

34 Transferring datasets Implementation: Route request to transfer nodes Striped GridFTP data nodes Co-located as RAM-based disk space provider Other administrative components: GridFTP control channel server 34 / 64

35 Transferring datasets Implementation: Route request to transfer nodes Striped GridFTP data nodes Co-located as RAM-based disk space provider Other administrative components: GridFTP control channel server Key-authenticated SSH Remote trigger mechanism 35 / 64

36 Transferring datasets Implementation: Route request to transfer nodes Striped GridFTP data nodes Co-located as RAM-based disk space provider Other administrative components: GridFTP control channel server Key-authenticated SSH X.509-authenticated GRAM5 Remote trigger mechanism 36 / 64
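
A remote trigger over key-authenticated SSH could look roughly like the sketch below. The key path, login, and remote command are hypothetical, since the talk does not show the actual trigger command. Only standard OpenSSH options are used.

```shell
#!/bin/sh
# Sketch only: key path, login and remote command are illustrative assumptions.
# Set RUN=echo to print the command instead of executing it.

# Run a command on the remote cluster using a dedicated SSH key.
trigger_workflow() {
    key=$1; login=$2; shift 2
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits an unattended trigger at the end of a simulation job.
    ${RUN:-} ssh -i "$key" -o BatchMode=yes "$login" "$@"
}
```

For example, `RUN=echo trigger_workflow /path/to/trigger_key user@analysis.example.org qsub workflow.pbs` prints the command that would submit a workflow job on the analysis cluster.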

37 Example application: AMWG diagnostics Compares CESM simulation data, observational data, reanalysis data 37 / 64

38 Example application: AMWG diagnostics Compares CESM simulation data, observational data, reanalysis data Parallel implementation in Swift Parallel scripting engine 38 / 64

39 Example application: AMWG diagnostics Compares CESM simulation data, observational data, reanalysis data Parallel implementation in Swift Parameters: dataset name number of time segments (years) Parallel scripting engine 39 / 64

40 Example application: AMWG diagnostics Compares CESM simulation data, observational data, reanalysis data Parallel implementation in Swift Parameters: dataset name number of time segments (years) Dataset volume: 2.8 GB per year (1 data) Parallel scripting engine 40 / 64

41 Data movement benchmarks (units in MB/s; from D. Duplyakin's experiments)

File system    IOR-8 write
/dev/null      3,190
Lustre disk    111
tmpfs RAM      2,983
XFS RAM        2,296
Lustre RAM     2,881

GridFTP to Polynya columns (from Frost, from Kraken): no values on this slide. 41 / 64

42 Data movement benchmarks File system IOR-8 GridFTP to Polynya Write from Frost from Kraken /dev/null 3, Lustre disk tmpfs RAM 2, XFS RAM 2, Lustre RAM 2, units in MB/s from D. Duplyakin's experiments 32 MB TCP buffer, 16 MB block size, 4 streams 42 / 64

43 Data movement benchmarks File system IOR-8 GridFTP to Polynya Write from Frost from Kraken /dev/null 3, Lustre disk tmpfs RAM 2, XFS RAM 2, Lustre RAM 2, units in MB/s from D. Duplyakin's experiments 32 MB TCP buffer, 16 MB block size, 16 streams 43 / 64

44 Data movement benchmarks File system IOR-8 GridFTP to Polynya Write from Frost from Kraken /dev/null 3, Lustre disk tmpfs RAM 2, XFS RAM 2, Lustre RAM 2, GridFTP from Kraken to Frost: 216 MB/s units in MB/s from D. Duplyakin's experiments 32 MB TCP buffer, 16 MB block size, 16 streams 44 / 64
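
The tuning values quoted on this slide (32 MB TCP buffer, 16 MB block size, 16 streams) map onto `globus-url-copy` options as in the sketch below. The URLs are placeholders, and adding `-stripe` is an assumption based on the striped GridFTP data nodes mentioned earlier. The exact command line used for the benchmarks is not shown in the talk.

```shell
#!/bin/sh
# Sketch only: URLs are placeholders; -stripe is an assumption.
# Set RUN=echo to print the command instead of executing it.

# Copy src to dst with the buffer, block and stream settings from the slide.
gridftp_copy() {
    src=$1; dst=$2
    tcp_buffer=$((32 * 1024 * 1024))   # 32 MB TCP buffer, in bytes
    block_size=$((16 * 1024 * 1024))   # 16 MB block size, in bytes
    streams=16                         # parallel streams
    ${RUN:-} globus-url-copy -vb -stripe \
        -tcp-bs "$tcp_buffer" -bs "$block_size" -p "$streams" \
        "$src" "$dst"
}
```

A dry run such as `RUN=echo gridftp_copy gsiftp://source.example.org/data/ gsiftp://transfer.example.org/ramdisk/` shows the byte values that are passed for the two sizes.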

45 Application performance Ran on 64-CPU node, 2-year time segment (8.2 GB total)

File system    Runtime (s)
Lustre disk    213
tmpfs RAM      29
XFS RAM        29
Lustre RAM

45 / 64

46 Application performance From Frost: [Figure: time (s) spent in data transfer and AMWG analysis for Lustre disk, tmpfs RAM, XFS RAM, and Lustre RAM] 46 / 64

47 End-to-end workflow Request space Time (s) 47 / 64

48 End-to-end workflow Request space Transfer Time (s) 48 / 64

50 End-to-end workflow Request space Transfer Analysis 1 Analysis 2... Analysis n Archive Time (s) 50 / 64

52 End-to-end workflow Request space Transfer Analysis 1 Analysis 2... Analysis n Cleanup Archive Time (s) 52 / 64

54 Other use case: Interactive jobs Automated workflow split component-wise 54 / 64

55 Other use case: Interactive jobs Automated workflow split component-wise Each step is run by the user manually 55 / 64

56 Other use case: Interactive jobs Automated workflow split component-wise Each step is run by the user manually Steps: 1 Request space 2 Transfer data to allocated space (globus-url-copy or Globus Online) 3 Run analysis on allocated space 4 Notice before expiration 5 Clean up by deleting request job 56 / 64

57 Conclusions End-to-end analysis platform without touching spinning disk 57 / 64

58 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface 58 / 64

59 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface Workflow automation to drive analysis 59 / 64

60 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface Workflow automation to drive analysis Network bandwidth critical to performance 60 / 64

61 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface Workflow automation to drive analysis Network bandwidth critical to performance Future work: Tune network for high performance data movement 61 / 64

62 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface Workflow automation to drive analysis Network bandwidth critical to performance Future work: Tune network for high performance data movement Application-perspective file system scalability 62 / 64

63 Conclusions End-to-end analysis platform without touching spinning disk Interface through familiar PBS interface Workflow automation to drive analysis Network bandwidth critical to performance Future work: Tune network for high performance data movement Application-perspective file system scalability Explore framework on other resources: disk, bandwidth, etc. 63 / 64

64 Questions? A RAM-disk provisioning service for high performance data analysis Allan Espinosa (aespinosa@cs.uchicago.edu) Mentors: M. Woitaszek and J. Dennis University of Chicago, National Center for Atmospheric Research July 29, / 64

Data Movement and Storage. Drew Dolgert and previous contributors

Data Movement and Storage. Drew Dolgert and previous contributors Data Movement and Storage Drew Dolgert and previous contributors Data Intensive Computing Location Viewing Manipulation Storage Movement Sharing Interpretation $HOME $WORK $SCRATCH 72 is a Lot, Right?

More information

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University

More information

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez

IT of SPIM Data Storage and Compression. EMBO Course - August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez IT of SPIM Data Storage and Compression EMBO Course - August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data

More information

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace

Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August

More information

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File

More information

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING José Daniel García Sánchez ARCOS Group University Carlos III of Madrid Contents 2 The ARCOS Group. Expand motivation. Expand

More information

IMPLEMENTING GREEN IT

IMPLEMENTING GREEN IT Saint Petersburg State University of Information Technologies, Mechanics and Optics Department of Telecommunication Systems IMPLEMENTING GREEN IT APPROACH FOR TRANSFERRING BIG DATA OVER PARALLEL DATA LINK

More information

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney

Wrangler: A New Generation of Data-intensive Supercomputing. Christopher Jordan, Siva Kulasekaran, Niall Gaffney Wrangler: A New Generation of Data-intensive Supercomputing Christopher Jordan, Siva Kulasekaran, Niall Gaffney Project Partners Academic partners: TACC Primary system design, deployment, and operations

More information

Enterprise Architectures for Large Tiled Basemap Projects. Tommy Fauvell

Enterprise Architectures for Large Tiled Basemap Projects. Tommy Fauvell Enterprise Architectures for Large Tiled Basemap Projects Tommy Fauvell Tommy Fauvell Senior Technical Analyst Esri Professional Services Washington D.C Regional Office Project Technical Lead: - Responsible

More information

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?

More information

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates

More information

Investigation of storage options for scientific computing on Grid and Cloud facilities

Investigation of storage options for scientific computing on Grid and Cloud facilities Investigation of storage options for scientific computing on Grid and Cloud facilities Overview Hadoop Test Bed Hadoop Evaluation Standard benchmarks Application-based benchmark Blue Arc Evaluation Standard

More information

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010 This document is provided as-is. Information and views expressed in this document, including URL and other Internet

More information

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O. Reference Architecture Designing High-Performance Storage Tiers Designing High-Performance Storage Tiers Intel Enterprise Edition for Lustre* software and Intel Non-Volatile Memory Express (NVMe) Storage

More information

Data Transfer and Filesystems

Data Transfer and Filesystems Data Transfer and Filesystems 07/29/2010 Mahidhar Tatineni, SDSC Acknowledgements: Lonnie Crosby, NICS Chris Jordan, TACC Steve Simms, IU Patricia Kovatch, NICS Phil Andrews, NICS Background Rapid growth

More information

Lustre * Filesystem for Cloud and Hadoop *

Lustre * Filesystem for Cloud and Hadoop * OpenFabrics Software User Group Workshop Lustre * Filesystem for Cloud and Hadoop * Robert Read, Intel Lustre * for Cloud and Hadoop * Brief Lustre History and Overview Using Lustre with Hadoop Intel Cloud

More information

Performance And Scalability In Oracle9i And SQL Server 2000

Performance And Scalability In Oracle9i And SQL Server 2000 Performance And Scalability In Oracle9i And SQL Server 2000 Presented By : Phathisile Sibanda Supervisor : John Ebden 1 Presentation Overview Project Objectives Motivation -Why performance & Scalability

More information

Microsoft SQL Server 2005 on Windows Server 2003

Microsoft SQL Server 2005 on Windows Server 2003 EMC Backup and Recovery for SAP Microsoft SQL Server 2005 on Windows Server 2003 Enabled by EMC CLARiiON CX3, EMC Disk Library, EMC Replication Manager, EMC NetWorker, and Symantec Veritas NetBackup Reference

More information

Data Center Specific Thermal and Energy Saving Techniques

Data Center Specific Thermal and Energy Saving Techniques Data Center Specific Thermal and Energy Saving Techniques Tausif Muzaffar and Xiao Qin Department of Computer Science and Software Engineering Auburn University 1 Big Data 2 Data Centers In 2013, there

More information

Long term retention and archiving the challenges and the solution

Long term retention and archiving the challenges and the solution Long term retention and archiving the challenges and the solution NAME: Yoel Ben-Ari TITLE: VP Business Development, GH Israel 1 Archive Before Backup EMC recommended practice 2 1 Backup/recovery process

More information

Current Status of FEFS for the K computer

Current Status of FEFS for the K computer Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system

More information

Benchmarking FreeBSD. Ivan Voras <ivoras@freebsd.org>

Benchmarking FreeBSD. Ivan Voras <ivoras@freebsd.org> Benchmarking FreeBSD Ivan Voras What and why? Everyone likes a nice benchmark graph :) And it's nice to keep track of these things The previous major run comparing FreeBSD to Linux

More information

EMC NETWORKER AND DATADOMAIN

EMC NETWORKER AND DATADOMAIN EMC NETWORKER AND DATADOMAIN Capabilities, options and news Madis Pärn Senior Technology Consultant EMC madis.parn@emc.com 1 IT Pressures 2009 0.8 Zettabytes 2020 35.2 Zettabytes DATA DELUGE BUDGET DILEMMA

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery

Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Center for Information Services and High Performance Computing (ZIH) Managing Complexity in Distributed Data Life Cycles Enhancing Scientific Discovery Richard Grunzke*, Jens Krüger, Sandra Gesing, Sonja

More information

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011

Advanced Techniques with Newton. Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Advanced Techniques with Newton Gerald Ragghianti Advanced Newton workshop Sept. 22, 2011 Workshop Goals Gain independence Executing your work Finding Information Fixing Problems Optimizing Effectiveness

More information

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006

OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006 EXECUTIVE SUMMARY Microsoft Exchange Server is a disk-intensive application that requires high speed storage to deliver

More information

Archival Storage Systems and Data Moveability

Archival Storage Systems and Data Moveability Configuring and Tuning Archival Storage Systems Reagan Moore, Joseph Lopez, Charles Lofton, Wayne Schroeder, George Kremenek San Diego Supercomputer Center Michael Gleicher Gleicher Enterprises, LLC Abstract

More information

Ring Protection: Wrapping vs. Steering

Ring Protection: Wrapping vs. Steering Ring Protection: Wrapping vs. Steering Necdet Uzun and Pinar Yilmaz March 13, 2001 Contents Objectives What are wrapping and steering Single/dual fiber cut Comparison of wrapping and steering Simulation

More information

Dell One Identity Manager Scalability and Performance

Dell One Identity Manager Scalability and Performance Dell One Identity Manager Scalability and Performance Scale up and out to ensure simple, effective governance for users. Abstract For years, organizations have had to be able to support user communities

More information

Understanding Hadoop Performance on Lustre

Understanding Hadoop Performance on Lustre Understanding Hadoop Performance on Lustre Stephen Skory, PhD Seagate Technology Collaborators Kelsie Betsch, Daniel Kaslovsky, Daniel Lingenfelter, Dimitar Vlassarev, and Zhenzhen Yan LUG Conference 15

More information

Configuring Apache Derby for Performance and Durability Olav Sandstå

Configuring Apache Derby for Performance and Durability Olav Sandstå Configuring Apache Derby for Performance and Durability Olav Sandstå Database Technology Group Sun Microsystems Trondheim, Norway Overview Background > Transactions, Failure Classes, Derby Architecture

More information

Division of Student Affairs Email Quota Practices / Guidelines

Division of Student Affairs Email Quota Practices / Guidelines Division of Student Affairs Email Quota Practices / Guidelines Table of Contents Quota Rules:... 1 Mailbox Organization:... 2 Mailbox Folders... 2 Mailbox Rules... 2 Mailbox Size Monitoring:... 3 Using

More information

System Administration of Windchill 10.2

System Administration of Windchill 10.2 System Administration of Windchill 10.2 Overview Course Code Course Length TRN-4340-T 3 Days In this course, you will gain an understanding of how to perform routine Windchill system administration tasks,

More information

An Architecture for Dynamic Allocation of Compute Cluster Bandwidth

An Architecture for Dynamic Allocation of Compute Cluster Bandwidth 1 An Architecture for Dynamic Allocation of Compute Cluster Bandwidth John Bresnahan 1,2,3, Ian Foster 1,2,3 1 Math and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 2 Computation

More information

owncloud Enterprise Edition on IBM Infrastructure

owncloud Enterprise Edition on IBM Infrastructure owncloud Enterprise Edition on IBM Infrastructure A Performance and Sizing Study for Large User Number Scenarios Dr. Oliver Oberst IBM Frank Karlitschek owncloud Page 1 of 10 Introduction One aspect of

More information

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

A Novel Cloud Based Elastic Framework for Big Data Preprocessing School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

More information

NetApp High-Performance Computing Solution for Lustre: Solution Guide

NetApp High-Performance Computing Solution for Lustre: Solution Guide Technical Report NetApp High-Performance Computing Solution for Lustre: Solution Guide Robert Lai, NetApp August 2012 TR-3997 TABLE OF CONTENTS 1 Introduction... 5 1.1 NetApp HPC Solution for Lustre Introduction...5

More information

Designing a Backup Architecture That Actually Works

Designing a Backup Architecture That Actually Works Designing a up rchitecture That ctually Works W. Curtis Preston President/CEO The Storage Group curtis@thestoragegroup.com General IBM IBM d i gi t a l General HEWLETT PCKRD What will we cover? What are

More information

Science DMZs Understanding their role in high-performance data transfers

Science DMZs Understanding their role in high-performance data transfers Science DMZs Understanding their role in high-performance data transfers Chris Tracy, Network Engineer Eli Dart, Network Engineer ESnet Engineering Group Overview Bulk Data Movement a common task Pieces

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

More information

Technical Writing - A Practical Case Study on ehl 2004r3 Scalability testing

Technical Writing - A Practical Case Study on ehl 2004r3 Scalability testing ehl 2004r3 Scalability Whitepaper Published: 10/11/2005 Version: 1.1 Table of Contents Executive Summary... 3 Introduction... 4 Test setup and Methodology... 5 Automated tests... 5 Database... 5 Methodology...

More information

SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform

SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform INTRODUCTION Grid computing offers optimization of applications that analyze enormous amounts of data as well as load

More information

GARUDA - NKN Partner's Meet 2015 Big data networks and TCP

GARUDA - NKN Partner's Meet 2015 Big data networks and TCP GARUDA - NKN Partner's Meet 2015 Big data networks and TCP Brij Kishor Jashal Email brij.jashal@tifr.res.in Garuda-NKN meet 10 Sep 2015 1 Outline: Scale of LHC computing ( as an example of Big data network

More information

Prepared by: ServiceSPAN. in cooperation with. Sun Microsystems. Application Load Test March 18, 2007 Version 3

Prepared by: ServiceSPAN. in cooperation with. Sun Microsystems. Application Load Test March 18, 2007 Version 3 Prepared by: ServiceSPAN in cooperation with Sun Microsystems Application Load Test March 18, 2007 Version 3 Introduction ServiceSPAN, a provider of work center automation software to enterprise businesses,

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
