A RAM-disk provisioning service for high performance data analysis
1 A RAM-disk provisioning service for high performance data analysis
Allan Espinosa
Mentors: M. Woitaszek and J. Dennis
University of Chicago, National Center for Atmospheric Research
July 29
2 Outline
1. Motivation: data analysis
2. Approach and challenges
3. Implementation
4. Target applications
5. Conclusions
3-8 Motivation: data-intensive post-processing
- Diagram: computing center (simulation results), transfer nodes, analysis cluster, spinning disk-based parallel file system, tape archive
- Analysis 1, Analysis 2, ..., Analysis n run on the analysis cluster against the disk-based file system
- Multiple trips to disk are slow
9-13 Approach: Run analysis on RAM
- Goal: fast I/O access
- tmpfs or formatted /dev/ram on a single analysis node. Problem: restricted parallelism
- NFS-exported RAM. Problem: restricted data size
- Split data over multiple nodes. Problem: requires thorough I/O management
- Lustre parallel RAM file system shared by the CPUs of multiple nodes
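For the single-node tmpfs option above, a RAM disk comes down to one mount command. The sketch below only prints that command rather than running it, since mounting needs root. The helper name `tmpfs_mount_cmd`, the 25 GB size, and the `/mnt/ramdisk` mount point are illustrative assumptions, not values from the talk.

```shell
#!/bin/sh
# Sketch: build (but do not run) the command that creates a tmpfs RAM disk.
# The size and mount point used below are illustrative assumptions.

# tmpfs_mount_cmd SIZE_GB MOUNT_POINT
# Prints the mount command for a tmpfs file system capped at SIZE_GB gigabytes.
tmpfs_mount_cmd() (
    size_gb=$1
    mount_point=$2
    echo "mount -t tmpfs -o size=${size_gb}g tmpfs ${mount_point}"
)

# Print the command for a 25 GB RAM disk; run the printed line as root to mount it.
tmpfs_mount_cmd 25 /mnt/ramdisk
```

A tmpfs mount like this is visible only on the node that created it, which is the parallelism limit the slide points out.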
14-18 Solution: Automatically-provisioned parallel file system
- User client: submits jobs to the scheduler
- Polynya analysis cluster: scheduler, control node, transfer node, analysis nodes, archive node, parallel RAM file system
- Kraken: file system, transfer node
- WAN link between Kraken and Polynya
- Tape archive behind the archive node
19-23 Remote triggering the workflow
- Kraken: simulation finishes, then triggers the workflow on Polynya
- Polynya workflow steps: request space, transfer datasets, run analysis and archive datasets, trigger cleanup
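The workflow steps above can be sketched as a small driver script. Everything here is a stand-in: each step is a stub that only prints its name, the function names are hypothetical, and placing the archive step after the last analysis pass is an assumption (the slide only shows that cleanup is triggered last). The EXIT trap is one possible way to make sure the RAM space is released even when a step fails.

```shell
#!/bin/sh
# Sketch of the workflow order; every step is a stub that only prints its name.

request_space()     { echo "request space"; }
transfer_datasets() { echo "transfer datasets"; }
run_analysis()      { echo "run analysis $1"; }
archive_datasets()  { echo "archive datasets"; }
trigger_cleanup()   { echo "trigger cleanup"; }

# run_workflow N: run the steps in order with N analysis passes.
# The body runs in a subshell: set -e stops at the first failing step,
# and the EXIT trap still runs the cleanup step when the subshell exits.
run_workflow() (
    trap trigger_cleanup EXIT
    set -e
    request_space
    transfer_datasets
    i=1
    while [ "$i" -le "$1" ]; do
        run_analysis "$i"
        i=$((i + 1))
    done
    archive_datasets
)

run_workflow 2
```

In a real deployment each stub would be replaced by the corresponding job submission or transfer command.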
24-31 Requesting RAM-based disk space
Implementation: PBS Torque+Maui scheduler generic resource
Parameters:
- amount of space
- duration of allocation
Steps:
1. Route to control node
2. Prepare space
3. Sleep until allocation expiration
4. Send notice before expiration
5. Clean up space
Request job script:
    #PBS -W x="gres:ramdisk@25"
    #PBS -l walltime="48:00:00"
    #PBS -q ramdisk_service
    #PBS -l prologue=allocate.sh
    #PBS -l epilogue=cleanup.sh
    sleep 45h
    mail user@cluster...
    sleep 3h
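The request script splits its 48-hour walltime into a 45-hour sleep, a mail notice, and a final 3-hour sleep. A minimal sketch of that arithmetic follows, assuming whole hours; `notice_schedule` is a hypothetical helper, not part of the service shown in the talk.

```shell
#!/bin/sh
# notice_schedule WALLTIME_H NOTICE_H
# Prints "<hours to sleep before the notice> <hours to sleep after it>".
# Rejects a notice lead time that is not shorter than the walltime.
notice_schedule() (
    walltime_h=$1
    notice_h=$2
    if [ "$notice_h" -ge "$walltime_h" ]; then
        echo "notice lead time must be shorter than the walltime" >&2
        exit 1
    fi
    echo "$((walltime_h - notice_h)) ${notice_h}"
)

# The values from the slide: 48-hour allocation, notice 3 hours before the end.
notice_schedule 48 3
```

The two numbers would feed the two `sleep` calls in the job script. The `h` suffix used there is accepted by GNU `sleep`; other implementations may need the duration in seconds.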
32-36 Transferring datasets
Implementation: Route request to transfer nodes
- Striped GridFTP data nodes
- Co-located as RAM-based disk space provider
Other administrative components:
- GridFTP control channel server
- Remote trigger mechanism: key-authenticated SSH, X.509-authenticated GRAM5
37-40 Example application: AMWG diagnostics
- Compares CESM simulation data, observational data, reanalysis data
- Parallel implementation in Swift, a parallel scripting engine
- Parameters: dataset name, number of time segments (years)
- Dataset volume: 2.8 GB per year (1 data)
41-44 Data movement benchmarks
Units in MB/s; from D. Duplyakin's experiments.

File system    IOR-8 write
/dev/null      3,190
Lustre disk      111
tmpfs RAM      2,983
XFS RAM        2,296
Lustre RAM     2,881

GridFTP to Polynya (columns: from Frost, from Kraken): values not preserved in this transcript.
GridFTP settings: 32 MB TCP buffer, 16 MB block size, 4 streams (slide 42) and 16 streams (slides 43-44).
GridFTP from Kraken to Frost: 216 MB/s
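The GridFTP settings quoted with the benchmarks (32 MB TCP buffer, 16 MB block size, 4 or 16 streams) map onto `globus-url-copy` options. The sketch below only prints a command line and does not start a transfer. The helper name `gridftp_cmd` and the example URLs are hypothetical, reading "MB" as 2^20 bytes is an assumption, and the exact command used in the experiments is not given in the slides.

```shell
#!/bin/sh
# gridftp_cmd STREAMS SRC_URL DST_URL
# Prints a globus-url-copy command line with a 32 MB TCP buffer,
# a 16 MB block size, striping enabled, and STREAMS parallel streams.
gridftp_cmd() (
    streams=$1
    src=$2
    dst=$3
    tcp_buffer=$((32 * 1024 * 1024))   # 32 MB in bytes (assuming 1 MB = 2^20 bytes)
    block_size=$((16 * 1024 * 1024))   # 16 MB in bytes
    echo "globus-url-copy -vb -stripe -p ${streams} -tcp-bs ${tcp_buffer} -bs ${block_size} ${src} ${dst}"
)

# Hypothetical endpoints; replace with real GridFTP servers and paths.
gridftp_cmd 16 gsiftp://source.example.org/data/run1.nc gsiftp://ramdisk.example.org/ramdisk/run1.nc
```

Here `-p` sets the number of parallel streams, `-tcp-bs` the TCP buffer size in bytes, `-bs` the block size in bytes, `-stripe` enables striped transfers, and `-vb` reports throughput while the transfer runs.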
45 Application performance
Ran on 64-CPU node, 2-year time segment (8.2 GB total)

File system    Runtime (s)
Lustre disk    213
tmpfs RAM       29
XFS RAM         29
Lustre RAM     (value missing)
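The runtimes above give the speedup of the RAM-based file systems over Lustre on disk directly. The sketch below does that division with `awk`; `speedup` is a hypothetical helper name, and the inputs are the 213 s and 29 s figures from the slide.

```shell
#!/bin/sh
# speedup BASELINE_S FASTER_S
# Prints BASELINE_S / FASTER_S rounded to one decimal place.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

# Lustre disk (213 s) versus tmpfs RAM or XFS RAM (29 s).
speedup 213 29
```

For these inputs the ratio is about 7.3, so the analysis runs roughly seven times faster from RAM than from spinning disk.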
46 Application performance, from Frost (chart): data transfer and AMWG analysis time in seconds for Lustre disk, tmpfs RAM, XFS RAM, and Lustre RAM
47-53 End-to-end workflow (timeline, time in seconds): Request space, Transfer, Analysis 1, Analysis 2, ..., Analysis n, Cleanup, Archive
54-56 Other use case: Interactive jobs
- Automated workflow split component-wise
- Each step is run by the user manually
Steps:
1. Request space
2. Transfer data to the allocated space (globus-url-copy or Globus Online)
3. Run analysis on the allocated space
4. Receive notice before expiration
5. Clean up by deleting the request job
57-63 Conclusions
- End-to-end analysis platform without touching spinning disk
- Interface through the familiar PBS interface
- Workflow automation to drive analysis
- Network bandwidth is critical to performance
Future work:
- Tune network for high performance data movement
- Application-perspective file system scalability
- Explore framework on other resources: disk, bandwidth, etc.
64 Questions?
Allan Espinosa (aespinosa@cs.uchicago.edu)
More informationAccelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
More informationData Management. Network transfers
Data Management Network transfers Network data transfers Not everyone needs to transfer large amounts of data on and off a HPC service Sometimes data is created and consumed on the same service. If you
More informationAutomation Engine 14. Troubleshooting
4 Troubleshooting 2-205 Contents. Troubleshooting the Server... 3. Checking the Databases... 3.2 Checking the Containers...4.3 Checking Disks...4.4.5.6.7 Checking the Network...5 Checking System Health...
More informationPetaShare: Enabling Data Intensive Science
PetaShare: Enabling Data Intensive Science Tevfik Kosar Center for Computation & Technology Louisiana State University June 25, 2007 The Data Deluge Scientific data outpaced Moore s Law! 2 The Lambda Blast
More informationData Processing Solutions - A Case Study
Sector & Sphere Exploring Data Parallelism and Locality in Wide Area Networks Yunhong Gu Univ. of Illinois at Chicago Robert Grossman Univ. of Illinois at Chicago and Open Data Group Overview Cloud Computing
More informationPRODUCT BRIEF 3E PERFORMANCE BENCHMARKS LOAD AND SCALABILITY TESTING
PRODUCT BRIEF 3E PERFORMANCE BENCHMARKS LOAD AND SCALABILITY TESTING THE FOUNDATION Thomson Reuters Elite completed a series of performance load tests with the 3E application to verify that it could scale
More informationBig Data and Cloud Computing for GHRSST
Big Data and Cloud Computing for GHRSST Jean-Francois Piollé (jfpiolle@ifremer.fr) Frédéric Paul, Olivier Archer CERSAT / Institut Français de Recherche pour l Exploitation de la Mer Facing data deluge
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationDeploying Riverbed wide-area data services in a LeftHand iscsi SAN Remote Disaster Recovery Solution
Wide-area data services (WDS) Accelerating Remote Disaster Recovery Reduce Replication Windows and transfer times leveraging your existing WAN Deploying Riverbed wide-area data services in a LeftHand iscsi
More informationWebnet2000 DataCentre
Webnet2000 DataCentre WEBNET2000 have been enabling organisations develop their Internet presence for over 10 Years. The Webnet2000 Datacentre features the very latest world class resilient infrastructure,
More informationBuilding a Parallel Cloud Storage System using OpenStack s Swift Object Store and Transformative Parallel I/O
Building a Parallel Cloud Storage System using OpenStack s Swift Object Store and Transformative Parallel I/O or Parallel Cloud Storage as an Alternative Archive Solution Kaleb Lora Andrew AJ Burns Martel
More informationHPC performance applications on Virtual Clusters
Panagiotis Kritikakos EPCC, School of Physics & Astronomy, University of Edinburgh, Scotland - UK pkritika@epcc.ed.ac.uk 4 th IC-SCCE, Athens 7 th July 2010 This work investigates the performance of (Java)
More informationImprove Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database
WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
More informationGround up Introduction to In-Memory Data (Grids)
Ground up Introduction to In-Memory Data (Grids) QCON 2015 NEW YORK, NY 2014 Hazelcast Inc. Why you here? 2014 Hazelcast Inc. Java Developer on a quest for scalability frameworks Architect on low-latency
More informationHPC @ CRIBI. Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012
HPC @ CRIBI Calcolo Scientifico e Bioinformatica oggi Università di Padova 13 gennaio 2012 what is exact? experience on advanced computational technologies a company lead by IT experts with a strong background
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationPerformance Analysis of Mixed Distributed Filesystem Workloads
Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt Motivation Hadoop-tailored filesystems (e.g. CloudStore)
More informationFile Transfer Best Practices
File Transfer Best Practices David Turner User Services Group NERSC User Group Meeting October 2, 2008 Overview Available tools ftp, scp, bbcp, GridFTP, hsi/htar Examples and Performance LAN WAN Reliability
More informationParallels Cloud Server 6.0
Parallels Cloud Server 6.0 Parallels Cloud Storage I/O Benchmarking Guide September 05, 2014 Copyright 1999-2014 Parallels IP Holdings GmbH and its affiliates. All rights reserved. Parallels IP Holdings
More informationQuantum StorNext. Product Brief: Distributed LAN Client
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
More informationGuideline for stresstest Page 1 of 6. Stress test
Guideline for stresstest Page 1 of 6 Stress test Objective: Show unacceptable problems with high parallel load. Crash, wrong processing, slow processing. Test Procedure: Run test cases with maximum number
More informationSMB Direct for SQL Server and Private Cloud
SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server
More informationData Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015
Data Storage At the Heart of any Information System Ken Claffey, VP/GM - June 2015 Seagate: A Unique Vantage Point on the Data Centre Evolution of the world s digital information End-to-end cloud solutions:
More informationHow To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
More informationVeeam Cloud Connect. Version 8.0. Administrator Guide
Veeam Cloud Connect Version 8.0 Administrator Guide April, 2015 2015 Veeam Software. All rights reserved. All trademarks are the property of their respective owners. No part of this publication may be
More informationCharacterize Performance in Horizon 6
EUC2027 Characterize Performance in Horizon 6 Banit Agrawal VMware, Inc Staff Engineer II Rasmus Sjørslev VMware, Inc Senior EUC Architect Disclaimer This presentation may contain product features that
More information