Research Technologies Data Storage for HPC

1 Research Technologies Data Storage for HPC
Supercomputing for Everyone, February 17-18, 2014
Research Technologies, High Performance File Systems, Indiana University

2 Intro to HPC on Big Red II Workshop: Data Storage
Overview of presentation
1. Data Workflow
   - Thinking about the lifecycle and types of data you use and create
2. Storage Resources
   - How to efficiently store and use data
   - Policies, best practices, and optimizing performance
3. Getting your data in and out of storage systems
Questions welcome any time! There will also be time at the end for discussion.

3 Consider the types of data you may have
- Source code
- Documentation
- Reports
- Scientific data
- Computational data
- Intermediate steps
- Output
- Results of computation

4 Data Storage Requirements
Source Code & Documents
- Easily accessible
- Backed up
Computational Data
- Matches potential of BR2: FAST! (parallel)
- Store lots of input/output: large capacity
- Ability to work together: collaboration
Results
- Store safely for a long time
- Potentially very large amounts of data
- Ability to share data with collaborators

5 Simple Workflow
1. Input instructions
2. Read, compute, write data in parallel
3. Archive results

6 Home Directory
- Default location
- For small files
- Data is backed up
- /N/u/username/BigRed2
Data Capacitor II (DC2)
- Large capacity
- High throughput
- For compute data
- Data is not backed up
- /N/dc2/
Scholarly Data Archive (SDA)
- Disk to tape archiving
- Distributed copies
- For long term storage

7 Big Red II Storage Resource Analogies
Home Directories: the family station wagon
- daily trips to school
- backed up / used always
Data Capacitor II (DC2): the race car
- VERY FAST - wear a seatbelt
- not backed up / workspace
Scholarly Data Archive (SDA): the all-terrain vehicle
- extremely reliable
- keeps your critical data safe

8 Home Directories Input instructions

9 Home Directory: The Family Station Wagon
- Default storage location for your account
  - Available as soon as you log in
- The place to store source code, shell scripts, and other small files
- Not meant for computational data!
- Do not compute against data in home directories!
  - Don't take your station wagon to the race track
- More information:

10 Home Directory: Quota
- 10 GB quota for home directory
- Use the command `quota -s` to see current usage
  - The `-s` flag gives output in MB; otherwise output is in 1 KB blocks
    jupmille@login1:~> quota -s
    Filesystem blocks quota limit grace files quota limit grace
    bl-nas2:/vol/hd03 184M 9900M 10240M m 4295m

11 Home Directory: Snapshots
- Hourly and nightly snapshots are made of your home directory
- Snapshots are in a hidden .snapshot directory within the source directory
    jupmille@login1:~> cd .snapshot
    jupmille@login1:~/.snapshot> ls
    hourly.0 hourly.1 hourly.2 hourly.3 hourly.4 hourly.5 nightly.0 nightly.1
- Snapshots are taken daily at 8am, 12pm, 4pm, 8pm, midnight
- For more information see the Knowledge Base document
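Recovering a file from a snapshot is an ordinary copy out of the `.snapshot` directory. The following is a minimal sketch, assuming the snapshot directory mirrors the layout of the directory that contains it; the function name `restore_from_snapshot` and the `.restored` suffix are illustrative choices, not part of any IU tooling.

```shell
# Copy one file back from a snapshot into the live directory.
# Arguments: directory, snapshot name (for example hourly.0), file name.
restore_from_snapshot() {
    snap_dir="$1"
    snap_name="$2"
    snap_file="$3"
    snap_src="$snap_dir/.snapshot/$snap_name/$snap_file"
    if [ ! -e "$snap_src" ]; then
        echo "not in snapshot: $snap_src" >&2
        return 1
    fi
    # -p preserves mode and timestamps. The copy gets a .restored suffix
    # (an assumption of this sketch) so the live file is never overwritten.
    cp -p "$snap_src" "$snap_dir/$snap_file.restored"
}

# Example call (hypothetical file name):
#   restore_from_snapshot "$HOME" hourly.0 run_model.sh
```

Writing to a separate name lets you compare the restored copy with the current file before replacing anything.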

12 Home Directory: Shared Across Systems
- Home directory space is shared across RT systems
    System       Path
    Big Red 2    /N/u/username/BigRed2
    Mason        /N/u/username/Mason
    Quarry       /N/u/username/Quarry
    jupmille@login1:~> pwd
    /N/u/jupmille/BigRed2
    jupmille@login1:~> cd ..
    jupmille@login1:/N/u/jupmille> ls
    BigRed2 Mason Quarry

13 Data Capacitor II Compute against data

14 Data Capacitor II: The Race Car
- Parallel high-speed storage based on the Lustre file system
- 3.5 PB total size, 50 GB/s throughput to BR2
- Store input and output application data
  - intended as a temporary workspace for computation
  - not for indefinite storage of data
- DC2 is not backed up
- Available on IU's HPC resources
  - Big Red II, Mason, Quarry
- More information available on the Knowledge Base

15 Data Capacitor II: Lustre File System
- Linux + Cluster = Lustre
- Lustre is a parallel distributed file system
- High performance file system used by many Top500 supercomputers (>50%)
- POSIX compliant: behaves like other file systems
- Open source software under GPLv2
- Guided by the non-profit Open Scalable File Systems, Inc.
  - Intel maintains canonical tree
  - Active development
  - IU contributes code

16 Data Capacitor II: Usage
- Lustre is designed for high-speed data access, not for metadata speed
- This intentional design choice comes with a tradeoff
  - File metadata lookups can be relatively slow
  - The metadata server must ask each object server for a file's size
    - Otherwise the metadata would be constantly updating with size info
- Tips to improve interactive performance:
  - Avoid more than 10K files in one directory
    - Separate input, output, and final results, and delete unneeded data
    - This helps with data management as well
  - Limit the number of metadata actions you perform
    - Reduce file and directory operations, such as stat-ing files
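One way to check the 10K-files-per-directory guideline is to count the regular files directly inside each directory. This is a rough sketch using standard `find` and `wc`; the function name `report_crowded_dirs` and its default limit argument are assumptions made for illustration. Note that the scan is itself a metadata-heavy operation, so it is better run occasionally than in a loop.

```shell
# Print "count<TAB>directory" for every directory under the given root that
# directly holds more than the given number of regular files (default 10000).
# File names containing newlines would be miscounted by this simple approach.
report_crowded_dirs() {
    crowd_root="$1"
    crowd_limit="${2:-10000}"
    find "$crowd_root" -type d | while IFS= read -r crowd_dir; do
        # Count only the files directly inside this directory.
        crowd_n=$(( $(find "$crowd_dir" -maxdepth 1 -type f | wc -l) ))
        if [ "$crowd_n" -gt "$crowd_limit" ]; then
            printf '%s\t%s\n' "$crowd_n" "$crowd_dir"
        fi
    done
}

# Example call:
#   report_crowded_dirs /N/dc2/scratch/<username> 10000
```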

17 Data Capacitor II: Scratch Directories
- A scratch directory is available to every Big Red II user
- Path to your scratch space is /N/dc2/scratch/username/
- Intended as temporary workspace for your data, not for sharing
- Files not accessed in 60 days may be purged
- The name "scratch" is from the phrase "scratch paper"
  - a piece of paper used while performing calculations, separate from the answer sheet
  - you may know it as scrap paper
  - implies an impermanence to the data

18 Data Capacitor II: Project Directories
- Project directories are available by application for users, groups, or labs with special needs
- Makes it possible to share data among users (Unix groups)
- Files not accessed in 180 days may be purged
- Project space can be applied for by submitting an application at this URL
  - Longer storage time than scratch space, but still not forever

19 Data Capacitor II: Purging Old Data
- Administrators of Data Capacitor II routinely purge old data
- Data not accessed in a certain amount of time will be deleted
  - scratch = 60 days, project = 180 days
- You will be notified before and after any action is taken against your data
  - An email will be sent to you listing the eligible files
  - A file will be placed in the root of your scratch or project directory
    - Will-Purge-These-uid jupmille-Files-On txt
- You will have seven (7) days to take action
- Afterwards, a file listing the actions taken will be created
- See

20 Data Capacitor II: Find Old Files
- You can be proactive about managing your data to prevent purging
- Use this command to list the files in your directory sorted by age (time of last access)
  - It may take a while, depending on the number of files you have
    find /N/dc2/scratch/<username> -type f -exec stat --format="%n %x" '{}' \; | sort -k2,3
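To narrow the listing to files that are approaching the purge window, `find` can filter on access time directly. This is a small sketch, assuming the 60-day scratch threshold from the purge policy; the function name `list_stale_files` is hypothetical. `-atime +N` matches files whose last access is more than N whole days in the past.

```shell
# List regular files under a directory that have not been accessed for more
# than the given number of days (default 60, the scratch purge threshold).
list_stale_files() {
    stale_dir="$1"
    stale_days="${2:-60}"
    find "$stale_dir" -type f -atime +"$stale_days" -print
}

# Example call:
#   list_stale_files /N/dc2/scratch/<username> 60
```

Because the filter runs inside `find`, there is no per-file `stat` process, which keeps the scan cheaper on a directory with many files.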

21 Data Capacitor II: Space Quota
- There is no strict limit on the amount of space you can use
- Space available for all users varies depending on system use
  - `df -h` gives you current space available
- Please don't use any more space than strictly necessary
  - Data Capacitor II is a shared resource
  - It is intended for computation
- Use this command to find the total amount of space you're using:
    du -hc /N/dc2/scratch/<username>

22 Data Capacitor II: HIPAA and ePHI
- DC2 is HIPAA aligned, but you are responsible for ensuring the privacy and security of ePHI data
- Technical safeguards
  - Set directory permissions to restrict read and write access
    - The most secure method is to allow access only to you
    jupmille@login1:/N/dc2/scratch/jupmille> chmod 700 ephi_file
    jupmille@login1:/N/dc2/scratch/jupmille> ls -l ephi_file
    -rwx------ 1 jupmille uits 0 Jan 23 14:25 ephi_file
  - Use `umask` to ensure all new files are created with safe permissions
    - Add `umask 077` to shell profile
    - See
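The effect of `umask 077` can be seen by creating a directory and a file under it and checking their modes. The sketch below is illustrative only: the function name `make_private_workspace` and the placeholder file name are assumptions, and the umask is set inside a subshell so the calling shell's setting is left untouched.

```shell
# Create a directory only its owner can enter, then create a file inside it
# under umask 077 so the file is readable and writable by the owner only.
make_private_workspace() {
    private_dir="$1"
    (
        umask 077                     # new files get 600, new directories 700
        mkdir -p "$private_dir"
        chmod 700 "$private_dir"      # also tightens a directory that already existed
        : > "$private_dir/ephi_file"  # empty placeholder file
    )
}

# Example call (hypothetical directory name):
#   make_private_workspace /N/dc2/scratch/<username>/ephi_work
```

Setting permissions on the containing directory matters as much as the file mode, since a directory without group or other access blocks traversal to everything beneath it.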

23 Data Capacitor II: Job Scheduling
- Specify DC2 as a requirement for your batch job
  - Add the dc2 file system property to the nodes directive in your TORQUE job script
- For example, if your job requires two nodes, thirty-two processors per node, and the Data Capacitor II file system (/N/dc2), the resource specification line in your TORQUE job script would look like:
    #PBS -l nodes=2:ppn=32:dc2
- Specifying the dc2 property in your script directs TORQUE to dispatch your job to only those compute nodes with the Data Capacitor II file system mounted.
- If DC2 is down, your job won't run.
- More information at:
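For context, here is how that directive might sit inside a complete job script. This is a sketch under stated assumptions: the walltime, job name, run directory `run01`, and application `./my_app` are all placeholders, and the real launch line depends on the system and application. The script is written out through a small helper (hypothetical name `write_dc2_job`) so it can be generated and inspected before it is submitted.

```shell
# Write a minimal TORQUE job script that requests the dc2 node property.
write_dc2_job() {
    job_file="$1"
    cat > "$job_file" <<'EOF'
#!/bin/bash
#PBS -l nodes=2:ppn=32:dc2
#PBS -l walltime=01:00:00
#PBS -N dc2_example

# Work from scratch space on DC2 rather than the home directory.
# run01 is a placeholder directory name.
cd "/N/dc2/scratch/$USER/run01" || exit 1

# Placeholder application; replace with your own launch line.
./my_app input/ output/
EOF
}

# Example call, followed by submission with qsub:
#   write_dc2_job dc2_example.pbs
#   qsub dc2_example.pbs
```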

24 Data Capacitor II: Reporting Issues
- If you encounter any problems using Data Capacitor II, please include these details when reporting the issue:
  - Date and time the event occurred
  - Which system your job was running on
  - The directory being used
  - A brief description of what was happening when the issue occurred
- All Data Capacitor II issues should be reported to hpfs-admin@iu.edu

25 Scholarly Data Archive Archive results

26 Scholarly Data Archive
- Massive near-line and archival data storage
  - Disk cache front end: ~600 TB
  - Magnetic tape storage: 15 PB (uncompressed)
- Hierarchical storage management (HSM)
  - Data migrates from disk to tape over time
  - Retrieval from tape: a small cost for safety
- Data integrity
  - Geographically replicated
    - IUB and IUPUI each get a copy
  - Checksums and error detection

27 Scholarly Data Archive: Details
- Account can be applied for easily
- Default quota is 50 TB
  - replicated copy of data is not counted
  - additional storage is available
- HIPAA aligned, but you must secure the data
- Group or department accounts are available
- Data can be shared with Access Control Lists
- More information available on the Knowledge Base

28 Scholarly Data Archive: Usage
Best Uses
- Files of at least 1 MB
  - A single file can be up to 10 TB
- Archive files
  - Files rarely updated
  - Files that need to be kept a long time
- Files that are read often
  - frequently accessed files tend to stay on disk cache
Poor Uses
- Small files
  - small files should be aggregated with a tool like WinZip or tar
- Files that will frequently change
  - Do not edit files in place
    - If you need to edit: Copy -> Edit -> Reupload

29 Scholarly Data Archive: Helpful Tip
- Data stored on the SDA can be kept for a long time
  - So long that you might even forget
  - Or the people who did know have left
- Do your future self a favor and document the data
  - Create a manifest or annotation of the data
  - Keep it at the top of your storage directory, and keep it up to date
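A manifest can be as simple as a checksum listing generated before the data is archived. The following is one possible sketch, not a prescribed format: the file name `MANIFEST.txt`, the function name `write_manifest`, and the choice of SHA-256 via GNU `sha256sum` are all assumptions. A human-written description of what the data is would still belong alongside it.

```shell
# Write MANIFEST.txt at the top of a directory, with one
# "checksum  ./relative/path" line per file, sorted by path.
write_manifest() {
    manifest_dir="$1"
    (
        cd "$manifest_dir" || exit 1
        # Exclude the manifest itself, which the redirection has already created.
        find . -type f ! -name MANIFEST.txt -exec sha256sum {} + \
            | sort -k2 > MANIFEST.txt
    )
}

# Example call, and a later integrity check against the manifest:
#   write_manifest /N/dc2/scratch/<username>/results
#   (cd /N/dc2/scratch/<username>/results && sha256sum -c --quiet MANIFEST.txt)
```

The same file doubles as a verification list after the data is retrieved from the archive years later.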

30 Transferring Data In and Out of RT Storage

31 Preparing to Transfer Data
- It is recommended to bundle your data before transferring
  - Easier to manage a single file
  - Preserves layout, permissions
  - Transferring large files is often faster than many small files
    jupmille@login1:/N/dc2/scratch/jupmille> ls
    input output results
    jupmille@login1:/N/dc2/scratch/jupmille> tar -cvf archive.tar input/ output/ results/
    input/
    output/
    results/
    jupmille@login1:/N/dc2/scratch/jupmille> ls
    archive.tar input output results
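Before moving or deleting anything, it can be worth confirming that the bundle is readable and recording a checksum to compare after the transfer. This is a minimal sketch using standard `tar` and GNU `sha256sum`; the function name `bundle_and_check` and the `.sha256` side file are illustrative assumptions.

```shell
# Bundle the named subdirectories of a working directory into one archive,
# read the archive back to confirm it is listable, and record its checksum.
# Arguments: working directory, archive path, then the entries to include.
bundle_and_check() {
    bundle_workdir="$1"
    bundle_archive="$2"
    shift 2
    tar -cf "$bundle_archive" -C "$bundle_workdir" "$@" || return 1
    tar -tf "$bundle_archive" > /dev/null || return 1
    sha256sum "$bundle_archive" > "$bundle_archive.sha256"
}

# Example call:
#   bundle_and_check /N/dc2/scratch/<username> archive.tar input output results
```

After the transfer, running `sha256sum` on the received file and comparing it with the recorded value confirms that the archive arrived intact.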

32 Getting data in and out of Home Directories
- scp is the easiest way to get data in and out of your home directory
  - secure, but not built for high performance; the quota is 10 GB, so you're unlikely to make large transfers
  - no restart capabilities, so if it fails you must start over
  - sftp and rsync over ssh are also good options
    $ scp archive.tar bigred2.uits.iu.edu:~

33 Getting data in and out of Data Capacitor II
- The IU Cyberinfrastructure Gateway allows you to transfer data between your machine and Data Capacitor II
  - IU CI Gateway information:
  - Transferring data with CI Gateway to DC2:
  - The IU CI Gateway uses Globus Online, a parallel transfer tool which requires software to be installed; follow the instructions in the KB article to request an account for DC2
  - The endpoint is iu#dcwan_internal
  - Your path is /~/N/dc2/scratch/<username>/
- You can still use scp/rsync/sftp, but they're not high performance tools

34 Getting data in and out of Scholarly Data Archive
Fast access
- hsi and htar command line tools
  - To use HSI on Big Red II, you must load the HPSS module: module load hpss
- GridFTP clients
- Kerberized FTP
- Globus Online, also available through the IU CI Gateway
Convenience protocols
- Web access via browser
- sftp
- Mount to desktop via CIFS/Samba (mapped drive)
Knowledge Base article on SDA access

35 Pull/Push data in SDA to DC2 or Home Directory
- Use hsi on Big Red II login node
- Add module statement to profile: module load hpss
- Can be done interactively
- Can be scripted through Kerberos keytab authentication
- hsi can be used in many different ways (ftp style commands)
  - manual available at

36 Scholarly Data Archive: HSI Example
    module load hpss
    HPSS (command-line utility for access to the SDA) version 4.0 loaded.
    hsi
    Kerberos Principal: jupmille
    Password for
    ? put samplefile.tar
    ? ls
    /hpss/j/u/jupmille:
    samplefile.tar
    ? get samplefile.tar
    ? du -k
    ? help
    ? exit
Knowledge Base: https://kb.iu.edu/d/avdb

37 Getting Data onto Big Red II
- Big Red II, Mason, and Quarry login nodes do not enforce a time limit on data transfer tools
  - scp, sftp, hsi, htar, wget, curl, etc.
- I recommend putting your data into the Scholarly Data Archive first
  - Then use command line tools to pull from SDA into DC2 or Home Directories
  - Many ways to access the SDA, robust permissions
  - You'll always have a distributed copy of your data!

38 Other RT Storage Resources
- Focus of this presentation was basics of storage available on Big Red II
- There are more storage options available
  - Research File System (RFS)
    - Distributed copies, very accessible, robust permissions
  - Data Capacitor WAN (DC-WAN)
    - Lustre over the wide area network, share at high speed

39 System Outages
- If there are any problems with the system, we will update IT Notices
- Data Capacitor II has regularly scheduled system maintenance
  - First Tuesday of every month
- Join the maintenance mailing list to be notified

41 Questions?

NERSC Archival Storage: Best Practices

NERSC Archival Storage: Best Practices NERSC Archival Storage: Best Practices Lisa Gerhardt! NERSC User Services! Nick Balthaser! NERSC Storage Systems! Joint Facilities User Forum on Data Intensive Computing! June 18, 2014 Agenda Introduc)on

More information

IU Cyberinfrastructure Overview

IU Cyberinfrastructure Overview IU Cyberinfrastructure Overview, Cyberinfrastructure and Service Center Indiana University Pervasive Technology Institute Science Storage Computation Analysis/ Bio/Health Visualization Campus Education/

More information

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University

HPC at IU Overview. Abhinav Thota Research Technologies Indiana University HPC at IU Overview Abhinav Thota Research Technologies Indiana University What is HPC/cyberinfrastructure? Why should you care? Data sizes are growing Need to get to the solution faster Compute power is

More information

Robert Ping UITS Research Technologies, Cyberinfrastructure and Service Center Indiana University Pervasive Technology Institute

Robert Ping UITS Research Technologies, Cyberinfrastructure and Service Center Indiana University Pervasive Technology Institute Cyberinfrastucture for IU Research and Academics Robert Ping, Cyberinfrastructure and Service Center Indiana University Pervasive Technology Institute Science Storage Computation Analysis/ Bio/Health Visualization

More information

Data Management Best Practices

Data Management Best Practices December 4, 2013 Data Management Best Practices Ryan Mokos Outline Overview of Nearline system (HPSS) Hardware File system structure Data transfer on Blue Waters Globus Online (GO) interface Web GUI Command-Line

More information

OLCF Best Practices. Bill Renaud OLCF User Assistance Group

OLCF Best Practices. Bill Renaud OLCF User Assistance Group OLCF Best Practices Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may differ from

More information

Quick Introduction to HPSS at NERSC

Quick Introduction to HPSS at NERSC Quick Introduction to HPSS at NERSC Nick Balthaser NERSC Storage Systems Group nabalthaser@lbl.gov Joint Genome Institute, Walnut Creek, CA Feb 10, 2011 Agenda NERSC Archive Technologies Overview Use Cases

More information

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster

Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Introduction to Linux and Cluster Basics for the CCR General Computing Cluster Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY 14203 Phone: 716-881-8959

More information

GPN - What is theGPFS HSI HTAR ISH?

GPN - What is theGPFS HSI HTAR ISH? 1/10 Storage Capacity Expansion Plan (initial) Storage Budget: $ $ $ (5PB) Back in 2009 GPFS (scratch + project) 2010-2011 2012-2013 GPFS (add 20-50%) GPFS (add 50-100%) Rationale: * the longer we wait,

More information

Storage Capacity Expansion Plan (initial)

Storage Capacity Expansion Plan (initial) 1/14 Storage Capacity Expansion Plan (initial) Storage Budget: $ $ $ (5PB) Back in 2009 GPFS scratch + project 2010-2011 2012-2013 GPFS (add 20-50%) GPFS (add 50-100%) Rationale: * the longer we wait,

More information

Incremental Backup Script. Jason Healy, Director of Networks and Systems

Incremental Backup Script. Jason Healy, Director of Networks and Systems Incremental Backup Script Jason Healy, Director of Networks and Systems Last Updated Mar 18, 2008 2 Contents 1 Incremental Backup Script 5 1.1 Introduction.............................. 5 1.2 Design Issues.............................

More information

PetaLibrary Storage Service MOU

PetaLibrary Storage Service MOU University of Colorado Boulder Research Computing PetaLibrary Storage Service MOU 1. INTRODUCTION This is the memorandum of understanding (MOU) for the Research Computing (RC) PetaLibrary Storage Service.

More information

NERSC File Systems and How to Use Them

NERSC File Systems and How to Use Them NERSC File Systems and How to Use Them David Turner! NERSC User Services Group! Joint Facilities User Forum on Data- Intensive Computing! June 18, 2014 The compute and storage systems 2014 Hopper: 1.3PF,

More information

Introduction to Supercomputing with Janus

Introduction to Supercomputing with Janus Introduction to Supercomputing with Janus Shelley Knuth shelley.knuth@colorado.edu Peter Ruprecht peter.ruprecht@colorado.edu www.rc.colorado.edu Outline Who is CU Research Computing? What is a supercomputer?

More information

Data Movement and Storage. Drew Dolgert and previous contributors

Data Movement and Storage. Drew Dolgert and previous contributors Data Movement and Storage Drew Dolgert and previous contributors Data Intensive Computing Location Viewing Manipulation Storage Movement Sharing Interpretation $HOME $WORK $SCRATCH 72 is a Lot, Right?

More information

8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status

8/15/2014. Best Practices @OLCF (and more) General Information. Staying Informed. Staying Informed. Staying Informed-System Status Best Practices @OLCF (and more) Bill Renaud OLCF User Support General Information This presentation covers some helpful information for users of OLCF Staying informed Aspects of system usage that may differ

More information

Data Management. Network transfers

Data Management. Network transfers Data Management Network transfers Network data transfers Not everyone needs to transfer large amounts of data on and off a HPC service Sometimes data is created and consumed on the same service. If you

More information

Cisco Networking Academy Program Curriculum Scope & Sequence. Fundamentals of UNIX version 2.0 (July, 2002)

Cisco Networking Academy Program Curriculum Scope & Sequence. Fundamentals of UNIX version 2.0 (July, 2002) Cisco Networking Academy Program Curriculum Scope & Sequence Fundamentals of UNIX version 2.0 (July, 2002) Course Description: Fundamentals of UNIX teaches you how to use the UNIX operating system and

More information

Enhanced Research Data Management and Publication with Globus

Enhanced Research Data Management and Publication with Globus Enhanced Research Data Management and Publication with Globus Vas Vasiliadis Jim Pruyne Presented at OR2015 June 8, 2015 Presentations and other useful information available at globus.org/events/or2015/tutorial

More information

Managing, Sharing and Moving Big Data Tracy Teal and Greg Mason Insttute for Cyber Enabled Research

Managing, Sharing and Moving Big Data Tracy Teal and Greg Mason Insttute for Cyber Enabled Research Managing, Sharing and Moving Big Data Tracy Teal and Greg Mason Insttute for Cyber Enabled Research Data storage optons Storing and accessing data on the HPCC Transferring data to and from the HPCC Sharing

More information

Data management on HPC platforms

Data management on HPC platforms Data management on HPC platforms Transferring data and handling code with Git scitas.epfl.ch September 10, 2015 http://bit.ly/1jkghz4 What kind of data Categorizing data to define a strategy Based on size?

More information

The Einstein Depot server

The Einstein Depot server The Einstein Depot server Have you ever needed a way to transfer large files to colleagues? Or allow a colleague to send large files to you? Do you need to transfer files that are too big to be sent as

More information

Prerequisites and Configuration Guide

Prerequisites and Configuration Guide Prerequisites and Configuration Guide Informatica Support Console (Version 2.0) Table of Contents Chapter 1: Overview.................................................... 2 Chapter 2: Minimum System Requirements.................................

More information

Amazon-Free Big Data Analysis. Michael R. Crusoe the GED Lab @ MSU @JKhedron #NGS2013 2013-06- 18

Amazon-Free Big Data Analysis. Michael R. Crusoe the GED Lab @ MSU @JKhedron #NGS2013 2013-06- 18 Amazon-Free Big Data Analysis Michael R. Crusoe the GED Lab @ MSU @JKhedron #NGS2013 2013-06- 18 Overview Dedicated vs Shared computing Evaluating Computing Resources XSEDE Mason Lonestar Stampede Blacklight

More information

An Introduction to High Performance Computing in the Department

An Introduction to High Performance Computing in the Department An Introduction to High Performance Computing in the Department Ashley Ford & Chris Jewell Department of Statistics University of Warwick October 30, 2012 1 Some Background 2 How is Buster used? 3 Software

More information

Week Overview. Running Live Linux Sending email from command line scp and sftp utilities

Week Overview. Running Live Linux Sending email from command line scp and sftp utilities ULI101 Week 06a Week Overview Running Live Linux Sending email from command line scp and sftp utilities Live Linux Most major Linux distributions offer a Live version, which allows users to run the OS

More information

How to Use NoMachine 4.4

How to Use NoMachine 4.4 How to Use NoMachine 4.4 Using NoMachine What is NoMachine and how can I use it? NoMachine is a software that runs on multiple platforms (ie: Windows, Mac, and Linux). It is an end user client that connects

More information

Software infrastructure and remote sites

Software infrastructure and remote sites Software infrastructure and remote sites Petr Chaloupka Nuclear Physics Institute ASCR, Prague STAR regional meeting Dubna, Russia 11/21/2003 Dubna, 11/21/2003 1 Where to go for help and informations Main

More information

GoAnywhere Director to GoAnywhere MFT Upgrade Guide. Version: 5.0.1 Publication Date: 07/09/2015

GoAnywhere Director to GoAnywhere MFT Upgrade Guide. Version: 5.0.1 Publication Date: 07/09/2015 GoAnywhere Director to GoAnywhere MFT Upgrade Guide Version: 5.0.1 Publication Date: 07/09/2015 Copyright 2015 Linoma Software. All rights reserved. Information in this document is subject to change without

More information

LOCKSS on LINUX. Installation Manual and the OpenBSD Transition 02/17/2011

LOCKSS on LINUX. Installation Manual and the OpenBSD Transition 02/17/2011 LOCKSS on LINUX Installation Manual and the OpenBSD Transition 02/17/2011 1 Table of Contents Overview... 3 LOCKSS Hardware... 5 Installation Checklist... 7 BIOS Settings... 10 Installation... 11 Firewall

More information

Upgrade Guide. Product Version: 4.7.0 Publication Date: 02/11/2015

Upgrade Guide. Product Version: 4.7.0 Publication Date: 02/11/2015 Upgrade Guide Product Version: 4.7.0 Publication Date: 02/11/2015 Copyright 2009-2015, LINOMA SOFTWARE LINOMA SOFTWARE is a division of LINOMA GROUP, Inc. Contents Welcome 3 Before You Begin 3 Upgrade

More information

Introduction to SDSC systems and data analytics software packages "

Introduction to SDSC systems and data analytics software packages Introduction to SDSC systems and data analytics software packages " Mahidhar Tatineni (mahidhar@sdsc.edu) SDSC Summer Institute August 05, 2013 Getting Started" System Access Logging in Linux/Mac Use available

More information

NASA Workflow Tool. User Guide. September 29, 2010

NASA Workflow Tool. User Guide. September 29, 2010 NASA Workflow Tool User Guide September 29, 2010 NASA Workflow Tool User Guide 1. Overview 2. Getting Started Preparing the Environment 3. Using the NED Client Common Terminology Workflow Configuration

More information

Globus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis

Globus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis Globus Research Data Management: Introduction and Service Overview Steve Tuecke Vas Vasiliadis Presentations and other useful information available at globus.org/events/xsede15/tutorial 2 Thank you to

More information

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research

Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research Introduction to Running Hadoop on the High Performance Clusters at the Center for Computational Research Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group

OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group OLCF Best Practices (and More) Bill Renaud OLCF User Assistance Group Overview This presentation covers some helpful information for users of OLCF Staying informed Some aspects of system usage that may

More information

Globus and the Centralized Research Data Infrastructure at CU Boulder

Globus and the Centralized Research Data Infrastructure at CU Boulder Globus and the Centralized Research Data Infrastructure at CU Boulder Daniel Milroy, daniel.milroy@colorado.edu Conan Moore, conan.moore@colorado.edu Thomas Hauser, thomas.hauser@colorado.edu Peter Ruprecht,

More information

Introduction to Archival Storage at NERSC

Introduction to Archival Storage at NERSC Introduction to Archival Storage at NERSC Nick Balthaser NERSC Storage Systems Group nabalthaser@lbl.gov NERSC User Training March 8, 2011 Agenda NERSC Archive Technologies Overview Use Cases for the Archive

More information

Eucalyptus Tutorial HPC and Cloud Computing Workshop http://portal.nersc.gov/project/magellan/euca-tutorial/abc.html

Eucalyptus Tutorial HPC and Cloud Computing Workshop http://portal.nersc.gov/project/magellan/euca-tutorial/abc.html Eucalyptus Tutorial HPC and Cloud Computing Workshop http://portal.nersc.gov/project/magellan/euca-tutorial/abc.html Iwona Sakrejda Lavanya Ramakrishna Shane Canon June24th, UC Berkeley Tutorial Outline

More information

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago Outline Introduction Features Motivation Architecture Globus XIO Experimental Results 3 August 2005 The Ohio State University

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Extreme Control Center, NAC, and Purview Virtual Appliance Installation Guide

Extreme Control Center, NAC, and Purview Virtual Appliance Installation Guide Extreme Control Center, NAC, and Purview Virtual Appliance Installation Guide 9034968 Published April 2016 Copyright 2016 All rights reserved. Legal Notice Extreme Networks, Inc. reserves the right to

More information

Adobe Marketing Cloud Using FTP and sftp with the Adobe Marketing Cloud

Adobe Marketing Cloud Using FTP and sftp with the Adobe Marketing Cloud Adobe Marketing Cloud Using FTP and sftp with the Adobe Marketing Cloud Contents File Transfer Protocol...3 Setting Up and Using FTP Accounts Hosted by Adobe...3 SAINT...3 Data Sources...4 Data Connectors...5

More information

Isilon OneFS. Version 7.2. OneFS Migration Tools Guide

Isilon OneFS. Version 7.2. OneFS Migration Tools Guide Isilon OneFS Version 7.2 OneFS Migration Tools Guide Copyright 2014 EMC Corporation. All rights reserved. Published in USA. Published November, 2014 EMC believes the information in this publication is

More information

HPSS Best Practices. Erich Thanhardt Bill Anderson Marc Genty B
