Inside Lustre HSM. An introduction to the newly HSM-enabled Lustre 2.5.x parallel file system. Torben Kling Petersen, PhD.




Technology Paper

Introduction

Hierarchical Storage Management (HSM) has been the enterprise choice for multi-tier storage management deployments for many years. It first appeared in the mid-1970s as an IBM product for mainframes. Since then, a number of different implementations have been developed, and many solutions (both commercial and open source) exist today. The most notable are IBM's TSM (Tivoli Storage Manager), SAM-QFS (Storage Archive Manager Quick File System) from either Oracle or Versity, and SGI's DMF (Data Migration Facility) [1].

The original idea of HSM was to move data automatically and transparently from expensive primary storage to cheaper back-end archive systems, which are often tape-based. HSM is sometimes referred to as tiered storage or active archive; these approaches differ significantly from a tape archive in that most of the data, if not all of it, remains online and fully available to users. In addition, tiering and data movement usually run from NAND-based flash tiers, through SAS-based HDD arrays, to large SATA-based storage volumes [2].

HSM functionality is now available in Lustre 2.5, addressing one of the requirements most often voiced by the commercial technical computing community, which has traditionally relied on proprietary, full-featured parallel file systems such as IBM's GPFS. Lustre is now one of the most successful open source projects to date, with more than 70 active developers and representation from close to 20 companies and organizations [3]. One of these organizations, the French Atomic Energy Commission (CEA), leads the development of HSM for Lustre [4].

Note that even though Lustre HSM delivers the same functionality as its peers on the market, it should not be considered a Lustre backup facility; it offers far more than that relatively simple feature. Another important distinction is that Lustre by itself is not a full HSM solution; Lustre is HSM-enabled. This means that Lustre provides several of the components that make up a complete HSM solution but lacks the downstream tiers. These are normally handled by another file system or by a full HSM solution in its own right.

[1] http://www-03.ibm.com/software/products/en/tivostormana; http://www.oracle.com/us/products/servers-storage/storage/storage-software/storage-archive-manager/overview/index.html; http://www.versity.com; https://www.sgi.com/products/storage/idm/dmf.html
[2] http://www.activearchive.com
[3] This is a summary of characteristics for the largest supercomputer site. For more information see http://top500.org
[4] http://www-hpc.cea.fr/en/red/equipements.htm

HSM basics

HSM back end

The goal of HSM is to free up space in the parallel file system's primary tier by automatically migrating rarely accessed data to a storage tier that is usually significantly larger and less expensive. The true benefit of HSM is that the metadata for a file (what users see as icons in folders, or as files and folders in ls -l output) is NOT moved to the HSM back end. Instead, a flag (often called a dirty bit) is set in the metadata of an archived file, informing the file system that the actual content of the file (in Lustre, stored as one or more objects on the OSTs) has been moved. As a result, files that have been moved are still visible when a directory is listed, so from a user's point of view all files remain available regardless of where the data is actually stored. Accessing data that has been moved requires copying the file(s) back to the file system, but again, this is what an HSM solution is designed to do.

The second important HSM concept involves the methods and mechanisms used to move data automatically between storage tiers. Put simply, the file system keeps track of all files created, written, read, updated and deleted, and a policy engine uses this information to move data off of, or back to, the file system. Policies can range from simple, such as "any file that has not been touched in 14 days", to complex, such as "any file located in the directory /mnt/lustre/large_files AND larger than 40 GB AND NOT ending in .tmp". The policy engine can hold many different policies, each of which triggers one or more actions. Moving a file back to the file system is simpler in that it does not require a policy: when a user tries to read a file that has been moved, the file is copied back to the file system, after which the user gets access to the data.
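The two example policies in the paragraph above can be expressed as simple predicates over per-file attributes. The Python sketch below is illustrative only: the `FileInfo` record and the function names are hypothetical, "touched" is assumed to mean last access, and a real policy engine such as RobinHood uses its own configuration syntax rather than Python code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath

GB = 10**9  # decimal gigabyte, as used elsewhere in this paper

LARGE_FILES_DIR = PurePosixPath("/mnt/lustre/large_files")


@dataclass
class FileInfo:
    """Hypothetical per-file record that a policy engine might track."""
    path: str
    size_bytes: int
    last_access: datetime


def untouched_for_14_days(f: FileInfo, now: datetime) -> bool:
    # Simple policy: any file that has not been touched in 14 days.
    return now - f.last_access > timedelta(days=14)


def large_file_candidate(f: FileInfo) -> bool:
    # Complex policy: under /mnt/lustre/large_files (at any depth)
    # AND larger than 40 GB AND NOT ending in .tmp.
    p = PurePosixPath(f.path)
    return (
        LARGE_FILES_DIR in p.parents
        and f.size_bytes > 40 * GB
        and not p.name.endswith(".tmp")
    )


def select_for_migration(files, now):
    # A file is a migration candidate if either policy matches.
    return [f.path for f in files
            if untouched_for_14_days(f, now) or large_file_candidate(f)]


now = datetime(2015, 1, 15)
files = [
    FileInfo("/mnt/lustre/home/a.dat", 10 * GB, now - timedelta(days=30)),
    FileInfo("/mnt/lustre/large_files/run1.h5", 50 * GB, now),
    FileInfo("/mnt/lustre/large_files/scratch.tmp", 80 * GB, now),
]
print(select_for_migration(files, now))
# ['/mnt/lustre/home/a.dat', '/mnt/lustre/large_files/run1.h5']
```

Keeping each policy as a separate predicate mirrors the way a policy engine combines conditions with AND/OR/NOT, and makes it easy to attach a different action (archive, release) to each rule.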

Lustre HSM overview

[Figure: Schematic design of a typical Lustre HSM setup]

In the example above, we're running only a single policy engine, a single coordinator (i.e., a single MDS) and four agents, each running the CopyTool. While this example is fairly basic, Lustre HSM is capable of managing multiple back ends, each served by one or more agents. Depending on the policy, data can be migrated to multiple systems at the same time, guaranteeing multiple copies if so desired. Although HSM is not a backup tool, a policy that triggers an Archive action as soon as a new file is written, without immediately following it with a Release, will in essence create a backup copy of that file. A more advanced policy that triggers an additional Archive of an already archived file, storing the new copy under a slightly different name, could work as a versioning tool.

Lustre HSM components

Agents

Agents are Lustre file system clients running the CopyTool, a user-space daemon that transfers data between Lustre and an HSM solution. A Lustre file system supports only a single CopyTool process per archive ID (the number that identifies each HSM back end bound to the file system) and per client node. Bundled with the Lustre tools, the POSIX CopyTool can work with any HSM-capable external storage that exports a POSIX API. Currently, the following open source CopyTools are available:
- POSIX CopyTool: used with a system supporting a POSIX interface, such as Tivoli Storage Manager (TSM) or SAM/QFS.
- HPSS CopyTool: developed by CEA and to be made freely available to all HPSS sites. This tool requires HPSS 7.3.2 or higher.
Other tools, such as a DMF CopyTool from SGI and an OpenArchive CopyTool from GrauData, are being actively developed.

Coordinator

A helper application that acts as the interface between the policy engine, the metadata server(s) and the CopyTool(s).

Policy Engine

The policy engine used with Lustre is RobinHood.
RobinHood [5] is a Lustre-aware, multi-purpose tool that runs externally to the Lustre servers and can:
- Apply migration/purge policies on any POSIX file system
- Back up data to the HSM world
- Provide auditing and reporting
- Offer a multi-threaded architecture, developed for use in HPC
- Process Lustre changelogs to avoid expensive file system scans
- Perform list/purge operations on files per OST
- Understand OST-specific information, such as OST usage

[5] http://sourceforge.net/projects/robinhood/
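The changelog item in the RobinHood list above is what lets a policy engine keep its view of a very large file system current without rescanning it. The Python sketch below shows the idea: a catalog is updated incrementally from a stream of change records, so only the entries named in those records are visited. The `ChangeRecord` layout and the operation names `CREATE`, `MODIFY` and `UNLINK` are simplified assumptions for illustration, not the actual Lustre changelog record format.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """Simplified change record (hypothetical; not the real Lustre format)."""
    index: int       # monotonically increasing record number
    op: str          # "CREATE", "MODIFY" or "UNLINK" (illustrative names)
    fid: str         # file identifier
    time: float      # event time, in seconds
    path: str = ""


@dataclass
class Catalog:
    """A policy engine's view of the file system, updated incrementally."""
    entries: dict = field(default_factory=dict)  # fid -> {"path", "last_mod"}
    last_index: int = 0                          # last record applied

    def apply(self, records) -> None:
        for rec in records:
            if rec.index <= self.last_index:
                continue  # already applied; replaying a batch is harmless
            if rec.op == "CREATE":
                self.entries[rec.fid] = {"path": rec.path, "last_mod": rec.time}
            elif rec.op == "MODIFY" and rec.fid in self.entries:
                self.entries[rec.fid]["last_mod"] = rec.time
            elif rec.op == "UNLINK":
                self.entries.pop(rec.fid, None)
            self.last_index = rec.index
        # Only entries named in the records were touched: no full scan.
        # A real consumer would also tell the server that records up to
        # last_index have been processed so they can be purged.


cat = Catalog()
cat.apply([
    ChangeRecord(1, "CREATE", "f1", 100.0, "/mnt/lustre/a"),
    ChangeRecord(2, "CREATE", "f2", 101.0, "/mnt/lustre/b"),
    ChangeRecord(3, "MODIFY", "f1", 200.0),
    ChangeRecord(4, "UNLINK", "f2", 201.0),
])
print(cat.entries)     # {'f1': {'path': '/mnt/lustre/a', 'last_mod': 200.0}}
print(cat.last_index)  # 4
```

Tracking the last applied record index makes the update idempotent: if a batch is delivered twice, for example after a restart, the already applied records are simply skipped.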

Policy examples

As mentioned above, all automated HSM data movement is triggered by policies managed by RobinHood. While the concept of policies might sound complicated, policies are written in a very logical syntax that is easy to understand, even for those without Lustre expertise. The example configuration below does the following:
- Files not modified in the last hour are migrated (archived).
- Every 15 minutes, OST usage is checked; if it exceeds 90%, files not accessed within the last 12 hours are released.
- When a file is deleted from Lustre, its copy in the HSM back end is removed after a delay of 24 hours.

```
migration_policies {
    policy default {
        condition { last_mod > 1h }
    }
}

hsm_remove_policy {
    hsm_remove = enabled;
    deferred_remove_delay = 24h;
}

purge_policies {
    policy default {
        condition { last_access > 12h }
    }
}

purge_trigger {
    check_interval = 15min;
    trigger_on = OST_usage;
    high_threshold_pct = 90%;
}
```

Lustre HSM nomenclature

The steps to accomplish these tasks in Lustre are basically the same regardless of system, and are currently referred to as:
- Archive ("copyout"): copies data to the external HSM system. The data is still present on the Lustre file system, but a copy has been made on the HSM side.
- Release: deletes the objects (on the OSTs) that have been copied. The MDT retains the metadata for the file.
- Restore ("copyin"): copies data back when requested by a user. This is triggered by a specific command (pre-stage) or by a cache miss.
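To show how the Archive, Release and Restore operations interact with the example policy, here is a small Python model of a file's HSM state. It is a sketch under stated assumptions: the `HsmFile` class and the `run_policies` function are hypothetical, the caller is assumed to invoke `run_policies` once per 15-minute `check_interval`, and the model ignores `hsm_remove_policy`, re-archiving of files modified after they were archived, and the point at which a real purge run would stop releasing files. It encodes the rules from the configuration: archive files last modified more than an hour ago, and, only when OST usage exceeds 90%, release archived files not accessed for 12 hours.

```python
from dataclasses import dataclass

HOUR = 3600.0  # seconds


@dataclass
class HsmFile:
    """Toy model of one file's HSM state (hypothetical, for illustration)."""
    name: str
    last_mod: float          # seconds
    last_access: float       # seconds
    archived: bool = False   # a copy exists in the HSM back end
    released: bool = False   # OST objects freed; only metadata remains

    def archive(self) -> None:
        # Archive ("copyout"): copy to the back end; data stays on the OSTs.
        self.archived = True

    def release(self) -> None:
        # Release: free the OST objects; only allowed once a copy exists.
        if not self.archived:
            raise RuntimeError(f"{self.name}: cannot release an unarchived file")
        self.released = True

    def restore(self) -> None:
        # Restore ("copyin"): bring the data back, e.g. after a cache miss.
        self.released = False


def run_policies(files, now, ost_usage_pct):
    """One pass of the example policies; returns (migrated, released)."""
    migrated, released = [], []
    # migration_policies: condition { last_mod > 1h }
    for f in files:
        if not f.archived and now - f.last_mod > 1 * HOUR:
            f.archive()
            migrated.append(f.name)
    # purge_trigger: trigger_on = OST_usage; high_threshold_pct = 90%
    if ost_usage_pct > 90.0:
        # purge_policies: condition { last_access > 12h }
        for f in files:
            if f.archived and not f.released and now - f.last_access > 12 * HOUR:
                f.release()
                released.append(f.name)
    return migrated, released


now = 100 * HOUR
files = [
    HsmFile("old.dat", last_mod=now - 48 * HOUR, last_access=now - 24 * HOUR),
    HsmFile("hot.dat", last_mod=now - 2 * HOUR, last_access=now - 0.5 * HOUR),
    HsmFile("new.dat", last_mod=now - 0.25 * HOUR, last_access=now),
]
print(run_policies(files, now, ost_usage_pct=93.0))
# (['old.dat', 'hot.dat'], ['old.dat'])
files[0].restore()  # a read of old.dat misses the cache and triggers a restore
```

The guard in `release` reflects the ordering described above: objects on the OSTs may only be freed once a copy exists on the HSM side, so a release never loses data.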

Schematic data flows within a Lustre HSM solution

To illustrate the inner workings of hierarchical storage management, the segments below show the basic steps of a Lustre Archive/Restore process.

Archiving (aka "Copy Out")

This example illustrates the steps HSM performs to move data out of the file system:
1. The policy engine triggers an event to migrate data.
2. The MDT issues a CopyOut to the coordinator (including FID, extent, etc.).
3. The coordinator issues an HSMCopyOut to the agent.
4. The agent launches a CopyOut command to the mover (CopyTool).
5. The CopyTool keeps the client updated on progress and completion.
6. Upon completion, the agent node notifies the coordinator that all data has been archived.
7. The coordinator records in the metadata that the data has been copied out, by setting what is generally known as a dirty bit.

As previously noted, the process outlined above does not erase the data from the Lustre file system. To free that space, a separate process must run to release the data from the OSTs. It can be triggered by the same policy that triggered the archiving, or by an entirely separate policy based on different criteria, such as "when the file system is more than 80% full, release archived data in chronological order".
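The seven archive steps above can be traced with a small Python simulation in which each component appends a line to a shared event log. The `Metadata` and `EventLog` classes, the `archive_file` function and the message strings are hypothetical; they mirror the step descriptions in this section, whereas the real Lustre components communicate through their own protocols rather than Python calls.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical stand-in for a file's metadata on the MDT."""
    fid: str
    archived: bool = False   # set by the coordinator in step 7


@dataclass
class EventLog:
    entries: list = field(default_factory=list)

    def step(self, n: int, msg: str) -> None:
        self.entries.append(f"{n}. {msg}")


def archive_file(md: Metadata, log: EventLog, chunks: int = 3) -> None:
    """Walk one file through the seven archive steps (toy simulation)."""
    log.step(1, f"policy engine: migrate {md.fid}")
    log.step(2, f"MDT -> coordinator: CopyOut(fid={md.fid})")
    log.step(3, "coordinator -> agent: HSMCopyOut")
    log.step(4, "agent -> CopyTool: CopyOut")
    for i in range(1, chunks + 1):
        # Step 5: the CopyTool reports progress while copying data out.
        log.step(5, f"CopyTool: progress {i}/{chunks}")
    log.step(6, "agent -> coordinator: all data archived")
    md.archived = True
    log.step(7, "coordinator: metadata updated (file marked as archived)")
    # Nothing has been removed from the OSTs; a separate release is needed.


md = Metadata("file-A")
log = EventLog()
archive_file(md, log, chunks=2)
print("\n".join(log.entries))
print(md.archived)  # True
```

Note that the metadata is only updated in step 7, after the agent has confirmed completion, so a file is never marked as archived while its copy is still in flight.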

Summary

As Lustre has been transformed from a national lab supercomputer science project into an enterprise-quality file system, customer requirements have changed accordingly. One of the features most often asked about is automated data movement within multi-tier storage systems. With Lustre 2.5 and its HSM functionality, this goal has now been reached. Although this functionality is new to Lustre, it is important to remember that the French Atomic Energy Commission (CEA), which leads the development of HSM for Lustre, has been running HSM in production for several years within its Tera 1xx systems.

Seagate plans to deliver a fully tested and supported version of Lustre 2.5.x with HSM-ready capability in our ClusterStor engineered solution for HPC and Big Data by early 2015. The exact specification cannot be described at this time, and possible back-end implementations (beyond the obvious HPSS and POSIX CopyTools) will have to be defined at a later stage.

It is interesting to see storage companies such as Seagate, EMC and NetApp, as well as compute vendors such as Cray, Fujitsu and SGI, participating in the development efforts through OpenSFS [6] and European Open File Systems (EOFS) [7], augmenting the national labs and other high-profile end users. This continued support (both financial and through in-house development) is not only critical to the future of Lustre, but also forms the basis of its proven success as an open source file system. This is especially true for enterprise features, which, in addition to the actual code, need to undergo extensive scale testing to prove their reliability and resiliency. Scale testing, which covers both actual scale and capacity and also includes long-term stability testing, is an expensive proposition. While some early adopters are willing to work with code that may not be fully vetted, the commitment of vendors such as Seagate to support extremely large installations will benefit the entire community. And as the list of high-profile users keeps growing, the effort devoted to developing more enterprise features increases proportionally.

About Seagate Cloud Systems and Solutions

Seagate is a world leader in storage solutions. Our new Cloud Systems and Solutions strategy brings innovation and an open approach to Intelligent Information Infrastructure to help all organizations manage their next-generation workloads with scale, performance, and cost aligned to business needs. Our portfolio includes integrated high-performance computing solutions; do-it-yourself components and engineered solutions; custom, modularized systems for original equipment manufacturers (OEMs); and the EVault line of cloud backup and restore, disaster recovery, and rapid archive services.

Next Step

Find out more about the Seagate ClusterStor line of HPC storage systems by calling 1.800.SEAGATE or visiting www.seagate.com/hpc

[6] http://opensfs.org/participants/
[7] European Open File Systems http://www.eofs.org

seagate.com
AMERICAS Seagate Technology LLC 10200 South De Anza Boulevard, Cupertino, California 95014, United States, 408-658-1000
ASIA/PACIFIC Seagate Singapore International Headquarters Pte. Ltd. 7000 Ang Mo Kio Avenue 5, Singapore 569877, 65-6485-3888
EUROPE, MIDDLE EAST AND AFRICA Seagate Technology SAS 16-18, rue du Dôme, 92100 Boulogne-Billancourt, France, 33 1-4186 10 00
© 2015 Seagate Technology LLC. All rights reserved. Printed in USA. Seagate, Seagate Technology and the Wave logo are registered trademarks of Seagate Technology LLC in the United States and/or other countries. QuietStep is either a trademark or registered trademark of Seagate Technology LLC or one of its affiliated companies in the United States and/or other countries.
All other trademarks or registered trademarks are the property of their respective owners. When referring to drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes. Your computer's operating system may use a different standard of measurement and report a lower capacity. In addition, some of the listed capacity is used for formatting and other functions, and thus will not be available for data storage. Actual data rates may vary depending on operating environment and other factors. Seagate reserves the right to change, without notice, product offerings or specifications. WP_Inside_LustreHSM_US January 2015