The IBM System Storage N series


The IBM System Storage N series

Fundamentals of Data ONTAP
Snapshot Explained
IBM Network Attached Storage (NAS)

Alex Osuna
Bruce Clarke
Kurt Kiefer
Dirk Peitzmann

ibm.com/redbooks


International Technical Support Organization

The IBM TotalStorage Network Attached Storage N Series

September 2005

Note: Before using this information and the product it supports, read the information in Notices.

First Edition (November 2005)

This edition applies to the IBM System Storage N series products as of the publication release date.

Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Preface
The team that wrote this redbook
Become a published author
Comments welcome

Notices
Trademarks

Part 1. Introduction to the IBM System Storage N series
Chapter 1. Introduction to System Storage N series

Part 2. N series Systems Concepts
Chapter 2. Introduction to Data ONTAP
Chapter 3. Write Anywhere File Layout (WAFL)
Chapter 4. N series Data Protection with RAID-DP
Chapter 5. Snapshots
Chapter 6. Recovery with SnapRestore

Part 3. Software Technical Description
Chapter 7. Introduction to FlexVol
Chapter 8. A Thorough Introduction to FlexClone Volumes
Chapter 9. An Introduction to FlexCache Volumes
Chapter 10. Data Permanence with SnapLock
Chapter 11. Enabling Rapid Recovery with SnapVault
Chapter 12. LockVault Explained
Chapter 13. Overview of SnapDrive
Chapter 14. SnapMirror
Chapter 15. SyncMirror

Part 4. Filer Access Methods
Chapter 16. Cluster Failover
Chapter 17. Multiprotocol Data Access: NFS, CIFS, and HTTP

Part 5. Initial Installation and Setup
Chapter 18. Single-node Setup
Chapter 19. Cluster Setup
Chapter 20. Subsequent Setup

Part 6. Client Side Attachment
Chapter 1. Client Attachment

Part 1. N series Filer Administration
Chapter 2. N series Filer Administration

Part 2. Sizing

Part 3. Appendices
Chapter 3. Pre-installation Planning

Appendix A. Setup Worksheets and Cabling Diagrams
Appendix B. LAN Basics
Appendix C. Additional Material

Abbreviations and acronyms
Related publications
Index



Preface

Corporate workgroups, distributed enterprises, and small- to medium-sized companies are increasingly seeking to network and consolidate storage to improve availability, share information, reduce costs, and protect and secure information. These organizations require enterprise-class solutions capable of addressing immediate storage needs cost-effectively, while providing an upgrade path for future requirements. Ideally, IT managers would like a maximum degree of flexibility to design the architecture that best supports the requirements of multiple types of data and a broad range of applications.

The IBM System Storage N series is designed to meet these requirements. The System Storage N series offers an excellent solution for a broad range of deployment scenarios. The N series supports Ethernet environments, enabling economical NAS and iSCSI deployments. The N series system functions as a unified engine, designed to let you serve both file and block-level data across a single network simultaneously, a task that with some other solutions requires multiple, separately managed systems.

The flexibility of the N series allows it to address the storage needs of a wide range of organizations, including distributed enterprises and data centers for midrange enterprises. The N series also supports sites with compute- and data-intensive enterprise applications such as database, data warehousing, workgroup collaboration, and messaging.

This redbook will help you understand the basic hardware and software features of the System Storage N series. In addition, topics such as installation, setup, and administration, from both the N series filer itself and the clients, are discussed.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Figure 0-1 From top: Dirk Peitzmann, Alex Osuna; from bottom: Bruce Clarke, Kurt Kiefer

Alex Osuna is a Project Leader at the International Technical Support Organization, San Jose Center. He has been with IBM for 24 years and has 26 years in the IT industry. Of those 26 years, Alex has worked extensively in storage for 21 years in service, support, planning, early ship programs, performance analysis, and education; he has published flashes and provided pre-sales and post-sales support.

Bruce Clarke is Employee #7 at Network Appliance and currently manages the Network Appliance Technical Marketing team, which focuses on products and technologies created by Network Appliance Engineering. This focus takes the form of analysis and engineering that leads to the development of technical white papers, best practice documentation, demonstrations, and presentations given to customers and provided to the technical field organizations. Subject matter experts on his team (located in Sunnyvale and in RTP) participate in engineering product definitions and reviews as well as customer escalation activities in their areas of expertise.

Kurt Kiefer is an IT Specialist within Techline, located in Greenock, UK. He has 5 years of experience at Techline, currently working with xSeries, and has worked with IBM for 7 years. His areas of expertise include DCV, e1350, SAP, and Storage.

Dirk Peitzmann is a Senior IT Specialist with IBM Systems Sales in Munich, Germany. He has ten years of experience providing technical presales and postsales solutions for IBM pSeries (RS/6000) and IBM System Storage disk, SAN, and NAS. Dirk is a Certified Specialist for pSeries AIX System Administration, an AIX System Support Specialist, certified in Open Systems Storage Solutions, and an IBM Tivoli Storage Manager Consultant. He holds a Diplom-Ingenieur (FH) degree in Computer Sciences from the University of Applied Science in Isny, Germany.

Thanks to the following people for their contributions to this project:

Roland Tretau
International Technical Support Organization, San Jose Center

Network Appliance Inc.
Dave Hitz
James Lau
Michael Malcolm
Chris Lueth
Jim Lanson
Michael J. Marchi
Darrin Chapman
Christian D. Odhner
Sandeep Cariapa
Nicholas Wilhelm-Olsen
Andy Watson
Lisa Dorr

IBM Technical Support Marketing
Tom Beglin

IBM Development
John Foley

IBM Marketing
Wolfgang Singer
Member of the Technical Experts Council (TEC)

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us! We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

Use the online Contact us review redbook form found at:

ibm.com/redbooks

Send your comments in an e-mail to:

Mail your comments to:

IBM Corporation, International Technical Support Organization
Dept. QXXE Building 80-E2
650 Harry Road
San Jose, California

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.

Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Eserver
Redbooks (logo)
pSeries
xSeries
AIX
DB2
DFS
ESCON
FICON
Hummingbird
IBM
Maestro
Redbooks
RS/6000
SLC
Tivoli
TotalStorage

The following terms are trademarks of other companies:

VFM, Virtual File Manager, Snapshot, SnapDrive, SecureAdmin, Data ONTAP, Network Appliance, WAFL, SyncMirror, SnapVault, SnapRestore, SnapMover, SnapMirror, SnapManager, FilerView, DataFabric, NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp Corporation or its subsidiaries in the United States, other countries, or both.

Java, JavaScript, Solaris, Sun, Sun Microsystems, SLC, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Active Directory, Microsoft Internet Explorer, Microsoft, Windows NT, Windows Server, Windows, Win32, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, the Intel logo, the Intel Inside logo, and the Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Part 1. Introduction to the IBM System Storage N series

In this part we discuss the System Storage N series and give an introduction to its features:

Fast data access
Extremely low maintenance requirements
Integration of storage and storage processing into a single unit, facilitating affordable network deployments
Integrated I/O
High availability via clustering
Fibre Channel disk drives

System Storage N series models are designed to integrate easily into existing IT environments to deliver unified storage for organizations with NAS, iSCSI, or combined environments, making enterprise-level storage a realistic goal for company sites regardless of size or staffing.


Chapter 1. Introduction to System Storage N series

In this chapter we introduce the System Storage N series and describe its hardware and software. The reader is also introduced to storage architectures, file systems, and Local Area Network concepts.

IBM first introduced integrated NAS disk appliances in October 2000 with the IBM eServer xSeries 150 range of network attached storage, with the NAS100, NAS200, NAS300, and NAS500 following. This evolution has continued with the introduction of the IBM System Storage N series. This system is an entry-level IP-attached storage product that acts as a dual-protocol platform, providing both NAS and iSCSI solutions for customers who want to access shared storage from Windows, UNIX, and Linux environments.

The IBM System Storage N series provides a range of reliable, scalable storage solutions for a variety of storage requirements. This is achieved by using network access protocols such as NFS, CIFS, and HTTP, as well as storage area technologies such as iSCSI and Fibre Channel. Using the built-in RAID technologies (either RAID-DP or RAID4, which are fully described in a later chapter), all data is well protected, with options to add further protection through mirroring, replication, Snapshots, and backup. These storage systems are also characterized by simple management interfaces that make installation, administration, and troubleshooting uncomplicated and straightforward.

The IBM System Storage N series is designed from the ground up as a standalone storage appliance. The appliance concept has caught on, with many specialized tasks previously relegated to general-purpose platforms now performed by specialized appliances. By focusing on effectively addressing a smaller, more specific need, an appliance can do its job faster, in a simpler manner, and with greater reliability than a general-purpose platform could deliver.

Advantages of using this type of flexible storage solution include:

The ability to tune the storage environment to a specific application while maintaining the flexibility to increase, decrease, or change access methods with a minimum of disruption.

The capability to react easily and quickly to changing storage requirements. If additional storage is required, it can be expanded quickly and non-disruptively.
The capability, when existing storage is deployed incorrectly, to reallocate available storage from one application to another quickly and simply.
The ability to maintain availability and productivity during upgrades. If outages are required, they can be kept to the shortest time possible.
Effortless backup and recovery solutions that operate commonly across all data access methods.
File and block-level services in a single system, helping to simplify your infrastructure.

This chapter brings together practical material that can be used by IT managers and storage administrators to deploy this sophisticated storage solution in a way consistent with their specific storage requirements. Information is provided on planning, installing, and administering the N series System Storage.

1.1 System Storage N series Hardware Introduction

TotalStorage N3700

The N3700 filer is a 3U solution designed to provide NAS and iSCSI functionality for entry to mid-range environments. The basic N3700 offering is a single-node model A10, which is upgradeable to the dual-node model A20 and requires no additional rack space. The dual-node, clustered A20 is designed to support failover and failback functions to maximize reliability.

The N3700 filer can support 14 internal hot-plug disk drives, with scalability provided through attachment of up to three 3U EXP600 expansion units, each with a maximum of 14 drives. The N3700 also has the capability to connect to a Fibre Channel tape drive for backup. For a list of supported tape drives, refer to the TotalStorage N series interoperability matrix.

Figure 1-1 N3700

The N3700 and EXP600 share a common 3U chassis, with the type of controller defining the model. Figure 1-2 shows a single control unit. The single-node A10 uses one control unit, while the dual-node clustered A20 uses two control units.

Note: The only upgrade for an N3700 model A10 is to a model A20. This upgrade requires no additional rack space.

The difference between an EXP600 and the N3700 is the presence of the CPU module, as shown in Figure 1-2. The EXP600 has a shelf controller in place of the CPU module.

Figure 1-2 CPU tray module front view (battery backup memory)

The N3700 is based around a MIPS dual-core processor. It has 1 GB of system memory, of which 128 MB is defined as non-volatile because it has a battery backup. The battery is a 3-cell Li-Ion unit, shown in Figure 1-3.

Note: The NVRAM is a battery-backed portion of the main system memory.

Figure 1-3 CPU tray module showing the battery backup for memory

There is a 256 MB CompactFlash card located on the bottom of the CPU tray module, as shown in Figure 1-4. It contains a copy of the Data ONTAP operating system along with firmware. The operating system is also stored on each disk drive.

Figure 1-4 Bottom of CPU tray module showing the CompactFlash card

The rear ports of the N3700 can be seen in Figure 1-5. Each CPU tray module has two integrated 2 Gbps Fibre Channel ports. Both of these ports are initially configured in initiator mode and do not use Small Form-factor Pluggable (SFP) modules.

The first port, channel C, is optical and intended for direct or SAN attachment to a tape library. A standard LC-LC shortwave optical cable should be used for this.

The second port, channel B, is copper and is used exclusively for connection of EXP600 expansion units. A special copper cable (option X6531-C) is used for this connection.

Figure 1-5 External ports on CPU tray module

The CPU tray module also contains two onboard 10/100/1000 Mb copper Ethernet ports. Each port has two LED lights, one for activity and one for speed. The final connector allows attachment of an ASCII terminal via an RJ45 to DB-9 cable.

EXP600

The EXP600 is identical to the N3700 chassis except that the slot holding the CPU tray is replaced with an Electronically Switched Hub (ESH2). The ESH2 provides a point-to-point connection to the drives rather than the traditional arbitrated loop, as illustrated in Figure 1-6.

Figure 1-6 Schematic of arbitrated loop vs. switched hub

The switched hub architecture has the benefit of additional availability, boosted performance in high-I/O environments, and more powerful diagnostic abilities. Figure 1-7 shows the ESH2

module. There are two Fibre Channel ports on each module. The PS/2 port is for IBM service only and provides no functionality to the user. The units have LED status lights that indicate speed and fault status, and they are hot-swappable, allowing maximum availability.

Figure 1-7 External ports on ESH2 module

Common components

Both the N3700 and EXP600 have redundant power supplies and cooling fans as standard. The two power supplies are located at the back of the chassis, with the cooling fans integrated into them.

Note: If a power supply fails or is turned off while the other power supply is still providing DC power, both cooling fans will continue to operate.

Figure 1-8 Rear view of N3700 (power supplies and power supply LEDs)

The LED status lights indicate basic power supply fault conditions, as shown in Figure 1-9. Match the power supply LEDs against the possible conditions listed in the figure and perform the action given in its key section.

Figure 1-9 Fault codes for power supply

1.2 N3700 product highlights

Operating system
Data ONTAP

Standard software features
Integrated automatic RAID manager
Snapshot
Fastboot
telnet
E-mail alerts
NIS
DNS
SNMP
FilerView
NDMP
SecureAdmin
FlexVol
FlexCache

Network protocol support
NFS V2/V3/V4 over TCP or UDP
PCNFSD V1/V2 for (PC)NFS client authentication
Microsoft CIFS
VLD
HTTP 1.0
HTTP 1.1 Virtual Hosts

SAN protocol support
Fibre Channel Protocol (FCP) for SCSI; fabric-attached and direct-attached
iSCSI

Licensed software products
FlexClone
SnapManager for Microsoft Exchange 2000 & 2003
SnapManager for Microsoft SQL Server
SnapManager for Oracle
SnapMirror
SnapVault
SnapRestore
SnapDrive

SnapLock
Cluster Failover
MultiStore
Virtual File Manager (VFM)
DataFabric Manager
SAN Manager
ApplianceWatch
SnapValidator

Hardware features
3U integrated filer
3U optional storage expansion shelf - up to three
Redundant hot-plug power supplies
Redundant cooling
Integrated 10/100/1000 full-duplex Ethernet
2 integrated Fibre Channel adapters
CompactFlash
Diagnostic LEDs/ops

Optional hardware
Second CPU tray supporting Cluster Failover

RAID group size (Default/Minimum/Maximum) - a short command sketch follows Table 1-2
RAID4: Default 6 data + 1 parity / Min 1 data + 1 parity / Max 13 data + 1 parity
RAID-DP: Default 14 data + 2 parity / Min 1 data + 2 parity / Max 26 data + 2 parity

Disk drive capacities supported (Fibre Channel)
72 GB / 144 GB / 300 GB

Table 1-1 Filer specifications
Filer Specifications               N3700 Model A10    N3700 Model A20
Maximum raw capacity               16 TB              16 TB
Maximum number of disks
Volumes and RAID groups (Min/Max)  1/7                1/7
Maximum volume/aggregate           16 TB              16 TB
ECC memory                         1 GB               2 GB
Nonvolatile memory                 128 MB             256 MB
Integrated I/O                     Ethernet 10/100/1000 copper
                                   Copper FC adapter
                                   Optical FC adapter (host attach SAN/tape SAN)

Clustered failover-capable         No (requires upgrade to A20)    Yes

Table 1-2 Storage expansion
Disk drive storage system      EXP600: 14 low-profile slots for FC disk drives
Disk drive storage interface   Fibre Channel Arbitrated Loop (FC-AL)
Power supply/cooling fans      Dual, redundant, hot-pluggable, integrated power supply/fan assemblies
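As a brief illustration of how the RAID group defaults above surface in practice, the following Data ONTAP console sketch creates an aggregate with RAID-DP and then verifies the resulting RAID layout. This is a minimal sketch only: the aggregate name, RAID group size, and disk count are hypothetical, the exact option syntax can vary by Data ONTAP release, and the aggr and sysconfig commands are covered in more detail in the administration chapter later in this book.

   # Create an aggregate named aggr1 using RAID-DP, with a RAID group
   # size of 16 (14 data + 2 parity) from 16 disks
   aggr create aggr1 -t raid_dp -r 16 16

   # Confirm the RAID type and state of the new aggregate
   aggr status aggr1

   # List the RAID groups and the data/parity role of each disk
   sysconfig -r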

43 Draft Document for Review December 6, :59 am 7129ch01_chap1.fm 1.3 N5200 IBM TotalStorage IBM CONFIDENTIAL 2004 IBM Corporation. Figure 1-10 N5200 Chapter 1. Introduction to System Storage N Series 13

44 7129ch01_chap1.fm Draft Document for Review December 6, :59 am 1.4 Application Software Overview The following tables list briefly the software available with the N series System Storage. A detailed description of N series software is covered in Part 2, N series Systems Concepts on page 29 and Part 3, Software Technical Description on page Software Included in Base System Data ONTAP iscsi Operating system software that optimizes data serving and allows multiple protocol data access Allows block I/O access protocol over IP networks FTP SnapShot File Transfer Protocol (FTP), a standard Internet protocol is a simple way to exchange files between computers on the Internet Enables online backups, providing near instantaneous access to previous versions of data without requiring complete, separate copies FlexVol FlexVol creates multiple flexible volume on a large pool of disks. Dynamic, non-disruptive (thin) storage provisioning; space- and time-efficiency FlexCache Disk Sanitization FlexCache has the ability to distribute files to remote locations without the need for continuous hands-on management. Filers deployed in remote offices automatically replicate, store, and serve the files or file portions that are requested by remote users without the need for any replication software or scripts Disk sanitization is the process of physically obliterating data by overwriting disks with specified byte patterns or random data in a manner that prevents recovery of current data by any known recovery methods. This feature enables you to carry out disk sanitization by using three successive byte overwrite patterns per cycle and a default six cycles per operation FilerView A web-based administration tool that allows IT administrators to fully manage N3700 systems from remote locations. Simple and intuitive web-based single-appliance administration SnapMover Migrates data among N3700 clusters with no impact on data availability and no disruption to users AutoSupport AutoSupport is a sophisticated, event-driven logging agent featured in the Data ONTAP operating software and inside each N series system and continuously monitors the health of your system and issues alerts if a problem is detected. SecureAdmin SecureAdmin is a Data ONTAP module that enables authenticated, command-based administrative sessions between an administrative user and Data ONTAP over an intranet or the Internet. 14 The IBM TotalStorage Network Attached Storage N Series

45 Draft Document for Review December 6, :59 am 7129ch01_chap1.fm Optional Software Snapshot CIFS NFS Provides File System access for Microsoft Windows environments Provides File System access for Unix and Linux environments HTTP Hypertext Transfer Protocol allows a user to transfer displayable Web pages and related files Cluster Failover Ensures high data availability for business-critical requirements by eliminating a single point of failure. Must be ordered for A20 clustered configurations or upgrades from A10 to A20 Active-active pairing delivers even more nines to right of the decimal point FlexClone Designed to provide instant replication of data volumes/sets without requiring additional storage space at the time of creation Multistore Permits an enterprise to consolidate a large number of Windows, Linux or UNIX file servers onto a single storage system Many virtual filers on one physical appliance ease migration and multi-domain failover scenarios SnapLock Provides non-erasable and non-rewritable data protection that helps enable compliance with government and industry records retention regulations LockVault Designed to provide non-erasable and non-rewritable copies of Snapshot data to help meet regulatory compliance needs for maintaining backup copies of unstructured data. SnapMirror Remote mirroring software that provides automatic block-level incremental file system replication between sites. Available in synchronous, asynchronous and semi synchronous modes of operation SnapRestore SnapVault Allows rapid restoration of the file system to an earlier point in time, typically in only a few seconds Provide disk based backup for N3700 systems by periodically backing up a snapshot copy to another system Chapter 1. Introduction to System Storage N Series 15

46 7129ch01_chap1.fm Draft Document for Review December 6, :59 am 1.5 Storage Architecture Overview Direct Attached Storage (DAS) DAS has historically been the conventional way to attach local storage to a system. With a dedicated interface between the server and storage, frequently being SCSI, it communicates on a block level. The file system resides locally on the server and data blocks are accessed as required to complete file requests from the application Windows Client UNIX Client LAN HPUX Client AIX Sun Solaris Client IBM AIX Client LAN Server Server Server Application File System Application File System Application File System Disk Storage Disk Storage Disk Storage Network Attached Storage (NAS) Traditionally NAS systems differed from DAS and SAN with the client seeing storage at a file level with network file systems like NFS and CIFS being employed. However the introduction of iscsi has allowed storage to be presented at a block level. 16 The IBM TotalStorage Network Attached Storage N Series

47 Draft Document for Review December 6, :59 am 7129ch01_chap1.fm Windows Client UNIX Client LAN HPUX Client AIX Sun Solaris Client IBM AIX Client Server Application Server NAS Filer Application LAN File System Disk Storage Server Application Figure 1-11 Network Attached Storage Chapter 1. Introduction to System Storage N Series 17

48 7129ch01_chap1.fm Draft Document for Review December 6, :59 am Storage Area Network SAN is a dedicated network devoted to connecting servers to storage devices. SANs use specialized infrastructure and protocols such as Fibre Channel, FICON & ESCON. It operates on a block level, similar to DAS, with the file system residing on the server. SAN operates within a limited geography with distances of 10KM. Solutions are available for channel extension over these distances to a Windows Client UNIX Client LAN HPUX Client AIX Sun Solaris Client IBM AIX Client LAN Server Server Server Application Application Application File System File System File System SAN Storage Subsystem Back Data LUN Figure The IBM TotalStorage Network Attached Storage N Series

49 Draft Document for Review December 6, :59 am 7129ch01_chap1.fm NAS Gateway The NAS Gateway offers the same functionality as NAS appliances with file level and iscsi block level access but adds Fibre Channel functionality. This combination uniquely positions the NAS Gateways in todays open storage environments allowing flexibility, scalability and manageability Windows Client UNIX Client HPUX Client Sun Solaris Client IBM AIX Client Server NAS Fiiler Application LAN File System File System Data Server Server NAS Gateway Application Application File System Data SAN Back Data Figure 1-13 NAS gateway Chapter 1. Introduction to System Storage N Series 19

50 7129ch01_chap1.fm Draft Document for Review December 6, :59 am 1.6 LAN Basics Local Area Networks A Local Area Network (LAN) is simply the connection of two or more computers (nodes) to facilitate data and resource sharing. They proliferated from the mid-1980s to address the problem of islands of information which occurred with standalone computers within departments and enterprises. LANs typically reside in a single or multiple buildings confined to a limited geographic area which is spanned by connecting two or more LANs together to form a Wide Area Network (WAN). More detail on Local Area Networks can be found in Appendix B, LAN Basics on page File systems and I/O In this section we describe the most common file level protocols and attempt to untangle the confusion surrounding the various I/O concepts Network file system protocols The two most common file level protocols used to share files across networks are Network File System (NFS) for UNIX/Linux and Common Internet File System (CIFS) for Windows. Both are network based client/server protocols which enable hosts to share resources across a network using TCP/IP. Users manipulate shared files, directories, and devices such as printers, as if they were locally on or attached to the user s own computer. The IBM System Storage N3700 is designed to support both NFS and CIFS. Network File System (NFS) NFS servers make their file systems available to other systems in the network by exporting directories and files over the network. Once exported, an NFS client can then mount a remote file system from the exported directory location. Originally NFS controlled access by giving client-system level user authorization based on the assumption that a user who is authorized to the system must be trustworthy. Although this type of security was adequate for some environments, it was open to abuse by anyone who can access a UNIX system via the network. Recent revisions to the NFS specifications have tightened up these access rules and improved overall security. For directory and file level security, NFS uses the UNIX concept of file permissions with User (the owner s ID), Group (a set of users sharing a common ID), and Other (meaning all other user IDs). For every NFS request, the IDs are verified against the UNIX file permissions. Additionally, the access is generally granted to a specific set of NFS clients (e.g., all the workstations in the software engineering department) and these client credentials are also included in the security validation applied to each NFS request. NFS is a stateless service. This means that each NFS request is complete and does not depend on previous or upcoming requests. This means that network failures have minimal impact because the NFS client will simply keep trying to contact the server with its request until the service is returned. When the session is re-established the two can immediately continue to work together again. 20 The IBM TotalStorage Network Attached Storage N Series

51 Draft Document for Review December 6, :59 am 7129ch01_chap1.fm NFS handles file locking by providing an advisory lock to subsequent applications to inform them that the file is in use by another application. Other applications can decide if they want to abide by the lock request or not. This has the advantage of allowing any UNIX application to access any file at any time even if it is in use. The system relies on good neighbor responsibility which, though often convenient, clearly is not foolproof. This is avoided by using the optional Network Lock Manager (NLM). It provides file locking support to prevent multiple instances of open files NFS has gone through several upgrades since it original introduction. Version 4 is now in the process of adoption and fixes many of the limitations that previously existed. Access Control Lists are now supported along with generally stronger authentication protocols. The locking issues have further been strengthened. Several vendors are now shipping NFS Version 4 client solutions. The IBM N Series already provides server capabilities for NFS V4. Common Internet File System (CIFS) Another method used to share resources across a network uses CIFS, which is a protocol based on Microsoft s Server Message Block (SMB) protocol. Using CIFS, servers create file shares which are accessible by authorized clients. Clients subsequently connect to the server s shares to gain access to the resource. Security is controlled at both the user and share level. Client authentication information is sent to the server before the server will grant access. CIFS uses access control lists that are associated with the shares, directories, and files, and authentication is required for access. A session in CIFS is oriented and stateful. This means that both client and server share a history of what is happening during a session, and they are aware of the activities occurring. If there is a problem, and the session has to be re-initiated, a new authentication process must be completed. CIFS employs opportunistic locks (oplocks) to control file access. Depending on the type of locking mechanism required by the client, CIFS offers nodes the ability to cache read or write data from the file being accessed to improve network performance. Exclusive rights to the file prevents other nodes on the network from gaining access to that file until it is closed. During a CIFS session the lock manager has historical information concerning which client has opened the file, for what purpose, and in which sequence Understanding I/O A major source of confusion regarding NAS is the concept of File I/O versus Block I/O. We try to shed a little light on this subject here. Understanding the difference between these two forms of data access is crucial to realizing the potential benefits of any SAN-based or NAS-based solution and choosing the correct configuration for an N Series server. When a partition on a hard drive is under the control of an operating system (OS), the OS will format it. Formatting of the partition occurs when the OS lays a file system structure on the partition. This file system is what enables the OS to keep track of where it stores data. The file system is an addressing scheme the OS uses to map data on the partition. Now, when you want to get to a piece of data on that partition, you must request the data from the OS that controls it. For example, suppose that Windows 2000 formats a partition (or drive) and maps that partition to your system. 
Understanding I/O
A major source of confusion regarding NAS is the concept of File I/O versus Block I/O. We try to shed a little light on this subject here. Understanding the difference between these two forms of data access is crucial to realizing the potential benefits of any SAN-based or NAS-based solution and to choosing the correct configuration for an N series server.

When a partition on a hard drive is under the control of an operating system (OS), the OS formats it. Formatting occurs when the OS lays a file system structure on the partition. This file system is what enables the OS to keep track of where it stores data; it is an addressing scheme the OS uses to map data on the partition. When you want to get to a piece of data on that partition, you must request the data from the OS that controls it. For example, suppose that Windows 2000 formats a partition (or drive) and maps that partition to your system. Every time you request data on that partition, your request is processed by Windows 2000. Because there is a file system on the partition, it is accessed via File I/O. Additionally, you cannot request access to just the last 10 KB of a file; you must open the entire file, which is another reason that this method is referred to as File I/O.

Block I/O (raw disk) is handled differently: there is no OS format done to lay out a file system on the partition. The addressing scheme that keeps track of where data is stored is provided by the application using the partition. An example of this would be DB2 using its tables to keep track of where data is located, rather than letting the OS do that job. That is not to say that DB2 cannot use the OS to keep track of where files are stored; it is simply more efficient for the database to bypass the cost of asking the OS to do that work.

Using File I/O is like using an accountant. Accountants are good at keeping up with your money for you, but they charge you for that service. For your personal checkbook, you probably want to avoid that cost. On the other hand, for a corporation where many different kinds of requests are made, an accountant is a good idea; that way, checks are not written when they should not be. When sharing files across a network, something needs to control when writes can be done. The operating system fills this role: it does not allow multiple writes at the same time, even though many write requests are made. Databases are able to control this writing function on their own, so in general they run faster by skipping the OS, although this depends on the efficiency of the file system and database implementations.

Examples of applications using File I/O:
- Lotus Notes
- Lotus Domino Server
- Lotus Approach
- PowerPoint
- MS Word
- MS Excel
- Freelance
- Word Pro

All File I/O results at the lower layers in Block I/O commands. In other words, iSCSI devices indirectly also support File I/O applications. In this case it should be noted that the visibility of the files is lost, since the iSCSI device knows nothing about the files, only the raw I/O. So in many cases NAS devices should be primarily considered for File I/O applications, although File I/O applications can also be supported by iSCSI devices.

Typical applications using Block I/O:
- DB2
- Oracle
- Microsoft Exchange
- Informix
- Video streaming
- ERP applications

Many Block I/O applications can be configured so that they can also run in File I/O mode. However, the main reason for running them in Block I/O mode is performance. When an application uses the operating system's File I/O, which is a higher-layer protocol, the overhead is likely to be higher. By bypassing the operating system's File I/O, the designers of these applications gain better control over how the data is written and organized on disk. This can be compared to programming in Assembler language (where you know exactly what happens when an Assembler instruction is executed) versus programming in PL/I or Visual Basic, which is much more comfortable to use than Assembler but has higher overhead.
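The distinction can be made concrete with a short Python sketch. This is purely illustrative: the file and device paths are hypothetical, and opening a raw device requires appropriate privileges. The first access goes through the file system; the second addresses raw blocks directly, so the application itself must know what lives at which offset:

```python
import os

# File I/O: the OS file system resolves the name and the offset.
with open("/data/report.db", "rb") as f:   # hypothetical file
    f.seek(1000)              # "go to byte 1000 in the file"
    payload = f.read(256)     # read the next 256 bytes

# Block I/O: the application addresses the raw partition itself.
# No file system is involved; the application's own metadata
# (like DB2's tables) must record that "block 42 holds the data".
BLOCK_SIZE = 4096
fd = os.open("/dev/sdb1", os.O_RDONLY)     # hypothetical raw device
os.lseek(fd, 42 * BLOCK_SIZE, os.SEEK_SET)   # seek to block 42
block = os.read(fd, BLOCK_SIZE)              # read one raw block
os.close(fd)
```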

Assembler was still used after PL/I became available, in environments where storage and CPU utilization were at a premium. Under normal circumstances, nobody writes application programs in Assembler anymore, since the cost of writing in Assembler is much higher than the cost of the "wasted" storage and CPU power. A similar evolution can be expected in the storage arena, where application developers will leave the lower-layer functionality to the operating systems, especially as new storage technologies come along. For a more in-depth study of these topics, refer to the redbook IP Storage Networking: IBM NAS and iSCSI Solutions.

1.8 Network Attached Storage (NAS)

Storage devices that optimize the concept of file sharing across the network have come to be known as Network Attached Storage (NAS). NAS solutions utilize the mature Ethernet IP network technology of the LAN. Data is sent to and from NAS devices over the LAN using TCP/IP and sometimes UDP/IP. By making storage devices LAN addressable, the storage is freed from its direct attachment to a specific server, and any-to-any connectivity is facilitated using the LAN fabric. In principle, any user running any operating system can access files on the remote storage device. This is done by means of a common network access protocol, for example, NFS for UNIX servers and CIFS for Windows servers. Network file sharing protocols have been developed so that all vendors must implement them the same way. This allows any vendor's operating system to access network files regardless of which vendor implemented the client or the server. These standards are key to the success of NAS and give customers greater independence, since they can choose platforms appropriate to a particular task, free from worry about how that task will access the required data.

A storage device cannot simply attach to a LAN. It needs intelligence to manage the transfer and the organization of data on the device. That intelligence is provided by a dedicated server to which the common storage is attached. It is important to understand this concept: NAS comprises a server, an operating system, plus storage which is shared across the network by many other servers and clients. So NAS is a device, rather than a network infrastructure, and the shared storage is either internal to the NAS device or attached to it.

File servers
Early NAS implementations in the early 1990s used a standard UNIX or NT server with NFS or CIFS software to operate as a remote file server. In such implementations, clients and other application servers access the files stored on the remote file server as though the files were located on their local disks. The location of the file is transparent to the user.

Several hundred users could work on information stored on the file server, each one unaware that the data is located on another system. The file server has to manage I/O requests accurately, queuing as necessary, fulfilling each request and returning the information to the correct requestor. The NAS server handles all aspects of security and lock management: if one user has a file open for updating, no one else can update the file until it is released. The file server keeps track of connected clients by means of their network IDs, addresses, user IDs, and so on.

Designated Network Attached Storage
More recent developments use application-specific, specialized, thin server configurations with customized operating systems, usually comprising a stripped-down UNIX kernel, a reduced Linux OS, a specialized Windows 2000 kernel, or a specialized AIX/UNIX system, as with the System Storage NAS products. In these reduced operating systems, many of the server operating system functions are not supported. The objective is to improve performance and reduce costs by eliminating unnecessary functions normally found in standard hardware and software. Some NAS implementations also employ specialized data mover engines and separate interface processors to further boost performance.

These specialized file servers with a reduced OS are typically known as NAS systems, describing the concept of an application-specific system. NAS products, like the System Storage N3700, typically come with pre-configured software and hardware, and with no monitor or keyboard for user access. This is commonly termed a headless system; a storage administrator accesses the systems and manages the disk resources from a remote console. One of the typical characteristics of a NAS product is its ability to be installed rapidly, using minimal time and effort to configure the system, and to be integrated seamlessly into the network, as shown in Figure 1-14. This approach makes NAS products especially attractive when lack of time and skills are elements in the decision process.

Figure 1-14 The role of NAS in your storage network

So, a NAS system is an easy-to-use device designed for a specific function, such as serving files to be shared among multiple clients, and it performs this task very well. It is important to recognize this fact when selecting a NAS solution. The NAS system is not a general-purpose server, and should not be used (indeed, due to its specialized OS, probably cannot be used) for general-purpose server tasks. However, it does provide a good solution for appropriately selected shared storage applications.

NAS uses File I/O
One of the key differences between a NAS disk device and direct attached storage (DAS) is that all I/O operations use file-level I/O protocols. File I/O is a high-level type of request that, in essence, specifies only the file to be accessed; it does not directly address the storage device. That is done later by other operating system functions in the remote NAS system. A File I/O request specifies the file and the offset into the file. For instance, the I/O may specify "Go to byte 1000 in the file (as if the file were a set of contiguous bytes), and read the next 256 bytes beginning at that position." Unlike Block I/O, there is no awareness of a disk volume or disk sectors in a File I/O request.

Inside the NAS system, the operating system keeps track of where files are located on disk and issues Block I/O requests to the disks to fulfill the File I/O read and write requests it receives. The network access methods, NFS and CIFS, can only handle File I/O requests to the remote file system. I/O requests are packaged by the node initiating the I/O request into packets to move across the network. The remote NAS file system converts each request to Block I/O and reads or writes the data to the NAS disk storage. To return data to the requesting client application, the NAS system software re-packages the data in TCP/IP protocols to move it back across the network. This is illustrated in Figure 1-15, and sketched in code after the figure caption.

Figure 1-15 NAS devices use File I/O: the application server directs a File I/O request over the LAN to the remote file system in the NAS system, and the file system in the NAS system initiates Block I/O to the NAS disk
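The division of labor in Figure 1-15 can be sketched in a few lines of Python. This is a toy model, not N series code: the block map, block size, and request format are all invented for illustration.

```python
BLOCK_SIZE = 4096

# Toy NAS server: the client sends only (file, offset, length);
# the NAS operating system translates that File I/O request
# into Block I/O against its own disks.
class ToyNasServer:
    def __init__(self, disk, file_block_map):
        self.disk = disk                  # raw block storage
        self.block_map = file_block_map   # file name -> list of block numbers

    def read(self, name, offset, length):
        """Serve a file-level read by issuing block-level reads."""
        data = b""
        while length > 0:
            block_no = self.block_map[name][offset // BLOCK_SIZE]
            start = offset % BLOCK_SIZE
            chunk = self.disk[block_no][start:start + length]
            data += chunk
            offset += len(chunk)
            length -= len(chunk)
        return data

# The client never sees disks or sectors -- only file, offset, length.
disk = {7: b"x" * BLOCK_SIZE, 9: b"y" * BLOCK_SIZE}
nas = ToyNasServer(disk, {"report.txt": [7, 9]})
print(nas.read("report.txt", 1000, 256))   # "go to byte 1000, read 256"
```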

NAS benefits
NAS offers a number of benefits that address some of the limitations of directly attached storage devices, and that overcome some of the complexities associated with SANs.

Resource pooling
A NAS product enables disk storage capacity to be consolidated and pooled on a shared network resource, at great distances from the clients and servers that share it. A NAS device can be configured as one or more file systems, each residing on specified disk volumes. All users accessing the same file system are assigned space within it on demand. This contrasts with individual DAS storage, where some users may have too little storage and others too much. Consolidation of files onto a centralized NAS device can minimize the need for multiple copies of files spread across distributed clients, so overall hardware costs can be reduced. NAS pooling can also reduce the need to physically reassign capacity among users. The results can be lower overall costs through better utilization of the storage, lower management costs, increased flexibility, and increased control.

Exploits existing infrastructure
Because NAS utilizes the existing LAN infrastructure, implementation costs are minimal. Introducing a new network infrastructure, such as a Fibre Channel SAN, can incur significant hardware costs. In addition, new skills must be acquired, and a project of any size will need careful planning and monitoring to bring it to completion.

Simple to implement
Because NAS devices attach to mature, standard LAN implementations and have standard LAN addresses, they are typically extremely easy to install, operate, and administer. This plug-and-play operation results in lower risk, ease of use, and fewer operator errors, all of which contribute to lower cost of ownership.

Enhanced choice
The storage decision is separated from the server decision, enabling the buyer to exercise more choice in selecting equipment to meet the business needs.

Connectivity
LAN implementation allows any-to-any connectivity across the network. NAS products may allow for concurrent attachment to multiple networks, thus supporting many users.

Scalability
NAS products can scale in capacity and performance within the allowed configuration limits of the individual system. However, this may be restricted by considerations such as LAN bandwidth constraints and the need to avoid restricting other LAN traffic.

Heterogeneous file sharing
Remote file sharing is one of the basic functions of any NAS product. Multiple client systems can have access to the same file, with access control serialized by NFS or CIFS. Heterogeneous file sharing may be enabled by the provision of translation facilities between NFS and CIFS, as with the N3700.

Improved manageability
By providing consolidated storage that supports multiple application systems, storage management is centralized. This enables a storage administrator to manage more capacity on a system than would typically be possible for distributed, directly attached storage.

Other NAS considerations
On the converse side of the storage network decision, you need to take the following factors regarding NAS solutions into consideration.

Proliferation of NAS devices
Pooling of NAS resources can only occur within the capacity of the individual NAS system. As a result, in order to scale for capacity and performance, there is a tendency to grow the number of individual NAS systems over time, which can increase hardware and management costs.

Software overhead impacts performance
As we explained earlier, TCP/IP is designed to bring data integrity to Ethernet-based networks by guaranteeing data movement from one place to another. The trade-off for reliability is a software-intensive network design that requires significant processing overhead, which can consume more than 50% of available processor cycles when handling Ethernet connections. This is not normally an issue for applications such as Web browsing, but it is a drawback for performance-intensive storage applications.

Consumption of LAN bandwidth
Ethernet LANs are tuned to favor short burst transmissions for rapid response to messaging requests, rather than large continuous data transmissions. Significant overhead can be imposed to move large blocks of data over the LAN: the maximum packet size for Ethernet is 1518 bytes, so a 10 MB file has to be segmented into more than 7000 individual packets, each sent separately to the NAS device by the Ethernet collision-detect access method. As a result, network congestion may lead to reduced or variable performance.

Data integrity
The Ethernet protocols are designed for messaging applications, so data integrity is not of the highest priority. Data packets may be dropped without warning in a busy network and have to be resent. Since it is up to the receiver to detect that a data packet has not arrived and to request that it be resent, this can cause additional network traffic. With NFS file sharing there are some additional potential risks: security controls can fairly easily be bypassed, which may be a concern for certain applications, and the NFS file locking mechanism is not foolproof, so multiple concurrent updates could occur in some situations.

Impact of backup/restore applications
One of the potential downsides of NAS is the consumption of substantial amounts of LAN bandwidth during backup and restore operations, which may impact other user applications. NAS devices may not suit applications that require very high bandwidth. To overcome this limitation, some users implement a dedicated IP network for high data volume applications, in addition to the messaging IP network. This can add significantly to the cost of the NAS solution.
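The packet-count claim above is easy to verify. Assuming a standard 1518-byte Ethernet frame with roughly 58 bytes of Ethernet, IP, and TCP headers and trailers (typical values; exact overhead varies), about 1460 bytes of payload remain per frame:

```python
FILE_SIZE = 10 * 1024 * 1024    # 10 MB file
PAYLOAD = 1460                  # 1518-byte frame minus ~58 bytes of overhead

packets = -(-FILE_SIZE // PAYLOAD)    # ceiling division
print(packets)                        # 7183 -- "more than 7000 packets"
```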

Total cost of ownership
Because it makes use of both existing LAN network infrastructure and network administration skills already employed in many organizations, NAS costs may be substantially lower than for directly attached or additional SAN-attached storage. Specifically, NAS-based solutions offer the following cost-reducing benefits:
- They reduce administrative staff requirements.
- They improve reliability and availability.
- They bridge the gap between UNIX and Windows environments.

Reduced administrative staff requirements
Implementing single or clustered NAS systems to manage your networked storage concentrates the administrative tasks and thereby reduces the number of people required to maintain the network. Since the NAS system is a headless system, administration is usually performed via a Web-based GUI accessible from anywhere on the network. In addition, more capacity can be managed per administrator, resulting in a lower cost of ownership.

Improved reliability and availability
In today's business world, it has become the de facto standard to provide clients with access to information 24 hours per day, 7 days per week, leaving very little time available for unplanned outages. The System Storage N3700 offers the ability to provide excellent availability, with options for clustered models.

Bridges the gap between UNIX and Windows environments
Most companies today run heterogeneous operating environments. A NAS solution offers clients true cross-platform file sharing between Windows and UNIX clients by supporting both CIFS and NFS. This becomes increasingly important as application data becomes more common across platforms.

Industry standards
There is a clear client need for standardization within the storage networking industry, to allow users to freely select equipment and solutions, knowing that they are not tying themselves to a proprietary or short-term investment. To this end, there are extensive efforts among the major vendors in the storage networking industry to cooperate in the early agreement, development, and adoption of standards. A number of industry associations, standards bodies, and company groupings are involved in developing and publishing storage networking standards. The most important of these are the Storage Networking Industry Association (SNIA) and the Internet Engineering Task Force (IETF).

In addition, IBM, IBM Business Partners, and other major vendors in the industry have invested heavily in interoperability laboratories. The IBM laboratories in Gaithersburg (Maryland, USA), Mainz (Germany), and Tokyo (Japan) actively test equipment from IBM and many other vendors, to facilitate the early confirmation of compatibility between multiple vendors' servers, storage, and network hardware and software components.

Part 2. N series Systems Concepts

In this part we discuss key components of the N series system:
- Data ONTAP
- WAFL
- RAID-DP
- Snapshot


Chapter 2. Introduction to Data ONTAP

This chapter discusses Data ONTAP, an operating system that has been specifically designed to provide data management tools and technologies in a network-oriented environment. This specialized operating system gives customers the ability to fully leverage their enormous investments in data and information, and provides tools for managing the sheer, fast-growing magnitude of data and information with which they need to cope.

2.1 Data ONTAP is a lightweight microkernel

Data ONTAP is a robust, tightly coupled, multi-tasking, real-time microkernel. This pre-tuned, compact kernel minimizes complexity and improves reliability. In fact, Data ONTAP software is less than 2% of the total size of general-purpose operating systems. This is one of the real benefits of Data ONTAP: by maintaining a lightweight, workable size, upgrade, maintenance, and acquisition time and complexity are reduced. The size of Data ONTAP also gives you an idea of its operating system overhead compared to other operating systems.

Designed with the goal of maximizing throughput between network interfaces and disk drives, the Data ONTAP kernel utilizes the robust WAFL (Write Anywhere File Layout) file system. WAFL and RAID were designed together to avoid the performance problems that most file systems experience with RAID, and to ensure the highest level of reliability. RAID is integrated into the WAFL file system, as opposed to other approaches that layer a volume manager on top of an operating system; this integration reduces operator errors, OS and application software release mismatches, patch level mismatches, and so on. It also results in RAID actually acting as a performance accelerator for WAFL, rather than the normally expected performance inhibitor.

2.2 The Data ONTAP approach

The Data ONTAP approach also helps improve overall application availability, in that file system operations normally run on general-purpose application file servers are offloaded, improving general application server availability.

This is a clear differentiation when compared to conventional storage subsystems, where the odds of application server downtime are increased due to the 100% dependency on the application server's OS and file system software for all I/O operations. It also contrasts significantly with Data ONTAP deployment options, which allow for multiple application servers, such that the failure of any one of those application servers does not preclude the other application servers from accessing the data. This is an added benefit, not measured in N series storage system fault-resilient availability. The robust Data ONTAP software is based on a simple, message-passing kernel that has fewer failure modes than general-purpose operating systems. These features combine to deliver consistently high measured system availability.

2.3 Benefits of Data ONTAP

Figure 2-1 summarizes Data ONTAP features and benefits: multi-protocol support, standards compliance, high-performance integrated RAID, online capacity scaling, and support for business continuance, which together provide investment protection and risk reduction, flexibility in workstation and application choices, improved productivity through increased sharing, and advanced data protection without performance compromises. Protocol support includes NFS V2/V3/V4 over UDP or TCP for UNIX and Linux, Microsoft CIFS for Windows, iSCSI for block I/O access, FTP for large file transfers, HTTP for Web-based access, NDMP as a standard backup interface, and SNMP for diagnostics.

Figure 2-1 Features and Benefits

Data ONTAP has a look and feel similar to UNIX, but actually is not UNIX, as demonstrated by the limited command set seen in Figure 2-2. It is a proprietary kernel produced by Network Appliance Corporation. Some additional benefits of the kernel are:
- No third-party application software can be installed on Data ONTAP, thereby reducing resource contention and application management overhead, and increasing availability by avoiding application failures.
- N series storage system software (both standard and optionally enabled) is included in the kernel; see Figure 2-3 and Figure 2-4.
- No third-party scripts or executables are allowed, securing the kernel from malicious viruses and poor programming. External operations are done through the access services interface of Data ONTAP; see Figure 2-5.
- Support of NTLM

- DES encryption
- DES MAC
- MD5 for message digest (cryptographic checksum)
- Access services included in the kernel

Figure 2-2 Data ONTAP commands

Figure 2-3 lists the base software delivered with Data ONTAP:
- Data ONTAP: operating system software that optimizes data serving and allows multiple-protocol data access.
- SnapShot: enables online backups, providing near-instantaneous access to previous versions of data without requiring complete, separate copies.
- iSCSI: allows the block I/O access protocol to run over IP networks.
- HTTP: Hypertext Transfer Protocol allows a user to transfer displayable Web pages and related files.
- FTP: File Transfer Protocol, a standard Internet protocol, is a simple way to exchange files between computers on the Internet.
- FlexVol: creates multiple flexible volumes on a large pool of disks, providing dynamic, non-disruptive (thin) storage provisioning and space and time efficiency.
- Disk Sanitization: the process of physically obliterating data by overwriting disks with specified byte patterns or random data in a manner that prevents recovery of the current data by any known recovery methods. This feature carries out disk sanitization using three successive byte overwrite patterns per cycle and a default of six cycles per operation.
- FilerView: a Web-based administration tool that allows IT administrators to fully manage N3700 systems from remote locations; simple and intuitive Web-based single-appliance administration.
- SecureAdmin: a Data ONTAP module that enables authenticated, command-based administrative sessions between an administrative user and Data ONTAP over an intranet or the Internet.

Figure 2-3 Base Software with Data ONTAP

Figure 2-4 lists software included in base Data ONTAP that requires licensing:
- CIFS: provides file system access for Microsoft Windows environments.
- NFS: provides file system access for UNIX and Linux environments.
- HTTP: Hypertext Transfer Protocol allows a user to transfer displayable Web pages and related files.
- Cluster Failover: ensures high data availability for business-critical requirements by eliminating a single point of failure. Must be ordered for A20 clustered configurations or upgrades from A10 to A20; active-active pairing delivers even more nines to the right of the decimal point.
- FlexClone: designed to provide instant replication of data volumes and data sets without requiring additional storage space at the time of creation.
- MultiStore: permits an enterprise to consolidate a large number of Windows, Linux, or UNIX file servers onto a single storage system. Many virtual filers on one physical appliance ease migration and multi-domain failover scenarios.
- SnapLock: provides non-erasable and non-rewritable data protection that helps enable compliance with government and industry records retention regulations.
- LockVault: designed to provide non-erasable and non-rewritable copies of Snapshot data to help meet regulatory compliance needs for maintaining backup copies of unstructured data.
- SnapMirror: remote mirroring software that provides automatic block-level incremental file system replication between sites. Available in synchronous, asynchronous, and semi-synchronous modes of operation.
- SnapRestore: allows rapid restoration of the file system to an earlier point in time, typically in only a few seconds.
- SnapVault: provides disk-based backup for N3700 systems by periodically backing up a Snapshot copy to another system.

Figure 2-4 Additional Software included in Data ONTAP

2.4 Data ONTAP architecture

Data ONTAP is made up of the following components, as shown in Figure 2-5:
- WAFL
- Protection (RAID and mirroring)
- NVRAM management
- WAFL virtualization
- SnapShot management
- File services
- Block services
- Network layer
- Protocol layer

Figure 2-5 Data ONTAP architecture: a storage microkernel combining NVRAM journaling, WAFL protection (RAID and mirroring), file services (NFS, CIFS, HTTP, FTP) with file semantics, block services (FCP, iSCSI) with LUN semantics, WAFL virtualization with Snapshots and SnapMirror, TCP/IP networking over 10/100 and Gigabit Ethernet (fibre and copper, with TCP/IP offload engine), 2 Gbps Fibre Channel mass storage, and system administration and monitoring

Data ONTAP startup
Data ONTAP itself resides on the compact flash (see Figure 1-3) and on each of the physical disks. The boot sequence of Data ONTAP can be seen in Figure 2-6.

Figure 2-6 Boot Sequence

During this boot sequence, Data ONTAP checks /etc to see whether an installation was already done. Flash memory also holds a copy of /etc.


Chapter 3. Write Anywhere File Layout (WAFL)

This chapter describes WAFL, a file system designed specifically to work in a file server appliance. The primary focus is on the algorithms and data structures that WAFL uses to perform its I/O and to implement Snapshots, which are read-only clones of the active file system. WAFL uses a unique copy-on-write technique to minimize the disk space that Snapshots consume. This chapter also describes how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown. The file system requirements for a file server storage system are different from those for a general-purpose UNIX or Windows system, both because a file server appliance must be optimized for network file access and because an appliance must be easy to use.

3.1 Introduction

A storage appliance is a device designed to perform a particular function. A standard trend in networking has been to provide common services using appliances instead of general-purpose computers. For instance, special-purpose routers have almost entirely replaced general-purpose computers for packet routing, even though general-purpose computers originally handled all routing. Other examples of network appliances include network terminal concentrators, network FAX servers, and network-capable printers.

Figure 3-1 shows how WAFL overhead is accounted for when a volume is created. Each disk is right-sized to about 98% of its raw capacity, and 20 MB of the right-sized portion of each disk is reserved for bootstrap code, the Data ONTAP kernel, and disk labels. The volume size is therefore the sum of the right-sized portions of the disks minus 20 MB per disk. Of that, 10% is used as WAFL overhead (work area), leaving 90% as the WAFL file system; by default, 80% of this space makes up the active file system (data blocks, metadata, and so on) and 20% is the Snapshot reserve.

Figure 3-1 WAFL overhead

A new type of network appliance is the file server appliance, also known as a filer. The requirements for a file system operating in a filer are different from those for a general-purpose file system: network client access patterns differ from local access patterns due to the protocols used, and the special-purpose nature of an appliance also affects the design.

WAFL (Write Anywhere File Layout) is the file system used in all N series storage systems. WAFL was designed to meet four primary requirements:
- It should provide fast NFS service.
- It should support large file systems (tens of TB) that grow dynamically as disks are added.
- It should provide high performance while supporting RAID (Redundant Array of Independent Disks).
- It should restart quickly, even after an unclean shutdown due to power failure or system crash.

The requirement for fast file service is obvious, given WAFL's intended use in a storage appliance. Support for large file systems simplifies system administration by allowing all disk space to belong to a single large partition. Large file systems make RAID desirable, because the probability of disk failure increases with the number of disks. And large file systems require special techniques for fast restart, because file system consistency checks must be eliminated if the uptime goals of an appliance are to be achieved.
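Returning to the sizing rules shown in Figure 3-1, a short calculation makes them concrete. This is a rough sketch with hypothetical disk sizes and counts; the percentages are the defaults quoted in the figure, and the sketch ignores parity disks, which reduce usable space further:

```python
def usable_space(num_disks, raw_gb):
    """Apply the Figure 3-1 sizing rules to a hypothetical disk set."""
    right_sized = raw_gb * 0.98          # ~98% of raw capacity per disk
    per_disk = right_sized - 0.02        # minus 20 MB reserved per disk
    volume = num_disks * per_disk        # volume size
    wafl_fs = volume * 0.90              # after 10% WAFL overhead / work area
    active = wafl_fs * 0.80              # 80% active file system by default
    snap_reserve = wafl_fs * 0.20        # 20% Snapshot reserve by default
    return active, snap_reserve

active, reserve = usable_space(num_disks=7, raw_gb=144)
print(f"active: {active:.0f} GB, snapshot reserve: {reserve:.0f} GB")
# roughly 711 GB active and 178 GB snapshot reserve
```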

Other file systems write to pre-allocated locations on disk, resulting in many more disk head seeks than WAFL. With WAFL, writes occur to memory first and are then flushed to disk in contiguous stripes across all disks.

Figure 3-2 WAFL advantages

File services and RAID both strain write performance: file services because servers must store data safely before replying to network requests, and RAID because of the read-modify-write sequence it uses to maintain parity. This led to the use of non-volatile RAM to reduce response time, and to a write-anywhere design that allows WAFL to write to disk locations that minimize RAID's write performance penalty. The write-anywhere design also enables Snapshots, which in turn eliminate the requirement for time-consuming consistency checks after power loss or system failure.

Figure 3-3 WAFL helps maximize RAID performance: one 4 KB RAID stripe is written across the data disks and parity disk of a single RAID group

NVRAM is designed to allow writes to be gathered up, optimized, and safely deferred; in addition, this helps provide more disk bandwidth for reads.

3.2 WAFL Implementation

WAFL is a compatible file system optimized for network file access. It is unique in that it stores sufficient information to make it compatible with a number of different client environments (NFS, CIFS, HTTP, and so on), and it is optimized to maximize the reading and writing of disk content while supplying it to various types of network clients.

Overview
In many ways WAFL is similar to other UNIX file systems, such as the Berkeley Fast File System (FFS) and TransArc's Episode file system: it is a block-based file system that uses inodes to describe files, and it uses 4 KB blocks with no fragments. Its uniqueness lies in its ability to store sufficient metadata to enable it to function with any of the current mainstream operating systems (UNIX, Linux, and Windows), as well as to interoperate with block-level protocols like FCP and iSCSI. It also contains unique optimizations that effectively increase its ability to move data between disk blocks and network interfaces better than any mainstream general-purpose operating system currently on the market.

Figure 3-4 contrasts WAFL with the Berkeley Fast File System, the Veritas File System, NTFS, and similar file systems. Those file systems write to pre-allocated locations, treating data and metadata differently. WAFL has no pre-allocated locations: data and metadata blocks are treated equally, and writes go to the nearest available free block. Writing to the nearest available free block reduces disk seeking, the number one performance challenge when using disks.

Figure 3-4 WAFL: Write Anywhere File Layout

Each WAFL inode contains pointers that indicate which blocks belong to the file. Unlike FFS, all the block pointers in a WAFL inode refer to blocks at the same level. Thus, inodes for small files use the block pointers within the inode to point directly to data blocks; inodes for larger files point to indirect blocks, which point to actual file data; and inodes for still larger files point to doubly indirect blocks, and so forth. For very small files, data is stored in the inode itself in place of the block pointers. By storing data and blocks this way, file system I/O can be reduced, since accessing the inode also brings in the first layer of block pointers, or even the actual file content.

Meta-Data Lives in Files
Like Episode, WAFL stores metadata in files. WAFL's three metadata files are the inode file, which contains the inodes for the file system; the block-map file, which identifies free blocks; and the inode-map file, which identifies free inodes. The term map is used instead of bit map because these files use more than one bit for each entry. The block-map file's format is described in detail later. Because the metadata lives in actual files (albeit completely hidden), these files are treated just like data files: they receive the same protections and accelerations as described for user data files.
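The inode layout described above can be sketched as a small Python model. This is an illustrative toy, not WAFL's on-disk format: the block size and pointer fanout are invented, and real inodes hold many pointers per level.

```python
BLOCK_SIZE = 4096
FANOUT = 1024          # pointers per indirect block (invented for the toy)

def resolve(disk, inode_level, inode_pointers, offset):
    """Walk from the inode's pointers down to a data block number.

    All pointers in the inode sit at the same level: level 0 points
    straight at data blocks, level 1 at indirect blocks, and so on.
    """
    index = offset // BLOCK_SIZE
    pointers, level = inode_pointers, inode_level
    while level > 0:
        span = FANOUT ** level                     # blocks covered per pointer
        pointers = disk[pointers[index // span]]   # load the indirect block
        index %= span
        level -= 1
    return pointers[index]                         # data block number

# Small file: the inode points directly at data blocks.
disk = {}
print(resolve(disk, 0, [7, 9], 5000))          # offset 5000 -> block 9

# Larger file: the inode points at an indirect block stored on disk.
disk[3] = list(range(100, 100 + FANOUT))       # indirect block contents
print(resolve(disk, 1, [3], 5000))             # -> block 101
```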

Figure 3-5 The WAFL file system is a tree of blocks, with the root inode (which describes the inode file) at the top, and metadata files and regular files underneath

Keeping metadata in files allows WAFL to write metadata blocks anywhere on disk. This is the origin of the name WAFL, which stands for Write Anywhere File Layout. The write-anywhere design allows WAFL to operate efficiently with RAID by scheduling multiple writes to the same RAID stripe whenever possible, avoiding the general 4-to-1 write penalty that RAID incurs when it updates just one block in a stripe.

Keeping metadata in files also makes it easy to increase the size of the file system on the fly. When a new disk is added, the N series storage system automatically increases the sizes of the metadata files. The system administrator can manually increase the number of inodes in the file system as well, if the default is too small, but this is generally unnecessary.

Finally, the write-anywhere design enables the copy-on-write technique used by Snapshots. For Snapshots to work, WAFL must be able to write all new data, including metadata, to new locations on disk, instead of overwriting the old data. If WAFL stored metadata at fixed locations on disk, this would not be possible.

Tree of Blocks
A WAFL file system is best thought of as a tree of blocks. At the root of the tree is the root inode, as shown in Figure 3-5. The root inode is a special inode that describes the inode file. The inode file contains the inodes that describe the rest of the files in the file system, including the block-map and inode-map files. The leaves of the tree are the data blocks of all the files.
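The tree-of-blocks structure is what makes copy-on-write natural: to change one data block, WAFL writes a new copy of that block and of each block above it on the path to the root, never modifying the old tree. The Python toy below is illustrative only (real WAFL trees contain inode files, maps, and many pointers per block); it shows that after an update, the old root still describes the old, fully intact tree, which is exactly how a Snapshot can be retained at no initial cost:

```python
# A toy tree of blocks: each node is immutable once written.
# "Writing" a change allocates new nodes along the path to a new root.

def update(node, path, new_data):
    """Return a NEW root reflecting the change; the old root is untouched."""
    if not path:                       # reached the data block: write a new one
        return new_data
    child_index = path[0]
    children = list(node)              # copy the pointer block
    children[child_index] = update(node[child_index], path[1:], new_data)
    return children                    # new interior block with new pointers

old_root = [["A", "B"], ["C", "D"]]         # two indirect blocks, four data blocks
new_root = update(old_root, [1, 0], "C'")   # rewrite data block "C"

print(new_root)   # [['A', 'B'], ["C'", 'D']] -- shares unchanged blocks
print(old_root)   # [['A', 'B'], ['C', 'D']]  -- the snapshot view, unchanged
```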

Figure 3-6 A more detailed view of WAFL's tree of blocks: the root inode at the top, the inode file with its indirect and data blocks, and regular files, the block-map file, and the inode-map file underneath

Figure 3-6 is a more detailed version of Figure 3-5. It shows that files are made up of individual blocks and that large files have additional layers of indirection between the inode and the actual data blocks. In order for WAFL to boot, it must be able to find the root of this tree, so the one exception to WAFL's write-anywhere rule is that the block containing the root inode must live at a fixed location on disk where WAFL can find it.

3.3 File System Consistency and Non-Volatile RAM

Avoiding file system consistency checking
WAFL avoids the need for file system consistency checking after an unclean shutdown by creating a special Snapshot called a consistency point every few seconds. Unlike other Snapshots, a consistency point has no symbolic name and is not accessible. Like all Snapshots, however, a consistency point is a completely self-consistent image of the entire file system. When WAFL restarts, it simply reverts to the most recent consistency point and replays the file system changes recorded in the log. This allows a filer to reboot in a short time, regardless of the amount of storage attached and active at the time of the outage.

Consistency Points and NVRAM
No less frequently than every 10 seconds, WAFL generates an internal Snapshot called a consistency point, so that the disks contain a completely self-consistent version of the file system. A consistency point is created more frequently if the recent I/O load so demands. When the filer boots, WAFL always uses the most recent consistency point on disk, which means that even after power loss or system failure there is no need for time-consuming file system checks. The filer boots in just a minute or two, most of which is spent spinning up disk drives and checking system memory.
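The recovery rule is simple enough to express in a few lines. The sketch below is a conceptual model, not Data ONTAP code: the `disk` object and the `apply` function are hypothetical stand-ins for the on-disk tree and for replaying one logged request against in-memory state.

```python
def boot(disk, nvram_log, apply):
    """Recover after an unclean shutdown.

    No fsck/chkdsk pass is needed: the newest consistency point on
    disk is already a self-consistent tree, and every acknowledged
    request that arrived after it is still sitting in the NVRAM log.
    """
    state = disk.latest_consistency_point()   # always self-consistent
    for request in nvram_log:                 # replay what never reached disk
        state = apply(state, request)
    return state
```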

Figure 3-7 Write request data flow

A typical write request proceeds as follows:
1. The request credentials are checked.
2. A new block is chosen to write to.
3. The request is written to NVRAM.
4. An acknowledgment is sent to the client.

Figure 3-8 High performance: an NVRAM-aware OS. The client RAM-to-NVRAM data path gives fast, predictable client response time, and permits the WAFL virtualization layer to optimize physical access (fewer, faster writes) for better disk subsystem throughput

Figure 3-9 NVRAM operation: the NVRAM is divided into two file system journal logs of file update requests. A periodic consistency point is taken every 10 seconds, and a consistency point is forced earlier if outstanding requests fill NVRAM to 50% utilization before the 10-second timer expires

The filer uses battery-backed non-volatile RAM (NVRAM) to avoid losing any client update requests that might have occurred after the most recent consistency point. During a normal system shutdown, the filer turns off file and block services, flushes all cached operations to disk, and turns off the NVRAM.

When the filer restarts after a system failure or power loss, it looks in NVRAM for any uncompleted requests and replays those that have not reached disk.

Figure 3-10 An N3700 node, showing the NVRAM and its battery

Using NVRAM to store a log of uncommitted requests is very different from using NVRAM as a disk cache, as some other storage products do. When NVRAM is used at the disk layer, it may contain data that is critical to file system consistency; if the NVRAM fails, the file system may become inconsistent in ways that fsck or chkdsk cannot correct. NVRAM in the filer therefore typically handles only write operations, not read operations. WAFL uses NVRAM as a file system journal, not as a cache of disk blocks that need to be changed on the drives. As such, WAFL's use of NVRAM space is extremely efficient. For example, a request for a file system to create a file can be described in just a few hundred bytes of information, whereas the actual operation of creating a file on disk might involve changing a dozen blocks of information or more. Because WAFL uses NVRAM as a journal of operations that need to be performed on the drives, rather than the result of the operations themselves, thousands of operations can be journaled in a typical filer NVRAM log.

Between consistency points, WAFL does write data to disk, but it writes only to blocks that are not in use, so the tree of blocks representing the most recent consistency point remains completely unchanged. WAFL processes hundreds or thousands of I/O requests between consistency points, so the on-disk image of the file system remains the same for many seconds, until WAFL writes a new consistency point, at which time the on-disk image advances atomically to a new state that reflects the changes made by the new requests. Although this technique is unusual for a UNIX file system, it is well known for databases; even in databases, though, it is unusual to write as many operations at one time as WAFL does in its consistency points.

WAFL uses non-volatile RAM (NVRAM) to keep a log of NFS requests it has processed since the last consistency point. (NVRAM is special memory with batteries that allow it to store data even when system power is off.) After an unclean shutdown, WAFL replays any requests in the log to prevent them from being lost.

When an N series storage system shuts down normally, it creates one last consistency point after suspending I/O services. Thus, on a clean shutdown the NVRAM doesn't contain any unprocessed I/O requests, and it is turned off to increase its battery life.

WAFL actually divides the NVRAM into two separate logs of equal size. When one log gets full, WAFL switches to the other log and starts writing a consistency point to store the changes from the first log safely on disk. WAFL schedules a consistency point every 10 seconds, even if the log is not full, to prevent the on-disk image of the file system from getting too far out of date. (If the N series is clustered, the NVRAM is instead divided in half, where one half is used by the local node and the second half is a mirror of the NVRAM on the partner. Each of these halves is divided in half again to create the two logs used for consistency point processing by WAFL.)

Logging to NVRAM only the I/O requests that will modify the file system has several advantages over the more common technique of using NVRAM to cache writes at the disk driver layer. In particular, it is a much more efficient use of space: it is very common for a small I/O request to necessitate the updating of many disk blocks, and modified data blocks in turn require the update of parity information in RAID. Logging in this manner also frees the filer from being locked into performing I/O on specific disk blocks; it provides the opportunity to combine a number of update requests into a single string of disk operations (referred to as coalescing). NVRAM placement is at the N series storage system operation level, not at the (more typical) block level. This assures self-consistent consistency-point flushes to disk for the entire box, rather than just certain sections or volumes.

Understanding NVLOG forwarding
NVLOG forwarding is a critical component of how synchronous mode SnapMirror works. It is the method used to forward write operations, submitted by clients against the primary file systems, to the destination for replication. To completely understand NVLOG forwarding, a basic knowledge of how Data ONTAP performs local file system writes is required:
1. The filer receives a write request. This could be a file-oriented request from an NFS, CIFS, or DAFS client, or a block-oriented request via FCP or iSCSI.
2. The request is journaled in battery-backed, non-volatile memory (NVRAM). It is also recorded in cache memory, which has faster access times than NVRAM.
3. Once the request is safely stored in NVRAM and cache memory, Data ONTAP acknowledges the write to the client system, and the application that requested the write is free to continue processing. At this point the data has not been written to disk, but it is protected from power failure and most types of hardware problems by the NVRAM.
4. Under certain conditions, a consistency point (CP) is triggered. Typically this occurs when the NVRAM journal is one-half full or when 10 seconds have passed since the most recent CP, whichever comes first.
5. When a CP is triggered, Data ONTAP uses the transaction data in cache memory to build a list of data block changes that need to be written to disk. It also computes parity information at this time. Once the required disk modifications have been determined, WAFL sends a list of data blocks to be written to the RAID software. This list of data blocks is called a tetris.
6. The RAID software writes the data blocks out to disk. When it is complete, it returns an acknowledgment to the WAFL software, and Data ONTAP considers the CP complete.

With NVLOG forwarding, the journaling step changes: the request is journaled in NVRAM and recorded in cache memory as before, and is also forwarded over the network to the SnapMirror destination system, where it is journaled in NVRAM and cache memory. Once the request is safely stored in NVRAM and cache memory on both the primary and secondary systems, Data ONTAP acknowledges the write to the client system, and the application that requested the write is free to continue processing.
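The log-switching policy in steps 4 through 6 can be modeled in a few lines of Python. This is an illustrative sketch only: the half-full and 10-second triggers are the documented ones, but everything else here is invented for the toy.

```python
import time

class ToyNvramJournal:
    """Two equal logs: fill one while the other is flushed in a CP."""
    def __init__(self, capacity_bytes):
        self.logs = [[], []]
        self.active = 0                     # index of the log taking new entries
        self.used = 0                       # bytes in the active log
        self.half = capacity_bytes // 2     # each log is half of NVRAM
        self.last_cp = time.monotonic()

    def journal(self, request_bytes):
        self.logs[self.active].append(request_bytes)
        self.used += len(request_bytes)
        # ... the client is acknowledged here: the request is now safe ...
        if self.used >= self.half or time.monotonic() - self.last_cp >= 10:
            self.start_consistency_point()

    def start_consistency_point(self):
        flushing = self.active
        self.active = 1 - self.active       # switch to the other log
        self.used = 0
        self.last_cp = time.monotonic()
        self.flush_to_disk(self.logs[flushing])   # write the CP to disk
        self.logs[flushing] = []                  # then clear the old log

    def flush_to_disk(self, entries):
        pass   # stand-in for building the "tetris" and writing RAID stripes
```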

One important detail is the way in which the NVLOG data is stored on the secondary storage system. Since this system has its own storage as well as the mirrored data from the primary system (at the very least, every filer has its own root volume), and because CPs need to be kept synchronized on any mirrored volumes, the NVLOG data cannot be stored in NVRAM on the secondary system in the same way as normal file system writes. Instead, the NVLOG data is treated as a stream of writes to a pair of files on the secondary system's root volume, and these writes are logged in the secondary system's NVRAM just like any other write. Because the NVLOG data needs to be written to these files on the root volume, the performance of the root volume on the secondary system has a direct impact on the overall performance of synchronous or semi-synchronous mode SnapMirror.

High Performance NAS processing
Processing an NFS or CIFS request and caching the resulting disk writes generally takes much more NVRAM than simply logging the information required to replay the request. For instance, to move a file from one directory to another using NFS, the file system must update the contents and inodes of both the source and target directories. WAFL uses about 150 bytes to log the information needed to replay a rename operation. Rename, with its factor-of-200 difference in NVRAM usage, is an extreme case, but even for a simple 8 KB write, caching disk blocks consumes 8 KB for the data, 8 KB for the inode update, and, for large files, another 8 KB for the indirect block. WAFL logs just the 8 KB of data along with about 120 bytes of header information. With a typical mix of NFS operations, WAFL can store more than 1000 operations per megabyte of NVRAM.

Using NVRAM as a cache of unwritten disk blocks turns it into an integral part of the disk subsystem. A failure in traditional NVRAM can corrupt the file system in ways that fsck cannot detect or repair. If something goes wrong with WAFL's NVRAM, WAFL may lose a few I/O requests, but the on-disk image of the file system remains completely self-consistent. This matters because NVRAM is reliable, but not as reliable as a RAID disk array.

A final advantage of logging NAS requests is that it improves response times. To reply to a request, a file system without any NVRAM must update its in-memory data structures, allocate disk space for new data, and wait for all modified data to reach disk. A file system with an NVRAM write cache does all the same steps, except that it copies modified data into NVRAM instead of waiting for the data to reach disk. WAFL can reply to an NFS request much more quickly, because it need only update its in-memory data structures and log the request; it does not allocate disk space for new data or copy modified data to NVRAM.

3.4 Write Allocation

WAFL's design was motivated largely by a desire to maximize the flexibility of its write allocation policies. This flexibility takes several forms:
1. WAFL can write any file system block (except the one containing the root inode) to any location on disk. In FFS, metadata, such as inodes and bit maps, is kept in fixed locations on disk.
This prevents FFS from optimizing writes by, for example, putting both the data for a newly updated file and its inode right next to each other on disk. Since WAFL can write metadata anywhere on disk, it can optimize writes more creatively.
2. WAFL can write blocks to disk in any order. FFS writes blocks to disk in a carefully determined order so that fsck(8) can restore file system consistency after an unclean shutdown.

WAFL can write blocks in any order because the on-disk image of the file system changes only when WAFL writes a consistency point. The one constraint is that WAFL must write all the blocks in a new consistency point before it writes the root inode for the consistency point.
3. WAFL can allocate disk space for many I/O operations at once in a single write episode. FFS allocates disk space as it processes each I/O request. WAFL gathers up hundreds of I/O requests before scheduling a consistency point, at which time it allocates blocks for all requests in the consistency point at one time. Deferring write allocation improves the latency of NFS operations by removing disk allocation from the processing path of the reply, and it avoids wasting time allocating space for blocks that are removed before they reach disk.
4. WAFL eliminates the classic parity-disk hotspot issue through flexible write allocation policies:
- It writes any file system block (data or metadata) to any disk location.
- New data does not overwrite old data.
- It allocates disk space for many client write operations at once, in a single new RAID-stripe write, with no parity recalculations.
- It writes to stripes that are near each other.
- It writes to disk in any order.

Figure 3-11 Minimal seeks and no hotspot: a typical file system writes one file at a time with long head seeks, especially on the parity disk, while WAFL writes multiple files at once with short head seeks across all disks

These features give WAFL extraordinary flexibility in its write allocation policies. The ability to schedule writes for many requests at once enables more intelligent allocation policies, and the fact that blocks can be written to any location and in any order allows a wide variety of strategies. It is easy to try new block allocation strategies without any change to WAFL's on-disk data structures.
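A simplified model of the stripe-oriented allocation described in points 3 and 4 follows. This is an illustrative sketch, not WAFL's allocator: it only shows why gathering many dirty blocks and filling whole, previously free stripes means parity is computed once per stripe instead of read-modify-written once per block.

```python
from functools import reduce

DATA_DISKS = 4          # data disks per RAID group (parity disk is separate)

def write_consistency_point(dirty_blocks, free_stripes):
    """Gather dirty 4 KB blocks and write them out full stripes at a time."""
    writes = []
    for i in range(0, len(dirty_blocks), DATA_DISKS):
        stripe_data = dirty_blocks[i:i + DATA_DISKS]
        # One parity computation covers the whole stripe. No
        # read-modify-write of existing parity is needed, because
        # the stripe being written was entirely free beforehand.
        parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        stripe_data)
        writes.append((free_stripes.pop(0), stripe_data, parity))
    return writes

blocks = [bytes([n]) * 4096 for n in range(8)]           # 8 dirty blocks
print(len(write_consistency_point(blocks, [17, 18])))    # 2 full-stripe writes
```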

The details of WAFL's write allocation policies are outside the scope of this document. In short, WAFL improves RAID performance by writing to multiple blocks in the same stripe; it reduces seek time by writing blocks to locations that are near each other on disk; and it reduces head contention when reading large files by placing the sequential blocks of a file on a single disk in the RAID array. Optimizing write allocation in traditional storage systems is difficult because these goals often conflict. Unlike other virtualized implementations, WAFL is aware of the underlying RAID subsystem and its architecture.

3.5 Conclusion

WAFL was developed and became stable surprisingly quickly for a new file system. We attribute this stability in part to WAFL's use of consistency points. Processing file system requests is simple because WAFL updates only in-memory data structures and the NVRAM log. Consistency points eliminate ordering constraints for disk writes, which are a significant source of bugs in most file systems. The code that writes consistency points is concentrated in a single file and interacts little with the rest of WAFL.

More importantly, it is much easier to develop high-quality, high-performance system software for an appliance than for a general-purpose operating system. Special-purpose file systems hosted on general-purpose platforms often have difficulty obtaining good performance and reliability, because the hosting platform limits their efficiency and reliability. Compared to a general-purpose file system, WAFL handles a very regular and simple set of requests. A general-purpose file system receives requests from thousands of different applications with a wide variety of access patterns, and new applications are added frequently. By contrast, WAFL receives requests only from the Network Attached Storage or SAN client modules of other systems, which have been implemented following a strict regime of industry-developed protocol definitions. iSCSI, NFS, FTP, and HTTP must all function the same regardless of which platform they run on, because the protocols they follow are well constructed. CIFS is, of course, available only from a single source, so it too is well constrained. Applications are the ultimate source of I/O requests, but the client code converts application requests into a regular pattern of network requests, and it filters out error cases before they reach the server. The small number of operations that WAFL supports makes it possible to define and test the entire range of inputs that it is expected to handle. These advantages apply to any appliance, not just to file server appliances. A network appliance only makes sense for protocols that are well defined and widely used, but for such protocols, an appliance can provide important advantages over a general-purpose computer.

Chapter 4. N series Data Protection with RAID-DP

This chapter provides an overview of RAID-DP and how it dramatically increases data fault tolerance in various disk failure scenarios. Other key areas covered include how much RAID-DP costs (it is free), its special hardware requirements (none), and converting RAID groups from RAID4 to RAID-DP (it is easy). The chapter also presents a double-disk failure recovery scenario to show how RAID-DP both allows the RAID group to continue serving data and recreates the data lost on the two failed disks.

For the remainder of this chapter the term volume, when used alone, means both traditional volumes and aggregates. Data ONTAP 7G has two distinct types of volume: traditional volumes and a new type known as FlexVol. As the name implies, FlexVol volumes offer extremely flexible and unparalleled functionality, but complete coverage of them is beyond the scope of this chapter. FlexVol volumes are housed on a new construct known as an aggregate. At the RAID layer, both RAID-DP and RAID4 operate at the traditional volume and aggregate level.

4.1 Introduction

Traditional single-parity RAID technology offers protection from a single failed disk drive. The caveat is that no other disk may fail, and no uncorrectable bit error may occur during a read operation, while reconstruction of the failed disk is still in progress. If either secondary event occurs during reconstruction, some or all of the data contained in the RAID array or volume could be lost. With modern larger disk media, the likelihood of an uncorrectable bit error is fairly high, because disk capacities have increased while bit error rates have stayed the same. Hence the ability of traditional single-parity RAID to protect data is being stretched past its limits. The next level in the evolution of RAID data protection is RAID Double Parity, or RAID-DP, available on the entire IBM N series data storage product line.

Figure 4-1 RAID-DP (slide summary: single-parity RAID protects against any single disk failure with one parity disk P; RAID-DP adds a double-parity disk DP and survives any two-disk failure scenario. Compared to single-parity RAID, RAID-DP offers better protection (>4,000x MTTDL), equal and often better performance, and the same capacity overhead (typically one parity drive per six data drives), and it outperforms any other double-parity offering. Combined with SyncMirror (RAID1), N series storage systems are designed to survive the failure of any five disks in one disk protection group.)

4.2 What Is the Need for RAID-DP?

As mentioned earlier, traditional single-parity RAID offers adequate protection against a single event, which can be either a complete disk failure or a bit error during a read. In either event, data is recreated using both parity and the data remaining on the unaffected disks in the array or volume. If the event is a read error, recreating the data happens almost instantaneously, and the array or volume remains online. However, if a disk fails, all the data on it has to be recreated, and the array or volume remains in a vulnerable degraded mode until the data has been reconstructed onto a spare disk. It is in degraded mode that traditional single-parity RAID shows that its protection has not kept up with modern disk architectures.

Figure 4-2 RAID-DP high reliability

4.2.1 The Effect of Modern Larger Disk Sizes on RAID

Modern disk architectures have continued to evolve, as have other computer-related technologies. Disk drives are orders of magnitude larger than they were when RAID was first introduced. As disk drives have grown, their reliability has not improved, and, more importantly, the likelihood of a bit error per drive has increased proportionally with the larger media. These three factors (larger disks, unimproved reliability, and increased bit errors with larger media) all have serious consequences for the ability of single-parity RAID to protect data.

Given that disks are as likely to fail now as when RAID technology was first introduced, RAID is still as vital now as it was then. When one disk fails, RAID simply recreates its data from parity and the remaining disks in the array or volume onto a hot spare disk. But since RAID was introduced, the significant increases in disk size have resulted in much longer reconstruction times for the data lost on a failed disk. Simply put, it takes much longer to recreate the data lost when a 274 GB disk fails than when a 36 GB disk fails. Compounding the longer reconstruction times, the larger disk drives in production use today tend to be ATA-based, and they perform more slowly and are less reliable than smaller SCSI-based drives.

4.2.2 Protection Schemes with Single-Parity RAID Using Larger Disks

The options for extending the ability of single-parity RAID to protect data as disks continue to get larger are not attractive. The first is to continue to buy and implement storage using the smallest disk sizes possible so that reconstruction after a failed disk completes more quickly. However, this approach is not practical from any point of view. Capacity density is critical in space-constrained data centers, and smaller disks result in less capacity per square foot. In addition, storage vendors are forced to offer products based on what disk manufacturers are supplying, and smaller disks are not readily available, if they are available at all.

The second way to protect data on larger disks with single-parity RAID is slightly more practical but, with the introduction of RAID-DP, a less attractive approach for several reasons. By keeping the size of arrays or volumes small, the time to reconstruct is reduced: just as a larger disk takes longer to reconstruct than a smaller one, an array or volume built with more disks takes longer to reconstruct data from one failed disk than one built with fewer disks. However, smaller arrays and volumes carry two costs that cannot be overcome. The first is that additional disks are lost to parity, reducing usable capacity and increasing total cost of ownership (TCO). The second is that performance is generally slower with smaller arrays, aggregates, and volumes, which impacts the business and its users.

The most reliable protection offered by single-parity RAID is RAID1, or mirroring. In RAID1, the mirroring process replicates an exact copy of all data on an array, aggregate, or volume to a second array or volume. While RAID1 mirroring affords maximum fault tolerance from disk failure, the cost of the implementation is severe, since it takes twice the disk capacity to store the same amount of data. Earlier it was mentioned that using smaller arrays and volumes to improve fault tolerance increases the total cost of ownership of storage due to less usable capacity per dollar spent.
Continuing this reasoning, RAID1 mirroring, with its unpleasant requirement for double the amount of capacity, is the most expensive type of storage solution, with the highest total cost of ownership.

4.2.3 RAID-DP Data Protection

In short, given the current landscape, with larger disk drives affecting data protection, customers and analysts demand a better story about affordably improving RAID reliability

from storage vendors. To meet this demand, a new type of RAID protection named RAID-DP was developed. RAID-DP stands for RAID Double Parity, and it significantly increases fault tolerance from failed disk drives over traditional RAID. When all the relevant numbers are plugged into the standard mean time to data loss (MTTDL) formula for RAID-DP versus single-parity RAID, RAID-DP is on the order of 10,000 times more reliable on the same underlying disk drives. With this level of reliability, RAID-DP offers significantly better data protection than RAID1 mirroring, but at RAID4 pricing. RAID-DP offers businesses the most compelling total cost of ownership storage option without putting their data at increased risk.

4.3 How RAID-DP Works

4.3.1 RAID-DP with Double Parity

It is well known that parity generally improves fault tolerance and that single-parity RAID improves data protection. Given that traditional single-parity RAID has established a very good track record to date, the concept of double-parity RAID should certainly sound like a better protection scheme, and this is borne out by the earlier MTTDL comparison. But what exactly is RAID-DP, with its double parity?

At the most basic level, RAID-DP adds a second parity disk to each RAID group in a volume. A RAID group is an underlying construct that volumes are built upon. Each traditional RAID4 group has some number of data disks and one parity disk, and volumes contain one or more RAID4 groups. Whereas the parity disk in a RAID4 volume stores row parity across the disks in a RAID4 group, the additional RAID-DP parity disk stores diagonal parity across the disks in a RAID-DP group. With these two parity stripes in RAID-DP, one the traditional horizontal and the other diagonal, data protection is obtained even if two disk failures occur in the same RAID group. In RAID4, parity is calculated at the block level.

4.4 How RAID-DP Works

With RAID-DP, the traditional RAID4 horizontal parity structure is still employed and becomes a subset of the RAID-DP construct. In other words, the row parity component of RAID-DP works exactly the way RAID4 works on storage that has not been converted to RAID-DP. The same process, in which data is written out in horizontal rows with parity calculated for each row, still holds in RAID-DP and is considered the row component of double parity. In fact, if a single disk fails, or a read error from a bad block or bit error occurs, the row parity approach of RAID4 is the sole mechanism used to recreate the data, without ever engaging the diagonal parity. In this case, the diagonal parity component of RAID-DP is simply a protective envelope around the row parity component.

4.4.1 RAID4 Horizontal Row Parity

Figure 4-3 illustrates the horizontal row parity approach used in the traditional RAID4 solution and is the first step in establishing an understanding of RAID-DP and double parity.

Figure 4-3 Horizontal row parity (four data disks D holding the block values 3, 1, 2, and 3, and one row parity disk P holding 9)

Figure 4-3 represents a traditional RAID4 group using row parity that consists of four data disks (the first four columns, labeled "D") and a single row parity disk (the last column, labeled "P"). The rows in the diagram represent the standard 4 KB blocks used by the traditional RAID4 implementation. One row has been populated with sample data in each 4 KB block, and parity has been calculated for the data in that row and stored in the corresponding block on the parity disk. In this case, parity was calculated by adding the values in each of the horizontal blocks and storing the sum as the parity value (3 + 1 + 2 + 3 = 9). In practice, parity is calculated by an exclusive OR (XOR) operation, but addition is fairly similar and works just as well for the purposes of this example.

If the need arose to reconstruct data after a single failure, the process used to generate parity would simply be reversed. For example, if the first disk failed, RAID4 would recreate the data value 3 in the first column by subtracting the values on the remaining disks from the value stored in parity (9 - 1 - 2 - 3 = 3). This reconstruction example for single-parity RAID should further clarify why data is protected up to, but not beyond, one disk failure event.
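The parity arithmetic above is easy to verify in a few lines of code. The sketch below uses XOR, as real implementations do, so the parity value differs from the addition example (3 rather than 9); the list-based "disks" and the function names are inventions for illustration only.

# Row parity from Figure 4-3, using XOR as real implementations do.
# The values 3, 1, 2, 3 are the sample blocks from the figure.

from functools import reduce

def xor_parity(blocks):
    """Parity of one horizontal stripe."""
    return reduce(lambda a, b: a ^ b, blocks, 0)

row = [3, 1, 2, 3]
parity = xor_parity(row)          # plays the role of the P disk

# Simulate losing the first disk, then rebuild it: XOR of the survivors
# and the parity block yields the missing value, just as subtraction
# does in the addition-based example in the text.
lost_index = 0
survivors = row[:lost_index] + row[lost_index + 1:]
rebuilt = xor_parity(survivors + [parity])

assert rebuilt == row[lost_index]
print(f"parity={parity}, rebuilt block={rebuilt}")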

4.4.2 Adding RAID-DP Double-Parity Stripes

Figure 4-4 adds one diagonal parity stripe, denoted by the B series blocks, and a second parity disk, denoted "DP" in the sixth column, to the RAID4 group from the previous section. It shows the RAID-DP construct as a superset of the underlying RAID4 horizontal row parity solution.

Figure 4-4 Adding RAID-DP double-parity stripes (the row from Figure 4-3 plus a DP disk; the B series diagonal blocks 1, 2, 2, and 7 sum to the diagonal parity value 12)

The diagonal parity stripe has been calculated using the addition approach of this example, rather than the XOR used in practice, and stored on the second parity disk (1 + 2 + 2 + 7 = 12). One of the most important things to note here is that the diagonal parity stripe includes an element from row parity as part of its diagonal parity sum. RAID-DP treats all disks in the original RAID4 construct, including both data and row parity disks, the same way.

Figure 4-5 adds the rest of the data for each block and creates the corresponding row and diagonal parity stripes. Each block shows its data value and the letter of the diagonal stripe it belongs to; the DP disk holds the diagonal parity values:

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
2d     3e     1a     2b     8c     12c
1c     1d     3e     2a     7b     11d

Figure 4-5 The rest of the data added for each block, with the corresponding row and diagonal parity stripes created

One RAID-DP condition that is apparent from Figure 4-5 is that the diagonal stripes wrap around at the edges of the row parity construct. Two important conditions for RAID-DP's ability to recover from double disk failures may not be readily apparent in this example. The first

condition is that each diagonal parity stripe misses one and only one disk, and each diagonal misses a different disk. This leads to the second condition: there is one diagonal stripe that does not get parity generated or stored on the second diagonal parity disk. In this example the omitted diagonal stripe is the e series (the white, non-patterned blocks in the figure). As the reconstruction example that follows makes apparent, omitting one diagonal stripe does not affect RAID-DP's ability to recover all data in a double disk failure.

The same RAID-DP diagonal parity conditions covered in this example hold in real storage deployments that involve dozens of disks in a RAID group and millions of rows of data written horizontally across the RAID4 group. While it is easier to illustrate RAID-DP with the small example above, recovery in larger RAID groups works exactly the same way regardless of the number of disks in the group.

Proving that RAID-DP really does recover all data in the event of a double disk failure can be done in two ways: with mathematical theorems and proofs, or by simply walking through a double disk failure and the subsequent recovery process. This document takes the latter approach to demonstrate RAID-DP double-parity protection.

4.4.3 RAID-DP Reconstruction

Using the most recent diagram as the starting point, assume that the RAID group is functioning normally when a double disk failure occurs. All the data in the first two columns is now missing, denoted by ? in Figure 4-6:

D      D      D      D      P      DP
 ?      ?     2c     3d     9e     7a
 ?      ?     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
 ?      ?     3e     2a     7b     11d

Figure 4-6 Double disk failure (the first two disks have failed)

When engaged after a double disk failure, RAID-DP first looks for a chain on which to start reconstruction. In this case, say the first diagonal parity stripe in the chain it finds is the B series diagonal. Recall that reconstructing data for a single disk failure under RAID4 is possible if and only if no more than one element is missing. With this in mind, traverse the B series diagonal stripe in the diagram above and notice that only one of its five blocks is missing. With four out of five elements available, RAID-DP has all the information it needs to reconstruct the data in the missing B series block. Figure 4-7 reflects this data having been recovered onto an available hot spare disk.

D      D      D      D      P      DP
 ?     1b     2c     3d     9e     7a
 ?      ?     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
 ?      ?     3e     2a     7b     11d

Figure 4-7 The missing B series block recovered onto an available hot spare disk

The data has been recreated for the missing B series diagonal block using the same arithmetic discussed earlier (12 - 2 - 2 - 7 = 1). Now that the missing 1B diagonal block has been recreated, the recovery process switches from diagonal parity to horizontal row parity. Specifically, once block 1B has been recreated, the top row has enough information available to reconstruct its single missing horizontal block, 3A, from row parity (9 - 1 - 2 - 3 = 3). This occurs in Figure 4-8.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
 ?      ?     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
 ?      ?     3e     2a     7b     11d

Figure 4-8 Reconstructing the single missing horizontal block in the top row from row parity

RAID-DP next continues along the same chain to determine whether other diagonal stripes can be recreated. With the top left block recreated from row parity, RAID-DP can now recreate the missing diagonal block 1A, as shown in Figure 4-9.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
 ?     1a     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
 ?      ?     3e     2a     7b     11d

Figure 4-9 Recreating the missing diagonal block in the A series diagonal stripe (7 - 3 - 1 - 2 = 1)

Once again, after RAID-DP has recovered a missing diagonal block in a diagonal stripe, enough information exists for row parity to recreate the one missing horizontal block in the second row, the 1E block in the first column (5 - 1 - 2 - 1 = 1), as illustrated in Figure 4-10.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
 ?      ?     3e     2a     7b     11d

Figure 4-10 Recreating the one missing horizontal block in the second row from row parity

As noted earlier, the white (e series) diagonal stripe is not stored, so no additional diagonal blocks can be recreated on the existing chain. RAID-DP then looks for a new chain on which to start recreating diagonal blocks and, for the purposes of this example, determines that it can recreate missing data in the C series stripe, as the following diagram shows.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
1c      ?     3e     2a     7b     11d

Figure 4-11 Recreation of the missing block in the C series stripe (12 - 2 - 1 - 8 = 1)

After RAID-DP has recreated a missing diagonal block, the process again switches to recreating a missing horizontal block from row parity. With the missing diagonal block in the C series stripe recreated, enough information is available to recreate the missing horizontal block in the bottom row from row parity (7 - 1 - 3 - 2 = 1), as shown in Figure 4-12.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
 ?      ?     1a     2b     8c     12c
1c     1d     3e     2a     7b     11d

Figure 4-12 Recreation of the missing horizontal block in the bottom row from row parity

After the missing block in the horizontal row has been recreated, reconstruction switches back to diagonal parity to recreate another missing diagonal block. RAID-DP can continue the current chain on the D series stripe, as shown in the next diagram.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
2d      ?     1a     2b     8c     12c
1c     1d     3e     2a     7b     11d

Figure 4-13 RAID-DP continues the current chain on the D series stripe (11 - 3 - 5 - 1 = 2)

Once again, after the recovery of a diagonal block, the process switches back to row parity, which now has enough information to recreate the data for the one remaining missing horizontal block (8 - 2 - 1 - 2 = 3). The final diagram in the double disk failure scenario follows, with all data having been recreated by RAID-DP.

D      D      D      D      P      DP
3a     1b     2c     3d     9e     7a
1e     1a     2b     1c     5d     12b
2d     3e     1a     2b     8c     12c
1c     1d     3e     2a     7b     11d

Figure 4-14 All data recreated with RAID-DP

4.4.4 RAID-DP Operation Summary

The recovery example above gives a pictorial description of RAID-DP in operation, but a few aspects of RAID-DP operation that the example did not make evident deserve further discussion. If a double disk failure occurs, RAID-DP automatically raises the priority of the reconstruction process so that recovery completes faster. Because of this higher priority, the time to reconstruct data from two failed disks can be slightly less than the time to reconstruct data from a single disk failure at normal priority. A second key point about RAID-DP with a double disk failure is that it is highly likely that one disk failed some time before the second, so at least some of the first disk's information has already been recreated with traditional row parity. RAID-DP automatically adjusts for this by starting recovery at the point where two elements are missing because of the second disk failure.
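The whole walkthrough can be condensed into a short simulation. The sketch below is illustrative only: it uses addition instead of the XOR used in practice, and its layout rules are inferred from the figures in this chapter rather than taken from Data ONTAP source. It rebuilds the example array, fails the first two disks, and recovers them by alternating between diagonal and row parity exactly as described above.

# Simulation of RAID-DP double-failure recovery for the example array
# from Figures 4-5 through 4-14. Addition stands in for XOR.

ROWS, DISKS = 4, 5          # 4 data disks + 1 row parity disk (P)
NDIAGS = DISKS              # 5 diagonals; one of them is never stored

# data[r][c]: 4 data columns plus the row parity column P
data = [[3, 1, 2, 3, 9],
        [1, 1, 2, 1, 5],
        [2, 3, 1, 2, 8],
        [1, 1, 3, 2, 7]]

def diag(r, c):
    """Diagonal stripe that cell (r, c) belongs to; stripes wrap at edges."""
    return (c - r) % NDIAGS

# DP disk: one parity block per stored diagonal (diagonal 4 is omitted)
dp = [sum(data[r][c] for r in range(ROWS) for c in range(DISKS)
          if diag(r, c) == d) for d in range(ROWS)]
assert dp == [7, 12, 12, 11]          # matches the DP column in Figure 4-5

# Fail the first two disks: those cells become unknown (None)
lost = [[data[r][c] if c > 1 else None for c in range(DISKS)]
        for r in range(ROWS)]

# Recover by alternating: any diagonal or row missing exactly one cell
# can be solved by subtracting the known cells from its parity.
progress = True
while progress:
    progress = False
    for d in range(ROWS):                              # stored diagonals only
        cells = [(r, c) for r in range(ROWS) for c in range(DISKS)
                 if diag(r, c) == d]
        missing = [rc for rc in cells if lost[rc[0]][rc[1]] is None]
        if len(missing) == 1:
            r, c = missing[0]
            lost[r][c] = dp[d] - sum(lost[x][y] for x, y in cells
                                     if (x, y) != (r, c))
            progress = True
    for r in range(ROWS):                              # then row parity
        missing = [c for c in range(DISKS) if lost[r][c] is None]
        if len(missing) == 1:
            c = missing[0]
            others = [lost[r][x] for x in range(DISKS) if x != c]
            # column 4 is the parity disk: parity = sum of the data columns
            lost[r][c] = (sum(others) if c == 4
                          else others[3] - sum(others[:3]))
            progress = True

assert lost == data
print("both failed disks fully recovered")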

4.5 RAID-DP Overview

RAID-DP is available at no cost and with no special hardware requirements. The only requirement for using RAID-DP is to upgrade the IBM N series storage system to at least Data ONTAP version 7.1. Some additional technical details about RAID-DP usage that have not yet been covered are addressed in this section.

By default, IBM N series storage systems are shipped with the RAID-DP configuration. The initial factory configuration has three drives configured, as shown in Figure 4-16.

Figure 4-16 RAID-DP initial factory setup (three drives: a data drive holding Data ONTAP, a parity drive, and a double-parity drive)

4.6 Protection Levels with RAID-DP

At the lowest level, RAID-DP offers protection against either two failed disks within the same RAID group, or a single disk failure followed by a bad block or bit error before reconstruction has completed. A higher level of protection is available by using RAID-DP in conjunction with SyncMirror. In this configuration, the protection level is up to five concurrent disk failures, or four concurrent disk failures followed by a bad block or bit error before reconstruction has completed.

4.7 Larger versus smaller RAID groups

Configuring an optimum RAID group size for a volume requires a trade-off of factors. You must decide which factor (speed of recovery, assurance against data loss, or maximizing data storage space) is most important for the volume that you are configuring.

4.7.1 Advantages of large RAID groups

Large RAID group configurations offer the following advantages:

- More data drives available. A volume configured into a few large RAID groups requires fewer drives reserved for parity than the same volume configured into many small RAID groups.
- Better system performance. Read and write operations are usually faster over large RAID groups than over smaller RAID groups.

4.7.2 Advantages of small RAID groups

Small RAID group configurations offer the following advantages:

- Shorter disk reconstruction times. In case of a disk failure within a small RAID group, data reconstruction time is usually shorter than it would be within a large RAID group.
- Decreased risk of data loss due to multiple disk failures. The probability of data loss (through double disk failure within a RAID4 group, or through triple disk failure within a RAID-DP group) is lower within a small RAID group than in a large RAID group.
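The capacity side of this trade-off is easy to quantify. The following sketch is not an official sizing tool; the disk count and capacity are made-up inputs, and it simply counts the two parity drives that each RAID-DP group gives up.

# Compare usable capacity for RAID-DP at different RAID group sizes.
# Illustrative only: 28 disks of 274 GB are assumed inputs, and every
# RAID-DP group gives up 2 drives (one parity, one double parity).

TOTAL_DISKS = 28
DISK_GB = 274

for group_size in (4, 7, 14):
    groups = TOTAL_DISKS // group_size          # whole RAID groups only
    parity_disks = 2 * groups                   # P + DP per group
    data_disks = groups * (group_size - 2)
    print(f"group size {group_size:2}: {groups} groups, "
          f"{parity_disks:2} parity disks, "
          f"usable {data_disks * DISK_GB:,} GB")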

4.8 Hot spare disks

A hot spare disk is a filer disk that has not been assigned to a RAID group. It does not yet hold data but is ready for use. In the event of a disk failure within a RAID group, Data ONTAP automatically assigns a hot spare disk to the RAID group to replace the failed disk. Hot spare disks do not have to be in the same disk shelf as the other disks of a RAID group to be available to that RAID group.

Figure 4-17 RAID-DP protection with hot spares

IBM recommends keeping at least one hot spare disk for each disk size and disk type installed in your filer. This allows the filer to use a disk of the same size and type as a failed disk when reconstructing it. If a disk fails and a hot spare disk of the same size is not available, the filer uses a spare disk of the next available size up.

4.8.1 Disk failure with a hot spare disk

If a disk fails, the filer replaces the disk with a spare and reconstructs the data. Specifically, the filer:

- Replaces the failed disk with a hot spare disk. (If RAID-DP is enabled and a double disk failure occurs in the RAID group, the filer replaces each failed disk with a separate spare disk.) Data ONTAP first attempts to use a hot spare disk of the same size as the failed disk; if no disk of the same size is available, it uses a spare disk of the next available size up.
- Reconstructs the missing data onto the hot spare disk or disks in the background.
- Logs the activity in the /etc/messages file on the root volume.

Note: With RAID-DP, these steps can be carried out even in the event of a simultaneous failure of two disks in a RAID group.

Attention: During reconstruction, file service can slow down.

Attention: After the filer has finished reconstructing data, replace the failed disk or disks with new hot spare disks as soon as possible, so that hot spare disks are always available in the system.
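On a running filer you can verify spare coverage before a failure forces the issue. The commands below use Data ONTAP 7G-era syntax (vol status, sysconfig, and rdfile existed in that release family); verify them against the man pages for your release.

# List spare disks known to the filer
> vol status -s

# Show the full RAID configuration, including failed and spare drives
> sysconfig -r

# Review recent reconstruction activity logged by Data ONTAP
> rdfile /etc/messages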

Chapter 5. SnapShots

The primary focus of this chapter is the algorithms and data structures that WAFL uses to implement Snapshots, which are read-only clones of the active file system. WAFL uses a copy-on-write technique to minimize the disk space that Snapshots consume. This chapter also describes how WAFL uses Snapshots to eliminate the need for file system consistency checking after an unclean shutdown.

5.1 Introduction To Snapshots

WAFL's primary distinguishing characteristic is Snapshots, which are read-only copies of the entire file system. WAFL creates and deletes Snapshots automatically at prescheduled times, and it keeps up to 255 Snapshots online at once to provide easy access to old versions of files.

Figure 5-1 Snapshot features (a snapshot is a read-only, freeze-framed version of a filer's file system, frozen at some past point in time; a volume can maintain up to 255 snapshots concurrently; snapshots are readily accessible via special subdirectories that appear in the current or active file system; snapshots use no additional disk space when first taken and consume space only as the file system changes; snapshots can be taken manually or automatically on a schedule)

Snapshots use a copy-on-write technique to avoid duplicating disk blocks that are the same in a Snapshot as in the active file system. Only when blocks in the active file system are modified or removed do the Snapshots containing those blocks begin to consume disk space. Users can access Snapshots through NFS to recover files that they have accidentally changed or removed, and system administrators can use Snapshots to create backups safely from a running system. In addition, WAFL uses Snapshots internally so that it can restart quickly even after an unclean system shutdown.

5.2 High level Snapshot process

Figure 5-2 Step 1 (file or LUN X consists of active data blocks A, B, and C)

1. Snapshots are taken from the active data on the file system.

Figure 5-3 Step 2 (the snapshot freezes those disk blocks as a consistent point-in-time copy that is ready to use, read-only, consumes a minimal amount of space (4 KB), and creates pointers to the original data)

2. When the initial Snapshot is taken, no data is copied. Instead, pointers are created to the original blocks, recording the point-in-time state of those blocks.

Figure 5-4 Step 3 (a client sends a new data block C2; the modified block is simply written to an optimal location on disk in one I/O operation, where some vendors require three operations)

3. When a write request to block C occurs, the original block C1 is frozen to maintain the point-in-time copy; the modified block C2 is written to another location on disk and becomes the active block.

Figure 5-5 Step 4 (the active version of X is now composed of blocks A, B, and C2, while the snapshot version of X remains composed of blocks A, B, and C1; space is consumed incrementally)

4. The final result is that the Snapshot now consumes only 4 KB of metadata plus the space held by the original block C1. The Snapshot version of X still points to the unmodified blocks A and B and to the point-in-time copy C1.

5.3 Understanding Snapshots in detail

Understanding that the WAFL file system is a tree of blocks rooted by the root inode is the key to understanding Snapshots. To create a virtual copy of this tree of blocks, WAFL simply duplicates the root inode. Figure 5-6 shows how this works.

Figure 5-6, column B, shows how WAFL creates a new Snapshot by making a duplicate copy of the root inode. This duplicate inode becomes the root of a tree of blocks representing the Snapshot, just as the root inode represents the active file system. When the Snapshot inode is created, it points to exactly the same disk blocks as the root inode, so a brand-new Snapshot consumes no disk space except for the Snapshot inode itself.

Figure 5-6, column C, shows what happens when a user modifies data block D. WAFL writes the new data to block D' on disk and changes the active file system to point to the new block. The Snapshot still references the original block D, which is unmodified on disk. Over time, as files in the active file system are modified or deleted, the Snapshot references more and more blocks that are no longer used in the active file system. The rate at which files change determines how long Snapshots can be kept online before they consume an unacceptable amount of disk space.

Figure 5-6 WAFL creates a Snapshot by duplicating the root inode that describes the inode file (column A: before snapshot; column B: after snapshot; column C: after block update). WAFL avoids changing blocks in a Snapshot by writing new data to new locations on disk.

WAFL's Snapshots duplicate the root inode instead of copying the entire inode file. This avoids considerable disk I/O and saves a lot of disk space. By duplicating just the root inode, WAFL creates Snapshots very quickly and with very little disk I/O. Snapshot performance is important because WAFL creates a Snapshot every few seconds to allow quick recovery after unclean system shutdowns.
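The root-inode duplication in Figure 5-6 can be mimicked with a toy block tree in Python. This is purely illustrative: the dictionary-based "disk", the names, and the structure are inventions for the example, not WAFL's on-disk format. Taking a snapshot copies a single root pointer, and an update rewrites only the path from the changed block up to a new root.

# Toy copy-on-write tree: blocks live at numbered "disk" locations and
# never change in place; an update writes new blocks up to a new root.

disk = {}                      # location -> block contents
next_loc = 0

def write_block(contents):
    """Blocks are immutable: every write goes to a fresh location."""
    global next_loc
    disk[next_loc] = contents
    next_loc += 1
    return next_loc - 1

# Build a tiny file tree: two leaves under one indirect block, one root.
leaf_c = write_block("C1")
leaf_a = write_block("A")
indirect = write_block([leaf_a, leaf_c])   # pointers to the leaves
root = write_block([indirect])             # the "root inode"

snapshot = root                # a snapshot is just a copy of the root pointer

# Update leaf C: rewrite the leaf, its parent, and the root; A is shared.
new_c = write_block("C2")
new_indirect = write_block([leaf_a, new_c])
root = write_block([new_indirect])

def leaves(root_loc):
    """Collect leaf contents reachable from a root (depth-first)."""
    block = disk[root_loc]
    if isinstance(block, str):
        return [block]
    return [leaf for ptr in block for leaf in leaves(ptr)]

print("active  :", leaves(root))       # ['A', 'C2']
print("snapshot:", leaves(snapshot))   # ['A', 'C1']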

Figure 5-7 shows the transition from Figure 5-6 column B to Figure 5-6 column C in more detail. When a disk block is modified and its contents are written to a new location, the block's parent must be modified to reflect the new location. The parent's parent, in turn, must also be written to a new location, and so on up to the root of the tree.

Figure 5-7 To write a block to a new location, the pointers in the block's ancestors must be updated, which requires them to be written to new locations as well (shown before and after a block update, with the Snapshot inode still referencing the old blocks)

WAFL would be very inefficient if it wrote this many blocks for each NFS write request. Instead, WAFL gathers up many hundreds of NFS requests before scheduling a write episode. During a write episode, WAFL allocates disk space for all the dirty data in the cache and schedules the required disk I/O. As a result, commonly modified blocks, such as indirect blocks and blocks in the inode file, are written only once per write episode instead of once per NFS request.

5.4 Snapshot Data Structures And Algorithms

5.4.1 The Block-Map File

Most file systems keep track of free blocks using a bit map with one bit per disk block: if the bit is set, the block is in use. This technique does not work for WAFL because many Snapshots can reference a block at the same time. Instead, WAFL's block-map file contains a 32-bit entry for each 4 KB disk block. Bit 0 is set if the active file system references the block, bit 1 is set if the first Snapshot references the block, and so on. A block is in use if any of the bits in its block-map entry are set.

Figure 5-8 shows the life cycle of a typical block-map entry. At time t1, the block-map entry is completely clear, indicating that the block is available. At time t2, WAFL allocates the block and stores file data in it. When Snapshots are created, at times t3 and t4,

WAFL copies the active file system bit into the bit indicating membership in the Snapshot. The block is deleted from the active file system at time t5. This can occur either because the file containing the block is removed, or because the contents of the block are updated and the new contents are written to a new location on disk. The block cannot be reused, however, until no Snapshot references it. In Figure 5-8, this occurs at time t8, after both Snapshots that reference the block have been removed.

Time   Block-map entry (bits 3..0)   Description
t1     0000                          Block is unused
t2     0001                          Block is allocated for the active FS (bit 0 set)
t3     0011                          Snapshot #1 is created (bit 1 set)
t4     0111                          Snapshot #2 is created (bit 2 set)
t5     0110                          Block is deleted from the active FS (bit 0 cleared)
t6     0110                          Snapshot #3 is created (bit 3 copies the now-clear active bit)
t7     0100                          Snapshot #1 is deleted (bit 1 cleared)
t8     0000                          Snapshot #2 is deleted; block is unused again

Figure 5-8 The life cycle of a block-map file entry (bit 0 marks the active file system; bits 1, 2, and 3 mark Snapshots #1, #2, and #3)
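The 32-bit block-map entries lend themselves to a direct illustration. The sketch below walks one entry through the life cycle of Figure 5-8 with ordinary bit operations; it is a toy model, since WAFL's real block-map file is an on-disk structure rather than a Python integer.

# One 32-bit block-map entry: bit 0 = active file system,
# bits 1..31 = Snapshots #1..#31.

def set_bit(entry, bit):    return entry | (1 << bit)
def clear_bit(entry, bit):  return entry & ~(1 << bit)
def in_use(entry):          return entry != 0

entry = 0b0000                       # t1: block unused
entry = set_bit(entry, 0)            # t2: allocated in the active FS
entry = set_bit(entry, 1)            # t3: Snapshot #1 copies the active bit
entry = set_bit(entry, 2)            # t4: Snapshot #2 copies the active bit
entry = clear_bit(entry, 0)          # t5: block deleted from the active FS
# t6: Snapshot #3 copies the (now clear) active bit into bit 3: no change
entry = clear_bit(entry, 1)          # t7: Snapshot #1 deleted
print(f"t7: {entry:04b}, in use: {in_use(entry)}")   # 0100, still in use
entry = clear_bit(entry, 2)          # t8: Snapshot #2 deleted
print(f"t8: {entry:04b}, in use: {in_use(entry)}")   # 0000, free again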

5.4.2 Creating a Snapshot

The challenge in writing a Snapshot to disk is to avoid locking out incoming NFS requests. The problem is that new NFS requests may need to change cached data that is part of the Snapshot and that must remain unchanged until it reaches disk. An easy way to create a Snapshot would be to suspend NFS processing, write the Snapshot, and then resume NFS processing. However, writing a Snapshot can take more than a second, which is too long for an NFS server to stop responding. Remember that WAFL creates a consistency point Snapshot at least every 10 seconds, so performance is critical.

WAFL's technique for keeping Snapshot data self-consistent is to mark all the dirty data in the cache as IN_SNAPSHOT. The rule during Snapshot creation is that data marked IN_SNAPSHOT must not be modified, and data not marked IN_SNAPSHOT must not be flushed to disk. NFS requests can read all file system data, and they can modify data that is not IN_SNAPSHOT, but processing for requests that need to modify IN_SNAPSHOT data must be deferred.

To avoid locking out NFS requests, WAFL must flush IN_SNAPSHOT data as quickly as possible. To do this, WAFL performs the following steps:

1. Allocate disk space for all files with IN_SNAPSHOT blocks. WAFL caches inode data in two places: in a special cache of in-core inodes, and in disk buffers belonging to the inode file. When it finishes write-allocating a file, WAFL copies the newly updated inode information from the inode cache into the appropriate inode file disk buffer and clears the IN_SNAPSHOT bit on the in-core inode. When this step is complete, no inodes for regular files are marked IN_SNAPSHOT, and most NFS operations can continue without blocking. Fortunately, this step can be done very quickly because it requires no disk I/O.

2. Update the block-map file. For each block-map entry, WAFL copies the bit for the active file system to the bit for the new Snapshot.

3. Write all IN_SNAPSHOT disk buffers in the cache to their newly allocated locations on disk. As soon as a particular buffer is flushed, WAFL restarts any NFS requests waiting to modify it.

4. Duplicate the root inode to create an inode that represents the new Snapshot, and turn the root inode's IN_SNAPSHOT bit off. The new Snapshot inode must not reach disk until after all other blocks in the Snapshot have been written. If this rule were not followed, an unexpected system shutdown could leave the Snapshot in an inconsistent state. Once the new Snapshot inode has been written, no more IN_SNAPSHOT data exists in the cache, and any NFS requests that are still suspended can be processed.

Under normal loads, WAFL performs these four steps in less than a second. Step 1 can generally be done in just a few hundredths of a second, and once WAFL completes it, very few NFS operations need to be delayed.

Deleting a Snapshot is trivial. WAFL simply zeroes the root inode representing the Snapshot and clears the bit representing the Snapshot in each block-map entry.
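The deferral rule behind these four steps can also be sketched. The toy model below is an assumption-laden illustration, not WAFL code: it shows only the essential behavior, namely that a write to IN_SNAPSHOT data is deferred rather than refused, and is restarted as soon as the buffer has been flushed.

# Toy model of the IN_SNAPSHOT rule. Names and structure are inventions.

class Buffer:
    def __init__(self, data):
        self.data, self.dirty, self.in_snapshot = data, True, False

cache = {"a": Buffer("A"), "b": Buffer("B")}
deferred = []                      # client writes waiting on a flush

def begin_snapshot():
    for buf in cache.values():     # mark all dirty data IN_SNAPSHOT
        if buf.dirty:
            buf.in_snapshot = True

def client_write(name, data):
    buf = cache[name]
    if buf.in_snapshot:            # must not modify snapshot data yet
        deferred.append((name, data))
    else:
        buf.data, buf.dirty = data, True

def flush(name):
    buf = cache[name]
    buf.dirty = buf.in_snapshot = False   # buffer is safely on "disk"
    for n, data in [d for d in deferred if d[0] == name]:
        deferred.remove((n, data))
        client_write(n, data)      # restart the waiting request

begin_snapshot()
client_write("a", "A2")            # deferred: "a" is IN_SNAPSHOT
flush("a")                         # the flush restarts the deferred write
print(cache["a"].data, deferred)   # A2 []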

Chapter 6. Recovery with SnapRestore

This chapter discusses SnapRestore software, which allows an enterprise to recover almost instantly from disaster scenarios. In seconds, SnapRestore software can recover anything from an individual file to a multiterabyte volume so that operations can be quickly resumed. SnapRestore software makes recovering your data fast and easy.

Without SnapRestore, you would need to use one of the following methods to restore data:

- Restore files from tape
- Copy files from a snapshot to the active file system

Using either of these methods takes longer than reverting a volume, and can even take longer than reverting a single file. This is because with SnapRestore no data is copied; instead, the file system is restored to an earlier state. SnapRestore requires a license code: you must license SnapRestore before you can use it.

6.1 Overview

SnapRestore leverages the Snapshot feature of Data ONTAP software by restoring a file, an entire file system, or a LUN to an earlier preserved state. It can be used to recover a damaged or deleted file, or to recover from a corrupted database, application, or damaged file system. The system administrator can restore a file, an entire file system, a LUN, or an entire volume from any existing Snapshot copy. Without rebooting, the restored file, volume, file system, or LUN is available for full production use, having returned to the precise state that existed when the selected Snapshot copy was created. From a single home directory to a huge production database, SnapRestore does the job in seconds, regardless of file or volume size.

Cost and Storage Efficiency

Snapshot technology makes extremely efficient use of storage by storing only the block-level changes between successive Snapshots. Since the Snapshot process is automatic and virtually instantaneous, backups are significantly faster and simpler. SnapRestore software uses Snapshot technology to perform near-instantaneous data restoration. In contrast, alternative storage solutions copy all of the data and require much more time and disk storage for

the backup-and-restore operations. SnapRestore also saves staffing resources. Whether your business employs a small group of end users or an enterprise-scale user community and IT support team, SnapRestore's easy single-command restoration eliminates complexity and reduces errors. Using SnapRestore requires no special training or expertise.

More Than Just Data Protection

With SnapRestore, data can be restored from any one of the Snapshots stored on the file system. This allows an application development team, for example, to revert to Snapshots from various stages of their design, or test engineers to quickly and easily return data to a baseline state. Restoring the base environment takes only seconds, and the restored environment is identical to the point at which the Snapshot copy was created.

6.2 Key Features

- Fast and efficient: recover entire volumes or individual files in seconds.
- Simple single-command operation: no special expertise is required, virtually eliminating the chance of operator error.
- File or full volume restore: choose to restore only the affected files or the entire volume.
- Multiple recovery points: restore the most recent clean copy from any Snapshot.
- Unsurpassed reliability: far more dependable than traditional data restoration methods.

SnapRestore lets you:

- Restore databases quickly; this is especially useful where recovery time is critical.
- Quickly recover from virus attacks by reverting a particular volume back to any specified Snapshot, with nearly no time required for recovery (typically 2 to 3 minutes).
- Support software testing situations that need to return frequently to a baseline state.
- Revert an entire volume or an individual file or LUN to the data of a previous Snapshot, based on Snapshot technology.
- Reduce dependency upon tape.
- Recover data after a user or application error.

6.3 Examples

SnapRestore allows a customer to quickly restore an entire volume or a single file or LUN by simply having any of the available snapshot images overwrite the existing data in the active file system. An analogy is that SnapRestore is a time machine for data: you can roll data back to the instant a Snapshot was taken. Of course, this has the effect of wiping out any changes made to the volume after the snapshot.

As another example, say snapshots were taken every night at midnight. On Friday, you discover that an upgrade you applied to your application on Thursday accidentally caused serious corruption to the data stored on that volume. You would really like to go back to the state things were in when the snapshot occurred on Wednesday night. You can invoke the snap restore command and specify the Wednesday night snapshot. This restores the entire volume in a fraction of a second and reverts the active file system to the state it was in on Wednesday night. Upon further examination, you would notice that the Wednesday snapshot is still in the .snapshot list but the Thursday snapshot is now missing.

The same thing can be done for a single file or LUN. However, there is one catch: to make it happen quickly, the file or LUN has to be removed from the active file system first. If it is not, the blocks for that file or LUN are already defined, so the filer turns the request into a copy operation instead of an instantaneous revert. In that case SFSR (single file SnapRestore) becomes the same operation as copying the data from the Snapshot into the active file system.

6.4 SnapRestore for Databases

SnapRestore provides a unique solution to database recovery. Rather than restoring large amounts of data from backup tape, the high-level procedure is:

1. Simply revert the entire volume back in time to its state when the Snapshot was taken.
2. Then play the change logs forward to complete the recovery.

This effectively protects data without expensive mirroring or replication. Use SnapRestore where the time to copy data from either a Snapshot or tape into the active file system is prohibitive.

6.4.1 SnapRestore for Databases Example

Suppose our Oracle database is damaged: a 300 GB Oracle database, all of which requires recovery, and tape that recovers at 60 GB per hour.

- Normal recovery: 300 GB / 60 GB per hour = 5 hours, plus log replay time.
- SnapRestore reverts the volume to the same state as when the backup was taken in about 3 minutes, so total recovery is 3 minutes plus log replay time.

Note: A single use could pay for the filer in terms of the cost of downtime for the enterprise.
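The midnight-snapshot example translates into a short console session. The syntax shown is Data ONTAP 7G-era (snap restore with -t vol or -t file and -s for the snapshot name); dbvol, the file path, and the nightly.* snapshot names are illustrative, so confirm the options against the snap man page for your release.

# Revert the whole volume to Wednesday night's snapshot
> snap restore -t vol -s nightly.2 dbvol

# Revert a single file instead (single file SnapRestore)
> snap restore -t file -s nightly.2 /vol/dbvol/db/datafile.dbf

# List the snapshots that remain after the reversion
> snap list dbvol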

A typical software testing scenario without SnapRestore is:

1. Establish the base environment.
2. Run the test.
3. Restore the base environment (this might take hours).
4. Evaluate the test results.

If the base environment is hundreds of gigabytes, the restore time can be excessively long.

Figure 6-1 Software testing scenario

Testing with SnapRestore becomes:

1. Establish the base environment.
2. Take a Snapshot.
3. Run the test.
4. SnapRestore the base environment (this takes only 3 to 4 minutes).
5. Evaluate the test results.

Restoring the base environment now takes minutes, not hours, and the environment is guaranteed to be identical.

Figure 6-2 Testing with SnapRestore

6.5 About SnapRestore

The filer's SnapRestore feature enables you to revert a volume or file to the state it was in when a specific snapshot was taken. As noted at the start of this chapter, the alternatives (restoring files from tape, or copying files from a snapshot into the active file system) take longer, because they copy data rather than restoring the file system to an earlier state. SnapRestore requires a license code, and you must license SnapRestore before you can use it.

6.5.1 How SnapRestore Works

After you select a snapshot for reversion, the filer restores the volume or file to contain the same data and timestamps as it did when the snapshot was taken. All data written after the snapshot was taken is overwritten.

Important: You cannot undo a SnapRestore reversion to return the volume to the state it was in prior to the reversion.

6.5.2 What SnapRestore Reverts

SnapRestore reverts only file contents. It does not revert attributes of a volume, such as the snapshot schedule, volume option settings, RAID group size, and maximum number of files per volume. However, option settings applicable to the entire filer might be reverted, because the option settings are stored in a registry in the /etc directory on the root volume. If you revert the root volume, the registry is reverted to the version that was in use at the snapshot creation time.

You can also revert a volume to a snapshot taken when the filer was running a different Data ONTAP version. Beware that doing so can cause problems because of version incompatibilities.

Important: You cannot revert a volume to recover a deleted snapshot. For example, if you delete the hourly.2 snapshot and revert the volume to the hourly.1 snapshot, you cannot find the hourly.2 snapshot after the reversion. Although the hourly.2 snapshot existed at the creation time of the hourly.1 snapshot, SnapRestore cannot revert the contents of the hourly.2 snapshot because you had already deleted it.

Important: After you revert a volume to a specific snapshot, you lose all snapshots that are more recent than the snapshot used for the reversion.

For example, after you revert the volume to the hourly.1 snapshot, you no longer have access to more recent snapshots, such as the hourly.0 snapshot. This is because at the creation time of the hourly.1 snapshot, the hourly.0 snapshot did not exist.

6.5.3 Applications of SnapRestore

- Disaster recovery
- Database corruption recovery
- Application testing, such as a development environment using large data files

If a client application corrupts data files in a volume, you can revert the volume to a snapshot taken before the data corruption. The following examples illustrate some situations in which you can apply SnapRestore to recover from corrupted data.

Example: A messaging application or a database application stores user data in one or two files that can grow to several hundred GB in a volume. If, for some reason, this application corrupts the files, you can revert the volume to a snapshot taken before the data corruption. Or, if a single file is corrupt, you can revert only that specific file.

Example: You can revert a volume or file in a test environment to its original state after each test.

6.5.4 SnapRestore in detail

At Time State 0 (Figure 6-3), the first Snapshot is taken. Its pointers reference 4 KB blocks that are identical to those in the active file system. No additional space is used at this time by Snapshot 1 because no modifications to the active file system blocks have occurred.

Figure 6-3 First Snapshot (Time State 0: Snapshot 1 and the active file system point to the same blocks)

Time goes by; new files are added with new blocks, and modifications are made to existing files and their blocks (Figure 6-4). Snapshot 1 still points to the blocks and the file system as they appeared at Time State 0; notice that one of the blocks, A1, has not been modified and is still part of the active file system. Snapshot 2 captures the file modifications and additions made since Time State 0; notice that it still points to active file system blocks A1 and A2.

Figure 6-4 Second Snapshot (Time State 1: Snapshot 1 points to the original blocks, while Snapshot 2 shares blocks with the active file system)

More time goes by; more files are added with new blocks, and more modifications are made to existing files and their blocks (Figure 6-5). Snapshot 1 still points to the blocks and the file system as they appeared at Time State 0. Snapshot 2 reflects the file modifications and additions made since Time State 0. Snapshot 3 reflects the modifications and additions made since Time State 1 and Snapshot 2. Note that Snapshot 1 no longer points to any active file system blocks.

Figure 6-5 Snapshot 3 (Time State 2: Snapshot 1 no longer shares any blocks with the active file system)

Jumping ahead to the subsequent Snapshot 4 (Figure 6-6), we are now at Time State 3, which reflects further additions and modifications of 4 KB blocks. Notice that the first two Snapshots no longer reference any of the active file system blocks.

Figure 6-6 Subsequent Snapshots (Time State 3: Snapshots 1 and 2 no longer share blocks with the active file system)

In Figure 6-7 the customer has discovered some file system corruption caused by a virus and must revert to the point-in-time Snapshot as it looked at Time State 2, that is, Snapshot 3. The active file system becomes the Snapshot 3 image. Blocks that were previously pointed to solely by Snapshot 4 or by the previous active file system are freed up for writes again.

Figure 6-7 Reversion to Snapshot 3 (Time State 4: the active file system reverts to the Snapshot 3 image, Snapshots 3 and 4 are shown as deleted, and blocks no longer referenced become free blocks)

In Figure 6-8 we compare Time State 4, after the reversion to Snapshot 3, with Time State 1, which reflects the active file system before Snapshot 3 was taken. As you can see, they are the same.

Figure 6-8 Comparison of the reversion to Snapshot 3 (Time State 4, after reversion) with the active file system at Time State 1 (before reversion): the block layouts are identical


Part 3. Software Technical Description


Chapter 7. Introduction to FLEXVOL

Flexible volumes are a groundbreaking new technology. These volumes are logical data containers, managed separately from their underlying physical storage, that can be sized, resized, managed, and moved independently and nondisruptively. Flexible volumes are file systems that hold user data accessible through one or more of the access protocols supported by Data ONTAP, including NFS, CIFS, HTTP, WebDAV, FTP, FCP, and iSCSI.

Since each flexible volume is a separate file system, you can create one or more snapshots of the data in a volume, so that multiple space-efficient, point-in-time images of the data can be maintained for purposes such as backup and error recovery.

The physical storage supporting flexible volumes is first arranged in RAID groups (either RAID4 or RAID-DP, the default). One or more RAID groups are then combined into an aggregate. Each storage appliance can support multiple aggregates, with the maximum number dependent on the capacity of the storage appliance and the version of Data ONTAP.

Each volume depends on its containing aggregate for all of its physical storage, that is, for all storage in the aggregate's disks and RAID groups.

Figure 7-1 Flexible volumes draw their space from pooled physical storage (aggregates)

Because a FlexVol volume is managed separately from its aggregate, you can create small FlexVol volumes (20 MB or larger), and you can increase or decrease the size of FlexVol volumes in increments as small as 4 KB. A FlexVol volume can share its containing aggregate with other FlexVol volumes; thus a single aggregate can be the shared source of all the storage used by all the FlexVol volumes it contains. The unused space is managed by the aggregate, so unallocated space in one FlexVol volume does not impact the space used in another FlexVol volume within the same aggregate.

As shown in Figure 7-2, an aggregate is defined as a pool of many disks, from which space is allocated to volumes (volumes are shown in the illustration as FlexVol and FlexClone entities). From the administrator's point of view, volumes remain the primary unit of data management. But transparently to the administrator, flexible volumes now refer to logical entities, not (directly) to physical storage. Flexible volumes are therefore no longer bound by the limitations of the disks on which they reside. A FlexVol volume is simply a pool of storage that can be sized based on how much data you want to store in it, rather than on what the size of your disks dictates. A FlexVol volume can be shrunk or grown on the fly, without any downtime.

Flexible volumes have all the spindles in the aggregate available to them at all times. For I/O-bound applications, flexible volumes can run much faster than equivalent-sized traditional volumes. Flexible volumes provide these new benefits while preserving the familiar semantics of volumes and the current set of volume-specific data management and space allocation capabilities. Functions like Snapshot scheduling, quotas, and volume security options are all retained with flexible volumes, and their function and access are unchanged.
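A minimal console sketch ties these constructs together, using Data ONTAP 7G-era commands (aggr create and vol create existed in that release family). The names aggr1 and projvol and the sizes are illustrative; verify the syntax against your release's documentation.

# Build an aggregate from 16 disks using the default RAID-DP protection
> aggr create aggr1 -t raid_dp 16

# Create a small 20 MB flexible volume inside that aggregate
> vol create projvol aggr1 20m

# Grow it later, nondisruptively, in small increments
> vol size projvol +100m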

Figure 7-2 FlexVol dynamic virtualization: an aggregate consists of a pool of many disks from which space is allocated to volumes (applications are tied to flexible volumes, which are not tied to physical storage; this is designed to provide improved asset utilization, better productivity, and increased performance)

Improved Performance

With Data ONTAP, disks are still organized in RAID groups, which consist of a parity disk (two in the case of RAID-DP) and some number of data disks, and RAID groups are combined to create aggregates. Since volumes are still the usual unit of storage management, it is common to include all the disks on a single IBM storage appliance in one aggregate and then allocate multiple volumes from that one large aggregate. This makes it possible to tap the unused performance capacity of all the disks, making that capacity available to the busiest part of the system. A FlexVol volume is flexible in changing size because the underlying physical storage does not have to be repartitioned.

FlexVol: Helps Improve Utilization

With regular (traditional) volumes, space is pre-allocated per volume, and free space becomes fragmented across volumes. With FlexVol volumes there is no pre-allocation: free space is shared among all the volumes in the aggregate, which reduces the total free space required and enables thin provisioning.

Dynamic virtualization reduces hardware-related storage bottlenecks and allows data to be moved easily to idle space and spindles, so I/O-intensive applications perform better. Even for applications or data patterns with a tight locality of reference, those references are virtually spread out among multiple physical disks.

FlexVol: Increasing I/O Performance

With static virtualization, a volume's performance is limited by the number of disks it owns, and hot volumes cannot be helped by disks belonging to other volumes. With dynamic virtualization, spindle sharing makes the total performance of the aggregate available to all the flexible volumes it contains, typically yielding as much as a 2x improvement (for example, a 10-disk volume versus a shared 40-disk aggregate).

Figure 7-3 Performance Advantages

Capacity Guarantees in FlexVol

Data ONTAP introduces a new storage management concept of guarantees. A guarantee differs from the previous management concept of space reservations, which is familiar to customers using the iSCSI or Fibre Channel facilities of Data ONTAP. Guarantees extend that control by allowing administrators to set policies that determine how much storage is actually preallocated when volumes or files are created. This allows administrators to effectively implement thin provisioning: in effect, to oversubscribe their storage safely. Provision your storage space once, then grow as needed in the aggregate.

Guarantees, set at the volume level, determine how the aggregate preallocates space to a flexible volume. When you create a FlexVol volume within an aggregate, you specify the capacity and, optionally, the type of guarantee. There are three types of guarantees (a command sketch follows this list):

1. Volume: A guarantee type of volume ensures that the amount of space required by the flexible volume is always available from its aggregate. This is the default setting for flexible volumes.
2. File: With the file guarantee type, the aggregate guarantees that space is always available for writes to space-reserved LUNs or other files.
3. None: A flexible volume with a guarantee type of none reserves no space, regardless of the space reservation settings for LUNs in that volume. Write operations to space-reserved LUNs in that volume might fail if the containing aggregate does not have enough available space.
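A hedged sketch of specifying guarantees (the -s flag on vol create and the guarantee volume option existed in Data ONTAP 7G; all names and sizes here are illustrative):

   > vol create g1vol -s volume aggr1 100g    (default guarantee: space fully preallocated)
   > vol create g2vol -s none aggr1 500g      (thin provisioned: no space reserved up front)
   > vol options g2vol guarantee file         (change the guarantee type after creation)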

Flexible Capacity Planning

There are virtually no restrictions on the size of a FlexVol volume, and flexible volumes can be resized dynamically. Size restrictions differ between platform types; check the system configuration guide on the IBM website for the actual limits. Administrators can use flexible volumes as a powerful tool for allocating and provisioning storage resources among various users, groups, and projects. The smallest growth or shrink increment is 4 KB (one block in WAFL terms).

For example, suppose a database grows much faster than originally anticipated. The administrator can reconfigure the relevant flexible volumes at any time during the operation of the system. The reallocation of storage resources does not require any downtime, and it is transparent to users of a file system or of a LUN mapped to a host in a block environment. The effect is nondisruptive to all clients connected to the file system.

When additional physical space is required, the administrator can increase the size of the aggregate by assigning additional disks to it, as shown in the sketch after Figure 7-4. The new disks become part of the aggregate, and their capacity and I/O bandwidth are available to all of the flexible volumes in the aggregate. Overall FlexVol capacity can also be overallocated, where the set capacity of all the flexible volumes on an aggregate exceeds the total available physical space. Increasing the capacity of a FlexVol volume does not require changing the capacity of another volume in the aggregate or of the aggregate itself. Currently, the only limit imposed on FlexVol volumes is the overall system limit of 200 for all volumes. For clusters, this limit applies to each node individually, so the overall limit for the pair is doubled.

FlexVol: Helps Improve Productivity

Consider shrinking a volume by 100 GB. With legacy (static) virtualization this means creating a new, smaller LUN, provisioning space, designing a new layout (slice, stripe, and so on), scheduling downtime, bringing the LUN offline, copying data from old to new, reconfiguring the host, and bringing the LUN back online: hours or days of work. With FlexVol (dynamic virtualization) it is a single command that completes in seconds:

   > vol size dbvol -100g

Figure 7-4 Productivity enhancement
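A minimal sketch of growing an aggregate online (the aggregate name and disk count are hypothetical):

   > aggr add aggr1 4      (add four spare disks; capacity and bandwidth become available to all volumes)
   > df -A aggr1           (verify the new aggregate capacity)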

Chapter 8. A Thorough Introduction to FlexClone Volumes

The goal of this chapter is to help storage system administrators understand the full value FlexClone volumes can bring to their operations. In the following sections we explain how FlexClone volumes work, explore practical applications for FlexClone technology, provide a detailed example scenario, discuss FlexClone performance, detail best practices for success with FlexClone volumes, and conclude with a list of references for learning even more.

8.1 Introduction

This chapter describes a powerful new feature that allows N series administrators to instantly create clones of a flexible volume (FlexVol volume; see Chapter 7, Introduction to FLEXVOL on page 85). A FlexClone volume is a writable point-in-time image of a FlexVol volume or of another FlexClone volume. FlexClone volumes add a new level of agility and efficiency to storage operations. They take only a few seconds to create, and creation does not interrupt access to the parent FlexVol volume. FlexClone volumes use space very efficiently, leveraging the Data ONTAP architecture to store only the data that changes between the parent and the clone. This is a huge potential saving in dollars, space, and energy. In addition to all these benefits, clone volumes have the same high performance as other kinds of volumes.

Figure 8-1 Think of a FlexClone volume as a transparent writable layer in front of a Snapshot

Conceptually, FlexClone volumes are ideal for any situation where testing or development occurs, any situation where progress is made by locking in incremental improvements, and any situation where there is a desire to distribute data in changeable form without endangering the integrity of the original.

A FlexClone volume is a Snapshot-based logical copy: a user of the clone reads unchanged Snapshot data from the base flexible volume, while new writes, whether new production writes to the base volume or new data written to the clone, are the only data that consumes additional storage. Usage examples include test and production copies, database simulations, software testing, data mining where read/write access is required, and large system deployments where small variations are required (for example, grid computing). The key benefit is that FlexClone can produce dramatic cost savings compared with competitive alternatives.

Figure 8-2 FlexClone basics

For example, imagine a situation where the IT staff needs to make substantive changes to a production environment. The cost and risk of a mistake are too high to do it on the production volume. Ideally, there would be an instant writable copy of the production system available at minimal cost in terms of storage and service interruptions. With FlexClone volumes, the IT staff gets just that: an instant point-in-time copy of the production data that is created transparently and uses only enough space to hold the desired changes. They can then try out their upgrades on the FlexClone volumes. At every point where they make solid progress, they clone their working FlexClone volume to lock in the successes. At any point where they get stuck, they simply destroy the working clone and go back to the point of their last success. When everything is finally working the way they like, they can either split off the clone to replace their current production volumes or codify their successful upgrade process to use on the production system during the next maintenance window. The FlexClone feature allows them to make the necessary changes to their infrastructure without worrying about crashing their production systems or making untested changes under tight maintenance window deadlines. The results are less risk, less stress, and higher levels of service for the IT customers.

8.2 How FlexClone Volumes Work

The FlexClone operation proceeds as follows: start with a volume; create a Snapshot copy; create a clone (a new volume based on the Snapshot copy); modify the original volume; modify the cloned volume. The result is independent volume copies, efficiently stored: the only data written to disk is the Snapshot copy, the changed blocks of the original volume, and the changed blocks of the cloned volume.

Figure 8-3 FlexClone Operation

FlexClone volumes have all the capabilities of a FlexVol volume, including growing, shrinking, and being the source of a Snapshot copy or even of another FlexClone volume. The technology that makes this all possible is integral to how Data ONTAP manages storage. N series filers use the Write Anywhere File Layout (WAFL) to manage disk storage. Any new data written to the volume does not need to go to a specific spot on disk; it can be written anywhere, and WAFL then updates the metadata to integrate the newly written data into the right place in the file system. If the new data is meant to replace older data, and the older data is not part of a Snapshot copy, WAFL marks the blocks containing the old data as reusable. This can happen asynchronously and does not affect performance. Snapshot copies work by

making a copy of the metadata associated with the volume. Data ONTAP preserves pointers to all the disk blocks currently in use at the time the Snapshot copy is created. When a file is changed, the Snapshot copy still points to the disk blocks where the file existed before it was modified, and the changes are written to new disk blocks. As data is changed in the parent FlexVol volume, the original data blocks stay associated with the Snapshot copy rather than being marked for reuse. All the metadata updates are just pointer changes, and the filer takes advantage of locality of reference, NVRAM, and RAID technology to keep everything fast and reliable. Figure 8-4 on page 94 provides a graphical illustration of how this works.

Figure 8-4 WAFL creates a Snapshot copy by duplicating the root. WAFL avoids changing blocks in a Snapshot copy by writing new data to new locations on disk.

You can think of a FlexClone volume as a transparent writable layer in front of the Snapshot copy. Figure 8-1 on page 92 provides a memorable illustration of that concept. A FlexClone volume is writable, so it needs some physical space to store the data that is written to the clone. It uses the same mechanism used by Snapshot copies to get available blocks from the containing aggregate. Whereas a Snapshot copy simply links to existing data that was overwritten in the parent, a FlexClone volume stores the data written to it on disk (using WAFL) and then links to the new data as well. The disk space associated with the Snapshot copy and the FlexClone volume is accounted for separately from the data in the parent FlexVol volume.

When a FlexClone volume is first created, it needs to know the parent FlexVol volume and also a Snapshot copy of the parent to use as its base. The Snapshot copy can already exist, or it can be created automatically as part of the cloning operation. The FlexClone volume gets a copy of the Snapshot copy metadata and then updates its metadata as the clone volume is created. Creating the FlexClone volume takes just a few moments because the copied metadata is very small compared to the actual data. The parent FlexVol volume can change independently of the FlexClone volume because the Snapshot copy is there to keep track of the changes and prevent the original parent's blocks from being reused while the Snapshot copy exists. The same Snapshot copy is read-only and can be efficiently reused as the base for multiple FlexClone volumes. Space is used very efficiently, because the only new disk space consumed is associated either with the small amounts of metadata or with updates and additions to the parent FlexVol volume or to the FlexClone volume.
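A hedged sketch of this creation sequence from the filer console (volume and Snapshot names are hypothetical; vol clone create was introduced with Data ONTAP 7G):

   > snap create prodvol base_snap                    (optional: pre-create the base Snapshot copy)
   > vol clone create devclone -b prodvol base_snap   (create the clone in seconds; no data is copied)
   > vol status devclone                              (shows the parent volume and base Snapshot copy)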

FlexClone volumes appear to the storage administrator just like FlexVol volumes, which is to say that they look like regular volumes and have all of the same properties and capabilities. Using the CLI, FilerView, or DataFabric Manager, you can manage volumes, Snapshot copies, and FlexClone volumes, including getting their status and seeing the relationships between parent, Snapshot copy, and clone. The CLI is required to create and to split a FlexClone volume. FlexClone volumes are treated just like FlexVol volumes for most operations. The main limitation is that Data ONTAP forbids operations that would destroy the parent FlexVol volume or the base Snapshot copy while dependent FlexClone volumes exist. Other caveats are that management information in external files (for example, /etc) associated with the parent FlexVol volume is not copied, quotas for the clone volume are reset rather than added to those of the parent FlexVol volume, and LUNs in the cloned volume are automatically marked offline until they are uniquely mapped to a host system. Lastly, splitting the FlexClone volume from the parent volume to create a fully independent volume requires adequate free space in the aggregate to copy the shared blocks.

8.3 Practical Applications of FlexClone Technology

FlexClone technology enables multiple, instant data set clones with no storage overhead. It provides dramatic improvements for application test and development environments and is tightly integrated with the file-system technology and a microkernel design in a way that renders competitive methods archaic. FlexClone volumes are ideal for managing production data sets. They allow effortless error containment for bug fixing and development. They simplify platform upgrades for ERP and CRM applications. Instant FlexClone volumes provide data for multiple simulations against large data sets for ECAD, MCAD, and seismic applications, all without unnecessary duplication or waste of physical space. The ability to split FlexClone volumes from their parent lets administrators easily create new permanent, independent volumes, for example to fork project data. FlexClone volumes have their limits, but the real range of applications is limited only by imagination. Table 2 lists a few of the more common examples.

Application Testing: Make the necessary changes to infrastructure without worrying about crashing production systems. Avoid making untested changes on the system under tight maintenance window deadlines. Less risk, less stress, and higher service-level agreements.

Data Mining: Data mining operations and software can be implemented more flexibly because both reads and writes are allowed.

Parallel Processing: Multiple FlexClone volumes of a single milestone/production data set can be used by parallel processing applications across multiple servers to get results more quickly.

Online Backup: Immediately resume the read-write workload on discovering corruption in the production data set by mounting the clone instead. Use database features such as DB2 write-suspend or Oracle hot backup mode to transparently prepare the database volumes for cloning by delaying write activity to the database; this is necessary because databases need to maintain a point of consistency.

System Deployment: Maintain a template environment and use FlexClone volumes to build and deploy either identical or variant environments. Create a test template that is cloned as needed for predictable testing.

IT Operations: Faster and more efficient migration using the Data ONTAP SnapMirror feature in combination with FlexClone volumes. Maintain multiple copies of production systems: live, development, test, reporting, and so on. Refresh working FlexClone volumes regularly to work on data as close to the live production systems as practical.
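Several of these scenarios end by promoting a clone to a permanent, independent volume. A hedged sketch of the split subcommands (the clone name is hypothetical; these vol clone split subcommands existed in Data ONTAP 7G):

   > vol clone split estimate devclone    (report how much free space the split will consume)
   > vol clone split start devclone       (begin copying shared blocks in the background)
   > vol clone split status devclone      (check progress)
   > vol clone split stop devclone        (pause the split if a critical job needs the filer)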

8.4 FlexClone Performance

The performance of FlexClone volumes is nearly identical to the performance of flexible volumes, thanks to the way cloning is tightly integrated with WAFL and the filer architecture. Unlike other implementations of cloning technology, FlexClone volumes are implemented as a simple extension to existing core mechanisms. The impact of cloning operations on other system activity should also be relatively light and transitory. The FlexClone create operation is nearly identical to creating a Snapshot copy. Some CPU, memory, and disk resources are used during the operation, which usually completes in seconds. The clone metadata is held in memory like that of a regular volume, so the impact on filer memory consumption is identical to having another volume available. After the clone creation completes, all ongoing accesses to the clone are nearly identical to accesses to a regular volume.

Splitting the clone to create a fully independent volume also uses resources. While the split is occurring, free blocks in the aggregate are used to copy blocks shared between the parent and the clone. This incurs disk I/O operations and can potentially compete with other disk operations in the aggregate. The copy operation also uses some CPU and memory resources, which may impact the performance of a fully loaded filer. Data ONTAP addresses these potential issues by completing the split operation in the background and by setting priorities in a way that does not significantly impact foreground operations. It is also possible to manually stop and restart the split operation if some critical job requires the full resources of the filer.

The final area to consider is the impact on disk usage from frequent operations where FlexClone volumes are split off and used to replace the parent FlexVol volume. The split volume is allocated free blocks in the aggregate, taking contiguous chunks as they are available. If there is a lot of free space in the aggregate, the blocks allocated to the split volume should be mostly contiguous. If the split is used to replace the original volume, the blocks associated with the destroyed original volume become available and create a potentially large free area within the aggregate. That free area should also be mostly contiguous. In cases where many simultaneous volume operations reduce the contiguous regions available to volumes, Data ONTAP provides block reallocation functionality: the reallocate command makes defragmentation and sequential reallocation even more flexible and effective. It reduces any impact of frequent clone split-and-replace operations, and it also optimizes performance after other disk operations (for example, adding disks to an aggregate) that may unbalance block allocations.

8.5 FlexClone Summary

Starting with Data ONTAP 7G, storage administrators have access to greater flexibility and performance. Flexible volumes, aggregates, and RAID-DP provide unparalleled levels of storage virtualization, enabling IT staff to economically manage and protect enterprise data without compromise. FlexClone volumes are one of the many powerful features that make this possible, providing instantaneous writable volume copies that use only as much storage as necessary to hold new data. FlexClone volumes enable and simplify many operations.
Application testing benefits from less risk, less stress, and higher service levels: changes can be tried out on clone volumes, and upgrades under tight maintenance windows become a matter of simply swapping tested FlexClone volumes for the originals. Data mining and parallel processing benefit by using multiple writable FlexClone volumes of a single data set, all without using more physical storage than is needed to hold the updates. FlexClone volumes can be used as online backup

and disaster recovery volumes, immediately resuming read-write operation if a problem occurs. System deployment becomes much easier by cloning template volumes for testing and rollout. IT operations benefit from multiple copies of production systems that can be used for testing and development and refreshed as needed to mirror the live data more closely.

This chapter thoroughly explored the flexible volume clone feature of Data ONTAP. It explained how FlexClone volumes work, explored practical applications, and discussed performance.

Chapter 9. An Introduction to FlexCache Volumes

This chapter provides an introduction to FlexCache, a feature of Data ONTAP that implements file caching. We explain what FlexCache volumes are, describe how they work, and illustrate the benefits of caching through common deployment examples. Finally, we cover best practices for FlexCache configuration and sizing. This chapter is targeted toward evaluators and implementers, to help them understand FlexCache and apply it to their organizations' storage environments.

9.1 Introduction

FlexCache is a feature of Data ONTAP that allows organizations to reduce management and infrastructure costs by automatically replicating, storing, and serving data requested over NFS. In the caching model, FlexCache volumes store the data most recently accessed from the origin. To make room for fresh data, FlexCache reclaims space by ejecting less recently accessed data from the cache. Additionally, if the source volume's data is modified, FlexCache fetches the modified data upon access to ensure cache consistency. The FlexCache volume therefore performs optimally when the source volume does not change very often.

In compute farm environments, where the file server becomes a bottleneck, FlexCache can be used to increase overall performance while keeping administration costs to a minimum. Typically, computational capacity is increased by adding new clients to the compute farm to the point where file server throughput becomes the limiting factor. For employees located at remote offices, WAN-associated latencies and bandwidth constraints often lead to the deployment of additional servers at the remote site. Additional servers offer LAN-like access to remote employees but inevitably lead to increased management and infrastructure costs in the form of complex data replication processes and hardware. FlexCache alleviates much of this burden by caching and serving only the requested portion of a data file, while simultaneously maintaining data consistency with the origin file server.

Figure 9-1 FlexCache Caching Model (the most recently accessed data is kept in the cache; less recently accessed data is ejected)

9.2 What Are FlexCache Volumes?

FlexCache volumes are sparsely populated volumes on a local (caching) N series System Storage that are backed by a volume on a different, possibly remote (origin), N series System Storage. When data is requested from the FlexCache volume, it is read through the network from the origin system and cached on the FlexCache volume. Subsequent requests for that data are then served directly from the FlexCache volume. In this way, clients in remote locations are provided with direct access to cached data, which improves performance when data is accessed repeatedly, because after the first request the data no longer has to travel across the network. FlexCache supports read caching of NFS. FlexCache volumes are writable, and updates to a FlexCache volume are written through to the origin volume.

The following terms are used throughout this chapter and are important for understanding how Data ONTAP relates to FlexCache volumes:

Aggregate. A physical container of disks from which logical volumes and RAID groups can be created.

Flexible volume. A logical volume, available in Data ONTAP 7G, that resides within an aggregate and can grow or shrink in size (see Chapter 7, Introduction to FLEXVOL on page 85 for a detailed description). It is constrained only by the soft limits set when it is created and the hard limits of the aggregate. It is the enabling technology for all other flexible features.

FlexCache volumes are special FlexVol volumes that are purpose-built for caching files from another filer. A FlexCache volume maps to a single volume on the origin server; the origin volume can be either a FlexVol volume or a traditional volume. The caching filer implements the data protection and business continuance features of a filer, including clustered failover (CFO), allowing for a more redundant design without a single point of failure.

Figure 9-2 FlexVol and FlexCache volumes on a caching filer

FlexCache volumes and FlexVol volumes can coexist on the caching filer. The type of volume and its properties determine how space is managed within the aggregate. Figure 9-2 shows two FlexCache volumes, backed by FlexVol volumes on an origin filer. In addition to the two FlexCache volumes, the caching filer also hosts two FlexVol volumes.

9.3 Benefits of FlexCache Volumes

Management and infrastructure costs can be reduced by deploying FlexCache volumes at remote offices (Figure 9-3 on page 102) and in compute farm environments. NFS read requests are cached, and write-through requests are proxied back to the origin filer. Deploying FlexCache provides the following benefits:

Reduced administration costs:

- No need to manage replication or synchronization processes
- Fewer servers and storage devices to manage

Faster time-to-market:
- New data is always accessible; no need to wait for the next replication run
- Faster overall computation times in compute farms

Lower infrastructure costs:
- Lower bandwidth costs, because only the data requested by users gets transferred
- Lower server and storage costs, because only the data that is needed gets stored

Figure 9-3 Remote Office usage of FlexCache

9.4 Where Can FlexCache Volumes Be Used?

Acceleration of Compute Farms

One of the emerging trends in the technical computing marketplace is the proliferation of compute farms. The primary driver of this trend is the improving price/performance of PC-based computers through commoditization. Many high-end mainframe, supercomputer, or proprietary high-end SMP configurations are being complemented, or in some cases replaced, by small 1U/2U PCs, usually running a flavor of Linux. This proliferation of compute farms can cause hot spots in the storage that serves application data sets. One of the primary benefits of FlexCache volumes is their ability to create a copy of the data that is accessed most frequently. For application workloads that are mostly read oriented, this ability can be used to offload read operations from a filer serving application data to compute farms. In this configuration, the overall throughput of the compute farm accelerates because of the increased read bandwidth. Examples of

applications that are mostly read intensive, and that could therefore benefit from read offloading, are semiconductor tapeout, seismologic analysis, and rendering.

Figure 9-4 Accelerating compute farms: read offloading using FlexCache

Figure 9-4 shows clients in a compute farm deployment accessing the application data set over NFS from a filer. The application data set can consist of software, reference libraries, and the application data. FlexCache can be configured in this environment in such a way that groups of NFS clients access cached copies of the entire application data set. Whenever an NFS client tries to access a reference library, the caching filer fetches it from the origin and caches it locally; subsequent accesses to that library are served from the local cache. Writes to any application data are written through to the origin, making this deployment transparent to the application. This approach reduces load on the server and at the same time scales up server throughput with no reduction in client access latency.
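A hedged sketch of setting up such a cache (filer names, volume names, and sizes are hypothetical; FlexCache requires its own license, and the options shown existed in the Data ONTAP 7 releases that support FlexCache, but verify them for your release):

On the origin filer:
   > options flexcache.enable on             (allow this filer to serve FlexCache origins)
   > options flexcache.access host=cache1    (permit the caching filer to attach)

On the caching filer:
   > vol create fc_apps aggr1 100g -S origin1:apps    (sparse cache volume backed by origin1:apps)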

9.5 Software Distribution to Remote Locations

Figure 9-5 Software distribution to remote offices: reduce latency using FlexCache

In a distributed global enterprise, a common problem is distributing software uniformly and consistently throughout the enterprise. In the UNIX world, software is often made available at one or more central locations by sharing the central location over NFS. Software distribution servers are often used in a tree topology, and the software distribution and libraries are replicated to the software distribution servers in the remote locations using tools such as rsync or rdist. This approach to software distribution can be expensive in terms of infrastructure, productivity, and administration costs. Because all of the data gets replicated from the central server to remote locations regardless of need, the cost of bandwidth to the remote offices, as well as the remote storage requirements, can be unnecessarily high. In terms of productivity, any data that is not replicated will not be available, and data may not be available between replication windows. The administrative costs of maintaining the scripts and monitoring replication cron jobs can be high. Moreover, the storage for the remote software distribution servers may require occasional management.

FlexCache can be used to simplify remote software distribution within enterprises. In a typical remote office deployment, caching filers are used as software distribution servers in the remote office: a caching filer sits near the edge of the network, as close as possible to the remote office. Client requests are configured to explicitly mount the caching filer instead of the origin server. The caching filer pulls down only the files that are requested by clients at the remote office and serves them locally on subsequent requests. This approach ensures that the bandwidth to, and the storage at, the remote location are used efficiently. Additionally, local access to remote data significantly reduces access latency. Data in the local cache is available as soon as the requested file or directory is pulled into the local cache, and clients do not have to wait for bulk transfers to finish as in the push model described above. Because FlexCache volumes manage space allocation very efficiently based on the caching filer's space management

feature, administration costs for software distribution can be significantly reduced. The clustered failover (CFO) feature of filers ensures that the distributed software is highly available at the remote site, where administrative resources may be scarce.

9.6 How Do FlexCache Volumes Work?

A newly created FlexCache volume starts out empty and is populated as responses to client requests are stored. Only the data that has been requested by a user is cached. Cached objects remain on the device as long as they are deemed consistent with the origin server and continue to be requested by end users.

Caching Granularity

A FlexCache volume can cache files, directories, and symbolic links. Caching is performed at 4K block-level granularity, and invalidation is performed at the file level. This means that a client does not have to request an entire file for FlexCache to cache it: only the 4K blocks accessed by the client are cached. The immediate consequences of this specificity are efficient usage of disk space and the maximization of hits on hot, frequently requested objects. Whenever a file attribute is altered on the source file server, the entire file must be revalidated. Responses to directory-related requests, such as READDIR and READDIRPLUS, are cached by FlexCache; subsequent LOOKUP, READDIR, and READDIRPLUS requests are then served as hits to the client.

Figure 9-6 Sparse Files

9.7 Cache Consistency

Data is cached and verified through client read requests (e.g., READ, LOOKUP, READDIR) and invalidated through client modifying requests (e.g., WRITE, REMOVE). Cache consistency for FlexCache volumes is achieved using a combination of attribute cache timeouts, delegations, and invalidation of cached objects while proxying writes.

Attribute Cache Timeouts

When data is retrieved from the origin volume, the file that contains that data is considered current in the FlexCache volume for a specified length of time, called the attribute cache timeout. During that time, if a client reads from that object and the requested data blocks are cached, the read request is fulfilled without any access to the origin volume. If a client requests data from a file for which the attribute cache timeout has been exceeded, the FlexCache volume verifies that the attributes of the file have not changed on the origin system, and then one of the following actions is taken:

- If the attributes of the file have not changed since the file was cached, the requested data is either directly returned to the client (if it was already in the FlexCache volume) or retrieved from the origin system and then returned to the client.
- If the attributes of the file have changed, the file is marked as invalid in the cache, and the requested data blocks are read from the origin system as if it were the first time that file had been accessed from that FlexCache volume.

With attribute cache timeouts, clients can get stale data if they access a file on the FlexCache volume that has been changed on the origin volume, and the access is made before that file's attribute cache timeout is reached. To prevent clients from ever getting stale data, you can set the attribute cache timeout to zero. However, this negatively affects caching performance, because every data request then causes an access to the origin system.

Delegation

When a request to the FlexCache volume results in a miss, the caching filer contacts the origin to fetch a fresh copy. The caching filer may also request a read delegation. A read delegation on a file is a guarantee that no other clients are writing to the file. Once a file delegation is granted, the file cache can trust that it has the latest copy of the file and its attributes, and it can return portions of the file and its attributes to clients without going back to the origin server for freshness checking. This mechanism reduces revalidation requirements between the cache and the origin and is more efficient than having to poll the file for freshness.

Write Operation Proxy

If the client modifies the file, that operation is proxied through to the origin filer, and the file is ejected from the cache. This also changes the attributes of the file on the origin volume, so any other FlexCache volume that has that data cached will re-request the data once its attribute cache timeout is reached and a client requests that data.

Cache Hits and Misses

When a client makes a read request, if the relevant block is cached in the FlexCache volume, the data is read directly from the FlexCache volume. This is called a cache hit. Cache hits are the result of a previous request. A cache hit can be one of the following types:

Hit: The requested data is cached and no verification is required; the request is fulfilled locally, and no access to the origin filer is made.

Hit-Verify: The requested data is cached, but the verification timeout has been exceeded, so the file attributes are verified against the origin system. No data is requested from the origin system. If the file is delegated, it need not be verified, and hence there is no user-visible latency.

If data is requested that is not currently on the FlexCache volume, or if that data has changed since it was cached, the caching system loads the data from the origin system and then returns it to the requesting client. This is called a cache miss. A cache miss can be one of the following types:

Miss: The requested data is not in the cache; it is read from the origin system and cached.

Miss-Verify: The requested data is cached, but the file attributes have changed since the file was cached; the file is ejected from the cache, and the requested data is read from the origin system and cached.

9.8 Space Management

Space management in FlexCache is designed to ensure that space within an aggregate is efficiently and optimally allocated across multiple FlexCache volumes and FlexVol volumes. During the normal course of operation, objects in the cache that are not frequently used are ejected in favor of objects that are accessed most often. At the same time, the space guarantees of FlexVol volumes are honored. Because of this design, very little administrative intervention is required once FlexCache volumes are created within an aggregate.

9.9 File Locking

Locking is critical for maintaining the consistency of files that can be modified by multiple clients over NFS. FlexCache uses the distributed Network Lock Manager (NLM) protocol to ensure that multiple processes do not simultaneously modify a file, and to coordinate the modification of a shared file between cooperating processes. FlexCache acts as a proxy on behalf of the NLM server on the origin filer, while all locks are held by the origin server. Because NLM is a stateful protocol, it is imperative that clients can reestablish locks if the servers go down and that the servers know to release client locks if a client reboots. Both of these issues are solved by the Network Status Monitor (NSM), which notifies NLM about the current state of the network, including any system crashes that occur; NLM can then reestablish any stale sessions. FlexCache supports all NLM-related requests, as well as NSM feedback.

9.10 Conclusion

To enhance the productivity of workers in remote locations, organizations with a geographically distributed workforce must ensure that remote workers can access and share mission-critical data quickly and efficiently. However, limited wide area network (WAN) bandwidth between the central office and remote locations can make that difficult. Remote users often experience poor performance in receiving large files for applications such as CAD/CAM or software development, and in accessing data repositories across the WAN (which must be shared to support remote office operations). Additionally, IT support at remote

sites is often limited, complicating the management of operations such as backup and data replication. The Network Appliance FlexCache solution provides centralized IT administration of data while delivering improved data access to remote workers. In high-performance computing environments, such as movie postproduction, bioinformatics, and simulation, FlexCache can be used to scale overall performance without adding management overhead.

Chapter 10. Data Permanence with SnapLock

This chapter introduces SnapLock, a data protection solution that allows you to control access to your data using Write-Once-Read-Many (WORM) technology. Two versions of the product exist: SnapLock Compliance and SnapLock Enterprise.

SnapLock is an advanced storage solution that provides an alternative to traditional optical WORM storage systems for nonrewritable data. SnapLock is a license-based, open-protocol functionality that works with application software to administer nonrewritable storage of data. It is an implementation of high-performance, disk-based, magnetic WORM storage. The primary objective of this Data ONTAP feature is to provide secure, storage-enforced data retention functionality via open file protocols such as CIFS and NFS. Data ONTAP has hardened its administrative interfaces to the degree that SnapLock can be deployed to protect data in regulatory environments so strict that even the storage administrator is considered an untrusted party. An example of such an environment is the broker/dealer market regulated by SEC 17a-4. Alternate configurations of SnapLock can be deployed for unregulated or more flexibly regulated environments.

Both SnapLock software products provide nonerasable, nonrewritable WORM functionality utilizing disk drives in a cost-efficient, highly available RAID configuration. From a data protection perspective, the process of committing data to WORM status on either SnapLock product can be thought of in the same manner as storing data on an optical platter. Just as an optical platter is "burned" with data, both SnapLock software products protect data committed to WORM status from any possible alteration or deletion until its retention period has expired.

10.1 Features

- Non-erasable, non-rewritable magnetic disk storage (WORM)
- Enabled at a volume level
- Fine-grained management

- File-level retention policies
- Reclaim space upon file expiration
- Tamper-proof compliance clock
- Data access via CIFS and NFS
- Comprehensive permanence and security features mitigate regulatory risk
- Simple architecture means easy deployment and management
- Leverages existing investments
- Open protocols simplify integration and management, and protect against vendor lock-in
- Support for diverse platforms and data types maximizes flexibility
- High performance for quick search and recovery

Simple architecture means easy deployment and management: simply enable the license on any platform and use the familiar Data ONTAP interface.

Leverages existing investments: the customer does not have to buy a new storage server just for compliance; SnapLock works on an existing N series System Storage. Furthermore, compliance and other (reference, archival) volumes can coexist on the same platform.

Open protocols simplify integration and management, and protect against vendor lock-in: the N series open-protocol approach means the customer will always be able to search and recover records from a SnapLock volume.

Support for diverse platforms and data types maximizes flexibility: SnapLock works for structured, semi-structured, and unstructured data.

High performance for search and recovery: easy application integration without a closed, proprietary API. This non-API approach also makes SnapLock search and recovery much faster. Customers sometimes do not care about this, because data is not recovered or searched often; however, when there is a regulatory action or a legal discovery situation, search and recovery speed becomes critical.

SnapMirror is supported with SnapLock volumes (that is, WORM-to-WORM SnapMirror support), for both volume and qtree SnapMirror. Per-file retention periods are supported, which is critical for regulated environments.

10.2 Two forms of SnapLock

SnapLock is available in two forms: SnapLock Compliance and SnapLock Enterprise. The SnapLock volume type is determined by the installed license and is set at volume creation time; volumes remember their type and behave accordingly.

SnapLock Compliance

SnapLock Compliance provides WORM protection of files while also restricting the storage administrator's ability to perform any operations that might modify or erase retained WORM records. It was designed to assist organizations in implementing a comprehensive archival solution for meeting strict regulatory requirements for data retention (such as SEC 17a-4). Records and files committed to WORM storage on a SnapLock Compliance volume cannot ever be altered or modified, but they can be deleted after the expiration of their retention periods. Moreover, a SnapLock Compliance volume cannot be deleted until all data stored on it has passed its retention period and been deleted by the archival application or some other process.

Feature details:
- Software-driven secure clock mechanism (see Compliance Clock on page 111)
- Default, minimum, and maximum retention periods
- SnapLock Compliance volumes can be destroyed (once all retained data has expired)
- WORM file retention dates can be extended
- Expired WORM files can be deleted

Compliance Clock

The compliance clock is a software-driven secure clock that runs independently of the regular system clock and stores its state persistently on disk. It is required to ensure a secure time base, and all WORM delete operations are based on this clock. It can be initialized only once and is tamper resistant. If the filer is turned off or the disks are removed, the clock stops ticking, and the retention period is extended by the amount of time the clock was not ticking. Volumes track the compliance clock; based on the clock, WORM files and SnapLock Compliance volumes can be deleted once their retention periods have expired.

SnapLock Enterprise

SnapLock Enterprise provides WORM protection of files but uses a trusted administrator model of operation that allows the storage administrator to manage the system with very few restrictions. For example, SnapLock Enterprise allows the administrator to perform operations, such as destroying SnapLock volumes, that might result in the loss of data. SnapLock Enterprise is geared toward assisting organizations in meeting self-regulated and best-practice guidelines for protecting digital assets with WORM-type data storage. Data stored as WORM on a SnapLock Enterprise volume is permanently protected from alteration or modification but can be deleted after its expiration date. Functionally, SnapLock Enterprise matches SnapLock Compliance exactly, with one main difference: because the data being stored is not subject to the strictest regulatory requirements, an administrator is trusted with the ability to delete a SnapLock Enterprise volume, including the data it contains. SnapLock Enterprise is intended for more flexible enterprise environments: all client protocol interfaces are still SnapLock protected, and all administrative operations are enabled (except disk zeroing).

Licensing

SnapLock storage forever maintains the rules and policies of the particular SnapLock license that was active when it was initially created. In other words, a SnapLock aggregate or traditional volume created with an active SnapLock Compliance license will always retain the rules associated with SnapLock Compliance. This protection holds if SnapLock storage is created with an active SnapLock Compliance license and a SnapLock Enterprise license is then made active on the appliance, or even if the SnapLock license is removed.
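A minimal sketch of initializing the compliance clock from the filer console (this one-time, irreversible step precedes using SnapLock Compliance volumes; the command form shown is from Data ONTAP 7-era documentation and should be verified for your release):

   > date -c initialize    (initialize the compliance clock; the system prompts for confirmation)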

Note: A storage appliance may have either the snaplock license (Compliance) or the snaplock_enterprise license enabled, but not both at the same time.

Regulatory requirements, summarized:
- Security and confidentiality: authorization and access controls, audit logging, encryption, data destruction
- Flexibility: scalability with high performance, support for multiple data types
- Data permanence: non-erasable, non-alterable media, unique identification, data replication

10.3 SnapLock Connectivity

Both SnapLock software products utilize the CIFS and NFS open NAS protocols to store and access archived data. That means business servers interoperating with SnapLock do not need any additional software installed to facilitate the integration: UNIX and Microsoft operating systems already have all of the connectivity required for integration with SnapLock built in. One of the core SnapLock value propositions is open connectivity to data without requiring the use of a closed, proprietary API. This methodology provides easier access to data, allows simpler integration between application vendors and storage vendors, and removes the vendor lock-in risk for customers' compliance archival data. The open-protocol aspect of SnapLock provides a natural and flexible way to manage, store, and retrieve data via regular CIFS and NFS clients.

10.4 How SnapLock works

SnapLock Activation

SnapLock is activated at the aggregate level. It can also be turned on per traditional volume, but FlexVol volumes inherit SnapLock from their aggregate: if SnapLock is enabled on an aggregate, any FlexVol volumes created on that aggregate will be SnapLock volumes. So, in Data ONTAP, SnapLock is implemented either at the traditional volume level or at the aggregate level. To

deploy SnapLock with FlexVol, first create a SnapLock aggregate, then create the FlexVol volume.

WORM and SnapLock

As its name implies, Write-Once-Read-Many (WORM) media possesses the property that data can be written only once to any area of the media and can never be overwritten or erased. In some cases this is a property of the physical media (such as with WORM optical platters); in other cases the physical media is rewritable, but the integrated hardware and software controlling access to the media prevent such overwrites (such as with WORM magnetic tape and disk-based SnapLock).

WORM data resides on SnapLock volumes, which are administered much like regular (non-WORM) volumes. SnapLock volumes operate in WORM mode and support standard file system semantics. Data on a SnapLock volume is committed to WORM state by transitioning it from a writable state to a read-only state: marking a currently writable file as read-only on a SnapLock volume commits the data as WORM. This commit process prevents the file from being altered or deleted by applications, users, or administrators. Data that is committed to WORM state on a SnapLock volume is immutable and cannot be deleted before its retention date. The only exceptions are empty directories and files that have not been committed to WORM state. Additionally, once directories are created, they cannot be renamed. In Data ONTAP, WORM files can be deleted after their retention date. The retention date on a WORM file is set when the file is committed to WORM state, but it can be extended at any time; the retention period can never be shortened for any WORM file.

Replicating SnapLock volumes

You can replicate SnapLock volumes to another filer using the SnapMirror feature of Data ONTAP. If an original volume becomes disabled, SnapMirror ensures quick restoration of data. For more information about SnapMirror and SnapLock, see the Data Protection Online Backup and Recovery Guide.

How easy is it to use SnapLock?

1. License SnapLock (SnapLock Compliance or SnapLock Enterprise). The active license determines the type of SnapLock volume: Compliance versus Enterprise.
2. Create a SnapLock volume or aggregate. For a traditional volume, use the -L switch on vol create to specify SnapLock; to use FlexVol volumes, create a SnapLock aggregate with aggr create and the -L switch, then create flexible volumes on that aggregate. Note that an existing volume or aggregate cannot be converted to SnapLock, and SnapLock storage cannot be converted back to a regular volume or aggregate. (FilerView does not support the SnapLock option at the time of this publication.)
3. Share or export the SnapLock volume.
4. Copy a write-enabled file to the volume over NFS or CIFS.
5. After the file is stored on the SnapLock volume, change its last access time to reflect the desired retention date.
6. Change the file's permissions to read-only; this commits the file to WORM.

With SnapLock, the process of creating a WORM volume and committing files to it is very simple; customers have installed it and had it running in a couple of hours. A hedged command sketch follows.
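A minimal sketch of the procedure above (the license code, names, mount point, and retention date are all hypothetical; the -L switch and the touch/chmod commit convention are described in this chapter, but verify exact syntax and option placement for your Data ONTAP release):

On the filer:
   > license add XXXXXXX             (enable the SnapLock Compliance or Enterprise license)
   > aggr create slaggr -L 6         (create a 6-disk SnapLock aggregate)
   > vol create slvol slaggr 50g     (FlexVol volumes on slaggr inherit SnapLock)

From an NFS client:
   $ cp report.pdf /mnt/slvol/                         (store the file while it is still writable)
   $ touch -a -t 203012310000 /mnt/slvol/report.pdf    (set the last access time to the retention date)
   $ chmod 444 /mnt/slvol/report.pdf                   (mark read-only: commits the file to WORM)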

Figure 10-1 illustrates this simplicity: an archival application accesses data over CIFS or NFS and moves it to WORM storage. You create a SnapLock Compliance (WORM) volume, archive files to the SnapLock volume via CIFS/NFS, set each file's expiration date, and mark the files read-only, which prevents any and all alterations, overwrites, or deletions.

Figure 10-1 SnapLock Creation

Figure 10-2 Displaying regular and SnapLock Volumes

High Availability: Replication to a Remote Site

For compliance with data retention rules, regulatory agencies may require keeping a second copy of archived data at a remote site. The most straightforward and natural way to comply with this requirement is to replicate data from a primary N series System Storage to a secondary N series System Storage at a separate location. There are two integrated NetApp

solutions available to seamlessly perform data replication. The easiest and most robust solution is to use SnapMirror to replicate data to a remote location. SnapMirror, in either synchronous or asynchronous mode, replicates SnapLock data to a remote appliance while maintaining all aspects of the original WORM file, such as the date-time stamp and the filename, including its path. The second solution, ndmpcopy, is a free utility already bundled with Data ONTAP. Like SnapMirror, ndmpcopy maintains the WORM aspects of the original files in the replicated copy.

SnapMirror can replicate at either the volume or the quota tree (also known as qtree) level. When replicating SnapLock Compliance FlexVol volumes or traditional volumes, IBM recommends using SnapMirror with qtree replication rather than replicating at the volume level, because qtree replication allows additional replication strategies, including the ability to resynchronize SnapLock Compliance volumes.

Note: SnapMirror replication of SnapLock Enterprise volumes works exactly the same as with traditional volumes.

Tape Backup

N series System Storage offers substantial performance and capacity improvements for nearline data storage over optical or tape-based storage. Even so, tape backups, including off-site tape rotation, are still a valuable part of an overall enterprise data protection strategy. In keeping with customer wishes for meeting the high standards imposed by regulatory compliance, IBM recommends that regulated data archived on SnapLock also be backed up to tape. In most cases, a tape backup infrastructure including a backup application is already in place, and N series System Storage can leverage the existing environment. Industry-standard NDMP-generated backups of SnapLock data include all WORM metadata for each file, so that a subsequent NDMP restore to a new SnapLock volume puts the data back into its original WORM state. The native dump command bundled with Data ONTAP also includes all WORM metadata for each file in the backup stream, and later recoveries using the native restore command to a new SnapLock volume put the data into its original WORM state.

10.5 Summary

SnapLock Compliance and SnapLock Enterprise are designed to be critical pieces of a comprehensive data archival solution for businesses that require higher-performance, lower-TCO alternatives for WORM storage functionality. SnapLock's benefits over traditional WORM storage include substantial improvements in performance, capacity, and reliability, while significantly reducing management overhead. These benefits layer nicely for businesses needing WORM data storage for regulatory compliance or for protecting critical enterprise content beyond the capabilities of normal data storage.


11

Chapter 11. Enabling rapid recovery with SnapVault

SnapVault is a separately licensed feature in Data ONTAP that provides low-overhead, disk-based online backup of heterogeneous storage systems, for fast and simple restores. The SnapVault server runs on the IBM TotalStorage N series platform. SnapVault replicates selected Snapshots from multiple client systems to a common Snapshot on the SnapVault server, which can store many Snapshots. These Snapshots on the server have the same function as regular tape backups; periodically, data from the SnapVault server can be dumped to tape for extra security. SnapVault is a heterogeneous disk-to-disk data protection solution ideal for use with the N series.

A SnapVault primary system corresponds to a backup client in the traditional backup architecture. The SnapVault secondary is always a data storage system running Data ONTAP. SnapVault software protects data residing on a SnapVault primary; all of this heterogeneous data is protected by maintaining online backup copies (Snapshots) on a SnapVault secondary system. The replicated data on the secondary system can be accessed via NFS or CIFS just as regular data can be, and the primary systems can restore entire directories or single files directly from the secondary system. There is no corresponding equivalent to the SnapVault secondary in the traditional tape-based backup architecture.

Figure 11-1 Basic SnapVault (N series primary to a secondary storage system)

Benefits

- It avoids the bandwidth limitations of tape drives, so restores can be faster.
- It does not require full dumps from the primary storage, so there is no need for a backup window.
- Data protection solution for heterogeneous storage environments.
- Performs disk-to-disk backup and recovery.
- Incrementals-forever model, designed to address the pain points associated with tape.
- Intelligent data movement reduces network traffic and the impact on production systems.
- Frequent backups ensure superior data protection.
- Uses Snapshot technology, significantly reducing the amount of backup media.
- Reduced backup overhead: incrementals only, changed blocks only.
- Instant single-file restore: the .snapshot directory displays SnapVault Snapshots.
- Can protect remote sites over a WAN.

How does SnapVault work

- Administrators set up the backup relationship, backup schedule, and retention policy.

- Multiple qtrees or open-systems directories can be backed up to the same volume if they have the same schedule and retention policy.
- SnapVault kicks off backup jobs based on the backup schedule; a backup job can involve backing up multiple SnapVault primaries.
- Moves data from the SnapVault primary to the SnapVault secondary, using incremental-forever backup after the initial level-0 transfer. Filers transfer changed blocks to the SnapVault secondary; open systems transfer changed files.
- Upon successful completion of a backup job, takes a Snapshot on the SnapVault secondary, saving only changed blocks.
- Maintains Snapshots on the SnapVault secondary based on the retention policy.
- Optionally, SnapMirror replicates the SnapVault secondary to a remote location for disaster recovery.
- A traditional backup application can be leveraged to back up the SnapVault secondary to tape.

Figure 11-2 Backup flow diagram (setup, initial full backup, incrementals forever, optional SnapMirror to a remote location, and NDMP tape backup; backup images are stored in file format on disk, immediately and easily verifiable, and all backup copies are full images)

After the simple installation of the SnapVault agent on the desired primary file and application servers, the SnapVault secondary system requests an initial baseline image transfer from each primary storage system. SnapVault protects data on a SnapVault primary system by maintaining a number of read-only versions of that data on a SnapVault secondary system. These transfers establish SnapVault relationships between the primary qtrees or directories and the SnapVault secondary qtrees. The baseline transfer will typically be the most time-consuming step of a SnapVault implementation, as it duplicates the entire source data set on the secondary, much like a full (level-zero) backup to tape. One baseline transfer is required before any subsequent backups or restores can be performed, but unlike traditional tape backup environments, this initial baseline transfer is a one-time occurrence, not a weekly event.
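A hedged sketch of the corresponding setup, using Data ONTAP 7-mode commands; host names, volume names, qtree names, and the schedule are illustrative:

On the primary:
   pri> options snapvault.enable on
   pri> options snapvault.access host=backupsys
   pri> snapvault snap sched vol1 sv_hourly 11@mon-fri@7-18

On the secondary:
   sec> options snapvault.enable on
   sec> options snapvault.access host=prodsys
   sec> snapvault start -S prodsys:/vol/vol1/users /vol/backups/users
   sec> snapvault snap sched -x backups sv_hourly 11@mon-fri@7-18

The snapvault start command performs the baseline transfer, and the -x flag on the secondary schedule tells the secondary to pull an update from the primary before taking its own Snapshot.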

First, a complete copy of the data set is pulled across the network to the SnapVault secondary. Each subsequent backup transfers only the data blocks that have changed since the previous backup (incremental backups, or incrementals forever). When the initial full backup is performed, the SnapVault secondary stores the data in a WAFL file system and creates a Snapshot image of that data. A Snapshot is a read-only, point-in-time version of a data set; each of these Snapshots can be thought of as a full backup, although it consumes only a fraction of the space. A new Snapshot is created each time a backup is performed, and a large number of Snapshots can be maintained according to a schedule configured by the backup administrator. Each Snapshot consumes an amount of disk space equal to the differences between it and the previous Snapshot.

A very common scenario is data protection of the secondary system itself. A SnapVault secondary can be protected either by backup to tape or by backup to another disk-based system. Backup to a tertiary disk-based system is simply volume-based SnapMirror: all Snapshots are transferred to the tertiary system, and the SnapVault primaries can be directed to this tertiary system if necessary. To back up a secondary to a tape library, SnapMirror to tape can be used, or simply NDMP backup to tape.

What Gets Backed Up and When

SnapVault provides a lot of flexibility in deciding which data is protected and at what granularity. The data structures that are backed up and restored through SnapVault depend on the primary storage system. On N series primary systems, the qtree is the basic unit of SnapVault backup and restore (see Chapter X). SnapVault backs up specified qtrees on the primary system to associated qtrees on the SnapVault secondary storage system. If necessary, data is restored from the secondary qtrees back to their associated primary qtrees. On open-systems storage platforms, the directory is the basic unit of SnapVault backup. SnapVault backs up specified directories from the native system to specified qtrees on the SnapVault secondary storage system. If necessary, SnapVault can restore an entire directory or a specified file to the primary storage platform.

Incremental Updates

Incremental backups are updates to an existing baseline copy of the primary system's data set. The concept of obtaining and transferring primary system data to the secondary system after the baseline transfer has completed is typically referred to as incremental backups forever. With SnapVault there is only one full backup (the baseline transfer), followed by incremental backups for the remainder of the qtree/directory relationships. An incremental backup occurs when the SnapVault secondary contacts the primary to update its qtrees with the latest data from the primary; no additional full backups need to be performed after the first baseline copy has been completed. Each incremental backup, and each subsequent Snapshot copy of the data set on the secondary system, can be used as a full backup of the original data set, except that it consumes only the amount of disk space that actually changed. The incremental backups are replications of the primary data set, with the replication versions updated as often as every hour, without consuming the media that traditional tape-based architectures would require.
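Incremental updates normally run on the configured schedule, but they can also be triggered and inspected by hand. A brief sketch, reusing the illustrative relationship from earlier:

   sec> snapvault update /vol/backups/users
   sec> snapvault status

The update command pulls the changed blocks from the primary immediately, and snapvault status shows the state and lag of each relationship.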
It is important to note that incremental backups consume space on the secondary system only for the data that has actually changed. For example, if a 10 GB file has had 100 KB of changes since the last incremental backup, the secondary will consume only 100 KB of storage space to record that change. In other words, only the changed blocks in an updated file are stored on the secondary. This is dramatically different from the behavior of

most incremental tape backups, where the entire changed file is recorded by the incremental backup. This is a dramatic advantage in resource conservation for those deploying SnapVault in place of traditional backup architectures.

Scheduling/Retention Policy

The schedule details the frequency, the number of copies to retain, and the date and time at which to perform incremental backups for a specific SnapVault relationship. The SnapVault secondary system creates and maintains copies, based on the specified schedule, for each primary data set that it is responsible for protecting. Incremental backups can be scheduled every hour, week, or month depending on the needs of the environment, giving backup and storage administrators considerable flexibility when defining policies for data protection. In normal operation, updates and Snapshot creation proceed automatically according to the Snapshot schedule. However, SnapVault also supports manual operation via basic command-line and management-interface operations, allowing on-demand, application-level integration for specific applications or servers that require an application- or event-driven backup capability. All of these scheduling options make it possible to increase the frequency of backups without additional baseline transfers or media cost. In most cases traditional backup windows can be reduced or even eliminated once the initial full backup has been performed.

Snapshot Copies

The backup data is stored in Snapshot copies on the secondary system in the RAID-protected WAFL file system. A Snapshot backup is a read-only, point-in-time version of a data set; each time a backup is performed, a new Snapshot is created. Up to 250 Snapshot copies can be maintained on a SnapVault secondary system. When creating a new Snapshot backup, SnapVault deletes the oldest Snapshot copy, renames the remaining copies as appropriate, and then creates a new base Snapshot backup. The data is readily available and safely stored on disk. Most organizations will then make a tape copy from the Snapshot copy or, better yet, replicate the Snapshot copy (via SnapMirror) to an off-site facility where tape copies are created. If multiple systems are being backed up, the transfers are synchronized so that all transfers complete at the same time, enabling multiple backups to share a Snapshot copy and thereby preserving Snapshot copies for archiving on the secondary system. Each Snapshot backup consumes an amount of disk space equal to the amount of data changed between its creation and the creation of the previous Snapshot backup. As stated earlier, only data that has changed (not entire files that have changed) is saved. Snapshot technology allows you to keep hundreds of full-backup-equivalent images of the source data set in a minimum amount of space: each Snapshot copy represents a full backup of the primary data set minus the space requirements of typical full backups. Full backup versioning is achieved and, best of all, the data is online for immediate access for restore and recovery, without sorting through hundreds of tape cartridges.
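To see what is being retained, the standard Snapshot commands apply on the secondary. A small sketch, using the illustrative volume from earlier:

   sec> snap list backups
   sec> snapvault snap sched backups

Each sv_* Snapshot listed is a restorable, full point-in-time image of the backed-up data, and the second command displays the configured SnapVault Snapshot schedules for the volume.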
SnapVault Detail

1. Speed of recovery: In order to recover a full data set from tape, one must first recover the full backup and then recover each incremental backup, in order. If one performs full backups weekly and incremental backups daily, restores will typically involve a full restore and up to six incremental restores. If one performs fewer full backups and more incrementals, restoring a full data set will take considerably longer. SnapVault addresses both of these issues. It ensures backup reliability by storing the backups on disk in the WAFL file system; these backups are protected by RAID, block

checksums, and periodic disk scrubs, just like all other data on an N series System Storage. Restores are simple because each incremental backup is represented by a Snapshot copy, which is a point-in-time view of the entire data set and can be restored in a single operation, eliminating the need to manage large quantities of tape cartridges.

2. Simplicity of restores: One of the unique benefits of SnapVault is that users do not require special software or privileges to perform a restore of their own data. Any user who wishes to restore his own data may do so without the intervention of a system administrator, saving end-user time and money as well as freeing up valuable administrator time. If required by policy, data recovery can also be restricted to authorized individuals. Recovering a file from a SnapVault backup is simple: just as the original file was accessed via an NFS mount or CIFS share, the SnapVault secondary may be configured with NFS exports and CIFS shares. As long as the destination qtrees are accessible to the users, restoring data from the SnapVault secondary is as simple as copying from a local Snapshot image. Restores can be drag-and-drop or a simple copy command, depending on the environment. If SnapVault has been deployed in an open-systems environment, the restore process can be initiated from the command line directly on the primary system that was backed up. Recovery of an entire data set can be performed the same way if the user has appropriate access rights. SnapVault provides a simple interface to recover an entire data set from a selected Snapshot copy using the snapvault restore command: a user can recover the complete contents of a secondary qtree or directory back to the primary by running snapvault restore on the primary. The primary data set is read-only until the transfer completes, at which time it becomes writable. After a restore, the user may choose to resume backups from the recovered data set to the secondary qtree from which it was recovered. When used alone, SnapVault creates hourly read-only Snapshots on the secondary, so restores are done via a copy back. Since each Snapshot refers to a complete point-in-time image of the entire file system, restore setup time is effectively zero: no tape handling, no incremental unwind.
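A hedged sketch of a whole-qtree recovery with snapvault restore, run on the primary, followed by resuming the backup relationship from the secondary; names are illustrative:

   pri> snapvault restore -S backupsys:/vol/backups/users /vol/vol1/users
   sec> snapvault start -r -S prodsys:/vol/vol1/users /vol/backups/users

Single files, by contrast, can simply be copied out of the appropriate .snapshot directory on the secondary's NFS export or CIFS share.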
In comparison, recovery from tape can consume considerable resources. Single files can sometimes be recovered by users but are typically recovered by an administrator. The tape that contains the backup file must be located (sometimes retrieved from an off-site storage location), and the backup application has to transfer the file from the tape location to the requesting host. The backup administrator starts the tape restore process and retrieves the tape from the appropriate location if necessary. The tape must be loaded (from seven seconds up to two minutes) and positioned to the correct location on the tape (usually several seconds, sometimes more), and finally the data is read. If a full image must be restored, data has to be recovered using the last full backup and the subsequent incremental backups. A restore that requires recovery from one full backup plus all incremental backups since that full backup also gives a media error more chances to occur in a tape solution. This process can be long and tedious depending on the amount of data being recovered; hours and days can be spent in the restore process. If there is a failure during the restore, the entire process has to be reinitiated, significantly adding to downtime. If the data to be restored is a large critical database, users are offline during the entirety of the restore process.

3. Reliability: Since a full backup is required in order to recover a full data set from a traditional incremental backup, failure to recover the full backup due to media error or other causes renders all of the incremental backups useless when recovering the entire data set. Tapes used in traditional backup applications are offline storage; one cannot be sure that the data on a tape is readable without placing the tape in a drive and reading from it. Even if each piece of tape media is individually read back and verified after being written, it could still fail after being verified but before being recovered, due to improper

handling, resulting in a tremendous amount of uncertainty when increasing the number of incremental backups per full backup. This problem is usually addressed by taking full backups more frequently and by duplicating backup tapes. Duplication of backup tapes serves several purposes, including providing an off-site copy of the backup and providing a second copy of the media in case one copy is bad; however, it is possible that bad data will simply be copied to both sets of tapes.

4. Reduced time spent in backup (incremental backups forever): The promise of incremental backups forever is delivered with SnapVault. An incremental backup copies only the changes in a data set to the backup media. Entire files are stored in traditional backup architectures, whereas only changed blocks are stored in a SnapVault configuration, resulting in dramatic space savings. Because incremental backups take less time and consume less network bandwidth and backup media, they are less expensive. Traditional backup schedules involve a full backup once per week or once per month and incremental backups each day, largely because of the recovery-speed and reliability concerns described above.

5. Space/media savings: Because SnapVault only requires storage of changed blocks of data, the storage requirements are much lower than those of traditional backup applications, which typically store the entire changed file or files.

Disaster Recovery with SnapVault

Protecting Backup Data with SnapMirror

In traditional tape-based solutions (Figure 11-4 on page 124), it is common to duplicate tapes, ship one set off-site, and store it remotely for disaster recovery purposes. Making duplicate copies of the backup data allows one copy to be kept locally for restore purposes, while the other is shipped off-site for remote recovery in the event of a disaster. SnapVault provides several superior disaster recovery and off-site options. One option is to back up to a remote SnapVault secondary, or multiple SnapVault secondaries, across a wide area network for off-site storage (Figure 11-3 on page 124). A second option is to back up the SnapVault secondary to tape for offline storage in the event the SnapVault secondary system is unavailable. This deployment adds a tape backup of the SnapVault secondary storage system and can serve two purposes: it enables the storage of an unlimited number of network backups offline, while keeping the more recent backups available online in secondary disk storage for quick recovery if necessary. If a single tape backup is generated off the SnapVault secondary storage system, the N series and open-systems storage platforms are not subject to the performance degradation, system unavailability, and complexity of direct tape backup of multiple systems; in this instance, tape augments the backup experience. These options provide multiple lines of defense in the event of a disaster: local Snapshot backups, secondary SnapVault storage, and offline tape-archived data.
Another variation of the basic SnapVault deployment protects replications stored on SnapVault secondary storage against interruption of the secondary storage system itself. The data backed up to SnapVault secondary storage is mirrored to a system configured as a SnapMirror partner, or destination. If the SnapVault secondary storage system fails, the SnapMirror destination can be converted to a secondary storage system and used to continue the SnapVault backup operation with minimal disruption to the environment.

Note: In this SnapMirror protection architecture, both SnapMirror and SnapVault must be licensed on the secondary.
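A hedged sketch of mirroring the SnapVault secondary's backup volume to a tertiary system with volume SnapMirror; host and volume names are illustrative, and the /etc/snapmirror.conf schedule fields are minute, hour, day-of-month, and day-of-week:

On the tertiary system, in /etc/snapmirror.conf:
   backupsys:backups tertsys:backups_mm - 0 2 * *

   tert> vol restrict backups_mm
   tert> snapmirror initialize -S backupsys:backups tertsys:backups_mm

The initialize command performs the baseline mirror (the destination volume must be restricted first), after which the mirror is updated daily at 02:00.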

Figure 11-3 SnapVault solutions (N series primary storage replicated to a secondary N series system)

Figure 11-4 Traditional backup (application or file-server clients backed up to a TSM backup server with its policy database)

Comparing SnapMirror and SnapVault

- Both SnapVault and SnapMirror use data replication.

- SnapMirror copies all Snapshots from a read/write source into a read-only destination.
- SnapVault copies the active file system data from a read/write source into a read-only destination, but protects and versions the data by creating destination Snapshots.
- SnapMirror provides updates as frequently as once per minute; SnapVault provides updates as frequently as once per hour.
- SnapMirror is homogeneous and works on N series storage only; SnapVault is heterogeneous.

11.3 Remote Solution Example

Figure 11-5 on page 125 represents an example of using SnapVault for local and remote backup. SnapVault transfers occur from primary N series System Storage to a secondary N series System Storage. In addition, clients with agents using Open Systems SnapVault (OSSV) can also back up to the secondary system.

Figure 11-5 Remote office solution (OSSV SnapVault clients in each of over 800 remote branches, centralized backup over the WAN to three N series systems, block-level incrementals to minimize media and bandwidth, a third mirrored copy for additional protection, centrally managed with DFM)

As seen in Figure 11-6 on page 126, the cost-efficient storage of the N series allows it to be used as a tapeless backup system for N series filers. This configuration allows failover to the secondary N series System Storage for access and high availability.

Figure 11-6 Disk backup (SnapVault: unleashing the potential of disk-based backup. In normal operation, application servers send block-level incrementals to SnapVault file system images on N series storage; on failover, clients are redirected to the backup image on the secondary N series system)

12

Chapter 12. LockVault explained

This chapter describes LockVault, a product designed for retaining large amounts of unstructured data such as documents, project files, and home directories. LockVault is built upon the SnapLock and SnapVault products. With LockVault, retention periods are set on the Snapshot copy created automatically after a SnapVault transfer takes place. A simple depiction of the LockVault process can be seen in Figure 12-1 on page 127. LockVault integrates with Open Systems SnapVault (OSSV) as well, creating a compliance solution for open systems without compliant storage.

Figure 12-1 Unify backup and compliance

12.1 LockVault uses

LockVault is the combination of features of SnapLock and SnapVault into a single, unified solution:

- Unification of backup/recovery and regulatory compliance
- Initial focus on the financial and insurance industries
- Designed primarily for unstructured data

Compliance

Some of the compliance drivers and requirements can be seen in Figure 12-2 on page 128.

Figure 12-2 Compliance drivers and requirements (market drivers: litigation protection and regulations such as SEC 17a-4, Sarbanes-Oxley, NASD 3010/3110, Basel II, Check 21, GoBS, DOD, SB 1386, Gramm-Leach-Bliley, HIPAA, the Patriot Act, 21 CFR Part 11, and the UK Data Protection Act; compliance requirements: data permanence, meaning immutable storage, data authenticity, data integrity, and data replication, plus privacy and security, meaning authorization, access controls, encryption, auditing, and secure deletion. Most companies are subject to multiple regulations.)

LockVault covers all data types, including structured data such as databases, semi-structured data such as mail, and unstructured data such as presentations, spreadsheets, and documents. Using a combination of SnapLock, LockVault, and the security features of the N series, you are able to meet the compliance needs shown in Figure 12-2 on page 128.

Figure 12-3 Compliance offerings

With LockVault, customers can store Snapshot copies of their unstructured data, as required, in a WORM format, without the need to identify each file that comes under regulatory purview. LockVault creates periodic (as often as hourly) Snapshot copies of the file system and backs this data up to a local or remote N series filer (Figure 12-4 on page 130), protecting each Snapshot copy in WORM format. Once an initial full backup has been completed, all subsequent backups store only changed blocks while still providing a compliant view of the entire backup image. This dramatically reduces the amount of storage consumed and enables an organization to keep more information online cost-effectively. The data is stored in file format, so any administrator with access privileges can view (but not edit, alter, or delete) the data. LockVault also supports retention dates, meaning that information can be disposed of at a given point in time once a retention date expires.

Figure 12-4 Remote capabilities (backup to local or remote N series filers)

Benefits

- Mitigates risk: LockVault eliminates the need to rely on manual or policy-based methods of identifying and isolating records subject to regulatory compliance rules.
- Unifies backup and compliance: one data copy satisfies both backup and compliance demands.
- Delivers fast access for search and discovery: nightly compliant archives of the enterprise are available online for instant search, retrieval, or restore.
- Minimizes storage capacity consumption: the block-level incremental scheme uses less than 1/20th the capacity consumed by a traditional tape backup scheme over a one-year period.
- Simplifies infrastructure deployment and management: a single, unified platform that is easy to deploy and manage can handle the requirements of unstructured, structured, and semi-structured data.
- Imparts flexibility: open-protocol archival avoids the complexity and performance penalties of API-based solutions and assures true protection against obsolescence or vendor lock-in.

SnapMirror can also be used to make duplicate copies of the LockVault backup compliant images, because SnapMirror supports WORM-to-WORM remote replication.

Figure 12-5 Unstructured data

12.2 SnapVault, SnapLock and LockVault working together

SnapVault keeps its log files on a SnapLock volume. This volume, called the LockVault log volume, is a standard WORM volume. SnapVault uses append-only writes to write to the log files. This allows accurate record keeping but does not allow previous events to be overwritten: the contents of log files cannot be overwritten or deleted, nor can a log file itself be deleted. The log files therefore accurately record the exact series of events that occur during a time frame. The log volume is similar to a WORM volume but has the added feature of supporting append operations.

SnapVault uses two types of log files to record events: SnapVault operations log files and SnapVault files-transferred log files. The combination of these append-only files enables the customer to demonstrate, in a compliant manner, the precise history of all backups of compliant data performed via LockVault.

SnapLock and LockVault compared (Figure 12-6):

                     SnapLock                             LockVault
Solution for:        Structured and semi-structured data  Unstructured data
Mode of operation:   Driven by archival application       Self-contained application
Commit:              Explicit commit required             Automatic commit and data assignment
Retention dates:     Assigned to files                    Assigned to Snapshots
Compliance journal:  None                                 Yes; logs file changes
Version handling:    Each version is a different file     Full original, then only changed blocks

Figure 12-6 SnapLock and LockVault compared

Files transferred log files

SnapVault creates a new files-transferred log file at the beginning of each SnapVault transfer. Files-transferred log files are kept in the following directory structure in the LockVault log volume:

Example 12-1
/etc/logs/snapvault/[secondary vol]/[secondary qtree name]/[month]/[log file]
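A brief configuration sketch: the option name below is an assumption based on contemporaneous Data ONTAP releases, and the aggregate, volume, and size are illustrative. Because the log volume must itself be WORM, it is created on a SnapLock aggregate:

   sec> aggr create slaggr -L compliance 6
   sec> vol create lv_logs slaggr 10g
   sec> options snapvault.lockvault_log_volume lv_logs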

Retention capabilities

LockVault also supports fixed data retention periods (as stipulated by Rule 17a-4 and other pieces of compliance legislation) by allowing an expiration date to be applied to a particular backup. Once an expiration date has been set, the retention period for a backup can be extended but not reduced. A LockVault backup can be disposed of at a given point in time once its retention period expires. In addition, automatic disposal dates can be set to prevent any archived unstructured data from being retained unnecessarily, thereby reducing a firm's exposure to frivolous liability.

13

Chapter 13. Overview of SnapDrive

This chapter provides an introduction to SnapDrive, which provides storage virtualization of filer volumes via the iSCSI or Fibre Channel (FCP) access protocols.

13.1 The Benefits of SnapDrive

SnapDrive software was designed to work with applications that support Snapshot backups or that have software written to allow the application to take advantage of Snapshot backups. For example, SnapManager for Microsoft Exchange is specifically designed to work with SnapDrive.

SnapDrive software provides a layer of abstraction between an application running on a Windows 2000 server or UNIX system and N series systems. Applications running on a server with SnapDrive access virtual disks on N series systems as if they were locally connected disks. This allows applications that require locally attached storage, such as Exchange 2000 and Microsoft SQL Server 2000, to leverage N series functionality. It has been estimated that SnapDrive saves 100 steps in the creation of virtual disks.

The list below highlights some of the important benefits System Storage N series offers to applications:

- Snapshot copies provide rapid backup/restore capability with minimal resource and capacity requirements.
- Dynamic, on-the-fly file system expansion; new disks are usable within seconds.
- Mirroring, data replication, and clustering for high availability.
- Patented, high-performance, low-latency file system with industry-leading reliability.
- Robust yet easy-to-use data and storage management features and software.
- Industry-leading availability, exceeding 99.99X% availability on nonclustered systems.
- Virtual disks created within a dynamic pool of storage that can be re-allocated, scaled, and enlarged in real time, even while systems are accessing data.
- Robust data integrity features, such as advanced RAID functionality and built-in file system checksums, that help protect against potential disk drive failures and disk errors.
- An MMC extension for provisioning and management of NetApp LUNs (known as virtual disks) from the Windows server.

- Works for any application running on Windows 2000 or later.
- Works with the FCP or iSCSI protocols.
- Integrated Snapshot, SnapRestore, and SnapMirror functionality.
- Easy-to-use GUI snap-in within the Windows MMC, riding alongside Disk Manager.
- Virtual disks can be dynamically grown, even while under load.
- Instant point-in-time Snapshots for nondisruptive backup, data mining, and application testing.
- Greater application performance: run other processes off independent Snapshots.
- Near-instantaneous restoration with SnapRestore.
- Replication using SnapMirror and rolling Snapshots.
- Automates the task of taking consistent application data Snapshots.
- Dynamic volume management.
- Cluster- and multipath-aware.
- Disk-based backup and restore in seconds.
- Seamless online replication (with SnapMirror).
- Intuitive management on Windows hosts.

Note: At the time of publication, not all features are yet available in the UNIX version of SnapDrive.

13.2 SnapDrive for Windows

SnapDrive enables Windows applications to access storage resources on N series systems, which are presented to the Windows 2000 or later operating system as locally attached disks. N series systems and SnapDrive software represent a complete data management solution for Windows applications. SnapDrive includes Windows 2000 and above device drivers and software that is used to manage application Snapshot backups. Snapshot backups are nondisruptive to applications and occur very quickly, and restoring data from a Snapshot copy is nearly instantaneous. Snapshot backups may also be mirrored across LAN or WAN links for centralized archiving and disaster recovery purposes. This section briefly outlines the architecture of the SnapDrive software.

Software Components

SnapDrive software combines System Storage N series functionality, Windows 2000 and above N series device drivers, and a Microsoft Management Console (MMC) application into a complete data management solution. SnapDrive software is installed on a Windows 2000 server and consists of the following components:

- The SnapDrive Win32 device drivers
- The SnapDrive Win32 service
- The SnapDrive Microsoft Management Console (MMC) application

Windows Device Manager

In the SCSI and RAID controllers section of the Windows Device Manager, the Emulex LightPulse PCI Fibre Channel HBA and the Microsoft iSCSI Initiator are both visible, as shown in Figure 13-1 on page 135. SnapDrive is capable of accessing virtual disks over iSCSI and FCP on one or more filers simultaneously.

Figure 13-1 Computer management

Figure 13-2 on page 136 shows the SnapDrive MMC management interface, which is used to manage virtual disks. In this example there are two LUNs in use by this server.

Figure 13-2 SnapDrive

Once installed, SnapDrive can be used to create and manage virtual disks on N series systems, which appear as basic disks to the Windows 2000 server and its applications. Virtual disks that reside on the filer can be expanded, unlike Windows-native basic disks. SnapDrive is also used to create, delete, and manage all aspects of application Snapshot backups. Once a SnapDrive virtual disk is created, it appears in the Microsoft Disk Manager, as seen in Figure 13-3.

Figure 13-3 Microsoft Disk Manager (as seen by the Windows Disk Manager, there is no visible difference from a local disk)

Dynamic File System Expansion

As your storage needs increase, you might need to expand a virtual disk to hold more data. A good opportunity for doing this is right after you have expanded your filer volumes. Because the expansion is performed online, planned downtime and scheduled maintenance are minimized. Dynamic file system expansion also satisfies those unique occurrences when unplanned growth or data movement is required and capacity needs to be increased on the fly. Figure 13-4 on page 138 and Figure 13-5 on page 138 show just how easily this can be done.

Figure 13-4 Disk expansion (a simple point-and-click operation; the disk can be expanded dynamically, online, and under load)

Figure 13-5 Dynamic expansion

Volumes, RAID Groups, and Virtual Disks

Physical disks are grouped together in the form of volumes on the filer and may consist of one or more RAID groups.

Virtual disks are created and managed within the limits of the N series storage capacity available on a per-volume basis. The size of a volume is determined by the number of disks multiplied by the capacity of the disks. The disks that make up a single volume can be divided into multiple RAID groups, and each RAID group calculates parity information for the drives within it. The use of multiple RAID groups is transparent to data access and only matters at the RAID layer; volumes function as a whole regardless of the number of RAID groups.

Application programs that run on a Windows server, such as a database application, access virtual disks as if they were locally attached physical disks. Virtual disks are units of storage that are designated for use by one or more host servers. Virtual disks may also be used with Microsoft Cluster Server (MSCS), which can use them for data storage and as quorum disks. Virtual disks and their attributes, such as their size and file system format (NTFS), are defined by the system administrator. The following are true of virtual disks:

- Virtual disks are created on the filer and mounted as disks on Windows servers.
- Virtual disks are accessed and function as physical disks to the Windows server.
- Virtual disks appear within Windows as basic disks, not dynamic disks.
- Virtual disks can be expanded, unlike the native basic disks within Windows.
- Virtual disks are formatted with the NTFS file system.
- Virtual disks reside on physical filer volumes that are RAID protected and distributed among multiple disks for maximum data integrity and performance.
- Dynamic-disk features and functionality are provided by the filer.
- Multiple virtual disks can be created on a single filer volume.
- Virtual disks can be accessed using iSCSI over Gigabit Ethernet or FCP over Fibre Channel.

SnapManager for Microsoft Exchange

N series SnapManager for Microsoft Exchange is integrated with SnapDrive and offers a comprehensive data management solution for Microsoft Exchange. SnapManager for Microsoft Exchange dramatically reduces the time it takes to back up and restore Exchange data. Nearly instantaneous Snapshot backups are verified for data integrity using Microsoft tools after each backup. Restoring entire Exchange databases with SnapManager can be done in minutes; restorations that used to take days can be accomplished in a fraction of the time and with complete confidence. Using SnapManager for Exchange, server scalability is no longer limited by the time it takes to back up and restore data: capacity up to 32 TB can be added on the fly while Exchange is online. SnapManager includes an intuitive graphical user interface with task wizards that help simplify administration.

13.3 SnapDrive for UNIX

SnapDrive for UNIX is a tool that simplifies the backup of data so that you can recover it should it be accidentally deleted or modified. SnapDrive for UNIX uses N series Snapshot

technology to create an image (that is, a snapshot) of the data on a storage system attached to a UNIX host at a specific point in time. If the need arises later, you can restore the data to the storage system; when you restore a snapshot, it replaces the current data on the storage system with the image of the data in the snapshot.

In addition, SnapDrive for UNIX lets you provision storage on the storage system. SnapDrive for UNIX provides a number of storage features that enable you to manage the entire storage hierarchy, from the host-side, application-visible file system, down through the volume manager, to the storage-system-side LUNs that provide the actual repository.

With SnapDrive for UNIX installed, you can perform the following tasks:

- Create a snapshot of one or more volume groups on a storage system. You can then rename the snapshot, restore it, or delete it. You can also connect a snapshot to a different location on the host, or to a different host, and disconnect it again. In addition, SnapDrive for UNIX lets you display information about snapshots that you created with it.
- Create storage on a storage system in the form of LUNs, file systems, logical volumes, or disk groups. You can then resize the storage or delete it. You can also connect the storage to a host or disconnect it. In addition, SnapDrive for UNIX lets you display information about storage that you created with it.

Note: SnapDrive for UNIX works only with snapshots it creates. It cannot restore snapshots that it did not create.

How SnapDrive for UNIX works

The SnapDrive for UNIX software interacts with the host operating system and volume manager. It lets you easily and quickly back up and restore data in host volume groups that you have stored on an N series system, and you can use it to manage the snapshots you create with it. SnapDrive for UNIX coordinates the host Logical Volume Manager (LVM) volume groups and file systems to ensure that host file systems stored on LUNs have consistent images in the snapshot. This enables you to restore data from the backup snapshots without significant data recovery steps on the host. You can also use SnapDrive for UNIX to create and manage storage: the SnapDrive storage commands work with the LVM to let you create LVM objects and file systems that use the storage, and they also let you remove the mappings between the storage and the host, as well as delete the storage. SnapDrive for UNIX communicates with the storage system using the host IP interface that you specified when you set up the storage system.
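A hedged sketch of the snapshot workflow from a UNIX host; the mount point and snapshot name are illustrative, and option spellings may vary by SnapDrive for UNIX release:

   # snapdrive snap create -fs /mnt/db -snapname nightly0
   # snapdrive snap list -fs /mnt/db
   # snapdrive snap restore -fs /mnt/db -snapname nightly0

The create operation coordinates the LVM objects under /mnt/db so the snapshot is consistent, and the restore replaces the current contents of the file system with the snapshot image.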

snapdrive storage show -dg Nseries11
dg: netapp1
    hostvol: /dev/nseries1/lvol1  state: AVAIL
    hostvol: /dev/nseries1/lvol2  state: AVAIL
    fs: /dev/nseries1/lvol1  mount point: /mnt/um1
    fs: /dev/nseries1/lvol2  mount point: NOT MOUNTED

device filename  adapter  path  size  state   lun path
/dev/sdb         -        P     2g    online  eccentric:/vol/vol1/lun1
/dev/sdc         -        P     2g    online  eccentric:/vol/vol1/lun2

Figure 13-6 Unix SnapDrives

SnapDrive for UNIX and logical volumes

The host LVM combines LUNs from a storage system into disk or volume groups. This storage is then divided into logical volumes, which are used as if they were raw disk devices to hold file systems or raw data.

Note: This documentation refers to logical volumes as host volumes to distinguish them from storage system volumes.

SnapDrive for UNIX integrates with the host LVM to determine which N series LUNs make up each disk group, host volume, and file system requested for a snapshot. Because data from any given host volume can be distributed across all disks in the disk group, snapshots can be taken and restored only for whole disk groups.

Figure 13-7 N series LUNs (SnapDrive hosts access N series virtual disks through a virtualization layer over the storage network to filer RAID groups; instant point-in-time Snapshots enable nondisruptive backup, data mining, and application testing, near-instantaneous restoration with SnapRestore, and replication using SnapMirror and rolling Snapshots)

13.4 Flexible Networked Storage

SnapDrive is independent of the underlying storage access media and protocol. The iSCSI protocol provides storage access when the filer and host server are joined using Gigabit Ethernet; the FCP protocol facilitates storage access through a Fibre Channel host bus adapter (HBA) and storage area network (SAN). The protocol used depends on the filer-to-host interconnect: the iSCSI protocol cannot be used to access storage over Fibre Channel, and the FCP protocol cannot be used to access storage over Gigabit Ethernet. The functionality and features intrinsic to SnapDrive are identical regardless of the underlying storage access protocol, because SnapDrive software utilizes either of the two access methods to access virtual disks, which are created and stored on filers. Thus a virtual disk can be created and accessed using either the iSCSI or the FCP access protocol. Virtual disks are referred to as logical unit numbers (LUNs) when accessed over the iSCSI and FCP protocols; within the N series storage systems, LUNs are just special files.

13.5 Summary

SnapDrive enables Microsoft Windows 2000 servers and Windows applications to access virtual disks on N series filers. Administrators can execute nearly instantaneous Snapshot backups and restorations of application data, and Snapshot backups can be mirrored to one or more locations across a LAN or WAN link for centralized archival or disaster recovery purposes. SnapDrive coordinates Snapshot execution with supported applications. Virtual disks appear and function in the same manner as locally attached drives, using the iSCSI or Fibre Channel (FCP) access protocols. LUNs and Snapshots are managed from within the SnapDrive MMC interface.

14

Chapter 14. SnapMirror

This chapter describes SnapMirror, a software product that allows a data set to be replicated between N series storage systems over a network, typically for backup or disaster recovery purposes. SnapMirror is enhanced by the FlexVol and FlexClone technology described earlier in this book and by the introduction of synchronous and semi-synchronous modes.

14.1 Introduction to SnapMirror

SnapMirror is a software product that allows a data set to be replicated between N series storage systems over a network for backup or disaster recovery purposes. After an initial baseline transfer of the entire data set, subsequent updates transfer only new and changed data blocks from the source to the destination, which makes SnapMirror highly efficient in terms of network bandwidth utilization. The destination file system is available for read-only access, or the mirror can be broken to enable writes to occur on the destination. After breaking the mirror, it can be reestablished by synchronizing the changes made to the destination back onto the source file system.

In the traditional asynchronous mode of operation, updates of new and changed data from the source to the destination occur on a schedule defined by the storage administrator. These updates could be as frequent as once per minute or as infrequent as once per week, depending on user needs. Synchronous mode is also available, which sends updates from the source to the destination as they occur, rather than on a schedule. If configured correctly, this can guarantee that data written on the source system is protected on the destination even if the entire source system fails due to natural or human-caused disaster. A semi-synchronous mode is also provided, which can minimize loss of data in a disaster while also minimizing the performance impact of replication on the source system. To maintain consistency and ease of use, the asynchronous and synchronous interfaces are identical with the exception of a few additional parameters in the configuration file.

14.2 SnapMirror Defined

- SnapMirror replicates a file system on one filer to a read-only copy on another filer (or within the same filer).

- Based on Snapshot technology: only changed blocks are copied once the initial mirror is established.
- Runs over IP or FC.
- Data is accessible read-only at the remote site.
- Replication is either volume- or qtree-based.

The Three Modes of SnapMirror

SnapMirror can be used in three different modes: asynchronous, synchronous, and semi-synchronous.

Asynchronous Mode

In asynchronous mode, SnapMirror performs incremental, block-based replication as frequently as once per minute. Consult with your technical team for the best plan for your environment, or to find out whether synchronous SnapMirror is a better match. The performance impact on the source N series storage system is minimal as long as the system is configured with sufficient CPU and disk I/O resources.

More than one physical path may be required for a synchronous mirror. Synchronous SnapMirror supports up to two paths for a particular relationship. These paths can be Ethernet, Fibre Channel, or a combination of the two. Multipath support allows synchronous and semi-synchronous traffic to be load-balanced between these paths and provides failover in the event of a network outage. There are two modes of multipath operation:

1. Multiplexing mode. Both paths are used simultaneously, load-balancing transfers across the two. When a failure occurs, the load from both transfers moves to the remaining path.
2. Failover mode. One path is specified as the primary path in the configuration file. This path is used until a failure occurs, at which point the second path is used.

The first and most important step in asynchronous mode is the creation of a one-time, baseline transfer of the entire data set. This is required before incremental updates can be performed. This operation proceeds as follows:

1. The primary storage system takes a Snapshot copy (a read-only, point-in-time image of the file system).
2. This Snapshot copy is called the baseline Snapshot copy.
3. All data blocks referenced by this Snapshot copy, and any previous Snapshot copies, are transferred and written to the secondary file system.

4. After initialization is complete, the primary and secondary file systems have at least one Snapshot copy in common.

Figure 14-1 Baseline creation (step 1: a baseline copy of the source volumes is transferred to the target over the LAN/WAN; step 2: periodic updates transfer changed blocks, with writes acknowledged immediately to the SAN- or NAS-attached hosts)

After initialization, scheduled or manually triggered updates can occur. Each update transfers only the new and changed blocks from the primary to the secondary file system. This operation proceeds as follows:

1. The primary storage system takes a Snapshot copy.
2. The new Snapshot copy is compared to the baseline Snapshot copy to determine which blocks have changed.
3. The changed blocks are sent to the secondary and written to the file system.
4. After the update is complete, both file systems have the new Snapshot copy, which becomes the baseline Snapshot copy for the next update.

Because asynchronous replication is periodic, SnapMirror is able to consolidate writes and conserve network bandwidth.
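A hedged sketch of an asynchronous volume mirror, using Data ONTAP 7-mode syntax; host and volume names are illustrative, and the /etc/snapmirror.conf schedule fields are minute, hour, day-of-month, and day-of-week:

On the destination system, in /etc/snapmirror.conf:
   prodsys:vol1 drsys:vol1_mirror - 0-55/5 * * *

   dr> vol restrict vol1_mirror
   dr> snapmirror initialize -S prodsys:vol1 drsys:vol1_mirror
   dr> snapmirror status

Here the schedule field 0-55/5 requests an update every five minutes; snapmirror initialize performs the baseline transfer (the destination volume must be restricted first), and snapmirror status reports the state and lag of each relationship.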

Synchronous Mode

Synchronous SnapMirror is a SnapMirror feature that replicates data from a source volume to a partner destination volume at or near the same time that it is written to the source volume, rather than according to a predetermined schedule. This guarantees that data written on the source system is protected on the destination even if the entire source system fails, and it guarantees zero loss of acknowledged data in the event of a failure, but it can have a significant impact on performance; it is not necessary or appropriate for all applications. Synchronous SnapMirror replicates data between single or clustered filers located at remote sites, using an IP or FCP infrastructure, with no special converters required. Synchronous SnapMirror is simply a mode of operation, or feature, that has recently been added to the SnapMirror software.

Figure 14-2 Single Path SnapMirror (N series to N series synchronous SnapMirror)

What is meant by Synchronous

To avoid any potential confusion, it is appropriate to review exactly what is meant by the word synchronous in this context. The best way to do this is to examine a scenario where the primary data storage device fails completely and to examine the disaster's impact on an application. In a typical application environment:

1. A user saves some information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.
4. The operating system software sends the information to the storage.
5. The storage acknowledges receipt of the data.
6. The operating system tells the application server that the write is complete.
7. The application server tells the client that the write is complete.
8. The client software tells the user that the write is complete.

In most cases, these steps take only tiny fractions of a second to complete. If the storage system fails in such a way that all data on it is lost (such as a fire or flood that destroys all of the storage media), the impact on an individual transaction will vary based on when the failure occurs. If the failure occurs before step 5, the storage will never acknowledge receipt of the data. This will result in the user receiving an error message from the application, indicating that it failed to save the transaction. If the failure occurs after step 5, the user will see client behavior that indicates correct operation (at least until the following transaction is attempted). Despite the indication by the client software (in step 8) that the write was successful, the data is lost.

The first case is obviously preferable to the second, as it provides the user or application with knowledge of the failure and the opportunity to preserve the data until the transaction can be attempted again. In the second case, the data may be discarded based on the belief that it is already safely stored.

With traditional asynchronous SnapMirror, data is replicated from the primary storage to a secondary or destination storage device on a schedule. If this schedule were configured to cause updates once per hour, for example, it would be possible for a full hour of transactions to be written to the primary storage, and acknowledged by the application, only to be lost when a failure occurs before the next update. For this reason, many customers attempt to minimize the time between transfers. Some customers replicate as frequently as once per minute, which significantly reduces the amount of data that could be lost in a disaster. This level of flexibility is good enough for the vast majority of applications and users: in most real-world environments, the loss of one or five minutes of data is of trivial concern compared to the downtime incurred during such an event, and any disaster that completely destroys the data on the N series storage system would most likely also destroy the relevant application servers, critical network infrastructure, and so on.

However, there are some customers and applications that have a zero-data-loss requirement, even in the event of a complete failure at the primary site. Synchronous mode is appropriate for these situations. It modifies the application environment described above so that replication of data to the secondary storage occurs with each transaction:

1. A user saves some information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.
4. The operating system software sends the information to the primary storage.
5. The primary storage sends the information to the secondary storage.
6. The secondary storage acknowledges receipt of the data.
7. The primary storage acknowledges receipt of the data.
8. The operating system tells the application server that the write is complete.
9. The application server tells the client that the write is complete.
10. The client software tells the user that the write is complete.

The key difference, from the application's point of view, is that the storage does not acknowledge the write until the data has been written to both the primary and the secondary storage. This has some performance impact, as will be discussed later, but it modifies the failure scenario in beneficial ways. If the failure occurs before step 7, the storage will never acknowledge receipt of the data; this results in the user receiving an error message from the application indicating that it failed to save the transaction, which causes inconvenience but no data loss. If the failure occurs during or after step 7, the data is safely preserved on the secondary storage system despite the failure of the primary.

Note: Regardless of what technology is used, it is always possible to lose data; the key point is that with synchronous mode, loss of data that has been acknowledged is prevented.
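Configuration follows the same /etc/snapmirror.conf pattern as asynchronous mode; only the schedule field changes. A hedged sketch, reusing the illustrative names from the earlier example (the outstanding= option syntax for the semi-synchronous variant differs across Data ONTAP releases):

   prodsys:vol1 drsys:vol1_mirror - sync
   prodsys:vol1 drsys:vol1_mirror outstanding=3s sync

The first entry requests fully synchronous replication; the second, an alternative not used together with the first, allows roughly three seconds of unacknowledged data, which is the semi-synchronous behavior described later in this chapter.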

Operation
The first step in synchronous replication is a one-time, baseline transfer of the entire dataset, just as in asynchronous mode. Once the baseline transfer has completed, SnapMirror can change to synchronous mode, as follows:
a. Asynchronous updates occur, as described above, until the primary and secondary file systems are very close to being synchronized.
b. NVLOG forwarding begins. This is a method for transferring updates as they occur.
c. Consistency point (CP) synchronization begins. This is a method for ensuring that writes of data from memory to disk storage are synchronized on the primary and secondary systems.
d. New writes from clients or hosts on the primary file system are blocked until acknowledgment of those writes has been received from the secondary system.
e. One final update occurs using the same method as asynchronous updates, as described above.
Once SnapMirror has determined that all data acknowledged by the primary has been safely stored on the secondary, the system is in synchronous mode. At this point, the output of a SnapMirror status query shows that the relationship is In-Sync.

Note: If the environment is not able to maintain synchronous mode (because of networking or destination issues), SnapMirror drops to asynchronous mode. When the connection is reestablished, the source filer asynchronously replicates data to the destination once each minute until synchronous replication is reestablished. Each change of status (into or out of synchronous mode) is logged. This safety net is known as fail-safe synchronous.

The role of NVLOG in Synchronous SnapMirror
NVLOG forwarding is a critical component of how synchronous mode works. It is the method used to replicate write operations, submitted by clients against the primary file systems, to the destination. A basic description of how NVLOG forwarding works is given in 3.3.3, Understanding NVLOG forwarding on page 47. When NVLOG forwarding is active in synchronous mode, steps 2 and 3 as described in 3.3.3 are modified as follows:
- The request is journaled in NVRAM. It is also recorded in cache memory and forwarded over the network to the SnapMirror destination system, where it is journaled in NVRAM and cache memory.
- Once the request is safely stored in NVRAM and cache memory on both the primary and secondary systems, Data ONTAP acknowledges the write to the client system, and the application that requested the write is free to continue processing.
As can be seen, NVLOG forwarding is the primary mechanism by which data is synchronously protected. One important detail is the way in which the NVLOG data is stored on the secondary storage system. Because this system has its own storage as well as the mirrored data from the primary system (at the very least, every filer has its own root volume), and because CPs need to be kept synchronized on any mirrored volumes, the NVLOG data cannot be stored in NVRAM on the secondary system in the same way as normal file system writes. Instead, the NVLOG data is treated as a stream of writes to a pair of files on the secondary system's root volume; writes to these files are journaled in the secondary system's NVRAM just like any other write. Because the NVLOG data must be written to these files on the root volume, the performance of the root volume on the secondary system has a direct impact on the overall performance of synchronous or semi-synchronous mode SnapMirror. A minimal configuration sketch follows.
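As an illustration only (the system names itsosj-1 and itsosj-2 and the volume name vol1 are invented for this sketch, and the exact option syntax should be verified against the Data ONTAP release in use), a synchronous relationship is typically requested by replacing the cron-style schedule field in /etc/snapmirror.conf on the destination system with the keyword sync:

   # /etc/snapmirror.conf on the destination system itsosj-2
   # Replicate vol1 from itsosj-1 synchronously (no scheduled transfers)
   itsosj-1:vol1 itsosj-2:vol1 - sync

   itsosj-2> snapmirror status
   Snapmirror is on.
   Source          Destination      State          Lag        Status
   itsosj-1:vol1   itsosj-2:vol1    Snapmirrored   -          In-sync

When the relationship drops to fail-safe asynchronous mode, the Status column changes and a message is logged, as described above.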

Semi-Synchronous Mode
SnapMirror also provides a semi-synchronous mode, sometimes called semi-sync. In this mode, synchronous SnapMirror can be configured to lag behind the source volume by a user-defined number of write operations or milliseconds. Semi-synchronous mode is like asynchronous mode in that the application does not need to wait for the secondary storage to acknowledge the write before continuing with the transaction; for this very reason, it is possible to lose acknowledged data. It is like synchronous mode in that updates from the primary storage to the secondary storage occur right away, rather than waiting for scheduled transfers, which makes the potential amount of data lost in a disaster very small. Semi-synchronous mode thus provides a middle ground: it keeps the primary and secondary file systems more closely synchronized than asynchronous mode while minimizing the extent to which replication impacts the performance of the source system.

Configuration of semi-synchronous mode is identical to configuration of synchronous mode, with the addition of an option that specifies how many writes can be outstanding (unacknowledged by the secondary system) before the primary system delays acknowledging writes from the clients.

Internally, semi-synchronous mode works identically to synchronous mode in most cases. The only difference lies in how quickly client writes are acknowledged; the replication methods used are the same. However, it is possible to configure semi-synchronous mode in a way that changes the replication strategy. As discussed in 3.3.3, Understanding NVLOG forwarding on page 47, a CP is triggered when NVRAM is one-half full, or every 10 seconds, whichever occurs sooner. If semi-synchronous mode is configured to allow unacknowledged transactions greater than 10 seconds old, SnapMirror falls back to performing CP synchronization only. NVLOG forwarding is halted, because CP synchronization alone is frequent enough to meet the requested service level. When a CP synchronization occurs under such circumstances, the tetris sent to the secondary filer includes not just the list of data blocks to be written, but also the content of those data blocks, because with NVLOG forwarding disabled the secondary system does not have a copy of the data until the CP synchronization occurs.

For the vast majority of customer configurations, NVLOG forwarding is desirable. Thus, configuring SnapMirror to allow more than 10 seconds of outstanding data is not recommended for customers who want higher synchronicity levels. However, if NVLOG forwarding is not required, specifying a large time value for outstanding data may reduce the overall CPU usage on the primary storage system. This can allow for significant increases in overall throughput if CPU usage is a limiting factor.

Note: Unlike asynchronous mode, which can replicate either volumes or quota trees, synchronous and semi-synchronous modes work only with volumes.

Scenario
The scenario looks like this when using semi-sync mode:
1. A user saves some information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.

4. The operating system software sends the information to the primary storage.
5. The primary storage sends the information to the secondary storage. The primary storage simultaneously acknowledges receipt of the data.
6. The operating system tells the application server that the write is complete.
7. The application server tells the client that the write is complete.
8. The client software tells the user that the write is complete.
9. At some point after step 5, the secondary storage acknowledges receipt of the data.
Note that step 9 could potentially occur before, or simultaneously with, step 6. If the secondary storage system is slow or unavailable, a large number of transactions could be acknowledged by the primary storage system and yet not be protected on the secondary. These transactions represent a window of vulnerability to the loss of acknowledged data. For a window of zero size, customers can of course use fully synchronous mode rather than semi-sync.

If using semi-sync, the size of this window is customizable based on user and application needs. It may be specified as a number of operations, milliseconds, or seconds. If the number of outstanding operations equals or exceeds the number specified by the user, further write operations are not acknowledged by the primary storage system until some have been acknowledged by the secondary. Likewise, if the oldest outstanding transaction has not been acknowledged by the secondary within the amount of time specified by the user, further write operations are not acknowledged by the primary storage system until all responses from the secondary are being received within that time frame.

14.4 SnapMirror Applications
- Data replication for local read access at remote sites: slow access to corporate data is eliminated.
- Offload of tape backup CPU cycles to the mirror: for corporations with a warm backup site, or that need to offload backups from production servers.
- Isolation of testing from the production volume: ERP testing and offline reporting; for generating queries and reports on near-production data.
- Cascading mirrors: replicated mirrors on a larger scale.
- Disaster recovery: replication to a hot site for mirror failover and eventual recovery; for any corporation that cannot afford a full restore from tape.
- Data migration: the Data ONTAP SnapMirror feature can be used in combination with FlexClone volumes to perform migration faster and more efficiently.

Figure 14-3 Data replication for warm backup/offload (production sites replicating over a MAN/WAN to a backup site with a tape library)

14.5 Implications for Synchronous and Asynchronous SnapMirror
With synchronous SnapMirror, a Snapshot copy is made on the destination volume every time a write is done on the source. The Snapshot copy may be deleted from the clone but not from the source volume while the SnapMirror relationship is in sync. Synchronous SnapMirror has a hard lock, whereas asynchronous SnapMirror has a soft lock. If the process falls out of synchronous mode, it reverts to asynchronous mode and the hard lock becomes a soft lock.

Synchronous SnapMirror keeps the source and destination in sync as much as possible. It falls out of synchronous mode in situations such as the following:
- The NVLOG channel requests (per operation) time out.
- A CP on the source takes more than one minute.
- Network errors persist even after three retransmissions.
- The source or destination fails to restart.
- The network connection fails.
In such situations, synchronous SnapMirror completes an asynchronous update within one minute. It then turns consistency point forwarding and NVLOG forwarding back on to return to synchronous mode.

14.6 Volume Capacity and SnapMirror
The source capacity must be less than or equal to the destination capacity when using flexible volumes. When the administrator performs a SnapMirror break and the destination

capacity is greater than the source capacity, the destination volume shrinks to match the capacity of the smaller source volume. This is a much more efficient use of disk space, because it avoids consumption of unused space.

14.7 Guarantees in a SnapMirror Deployment
Guarantees determine how the aggregate preallocates space to the flexible volume. SnapMirror never enforces guarantees, regardless of how the source volume is set. As long as the destination volume is a SnapMirror destination (replica), the guarantee is volume-disabled. Subsequently, when the volume is broken off using SnapMirror break, the guarantee mode becomes the same as the volume mode.

14.8 SnapMirror Detail
Figure 14-4 SnapMirror Detail (a full Snapshot, Snap A, of the source volume is baseline-transferred to the target volume)
1. The first step is to create a full Snapshot (Snap A) and then perform a baseline transfer to the target volume (Figure 14-4).

Figure 14-5 SnapMirror internals (the source file system continues to change during the transfer; once the baseline transfer completes, the target file system is consistent and a mirror of the Snapshot A file system, with Snap A as the common snapshot)
2. As you would expect in a customer's 24-hour operation, updates to the source volume continue to occur while the baseline image is transferred. The integrity of Snap A is maintained by the Snapshot mechanism, and a point-in-time baseline image of Snapshot A is transferred.

Figure 14-6 Consistent SnapMirror (after the incremental transfer of Snap B completes, the target volume is consistent and a mirror of the Snapshot B file system)

3. For reference purposes, call Snap A time T0 and Snap B time T1. At time T1, a Snapshot is taken again, capturing an image of the volume at that point in time. After completion of Snap B, an incremental transfer is initiated; it is incremental because only portions of the volume have changed since T0. Updates continue to occur, but the Snapshot maintains the integrity of Snap B. After completion of the incremental transfer, you have a consistent full image copy of the source volume as it looked at time T1.

Figure 14-7 Snap C consistency (after the incremental transfer of Snap C completes, the target volume is consistent and a mirror of the Snap C file system)
4. Operations continue, and another Snapshot, Snap C, is taken at time T2, capturing an image of the volume at that point in time. After completion of Snap C, an incremental transfer is initiated; again, it is incremental because only portions of the volume have changed since T1 (Snap B). Updates continue to occur, but the Snapshot maintains the integrity of Snap C. After completion of the incremental transfer, you have a consistent full image copy of the source volume as it looked at time T2. The console commands behind this workflow are sketched below.
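As a minimal sketch of this workflow (the system names itsosj-1 and itsosj-2 and the volume name vol1 are invented for illustration), the baseline and subsequent incremental transfers map to the following Data ONTAP console commands, run on the destination system:

   itsosj-2> vol restrict vol1
   itsosj-2> snapmirror initialize -S itsosj-1:vol1 itsosj-2:vol1   # baseline transfer (Snap A)
   ...
   itsosj-2> snapmirror update itsosj-2:vol1                        # incremental transfer (Snap B, Snap C, ...)
   itsosj-2> snapmirror status
   Source          Destination      State          Lag        Status
   itsosj-1:vol1   itsosj-2:vol1    Snapmirrored   00:52:41   Idle

In practice the incremental updates are usually driven by a schedule in /etc/snapmirror.conf rather than by manual snapmirror update commands.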

14.9 Isolate testing from production
Figure 14-8 Isolate testing from production (after the incremental transfer of Snap C, a SnapMirror resync makes both the production and the backup/test volume READ & WRITE; resync backward works similarly in the opposite direction)
After you have captured a consistent image, that is, a baseline image and subsequent incremental transfers, the SnapMirror relationship can be broken and the target enabled for write operations as well as read, for application testing and similar uses. During this time the source volume continues to be available online. At any time you can resync forward by re-establishing the mirror relationship, as shown in the sketch that follows.
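A minimal command sketch of this break/resync cycle (again with invented system and volume names; the commands are run on the destination system):

   itsosj-2> snapmirror quiesce vol1      # let any transfer in progress finish
   itsosj-2> snapmirror break vol1        # the target becomes writable for testing
   ...                                    # run application tests against itsosj-2:vol1
   itsosj-2> snapmirror resync -S itsosj-1:vol1 itsosj-2:vol1

The snapmirror resync command discards changes made on the destination after the last common Snapshot and re-establishes the relationship without requiring a new baseline transfer; resyncing in the opposite direction works the same way after a source restoration.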

14.10 Cascading Mirrors
Cascading is a method of replicating from one destination system to another in a series. For example, one might want to perform synchronous replication from the primary site to a nearby secondary site and asynchronous replication from the secondary site to a far-off tertiary site. Currently, only one synchronous SnapMirror relationship can exist in a cascade.

Figure 14-9 Cascading Mirrors (a source N series volume, read + write, replicated by SnapMirror through a series of read-only target volumes on target N series systems)

Figure 14-10 Cascading replication example (replicating to multiple locations (30) across the continent, for example New York, Toronto, Mexico City, Bogota, and Sao Paulo over a WAN)

Disaster Recovery
- Send data only once across the expensive WAN.
- Reduce resource utilization on the source filer (redirect).
- For any corporation that cannot afford the downtime (days) of a full restore from tape.
- For data-centric environments.
- Reduces mean time to recovery when a disaster occurs.
Figure 14-11 Disaster recovery example (a production site replicating over a LAN/WAN to a disaster recovery site; resync backward after the source is restored)

Understanding the Performance Impact of Synchronous and Semi-synchronous Modes
Performance is a complex and difficult area to quantify. It is beyond the scope of this book to go into overall filer performance, but it makes sense to discuss what effects synchronous SnapMirror has on individual system performance and on overall performance. It is important to note that the guidelines and recommendations that follow are just that: guidelines. They are not meant to be taken as exact measurements.

Any synchronous replication method, regardless of the technology used, has an impact on the performance of applications using the storage. Understanding business requirements for application performance and data protection allows an organization to make informed choices between various data protection strategies. When examining the application performance impact of synchronous or semi-synchronous replication, there are two primary factors to consider:
1. Overall system throughput may be reduced, due to:
   - CPU overhead imposed by the replication process
   - Network bandwidth constraints between the primary and secondary storage
   - Slower system performance on the secondary storage than on the primary
   - Impact of workload on the secondary storage, reducing its ability to service replication traffic
   - Root volume performance on the secondary system

2. Individual write operations take longer to complete, due to:
   - The need for additional processing of each operation
   - Network latency between the primary and secondary storage
All discussion of these factors focuses on the primary storage system and its client applications. Best practice is to provide a dedicated secondary storage system for synchronous or semi-synchronous replication, and it is assumed here that this best practice is being followed. Thus, performance impact on the secondary storage system is not considered an important issue except insofar as it creates an impact on the primary storage system.

CPU Impact in Synchronous and Semi-synchronous Modes
When a system running SnapMirror in synchronous or semi-synchronous mode receives a write request from a client, it must do all of the standard processing that would normally be required, plus additional SnapMirror processing to transfer the information to the secondary storage system. This adds significant CPU overhead to each and every write operation.

While it is outside the scope of this book to discuss the individual components of this CPU overhead in detail, it is helpful to illustrate the concept with an example. Reading or writing information over network connections is one task performed by a filer. Higher volumes of data passed across the network result in more CPU usage on the filer. So if the network-related CPU impact is considered independently of other factors, a client writing data to the filer at 30 MB per second would use about half of the CPU used by a client writing data at 60 MB per second. When replicating data in synchronous or semi-synchronous mode, all of the data written to the primary by clients must also be passed across a network to the secondary system. So in addition to processing the data coming in from clients, the filer CPU must do additional work to send the same data back out to the secondary system. The same basic mechanism is at work in other CPU-intensive parts of the software in addition to networking. In general, then, one can expect about double the CPU usage on a system running synchronous or semi-synchronous SnapMirror as compared to the same workload on a system without SnapMirror.

Network Bandwidth Considerations
Since all of the data written to the primary storage must be replicated to the secondary storage as it is written, write throughput to the primary storage cannot generally exceed the bandwidth available between the primary and secondary storage devices. Because SnapMirror transfers can be performed over standard Ethernet networks and over Fibre Channel networks, there is a choice of transport. This choice will most likely be determined by preference or existing infrastructure rather than by performance needs. In general, the configuration guideline is to provision the network between the primary and secondary storage with at least as much bandwidth as the network between the clients and the primary storage.

System Performance on the Secondary Storage Device
Since the secondary storage system must write all the same data that the primary storage system writes, it is important that the secondary system is capable of sustaining the write throughput expected on the primary system. For example, if a low-end filer such as the N3700 is configured as a secondary for a high-end filer such as an N5500, the write performance of the N5500 will be limited to what can be achieved on the N3700. For this reason, the best practice is to always configure the same model of filer for both primary and secondary systems, or a higher-end filer as the secondary system.

System Workload on the Secondary Storage Device
For the same reason, avoid using the secondary system as a production data storage device. Any workload imposed on the secondary reduces the speed with which it can accept writes from the primary system; this in turn reduces the speed at which the primary system can acknowledge write operations from clients.

Application of Synchronous Modes
Since using synchronous or semi-synchronous mode requires a more significant investment in configuration, network, and hardware resources, asynchronous mode is recommended for most customers and applications. If business requirements are such that lag times in the range of two to three minutes are unacceptable, but zero lag is not a requirement, semi-synchronous mode is appropriate. For most customers the recommended value for the outstanding option is nine seconds; this provides a good balance between application performance and data protection in most environments, as sketched below. If zero lag time is an absolute business requirement, use synchronous mode. Careful system configuration can keep the performance impact within a reasonable range.
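A minimal /etc/snapmirror.conf sketch of this recommendation (system and volume names are invented, and the outstanding option syntax should be verified against your Data ONTAP release):

   # Semi-synchronous: allow up to 9 seconds of unacknowledged data
   itsosj-1:vol1 itsosj-2:vol1 outstanding=9s sync

Keeping the outstanding value below the 10-second CP interval discussed earlier ensures that NVLOG forwarding remains active.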


Chapter 15. SyncMirror
SyncMirror is a synchronous mirror of a volume. It maintains a strict physical separation between the two copies of your mirrored data, so that in case of an error in one copy, the data is still accessible without any manual intervention. The SyncMirror software creates aggregates (see Aggregates and RAID groups on page 186) or traditional volumes that consist of two copies of the same WAFL file system. The two copies, known as plexes, are simultaneously updated; therefore, the copies are always identical.

What is SyncMirror?
Figure 15-1 Synchronous mirroring (two synchronous mirrors, or plexes, of a file system within a single volume: /vol0/plex0 and /vol0/plex1, each with its own RAID groups of parity and data disks on a Fibre Channel back end, behind a LAN/SAN front end; both plexes are updated synchronously on writes, and no single point of failure in the hardware will cause a mirrored volume to fail, except for the filer head itself)

15.1 Advantages of SyncMirror
A SyncMirror relationship between two aggregates or traditional volumes provides a high level of data availability, because the two plexes are physically separated on different shelves and the shelves are connected to the filer with separate cables and adapters. Each plex has its own collection of spare disks.

Figure 15-2 Clustered SyncMirror (replicate synchronously and, upon disaster, fail over to the partner filer at the remote site, up to 500 meters away, to access the replicated data; benefits: no single point of failure, no data loss, fast data recovery; limitation: distance)

Physical separation of the plexes protects against data loss in the case of a double-disk error or loss of disk connectivity. The unaffected plex continues to serve data while you fix the cause of the failure. Once fixed, the two plexes can be resynchronized and the mirror relationship reestablished.

Figure 15-3 Single storage system SyncMirror between copies of volume X

Another advantage of mirrored plexes is faster rebuild time. SyncMirror falls into the sixth tier of the disaster recovery hierarchy shown in Figure 15-4 on page 163.

Figure 15-4 Tiers of disaster recovery (showing where SyncMirror and SnapMirror fall in the hierarchy)

In contrast, if a SnapMirrored aggregate or traditional volume goes down, its SnapMirror partner cannot automatically take over the file-serving functions, and it can only restore data to its condition at the time the last snapshot was created; you must issue commands to make the partner's data available (Figure 15-4 on page 163).

With SyncMirror, filers can tolerate multiple simultaneous disk failures across the RAID groups within the file system. This redundancy goes beyond typical mirrored (RAID-1) implementations seen in the market, in that each SyncMirror RAID group is also RAID-4 protected. A complete mirror could be lost, and an additional single drive loss within each RAID group could occur, without data loss.

Figure 15-5 SyncMirror

Each RAID group is mirrored on storage connected to the server through completely independent data paths. All mirrored storage is connected to separate host bus adapters (HBAs) with completely separate data paths for the greatest possible redundancy. Resynchronization of a mirrored volume occurs efficiently using Snapshots from the source volume. (Snapshots are discussed in more detail in a later section.) SyncMirror can be configured for use with standalone filers or clusters.

Difference between SnapMirror and SyncMirror
The major difference between SyncMirror and synchronous SnapMirror is who owns the second copy of the data. With SyncMirror, one host owns both plexes. It simply writes to both plexes from NVRAM, and it provides for instant failover in the event that one plex fails for any reason. With synchronous SnapMirror, another filer owns the second copy. Blocks move over IP to the other filer, as in asynchronous SnapMirror, rather than being written directly to local disks over Fibre Channel. There is no automatic failover, because the primary host does not own the second copy of the volume and SnapMirror does not have this functionality. A minimal sketch of creating a SyncMirror-mirrored aggregate follows.
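Assuming the local SyncMirror license is installed, a mirrored aggregate can be created in one step with the -m flag, or an existing aggregate can be mirrored later (the aggregate names and disk counts here are illustrative only; Data ONTAP selects the disks for each plex from separate spare pools):

   itsosj-1> license add <syncmirror_local_license_code>
   itsosj-1> aggr create aggr1 -m 8      # mirrored aggregate: 4 disks per plex
   itsosj-1> aggr mirror aggr2           # add a second plex to an existing aggregate
   itsosj-1> aggr status -r aggr1        # shows plex0 and plex1 with their RAID groups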

15.2 Assuring reliable enterprise data availability with SyncMirror
As described in 15.1, Advantages of SyncMirror, SyncMirror-protected filers can tolerate multiple simultaneous disk failures across the RAID groups within the file system: because each plex is itself RAID-4 protected and attached through completely separate data paths and HBAs, a complete mirror could be lost, and an additional single drive within each RAID group could fail, without data loss.

15.3 Business Continuance with SyncMirror
By operating at the storage level instead of at the server or application level, IBM N series business continuance solutions ensure protection while offloading tasks from busy servers. All solutions operate with simple and consistent interfaces and are guaranteed to work together. IBM N series solutions rationalize business continuance strategies, simplify management, greatly improve recovery times, and reduce expensive downtime, protecting against lost revenue and damaged reputation. At the same time, their simplicity produces significant cost savings in the deployment and ongoing operation of a business continuance strategy.

Figure 15-6 Business continuity with SyncMirror (including SnapVault to an N5500)


Part 4. Filer Access Methods


Chapter 16. Cluster Failover


Chapter 17. Multiprotocol Data Access: NFS, CIFS, and HTTP
IBM N series filers (file server appliances) provide fast, simple, and reliable network data access to Network File System (NFS), Common Internet File System (CIFS, for Microsoft Windows networking), and Hypertext Transfer Protocol (HTTP, primarily for Web browsers) clients. Support for all three protocols is woven into the N series microkernel and file system, providing multiprotocol data access that transcends the enclosed perspective of general-purpose operating systems.

In the context of file service to Windows clients, Data ONTAP is virtually indistinguishable from other Microsoft Windows servers in a Windows domain. For example, in addition to many other Windows-compatible features:
- Access control lists (ACLs) can be set on shares, files, and directories.
- N series filers can be administered via Windows Server Manager and User Manager.
- UNIX users are mapped to Windows users.
- Multilanguage support is available via Unicode.
- File access logging can be tracked for Windows and UNIX users.
- Filers interoperate with NTFS and Active Directory.

Windows-style ACLs and UNIX-style file access permissions are fully integrated on the System Storage N series filer. Furthermore, Windows users are automatically mapped on the fly to their respective UNIX accounts (to assess file permissions), simplifying the unification of the two separate namespaces. This is especially powerful in conjunction with the N series autohome feature, which provides all Windows users with share-level access to their own home directories without the painstaking administrative efforts typically required on other Windows file servers. (Each user automatically sees his or her own home directory as a share in the Network Neighborhood, but not other users' home directories, unless those others have been deliberately and explicitly exported as publicly visible shares, of course.)

With multiprotocol filing, PCs can store and access data side by side with UNIX-based clients, without compromising their respective file attributes, security models, or performance. Users with PC desktops can work within the single instances of their home or project

directories, with Windows-based applications executing locally or UNIX-based applications running on a server. And whether written to the filer via NFS or CIFS, documents can be accessed directly by a wide variety of Web browsers via HTTP.

Multiprotocol filing liberates the data infrastructure, largely freeing it from the constraints of operating system preference or legacy investments. This chapter:
- describes the N series multiprotocol filer architecture
- explores the implications of multiprotocol filing for system administrators and end users
- reflects on the evolution of Data ONTAP software for Windows environments

Figure 17-1 Network Attached Storage (NAS) and SAN protocols

17.1 File System Permissions
N series filers support both UNIX-style and NTFS-style file permissions. Because the ACL security model in NTFS is more complex than the file security model used in UNIX, no one-to-one mapping can be made between them. The fundamental problem occurs when a Windows or similar client, which expects an ACL, accesses a UNIX file, or when a UNIX client, which expects UNIX file permissions, accesses a Windows file. In these cases the file server must sometimes authorize the request using a user identity that has been mapped from one system to the other, or in some cases even a set of permissions that has been synthesized for one system based on the actual permissions for the file in the other system. The N series filer's promise to its users is that these synthesized file permissions are at least as restrictive as the true file permissions. In other words, if a user cannot access a file using the true file permissions, the same is true when using the synthesized file permissions. Data ONTAP has a mechanism called UID-to-SID mapping to address this issue.

UNIX File Permissions (UFS)
UNIX file permissions are usually represented as three sets of concatenated rwx triplets. An example directory listing in a UNIX file system looks like Example 17-1.

Example 17-1 UNIX directory listing
lrwxrwxrwx 1 agy eng   10 Sep 2 14:42 perms.doc -> perms.html
-rw-r--r-- 1 agy eng 1662 Sep 2 14:32 perms.html
-rw-rw---- 1 agy eng 2399 Feb 3  2005 privileges.nt.txt
drwxr-xr-x 2 agy eng 4096 Sep 2 14:42 work

The first 10 characters on each line indicate the file type and permissions for the listed file. The first character contains a d if the file is a directory, an l if it is a symbolic link, and a dash (-) if it is a regular file. The next three characters specify whether the user (agy in this example) can read (r), write (w), or execute (x) the file. The following three characters specify the permissions for the group associated with the file (eng in this example). The last three characters specify the permissions for users who are neither the owner nor members of the file's group. In the example, perms.doc is a symbolic link anyone can traverse (obtaining the file perms.html), perms.html is a regular file anyone can read but only the user agy can write, privileges.nt.txt is a file that agy or anyone from the group eng can both read and write, and work is a directory that anyone can search and read files in, but only agy can insert files into or delete files from.

When user agy attempts to access a UNIX file named nfsfile, the request is checked against the permissions associated with the file; suppose it is a read request. If agy is nfsfile's owner and the owner has read permission on nfsfile, the request can be honored. Otherwise, if agy is a member of the file's group and the group has read permission, the request can be honored. Otherwise, if all others have read permission, the request can be honored. If none of the foregoing tests succeed, the request is denied. A short sketch of this check appears below.
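As a quick illustration from any UNIX client (the file name nfsfile and the user guest are invented; the transcript mirrors the owner/group/other evaluation just described):

   % ls -l nfsfile
   -rw-r----- 1 agy eng 1024 Sep 2 14:50 nfsfile
   % whoami
   agy
   % cat nfsfile                       # succeeds: agy is the owner, and the owner has r
   % su guest
   % cat nfsfile
   cat: nfsfile: Permission denied     # guest is neither the owner nor in group eng, and
                                       # the "other" triplet grants no read permission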

NTFS File Permissions
NTFS uses a different system for denoting file permissions. On the FAT and FAT32 file systems, which were designed as single-user file systems, there are no permissions; anyone who can gain access to the machine has unlimited privilege on every file in the system. The NTFS file system, however, available from Microsoft on workstation and server platforms running Windows NT and Windows XP, 2000, and 2003, has a sophisticated security model. This same security model is also available for use by the CIFS network file system protocol on N series filers, so Windows clients accessing files on a filer can use this security model whether or not they are running NTFS locally.

In NTFS and CIFS, each file has a data structure associated with it called a security descriptor (SD). This contains, among other things, the file owner's security ID (SID) and another data structure called an access control list (ACL, usually pronounced "ackle"). An ACL consists of one or more access control entries (ACEs), each of which explicitly allows or denies access to a single user or group. Suppose user agy attempts to open file pcfile for reading. The algorithm used to determine whether to grant agy permission to do this is conceptualized as follows:
1. First search all the ACEs that deny access. If any of them deny read access to agy specifically, or to any of the groups of which agy is a member, stop searching the ACL and reject the request.
2. If no denials of access are found, continue searching the rest of the ACEs in the ACL. If one is found that grants read access to agy or to any of the groups agy is in, stop searching the ACL and allow the request.
3. If the entire ACL has been searched and no ACEs were found that allow agy to read the file specified in the request, reject the request.
It should be clear by now why it is not always possible to make a one-to-one mapping from the ACL model to the UNIX security model. For example, it is possible using the ACL security model to allow access to all the members of a group except some specified user. This cannot be done using the UNIX model.

NTFS Access Modes
The Windows file permissions model defines more access modes than the three UNIX modes (read, write, and execute). The following table explains what each of the basic file access modes means.

Request Type | The object is a folder | The object is a file
Read (r) | Display the folder's data, attributes, owner, and permissions | Display the file's data, attributes, owner, and permissions
Write (w) | Write to the folder, append to the folder, and read or change its attributes | Write the file, append the file, and read or change its attributes
Read & Execute (x) | Display the folder's contents; display the data, attributes, owner, and permissions for files within the folder; and run files within the folder | Display the file's data, attributes, owner, and permissions, and run the file
List Folder Contents | Display the folder's contents; display the data, attributes, owner, and permissions for files within the folder; and run files within the folder | (applies to folders only)
Modify | Read, write, modify, and execute files in the folder; change attributes and permissions; and take ownership of the folder or files within | Read, write, modify, execute, and change the file's attributes
Full Control | Read, write, modify, and execute files in the folder; change attributes and permissions; and take ownership of the folder or files within | Read, write, modify, execute, and change the file's attributes and permissions, and take ownership of the file

Windows XP, 2000, and 2003 also support special access permissions, which are made by combining the permissions described above. The following table shows these special access permissions and the standard permissions in which each is included.

Special Permission | Full Control | Modify | Read & Execute | Read | Write
Traverse Folder/Execute File | X | X | X | - | -
List Folder/Read Data | X | X | X | X | -
Read Attributes | X | X | X | X | -
Read Extended Attributes | X | X | X | X | -
Create Files/Write Data | X | X | - | - | X
Create Folders/Append Data | X | X | - | - | X
Write Attributes | X | X | - | - | X
Write Extended Attributes | X | X | - | - | X
Delete Subfolders and Files | X | - | - | - | -
Delete | X | X | - | - | -
Read Permissions | X | X | X | X | X
Change Permissions | X | - | - | - | -
Take Ownership | X | - | - | - | -
Synchronize | X | X | X | X | X

17.2 File Service for Heterogeneous Environments with NFS and CIFS
N series filers support both NFS-style and CIFS-style file permissions. NFS-style file permissions are widely used on most UNIX systems, while CIFS-style file permissions are used by Windows when communicating over networks. Because the ACL security model in CIFS is more complex than the NFS file security model used in UNIX, no one-to-one mapping can be made between them. This mathematical fact has forced all vendors of multiprotocol file storage products to develop non-mathematical strategies to blend the two systems and make them as compatible as possible. This section explains the N series approach to this problem.

File service for heterogeneous environments (UNIX workstations plus Windows servers) is challenging. NFS software can be installed on Windows clients, or SAMBA can be installed on a UNIX server, but these approaches are either costly or time-consuming, or they introduce an extra layer of file system emulation. Common sense suggests that changing the file server makes better sense than altering a large (and growing) number of Windows clients or adding a file system emulation layer that reduces performance. In other words, in a heterogeneous environment, the file server should support remote file access protocols for both UNIX-based clients and Windows clients.

The alternative, using separate file servers for each protocol, can increase costs due to administrative complexity and redundant investments in storage. Routine administrative functions like backup and restore are duplicated, and it is still difficult to implement applications that need to share data between UNIX and Windows users. Perhaps worst of all, perpetuating an arrangement of separate servers for distinct sets of UNIX and Windows clients creates an awkward situation for users who need to access the same files (in their home directories, for example) with locally executing applications on their Windows desktop and by means of an X Windows session on a UNIX host (Figure 17-2). A brief sketch of serving one volume over both protocols follows.

Figure 17-2 Multiprotocol N series filer
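As a minimal sketch of multiprotocol file service (the volume, qtree, share, and host names are invented for illustration), the same data can be exported to UNIX clients via NFS and shared to Windows clients via CIFS from the Data ONTAP console:

   itsosj-1> qtree create /vol/vol1/home
   itsosj-1> qtree security /vol/vol1/home mixed      # allow both UNIX and NTFS security styles
   itsosj-1> exportfs -p rw=unixhost,root=unixhost /vol/vol1/home
   itsosj-1> cifs shares -add home /vol/vol1/home -comment "Home directories"

UNIX clients then mount itsosj-1:/vol/vol1/home, Windows clients map \\itsosj-1\home, and both see the same files.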

17.3 NFS
Most UNIX clients use NFS for remote file access. Sun Microsystems introduced NFS in 1985. Since then, it has become a de facto standard protocol, used by 10 million systems worldwide. NFS is particularly common on UNIX-based systems, but NFS implementations are available for virtually every modern computing platform in current use, from desktops to supercomputers. Only when used by UNIX-based systems, however, does NFS closely resemble the behavior of a client's local file system.

17.4 CIFS (SMB)
The operating systems running on Windows clients do not include NFS. Instead, the protocol for remote file access is CIFS, formerly known as Server Message Block (SMB). SMB was first introduced by Microsoft and Intel in the early 1980s and is the protocol used in several diverse PC network environments. In 1992, SMB was ratified as an X/Open specification [XO-92a, XO-92b]. In mid-1996, Microsoft began to promote CIFS as an open standard, with a published specification [MS-96c]. Through developers conferences and interaction with standards bodies, Microsoft actively solicited input for the future evolution of CIFS, with one goal being support for the protocol on non-Windows operating systems.

References:
[XO-92a] Protocols for X/Open PC Internetworking: SMB, Version 2. X/Open, 1992.
[XO-92b] IPC Mechanisms for SMB. X/Open, 1992.
[MS-96c] Microsoft's CIFS protocol specification reference.

17.5 NFS vs. CIFS
In every textbook description of NFS, its statelessness is emphasized. NFS operations are idempotent (they can be repeated harmlessly) or, if non-idempotent (file deletion, for example), are managed safely by the server. Clients are oblivious to server restarts (if service is restored promptly), with few exceptions. The NFS protocol emphasizes error recovery over file locking; error recovery is simple if no state need be preserved.

A CIFS file server, by contrast, is stateful. The CIFS protocol emphasizes locking over error recovery, because Windows applications rely on strict locking, and strict locking requires a sustained connection: it is imperative that an active session not be interrupted. Applications executing on Windows clients react to a CIFS server in exactly the same manner as they do to local disk drives: a down server is no different from an unresponsive disk drive. Therefore,

Windows clients must be warned and allowed time to gracefully disengage (that is, save files, exit applications, and so on) before server shutdowns or restarts.

17.6 Mixing NFS and CIFS
Software solutions exist that allow UNIX-based servers to provide remote file access to Windows clients without requiring NFS. Running in user mode (not in the UNIX kernel), these applications support Windows clients via CIFS. Of these, the most widely used are SAMBA, Hummingbird NFS Maestro, and Windows Services for UNIX (SFU) from Microsoft. SAMBA is a server-side installation, while NFS Maestro and SFU are NFS implementations installed on the Windows clients.

The most prevalent of these is SAMBA. For users with a casual need for CIFS access, or who are new to PCs and are trying to get a feel for what Windows service is like, SAMBA offers several advantages: it is free, easily available, runs on most popular UNIX systems, and is relatively reliable for simple uses. To meet more serious requirements (for example, providing primary file service for a large organization), SAMBA falls short in several important areas: shallow integration with the underlying UNIX-centric file system (particularly with respect to locking mechanisms); difficulty of installation, configuration, and administration; and lack of guaranteed commercial support (it being community-maintained software). For discussion purposes, this book uses the more widely implemented SAMBA as the example of emulated CIFS protocol support, in contrast to the N series native multiprotocol approach.

Part 5. Initial installation and setup
In this part we describe the initial installation and setup of the System Storage N series, covering the two N3700 models: the single-node 2863-A10 and the clustered 2863-A20, each in its own chapter. Administration and additional configuration are covered in the next part.


Chapter 18. Single-node setup
This chapter discusses the hardware setup and initial configuration of the System Storage N series after delivery and unpacking. It contains information on cabling and the setup procedure for the single-node system, the IBM TotalStorage N3700 Model 2863-A10. We also describe disk characteristics and the basic disk addressing used in the Data ONTAP operating system. Our primary focus is the steps to get the system up and running. For additional information on systems administration (creating volumes, shares, and so on) refer to 2.5, Storage Management on page 23.

This chapter covers the following topics:
- Planning the implementation
- Planning worksheets
- System Storage N series disk handling basics
- Hardware setup
- Installation and initial configuration

18.1 Planning the Implementation
To achieve the greatest benefit from System Storage N series systems, pre-installation planning and sizing should include several important steps. These steps ensure that the appliances are configured for the best possible performance, reliability, and ease of management for the needs of your environment. Proper installation and configuration planning helps to minimize the duration of your installation and configuration. Information regarding physical planning can be obtained from the N3700 Hardware and Service Guide, GA. Physical planning is not covered in this redbook.

Planning Worksheets
Planning worksheets are helpful for gathering information on the planned environment and configuration. Refer to Appendix A, Setup Worksheets and Cabling Diagrams on page 1 for the following worksheets:
- Initial setup worksheet, single-node configuration (N3700 Model A10)
- Initial setup worksheet, clustered configuration (N3700 Model A20)
- Basic cluster worksheet
- Disk ownership worksheet

System Storage N series disk handling basics
The System Storage N3700 (appliance or EXP600 expansion) shelf can hold up to 14 disks. The disks are formatted with 520 bytes per sector. Before we step into the disk handling of the N series systems, there are some details you need to understand; in this section we explain the hardware and logical disk layout of the filers. Prior to installing and attaching disk shelves to the N series appliances, you need to understand the following characteristics:
- Disk shelf numbering
- Loop IDs, Device IDs, and disk addressing in Data ONTAP

Disk shelf numbering
Each disk shelf in a loop must have a unique ID. A valid shelf ID is from 1 through 6, with disk shelf 1 connected to the storage appliance. The default for an invalid shelf ID is 7. If you install a second or third loop of disk shelves (not available with the N3700 A10 or A20), the disk shelf IDs in each loop must start at 1. The ID of a single disk shelf should be 1. Each disk shelf is shipped with its assigned ID set on its back panel. You must ensure that the disk shelf has the correct ID number on the label. The ID label is on the right side of the disk shelf; relabel the shelf if the ID label is missing or wrong. Figure 18-1 on page 183 shows the shelf IDs and the ID label on each shelf.

Figure 18-1 Sample configuration with four labeled shelves (shelf ID 1 is the filer head; shelf IDs 2 through 4 are the first through third EXP600 units)

Note: If you enter a shelf ID that is not from 1 through 7, the drive addresses default to those of a shelf with the ID switch set to 7, even though the shelf ID indicator in the front operation panel displays a dash (-).

The disks in the N3700 appliance or EXP600 are mounted in so-called bays. Each shelf can hold up to 14 disks. The physical bay numbering starts from the right side and runs to the left (0, 1, ... 13). See Figure 18-2 for the bay numbering of a shelf.

Figure 18-2 Drive addressing and bay numbers

Loop IDs
In addition to identifying the disk shelf ID and the direction of the drive bays, the ID label on the right side of the disk shelf includes the loop ID. The loop ID identifies the disks in the disk shelf.

Device IDs
The Device ID refers to the loop ID number of the disk and is determined by the shelf number and the bay in which the device is installed.

Drive addressing
Drive addressing in Data ONTAP is done by the Device ID (device_id) and the associated path (path_id). device_id is an integer between 0 and 126 (FC-AL addressing), and path_id identifies the host adapter or FC port to which the disk shelf is attached. The following format is used:

path_id.device_id

path_id refers to the host adapter number with which the disk is associated; in an N3700 configuration this is 0b. device_id is the number associated with the drive in a shelf. Table 18-1 shows the seven disk shelf IDs and the Device IDs of each disk in the shelves; the drive bay number (0 through 13) gives the location of a disk within its shelf, and the Device ID of a disk in bay B of shelf N is 16 x N + B.

Table 18-1 Shelf IDs and associated Device IDs
Shelf ID *, ** | Device ID / Loop ID (drive bays 0 through 13)
1 | 16-29
2 | 32-45
3 | 48-61
4 | 64-77
5 | 80-93
6 | 96-109
7 | 112-125
* The following IDs are reserved: 0-15, 30-31, 46-47, 62-63, 78-79, 94-95, 110-111.
** Shelf ID settings 0, 8, and 9 are displayed as - in the OPS panel.

For example, in an N3700 environment, the disk drive located in bay 0 (drive bay number 0) of the first shelf (base unit, shelf ID 1) has the Device ID 16. Because the N3700 uses one loop, the device is identified by Data ONTAP as:

0b.16

See Figure 18-3 on page 185 for details on identifying drives. A brief console sketch of how these names appear follows.
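As a quick illustrative sketch (output abridged, with the system name invented), disk names in this path_id.device_id form can be seen on the console with commands such as sysconfig -r:

   itsosj-1> sysconfig -r
   Aggregate aggr0 (online, raid_dp) (block checksums)
     Plex /aggr0/plex0 (online, normal, active)
       RAID group /aggr0/plex0/rg0 (normal)
         RAID Disk   Device   ...
         dparity     0b.16    ...
         parity      0b.17    ...
         data        0b.18    ...

Here 0b.16 is the disk in bay 0 of shelf 1, reached through Fibre Channel port 0b.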

Figure 18-3 Sample Device ID identification

In a single-node configuration such as the N3700 Model A10, the filer owns all disks. See Disk ownership for more details on disk ownership in a clustered environment.

SES disks
The SCSI Enclosure Services (SES) program monitors the disk shelf. The associated drives are called SES drives, and the bays where these drives reside are called SES bays. Bays 0 and 1 of each shelf are designated SES bays. These bays must be populated with disks for enclosure monitoring to work (Figure 18-4 on page 186).

Figure 18-4 SES disks (bays 0 and 1 of each shelf, shown for shelves 1 and 2 with bays 0 through 13)

Disk types
Each hard disk is known to the operating system. Depending on its usage, the type can be:
- spare: the disk is unused
- data: the disk is used as a data disk
- parity: the disk is used as a RAID4 parity disk
- dparity: the disk is used as a RAID-DP diagonal (double) parity disk
- partner: the disk is assigned to the other node (cluster configuration)
- broken: the disk has failed and is logically removed from the system
- reconstructing: the disk is being rebuilt (temporary state)

Aggregates and RAID groups
Several disks of type data and parity (RAID4 and RAID-DP), plus dparity (in a RAID-DP configuration), build a RAID group. The minimum number of disks in a RAID group on an N3700 is 2 for RAID4 and 3 for RAID-DP. The maximum RAID group size on an N3700 is 26 data disks plus 2 parity disks for RAID-DP, and 13 data disks plus one parity disk for RAID4. An aggregate is organized from one or more RAID groups and is used as a container for flexible volumes. The default RAID type for an aggregate is RAID-DP, but it can be changed to RAID4. See Figure 18-5 on page 187 for details on aggregates and RAID groups.

Figure 18-5 Aggregates: example RAID-DP and RAID4 layouts (Aggregate A with three RAID-DP RAID groups of 4 data and 2 parity disks each; Aggregate B with one RAID4 RAID group of 3 data disks and 1 parity disk)

Additional disks can be added after creating aggregates, as sketched below.

Note: Consider that RAID groups may be reorganized after a disk failure: a spare disk becomes a data disk, and the replacement disk becomes a spare. This means a RAID group may consist of disks from different bays compared to the layout you initially created.
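A minimal console sketch of creating an aggregate and a flexible volume, then adding disks (the names and sizes are invented for illustration):

   itsosj-1> aggr create aggr1 -t raid_dp -r 14 6    # 6 disks, RAID-DP, RAID group size 14
   itsosj-1> vol create vol1 aggr1 100g              # flexible volume inside the aggregate
   itsosj-1> aggr add aggr1 2                        # grow the aggregate by two more disks
   itsosj-1> aggr status -r aggr1                    # verify RAID group layout and spares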

18.2 Hardware setup
We now describe the installation and setup of the System Storage N3700 Model A10 single-node base unit and the attachment of up to four System Storage EXP600 expansion units.

Before you begin
Before starting the setup of the N3700, it is a useful practice to create a plan and to write down important details about the configuration. You can find helpful information in Appendix A, Setup Worksheets and Cabling Diagrams on page 1.

Note: Check the interoperability matrix for supported configurations.

Attention: The N3700 and EXP600 use sensitive electronic components. To avoid damage, put on an antistatic wrist strap and grounding leash for the installation.

Tools and equipment
The 2863 Model A10 shipment package includes the N3700 base unit with the power supplies, power cords, a console cable (RJ-45 to DB-9) for a serial console, a Fibre Channel terminator, publications, a setup kit, and software licenses. Before you install the hardware, make sure you have assembled the appropriate customer-supplied tools and equipment:
- Flathead screwdriver and both #1 and #2 Phillips head screwdrivers
- Pointed tool (used for setting termination switches)
- ASCII terminal or console (for example, a laptop or PC with a serial port)
- Null modem cable (to connect to the console serial port)
- Ethernet local area network (LAN) cables required for the file serving network
- Fibre Channel cables for EXP600 or tape connection (cables to the EXP600 are included)

Note: The N3700 provides a DB-9 to RJ-45 console adapter. You can use this connection to attach a console to the filer. Keep in mind to use a null modem cable.

Use the N3700 Hardware and Service Guide, GA, and the Installation and Setup Instructions for a System Storage N3700 or an EXP600 Expansion Unit, GA, for more information while mounting and installing the N3700 filer.

The N3700 A10: Filer head
We now describe the hardware installation of the single-node N3700 (Model A10) filer. First we explain the components of the system.

Customer Replaceable Units
The base unit has various Customer Replaceable Units (CRUs), which are designated as either Tier 1 or Tier 2.

Installation of Tier 1 CRUs is a customer responsibility. The following parts have been designated as Tier 1 CRUs:
AC power cord
AC power supply
Disk drive
Cables
Memory modules
Wrap plug
Operator panel

Tier 2 CRUs can be installed by either the customer or IBM. See the announcement letter for more details on Tier 1 and Tier 2 CRUs.

Components
The N3700 Filer is shipped with two power supplies (PSU1 and PSU2), which are located at the back, on the right- and leftmost sides of the N3700 Filer. Each power supply has its own AC power cord.

Note: We recommend that you use independent, separate, and grounded power sources.

The CPU module can be found in the middle, at the bottom of the back side (Figure 18-6).

Figure 18-6 N3700 Model A10 back side
[The figure shows the appliance and disk loop speed switch, the shelf ID switch, the CPU module, and the power supplies.]

The CPU module hosts several LEDs and ports. The ports are:
Two Ethernet ports (green)
Console port (purple)
Two Fibre Channel ports:
- Fibre Channel port 1 (optical, orange) is used for third-party devices (such as tape for backup)
- Fibre Channel port 2 (copper, blue) is used for connections to EXP600 expansion units

The N3700 head unit provides two independent Fibre Channel ports, identified internally as ports 0b and 0c. Port 0b (copper) is used to communicate with disks (EXP600). Port 0c is an external (optical) port which can be configured in two modes:
Initiator mode, to communicate with tape backup devices, such as in a tape SAN backup configuration.
Target mode, to communicate with SAN hosts or a front-end SAN switch.

Fibre Channel port 0c does not support mixed initiator/target mode. The default mode for port 0c is initiator mode. If you have not licensed the FCP service and you want to use port 0c in initiator mode, you do not need to configure the port.

N series cluster configurations must be cabled to switches that support public loop topology. To connect an N series cluster to a fabric topology that includes switches that only support point-to-point topology, such as McDATA Director class switches, you must connect the cluster to an edge switch and use this switch as a bridge to the fabric. Check with IBM Support for target mode setup.

The shelf ID switch and the appliance and disk shelf loop speed switch can be found above the CPU module.

Connections and Cabling
The illustration in Figure 18-7 on page 190 shows the CPU module ports. The following procedure describes the tasks of the hardware setup of the N3700 Model A10. After unpacking and mounting the N3700 in the rack:

Make sure the N3700 system is turned off.
Set the shelf ID to 1, as in Figure 18-8 on page 191.

Attention: The power to the N3700 must be off before changing the shelf ID.

Figure 18-7 N3700 CPU module
[The figure shows the console port (purple), the Ethernet ports (green), and the Fibre Channel ports: FC 2 (blue, EXP600) and FC 1 (orange, third-party device).]

Figure 18-8 Setting the shelf ID

Set the 1Gb/2Gb switch to the 1 Gb position. You must set it to 1. See Figure 18-9.

Figure 18-9 Setting the 1Gb/2Gb switch

Connect the Ethernet cable to the (left, green) Ethernet port in the middle of the CPU module. See Figure 18-7 on page 190.

Connect the console cable (DB-9 to RJ-45 converter) to the console port (purple) on the back of the base unit (Figure 18-10).

If you are attaching a third-party device, such as a tape backup or a Fibre Channel switch, leave the Fibre Channel port (orange port) unterminated. Refer to "Connecting to third-party devices" in the N3700 Hardware and Service Guide, GA , for details.

Figure 18-10 Connect console cable

If no third-party device will be attached to the Fibre Channel port (orange port), insert the Fibre Channel terminator or loopback terminator into the Fibre Channel port at the far left of the CPU module (Figure 18-11).

Figure 18-11 Connecting Ethernet, console cable, and Fibre Channel terminator

If no EXP600 disk shelf will be attached, set the terminate switch on the CPU module to ON. See Figure 18-12.

Figure 18-12 No EXP600 attached

If no EXP600 will be attached, proceed as follows (the system is still powered off):
Plug the power cords into the left and right power supplies.
Fasten the power cords with the hold-down clamps.
Plug the other ends of the power cords into a grounded power source.

If you will connect one or more EXP600 disk shelves, set the terminate switch on the CPU module to OFF (Figure 18-13).

Figure 18-13 EXP600 attached

Proceed if you are connecting additional EXP600 disk shelves. Otherwise, go to Chapter 20, Subsequent Setup on page 233.

Note: The N3700 supports hot adding a disk shelf. Refer to the Hardware and Service Guide, GA , for details and restrictions.

Attention: The N3700 and the EXP600 shelves use sensitive electronic components. To avoid damage, put on an antistatic wrist strap and grounding leash for the installation.

If one or more EXP600s will be attached, proceed as follows (remember, the system is still powered off):
Confirm that the shelf ID of the base unit is set to 1.
Connect Fibre Channel (CPU module, blue port) to the first EXP600 disk shelf (ESH2 module B port = In port). See Figure 18-14 on page 193 and Appendix 0-1, Cabling N3700 Model A10 to EXP shelves on page 8.
Attach the grounding cable (EXP600 to base shelf).

Figure 18-14 Connecting EXP600 shelves

Set the EXP600 shelf ID to 2 and set the disk shelf loop speed to 1 Gb.

If you attach only one EXP600, plug the power cords into the left and right power supplies, fasten the power cords with the hold-down clamps, and plug the other ends of the power cords into a grounded AC power source. Skip adding a second EXP600 and proceed with the initial configuration.

If you are adding a second EXP600 disk shelf, connect the Fibre Channel ESH2 module B port (Out port) of the first EXP600 (ID 2) to the ESH2 module B In port of the second EXP600 (ID 3). (Figure 18-15 on page 194)
Attach the grounding cable (EXP600 #1 and #2).

Figure 18-15 Connecting additional EXP600 shelves
[The figure shows the base unit (ID 1) cabled to the first EXP600 (ID 2) and the second EXP600 (ID 3).]

Set the shelf ID of the second EXP600 to 3 and set the disk shelf loop speed to 1 Gb.

Repeat the previous steps if a third EXP600 (ID 4) will be attached.

Plug the power cords into the left and right power supplies, fasten the power cords with the hold-down clamps, and plug the other ends of the power cords into a grounded power source. Then proceed with setup and configuration.

18.3 Installation and initial configuration
This section describes the initial configuration after booting the N series Filer for the first time. After turning on your system for the first time, run diagnostics to make sure that it is functioning properly and to diagnose any hardware problems.

N3700 Initial Setup
Make sure the hardware setup of the N3700 system is done. Refer to "Hardware setup" on page 188 for details on the hardware setup: shelf IDs, speed settings, grounding, termination, and cabling.

The System Storage N series Filer is shipped with a complete version of the System Storage N series software. During the setup procedure, several files will be updated with the information provided during the configuration procedure. These files are:
/etc/rc
/etc/exports
/etc/hosts
/etc/hosts.equiv
/etc/dgateways
/etc/nsswitch.conf
/etc/resolv.conf

Setup Method
You can do the system setup using various interfaces. You may use the serial connection with a console PC for the very first configuration steps.

Console PC
Use the null modem cable to connect the console adapter cable of the N3700 Filerhead to the serial port of the console PC or laptop. The console adapter cable should be plugged into the console port (purple) of the CPU module of the Filer (Figure 18-7 on page 190).

Console settings:
Baud: 9600
Data bits: 8
Parity: None
Stop bits: 1
Flow control: None

Web browser and Telnet
If DHCP is enabled, a Telnet client or a Web browser can be used for the setup. To keep the installation simple, we skip this and continue with the console setup.

Power on N3700
Power on the N3700 system in the following order:
1. EXP600 (expansion disk shelves)
2. Appliance (base unit)

Tip: The default spin-up time for all disks in the appliance is 60 seconds. Reduce this spin-up time to 20 seconds by turning on the switches of both power supplies within five seconds of each other.

Setup and configuration
After the system is connected properly and powered on for the first time, the initial setup starts with basic Filer settings. Enter the settings according to your planning worksheets. For example:

Please enter your new hostname [ ]: itso-n1

See the screen captures in Figure 18-16, Figure 18-17 on page 197, and Figure 18-18 on page 197 for more detail.

Note: During the setup, you will be asked to continue the configuration using the Web interface. If you select No, the setup continues using the command line interface. If you select Yes, you can use the Web interface through the IP address which was displayed before.

Time zone settings can be left at the default and configured after the initial configuration.

The initial setup starts automatically when the system is powered on for the first time. The basic configuration depends on which licenses were installed on your system in the factory. The following screen captures may vary from your environment depending on which licenses are installed, the Data ONTAP software level, and the hardware you are using. For example, when you did not obtain a CIFS license, the CIFS setup will not show up during the initial configuration.

Figure 18-16 Initial Setup: Host and network definitions

Figure 18-17 Initial Setup: WINS, CIFS, and Filer settings

Figure 18-18 Initial Setup: Active Directory, Domain Controller, or Workgroup settings

After the initial setup procedure is completed, we log on to the appliance:

login: root
password: <password_you_chose_during_setup>

Now we can assign all disks to the N3700 controller, because the Model A10 is a single-controller appliance. Use the following command to do so:

disk assign all

Verify that all disks are assigned correctly:

disk show -v (or: sysconfig -r)

Figure 18-19 shows the output of the sysconfig -d command.

Figure 18-19 CLI output: sysconfig -d

Verify the license settings and add missing or additionally obtained licenses. All licenses you bought with the hardware should already be installed on the Filer. Use the following commands to show license information and add additional licenses to your system:

license
license add wwxxyyzz

Finally, reboot the system:

reboot

Pre-creating a Filer domain account
If the N3700 Filer has to join an Active Directory domain, this section gives a brief overview of the steps that have to be taken. This assumes you have a valid CIFS license.

When you use the N3700 cifs setup command to create the Filer domain account, you must assign certain permissions (listed later in this section) on the Filer container. Permissions for adding a storage system to an Active Directory domain are the same as the permissions required to add any Windows server. The storage system administrator account must have those permissions for the Active Directory container in which you are installing your storage device. To pre-create and configure a domain account for the Filer, perform the following steps.

Note: The following procedure describes Windows 2000 Server tasks. Details of this procedure may vary on other Windows server versions.

1. In the Active Directory Users and Computers View menu, ensure that the Advanced Features menu item is checked.
2. In the Active Directory tree, locate the OU for your Filer, right-click, and choose New > Computer.
3. Enter the Filer (domain account) name. Note the Filer name to make sure you enter it precisely when you run cifs setup later.
4. Specify the name of the Filer administrator account to be allowed to "add this computer to the domain."
5. Right-click the computer account you just created and choose Properties from the pop-up menu.
6. Click the Security tab.
7. Select the user or group that will add the Filer to the domain.
8. In the Permissions list, ensure that the following check boxes are enabled: Change Password and Write Public Information.
9. Run cifs setup. At the prompt "Please enter the new hostname," enter the Filer name you specified in Step 3.

For more details, please refer to the REFERENCE TO ACTIVE DIRECTORY INTEGRATION.

Web logon to the Filer
Once the setup has completed successfully, you can log on to the N3700 system via your Web browser. This may be done after you choose to continue with the Web setup during the initial configuration, or if you want to change the networking settings of your Filer.

Note: There may be restrictions regarding Web administration of the Filers with respect to browser and Java versions. Currently they are:
Netscape Navigator 4.51 or later
Alternatively, Microsoft Internet Explorer 4.0 or later
Java and JavaScript must be enabled.
Other browsers with Java/JavaScript may work too. In addition to Microsoft Internet Explorer 6.0, we used the Mozilla Firefox browser (Version ).

The Setup Wizard can be used for additional configuration tasks. The following step-by-step procedure describes this approach. First click FilerView and log on to the Filer using the hostname or IP address, as described in Figure 18-16 on page 196. Use the user root and its password. You may use the information from your worksheets. See Figure 18-20 on page 200 and Figure 18-21 on page 200 for details.

Use the following URL:

http://<hostname or IP address>/na_admin

Figure 18-20 System Storage N series Web logon

Figure 18-21 System Storage N series Web interface

Select Wizards (bottom left of FilerView), then Setup Wizard, as shown in Figure 18-22 on page 201.

Figure 18-22 Starting the Setup Wizard

Figure 18-23 on page 202 contains settings from the initial setup; verify these settings and make changes when needed.

Tip: You might use the shortcut:

Figure 18-23 Setup Wizard: Basic settings

Figure 18-24 is used to provide an e-mail address, the location of the Filer (data center, branch office, and so on), and the administrative host name.

Figure 18-24 Setup Wizard: E-mail, Location, and Administrative Host

Figure 18-25, Filer Setup Wizard - Network Services, enables DNS and NIS services and gateway settings.

Figure 18-25 Setup Wizard: DNS, NIS, Gateway settings

Figure 18-26 on page 204, Setup Wizard: IP configuration and MAC addresses, is where you provide information on the network configuration, such as IP addresses, network masks, network type, and WINS settings.

Figure 18-26 Setup Wizard: IP configuration, MAC addresses

Figure 18-27, Setup Wizard - Protocol configuration (Windows 2000), is used to change the Windows domain, Windows administrator, Windows password, WINS, and NFS settings.

Figure 18-27 Setup Wizard: Protocol configuration (Windows 2000)

Figure 18-28 on page 205 is where you verify the settings in the confirmation panel and proceed with the Setup Wizard by clicking Next.

Figure 18-28 Setup Wizard: Confirmation

Finally, the Setup Wizard finishes. You should get a message (Figure 18-29) stating that the configuration settings were submitted successfully. Please wait a couple of minutes while the Filer is updated with the new settings.

Figure 18-29 Setup Wizard: Status message
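With the wizard finished, you can confirm the new settings from the serial console or a Telnet session. The following is a small sketch; these commands are all used later in this book, and their output is omitted here:

  itso-n1> sysconfig -v      (hardware overview and Data ONTAP version)
  itso-n1> ifconfig -a       (network interface settings)
  itso-n1> exportfs          (current NFS exports)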


Chapter 19. Cluster Setup

Before you begin reading the cluster setup and installation chapter, you need to become familiar with the System Storage N series Filers. We recommend that you start with the single-node setup chapter if you have not already read it.

In this chapter we discuss how to set up clustered systems such as the IBM TotalStorage N3700 Model 2863-A20. This chapter covers:
Hardware setup, such as cabling and connecting to local area networks (LANs) and expansion shelves such as the System Storage EXP600 Expansion Unit
Initial setup of the cluster nodes and cluster configuration
Disk ownership in a clustered environment

Details on basic disk addressing, which is used by the Data ONTAP operating system, can be found in "System Storage N series disk handling basics" on page 182. We start our planning and installation procedure after the systems have been unpacked and mounted in the 19-inch rack. For additional information regarding system administration (creating volumes, shares, and so on), refer to Chapter 2, N series Filer Administration on page 11.

19.1 Planning the implementation
Regardless of whether you are installing a single-node or a clustered systems environment, planning and preparation are of particular importance. Planning and preparation save time during installation and configuration, and often prevent problems. Gathering information and planning should be done before the systems are installed. Remember, we do not cover the physical planning and mounting of the appliances and expansions into racks in this redbook. You can obtain more information on those subjects from the N3700 Hardware and Service Guide, GA .

Planning Worksheets
The planning worksheets are divided into several sections. Basically, a cluster consists of two nodes. Therefore, you need to plan for each specific node, and you also need to complete the additional disk ownership worksheet. Refer to "Disk ownership" on page 208 for detailed information on disk ownership.
Initial setup worksheet for a clustered configuration (N3700 A20)
Disk ownership worksheet
Hardware setup
Initial configuration

See Appendix A, Setup Worksheets and Cabling Diagrams on page 1 for the worksheets.

PLANNED: ADDITIONAL WORKSHEETS FOR AGGREGATES, VOLUMES, SHARES, QTREES

19.3 Disk ownership
A System Storage N series cluster consists of two nodes that are able to take over or fail over their resources or services to the associated counterpart node. This functionality assumes that all resources can be accessed by each node. That means both nodes must have access to all disks physically (cabling) and logically (cluster software). The N3700 Model A20 combines both cluster nodes in one shelf. You can obtain details on Data ONTAP disk addressing from "Drive addressing" on page 184.

Each disk has a preferred ownership and stays on the owner node until a takeover occurs. Ownership can be assigned to either node A or node B (Figure 19-1 on page 209).

Figure 19-1 Example disk ownership, Node A and Node B
[The figure shows shelves 1 and 2 with the per-bay ownership pattern for node A and node B.]

Table 19-1 shows a sample disk ownership.

Table 19-1 Disk ownership sample
[The table lists, for disk shelves 1, 2, and 3, which of bays 0-13 are owned by node A and which by node B. An X marks the node that owns the disk; a dash means no disk is in place. The disks in bays 0 and 1 of shelves 1 and 2 are SES disks.]

Note: Disk ownership can be changed after the initial setup, or after adding additional disks. Use the disk assign command to change ownership, and disk show to determine which node owns which disk. Disk assignment can also be checked after setup using the CLI command storage show disk, or in the Web interface under FilerView -> Storage -> Manage.

Figure 19-2 on page 210 shows the disks in a sample clustered environment.

Figure 19-2 FilerView: Manage Disks

Tip: Plan the disk assignment carefully. Load balancing or active/passive configurations may be reflected in the disk assignment procedure.
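As a sketch of what the disk ownership commands look like in practice (the disk address and node name here are examples only; see "Drive addressing" on page 184 for the addressing scheme):

  itso-n1a> disk show -n                   (list disks that are not yet owned)
  itso-n1a> disk assign 0b.17 -o itso-n1a  (assign one disk to node itso-n1a)
  itso-n1a> disk show -v                   (verify the resulting ownership)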

19.4 Hardware setup
Now we describe the installation and setup of the System Storage N3700 Model A20, the clustered Filer appliance, and the attachment of the System Storage EXP600 Expansion Unit.

Before you begin
Before starting the installation, we recommend that you read Chapter 18, Single-node setup on page 181, and complete the installation worksheets in Appendix A, Setup Worksheets and Cabling Diagrams on page 1.

Tip: Check the interoperability matrix for supported configurations.

Attention: The N3700 and EXP600 use sensitive electronic components. To avoid damage to them, wear an antistatic wrist strap and grounding leash for the installation.

Tools and equipment
The N3700 Model A20 (2863-A20) shipment package includes:
N3700 base unit (equipped with two CPU modules)
Power supplies
Power cords
Two console cables (RJ-45 to DB-9) for a serial console
Fibre Channel terminators
Publications
Setup kit
Software licenses

Before you install the hardware, make sure you have the appropriate tools and equipment assembled (customer-supplied items):
A flathead screwdriver, and #1 and #2 Phillips head screwdrivers
A pointed tool (for setting termination switches)
An ASCII terminal or console (for example, a laptop or PC with a serial port)
A null modem cable (to connect to the console serial port)
Ethernet local area network (LAN) cables for the file serving network
Fibre Channel cables for EXP600 or tape connection (cables to the EXP600 are included)

Note: The N3700 A20 provides two DB-9 to RJ-45 console adapters, one for each CPU module. You can use this connection to attach a console to the Filer. Remember to use a null modem cable.

Use the N3700 Hardware and Service Guide, GA , and the Installation and Setup Instructions for a System Storage N3700 or an EXP600 Expansion Unit, GA , for more information while mounting and installing the N3700 Filer.

The N3700 Model A20: Filerhead
This section describes the hardware installation of the clustered N3700 Model A20 appliance. First we explain the components of the system.

Components
The N3700 Model A20 Filer is shipped with two power supplies (PSU1 and PSU2), which are located on the back panel at the right- and leftmost sides of the N3700 appliance. Each power supply has its own AC power cord.

Note: We recommend that you use independent, separate power sources.

The CPU module for node B can be found in the middle, at the bottom of the back panel; the module for node A is in the middle, at the top of the shelf (Figure 19-3).

Figure 19-3 N3700 Model A20 back panel
[The figure shows CPU module node A, the shelf ID switch, the appliance and disk loop speed switch, CPU module node B, and the power supplies.]

Each of the CPU modules (Figure 19-4 on page 213) hosts several LEDs and ports. The ports are:
Two Ethernet ports (green)
Console port (purple)
Two Fibre Channel ports:
- Fibre Channel port 1 (optical, orange) is used for third-party devices (such as tape for backup)
- Fibre Channel port 2 (copper, blue) is used for connections to EXP600 expansion units

The shelf ID switch and the appliance and disk shelf loop speed switch can be found between the CPU modules. The illustration in Figure 19-4 on page 213 shows the CPU module ports. Next, we describe installing the N3700 cluster.

Figure 19-4 N3700 CPU module
[The figure shows the console port (purple), the Ethernet ports (green), and the Fibre Channel ports: FC 2 (blue, EXP600) and FC 1 (orange, third-party device).]

Connections and Cabling
After unpacking and mounting the N3700 and all EXP600s in a 19-inch rack, we first install the base unit (two CPU modules); then we describe the attachment of the EXP600 units.

Make sure the N3700 system and its EXP600 units are turned off.
Set the shelf ID of the N3700 base unit to 1 (Figure 19-5).

Attention: The power to the N3700 must be turned off before changing the shelf ID.

Figure 19-5 Setting the shelf ID

Set the 1Gb/2Gb switch to the 1 Gb position. You must set it to 1. (Figure 19-6 on page 214)

Figure 19-6 Setting the 1Gb/2Gb switch

Connect the Ethernet cable to the (green) Ethernet port in the middle of the CPU module of node B; repeat this for node A. See Figure 19-7.

Connect the console cables (DB-9 to RJ-45 converters) to the console ports (purple) of node B and node A on the back of the appliance. (See Figure 19-7)

If you are attaching a third-party device, such as a tape backup or a Fibre Channel switch, leave the Fibre Channel ports (orange ports) unterminated. Refer to "Connecting to third-party devices" in the N3700 Hardware and Service Guide, GA , for details.

If no third-party device will be attached to the Fibre Channel ports (orange ports), insert the Fibre Channel terminators or loopback terminators into the Fibre Channel ports on both CPU modules (node A and node B). (Figure 19-7)

Figure 19-7 Connecting Ethernet, console cables, and Fibre Channel terminators

If no EXP600 disk shelf will be attached, set the terminate switch on both CPU modules (node B and node A) to ON. See Figure 19-8 on page 215.

Figure 19-8 No EXP600 attached

If no EXP600 will be attached, proceed as follows (the system is still powered off):
Plug the power cords into the left and right power supplies.
Fasten the power cords with the hold-down clamps.
Plug the other ends of the power cords into a grounded AC power source.
Proceed with the initial setup: "Initial configuration" on page 218.

If you will connect one or more EXP600 disk shelves, set the terminate switch on both CPU modules (node B and node A) to OFF. See Figure 19-9.

Figure 19-9 EXP600 attached

Proceed if you are connecting additional EXP600 disk shelves. Otherwise, proceed with "Initial configuration" on page 218.

Note: The N3700 supports hot adding a disk shelf. Refer to the N3700 Hardware and Service Guide, GA , for details and restrictions.

Attention: The N3700 and the EXP600 shelves use sensitive electronic components. To avoid damaging them, put on an antistatic wrist strap and grounding leash for the installation.

If one or more EXP600s will be attached, proceed as follows (remember, during the initial hardware setup the N3700 system is still powered off); see Figure 19-10 on page 216. The steps are as follows:
1. Confirm that the shelf ID of the base unit is set to 1.
2. Connect the Fibre Channel port (blue) of node B (lower CPU module) to the first EXP600 disk shelf, ESH2 module B (In port).
3. Connect the Fibre Channel port (blue) of node A (upper CPU module) to the first EXP600 disk shelf, ESH2 module A (In port).
4. Attach the grounding cable (EXP600 to base shelf).

Figure 19-10 Connecting EXP600 shelves

5. Set the EXP600 shelf ID to 2 and set the disk shelf loop speed to 1 Gb.
6. If you attach only one EXP600, plug the power cords into the left and right power supplies, fasten the power cords with the hold-down clamps, and plug the other ends of the power cords into a grounded AC power source. Skip adding a second EXP600 and proceed with the initial configuration.
7. If you are adding additional EXP600 shelves, cable them as follows:
EXP600 (ID 2) ESH2 module B (Out port) to EXP600 (ID 3) ESH2 module B (In port)
EXP600 (ID 2) ESH2 module A (Out port) to EXP600 (ID 3) ESH2 module A (In port)
8. Attach the grounding cable (EXP600 #1 and #2). See Figure 19-11 on page 217.

Figure 19-11 Connecting additional EXP600 shelves
[The figure shows the base unit (ID 1) cabled to the first EXP600 (ID 2) and the second EXP600 (ID 3).]

9. Set the shelf ID of the second EXP600 to 3 and set the disk shelf loop speed to 1 Gb.
10. Repeat the previous steps if a third EXP600 will be attached.
11. Plug the power cords into the left and right power supplies, fasten the power cords with the hold-down clamps, and plug the other ends of the power cords into a grounded AC power source.

19.5 Initial configuration
Next we describe the initial configuration after booting the N series Filer for the first time. After turning on your system for the first time, run diagnostics to make sure that it is functioning properly and to diagnose any hardware problems.

N3700 Initial setup
Make sure the hardware setup of the N3700 system is done. Refer to "Hardware setup" on page 211 for details on the hardware setup (shelf IDs, speed settings, grounding, termination, and cabling).

Note: Repeat the initial configuration on both nodes (node A and node B).

The System Storage N series Filer is shipped with a complete version of the System Storage N series software. During the setup procedure, several files will be updated with the information provided during the configuration procedure. These files are:
/etc/rc
/etc/exports
/etc/hosts
/etc/hosts.equiv
/etc/dgateways
/etc/nsswitch.conf
/etc/resolv.conf

Setup method
The setup of the system can be done using various interfaces. You may use the serial connection with a console PC for the very first configuration steps.

Console PC
Use the null modem cable to connect the console adapter cable of the N3700 Filerhead to the serial port of the console PC or laptop. The console adapter cable should be plugged into the console port (purple) of the CPU module of the Filer (see Figure 19-4 on page 213).

Console settings:
Baud: 9600
Data bits: 8
Parity: None
Stop bits: 1
Flow control: None

Web browser and Telnet
If DHCP is enabled, a Telnet client or a Web browser can be used for the setup. To keep the installation simple, we skip this and continue with the console setup.

Power on N3700
Power on the N3700 system in the following order:
1. EXP600 (expansion disk shelves)
2. N3700 appliance (base unit)

Tip: The default spin-up time for all disks in the appliance is 60 seconds. Reduce this spin-up time to 20 seconds by turning on the switches of both power supplies within 5 seconds of each other.

Setup and configuration
After the system is connected properly and powered on for the first time, the initial setup starts with basic Filer settings. Enter the settings according to your planning worksheets. The initial configuration must be done on both nodes; proceed with the following steps on node A and node B. For example:

Please enter your new hostname [ ]: itso-n1

See the window captures in Figure 19-12 on page 220, Figure 19-13 on page 221, and Figure 19-14 on page 221 for more detail.

Note: During the setup, you will be asked to continue the configuration using the Web interface. If you select No, the setup continues using the command line interface. If you select Yes, you can use the Web interface through the IP address which was displayed before.

Time zone settings can be left at the default and configured after the initial configuration.

The initial setup starts automatically when the system is powered on for the first time. The basic configuration depends on which licenses were installed on your system in the factory. The following screen captures may vary from your environment depending on which licenses are installed, the Data ONTAP software level, and the hardware you are using. For example, when you did not obtain a CIFS license, the CIFS setup will not show up during the initial configuration.

Figure 19-12 Initial Setup: Host and network definitions

Figure 19-13 Initial Setup: WINS, CIFS, and user authentication settings

Figure 19-14 Initial Setup: Active Directory, Domain Controller, and Workgroup settings

After the initial setup procedure is done, we log on to the appliance:

login: root
password: <password_you_chose_during_setup>

Now we can start with disk management. First we check the disk ownership. Each cluster node should own one or two disks (the SES disks). Verify the disk information:

storage show disk (or: sysconfig -r)

Then we assign the disks to the N3700 controllers node A and node B. Make sure you have your completed disk ownership worksheet. Details on Data ONTAP disk addressing can be obtained from "Drive addressing" on page 184.

disk assign 0b.NN

After assigning the disks to the nodes, verify that all disks are assigned to the correct node with sysconfig -r. See Figure 19-15.

Figure 19-15 Command line output: sysconfig -r

Verify the license settings and add missing or additional licenses. All licenses you bought with the hardware should already be installed on the Filer. Use the following commands to show license information and add additional licenses to your system. Cluster licenses must be set on both nodes; check the license settings on both storage systems using the following commands:

license
license add wwxxyyzz

Reboot the system:

reboot
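As a sketch of this step on a two-node cluster (the node names and the license code are placeholders; the cluster license must be present on each node before the cluster can be enabled):

  itso-n1a> license               (list installed licenses on node A)
  itso-n1a> license add XXXXXXX   (add the cluster license if it is missing)
  itso-n1b> license               (repeat the check on node B)
  itso-n1a> reboot                (reboot each node after license changes)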

Now we can enable the cluster on one node. The command is cluster-aware, and you have to execute it on one node only. See Figure 19-16.

cf enable

Figure 19-16 Command line: cf enable

Verify whether the cluster is enabled and that the other node is up and running. See Figure 19-17.

cf status

Figure 19-17 Command line: cf status

The cf monitor command provides more information on the cluster. See Figure 19-18.

cf monitor

Figure 19-18 Command line: cf monitor

The takeover test can be issued through the takeover option:

cf takeover

If the takeover was not successful, run the Cluster Configuration Checker for NAS and iSCSI - N series at the following Web site and proceed as directed:

Attention: The counterpart of the node where you issued the cf takeover command shuts down. To re-enable the node, enter the cf giveback command.

Check the status of the appliance cluster again (Figure 19-19):

cf status

Figure 19-19 Cluster status after takeover

Then bring the cluster back to its original state and check the status again:

cf giveback
cf status

The cluster should report that the cluster is enabled and both nodes are up. See Figure 19-20.

Figure 19-20 Cluster status after giveback

Pre-creating a Filer domain account
If the N3700 Filer nodes have to join an Active Directory domain, refer to "Pre-creating a Filer domain account" in Chapter 18, Single-node setup.

Web logon to the Filer
Once the setup has successfully completed, you can log on to the N3700 system via your Web browser. This may be done after you choose Continue with Web Setup during the initial configuration. Repeat these steps for both cluster nodes.

Note: There may be restrictions regarding Web administration of the Filers with respect to browser and Java versions. Currently they are:
Netscape Navigator 4.51 or later
Alternatively, Microsoft Internet Explorer 4.0 or later
Java and JavaScript must be enabled.
Other browsers with Java/JavaScript may work too. In addition to Microsoft Internet Explorer 6.0, we used the Mozilla Firefox browser (Version ).

The Setup Wizard can be used for additional configuration tasks. The following step-by-step procedure describes this approach. First, log on to the Filer using the hostname or IP address, as described in Figure 19-12 on page 220. Use the user root and its password. Use the information from your worksheets.

Use the following URL:

http://<hostname or IP address>/na_admin

Figure 19-21 System Storage N series Web logon

Figure 19-22 System Storage N series Web interface

Select Wizards (bottom left of FilerView), then Setup Wizard.

Figure 19-23 Starting the Setup Wizard

Figure 19-24 on page 227 contains settings from the initial setup. Verify these settings and make changes when needed.

Tip: You might use the shortcut:

Figure 19-24 Setup Wizard: Basic settings

Figure 19-25 is used to provide an e-mail address, the location of the Filer (data center, branch office, and so on), and the administrative host name.

Figure 19-25 Setup Wizard: E-mail, Location, and Admin Host

The Filer Setup Wizard - Network Services panel enables DNS and NIS services and gateway settings (Figure 19-26).

Figure 19-26 Setup Wizard: DNS, NIS, Gateway settings

The Filer Setup Wizard - Network addresses panel (Figure 19-27 on page 229) is where you provide information on the network configuration, such as IP addresses, network masks, network type, and WINS settings.

Figure 19-27 Setup Wizard: IP configuration, MAC addresses

The following panel is used to change the Windows domain, Windows administrator, Windows password, WINS, and NFS settings (Figure 19-28).

Figure 19-28 Setup Wizard: Protocol configuration (Windows 2000)

Verify the settings in the confirmation panel and proceed with the Setup Wizard by clicking Next (Figure 19-29 on page 230).

Figure 19-29 Setup Wizard: Confirmation

Finally, the Setup Wizard finishes, and you should get a message stating that the configuration settings were submitted successfully. Please wait a couple of minutes while the Filer is updated with the new settings (Figure 19-30).

Figure 19-30 Setup Wizard: Status message



Chapter 20. Subsequent Setup

This chapter describes the subsequent configuration of the IBM TotalStorage N series Filers. It covers the N3700 Models A10 and A20 (and the N5200 and N5500 models).

This chapter covers the following topics:
Planning
Administration methods
Timezone settings
Date settings
Verification of the installation
Changing CIFS settings

20.1 Subsequent configuration
This chapter describes the subsequent configuration of the IBM TotalStorage N series Filers. It covers the N3700 Models A10 and A20.

Documenting a plan
Before starting the setup of the N3700, it is a useful practice to create a plan. Basically, you need to write down important data about the configuration. You can find helpful information in Appendix A, Setup Worksheets and Cabling Diagrams on page 1.

Attention: Check the interoperability matrix for supported configurations.

Administration Methods
Once you have successfully completed the basic setup, you can log on to the N3700 system for additional configuration steps. The System Storage N series provides various ways to administer and configure systems, depending on what the administrator prefers. You can choose from:
FilerView interface
DataFabric Manager software
Command line interface (CLI)
Mounting the root volume/mapping the administration share

FilerView interface
The FilerView interface is a very useful and handy interface for the administration of the N series systems. It uses a standard Web browser and provides easy-to-follow menus for managing the Filer.

Note: Currently the following prerequisites apply:
Netscape Navigator 4.51 or later
Alternatively, Microsoft Internet Explorer 4.0 or later
Java and JavaScript must be enabled.
Other browsers with Java/JavaScript may work too. In addition to Microsoft Internet Explorer 6.0, we used the Mozilla Firefox browser (Version ).

Use the correct URL in the browser address field:

http://<hostname or IP address>/na_admin

Enter the username and password and press OK to proceed. See Figure 20-1, N series Web administration access, on page 235 and Figure 20-2, FilerView administration window, on page 235.

Figure 20-1 N series Web administration access

Figure 20-2 FilerView administration window

DataFabric Manager software
DataFabric Manager software is an optional software feature which provides monitoring and management features for N series Filers. DataFabric Manager allows an organization to rapidly deploy, provision, and manage a complete enterprise storage and content delivery network. It delivers a central point of control for alerts, reports, and the configuration tool. DataFabric Manager capabilities are:

Discovery
- Filers
- Aggregates, volumes, qtrees, and LUNs

Monitoring
- Status and health
- Alerts via e-mail and pager
- Performance and capacity analysis
- Storage and content caching devices
- Aggregates, volumes, qtrees, LUNs, disks, CPUs, and network links

Configuration
- FilerView launch
- SnapMirror and SnapVault

See Figure 20-3 for Web browser access to DataFabric Manager.

Figure 20-3 DataFabric Manager

Note: DataFabric Manager runs on an external server and needs an additional license.

Command line interface (CLI)
The CLI provides a fast and easy-to-use interface to the N series Filers in several ways:

Telnet client (Figure 20-4, Telnet connection, on page 237)
Use the IP address or the hostname to connect via Telnet to the N series Filer.

Figure 20-4 Telnet connection

Secure shell (ssh) or remote shell (rsh)
You may also use an rsh or ssh connection to the Filer. We used the PuTTY tool in our environment. See Figure 20-5.

Figure 20-5 SSH connection

Serial console
The serial console is used during the initial setup of the Filers and may be used later for additional setup.

FilerView command line feature (Figure 20-6, FilerView Use Command Line option, on page 238)
FilerView provides a command line interface which can be accessed by clicking Filer in the FilerView navigation field (left part of the FilerView window) and then choosing Use Command Line.

Figure 20-6 FilerView Use Command Line option

The help command, or simply ?, provides an overview of the CLI commands. help <command> provides a brief description of what the <command> does. <command> help lists the available options of the specified command. See Figure 20-7.

Figure 20-7 The help and ? commands

The manual pages can be accessed with the man command. Figure 20-8, The man command, on page 239 provides a detailed description of a command and lists its options (man <command>).
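Putting the three help forms together, using vol as an example command (the command name is our choice, and the output is omitted here):

  itso-n1> help vol      (one-line description of the vol command)
  itso-n1> vol help      (lists the options of the vol command)
  itso-n1> man vol       (full manual page for the vol command)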

Figure 20-8 The man command

Mounting the root volume/mapping the administration share
Mounting the root volume (/vol/vol0) or mapping the administration share (C$) enables you to edit files in the root directory and in subdirectories. Administration is limited to files, as you cannot enter commands with this administration method. Remember that in our environment we created a user account named Administrator during the initial setup (Initial setup -> WORKGROUP setup -> local Administrator account -> Administrator). This user and password may differ from the root user. Figure 20-9, Mapping the administration share, on page 240 and Figure 20-10, Edit files via mapped administration share, on page 240 show how we mapped the Filer itso-n1 (\\itso-n1\C$, drive letter D) on a Windows client for administration purposes.

NFS: /vol/vol0 is exported to the administration host for root access; /vol/vol0/home is exported to the administration host for root access and to all clients for general access.

CIFS: /vol/vol0 is shared as C$; only the system administrator with the root password has read and write access to the C$ share. The /vol/vol0/home directory is shared as HOME, without access granted to anyone.
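The mapping shown in Figures 20-9 and 20-10 corresponds to commands like the following sketch. The hostname, mount point, drive letter, and account are our examples; the NFS mount assumes the export rules described above:

On a UNIX/Linux administration host:
  mount itso-n1:/vol/vol0 /mnt/itso-n1

On a Windows administration host:
  net use D: \\itso-n1\C$ /user:Administrator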

Figure 20-9 Mapping the administration share

Figure 20-10 Edit files via mapped administration share

Permissions are set by Data ONTAP on the default directories (/etc and /home) to prevent unauthorized access to the Filer. UNIX root access via NFS mounts to the /etc directory gains -rwx permissions; all other users get no permissions. The Administrator user (CIFS mapping) has read and write access to all files in the /etc directory. By default, all other CIFS users obtain no permissions.

Attention: Be careful when editing files, and consider making backups of the files you are going to edit. Do not delete any files in the /etc directory unless instructed by support personnel. Some files should not be changed at all. A reboot of the N series Filer may be required after changing some files.

Figure 20-11 shows a sample /etc/rc file. We mapped the C$ share to a Windows workstation and opened the rc file.

#Regenerated by registry Thu Sep 08 23:12:57 CEST 2005
#Auto-generated by setup Wed Sep 7 09:35:52 PDT 2005
hostname itsosj-n1
ifconfig ns0 `hostname`-ns0 netmask ...
route add default ...
routed on
options dns.enable off
options nis.enable off
timezone Europe/Berlin

Figure 20-11 Sample /etc/rc file

The N series Filer carries some files that should be treated with higher attention. Some files require a carriage return after the last entry, such as:
/etc/passwd
/etc/group
/etc/netgroup
/etc/shadow

Some files should not be changed or edited at all:
cifsconfig.cfg
cifssec.cfg
lclgroups.cfg
filesid.cfg
sysconfigtab
registry.*

Timezone settings
Timezone settings can be changed after the initial setup with the timezone command (see Figure 20-12). Each timezone is described by a file that is kept in the /etc/zoneinfo directory on the Filer. To access the /etc/zoneinfo directory, mount the Filer as described in "Mounting the root volume/mapping the administration share" on page 239.

itsosj-n1> timezone Europe/Berlin
itsosj-n1> timezone
Current time zone is Europe/Berlin
itsosj-n1>

Figure 20-12 Changing the timezone using the command line interface

Alternatively, you can use FilerView: click Filer -> Set Timezone. Your specific timezone may not be listed, so you may have to click Show All Time Zones. Choose the appropriate timezone and click Apply. (See Figure 20-13)

Figure 20-13 Changing the timezone using FilerView

Date settings
Set the day and the correct time with date on the command line, or use FilerView. The date command uses the following syntax:

date [-u] [[[[[cc]yy]mm]dd]hhmm[.ss]]

cc: The first two digits of the year (for example, 20 for 2005)
yy: The last two digits of the year
mm: Month (numeric format, 1 to 12)
dd: Day (numeric format, 1 to 31)
hh: Hour (00 to 23)
mm: Minute (00 to 59)
ss: Seconds (00 to 59)

Or use FilerView: click Filer -> Set Date/Time and enter the appropriate information. See Figure 20-14, Set date and time with FilerView, on page 243 for a screen shot of setting the date and time with FilerView.
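As a worked example of the syntax above (the prompt is from our test system; the displayed output is omitted):

  itsosj-n1> date 200512061459     (sets the date to December 6, 2005, 14:59)
  itsosj-n1> date                  (displays the new date and time)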

Figure 20-14 Set date and time with FilerView

Verification of the installation
This section shows how to verify the setup and installation. It describes procedures to test the network connectivity of the Filer.

Verify network connections
The ping <destination> command can be used to test whether the Filer is reachable from a client via the IP network. Open a Windows cmd box or a UNIX terminal on your client. See Figure 20-15, Ping command from a Windows system: responds, on page 244 for a Windows example, and Figure 20-16, Ping command from a Linux system: no response, on page 244 for a UNIX example. The option -c (Linux) or -n (Windows) specifies how many requests should be sent before giving up on the command.

If you do not know the IP address, you can use a serial console terminal and issue the ifconfig command (Figure 20-17, Ifconfig command, on page 245).

C:\>ping itso-n1

Pinging itso-n1 [...] with 32 bytes of data:
Reply from ...: bytes=32 time<10ms TTL=254
Reply from ...: bytes=32 time<10ms TTL=254
Reply from ...: bytes=32 time<10ms TTL=254
Reply from ...: bytes=32 time<10ms TTL=254

Ping statistics for ...:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\>

Figure 20-15 Ping command from a Windows system: responds

ping -c 5 ...
PING ... (...) 56(84) bytes of data.
From ... icmp_seq=0 Destination Host Unreachable
From ... icmp_seq=1 Destination Host Unreachable
From ... icmp_seq=2 Destination Host Unreachable
From ... icmp_seq=3 Destination Host Unreachable
From ... icmp_seq=4 Destination Host Unreachable

--- ... ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4020ms, pipe 4
[root@itso3775 root]#

Figure 20-16 Ping command from a Linux system: no response

If the destination responds, a basic network connection can be established. Try the ping command with the IP address as well as the hostname to verify correct DNS resolution. If the Filer does not respond, check the network connection for the following:

Correct cabling: Are the network cables connected?
Media type: Is the media type set up correctly (speed, interface)?
Routing: Check the routing between different networks.
Host name resolution: Verify the name resolution if pings to IP addresses respond but pings to hostnames do not. If you are using Active Directory domains, you must also create a host address (A) record on the DNS server for the storage system's fully qualified domain name. Name resolution may also be done with /etc/hosts files: add an entry in each host's /etc/hosts file for each of the storage system's interfaces.
Firewalls: Firewalls may also prevent access to the Filer (ping, addresses, and so on).
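For example, the client-side /etc/hosts entries might look like the following sketch. The addresses are placeholders (not the addresses of our test setup), and the interface names follow the naming used on our system:

  # /etc/hosts entries on a client (example addresses)
  192.168.1.10   itso-n1        # Filer interface ns0
  192.168.1.11   itso-n1-ns1    # Filer interface ns1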

For more information about network configuration, see the System Storage N series Network Management Guide, GA /00.

The ifconfig command can be used to display the network adapter settings of the N series systems. Use the -a option to display the settings for all adapters/ports. See Figure 20-17.

itsosj-n1(takeover)> ifconfig -a
ns0: flags=848043<up,broadcast,running,multicast> mtu 1500
        inet ... netmask 0xffffffc0 broadcast ...
        ether 00:50:56:09:d4:67 (Linux AF_PACKET socket)
ns1: flags=808042<broadcast,running,multicast> mtu 1500
        ether 00:50:56:0a:d4:67 (Linux AF_PACKET socket)
lo: flags=19e8049<up,loopback,running,multicast,multihost,partner_up,tcpcksum> mtu 4064
        inet ... netmask 0xff... broadcast ...
        ether 00:00:00:00:00:00 (Shared memory)
itsosj-n1(takeover)>

Figure 20-17 Ifconfig command

Verify storage configuration
After setup is complete, at least the following entries should exist on the storage system:
/vol/vol0 (a virtual root path)
/vol/vol0/home (a directory)

The entries /vol/vol0 and /vol/vol0/home were created during installation and configuration. You can verify this by issuing the exportfs command at the Filer command line. See Figure 20-18 for details.

itsosj-n1> exportfs
/vol/vol0 -sec=sys,rw,anon=0,nosuid
/vol/vol0/home -sec=sys,rw,nosuid
/vol/dsp -sec=sys,rw,nosuid
itsosj-n1>

Figure 20-18 Exportfs command

Note: /vol is not a directory; it is a special virtual root path under which the storage system mounts all volumes. Therefore /vol cannot be mounted in order to view all the volumes on the storage system. You must mount each storage system volume separately.

Cluster verification
This section describes further cluster verification. License information and cluster status have already been checked on both nodes. Certain configuration files in the root volume should match on both nodes. Ensure that these files are identical on both nodes:
/etc/resolv.conf
/etc/httpd.mimetypes
/etc/dgateways
/etc/nsswitch.conf

"Mounting the root volume/mapping the administration share" on page 239 helps to compare the files on node 1 and node 2.

Some parameters on both cluster nodes must match. Verify matching parameters on both cluster nodes:
ARP table
date
nfs
route table
routed
timezone

You can use Appendix 0.3.3, Matching parameters in clustered environments, on page 7 to note down and compare the settings. See Figure 20-19 for details.

itsosj-n1> arp -a
? (...) at (incomplete)
itsosj-n1 (...) at 0:50:56:9:d4:67 permanent
? (...) at ff:ff:ff:ff:ff:ff permanent
itsosj-n1> date
Wed Sep  7 16:20:13 PDT 2005
itsosj-n1> nfs status
NFS server is running.
itsosj-n1> route -s
Routing tables
Internet:
Destination   Gateway             Flags  Refs  Use  Interface
default       ...                 UGS    ...   ...  ns0
.../26        link#1              UC     0     0    ns0
...           link#1              UHL    1     0    ns0
itsosj-n1     0:50:56:9:d4:67     UHL    0     4    lo
...           ff:ff:ff:ff:ff:ff   UHL    0     54   ns0
127           localhost           UGS    0     35   lo
localhost     localhost           UH     2     32   lo
itsosj-n1> routed status
RIP snooping is on
Gateway       Metric  State  Time Last Heard
...           ...     ALIVE  Wed Sep  7 14:07:08 PDT
... free gateway entries, 1 used
itsosj-n1> timezone
Current time zone is US/Pacific
itsosj-n1>

Figure 20-19 Matching parameters in clustered environments

Changing CIFS settings
CIFS settings can be changed after the initial setup. The cifs setup procedure provides an easy way to customize previously made settings. First stop CIFS on the Filer with the cifs terminate command, then run cifs setup to change the CIFS settings. In a clustered environment, both nodes should be customized with the same settings. Instead of using the command line (Figure 20-20, Cifs setup, on page 247), you may use the FilerView CIFS Setup Wizard.

Changing CIFS settings

CIFS settings can be changed after the initial setup. The cifs setup procedure provides an easy way to customize previously chosen settings. First stop CIFS on the Filer with the cifs terminate command, then run cifs setup to change the CIFS settings. In a clustered environment both nodes should be customized with the same settings. Instead of using the command line, you may use the FilerView CIFS Setup Wizard.

itsosj-n1> cifs terminate
CIFS local server is shutting down...
CIFS local server has shut down...
itsosj-n1> cifs setup
This process will enable CIFS access to the filer from a Windows(R) system.
Use "?" for help at any prompt and Ctrl-C to exit without committing changes.
This filer is currently a member of the Windows-style workgroup 'WORKGROUP'.
Do you want to continue and change the current filer account information? [n]: y
Your filer does not have WINS configured and is visible only to clients on the same subnet.
Do you want to make the system visible via WINS? [n]:
This filer is currently configured as a multiprotocol filer.
Would you like to reconfigure this filer to be an NTFS-only filer? [n]:
The default name for this CIFS server is 'ITSOSJ-N1'.
Would you like to change this name? [n]:
Data ONTAP CIFS services support four styles of user authentication.
Choose the one from the list below that best suits your situation.
(1) Active Directory domain authentication (Active Directory domains only)
(2) Windows NT 4 domain authentication (Windows NT or Active Directory domains)
(3) Windows Workgroup authentication using the filer's local user accounts
(4) /etc/passwd and/or NIS/LDAP authentication
Selection (1-4)? [1]:
What is the name of the Active Directory domain? [testdom01.almaden.ibm.com]:
In order to create an Active Directory machine account for the filer, you must supply the name and password of a Windows account with sufficient privileges to add computers to the testdom01.almaden.ibm.com domain.
Enter the name of the Windows user [[email protected]]:
Password for [email protected]:
CIFS - Logged in as [email protected].
The user that you specified has permission to create the filer's machine account in several (2) containers. Please choose where you would like this account to be created.
(1) CN=computers
(2) OU=Domain Controllers
(3) None of the above
Selection (1-3)? [1]:
CIFS - Starting SMB protocol...
Currently the user "ITSOSJ-N1\administrator" and members of the group "testdom01\domain Admins" have permission to administer CIFS on this filer. You may specify an additional user or group to be added to the filer's "BUILTIN\Administrators" group, thus giving them administrative privileges as well.
Would you like to specify a user or group that can administer CIFS? [n]:
Welcome to the testdom01.almaden.ibm.com (testdom01) Active Directory(R) domain.
CIFS local server is running.

Figure: Cifs setup
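Once cifs setup has finished, the result can be confirmed from the Filer command line. This is a hypothetical illustration using standard Data ONTAP commands (the cifs sessions command is also described in the administration chapter); no output from our environment is reproduced here:

itsosj-n1> cifs sessions      # shows the server name, domain membership, and open sessions
itsosj-n1> cifs shares        # lists the shares the Filer offers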


Part 6

Client Attachment


Chapter 1. Client attachment

This chapter describes how to mount N series volumes to clients. It covers the following topics:
Mounting N series volumes to AIX clients
Mounting N series volumes to LINUX clients (TBD)
Mounting N series volumes to Solaris clients (TBD)
Mounting N series volumes to XXX clients (TBD)

1.1 AIX Clients

For our AIX tests, we created a volume named AIX_vol01 on our N series Filer with the settings shown in Figure 1-1.

Figure 1-1 AIX volume

Check IP connection between client and N series filer

To check the IP connection between the AIX client and the N series filer, we used the ping command. Depending on the IP configuration, DNS, or /etc/hosts settings, the Filer responds to the commands ping ip_address and ping hostname (Figure 1-2 and Figure 1-3 "ping hostname" on page 3). The -c option specifies the number of echo requests to send; we chose 5 in this example.

# ping -c 5
PING : ( ): 56 data bytes
64 bytes from : icmp_seq=0 ttl=254 time=0 ms
64 bytes from : icmp_seq=1 ttl=254 time=0 ms
64 bytes from : icmp_seq=2 ttl=254 time=0 ms
64 bytes from : icmp_seq=3 ttl=254 time=0 ms
64 bytes from : icmp_seq=4 ttl=254 time=0 ms
----PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
#

Figure 1-2 ping ip_address

# ping -c 5 itsosj_n1
PING itsosj_n1: ( ): 56 data bytes
64 bytes from : icmp_seq=0 ttl=254 time=0 ms
64 bytes from : icmp_seq=1 ttl=254 time=0 ms
64 bytes from : icmp_seq=2 ttl=254 time=0 ms
64 bytes from : icmp_seq=3 ttl=254 time=0 ms
64 bytes from : icmp_seq=4 ttl=254 time=0 ms
----itsosj_n1 PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
#

Figure 1-3 ping hostname

If the hostname cannot be resolved but you get responses from the ping ip_address command, check the DNS configuration or the /etc/hosts entries. Use more /etc/hosts or pg /etc/hosts to view the host entries if your client uses an /etc/hosts file. See Figure 1-4 for an example section of an /etc/hosts file.

...
# The format of this file is:
# Internet Address    Hostname    # Comments
# Items are separated by any number of blanks and/or tabs. A '#'
# indicates the beginning of a comment; characters up to the end of the
# line are not interpreted by routines which search this file. Blank
# lines are allowed.
#
# Internet Address    Hostname            # Comments
#                     net0sample          # ethernet name/address
#                     token0sample        # token ring name/address
#                     x25sample           # x.25 name/address
                      loopback localhost  # loopback (lo0) name/address
                      crete
                      itsosj_n1
...
#

Figure 1-4 Section of our /etc/hosts file

Listing available NFS shares

The showmount -e ip_address command lists all shares on the Filer that can be accessed. You may specify the ip_address or hostname of the filer. In Figure 1-5 "The showmount command" on page 3 our test volume AIX_vol01 appears.

# showmount -e itsosj_n1
export list for itsosj_n1:
/vol/aix_vol01 (everyone)
/vol/vol0/home (everyone)
/vol/dsp (everyone)
/vol/vol0 (everyone)
#

Figure 1-5 The showmount command

showmount -a filer_name lists all file systems from the Filer filer_name that are mounted on the client. Figure 1-6 shows the -a option in an environment where no remote mount was in place.

# showmount -a itsosj_n1
#

Figure 1-6 Showmount -a lists all mounted volumes for a specific Filer

Mounting a Volume

Before an N series volume can be mounted, make sure a mount point is available on the client. For this example we created a mount point named /N_series in the root directory of our AIX client. See Figure 1-7. The ls command shows details about the previously created mount point.

# mkdir /N_series
# ls -ld /N_series
drwxr-xr-x 2 root system 512 Sep 14 15:58 /N_series
#

Figure 1-7 Create mount point and list directory info

You will need the following information to mount an NFS share from your Filer:
Mount point (we used /N_series)
Filer name or IP address, specified by the -n option (itsosj_n1 in our environment)
NFS share name with path on the Filer (we created a volume: /vol/AIX_vol01)
Type of mount: NFS, specified by the -v option

To mount a volume, either use the mount command on the command line or use Smitty. We explain both ways to create an NFS mount.

Command line mount

First we show how to mount an NFS share on the command line. The mount command mounts the remote file system to our client (Figure 1-8).

mount -n'itsosj_n1' -v'nfs' /vol/aix_vol01 /N_series

Figure 1-8 Mount remote file system

Verify that the command finished successfully on your client with the mount command. See Figure 1-9 "Verify mount on the client" on page 5.

# mount
node       mounted         mounted over  vfs     date          options
           /dev/hd4        /             jfs     Jul 28 14:49  rw,log=/dev/hd8
           /dev/hd2        /usr          jfs     Jul 28 14:49  rw,log=/dev/hd8
           /dev/hd9var     /var          jfs     Jul 28 14:49  rw,log=/dev/hd8
           /dev/hd3        /tmp          jfs     Jul 28 14:49  rw,log=/dev/hd8
           /dev/hd1        /home         jfs     Jul 28 14:50  rw,log=/dev/hd8
           /proc           /proc         procfs  Jul 28 14:50  rw
           /dev/hd10opt    /opt          jfs     Jul 28 14:50  rw,log=/dev/hd8
itsosj_n1  /vol/aix_vol01  /N_series     nfs3    Sep 14 16:05
#

Figure 1-9 Verify mount on the client

You may list all remote mounts from a particular Filer by specifying showmount with the -a option. See Figure 1-10.

# showmount -a itsosj_n1
:/vol/AIX_vol01
#

Figure 1-10 Show remote mounts of a particular Filer

Smitty mount

This section shows how we mounted a remote file system from Filer itsosj_n1 to our AIX client named crete. We used the same volume (/vol/AIX_vol01) on our Filer as in the command line example. We assume the mount point (we used /N_series) is already created. Open an AIX terminal, enter smitty mountfs, proceed as described in Figure 1-11 "Smitty mountfs" on page 6, and press Enter.

Figure 1-11 Smitty mountfs

After the remote file system was successfully mounted, smitty shows a message about command completion (Figure 1-12).

Figure 1-12 Message: File system successfully mounted

Verify all mounted file systems by issuing smitty mount in an AIX terminal. See starting smitty with the fast path mount in Figure 1-13 "smitty mount" on page 7.

# smitty mount

Figure 1-13 smitty mount

Highlight List All Mounted File Systems in the menu by moving the cursor to it (see Figure 1-14) and press Enter.

Figure 1-14 Smitty mount

Smitty provides details about all mounted file systems (Figure 1-15 "List all mounted volumes via smitty" on page 8).

Figure 1-15 List all mounted volumes via smitty
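The mounts shown in this section do not survive a reboot of the AIX client. As a sketch of how to make the NFS mount persistent, a stanza like the following can be added to /etc/filesystems; the volume, host, and mount point mirror our example, while the options line shows typical values that are an assumption, not taken from the original text:

/N_series:
        dev      = /vol/AIX_vol01
        vfs      = nfs
        nodename = itsosj_n1
        mount    = true
        options  = bg,hard,intr,rw
        account  = false

With the stanza in place, mount /N_series is sufficient on the command line, and mount = true causes the file system to be mounted automatically at boot.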

Part 1

N series Filer administration

In this part we describe the administration and additional configuration of the System Storage N series, covering both the single-node Model A10 and the clustered Model A20.


Chapter 2. N series Filer Administration

This chapter describes common basic administration tasks and the subsequent configuration of the N series Filers. For more detailed information on administering the N series Filers, refer to the Administration Guide (REFERENCE). It covers the N3700 Models A10 and A20 (and the Models N5200 and N5500).

This chapter covers the following topics:
Administration Methods
Starting, stopping and reboot of the N series system
Checking the Data ONTAP software version
Updating Data ONTAP software
Storage Management
  Disks
  Aggregates
  N series Volumes
  Qtrees (TBD)
  Shares (TBD)
Cluster Management
Managing snapshots

2.1 Administration Methods

The administration methods are described in "Administration Methods" on page 234. Basically, these are:
The FilerView interface
DataFabric Manager software
The command line interface (CLI)
Mounting the root volume/mapping the administration share

We explain the administration steps on the command line as well as in the N series FilerView Web interface.

Web interface

To access the Filer via FilerView, open your Browser and point to the URL http://<filer_name or ip-address>/na_admin. Specify a user and password to proceed.

Command line interface

The command line interface (CLI) can be accessed via telnet or a secure shell interface. The help command, or simply ?, provides an overview of the CLI commands (see REFERENCE). help <command> provides a brief description of what <command> does; <command> help lists the available options of the specified command. See Figure 2-1.

Figure 2-1 The help and ? commands

The manual pages can be accessed with the man command. man <command> provides a detailed description of a command and lists its options, as shown in Figure 2-2.

Figure 2-2 The man command
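Because Figures 2-1 and 2-2 were captured as screen images, the following hypothetical lines illustrate the three forms described above, using the df command as an arbitrary example:

itsosj-n1> help df      # brief description of what df does
itsosj-n1> df help      # lists the options the df command accepts
itsosj-n1> man df       # detailed manual page for df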

2.2 Starting, stopping and reboot of the N series system

This section describes the shutdown, boot, and halt procedures.

Important: A reboot or halt should always be a planned procedure. Users should be informed about these tasks. Give users enough time to save their changes to prevent data loss.

Starting the N series system

The N series boot code resides on a Compact Flash card. After you turn on the system, the N series boots automatically from this card. You may enter an alternative boot mode by pressing Ctrl-C and choosing a boot option. Figure 2-3 "Boot screen: Press CTRL-C for special boot menu" on page 14 shows the boot procedure.

Attention: Power on the N3700 system in the following order:
1. EXP600 (expansion disk shelves)
2. Appliance (base unit)

CFE version based on Broadcom CFE:
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Portions Copyright (C) 2002,2003 Network Appliance Corporation.
CPU type 0x : 650MHz
Total memory: 0x bytes (1024MB)
Starting AUTOBOOT press any key to abort...
Loading: 0xffffffff / xffffffff / Entry at 0xffffffff
Starting program at 0xffffffff
Press CTRL-C for special boot menu

Figure 2-3 Boot screen: Press CTRL-C for special boot menu

See Figure 2-4 to choose from the boot options.

1) Normal Boot
2) Boot without /etc/rc
3) Change Password
4) Initialize all disks
4a) Same as option 4 but create a flexible root volume
5) Maintenance boot
Selection (1-5)?

Figure 2-4 Boot menu

The normal case is to boot in Normal Boot mode.

Stopping the N series system

Stopping and rebooting the N series system prevents all users from accessing the filer. Before stopping or rebooting the system, make sure maintenance is possible and that all users (file access, database users, and others) are informed about the upcoming action.

Important: To shut down the N series systems gracefully, use the halt command or FilerView. This prevents unpredictable problems. Remember to shut down both nodes if an N series N3700 Model A20 has to be shut down.

Cifs services

The cifs sessions command reports open sessions to the N3700 Filer. See Figure 2-5 "List open CIFS sessions" on page 15.

itsosj-n1> cifs sessions
Server Registers as 'ITSO-N1' in workgroup 'WORKGROUP'
Root volume language is not set. Use vol lang.
WINS Server:
Using Local Users authentication
====================================================
PC IP(PC Name) (user) #shares #files
( ) (ITSO-N1\administrator - root) (using security signatures)
( ) (ITSO-N1\administrator - root) (using security signatures) 3 0
itsosj-n1>

Figure 2-5 List open CIFS sessions

With the N series Filers, you can specify which users receive CIFS shutdown messages. When you issue the cifs terminate command, Data ONTAP by default sends a message to all open client connections. This setting can be changed with the options cifs.shutdown_msg_level command:
0 - Never send CIFS shutdown messages
1 - Send CIFS messages only to clients that are connected and have open files
2 - Send CIFS messages to all open connections (default)

The cifs terminate command shuts down CIFS, ends CIFS service for a volume, or logs off a single station. See Figure 2-6. The -t option can be used to specify a delay interval in minutes before CIFS stops. You can even pick single workstations for which the CIFS service should stop; see Figure 2-7 "Cifs terminate command for a single workstation" on page 16 for details. When you shut down a Filer, there is no need to issue the cifs terminate command, as the operating system runs it automatically.

itsosj-n1> cifs terminate -t 3
Total number of connected CIFS users: 1
Total number of open CIFS files: 0
Warning: Terminating CIFS service while files are open may cause data loss!!
3 minutes left until termination (^C to abort)...
2 minutes left until termination (^C to abort)...
1 minute left until termination (^C to abort)...
CIFS local server is shutting down...
CIFS local server has shut down...
itsosj-n1>

Figure 2-6 Cifs terminate -t
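As a small sketch based on the levels listed above (the level chosen here is only an example), the message level is queried and changed like any other Data ONTAP option:

itsosj-n1> options cifs.shutdown_msg_level        # query the current level
itsosj-n1> options cifs.shutdown_msg_level 1      # notify only clients with open files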

itsosj-n1> cifs terminate -t 3 workstation_01
3 minutes left until termination (^C to abort)...
2 minutes left until termination (^C to abort)...
1 minute left until termination (^C to abort)...
itsosj-n1> Thu Sep 8 09:41:43 PDT [itsosj-n1: cifs.terminationnotice:warning]: CIFS: shut down completed: disconnected workstation workstation_01.
itsosj-n1>

Figure 2-7 Cifs terminate command for a single workstation

Note: Workstations running Windows 95/98 or Windows for Workgroups won't see the notification unless they are running WinPopup.

Depending on the CIFS message settings, pop-ups or similar messages as described in Figure 2-8 should appear on the affected workstations.

Figure 2-8 Shutdown messages on cifs clients

To restart CIFS, issue the cifs restart command as shown in Figure 2-9. The Filer startup procedure starts the CIFS services automatically.

itsosj-n1> cifs restart
CIFS local server is running.
itsosj-n1>

Figure 2-9 Cifs restart

You can check whether CIFS is running with the cifs sessions command. If CIFS is not running, a message as shown in Figure 2-10 "Check, if cifs is running on the Filer" on page 17 appears. If CIFS is running, the command returns all client sessions as shown in Figure 2-5.

itsosj-n1> cifs sessions
CIFS not running.
Use "cifs restart" to restart
Use "cifs prefdc" to set preferred DCs
Use "cifs testdc" to test WINS and DCs
Use "cifs setup" to configure
itsosj-n1>

Figure 2-10 Check, if cifs is running on the Filer

Halting the Filer

You can use the command line or the FilerView interface to stop the filer. On the CLI, use the halt command to perform a graceful shutdown; the -t option halts the system after the number of minutes you specify (for example: halt -t 5). Halt stops all services and shuts down the system gracefully to the Common Firmware Environment (CFE) prompt. File system changes are written to disk and the non-volatile random access memory (NVRAM) content is flushed. We used the serial console because the IP connection is lost after halting the Filer. See Figure 2-11 for details.

CFE version based on Broadcom CFE:
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Portions Copyright (C) 2002,2003 Network Appliance Corporation.
CPU type 0x : 650MHz
Total memory: 0x bytes (1024MB)
CFE>

Figure 2-11 Halt with command line interface (serial console)

Booting the Filer

As described in "Starting the N series system" on page 13, the N series appliances automatically boot Data ONTAP from a PC Compact Flash card, which ships with the most current Data ONTAP release. The Compact Flash card contains sufficient space for an upgrade kernel. The download command is used to copy a boot kernel to the Compact Flash card. The Common Firmware Environment (CFE) prompt provides several boot options:

boot_ontap - Boots the current version of Data ONTAP from the Compact Flash card.
boot_primary - Boots the current version of Data ONTAP from the Compact Flash card as the primary kernel (same kernel as boot_ontap).
boot_backup - Boots the backup version of Data ONTAP from the Compact Flash card. The backup release is created during the first software upgrade to preserve the kernel that shipped with the system. It provides a known good release from which you can boot the system if it fails to automatically boot the primary image.
netboot - Boots from a Data ONTAP version stored on a remote HTTP or TFTP server. Netboot enables you to:

Boot an alternative kernel if the Compact Flash card becomes damaged
Upgrade the boot kernel for several devices from a single server

To enable netboot, you must configure networking for the N series appliance (using DHCP or a static IP address) and place the boot image on a configured server.

Tip: We recommend storing a boot image on an HTTP or TFTP server to protect against Compact Flash card corruption. For more information about setting up netboot, go to: enter "Netboot Process for N3700 Storage System" and click the Search button. Then select "Netboot Process for the N3700 Storage System" and proceed as described in the description.

Usually you boot the Filer, after you have issued the halt command, with the boot_ontap or bye command. These commands end the CFE prompt and restart the Filer. See Figure 2-12 "Startup the filer at the CFE prompt" on page 19.

CFE> bye
CFE version based on Broadcom CFE:
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Portions Copyright (C) 2002,2003 Network Appliance Corporation.
CPU type 0x : 650MHz
Total memory: 0x bytes (1024MB)
Starting AUTOBOOT press any key to abort...
Loading: 0xffffffff / xffffffff / Entry at 0xffffffff
Starting program at 0xffffffff
Press CTRL-C for special boot menu
interconnect based upon M-VIA ERing Support
Copyright (c) Berkeley Lab
Wed Aug 31 19:00:46 GMT [cf.nm.nictransitionup:info]: Interconnect link 0 is UP
Wed Aug 31 19:00:46 GMT [cf.nm.nictransitiondown:warning]: Interconnect link 0 is DOWN
Data ONTAP Release 7.1H1: Mon Aug 15 16:02:45 PDT 2005 (IBM)
Copyright (c) Network Appliance, Inc.
Starting boot on Wed Aug 31 19:00:45 GMT 2005
Wed Aug 31 19:00:51 GMT [diskown.isenabled:info]: software ownership has been enabled for this system
Wed Aug 31 19:00:56 GMT [raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Wed Aug 31 19:00:56 GMT [raid.stripe.replay.summary:info]: Replayed 0 stripes.
Wed Aug 31 19:00:57 GMT [localhost: cf.fm.launch:info]: Launching cluster monitor
Wed Aug 31 19:00:57 GMT [localhost: cf.fm.notkoverclusterdisable:warning]: Cluster monitor: cluster takeover disabled (restart)
add net : gateway
DBG: Failed to get partner serial number from VTIC
DBG: Set filer.serialnum to:
Wed Aug 31 19:00:58 GMT [rc:notice]: The system was down for 71 seconds
Wed Aug 31 12:01:00 PDT [itsosj-n1: dfu.firmwareuptodate:info]: Firmware is up-to-date on all disk drives
Wed Aug 31 12:01:00 PDT [ltm_services:info]: Ethernet e0a: Link up
add net default: gateway : network unreachable
Wed Aug 31 12:01:02 PDT [rc:alert]: timed: time daemon started
Wed Aug 31 12:01:03 PDT [itsosj-n1: mgr.boot.disk_done:info]: Data ONTAP Release 7.1H1 boot complete. Last disk update written at Wed Aug 31 11:59:46 PDT 2005
Wed Aug 31 12:01:03 PDT [itsosj-n1: mgr.boot.reason_ok:notice]: System rebooted.
Password:
itsosj-n1> Wed Aug 31 12:01:20 PDT [console_login_mgr:info]: root logged in from console
itsosj-n1>

Figure 2-12 Startup the filer at the CFE prompt

Alternatively, you can use FilerView to shut down the Filer. See Figure 2-13 "Halt with FilerView GUI" on page 20. Choose Filer -> Shutdown and Reboot and specify any options you want. Additional confirmation and alert popup windows will appear (Figure 2-14 "Confirmation on halting the filer with FilerView" on page 20).

Figure 2-13 Halt with FilerView GUI

Figure 2-14 Confirmation on halting the filer with FilerView

Depending on the CIFS message settings and the Microsoft Windows client settings, you may receive several messages on your CIFS client concerning the shutdown of the filer. See Figure 2-15 "CIFS shutdown messages" on page 21.

Figure 2-15 CIFS shutdown messages

Rebooting the system

The System Storage N series systems can be rebooted with the command line or the FilerView interface. Reboot on the command line interface halts the Filer and then restarts it. NFS clients can maintain use of a file over a halt or reboot, because NFS is a stateless protocol. CIFS, FCP, and iSCSI clients behave differently, and you may use the -t option to specify the time before shutdown. See Figure 2-16.

[root@itso3775 node1]# reboot
Broadcast message from root (pts/2) (Thu Sep 8 13:23: ):
The system is going down for reboot NOW!

Figure 2-16 Reboot with command line interface

If you choose FilerView to reboot the system, proceed as shown in Figure 2-17 "Reboot with FilerView" on page 22. A confirmation and alert window will pop up; select OK and proceed (Figure 2-18 "FilerView reboot Confirmation and Alert messages" on page 22) if you are certain you want to shut down the Filer. Depending on the shutdown message settings, CIFS clients will get popup messages; see Figure 2-15.

Figure 2-17 Reboot with FilerView

Figure 2-18 FilerView reboot Confirmation and Alert messages

2.3 Checking the Data ONTAP software version

The Data ONTAP software level can be listed with the version command. Use the command line as shown in Figure 2-19, or use FilerView as shown in Figure 2-20 "FilerView About Data ONTAP Window" on page 23.

itsosj-n1> version
Data ONTAP Release 7.1X17: Mon Aug 8 02:50:45 PDT 2005 (IBM)
itsosj-n1>

Figure 2-19 Check Data ONTAP version

Figure 2-20 FilerView About Data ONTAP Window

2.4 Updating Data ONTAP software

TBD

2.5 Storage Management

Replacing a disk

The IBM N series systems are designed to replace disks online. The most common reasons for replacing a disk in a storage subsystem are a disk failure or a disk producing excessive error messages. The N series administrator is even able to remove spare disks from an N series filer in order to use them in a different N series Filer.

Restriction: The administrator can only remove spare disks from the system. Data disks that are part of an aggregate cannot be removed. The only way to reduce the number of disks in an aggregate is to copy its data to a new file system that has fewer disks.

Important: Before you fail or replace a disk, check that you meet the SES requirements. See the SES disk section in REFERENCE and the System Storage EXP600 Storage Expansion Unit Hardware and Service Guide GA for more detail.

Failed disks

The administrator can locate failed disks in an aggregate on the command line by issuing the aggr status -f or vol status -f command. See Figure 2-21, where we had deliberately failed a disk beforehand in order to show a failed disk with the aggr status -f command.

Note: In a clustered environment check both nodes for failed disks, as disks assigned to the other node appear as partner disks. Failed disks from the other node may not show up as failed on both nodes.

itsosj-n1> aggr status -f
Broken disks
RAID Disk     Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
admin failed  v0.19   v0 1     3    FC:A -    FCAL  N/A  36/             /87168
itsosj-n1>

Figure 2-21 Finding failed disks

When a disk failure is detected, a hot spare automatically comes online as a replacement. The N series Filer enters degraded mode and a rebuild takes place if any spare is available. If the Filer is shut down while it is in degraded mode, reconstruction is stopped; when the Filer is powered on again, the reconstruction process starts from the beginning. If no replacement is available, the Filer operates in degraded mode and shuts down after 24 hours. This value can be increased to 72 hours with the raid.timeout option. If you are running a clustered system, change the setting on both nodes. See Figure 2-22 for how to change the setting.

itsosj-n1> options raid.timeout
raid.timeout 24 (value might be overwritten in takeover)
itsosj-n1>
itsosj-n1> options raid.timeout 72
You are changing option raid.timeout which applies to both members of the cluster in takeover mode. This value must be the same in both cluster members prior to any takeover or giveback, or that next takeover/giveback may not work correctly.
itsosj-n1> Fri Sep 9 22:05:19 CEST [itsosj-n1: reg.options.cf.change:warning]: Option raid.timeout changed on one cluster node
itsosj-n1>

Figure 2-22 Raid.timeout setting

Attention: We do not generally recommend changing the raid.timeout setting. If the Filer operates in degraded mode and no spare disk is available, the RAID array cannot be rebuilt and the Filer probably runs without RAID protection; a second failure may then cause data loss. Keep enough spare disks available in your Filer and replace failed disks once the Filer has reconstructed the data.

Remove the disk you identified as failed physically from the shelf:
Press down the release mechanism with one hand while grasping the top flange of the shelf with the other hand. See Figure 2-31 "Disk drive release mechanism" on page 29.
Pull the disk out a little until it disengages, and wait a few seconds for it to stop spinning.
Remove the disk gently from the bay. See Figure 2-32 "Remove disk drive from bay" on page 30.

Remove a data disk

To remove a disk that reports errors but has not failed, note the disk from the log messages that report errors (look at the numbers that follow the word Disk). Enter the command aggr status -r or vol status -r. Look at the device column of the output of the sysconfig -r command, which shows the disk ID of each disk. The location of the disk appears to the right of the disk ID, in the HA SHELF BAY columns.

Use disk fail [-i] disk to fail the appropriate disk, as seen in Figure 2-24 "Disk fail without -i option" on page 26. Data ONTAP asks for confirmation before failing the specific disk. The -i option fails the disk immediately. Specify the disk you identified in the log messages.

If the -i option has not been specified, the specified disk is pre-failed: its data is copied to a replacement disk, and if the copy succeeds the disk is marked as failed. This takes a while, depending on the size of the disk and the load on the N series Filer. If the copy operation fails, the system goes into degraded mode. See Figure 2-27 "Sysconfig -r shows Broken Disks (failed disks)" on page 27 for more detail on degraded mode.

Important: When you pre-fail a disk (no -i option), wait until the data has been copied before removing it.

If the -i option has been specified, the disk fails immediately and the system runs in degraded mode until the RAID group has been rebuilt.

In our example we failed data disk v0.38 (aggregate aggr_test, which contains two RAID groups, rg0 and rg1; v0.38 is in RAID group rg0, shelf 2, bay 6). First we located the disk to fail (v0.38) with the sysconfig -r command (Figure 2-23 "Sysconfig -r locate the disk" on page 26).

itsosj-n1> sysconfig -r
...
Aggregate aggr_test (online, raid_dp) (block checksums)
Plex /aggr_test/plex0 (online, normal, active)
RAID group /aggr_test/plex0/rg0 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.40   v0 2     8    FC:A -    FCAL  N/A  36/             /87168
parity     v0.37   v0 2     5    FC:A -    FCAL  N/A  36/             /87168
data       v0.38   v0 2     6    FC:A -    FCAL  N/A  36/             /87168
data       v0.35   v0 2     3    FC:A -    FCAL  N/A  36/             /87168
RAID group /aggr_test/plex0/rg1 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.34   v0 2     2    FC:A -    FCAL  N/A  36/             /87168
parity     v0.33   v0 2     1    FC:A -    FCAL  N/A  36/             /87168
data       v0.32   v0 2     0    FC:A -    FCAL  N/A  36/             /87168
data       v0.39   v0 2     7    FC:A -    FCAL  N/A  36/             /87168
...
itsosj-n1>

Figure 2-23 Sysconfig -r locate the disk

Then we failed the disk without the -i option (Figure 2-24). The system asked for confirmation (Yes/No), and we answered y.

itsosj-n1> disk fail v0.38
*** You are about to prefail the following file system disk, ***
*** which will eventually result in it being failed ***
Disk /aggr_test/plex0/rg0/v0.38
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
data       v0.38   v0 2     6    FC:A -    FCAL  N/A  36/             /87168
*** Really prefail disk v0.38? y
disk fail: The following disk was prefailed: v0.38
Disk v0.38 has been prefailed. Its contents will be copied to a replacement disk, and the prefailed disk will be failed out.
itsosj-n1>

Figure 2-24 Disk fail without -i option

The disk failure triggered by the disk fail command generated messages as shown in Figure 2-25.

itsosj-n1> Mon Sep 12 19:25:44 CEST [itsosj-n1: raid.rg.diskcopy.start:notice]: /aggr_test/plex0/rg0: starting disk copy from v0.38 to v0.41
Mon Sep 12 19:25:55 CEST [itsosj-n1: raid.rg.diskcopy.done:notice]: /aggr_test/plex0/rg0: disk copy from v0.38 to v0.41 completed in 0:10.57
Mon Sep 12 19:25:55 CEST [itsosj-n1: raid.config.filesystem.disk.admin.failed.after.copy:info]: File system Disk v0.38 Shelf 2 Bay 6 [NETAPP VD-16MB-FZ ] S/N [ ] is being failed after it was successfully copied to a replacement.
Mon Sep 12 19:25:55 CEST [itsosj-n1: raid.disk.unload.done:info]: Unload of Disk v0.38 Shelf 2 Bay 6 [NETAPP VD-16MB-FZ ] S/N [ ] has completed successfully
itsosj-n1>

Figure 2-25 Disk failure messages

Sysconfig -r shows the new data disk v0.41 (Figure 2-26 "Sysconfig -r shows disk v0.41 as new data disk in the aggregate" on page 27).

itsosj-n1> sysconfig -r
...
Aggregate aggr_test (online, raid_dp) (block checksums)
Plex /aggr_test/plex0 (online, normal, active)
RAID group /aggr_test/plex0/rg0 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.40   v0 2     8    FC:A -    FCAL  N/A  36/             /87168
parity     v0.37   v0 2     5    FC:A -    FCAL  N/A  36/             /87168
data       v0.41   v0 2     9    FC:A -    FCAL  N/A  36/             /87168
data       v0.35   v0 2     3    FC:A -    FCAL  N/A  36/             /87168
RAID group /aggr_test/plex0/rg1 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.34   v0 2     2    FC:A -    FCAL  N/A  36/             /87168
parity     v0.33   v0 2     1    FC:A -    FCAL  N/A  36/             /87168
data       v0.32   v0 2     0    FC:A -    FCAL  N/A  36/             /87168
data       v0.39   v0 2     7    FC:A -    FCAL  N/A  36/             /87168
...
itsosj-n1>

Figure 2-26 Sysconfig -r shows disk v0.41 as new data disk in the aggregate

The failed disk v0.38 shows up in the Broken disks section of the sysconfig -r output (Figure 2-27).

itsosj-n1> sysconfig -r
...
Broken disks
RAID Disk     Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
admin failed  v0.19   v0 1     3    FC:A -    FCAL  N/A  36/             /87168
admin failed  v0.36   v0 2     4    FC:A -    FCAL  N/A  36/             /87168
admin failed  v0.38   v0 2     6    FC:A -    FCAL  N/A  36/             /87168
...
itsosj-n1>

Figure 2-27 Sysconfig -r shows Broken Disks (failed disks)

Remove hot spare disks

Use the sysconfig -r or aggr status -s command to locate spare disks. See Figure 2-28 "Locate spare disks with sysconfig -r" on page 28. Spare disks are listed with their ID and their location in shelf and bay.

itsosj-n1> sysconfig -r
Volume vol0 (online, raid4) (block checksums)
Plex /vol0/plex0 (online, normal, active)
RAID group /vol0/plex0/rg0 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
parity     0b.17   0b 1     1    FC:A -    FCAL       /               /
data       0b.38   0b 2     6    FC:A -    FCAL       /               /
Spare disks
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
Spare disks for block or zoned checksum traditional volumes or aggregates
spare      0b.19   0b 1     3    FC:A -    FCAL       /               /
spare      0b.21   0b 1     5    FC:A -    FCAL       /               /
spare      0b.25   0b 1     9    FC:A -    FCAL       /               /
spare      0b.27   0b 1     11   FC:A -    FCAL       /               /
spare      0b.28   0b 1     12   FC:A -    FCAL       /               /
spare      0b.39   0b 2     7    FC:A -    FCAL       /               /
spare      0b.40   0b 2     8    FC:A -    FCAL       /               /
spare      0b.41   0b 2     9    FC:A -    FCAL       /               /
spare      0b.42   0b 2     10   FC:A -    FCAL       /               /
spare      0b.43   0b 2     11   FC:A -    FCAL       /               /
spare      0b.44   0b 2     12   FC:A -    FCAL       /               /
itsosj-n1>

Figure 2-28 Locate spare disks with sysconfig -r

You can use FilerView to list spare disks more conveniently. In FilerView click Storage -> Disks -> Manage, choose Spare disks in the View Type menu, and proceed by clicking the View button (Figure 2-29).

Figure 2-29 FilerView: List spare disks

To remove a Fibre Channel spare disk from the Filer, issue the disk remove disk_name command. In Figure 2-30 we removed disk v0.41.

itsosj-n1>
itsosj-n1> disk remove v0.41
Fri Sep 9 22:27:50 CEST [itsosj-n1: raid.config.spare.disk.admin.removed:info]: Spare Disk v0.41 Shelf 2 Bay 9 [NETAPP VD-16MB-FZ ] S/N [ ] is being removed by administrator.
Fri Sep 9 22:27:50 CEST [itsosj-n1: raid.disk.unload.done:info]: Unload of Disk v0.41 Shelf 2 Bay 9 [NETAPP VD-16MB-FZ ] S/N [ ] has completed successfully
disk remove: The following disk was removed: v0.41
Removal and unload of disk v0.41 has been initiated. You will be notified via the system log when unload is complete
itsosj-n1>

Figure 2-30 Removing a spare disk

After you have removed the disk logically from the Filer, wait until the disk stops spinning, put on an antistatic wrist strap and ground leash, and remove the disk physically from the shelf:
Press down the release mechanism with one hand while grasping the top flange of the shelf with the other hand. See Figure 2-31.
Pull the disk out a little until it disengages, and wait a few seconds for it to stop spinning.
Remove the disk gently from the bay. See Figure 2-32 "Remove disk drive from bay" on page 30.

Figure 2-31 Disk drive release mechanism

Figure 2-32 Remove disk drive from bay

Adding new disks

After a failed disk has been removed physically from the shelf, replace it as early as possible with a new one.

Attention: Newly added disks should always have the same specifications as the existing disks in your shelf. Never use unsupported disks in the N series system.

To insert a disk, first put on an antistatic wrist strap and ground leash. Insert the disk into the bay with the release mechanism at the top (Figure 2-33). Gently slide the disk into the bay until it engages the backplane. The release mechanism clicks into place.

Figure 2-33 Insert Disk

After a disk has been added physically, it first becomes a spare disk and can then be added to a RAID group. After a couple of seconds a message is displayed that a disk was installed. You can verify that the disk was added by entering the aggr status -r command; the newly inserted disk shows up in the Spare disks section. Adding multiple disks can take up to a minute, while the disks spin up and the device addresses are checked.

SES bays must be populated with disks. See "SES disks" on page 185 for details on SES disks.

Aggregates

This section describes how we created Aggregates with the FilerView interface and the command line interface in our environment. A short introduction to Aggregates and RAID groups can be found in "Aggregates and RAID groups" on page 186.

Restriction: The maximum number of Aggregates is 100. The minimum Aggregate size is 10GB; the maximum Aggregate size is 16TB. The maximum number of RAID groups in an Aggregate is 150.

Create Aggregates with FilerView

First we started FilerView in our Browser and opened the Aggregates section on the left-hand side (Figure 2-34). Use Manage to view all existing Aggregates.

Figure 2-34 FilerView: Add Aggregates

Click Add to create an Aggregate. The Aggregate Wizard pops up as shown in Figure 2-35 "Create Aggregates Wizard: Add Aggregate" on page 31. Click Next.

Figure 2-35 Create Aggregates Wizard: Add Aggregate

Specify the name of your new Aggregate and choose the RAID level (RAID4 or RAID-DP), as you can see in our example in Figure 2-36, and click Next. We used RAID4 (RAID-DP box not checked) because we wanted to create very small RAID groups for this example.

Figure 2-36 Create Aggregates Wizard: Aggregate Name and RAID level

As described, we used small RAID groups with only two disks (Figure 2-37). In your environment you may use RAID groups with more disks. Choose Next.

Figure 2-37 Create Aggregates Wizard: Size of Raid Groups

As you can see in Figure 2-38 "Create Aggregates Wizard: Disk Selection" on page 32, you may choose automatic or manual disk selection. We used manual, so we were able to choose the disks ourselves. See Figure 2-39 "Create Aggregates Wizard: Disks to Add" on page 33.

Figure 2-38 Create Aggregates Wizard: Disk Selection

Select the disks that should form the Aggregate from the list and proceed by pressing the Next button (Figure 2-39).

Figure 2-39 Create Aggregates Wizard: Disks to Add

A popup window appears with a summary of the new Aggregate. Check the settings and proceed when they are OK. See Figure 2-40 "Create Aggregates Wizard: Commit changes" on page 33.

Figure 2-40 Create Aggregates Wizard: Commit changes

Finally, you get a status message about the creation of the new Aggregate. See Figure 2-41 for a successfully created Aggregate.

Figure 2-41 Create Aggregates Wizard: Aggregate successfully created

Verify the successful creation of the new Aggregate: choose the Aggregates Manage option in FilerView. The status of the new Aggregate shows its RAID type and that it is being created and initialized (see Figure 2-42 "Aggregates Manage: New created Aggregate" on page 34).

Figure 2-42 Aggregates Manage: New created Aggregate

If you want more detail on the new Aggregate, simply click the new Aggregate link in the right frame of Aggregates -> Manage. See Figure 2-43 for how FilerView shows details of the Aggregate.

Figure 2-43 Aggregates Manage: Details of new created Aggregate

More information on the status of the integration of disks into the new Aggregate can be obtained in the Storage -> Disks -> Manage section in the left frame of FilerView. See Figure 2-44 "Disks zeroing" on page 36 for an example from our environment.

Figure 2-44 Disks zeroing

Creating Aggregates with the command line interface

You may use the command line instead of FilerView. Use the aggr create command to add new Aggregates. See the details of the command syntax in Figure 2-45.

itsosj-n1> help aggr create
aggr create <aggr-name>
[-f] [-l <language-code>] [-L [compliance enterprise]] [-m] [-n]
[-r <raid-group-size>] [-R <rpm>] [-T {ATA EATA FCAL LUN SCSI}]
[-t {raid4 raid_dp}] [-v] <disk-list>
- create a new aggregate using the disks in <disk-list>;
<disk-list> is either <ndisks>[@<disk-size>]
or -d <disk-name1> <disk-name2> ... <disk-namen>
[-d <disk-name1> <disk-name2> ... <disk-namen>].
If a mirrored aggregate is desired, make sure to specify even number for <ndisks>, or to use two '-d' lists.
itsosj-n1>

Figure 2-45 The aggr create command

Use the aggr status command to view information about existing Aggregates. See Figure 2-46 "Aggr status command: lists existing Aggregates" on page 36.

itsosj-n1> aggr status
Aggr       State   Status         Options
aggr0      online  raid0, aggr    root
aggr_test  online  raid_dp, aggr  raidsize=4
itsosj-n1>

Figure 2-46 Aggr status command: lists existing Aggregates
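The help output in Figure 2-45 shows two forms of the disk list. As a brief hypothetical illustration (the aggregate names, disk count, disk size, and disk IDs are examples, not taken from our environment):

itsosj-n1> aggr create aggr_demo 4@36                            # four 36 GB disks, chosen by Data ONTAP
itsosj-n1> aggr create aggr_demo2 -t raid4 -r 2 -d v0.46 v0.47   # explicitly named disks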

We created an Aggregate with the following settings:
Name: aggr_itso02
RAID type: RAID4
Disks per RAID group: 2
Disks: v0.42 v0.43 v0.44 v0.45

Figure 2-47 shows how we created the Aggregate by using aggr create on the command line.

itsosj-n1> aggr create aggr_itso02 -t raid4 -r 2 -d v0.42 v0.43 v0.44 v0.45
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.vol.disk.add.done:notice]: Addition of Disk /aggr_itso02/plex0/rg1/v0.45 Shelf 2 Bay 13 [NETAPP VD-16MB-FZ ] S/N [ ] to aggregate aggr_itso02 has completed successfully
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.vol.disk.add.done:notice]: Addition of Disk /aggr_itso02/plex0/rg1/v0.44 Shelf 2 Bay 12 [NETAPP VD-16MB-FZ ] S/N [ ] to aggregate aggr_itso02 has completed successfully
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.vol.disk.add.done:notice]: Addition of Disk /aggr_itso02/plex0/rg0/v0.43 Shelf 2 Bay 11 [NETAPP VD-16MB-FZ ] S/N [ ] to aggregate aggr_itso02 has completed successfully
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.vol.disk.add.done:notice]: Addition of Disk /aggr_itso02/plex0/rg0/v0.42 Shelf 2 Bay 10 [NETAPP VD-16MB-FZ ] S/N [ ] to aggregate aggr_itso02 has completed successfully
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.rg.spares.low:warning]: /aggr_test/plex0/rg0
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.rg.spares.low:warning]: /aggr_test/plex0/rg1
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.rg.spares.low:warning]: /aggr_itso02/plex0/rg0
Tue Sep 13 17:26:57 CEST [itsosj-n1: raid.rg.spares.low:warning]: /aggr_itso02/plex0/rg1
Creation of an aggregate with 4 disks has completed.
itsosj-n1> Tue Sep 13 17:26:58 CEST [itsosj-n1: wafl.vol.add:notice]: Aggregate aggr_itso02 has been added to the system.
itsosj-n1>

Figure 2-47 Create an Aggregate

Finally, verify the Aggregate by entering aggr status, as we did in Figure 2-48.

itsosj-n1> aggr status
Aggr         State   Status         Options
aggr_itso02  online  raid4, aggr    raidsize=2
aggr0        online  raid0, aggr    root
aggr_test    online  raid_dp, aggr  raidsize=4
itsosj-n1>

Figure 2-48 Aggr status command: lists new Aggregate and existing Aggregates

Use the sysconfig -r command to list all RAID groups and disks in the newly created Aggregate. Figure 2-49 "Sysconfig -r lists raid groups and disks for all Aggregates" on page 38 is only a section of the sysconfig -r output, as it lists information on all Aggregates, spare disks, broken disks, and partner disks. Instead of sysconfig -r you may use the command aggr status aggr_name -r.

itsosj-n1> sysconfig -r
...
Aggregate aggr_itso02 (online, raid4) (block checksums)
Plex /aggr_itso02/plex0 (online, normal, active)
RAID group /aggr_itso02/plex0/rg0 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
parity     v0.42   v0 2     10   FC:A -    FCAL  N/A  36/             /87168
data       v0.43   v0 2     11   FC:A -    FCAL  N/A  36/             /87168
RAID group /aggr_itso02/plex0/rg1 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
parity     v0.44   v0 2     12   FC:A -    FCAL  N/A  36/             /87168
data       v0.45   v0 2     13   FC:A -    FCAL  N/A  36/             /87168
...
itsosj-n1>

Figure 2-49 Sysconfig -r lists raid groups and disks for all Aggregates

Changing the RAID type for an Aggregate

The following section describes how to change the RAID type of an Aggregate. We changed the aggr_itso02 Aggregate (RAID4) to a RAID-DP protected Aggregate. To check the existing RAID type of the Aggregate, use the aggr status command as shown in Figure 2-50.

itsosj-n1> aggr status
Aggr         State   Status       Options
aggr_itso02  online  raid4, aggr  raidsize=2
aggr0        online  raid0, aggr  root
itsosj-n1>

Figure 2-50 Aggr status lists RAID types for existing Aggregates

Change the RAID type using the aggr options command: specify the raidtype option and the new RAID type, as shown in Figure 2-51.

itsosj-n1> aggr options aggr_itso02 raidtype raid_dp
Tue Sep 13 18:02:45 CEST [itsosj-n1: raid.config.raidsize.change:notice]: aggregate aggr_itso02: raidsize is adjusted from 2 to 16 after changing raidtype
Aggregate aggr_itso02: raidsize is adjusted from 2 to 16 after changing raidtype.
itsosj-n1> Tue Sep 13 18:03:00 CEST [itsosj-n1: raid.rg.recons.missing:notice]: RAID group /aggr_itso02/plex0/rg1 is missing 1 disk(s).
Tue Sep 13 18:03:00 CEST [itsosj-n1: raid.rg.recons.info:notice]: Spare disk v0.32 will be used to reconstruct one missing disk in RAID group /aggr_itso02/plex0/rg1.
Tue Sep 13 18:03:01 CEST [itsosj-n1: raid.rg.recons.start:notice]: /aggr_itso02/plex0/rg1: starting reconstruction, using disk v0.32
Tue Sep 13 18:03:12 CEST [itsosj-n1: raid.rg.recons.missing:notice]: RAID group /aggr_itso02/plex0/rg0 is missing 1 disk(s).
Tue Sep 13 18:03:12 CEST [itsosj-n1: raid.rg.recons.info:notice]: Spare disk v0.33 will be used to reconstruct one missing disk in RAID group /aggr_itso02/plex0/rg0.
Tue Sep 13 18:03:12 CEST [itsosj-n1: raid.rg.recons.done:notice]: /aggr_itso02/plex0/rg1: reconstruction completed for v0.32 in 0:11.55
Tue Sep 13 18:03:17 CEST [itsosj-n1: raid.rg.recons.start:notice]: /aggr_itso02/plex0/rg0: starting reconstruction, using disk v0.33
Tue Sep 13 18:03:28 CEST [itsosj-n1: raid.rg.recons.done:notice]: /aggr_itso02/plex0/rg0: reconstruction completed for v0.33 in 0:10.83
itsosj-n1>

Figure 2-51 Change raidtype with aggr options command

After you start the RAID conversion, the Filer changes the raidtype of the Aggregate. This may take a while, depending on the Aggregate size and the workload. You may use FilerView to verify the progress of the RAID reconstruction.

See Figure 2-52: open Storage -> Disks -> Manage in FilerView.

Figure 2-52 Raid reconstruction in progress

Alternatively, you may use the command line command aggr status aggr_name -r. See Figure 2-53.

itsosj-n1> aggr status aggr_itso02 -r
Aggregate aggr_itso02 (online, raid_dp, reconstruct) (block checksums)
Plex /aggr_itso02/plex0 (online, normal, active)
RAID group /aggr_itso02/plex0/rg0 (reconstruction 21% completed)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.33   v0 2     1    FC:A -    FCAL  N/A  36/             /87168 (reconstruction 21% completed)
parity     v0.42   v0 2     10   FC:A -    FCAL  N/A  36/             /87168
data       v0.43   v0 2     11   FC:A -    FCAL  N/A  36/             /87168
RAID group /aggr_itso02/plex0/rg1 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.32   v0 2     0    FC:A -    FCAL  N/A  36/             /87168
parity     v0.44   v0 2     12   FC:A -    FCAL  N/A  36/             /87168
data       v0.45   v0 2     13   FC:A -    FCAL  N/A  36/             /87168
itsosj-n1>

Figure 2-53 Aggregate status concerning disks and current tasks

You can see the new raidtype of the Aggregate by entering the aggr status command again. See Figure 2-54 "Raidtype changed: verification with aggr status command" on page 40.

itsosj-n1> aggr status
Aggr         State   Status         Options
aggr_itso02  online  raid_dp, aggr
aggr0        online  raid0, aggr    root
itsosj-n1>

Figure 2-54 Raidtype changed: verification with aggr status command

Important: Changing the RAID type between RAID4 and RAID-DP affects the number of disks used in your Aggregates. RAID-DP uses double (diagonal) parity; therefore, in each RAID group one more disk is used for parity. See Chapter 4, "N Series Data Protection with RAID DP" on page 51 for detailed information about RAID4 and RAID-DP.

Again, sysconfig -r lists details about the disks in the Aggregates. In our example we changed Aggregate aggr_itso02 from RAID4 to RAID-DP; sysconfig displays that both RAID groups (rg0 and rg1) now have three disks (dparity, parity, and data) (Figure 2-55). Instead of sysconfig -r you may use the command aggr status aggr_name -r.

itsosj-n1> sysconfig -r
...
Aggregate aggr_itso02 (online, raid_dp) (block checksums)
Plex /aggr_itso02/plex0 (online, normal, active)
RAID group /aggr_itso02/plex0/rg0 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.33   v0 2     1    FC:A -    FCAL  N/A  36/             /87168
parity     v0.42   v0 2     10   FC:A -    FCAL  N/A  36/             /87168
data       v0.43   v0 2     11   FC:A -    FCAL  N/A  36/             /87168
RAID group /aggr_itso02/plex0/rg1 (normal)
RAID Disk  Device  HA SHELF BAY  CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
dparity    v0.32   v0 2     0    FC:A -    FCAL  N/A  36/             /87168
parity     v0.44   v0 2     12   FC:A -    FCAL  N/A  36/             /87168
data       v0.45   v0 2     13   FC:A -    FCAL  N/A  36/             /87168
...
itsosj-n1>

Figure 2-55 Verify number of disks after changing the raid type

Destroying Aggregates with the command line

Aggregates can easily be removed from the Filer; the administrator may use FilerView or the command line for this operation. In our environment we took the previously created Aggregate aggr_itso02 offline. See Figure 2-56 for status information before taking the Aggregate offline.

itsosj-n1> aggr status
Aggr         State   Status         Options
aggr_itso02  online  raid_dp, aggr
aggr0        online  raid0, aggr    root
itsosj-n1>

Figure 2-56 List Aggregates

Before an Aggregate can be destroyed, it has to be set offline. See Figure 2-57 "Set Aggregate offline" on page 41 for how to take an Aggregate offline. Data ONTAP warns that FSIDs may be duplicated if new volumes are created.

itsosj-n1> aggr offline aggr_itso02
Aggregate 'aggr_itso02' is now offline.
itsosj-n1> Tue Sep 13 18:34:32 CEST [itsosj-n1: volaggr.offline:critical]: Some aggregates are offline. Volume creation could cause duplicate FSIDs.

Figure 2-57 Set Aggregate offline

The status of the offline aggr_itso02 can be listed by entering aggr status, as seen in Figure 2-58.

itsosj-n1> aggr status
Aggr         State    Status       Options
aggr_itso02  offline  raid4, aggr  raidsize=2, lost_write_protect=off
aggr0        online   raid0, aggr  root
itsosj-n1>

Figure 2-58 List status: offline Aggregate aggr_itso02

The aggr destroy command removes the offline Aggregate. Data ONTAP asks for confirmation before deleting. If you are sure you want to delete the particular Aggregate, proceed and answer y, as shown in Figure 2-59.

itsosj-n1> aggr destroy aggr_itso02
Are you sure you want to destroy this aggregate? y
Tue Sep 13 18:35:01 CEST [itsosj-n1: raid.config.vol.destroyed:info]: Aggregate 'aggr_itso02' destroyed.
Aggregate 'aggr_itso02' destroyed.
itsosj-n1>

Figure 2-59 Destroy Aggregate

Finally, list all Aggregates with aggr status; the Aggregate has been deleted from the Filer (Figure 2-60).

itsosj-n1> aggr status
Aggr   State   Status       Options
aggr0  online  raid0, aggr  root
itsosj-n1>

Figure 2-60 Verify Aggregates after destroying

Destroying Aggregates with FilerView

Proceed as follows to destroy Aggregates with FilerView. Select Aggregates -> Manage, activate the check box on the left side of the Aggregate you want to destroy, and click the Offline button. See Figure 2-61 "FilerView: Offline an Aggregate" on page 42 for an example from our environment.

Figure 2-61 FilerView: Offline an Aggregate

Click OK on the confirmation popup (Figure 2-62).

Figure 2-62 Offline Aggregate confirmation

The status of the Aggregate changes to offline. See Figure 2-63 "FilerView: aggregate changed to offline state" on page 43.

Figure 2-63 FilerView: aggregate changed to offline state

Now the Aggregate can be destroyed. Check the box on the left side of the Aggregate you want to destroy and click Destroy, as shown in Figure 2-64.

Figure 2-64 FilerView: Destroy Aggregate

A warning message pops up. Click OK if you are certain you want to delete the aggregate (Figure 2-65).

Figure 2-65   Destroy Aggregate: Warning Message

Finally, the FilerView GUI shows the status of the destroy procedure (Figure 2-66).

Figure 2-66   FilerView: aggregate destroyed successfully

N series Volumes

System Storage N series Filers provide two different types of volumes: Traditional Volumes and Flexible Volumes.

A Traditional Volume is a combination of a volume and an aggregate; only this one volume occupies the space. A traditional volume is grown by adding one or more disks to it; shrinking the volume is not possible.

With Flexible Volumes (FlexVols), aggregates and volumes have been separated to provide more flexibility. FlexVol technology enables the administrator to pool storage resources and manage the file systems stored in an aggregate separately. Separate FlexVols can be created for different needs, such as volumes for databases, or volumes accessed by users in different locations or using different languages. FlexVols can be

increased and decreased in size in steps of 4KB or more. FlexVols can even grow automatically (vol autosize). Each volume is handled, and known by Data ONTAP, as a file system. Each file system has its own Snapshot area.

Restrictions:
- The minimum FlexVol size is 20MB; the maximum size of a FlexVol is 16TB.
- The maximum number of volumes (FlexVols and Traditional Volumes) is 200.
- The maximum number of FlexVols on clustered Filers with FCP and/or iSCSI licensed is 50 per node / 100 per cluster.
- The number of volumes in a cluster affects the time needed for takeover / giveback. To avoid timeouts, do not exceed these limits.

The vol status command (Figure 2-67) lists information on volumes such as name, status, settings, and options. The df command (Figure 2-68) provides information at the file system level of the volumes.

itsosj-n1> vol status
         Volume State      Status           Options
           vol0 online     raid0, flex      root, no_atime_update=on, create_ucode=on, convert_ucode=on, maxdirsize=2621
            dsp online     raid0, flex      create_ucode=on, convert_ucode=on
   itsosj_vol03 online     raid_dp, flex    create_ucode=on, convert_ucode=on
itsosj-n1>

Figure 2-67   Information on volumes (vol status command)

itsosj-n1> df
Filesystem                   kbytes  used  avail  capacity  Mounted on
/vol/vol0/                                            %     /vol/vol0/
/vol/vol0/.snapshot                                   %     /vol/vol0/.snapshot
/vol/dsp/                                             %     /vol/dsp/
/vol/dsp/.snapshot                                    %     /vol/dsp/.snapshot
/vol/itsosj_vol03/                                    %     /vol/itsosj_vol03/
/vol/itsosj_vol03/.snapshot                           %     /vol/itsosj_vol03/.snapshot
itsosj-n1>

Figure 2-68   The df command provides file system information

Space guarantee types

Before we explain how to create Flexible Volumes (FlexVols), we first describe the FlexVol space guarantee types. The space guarantee type specifies how Data ONTAP allocates storage space (in an aggregate) for a FlexVol. The space guarantee types are:

Volume   At the creation of the volume, space for the entire size of the volume is allocated from the aggregate. The space is not yet used, but it is reserved for this specific volume. This is the default setting.

None     No space is allocated at the creation of the volume. Space is allocated as data is written to the volume. Note that you may run out of space in the aggregate before reaching the volume size. File and LUN space reservations are not supported.

File     No space is allocated at the creation of the volume; rather, space is allocated from the aggregate as data is written to the volume. File and LUN space reservations are supported.
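The guarantee is chosen at creation time with the -s flag of vol create (described in the next section), and it can be changed later with vol options. Automatic growth is enabled with vol autosize. The following is a minimal sketch; the volume name and sizes are hypothetical, and the vol autosize syntax should be verified on your Data ONTAP release:

itsosj-n1> vol create itsosj_thin -s none aggr_itso02 100m
itsosj-n1> vol options itsosj_thin guarantee volume
itsosj-n1> vol autosize itsosj_thin -m 200m -i 20m on

The first command creates a thinly provisioned volume (no space reserved up front), the second converts it back to the default full guarantee, and the third allows the volume to grow automatically in 20MB increments up to 200MB when it runs short of space.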

Note: Do not set the space guarantee to none for volumes in a CIFS environment, because CIFS clients do not expect, and do not handle well, out-of-space errors.

Creating Flexible Volumes with the command line

Flexible Volumes (FlexVols) can be created using FilerView or the command line interface. The following naming conventions apply to FlexVols:
- the name must start with a letter or underscore (_)
- only letters, digits, and underscores are allowed
- maximum of 255 characters

The vol create command is used to create a new volume on the Filer. See Figure 2-69.

itsosj-n1> vol create
vol create: No volume name supplied.
usage:
vol create <vol-name>
  { [-l <language-code>] [-s {none|file|volume}] <hosting-aggr-name> <size>[k|m|g|t] [-S remotehost:remotevolume] }
  { [-f] [-l <language-code>] [-m] [-n] [-L [compliance|enterprise]] [-r <raid-group-size>] [-t {raid4|raid_dp}] <disk-list> }
- create a new volume, either a flexible volume from an existing aggregate, or a traditional volume from a disk list.
A disk list is either <ndisks>[@<disk-size>] or
-d <disk-name1> <disk-name2> ... <disk-namen> [-d <disk-name1> <disk-name2> ... <disk-namen>].
itsosj-n1>

Figure 2-69   Vol create options

The required information for a FlexVol is:
- the name of the volume
- the size of the volume (a number, optionally followed by k, m, g, or t, denoting kilobytes, megabytes, gigabytes, or terabytes respectively; if k, m, g, or t is omitted, the size is interpreted in bytes)
- the name of the aggregate in which the volume will be created

Optional settings are:
- the space guarantee type (-s option followed by volume, none, or file)
- the language (-l option followed by the language code); the default is the same language as the root volume

The vol lang command lists all supported languages (Figure 2-70, "Vol lang command" on page 47).

itsosj-n1> vol lang
Supported language codes are:
C (POSIX)
ar (Arabic)
cs (Czech)
da (Danish)
de (German)
en (English)
en_US (English (US))
es (Spanish)
fi (Finnish)
fr (French)
he (Hebrew)
hr (Croatian)
hu (Hungarian)
it (Italian)
ja (Japanese euc-j*)
ja_v1 (Japanese euc-j)
ja_JP.PCK (Japanese PCK(sjis)*)
ja_JP.932 (Japanese cp932*)
ja_JP.PCK_v2 (Japanese PCK(sjis))
ko (Korean)
no (Norwegian)
nl (Dutch)
pl (Polish)
pt (Portuguese)
ro (Romanian)
ru (Russian)
sk (Slovak)
sl (Slovenian)
sv (Swedish)
tr (Turkish)
zh (Simplified Chinese)
zh.GBK (Simplified Chinese (GBK))
zh_TW (Traditional Chinese euc-tw)
zh_TW.BIG5 (Traditional Chinese Big 5)
To use UTF-8 as the NFS character set, append '.UTF-8'
Language codes flagged with "*" are obsolete versions of those language character sets.
itsosj-n1>

Figure 2-70   Vol lang command

Consider the language format depending on the protocols used on the same volume:
- NFS (v2 or v3) and CIFS: choose the same language as the clients.
- NFS v4 (with or without CIFS): set the language of the volume to client_lang.UTF-8 (where client_lang is the language of the clients).
- NFS (v2 or v3) only: the language setting does not matter.

In the following example we create a 20MB volume in the aggr_itso02 aggregate. The space guarantee is set to volume. First verify that the new volume does not already exist on your Filer (Figure 2-71).

itsosj-n1> vol status
         Volume State      Status           Options
           vol0 online     raid0, flex      root, no_atime_update=on, create_ucode=on, convert_ucode=on, maxdirsize=2621
            dsp online     raid0, flex      create_ucode=on, convert_ucode=on
itsosj-n1>

Figure 2-71   List all volumes

Use aggr status to list all aggregates, verify that the aggregate which is to hold the new volume is online, and check the size of the aggregate. See Figure 2-72.

itsosj-n1> aggr status
           Aggr State      Status           Options
          aggr0 online     raid0, aggr      root
    aggr_itso02 online     raid_dp, aggr
itsosj-n1>
itsosj-n1> aggr status aggr_itso02 -r
Aggregate aggr_itso02 (online, raid_dp) (block checksums)
  Plex /aggr_itso02/plex0 (online, normal, active)
    RAID group /aggr_itso02/plex0/rg0 (normal)
      RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
      dparity   v0.33  v0 2 1        FC:A -   FCAL N/A  36/            /87168
      parity    v0.42  v             FC:A -   FCAL N/A  36/            /87168
      data      v0.43  v             FC:A -   FCAL N/A  36/            /87168
    RAID group /aggr_itso02/plex0/rg1 (normal)
      RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
      dparity   v0.32  v0 2 0        FC:A -   FCAL N/A  36/            /87168
      parity    v0.44  v             FC:A -   FCAL N/A  36/            /87168
itsosj-n1>

Figure 2-72   Verify Aggregates

Create the new volume on the appropriate aggregate with the vol create command (Figure 2-73).

itsosj-n1> vol create itsosj02_vol aggr_itso02 20m
Creation of volume 'itsosj02_vol' with size 20m on containing aggregate 'aggr_itso02' has completed.
itsosj-n1>

Figure 2-73   Create a new volume via command line interface

The -l option specifies the language format. In the example in Figure 2-74 we used de.

itsosj-n1> vol create itsosj_vol03 -l de aggr_itso02 20M
Wed Sep 14 17:22:34 CEST [itsosj-n1: vv_config_worker:alert]: Language on volume itsosj_vol03 changed to de
The new language mappings will be available after reboot
Wed Sep 14 17:22:34 CEST [itsosj-n1: vv_config_worker:notice]: XL - Language of Volume itsosj_vol03 has been changed to de.
Creation of volume 'itsosj_vol03' with size 20m on containing aggregate 'aggr_itso02' has completed.
itsosj-n1>

Figure 2-74   Create a new volume via command line interface with the -l option

After the volume has been created successfully, it appears in the list generated by the vol status command. Detailed information can be obtained with the -v option (Figure 2-75, "Show new volume" on page 49).

itsosj-n1> vol status
         Volume State      Status           Options
           vol0 online     raid0, flex      root, no_atime_update=on, create_ucode=on, convert_ucode=on, maxdirsize=2621
            dsp online     raid0, flex      create_ucode=on, convert_ucode=on
   itsosj02_vol online     raid_dp, flex    create_ucode=on, convert_ucode=on
itsosj-n1>
itsosj-n1> vol status itsosj02_vol -v
         Volume State      Status           Options
   itsosj02_vol online     raid_dp, flex    nosnap=off, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off, snapmirrored=off, create_ucode=on, convert_ucode=on, maxdirsize=1310, fs_size_fixed=off, guarantee=volume, svo_enable=off, svo_checksum=off, svo_allow_rman=off, svo_reject_errors=off, no_i2p=off, fractional_reserve=100, extent=off, try_first=volume_grow
Containing aggregate: 'aggr_itso02'
Plex /aggr_itso02/plex0: online, normal, active
RAID group /aggr_itso02/plex0/rg0: normal
RAID group /aggr_itso02/plex0/rg1: normal
itsosj-n1>

Figure 2-75   Show new volume

Note: With the default space guarantee setting (volume) for a FlexVol, the space allocated for the volume must be less than the free space of the aggregate. Use the aggr status aggr_name -r command to list the free aggregate space. The space guarantee may be changed later with the vol options vol_name guarantee {file|volume|none} command.

Creating Flexible Volumes with FilerView

Instead of using the command line, an administrator may use FilerView to create volumes as well. First open FilerView in your browser and click Volumes -> Add, as shown in Figure 2-76, "FilerView: add volumes" on page 50.

Figure 2-76   FilerView: add volumes

Choose Next to create a new volume on your Filer (Figure 2-77).

Figure 2-77   Add Volume Wizard Popup

The FilerView add volume wizard supports creating Traditional or Flexible Volumes. We are creating a Flexible Volume in this example. See Figure 2-78, "Add Volumes Wizard: Flexible or Traditional Volumes" on page 51.

Figure 2-78   Add Volumes Wizard: Flexible or Traditional Volumes

Figure 2-79 shows the volume name we chose and the language settings.

Figure 2-79   Add Volumes Wizard: Choose name and language for new volume

Then we chose the aggregate in which our volume will reside, and specified the volume size and the space guarantee (we used: volume). See Figure 2-80, "Add Volumes Wizard: choose Aggregate, size, and space guarantee" on page 52.

Figure 2-80   Add Volumes Wizard: choose Aggregate, size, and space guarantee

FilerView shows a summary (Figure 2-81) of the volume that will be created. Choose Commit, then close the message which states that the volume was successfully created (Figure 2-82, "Add Volumes Wizard: Volume successfully added" on page 53). If you get an error message, check your settings: the free space in your aggregate may not be enough to create the new volume, or the new volume may fall below the minimum or exceed the maximum volume size.

Figure 2-81   Add Volumes Wizard: Commit changes to Filer

Figure 2-82   Add Volumes Wizard: Volume successfully added

Finally, check the new volume in FilerView by selecting Volumes -> Manage. The new volume shows up in this view after it has been successfully created. See Figure 2-83.

Figure 2-83   Add Volumes Wizard: New volume in Volumes -> Manage

List the affiliation of FlexVols and Aggregates

The command vol container volume_name lists the volume and its containing aggregate. See Figure 2-84.

itsosj-n1> vol container vol0
Volume 'vol0' is contained in aggregate 'aggr0'
itsosj-n1> vol container itsosj_vol03
Volume 'itsosj_vol03' is contained in aggregate 'aggr_itso02'
itsosj-n1>

Figure 2-84   vol container command

Destroying Volumes

If a volume is no longer needed, the administrator can delete it from the Filer. For a traditional volume, its data disks change into spare disks; for a FlexVol, the allocated disk space becomes available for other or new volumes in the aggregate.

Attention: The data in a volume is lost if the volume is destroyed.
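Before taking anything offline, it is worth confirming exactly what the aggregate hosts. A minimal sketch: on our Data ONTAP release, aggr status with the -v option also reports the volumes contained in each aggregate, and vol container (shown above) confirms the mapping per volume; verify the -v output on your release:

itsosj-n1> aggr status aggr_itso02 -v
itsosj-n1> vol container itsosj_vol03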

The vol status command shows the current status before off-lining the volume. See Figure 2-85.

itsosj-n1> vol status
         Volume State      Status           Options
           vol0 online     raid0, flex      root, no_atime_update=on, create_ucode=on, convert_ucode=on, maxdirsize=2621
            dsp online     raid0, flex      create_ucode=on, convert_ucode=on
   itsosj_vol03 online     raid_dp, flex    create_ucode=on, convert_ucode=on
itsosj-n1>

Figure 2-85   List status before off-lining the volume

Enter vol offline volume_name before destroying the volume (Figure 2-86).

itsosj-n1> vol offline itsosj_vol03
Wed Sep 14 19:38:40 CEST [itsosj-n1: cifs.terminationnotice:warning]: CIFS: shut down completed: CIFS is disabled for volume itsosj_vol03.
Wed Sep 14 19:38:41 CEST [itsosj-n1: wafl.vvol.offline:info]: Volume 'itsosj_vol03' has been set temporarily offline
Volume 'itsosj_vol03' is now offline.
itsosj-n1>

Figure 2-86   Offline the volume before destroying

Again, vol status lists the status of the previously off-lined volume (Figure 2-87).

itsosj-n1> vol status
         Volume State      Status           Options
           vol0 online     raid0, flex      root, no_atime_update=on, create_ucode=on, convert_ucode=on, maxdirsize=2621
            dsp online     raid0, flex      create_ucode=on, convert_ucode=on
   itsosj_vol03 offline    raid_dp, flex
itsosj-n1>

Figure 2-87   Vol status shows the offline volume

Now the volume can be destroyed. All data will be lost and users will no longer be able to access the data. See Figure 2-88 for how to destroy a volume. Data ONTAP asks for confirmation before destroying the volume; answer y if you are certain you want to destroy it.

itsosj-n1> vol destroy itsosj_vol03
Are you sure you want to destroy this volume? y
Volume 'itsosj_vol03' destroyed.
itsosj-n1>

Figure 2-88   Destroy a volume

When a volume is destroyed and the option nfs.export.auto-update is set to on, the NFS export information is updated automatically; that is, vol destroy deletes the entry from the /etc/exports file. See Figure 2-89 for the output of the options nfs.export.auto-update and exportfs commands after destroying the volume (in our environment, we destroyed volume itsosj_vol03 beforehand). If CIFS is running, all shares on the destroyed volume are deleted.

itsosj-n1> options nfs.export.auto-update
nfs.export.auto-update       on
itsosj-n1>
itsosj-n1> exportfs
/vol/vol0/home  -sec=sys,rw,nosuid
/vol/dsp        -sec=sys,rw,nosuid
/vol/vol0       -sec=sys,rw,anon=0,nosuid
itsosj-n1>

Figure 2-89   Exported volumes

Update the clients regarding the deletion of the volume, especially the mount point information in /etc/fstab or /etc/vfstab on NFS clients.

Qtrees, shares, and CIFS oplocks (creating and managing shares for CIFS and NFS): TBD.

2.6 Cluster Management

Checking cluster status

Cluster status may be checked with the cf status or cf monitor command (Figure 2-90), or by using FilerView Cluster -> Manage (Figure 2-91, "FilerView: Cluster Manage" on page 56).

itsosj-n1> cf status
Cluster enabled, itsosj-n2 is up.
itsosj-n1>
itsosj-n1> cf monitor
current time: 14Sep :04:05
UP 02:49:33, partner 'itsosj-n2', cluster monitor enabled
Interconnect is up, takeover capability on-line
partner update TAKEOVER_ENABLED (14Sep :04:04)
itsosj-n1>

Figure 2-90   cf status and cf monitor
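The same check can be run remotely from an administration host, which is convenient for periodic monitoring scripts. A minimal sketch, assuming the administration host is listed in the Filer's /etc/hosts.equiv file so that rsh access is permitted (the admin_host$ prompt is hypothetical):

admin_host$ rsh itsosj-n1 cf status
Cluster enabled, itsosj-n2 is up.

Any Data ONTAP command shown in this chapter can be driven the same way, so a cron job on the administration host can record cluster status at regular intervals.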

Figure 2-91   FilerView: Cluster Manage

Takeover

In order to take over resources from one node to another, enter the cf takeover command, or use FilerView Cluster -> Manage -> Initiate Takeover.

Attention: Taking over resources will have an impact on the client environment. In particular, Windows users and shares (CIFS services) will be affected by this procedure.

Note: Issue the cf takeover command on the node that will remain operating and will take over the resources of the other node.

In our example we took node itsosj-n2 offline by issuing cf takeover on node itsosj-n1. First check the cluster status with the cf status command. See Figure 2-92.

itsosj-n1> cf status
Cluster enabled, itsosj-n2 is up.
itsosj-n1>

Figure 2-92   cf status: check status

Initiate the takeover by entering cf takeover, as shown in Figure 2-93, "cf takeover command" on page 57.

itsosj-n1> cf takeover
cf: takeover initiated by operator
itsosj-n1> Wed Sep 14 00:07:11 CEST [itsosj-n1: cf.misc.operatortakeover:warning]: Cluster monitor: takeover initiated by operator
Wed Sep 14 00:07:11 CEST [itsosj-n1: cf.fsm.nfo.accepttakeoverreq:warning]: Negotiated failover: accepting takeover request by partner, reason: operator initiated cf takeover. Asking partner to shutdown gracefully; will takeover in at most 180 seconds.
Wed Sep 14 00:07:14 CEST [itsosj-n1: cf.fsm.firmwarestatus:info]: Cluster monitor: partner rebooting
Wed Sep 14 00:07:14 CEST [itsosj-n1: cf.fsm.nfo.partnershutdown:warning]: Negotiated failover: partner has shutdown
Wed Sep 14 00:07:14 CEST [itsosj-n1: cf.fsm.takeover.nfo:info]: Cluster monitor: takeover attempted after cf takeover command
Wed Sep 14 00:07:14 CEST [itsosj-n1: cf.fsm.statetransit:warning]: Cluster monitor: UP --> TAKEOVER
Wed Sep 14 00:07:14 CEST [itsosj-n1: cf.fm.takeoverstarted:warning]: Cluster monitor: takeover started
Wed Sep 14 00:07:16 CEST [itsosj-n1: cf_takeover:info]: NVRAM takeover: partner nvram is disabled
Wed Sep 14 00:07:17 CEST [itsosj-n2/itsosj-n1: wafl.vol.loading:debug]: Loading Volume partner:vol_itsosj01
Wed Sep 14 00:07:17 CEST [itsosj-n2/itsosj-n1: wafl.vol.loading:debug]: Loading Volume partner:vol0
Wed Sep 14 00:07:18 CEST [itsosj-n2/itsosj-n1: wafl.maxdirsize.boot.notice:warning]: partner:vol0: This volume's maxdirsize (2621KB) is higher than the default (1310KB). There may be a performance penalty when doing operations on large directories.
Replaying takeover WAFL log
Wed Sep 14 00:07:18 CEST [itsosj-n2/itsosj-n1: wafl.vol.guarantee.fail:error]: Space for volume vol_itsosj01 is NOT guaranteed
Wed Sep 14 00:07:18 CEST [itsosj-n2/itsosj-n1: wafl.takeover.nvram.missing:error]: WAFL takeover: no partner area found during wafl replay
Wed Sep 14 00:07:18 CEST [itsosj-n2/itsosj-n1: wafl.replay.done:info]: WAFL log replay completed, 0 seconds
Wed Sep 14 00:07:20 CEST [itsosj-n2/itsosj-n1: cf_takeover:alert]: Language not set on volume vol0. Using language config "C". Use vol lang to set language.
ifconfig: 'ns0' cannot be configured: Address does not match any partner interface.
ifconfig: ns0: no such interface
add net default: gateway : network unreachable
Wed Sep 14 00:07:20 CEST [itsosj-n2/itsosj-n1: net.ifconfig.nopartner:error]: ifconfig: 'ns0' cannot be configured: Address does not match any partner interface.
Wed Sep 14 00:07:23 CEST [itsosj-n1: net.ifconfig.takeovererror:warning]: WARNING: 1 error detected during network takeover processing
WARNING: Some network clients may not be able to access the cluster during takeover
Wed Sep 14 00:07:23 CEST [itsosj-n1: cf.rsrc.takeoveropfail:error]: Cluster monitor: takeover during ifconfig_2 failed; takeover continuing...
CIFS partner server is running.
Wed Sep 14 00:07:23 CEST [itsosj-n1 (takeover): cf.rsrc.transittime:notice]: Top Takeover transit times wafl=2240, registry_postrc_phase1=1630, rc=990 {options=570, hostname=160, options=10}, raid=720, wafl_sync=500, ifconfig=240, registry_prerc=220, raid_replay=180, sshd=160, syslog=120
Wed Sep 14 00:07:23 CEST [itsosj-n1 (takeover): cf.fm.takeovercomplete:warning]: Cluster monitor: takeover completed
Wed Sep 14 00:07:33 CEST [itsosj-n2/itsosj-n1: nbt.nbns.registrationcomplete:info]: NBT: All CIFS name registrations have completed for the partner server.
itsosj-n1(takeover)> Wed Sep 14 00:08:01 CEST [itsosj-n1 (takeover): monitor.globalstatus.critical:critical]: This node has taken over itsosj-n2.
Disk on adapter v0, shelf 2, bay 6, failed by administrator.
Disk on adapter v0, shelf 2, bay 4, failed by administrator.
Disk on adapter v0, shelf 1, bay 3, failed by administrator.
Wed Sep 14 00:08:01 CEST [itsosj-n2/itsosj-n1: monitor.globalstatus.critical:critical]: itsosj-n1 has taken over this node.
itsosj-n1(takeover)>

Figure 2-93   cf takeover command

Finally, check the status with cf status (see Figure 2-94).

itsosj-n1(takeover)> cf status
itsosj-n1 has taken over itsosj-n2.
Takeover due to negotiated failover, reason: operator initiated cf takeover
itsosj-n1(takeover)>

Figure 2-94   cf status: verification that the takeover completed

See Figure 2-95, "Cluster takeover initiated by FilerView" on page 58 for how we did the takeover with FilerView. Click OK when FilerView asks for confirmation.

Figure 2-95   Cluster takeover initiated by FilerView

After clicking Refresh you will get the most current cluster status (Figure 2-96).

Figure 2-96   FilerView: Cluster status

Giveback

In order to give back the resources, issue the cf giveback command (Figure 2-97, "cf giveback" on page 59).

itsosj-n1(takeover)> cf giveback
Wed Sep 14 00:10:15 CEST [itsosj-n1 (takeover): cf.misc.operatorgiveback:info]: Cluster monitor: giveback initiated by operator
Wed Sep 14 00:10:15 CEST [itsosj-n1: cf.fm.givebackstarted:warning]: Cluster monitor: giveback started
itsosj-n1(takeover)> CIFS partner server is shutting down...
CIFS partner server has shut down...
Wed Sep 14 00:10:17 CEST [itsosj-n1: cf.rsrc.transittime:notice]: Top Giveback transit times wafl=1580, ndmpd=560, raid=80, registry_giveback=20, java=10, snapmirror=10, nfsd=10, fmfsm_reserve=0, fmdisk_inventory=0, raid_disaster_early=0
Wed Sep 14 00:10:17 CEST [itsosj-n1: cf.fm.givebackcomplete:warning]: Cluster monitor: giveback completed
Wed Sep 14 00:10:17 CEST [itsosj-n1: cf.fsm.statetransit:warning]: Cluster monitor: TAKEOVER --> UP
Wed Sep 14 00:10:18 CEST [itsosj-n1: cf.fsm.takeoverbypartnerdisabled:notice]: Cluster monitor: takeover of itsosj-n1 by itsosj-n2 disabled (unsynchronized log)
Wed Sep 14 00:10:18 CEST [itsosj-n1: cf.fsm.firmwarestatus:info]: Cluster monitor: partner rebooting
Wed Sep 14 00:10:26 CEST [itsosj-n1: cf.fsm.partnernotresponding:notice]: Cluster monitor: partner not responding
Wed Sep 14 00:10:29 CEST [itsosj-n1: cf.fm.timemasterstatus:info]: Acting as cluster time slave
Wed Sep 14 00:10:30 CEST [itsosj-n1: cf.fsm.partnerok:notice]: Cluster monitor: partner ok
Wed Sep 14 00:10:30 CEST [itsosj-n1: cf.fsm.takeoverofpartnerdisabled:notice]: Cluster monitor: takeover of itsosj-n2 disabled (partner booting)
Wed Sep 14 00:10:33 CEST [itsosj-n1: cf.fsm.takeoverofpartnerdisabled:notice]: Cluster monitor: takeover of itsosj-n2 disabled (unsynchronized log)
Wed Sep 14 00:10:35 CEST [itsosj-n1: cf.fsm.takeoverofpartnerenabled:notice]: Cluster monitor: takeover of itsosj-n2 enabled
Wed Sep 14 00:10:35 CEST [itsosj-n1: cf.fsm.takeoverbypartnerenabled:notice]: Cluster monitor: takeover of itsosj-n1 by itsosj-n2 enabled
itsosj-n1>

Figure 2-97   cf giveback

The cluster status can be obtained by issuing the cf status command (Figure 2-98).

itsosj-n1> cf status
Cluster enabled, itsosj-n2 is up.
itsosj-n1>

Figure 2-98   cf status: check for successful giveback

See Figure 2-99, "FilerView: Initiate Giveback" on page 60 for how we did the cluster giveback with FilerView.

Figure 2-99   FilerView: Initiate Giveback

After refreshing the view, the cluster status should read "Cluster enabled, itsosj-n2 is up", as shown in Figure 2-100.

Figure 2-100   FilerView: Status after Refresh

2.7 Managing snapshots

TBD: create and use Snapshots.
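In the meantime, a minimal command-line sketch of everyday Snapshot handling, using the itsosj02_vol volume created earlier (the Snapshot name is hypothetical; see Chapter 5, "SnapShots" for background on how Snapshots work):

itsosj-n1> snap list itsosj02_vol
itsosj-n1> snap create itsosj02_vol before_maintenance
itsosj-n1> snap sched itsosj02_vol 0 2 6@8,12,16,20
itsosj-n1> snap delete itsosj02_vol before_maintenance

snap list shows the existing Snapshots of a volume, snap create takes a manual Snapshot, and snap sched sets the automatic schedule: here 0 weekly, 2 nightly, and 6 hourly Snapshots taken at 8:00, 12:00, 16:00, and 20:00.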

Part 2. Sizing


Part 3. Appendices


Chapter 3. Pre-installation Planning

This chapter discusses some of the steps involved in sizing an N series server for a customer's environment. Sizing involves taking into account:
- Performance and throughput
- Capacity planning
- Effects of optional features
- Future expansion
- Application considerations
- Backup and recovery
- Resiliency to failure

All of these items are discussed briefly here. Any one of these subjects could fill its own standalone documentation, and much of that already exists, so only high-level planning steps are presented.

3.1 Primary issues affecting planning

We start with the understanding that a decision has been reached to utilize an N series for storage. A good decision on the customer's part! During the decision process it was likely discussed which N series model to go with, the amount of storage required on the N series, the optional features desired, and the future expansion requirements. Almost all of these are discussed elsewhere in this Redbook, but a quick summary is likely helpful.

3.1.1 Performance and throughput

The performance required from the storage subsystem is usually driven by the number of client systems relying on the N series for storage service and by the demands of the applications running on those client systems. The actual response time and throughput capabilities of

each N series model is available, so once the throughput and response time requirements of the customer's environment are known, the model and number of N series appliances can easily be determined. Keep in mind that performance involves a balance of all of the following:
- performance of a particular N series model
- number of disks used for a particular workload
- type of disks used
- how close to capacity the disks are running
- number of network interfaces in use
- protocols used for storage access
- workload mix (reads vs. writes vs. lookups, and so on)
- background tasks running on the storage server (e.g., SnapMirror)

With special note on this last item, it is a good idea to always size a storage server with some reserve capacity beyond what is expected to be its normal workload.

3.1.2 Capacity planning

One of the key measurements of a storage server is the amount of storage it provides. Vendors and installers of storage servers generally deal with raw storage capacities; the end user, of course, is generally only concerned with available capacity. Ensuring that the gap between raw and usable capacity is bridged will minimize surprises, both at install time and in the future.

Raw capacity is determined by taking the number of disks connected and multiplying by their capacity. So fourteen disks (the maximum number provided in the N series disk shelves) times (for instance) 72GB per drive results in a raw capacity of approximately 1000 GB, or 1TB. The usable capacity is determined by factoring out the portion of this raw capacity that goes to support the infrastructure of the storage system. This includes space used for operating system information, disk drive formatting, file system formatting, RAID protection overhead, spare disk allocation, mirroring overhead, and space used by the Snapshot protection mechanism.

Let's work through an example of where the storage would go in our example 14 x 72GB drive system. Overhead capacity usually gets utilized as follows:

Spare disks   Protection against a disk failure. It is good practice to allocate spare disk drives to every system. These are utilized in the event a disk drive does fail (and they do!), so that the data on the failed drive can be rebuilt automatically, without any operator intervention or downtime. For a 14 drive system, minimally acceptable practice would be to allocate one spare drive.

RAID-DP       When a drive does fail, it is the RAID information that allows the lost data to be recovered. With the N series, the maximum protection against loss is provided by using the RAID-DP facility. If capacity is at a premium and reliability is not critical, RAID4 can be utilized instead. For a 14 drive system, minimally acceptable practice would be to allocate one drive to RAID4 parity. Better practice would be to allocate two drives to RAID-DP.
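As an illustration of the allocation just described, a minimal command-line sketch of creating such an aggregate (the aggregate name is hypothetical; 13 of the 14 disks are requested so that one remains as a hot spare, and the RAID group size of 13 yields one RAID-DP group of 11 data plus 2 parity disks):

itsosj-n1> aggr create aggr_data -t raid_dp -r 13 13
itsosj-n1> aggr status aggr_data -r

The second command verifies how the disks were actually laid out into data, parity, and dparity roles.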

When setting up the disks, RAID groups must be defined. A RAID group is a set of data disks protected by one (RAID4) or two (RAID-DP) parity disks. RAID groups are then combined to create storage aggregates, which subsequently have volumes (also referred to as file systems) or LUNs allocated on them. Normal practice would be to take the 11 remaining disks and treat them all as data disks, creating a single large aggregate. At this point in our example we have allocated all fourteen available disks:
- Spare disk drive: count = 1
- RAID parity disk(s): count = 2 (RAID-DP)
- Data disks: count = 11

We have now lost about 25% of our raw capacity to hardware protection. The remaining usable capacity becomes less deterministic from this point because of ever-increasing numbers of variables, but a few firm guidelines are still available. One near-constant is that you will lose approximately 5% of a disk's raw capacity when it is formatted. This results in our example 72GB disk actually holding only about 68GB of data. Another overhead factor is imposed by the file system. The file system used by the N series, WAFL, has less overhead than many file systems, but the overhead still exists; generally WAFL uses an additional 10% of the formatted capacity of a drive. So our nice big example 72GB disk drives are now down to only a bit over 60GB before we put any user data on them at all! If at this point we take our 11 data drives and allocate them to a single large volume, we will find that the resulting capacity is approximately 600GB.

Finally, we have to consider the overhead imposed by Snapshot protection. Snapshot is a built-in capability that does not utilize any space until it is actually used, but its use will eat into the apparent usable capacity of the storage server. It is common to run a storage server with 30% Snapshot space reserved. This is space that appears to be unavailable for user storage (but it can easily be adjusted when necessary). Running with this 30% overhead further reduces our 600GB of usable storage to a more realistic 400GB.

So when a customer ultimately gets their storage server installed and configured using something close to this example and runs the server's df command, they will likely see a usable capacity in the 400GB range. Returning to our focus on reconciling usable storage to raw storage, this suggests early planning that provides for just under 50% of raw capacity to be ultimately available for storing user data.

3.1.3 Effects of optional features

A handful of optional features will affect the early planning required. Most notably, heavy use of the SnapMirror option consumes large amounts of CPU resources. These resources are directly removed from the pool available for serving user/application data, resulting in what appears to be an overall reduction in performance. SnapMirror can also eat into the available disk I/O bandwidth and network bandwidth. So if heavy, constant use of SnapMirror is planned, these factors should be adjusted accordingly.
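One way to contain that impact is to throttle the SnapMirror transfer rate in the /etc/snapmirror.conf file on the destination system. A minimal sketch, with hypothetical system and volume names; the kbs argument caps the transfer at the given number of kilobytes per second, and the last four fields are the schedule (minute, hour, day of month, day of week). Check the snapmirror.conf format on your Data ONTAP release:

itsosj-n1:dbvol  itsosj-n2:dbvol_mirror  kbs=2000  0 23 * *

This entry replicates dbvol to dbvol_mirror every day at 23:00, limited to roughly 2MB/s, so a slow replication link or a busy source system is not saturated by the transfer.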

3.1.4 Future expansion

Many of the resources of the storage server can be expanded on the fly. However, early planning can make this even easier and less disruptive if it is considered from the start. Adding disk drives is one simple example. The disk drives and shelves themselves are all hot-pluggable, so they can be added or replaced without service disruption. But take the example where all available space in a rack is used by completely full disk shelves: when this point is reached, how does a disk drive get added? Where possible, a good practice is to avoid fully populating disk shelves from the very beginning. It is much more flexible to install a new storage server with two half-full disk shelves attached to it rather than a single full shelf. The added cost is generally minimal and is quickly recovered the first time additional disks are added.

Similar consideration can be given to allocating network resources. For instance, if a storage server has two available Gigabit Ethernet interfaces, it is a good practice to install and configure both interfaces from the beginning. Commonly one interface is configured for actual production use and one as a standby in case of failure, although it is also possible (given a network environment that supports this) to configure both interfaces to be in use, providing mutual failover protection to each other. This provides additional insurance, since both interfaces are constantly in use, rather than finding out that the standby interface is actually broken at the time you need it most (at the time of failure).

Overall, it is valuable during the planning and deployment phase to consider how the environment might change in the future, and to engineer in some flexibility from the very beginning.

3.1.5 Application considerations

Different applications and environments put different workloads on the storage server. In this section we'll discuss a few considerations that are best addressed early in the planning and installation phases.

Home directories, desktop serving

This is a traditional application for Network Attached Storage solutions. Since lots of different clients are attached to one or more servers, there is little possibility to effectively plan and model in advance of actual deployment. But a few common-sense considerations can help. First, this environment is generally characterized by the use of the NFS or CIFS protocols. Second, it is generally accessed using Ethernet, with TCP/IP as the primary access mechanism. Third, the mix of reading and writing is heavily tipped towards the reading side. Uptime requirements are generally lower than in enterprise application situations, so scheduling downtime for maintenance is not too difficult.

In this environment, the requirements for redundancy and maximum uptime are reduced, and the importance of data writing throughput is also lessened. More important is the protection offered by Snapshot facilities to protect end users' data and provide for rapid recovery in case of accidental deletion or corruption. For instance, viruses can disrupt this type of environment more readily than an environment serving applications like Oracle or SAP. Load balancing in this environment often takes the form of moving specific home directories from one storage server to another or moving client machines from one subnet to another.
Effective ahead-of-time planning is difficult, and the best planning takes into account that the production environment will be dynamic; flexibility will therefore be key. It is especially important in this environment to initially install with maximum flexibility in

mind from the very beginning. These environments also tend to make use of a large number of Snapshot images to maximize the protection offered to users.

Enterprise applications

Previously the domain of direct attached storage (DAS) architectures, it is becoming ever more common to deploy enterprise applications utilizing storage servers. These environments have significantly different requirements from the home directory environment. It is common for the emphasis to be on performance, uptime, and backup rather than on flexibility and individual file recovery.

Commonly these environments utilize a block protocol such as iSCSI or FCP, since these mimic DAS more closely than the use of NAS technologies does. However, the advantages and flexibility provided by NAS solutions have increasingly been drawing more attention. Rather than designing to serve individual files, the configuration focuses on LUNs, or on the use of files as if they were LUNs. An example of the latter would be a database application that uses files for its storage instead of LUNs. At its most fundamental, the database application doesn't treat I/O to files any differently than it does I/O to LUNs, allowing the customer to choose the deployment that provides the combination of flexibility and performance that they require.

Enterprise environments are usually deployed with their storage servers clustered. This minimizes the possibility of a service outage caused by a failure of the storage appliance. In clustered environments there is always the opportunity to spread workload across at least two active storage servers, so getting good throughput for the enterprise application is generally not difficult. Of course, this assumes that the application administrator has a good idea of where the workloads are concentrated in the environment, so that beneficial balancing can be accomplished. Clustered environments always have multiple I/O paths available, so it is important to balance the workload across these I/O paths as well as across server heads.

Finally, for mission-critical environments, it is important to plan for the worst-case scenario: running the enterprise when one of the storage servers has failed and the remaining single unit has to carry the entire load. In most circumstances, the mere fact that the enterprise is running in spite of a significant failure is viewed as positive, but there are some situations where the full performance expectation must be met even after a failure. In this case, the storage servers must be sized accordingly.

The use of a small number of files or LUNs to support the enterprise application means that the distribution of the workload is relatively easy to plan and predict.

Backup servers

Protecting and archiving critical corporate data is of increasing interest with each passing day. Deploying servers for this purpose is becoming more common, and these situations call for their own planning guidelines. A backup server generally is not planned for delivering great performance; data center managers rely more on the fact that the backup server is available to receive the backup streams when they are sent. Ofttimes the backup server is an intermediate repository for data before it goes to backup tape and ultimately offsite, but frequently the backup server takes the place of backup tapes altogether. The write throughput of a backup server is frequently the most important factor to consider in planning.
Another important factor is the number of simultaneous backup streams that a single server can handle. The more effective the write throughput and the greater the number of simultaneous threads, the more rapidly backup processes complete and the sooner the production servers are taken out of backup mode and returned to full performance.
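Both factors are easy to observe on a running system. A minimal sketch: the sysstat command prints one line of CPU, network, and disk throughput counters per interval, which can be captured during a backup window to see whether write throughput is the limiting resource (the exact column layout varies by Data ONTAP release):

itsosj-n1> sysstat -x 1

Running this for the duration of a backup window, and watching the disk write and network input columns, shows how close the platform is to its sustained write limits while the backup streams are active.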

Different N series platforms have different capabilities in each of these areas, and the planning process should take these characteristics into account to ensure that the backup server is capable of the workload expected.

3.1.6 Backup and recovery

In addition to backup servers, all storage servers have to be backed up. Generally the goal is to have the backup process occur at a time, and in a way, that minimizes the impact on overall production. That is why it is common to find large numbers of backup processes scheduled to run off-hours. But since all these backups run more or less at the same time, the greatest I/O load put on the storage environment is frequently during these backup activities rather than during normal production.

N series servers have a number of backup mechanisms available. Other sections of this book discuss the use of Snapshot technologies for backup, as well as SnapMirror. Planning the proper use (usually a combination of uses) ahead of time will allow an environment to be deployed that provides maximum protection against failure while at the same time making the most of both the storage and performance capabilities present. Issues to keep in mind include:

- Storage capacity used by Snapshots: how much extra storage needs to be available for Snapshots to consume.
- Networking bandwidth consumed by SnapMirror: in addition to the production storage I/O paths, SnapMirror needs bandwidth to duplicate data to the remote server.
- Number of possible simultaneous SnapMirror threads: how many parallel backup operations can be run at once before some resource runs out? Resources to consider include CPU cycles, network throughput, maximum parallel threads (platform dependent), and the amount of data requiring transfer.
- Frequency of SnapMirror operations: the more frequently data is synchronized, the fewer the changes each time. More frequent operations result in background operations running almost all the time.
- Rate at which stored data is modified: data that is not changing much (e.g., archive repositories) need not be synchronized as often, and each operation takes less time.
- Use and effect of third-party backup facilities (e.g., Tivoli): each third-party backup tool has its unique I/O impacts that need to be accounted for.
- Data synchronization requirements of enterprise applications: certain applications (e.g., DB2, Oracle, Microsoft Exchange) need to be quiesced and flushed prior to performing backup operations to ensure the consistency of the backed-up data images.

3.1.7 Resiliency to failure

Like all data processing equipment, storage devices will fail. Most times the failure is of small, uncritical pieces that have redundancy (disks, networks, fans, power supplies, and so on), which generally has only a small impact (usually none at all) on the production environment. But unforeseen problems can cause rare and infrequent outages of entire storage servers. The most common problems are software problems that occur inside the storage server, or infrastructure errors (DNS, routing tables, and so on) that prevent access to the storage server; if a storage server is running but cannot be accessed, the effect on the enterprise is pretty much the same as if it were completely out of service.

Designing 100% reliable configurations is difficult, time-consuming, and costly. It is likely more effective to strike a compromise that minimizes the likelihood of error while providing a mechanism to get the server back into service as quickly as possible. In other words, accept the fact that failures will occur, but have a plan ready (and practiced) ahead of time to recover from them when they do occur.

Spare servers

Some enterprises keep spare equipment around in case of failure. Generally this is the most expensive solution and is only practical for the largest enterprises. But an often overlooked situation that is similar to this is the installation of new servers. Additional or replacement equipment is almost always being brought into most data environments. Bringing this equipment in a bit early and using it as spare or test equipment is a good idea wherever possible. Storage administrators are given an opportunity to practice new procedures and configurations, as well as to test new software, without having to do so on production equipment.

Software upgrades

Frequently it is the process of upgrading software that causes unexpected outages. A good general rule is that software on storage servers should not be upgraded while they are providing acceptable and reliable service. Server software should be upgraded only when new functionality is needed, or when a known bug that is expected to be encountered is fixed. Upgrade recommendations for Data ONTAP are available on the Web, as are the mechanisms for implementing the upgrade. Be sure to understand the recommendations from the vendor as well as the risks. Use all the available protection tools (Snapshots, mirrors, and so on) to provide a way back in case the upgrade introduces more problems than it solves. And whenever possible, perform incremental unit tests on an upgrade before putting it into critical production.

Testing

Testing was mentioned earlier. Testing of storage configurations is often a task not well performed by most enterprises, due to its cost and time requirements. However, as storage environments become ever more complex and critical, the need for customer-specific testing increases in importance. Customers should work with their storage vendors to determine an appropriate and cost-effective approach to testing various solutions, to ensure that their storage configurations are running optimally. Even more important is that testing of disaster-recovery procedures becomes a regular and ingrained process for everyone involved with storage management.
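For clustered N series configurations, the core disaster-recovery drill can be built around the cf commands shown in Chapter 2. A minimal sketch of a scheduled failover test; run it only during a maintenance window, and on a lab or non-critical system first (node names follow the earlier examples):

itsosj-n1> cf status
itsosj-n1> cf takeover
itsosj-n1(takeover)> cf status
itsosj-n1(takeover)> cf giveback
itsosj-n1> cf status

The cf status checks before, during, and after the drill confirm that the cluster starts healthy, that the surviving node really serves the partner's clients during the takeover, and that the configuration returns to normal after the giveback.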

3.2 Summary

While providing only a high-level set of guidelines for planning, it is expected that consideration of most of the issues discussed here will maximize the likelihood of a successful initial deployment of an N series storage server. Other sources of specific planning templates exist or are under development, and they should be simple to locate using sensible Web search queries. Deploying a network of storage servers is not terribly difficult, and most customers succeed at doing it alone by following these guidelines.

And remember, the N series represents the best in storage appliances, and it is the appliance concept itself that should provide a great deal of confidence from the beginning. Because of the simplicity that appliances provide, if a mistake is made in the initial deployment, corrective actions are generally neither difficult nor terribly disruptive. For many years customers have iterated their storage server environments into ultimately scalable, reliable, and smooth-running configurations. So getting it right the first time is not nearly as critical as it used to be prior to the introduction of storage appliances. If a storage server planner / architect remembers to keep things simple and flexible, it is unlikely that they will deploy an N series server that misses the mark by very much.

Appendix A. Setup Worksheets and Cabling Diagrams

This appendix provides worksheets and cabling diagrams to help you document, install, and set up your N series Filer. In this appendix, the following are described:
- Initial configuration worksheet
- Additional planning and configuration worksheets
- Cabling diagrams

A.1 Planning and Implementation Worksheets

A.2 Single node configuration

The Initial Configuration Worksheet records the network setup and basic configuration entered during the initial setup of the N series Filers.

Initial Setup Worksheet: Single Node Configuration (N3700 A10)

Basic worksheet for the initial setup of a single node configuration. Record a value for each of the following items:
- MAC address (on the back of the N series Filer; used for setup with DHCP / Web interface)
- Hostname
- Password
- Time zone
- Filer location
- Language (for multiprotocol Filers)
- Administration host (hostname, IP address); the entry can later be deleted on the command line by issuing: options admin.hosts
- Do you want to configure Ethernet virtual interfaces? (interface name, number of links, link names); the default is No for most installations

Network configuration, for each of e0a and e0b:
- IP address
- Netmask
- Media type/speed (100tx-fd, 100tx, auto [100/1000])
- Flow control (none, receive, send, full)

- Enable jumbo frames? (Y/N) (MTU size for jumbo frames)
- Default gateway (IP address, router network name)
- HTTP directory location (default: /home/http)
- Would you like to continue setup through the Web interface? (You do this through the Setup Wizard.)
- IP address or name of the administration host (leave blank to allow root user access to /etc from any NFS client)
- Do you want to run a DNS resolver? (domain name; server addresses 1, 2, 3)
- Do you want to run an NIS client? (domain name; server addresses 1, 2, 3)
- WINS servers 1, 2, 3
- Windows 2000 domain admin user
- Windows 2000 domain admin user password
- Active Directory (command line setup only)

Disk and RAID configuration:
- How many disks in one aggregate (disks can be parity, spare, or data)
- Volumes in an aggregate (name, size, options, qtrees, Snapshots, and so on)
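The values recorded on this worksheet are the answers to the prompts of the Filer's setup command, which walks through hostname, network interfaces, DNS/NIS, and the other items above in order. A minimal sketch; the prompt wording below is paraphrased and the answers are hypothetical, so expect the exact questions to differ slightly by Data ONTAP release:

itsosj-n1> setup
Please enter the new hostname []: itsosj-n1
Do you want to configure virtual network interfaces? [n]:
Please enter the IP address for Network Interface e0a []: 9.11.22.33
...

Having the worksheet completed before running setup means the command can be answered in one pass, instead of interrupting it to chase down addresses.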

A.3 Cluster configuration

The Initial Configuration Worksheet records the network setup and basic configuration entered during the initial setup of the N series Filers.

Initial Setup Worksheet: Cluster Configuration (N3700 A20)

Basic setup worksheet for the cluster configuration. Record a value for each of the following items, for both Node 1 and Node 2:
- MAC address (on the back of the N series Filer)
- Hostname
- Password
- Time zone
- Filer location
- Language (for multiprotocol Filers)
- Administration host (hostname, IP address); the entry can later be deleted on the command line by issuing: options admin.hosts
- Do you want to configure Ethernet virtual interfaces? (interface name, number of links, link names); the default is No for most installations

Network configuration, for each of Node 1 e0a/e0b and Node 2 e0a/e0b:
- IP address
- Netmask
- Media type/speed (100tx-fd, 100tx, auto [100/1000])
- Flow control (none, receive, send, full)

- Enable jumbo frames? (Y/N) (MTU size for jumbo frames)
- Default gateway (IP address, router network name)
- HTTP directory location (default: /home/http)
- Would you like to continue setup through the Web interface? (You do this through the Setup Wizard.)
- IP address or name of the administration host (leave blank to allow root user access to /etc from any NFS client)
- Do you want to run a DNS resolver? (domain name; server addresses 1, 2, 3)
- Do you want to run an NIS client? (domain name; server addresses 1, 2, 3)
- WINS servers 1, 2, 3
- Windows 2000 domain admin user
- Windows 2000 domain admin user password
- Active Directory (command line setup only)

RMC and RLM for the N5200 and N5500

RMC (Remote Management Controller; enhanced AutoSupport, not for the N3700). Record for Node 1 and Node 2:
- MAC address
- IP address
- Network mask (subnet mask)
- Gateway
- Media type
- Mailhost

RLM (Remote LAN Module; the management card for the N5200/N5500). Record for Node 1 and Node 2:

- MAC address
- IP address
- Network mask (subnet mask)
- Gateway
- AutoSupport mailhost
- AutoSupport recipient(s)

Disk Ownership Worksheet

Table 0-1   Disk ownership worksheet

For each of the six disk shelves, record for every bay number which node (Node A or Node B) owns the disk in that bay.

An X marks the node that owns the disk; a dash (-) means no disk is in place. The disks in bays 0 and 1 (shelves 1 and 2) are SES disks.

Matching parameters in clustered environments

Table 0-2   Matching parameters on cluster configurations

Parameter      Command          Node 1    Node 2
ARP table      arp -a
date           date
nfs            nfs status
route table    route -s
routed         routed status
timezone       timezone
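To fill in Table 0-2, run each listed command on both nodes and compare the outputs; the values should match so that clients see consistent behavior before, during, and after a takeover. A minimal sketch for the time zone row (the same pattern applies to arp -a, date, nfs status, route -s, and routed status):

itsosj-n1> timezone
itsosj-n2> timezone

Record both outputs in the Node 1 and Node 2 columns and reconcile any differences before putting the cluster into production.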

A.4 Cabling

Cabling the N3700 Model A10 to EXP600s

Figure 0-1   Cabling the N3700 Model A10 to EXP600 shelves
(Legend: 1 = cabling one EXP600; 2 = cabling two EXP600; 3 = cabling three EXP600. Each EXP600 shelf, with shelf IDs 1 through 4, connects through the In/Out ports of its Module A and Module B to the N3700 A10 node.)

Cabling the N3700 Model A20 to EXP600s

Figure 0-2   Cabling the N3700 Model A20 to EXP600 shelves
(Legend: 1 = cabling one EXP600; 2 = cabling two EXP600; 3 = cabling three EXP600. Each EXP600 shelf, with shelf IDs 1 through 4, connects through the In/Out ports of its Module A and Module B to Node A and Node B of the N3700 A20.)


Appendix B. LAN Basics

B.1 Local Area Networks

LAN designs are typically based on open systems networking concepts, as described in the network model of the Open Systems Interconnection (OSI) standards of the International Standards Organization (ISO). The OSI model is shown in detail in Figure 0-6 on page 13.

LAN types are defined by their topology, which is simply how the nodes on the network are physically connected together. A LAN may rely on a single topology throughout the entire network, but typically it has a combination of topologies connected using additional hardware. The primary topologies defined for Local Area Networks are:

Bus topology. In a bus topology, all nodes are connected to a central cable, called the bus or backbone. Bus networks are relatively inexpensive and easy to install. Ethernet systems use a bus topology (Figure 0-3).

Figure 0-3   Bus topology

Ring topology. Nodes in a ring topology are connected via a closed loop such that each node has two other nodes connected directly on either side of it. Ring topologies are more costly and can be difficult to install. The IBM Token Ring uses a ring topology (Figure 0-4 on page 12).

Figure 0-4 Ring topology (diagram: four nodes connected in a closed loop)

Star topology: A star topology uses a centralized hub to connect the nodes in the network together. Star networks are easy to install and manage. However, bottlenecks can occur, since all of the network traffic travels through the hub. Ethernet systems also use a star topology.

Figure 0-5 Star topology (diagram: five nodes attached to a central hub)

Today, Ethernet topologies are predominant. International Data Corporation (IDC) estimates that more than 85% of all installed network connections worldwide are Ethernet. It is popular due to its simplicity, affordability, scalability, and manageability. Ethernet includes definitions of protocols for addressing, formatting, and sequencing of data transmissions across the network, and also describes the physical media (cables) used for the network.
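To make the three topologies concrete, the short sketch below represents each one as an adjacency list, the usual way network graphs are modeled in software. The node names are illustrative only.

Example: Modeling the three primary LAN topologies as adjacency lists (Python)

def bus(nodes, backbone="bus"):
    # All nodes attach to a single shared backbone segment.
    topo = {backbone: list(nodes)}
    topo.update({node: [backbone] for node in nodes})
    return topo

def ring(nodes):
    # Each node connects to the node on either side of it in a closed loop.
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}

def star(nodes, hub="hub"):
    # Every node connects only to the central hub.
    topo = {hub: list(nodes)}
    topo.update({node: [hub] for node in nodes})
    return topo

nodes = ["node1", "node2", "node3", "node4"]
for name, topo in [("bus", bus(nodes)), ("ring", ring(nodes)), ("star", star(nodes))]:
    print(name, topo)

The star topology's bottleneck is visible directly in its adjacency list: every path between two nodes passes through the hub.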

Open Systems Interconnection (OSI) model

The Open Systems Interconnection (OSI) model describes the layers in the network required for communication between computers. OSI is a seven-layer model, illustrated alongside the Internet protocol suite (or stack) in Figure 0-6 on page 13. Each layer is responsible for a certain set of tasks associated with moving data across the network. Most Ethernet networks (including ours) communicate using the TCP/IP protocol. In this section, we discuss TCP/IP and how it relates to the OSI model, since it is the default communication protocol for the System Storage N series.

Figure 0-6 Comparing the Internet protocol suite with the OSI reference model (diagram: OSI layers 7-5, Application/Presentation/Session, map to the suite's Application layer; layer 4, Transport, maps to TCP and UDP; layer 3, Network, maps to IP; layers 2-1, Data Link/Physical, map to the device driver and hardware, or subnet, layer)

Device driver and hardware layer

Also called the subnet layer, the device driver and hardware layer comprises both the physical and data link layers of the OSI model. It is the hardware that is part of each node on the network. The hardware handles the electrical and mechanical aspects of data transfers, moving the bits across a physical link. The data link layer packages packets of data into frames, ensures that they arrive safely at the target destination, and encompasses error detection and correction.

Internet Protocol layer

In the OSI model, the network layer finds the best route through the network to the target destination. It has little to do in a single discrete LAN; but in a larger network with subnets, or access to WANs, the network layer works with the various routers, bridges, switches, gateways, and software to find the best route for data packets.

The Internet Protocol (IP) layer in the Internet protocol suite performs the functions of the network layer. It is the common thread running through the Internet and most LAN technologies, including Ethernet. It is responsible for moving data from one host to another, using various routing algorithms. Layers above the network layer break a data stream into chunks of a predetermined size, known as packets or datagrams. The datagrams are then sequentially passed to the IP layer, whose job is to route these packets to the target destination. IP packets consist of an IP header, together with the higher-level TCP protocol and the application datagram. IP knows nothing about the TCP and datagram contents. Prior to transmitting data, the network layer might further subdivide it into smaller packets for ease of transmission. When all the pieces finally reach the destination, they are reassembled by the network layer into the original datagram.
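The split-and-reassemble behavior is easy to demonstrate in isolation. The following is a minimal sketch of the idea only: the 1,480-byte chunk size is illustrative (an assumed 1,500-byte Ethernet MTU minus a 20-byte IP header), and real IP fragmentation also carries offsets and flags in the header, which this sketch omits.

Example: Splitting a datagram into packets and reassembling it (Python)

MTU_PAYLOAD = 1480  # illustrative per-packet payload size

def fragment(datagram, size=MTU_PAYLOAD):
    # Split a datagram into sequential chunks of at most `size` bytes.
    return [datagram[i:i + size] for i in range(0, len(datagram), size)]

def reassemble(packets):
    # Concatenate the chunks back into the original datagram.
    return b"".join(packets)

data = b"x" * 4000
packets = fragment(data)
assert reassemble(packets) == data
print(f"{len(data)} bytes sent as {len(packets)} packets")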

IP connectionless service

IP is the standard that defines the manner in which the network layers of two hosts interact. These hosts may be on the same network or reside on physically remote, heterogeneous networks. IP was designed with internetworking in mind. It provides a connectionless, best-effort packet delivery service. Its service is called connectionless because it is like the postal service rather than the telephone system. IP packets, like telegrams or mail, are treated independently. Each packet is stamped with the addresses of the receiver and the sender, and routing decisions are made on a packet-by-packet basis. On the other hand, connection-oriented, circuit-switched telephone systems explicitly establish a connection between two users before any conversation takes place, and they maintain the connection for the entire duration of the conversation.

A best-effort delivery service means that packets might be discarded during transmission, but not without a good reason. Erratic packet delivery is normally caused by the exhaustion of resources, or a failure at the data link or physical layer. In a highly reliable physical system such as an Ethernet LAN, the best-effort approach of IP is sufficient for transmission of large volumes of information. However, in geographically distributed networks, especially the Internet, IP delivery is insufficient. It needs to be augmented by the higher-level TCP protocol to provide satisfactory service.

The IP packet

All IP packets or datagrams consist of a header section and a data section (payload). The payload may be traditional computer data, or, commonly today, digitized voice or video traffic. Using the postal service analogy again, the header of the IP packet can be compared with the envelope and the payload with the letter inside it. Just as the envelope holds the address and information necessary to direct the letter to the desired destination, the header helps in the routing of IP packets. An IP packet has a maximum size of 65,535 bytes, including the header. The payload may also carry error or control protocols, like the Internet Control Message Protocol (ICMP). To illustrate control protocols, suppose that the postal service fails to find the destination on your letter. It would be necessary to send you a message indicating that the recipient's address was incorrect. This message would reach you through the same postal system that tried to deliver your letter. ICMP works the same way: it packs control and error messages inside IP packets.

IP addressing

An IP packet contains a source and a destination address. The source address designates the originating node's interface to the network, and the destination address specifies the interface for an intended recipient or multiple recipients (for broadcasting). Every host and router on the wider network has an address that uniquely identifies it and also denotes the sub-network on which it resides. No two machines can have the same IP address. To avoid addressing conflicts, the network numbers are assigned by an independent body. The network part of the address is common for all machines on a local network. It is similar to a postal code, or zip code, that is used by a post office to route letters to a general area. The rest of the address on the letter (that is, the street and house number) is relevant only within that area; it is used by the local post office to deliver the letter to its final destination.
The host part of the IP address performs a similar function. The host part of an IP address can be split further into a sub-network address and a host address.
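The network/host split can be seen directly with Python's standard ipaddress module. The address and prefix length below are illustrative only.

Example: Splitting an IPv4 address into network and host parts (Python)

import ipaddress

iface = ipaddress.ip_interface("192.168.10.42/24")

print("address:     ", iface.ip)                        # 192.168.10.42
print("network part:", iface.network.network_address)   # 192.168.10.0
print("netmask:     ", iface.netmask)                   # 255.255.255.0
# The host part is whatever remains after masking off the network bits.
print("host part:   ", int(iface.ip) & int(iface.hostmask))  # 42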

Time to Live (TTL)

The IP packet header also includes Time to Live (TTL) information that is used to limit the life of the packet on the network. It includes a counter that is decremented each time the packet arrives at a routing step. If the counter reaches zero, the packet is discarded.

The transport layer

The transport layer is responsible for ensuring delivery of the data to the target destination, in the correct format in which it was sent. The Transmission Control Protocol (TCP) in this layer is also responsible for delivering the sequence of packets in the correct order. In the Internet protocol suite, the protocols operating in the transport layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

UDP is basically an application interface to IP. It adds no reliability, flow control, or error recovery to IP. It simply serves as a multiplexer/demultiplexer for sending and receiving datagrams. TCP provides considerably more facilities for applications than UDP, notably error recovery, flow control, and reliability. TCP is a connection-oriented protocol, which means that it is reliable, unlike UDP, which is connectionless and therefore called an unreliable service. Most user application protocols, such as Telnet and FTP, use TCP. The advantage of UDP is that it has minimal overhead, which makes it usable for fast requests like name service lookups or streaming of media data, which need no acknowledgements.

The application data has no meaning to the transport layer. On the source node, the transport layer receives data from the application layer and splits it into data packets or chunks. The chunks are then passed to the network layer. At the destination node, the transport layer receives these data packets and, if TCP is used, reassembles them before passing them to the appropriate process or application.

The transport layer is the first end-to-end layer of the TCP/IP stack. This means that the transport layer of the source host can communicate directly with its peer on the destination host, without concern about how data is moved between them; those matters are handled by the network layer. The layers below the transport layer understand and carry information required for moving data across links and subnetworks. In contrast, at the transport layer or above, one node can specify details that are only relevant to its peer layer on another node. For example, it is the job of the transport layer to identify the exact application to which data is to be handed over at the remote end. This detail is irrelevant for any intermediate router, but it is essential information for the transport layers at both ends.

Application layer

The functions of the session, presentation, and application layers of the OSI model are all combined in the application layer of the Internet protocol suite. It encompasses initial logon, security, final termination of the session, interpretation services (compression, encryption, or formatting), and delivery of the network messages to the end user program. The application layer is the layer with which end users normally interact. It is responsible for formatting the data so that its peers can understand it. Whereas the lower three layers are usually implemented as part of the OS, the application layer is a user process. Application-level protocols that are included in most TCP/IP implementations include:

- Telnet for remote login
- File Transfer Protocol (FTP) for file transfer
- Simple Mail Transfer Protocol (SMTP) for mail transfer
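Returning to the transport layer for a moment, the contrast between connection-oriented TCP and connectionless UDP is visible in the standard socket API itself. The sketch below runs two tiny echo servers on the loopback interface; the port numbers are arbitrary, and the servers exist only so the client calls have something to talk to.

Example: Contrasting TCP and UDP with sockets (Python)

import socket
import threading
import time

def udp_echo_server(port):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", port))
        data, addr = s.recvfrom(1024)  # no connection: datagrams just arrive
        s.sendto(data, addr)

def tcp_echo_server(port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", port))
        s.listen(1)
        conn, _ = s.accept()           # a connection is established before any data
        with conn:
            conn.sendall(conn.recv(1024))

threading.Thread(target=udp_echo_server, args=(50007,), daemon=True).start()
threading.Thread(target=tcp_echo_server, args=(50008,), daemon=True).start()
time.sleep(0.2)  # give the servers a moment to bind

# UDP: just send; delivery is best-effort and unacknowledged by the protocol.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as u:
    u.sendto(b"ping", ("127.0.0.1", 50007))
    print("UDP echo:", u.recvfrom(1024)[0])

# TCP: connect first; the stack handles ordering, acknowledgement, retransmission.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as t:
    t.connect(("127.0.0.1", 50008))
    t.sendall(b"ping")
    print("TCP echo:", t.recv(1024))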

Protocol suites

A protocol suite (or protocol stack), as we saw in the Internet protocol suite, is organized so that the highest level of abstraction resides at the top layer. For example, the highest layer may deal with streaming audio or video frames, whereas the lowest layer deals with raw voltages or radio signals. Every layer in a suite builds upon the services provided by the layer immediately below it.

Note: You may see the different terms Internet protocol suite, TCP/IP suite, or TCP/IP stack. These are simply names for the same thing: the group of network layers that describes how two nodes on the Internet communicate.

The terms protocol and service are often confused. A protocol defines the exchange that takes place between identical layers of two hosts. For example, in the IP suite, the transport layer of one host talks to the transport layer of another host using the TCP protocol. A service, on the other hand, is the set of functions that a layer delivers to the layer above it. For example, the TCP layer provides a reliable byte-stream service to the application layer above it.

Each layer adds a header containing layer-specific information to the data packet. A header for the network layer might include information such as source and destination addresses. The process of appending headers to the data is called encapsulation. Figure 0-7 shows how data is encapsulated by various headers. During de-encapsulation the reverse occurs: the layers of the receiving stack extract layer-specific information and process the encapsulated data accordingly. The process of encapsulation and de-encapsulation increases the overhead involved in transmitting data.

Figure 0-7 Layering and encapsulation (diagram: the application datagram is wrapped in a TCP header to form a TCP segment, then in an IP header to form an IP packet, then in a subnet header and trailer to form a subnetwork frame)
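Encapsulation can be sketched in a few lines of code. The header layouts below are simplified stand-ins, not the real TCP/IP wire formats; they exist only to show each layer prepending its own header to the unit handed down from the layer above.

Example: Layering and encapsulation with simplified headers (Python)

import struct

app_data = b"GET /index.html"

# Transport layer: prepend a simplified TCP-style header
# (source port, destination port, sequence number).
tcp_segment = struct.pack("!HHI", 49152, 80, 1) + app_data

# Network layer: prepend a simplified IP-style header
# (TTL, protocol number, source address, destination address).
src, dst = 0xC0A80A2A, 0xC0A80A01  # 192.168.10.42 -> 192.168.10.1
ip_packet = struct.pack("!BBII", 64, 6, src, dst) + tcp_segment

# Subnet layer: frame the packet (here, a 2-byte type field and a 4-byte trailer).
frame = struct.pack("!H", 0x0800) + ip_packet + b"\x00\x00\x00\x00"

for name, unit in [("application", app_data), ("TCP segment", tcp_segment),
                   ("IP packet", ip_packet), ("frame", frame)]:
    print(f"{name:12} {len(unit):3} bytes")

The growing byte counts printed at the end are exactly the encapsulation overhead the text describes.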

Appendix C. Additional material

This redbook refers to additional material that can be downloaded from the Internet as described below.

Locating the Web material

The Web material associated with this redbook is available in softcopy on the Internet from the IBM Redbooks Web server. Point your Web browser to:

ftp://

Alternatively, you can go to the IBM Redbooks Web site at:

ibm.com/redbooks

Select Additional materials and open the directory that corresponds with the redbook form number, SG24????.

Using the Web material

The additional Web material that accompanies this redbook includes the following files:

File name         Description
????????.zip      Zipped code samples
????????.zip      Zipped HTML documents
????????.zip      Zipped presentations

System requirements for downloading the Web material

The following system configuration is recommended:

Hard disk space:    ???? MB minimum
Operating system:   Windows/Linux
Processor:          ???? or higher
Memory:             ???? MB

How to use the Web material

Create a subdirectory (folder) on your workstation, and unzip the contents of the Web material zip file into this folder.

Abbreviations and acronyms

ACE, ACL, AD, AIX, ARP, ASCII, ATA, BDC, CAD/CAM, CIFS, CLI, CPU, CRM, CRU, DAS, DFS, DHCP, DNS, DP, ECAD, ERP, ESCON, ESH, EXP, FAT, FC-AL, FCP, FICON, FSID, FTP, GB, GPO, GUI, HA, HIPAA, HTML, HTTP, I/O, ICMP, ID, IDC, IETF, IP, iSCSI, ISO, IT, KB, KDC, LAN, LDAP, LED, LUN, MAC, MB, MCAD, MIPS, MMC, MTTDL, NAS, NDMP, NFS, NIS, NLM, NT, NTFS, NTLM, NVRAM, ONTAP, OS, OSI, OU, PC, PCNFSD, PDC, RAID, RAID-DP, RAM, RFC, SAN, SAP, SCSI, SES, SFU, SID, SMB, SMP, SMTP, SNIA, SNMP, SQL, SSH, TCO, TCP, TCP/IP, TFTP, TTL, UDP, UFS, URL, VFM, VLD, VSS, WAFL, WAN, WINS, WORM, XOR

IBM     International Business Machines Corporation
ITSO    International Technical Support Organization

Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information on ordering these publications, see "How to get IBM Redbooks" on page 26. Note that some of the documents referenced here may be available in softcopy only.

Other publications

These Microsoft Knowledge Base articles are also relevant as further information sources:

- Domain Users Cannot Join Workstation or Server to a Domain
- Enhanced Security Joining or Resetting Machine Account in Windows 2000 Domain
- Time Synchronization
- What is Microsoft Active Directory?
- Windows 2000 Advanced Server Documentation

Third-party references

- Precreating Computer Objects to Join Active Directory
- Understanding Active Directory
- Active Directory "Cookbook"

How to get IBM Redbooks

You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications, and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:

ibm.com/redbooks

Help from IBM

IBM Support and downloads: ibm.com/support

IBM Global Services: ibm.com/services

Index

A
Application layer 15

B
Block I/O 21, 25
bus topology 11

C
CIFS 20-21, 26
Common Internet File System
connectionless service 14
connectivity 26

D
data integrity 27
datagram 14

F
File I/O 21, 25
file servers 23
file sharing 26
file system 20

I
I/O 20
IETF 28
Internet Engineering Task Force 28
Internet Protocol 13
IP 13
IP address 14
IP packet 14

L
LAN 20
LAN bandwidth 27
Local Area Network 20

N
NAS 23
  benefits 25
  enhanced backup 26
  File I/O 25
  IBM TotalStorage NAS 24
  manageability 26
Network Attached Storage 23
network file system protocols 20
Network layer 13
NFS 20, 26

O
Open Systems Interconnection 12
oplocks 21
OSI 12
  compared to TCP/IP 12
  model 12

P
packet 14
payload 14
performance 27
Presentation layer 15
protocol stack 16
protocol suite 16
protocols 20

R
Redbooks Web site 26
  Contact us xxv
resource pooling 25

S
scalability 26
Session 15
SNIA 28
star topology 12
Storage Networking Industry Association 28
Subnet layer 13

T
TCP 15
TCP/IP
  addressing 14
  application layer 15
  device driver and hardware layer 13
  Internet Protocol layer 13
  IP addressing 14
  IP connectionless service 14
  packet 14
  protocol suites 20
  Subnet layer 13
  TCP layer 15
  time to live 15
thin server 24
topology
  bus 11
  ring 12
  star 12
total cost of ownership 27
Transmission Control Protocol 15
TTL 15


Back cover

The IBM TotalStorage NAS N Series

Fundamentals of Data ONTAP
Snapshot Explained
IBM Network Attached Storage

This redbook is an easy-to-follow guide that describes the market segment at which the IBM Network Attached Storage is aimed, and explains IBM N series installation, ease of use, remote management, high availability (clustering), and backup and recovery techniques, as well as software such as FlexVol. It explains cross-platform storage concepts and methodologies for common data sharing in Linux/UNIX/AIX and Windows NT/2000/XP/2003 environments.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks
