White Paper

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image Extension to ArcGIS Server for Geographic Information Systems

By Everett Dolgner
April 2010
Hitachi Data Systems
Table of Contents

Executive Summary
Problem Overview
Solution Overview
Solution Testing Details
    Test Environment
    Test Process
    Test Results
Solution Details
    Hitachi NAS Platform
    Hitachi Content Platform
Solution Benefits
    Data Protection and Disaster Recovery
    Reduce Backup Size and Time, Faster Recovery
    Maintain Historic Data Online
    Quickly Expand Capacity
    Global Namespace
    Based on Industry Standards
Architectures
    Overview
    Architecture 1: Adaptable Modular Storage-based
    Architecture 2: Universal Storage Platform-based
Solution Recommendations
Integration Best Practices
Conclusion
Appendix A
    Additional Resources
Executive Summary

For a Geographic Information System (GIS) implementation to be successful, servers, storage, software and data must all be integrated into a reliable and scalable system. As individuals and organizations use GIS data more heavily, stress is placed on the back end storage systems that contain the data and the servers that process it. Additionally, as the size of the data grows, the management burden that is placed on system administrators increases. Provisioning new storage for the growing data is often difficult and time consuming and typically results in islands of data. Backup becomes more difficult, if not impossible, as historical data is maintained online for longer periods of time and more views are created for users.

Hitachi Data Systems has worked with ESRI to develop and test reference architectures that provide scalability, reliability and performance for any GIS deployment. The purpose of this paper is to outline a joint Hitachi Data Systems and ESRI solution, provide information on the testing process, and provide sample architectures and use cases.
Problem Overview

As Geographic Information System (GIS) data grows, image storage and processing become more of a management issue. Data sets are steadily growing due to increased image resolution and images being kept online for longer periods of time. To address the issue of growing data sets and longer online storage for historic data, Hitachi Data Systems and ESRI collaborated to test an ESRI-based image repository with the Hitachi NAS Platform, powered by BlueArc, and the Hitachi Content Platform. Used together, the Hitachi NAS Platform and Hitachi Content Platform provide a highly scalable and robust solution for long term image storage.

In the past, GIS deployments have been completed with direct attached storage (DAS) or via a storage area network (SAN). While DAS is good for small deployments, issues become more apparent as capacity and data are scaled and additional application servers are added. DAS quickly becomes unmanageable and unsustainable. For larger deployments, SAN storage provides centralized management, higher utilization and larger capacities, but issues still remain with the distribution of data across application servers.

When data is stored locally on an application server, via SAN or DAS, it must either be copied to multiple application servers or shared from a single application server. Creating multiple copies of the data for distribution is an inefficient use of storage capacity and can be very time consuming and labor intensive. Sharing the data from an application server creates a performance and management burden. The application server that is sharing data is subject to performance degradation and is difficult to scale. As additional application servers are brought in to address the growing capacity requirements, islands of data are created, contributing to the management burden. All of these problems are exacerbated by the need to keep data online for longer periods of time.
This historic data is becoming more important as Internet-enabled GIS applications are deployed. Users expect data to be available on demand and in near real time. Traditionally, historic data is stored on tape or optical systems. Both systems have challenges with the storage and maintenance of media as well as retrieval times. If data is needed and the media is not in the library, then the media must be retrieved and loaded. Once the media is loaded it must be mounted; the data can then be read and made available to the application and user. Depending on the size of the data required, retrieval could take hours or days.

The solution to the scalability, management and availability problems is to deploy a multitier file services environment based on the Hitachi NAS Platform and the Hitachi Content Platform. The Hitachi NAS Platform replaces application servers that share image data with dedicated, high performance network attached storage that is available to multiple application servers, eliminating data islands and the need to copy data to multiple application servers. The Hitachi Content Platform is a highly scalable active archive that enables historic data to be maintained online and readily available for long periods of time without requiring active management and backup.

Solution Overview

The ESRI Image Extension for ArcGIS Server provides the ability to quickly serve very large volumes of imagery to multiple Web applications directly from the source imagery. The dynamic "mosaicing" and on-the-fly processing capability of the server enables extremely large image libraries to be accessed as a single virtual image, with users being able to control the order of the imagery and
other aspects of image processing. Such image server technology makes these large and valuable image libraries accessible but requires that the imagery is quickly accessible by the servers.

The Hitachi NAS Platform uses industry standard protocols, CIFS and NFS, for image storage and retrieval; because of this, integration into an existing environment is easily accomplished. End users and applications can both utilize the NAS Platform for image storage, retrieval and processing. For storage administrators, the NAS Platform reduces the management burden when compared to traditional file servers. Data protection, including backup, restore and replication, is easier to implement and takes less time to manage while also providing more advanced features. Data is easily replicated to remote sites for high availability and disaster recovery while point-in-time snapshots enable rapid recovery locally and allow cloned data sets to quickly be deployed for development and testing.

The Hitachi NAS Platform provides storage for active image data that is being retrieved and processed by users and applications. The Hitachi NAS Platform utilizes Hitachi Adaptable Modular Storage and the Hitachi Universal Storage Platform V for back end SAN storage. Both the Adaptable Modular Storage and the Universal Storage Platform V can provide multiple tiers of storage within a single installation. The Universal Storage Platform V can utilize Flash, 15,000RPM Fibre Channel, 10,000RPM Fibre Channel and 7,200RPM SATA drives in a single chassis, while the Adaptable Modular Storage can utilize 15,000RPM SAS, 10,000RPM SAS and 7,200RPM SATA drives in a single chassis.

During the loading and processing of new data, performance is crucial. As image data ages, it will be less frequently accessed by end users and applications, requiring less performance, but it still needs to be readily available.
Because of this, it is important to store aging and historic data on an appropriate tier of storage, reducing the cost of storage and management. The Hitachi NAS Platform has the ability to tier image data automatically between internal and external volumes, transparently to end users and applications, as shown in Figure 1. When image data is first stored and processed, it can be stored on high speed Fibre Channel or Serial Attached SCSI (SAS) disk. Both Fibre Channel and SAS disk provide high performance. Once the data has been processed and is only being retrieved by users and applications it can be tiered to Serial ATA (SATA) disk. SATA provides high capacity at a lower cost than Fibre Channel or SAS with acceptable performance.

Figure 1. File Tiering with the Hitachi NAS Platform and Hitachi Content Platform
With most image data sets, tiering from SAS or Fibre Channel to SATA is an easy way to manage capacity growth and cost. When historic data is introduced, administration and cost can quickly become unmanageable again due to the sheer volume of data. For large amounts of historic data, simply tiering to SATA is not enough. Data sets that are hundreds of terabytes, or even petabytes, require a different approach to storage, retrieval and management. Finding and managing data becomes difficult and expensive, and backup becomes unwieldy or impossible.

The Hitachi Content Platform is a clustered object store that is specifically designed for large, read-only data sets. With the ability to scale to multiple petabytes, the Content Platform uses advanced policies to guarantee the authenticity of data and, since data on the Content Platform is authenticated and replicated, it does not require backup. The Hitachi NAS Platform can tier data to the Hitachi Content Platform based on administrator defined policies, allowing data to be migrated not only to an appropriate tier of storage but also to the appropriate storage type as it ages.

Tiering is a transparent process for both end users and applications. When data is tiered, a link or stub will remain in the original location and provide access to the original data. When a user or application accesses the link, the data is read from the lower tier and delivered to the user or application transparently (see Figure 2). Administrators also have the option of recalling data from a lower tier to its original location. For example, file A has not been accessed for 30 days and is migrated to tier 2 automatically. 120 days later file A will be used to generate new images and is recalled by the administrator to tier 1. After file A is moved to tier 1 the aging process starts over, and the file will again be migrated to tier 2 once the policy threshold is reached. For this example tier 2 can be SATA disks in the same storage frame, a separate storage frame or the Hitachi Content Platform.
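
The file A example can be sketched in a few lines of Python. This is only an illustration of the aging logic described above, not the NAS Platform's policy engine or its configuration syntax: the `FileRecord` type, the `apply_policy` and `recall` functions and the `MIGRATE_AFTER` constant are hypothetical names, and the 30-day threshold is simply the value used in the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold taken from the 30-day example in the text; real
# policies are defined by the administrator on the NAS Platform.
MIGRATE_AFTER = timedelta(days=30)


@dataclass
class FileRecord:
    name: str
    tier: int              # 1 = primary tier, 2 = lower tier
    last_access: datetime  # for a recalled file, the time of the recall


def apply_policy(record: FileRecord, now: datetime) -> FileRecord:
    """Migrate a tier 1 file to tier 2 once it has been idle long enough."""
    if record.tier == 1 and now - record.last_access >= MIGRATE_AFTER:
        return FileRecord(record.name, 2, record.last_access)
    return record


def recall(record: FileRecord, now: datetime) -> FileRecord:
    """Administrator recall: return the file to tier 1 and restart aging."""
    return FileRecord(record.name, 1, now)
```

In this sketch a file created on day 0 stays on tier 1 through day 29, moves to tier 2 on day 30, returns to tier 1 when recalled on day 150, and becomes eligible for migration again 30 days after the recall.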
It is also possible to directly connect the Image Extension to ArcGIS Server to a SAN solution using Hitachi Adaptable Modular Storage or Universal Storage Platform V, both of which provide block storage over Fibre Channel or iSCSI. A combined solution with SAN storage, Hitachi NAS Platform and Hitachi Content Platform is possible and easily deployed. The SAN storage will be used for new data processing and initial image storage. Once the image data has been processed it can easily be migrated to the Hitachi NAS Platform if required for user and application access. The Adaptable Modular Storage and Universal Storage Platform can also be used to provide storage for the database requirements of a GIS deployment.
Figure 2. Disk Cost, Performance and Capacity

Solution Testing Details

Test Environment

To validate ESRI software with the Hitachi NAS Platform and the Hitachi Content Platform, detailed testing was performed at the Hitachi Solution Integration Lab. During the test, SAN storage was provided by a Hitachi Adaptable Modular Storage 2300 with SAS and SATA disk drives. The Adaptable Modular Storage was connected to the Hitachi NAS Platform, the Image Extension to ArcGIS Server and the Hitachi Content Platform (see Figure 3).

Tests were performed with a base 45GB of source image data and included building three layers of views at different resolutions. After the layers were built, a load test was performed, simulating users requesting random images for 55 minutes. Each test was performed three times to account for deviations.
Figure 3. Test Environments

Test Process

1. Baseline test of Image Extension to ArcGIS Server with a SAN-based SAS volume
2. Test Image Extension to ArcGIS Server with a SAN-based SATA volume
3. Connect Image Extension to ArcGIS Server to the Hitachi NAS Platform
4. Test Image Extension to ArcGIS Server with a Hitachi NAS Platform SAS volume
5. Test Image Extension to ArcGIS Server with a Hitachi NAS Platform SATA volume
6. Add the Hitachi Content Platform as an archive tier for the Hitachi NAS Platform and tier base image data to the Content Platform
7. Test Image Extension to ArcGIS Server with a Hitachi NAS Platform SAS volume and Hitachi Content Platform
8. Test Image Extension to ArcGIS Server with a Hitachi NAS Platform SATA volume and Hitachi Content Platform
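
The measured steps in the list above (1, 2, 4, 5, 7 and 8) form a small matrix of access path, drive type and archive tier; steps 3 and 6 are setup steps. As a rough sketch, the matrix can be enumerated as follows. The function name, tuple layout and constant are hypothetical and are not part of any Hitachi or ESRI test tooling; the run count comes from the Test Environment description.

```python
from itertools import product

RUNS_PER_TEST = 3  # each test was performed three times


def test_matrix():
    """Return measured configurations as (access_path, drive_type, archive_tier)."""
    configs = []
    # Steps 1 and 2: SAN-attached volumes, no archive tier.
    for drive in ("SAS", "SATA"):
        configs.append(("SAN", drive, False))
    # Steps 4 and 5 (no archive tier), then steps 7 and 8 (with Content Platform).
    for archive, drive in product((False, True), ("SAS", "SATA")):
        configs.append(("NAS", drive, archive))
    return configs


if __name__ == "__main__":
    for path, drive, archive in test_matrix():
        tier = "with Content Platform" if archive else "no archive tier"
        print(f"{path} {drive} volume, {tier}: {RUNS_PER_TEST} runs")
```

Laid out this way, the matrix holds six configurations and 18 runs in total.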
Test Results

All test scenarios provided access to all of the image data and were completed successfully. During the testing, the delta in read performance for SAS and SATA volumes was negligible, with the number of simultaneous transactions and response times within 0.1 percent. The image build tests also had a very low delta between SAS and SATA drives, within 10 percent. This can be attributed to the tests being performed with a single instance of Image Extension to ArcGIS Server. Testing with archived image data did show an increase in response time and a decrease in transactions. This is expected and does not affect the images stored directly on the Hitachi NAS Platform. In most cases, this additional delay is acceptable for archived data as it is accessed infrequently.

The current CIFS requirements of the Image Extension to ArcGIS Server work best when connected to the Hitachi NAS Platform. The Hitachi NAS Platform is integrated with the Hitachi Content Platform to provide archiving. The combined solution provides a complete architecture for storage of and access to active and historic image data.

For the highest possible performance, block disk over Fibre Channel should be used with the Adaptable Modular Storage or Universal Storage Platform V. Fibre Channel yielded the shortest build time for image tiles and the fastest access time.

When connecting the Image Extension to ArcGIS Server to the Hitachi NAS Platform, UNC path names must be used instead of mapped drive letters. For example, use \\HNAS_Name\Share_Name\Directory_Name instead of G:\Directory_Name.

Solution Details

Hitachi NAS Platform

The Hitachi NAS Platform, powered by BlueArc, is an advanced and integrated family of network attached storage (NAS) systems. It is a powerful tool for file sharing or file server consolidation, data protection and general purpose NAS workloads.
Intelligent File Tiering enables policy-based Hierarchical Storage Management (HSM) within the Hitachi NAS Platform across Fibre Channel, SAS and Serial ATA (SATA) and to the Hitachi Content Platform. Rolling upgrades and faster failover times reduce planned and unplanned downtime. Hardware accelerated network storage enables up to 1.6GB/sec throughput for sequential workloads and up to 200,000 IOPS per node. Expandability supports up to 4PB of capacity and file systems up to 256TB when used with Hitachi Adaptable Modular Storage 2000 and Universal Storage Platform V storage. Active-active clustering provides up to eight nodes with single namespace for simplified data management. Integration with Hitachi Storage Management software allows a single console view. It enables block- and file-level replication. It supports up to 16 million objects per directory.
Hitachi Content Platform

The Hitachi Content Platform is a highly scalable and reliable storage system designed for large amounts of unstructured content; it is ideally suited to the large image sets used by ESRI customers. The Content Platform is fully integrated with the Hitachi NAS Platform to provide transparent data archiving. Data that has not been accessed in a specified amount of time can be archived from the NAS Platform to the Content Platform. Archiving old data reduces the size of data sets on the primary storage, the NAS Platform. Data is still readily available should a user need access to it and is read transparently through the NAS Platform from the Content Platform. The Content Platform also provides a "write once, read many" (WORM) file system and data verification to guarantee that files are identical to when they were written to the archive.

The Content Platform scales by simply adding additional nodes to the cluster. Volumes do not have to be modified and users and applications are unaware that any changes have been made.

The Hitachi Content Platform provides:

- WORM file system and time-based retention at the object level
- Authenticated content preservation with a user choice of digital signature or hash algorithms
- Optional automation of object-level remote replication
- Custom metadata support
- Standards-based interfaces
- Storage optimization with object-based duplicate data elimination and object-level compression
- Scalability to 40PB for large content repositories

Solution Benefits

Data Protection and Disaster Recovery

The Hitachi NAS Platform provides solutions for data protection and disaster recovery through the use of snapshots and replication. A snapshot provides an instant point-in-time image of the data that can be used for backup or to recover quickly. Disaster recovery is addressed through replication and allows a complete copy of the data to be stored in a remote site, ready to be accessed when needed.
Data that has been archived to the Hitachi Content Platform is protected in multiple ways. The Content Platform uses a WORM file system to prevent modification of files while also guaranteeing their authenticity. The entire Content Platform cluster can be replicated to a remote site for failover and recovery. Finally, the administrator can configure a data protection level that will keep multiple copies of important files on the local and replicated cluster.

Reduce Backup Size and Time, Faster Recovery

As part of the tiering process, files that have not been accessed in a period of time are migrated to the Hitachi Content Platform. While these files are still available to end users and applications, they can be excluded from system backups to reduce the backup size and the time required to complete the backup. When the data that is backed up is decreased, the restore time from tape will also decrease, improving disaster recovery times.
Maintain Historic Data Online

Historic image data is easy to maintain when it has been tiered to the Hitachi Content Platform from the Hitachi NAS Platform. Historic data will be online and available to users and applications without the need to restore it from tape or requiring dedicated systems. Because the Content Platform does not require backup, historic data will not affect the size of the tape or disk backup. When compared to optical or tape-based archiving, the Content Platform is easier to manage and maintain. All files are online and available immediately without the need to mount tapes or platters.

Quickly Expand Capacity

The Hitachi NAS Platform and Hitachi Content Platform allow capacity to be quickly and easily increased. The NAS Platform allows additional capacity to be added to any tier nondisruptively. New capacity can be used to grow an existing file system or create a new one. The Content Platform can also add new capacity without the need for downtime.

Global Namespace

The Hitachi NAS Platform and the Hitachi Content Platform work together to provide a global namespace. The use of a global namespace reduces the management burden because users and applications only need to be connected to a single file system for access to image data. Also, because the global namespace allows capacity to be expanded nondisruptively, the system can scale in place. This improves manageability when compared to standalone NAS and file servers that become silos of capacity and management.

Based on Industry Standards

The Hitachi NAS Platform and the Hitachi Content Platform both use industry standard protocols. Because the Content Platform uses HTTP, HTTPS, CIFS, NFS, WebDAV and SMTP, it can be used to archive more than just image data. A single Content Platform can be used for image archiving, file archiving, archiving and database archiving. The NAS Platform can also be used to serve files to Microsoft Windows and UNIX/Linux hosts with CIFS and NFS.
Additionally, the NAS Platform provides iSCSI access to block storage for applications.

Architectures

Overview

Several options are available when deploying Hitachi storage with ESRI software. The correct architecture depends on several things: data size, number of users, growth rate, amount of historic data, processing requirements and frequency of access. Most deployments can be divided into three different architectures: large active data set with historic data, small to medium active data set with historic data, and small to large active data set with no historic data. These three architectures can also be subdivided by the requirement for high performance image processing and manipulation.

Because the Hitachi NAS Platform is a gateway device, the back end Fibre Channel storage can be adapted based on performance and expected growth of active data. Existing or planned SAN
infrastructure should also be taken into account when an architecture is chosen. The SAN that is used for the Hitachi NAS Platform can also be used to provide block storage for the image database and high performance storage for image processing.

A small to medium data set, less than 1TB to more than 50TB, can easily be managed by any Adaptable Modular Storage system and Hitachi NAS Platform model. With this smaller architecture, performance and growth are the most important considerations. As the number of users grows, the bandwidth requirements will also increase. The correct NAS Platform can be determined by looking at the expected performance needed to service all simultaneous users. Table 1 provides a comparison of available NAS Platform models.

TABLE 1. HITACHI NAS PLATFORM MODELS

Number of Nodes per Cluster: Up to 2 Nodes | Up to 4 Nodes | Up to 8 Nodes | Up to 8 Nodes
IOPS (1-node/2-node): 60, , , ,000 97, , , ,000
Throughput: Up to 700*MB/sec | Up to 1100*MB/sec | Up to 850MB/sec | Up to 1600MB/sec
Maximum Capacity: 1PB | 2PB | 2PB | 4PB
File System Size: 128TB | 256TB | 128TB | 256TB
File System Objects/Directory: 16 Million | 16 Million | 16 Million | 16 Million

Architecture 1: Adaptable Modular Storage-based

A solution based around the Adaptable Modular Storage system can easily scale to hundreds of terabytes, providing high performance with SAS and high capacity with SATA. Multiple Adaptable Modular Storage systems can be used with the Hitachi NAS Platform to provide scalability up to 4PB with the NAS Platform. This architecture can easily be deployed with less storage and grown as needed, up to the maximum amount of capacity and performance. The addition of a Hitachi Content Platform enables active archiving of historic data. The Adaptable Modular Storage system can also be used to provide Fibre Channel storage to the Image Extension to ArcGIS Server database. This creates a complete GIS storage solution.
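
Returning to the model comparison in Table 1, the sizing step described there (matching the expected performance for all simultaneous users against each model's throughput) can be sketched as a first-pass filter. This is a rough illustration under stated assumptions: the per-user bandwidth figure is an input the planner must estimate, the "column N" labels are placeholders for the four table columns in the order they appear, and the "up to" figures are ceilings rather than guaranteed rates.

```python
# Throughput ceilings in MB/sec, copied from the four columns of Table 1 in
# the order they appear. The column labels are placeholders, and the
# asterisked values in the table are treated here as plain numbers.
TABLE_1_THROUGHPUT_MB_S = {
    "column 1": 700,
    "column 2": 1100,
    "column 3": 850,
    "column 4": 1600,
}


def candidate_columns(concurrent_users: int, mb_per_sec_per_user: float) -> list:
    """Return the Table 1 columns whose throughput ceiling covers the load.

    mb_per_sec_per_user is an assumed planning figure, not a measured value.
    """
    required = concurrent_users * mb_per_sec_per_user
    return [
        column
        for column, ceiling in TABLE_1_THROUGHPUT_MB_S.items()
        if ceiling >= required
    ]
```

For example, 200 simultaneous users at an assumed 4MB/sec each need 800MB/sec, which rules out the first column and leaves the other three as candidates. Growth rate, capacity and file system size limits would still need to be checked separately.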
Figure 4. Hitachi NAS Platform with Hitachi Content Platform 300 and Hitachi Adaptable Modular Storage

Architecture 2: Universal Storage Platform-based

The Universal Storage Platform provides the highest level of capacity and performance available. A single Universal Storage Platform can provide storage for the Hitachi NAS Platform, the Hitachi Content Platform and many application servers, including applications not related to the GIS. A single Universal Storage Platform V can manage more than 200PB of internal and external capacity. With this architecture, the Universal Storage Platform V provides storage for the NAS Platform, the Content Platform and the Image Extension to ArcGIS Server database. Flash storage can be added when even higher performance is required on the NAS Platform or Image Extension to ArcGIS Server.
Figure 5. Hitachi NAS Platform with Hitachi Content Platform 500 and Hitachi Universal Storage Platform

Solution Recommendations

Use Case 1: Small to Medium Sized ESRI Environment with Archiving

GIS environment ingests 25TB of new data every month with moderate use and activity of image tiles. Base images are not used after initial tiles are built and historic image data is kept online for five years. New image data is processed immediately and is generally available within one week.

Solution 1

- Hitachi NAS Platform 3080 or 3090
  - Storage for all active image data
- Hitachi Content Platform 300
  - Storage for all historic image data
- Hitachi Adaptable Modular Storage 2x00
  - Fibre Channel storage for Hitachi NAS Platform
    - SAS and SATA drives
  - Fibre Channel storage for Image Extension to ArcGIS Server database
    - SAS drives
The Hitachi NAS Platform will be used to provide storage for all image data during processing and user or application access. The Adaptable Modular Storage will provide storage for the Hitachi NAS Platform and the database requirements of the Image Extension to ArcGIS Server. New image data is stored and processed on the Hitachi NAS Platform using SAS drives. Once processing is complete, the processed data is tiered to SATA drives for use by end users and applications. The base image data is archived to the Hitachi Content Platform for long term storage.

Use Case 2: Medium Sized Environment with Heavy Short Term Usage

GIS environment ingests 50TB to 75TB of new image data every month. Processed image data is used heavily by many applications and end users for the first 90 days, with consistent usage over the next 90 days. All data should be kept online for one year.

Solution 2

- Hitachi NAS Platform 3080 or 3090
  - Storage for all active image data
- Hitachi Content Platform 500
  - Storage for all historic data
- Hitachi Adaptable Modular Storage 2x00
  - Fibre Channel storage for Hitachi NAS Platform
    - SAS and SATA drives
  - Fibre Channel storage for Image Extension to ArcGIS Server database
    - SAS drives

The Hitachi NAS Platform provides storage for all new image data on SAS drives for 90 days during heavy application use. When the initial heavy usage is complete at 90 days the image data will be tiered to SATA drives for an additional 90 days of consistent usage. Once the data is 180 days old it will be archived to the Hitachi Content Platform for one year and then disposed of.

Use Case 3: Growing Data Set with Large Archive

The GIS system is constantly ingesting new data that varies in size and also has over 100TB of historic data on tape to be converted to online access. All data should be kept online and available for end users and applications. This environment also has an existing Hitachi SAN with Universal Storage Platform storage.
Solution 3

- Hitachi NAS Platform 3100 or 3200
  - Storage for all active image data
- Hitachi Content Platform
  - Storage for all historic data
- Use existing Hitachi Universal Storage Platform V for SAN storage
  - Fibre Channel and SATA drives for 3100 or 3200
  - Fibre Channel drives for Image Extension to ArcGIS Server database
  - SATA drives for Hitachi Content Platform 500
The Hitachi NAS Platform provides storage for all active image data, with new data processed on Fibre Channel disk and then tiered to SATA disk for user and application access. As image data ages and is not accessed for 45 days it is archived to the Hitachi Content Platform for long term storage and access. Historic data is restored from tape and immediately archived to the Hitachi Content Platform for long term online storage.

Integration Best Practices

Image Extension to ArcGIS Server 9.3 requires explicit ownership of all image data. Because of this, when Image Extension to ArcGIS Server 9.3 is used with the Hitachi NAS Platform, UNC path names should be used instead of mapped drive letters. For example, instead of using G:\image_data use \\HNAS\Share_Name\image_data.

In a Hitachi NAS Platform and Hitachi Content Platform solution, Image Extension to ArcGIS Server 9.3 should be configured to write directly to the Hitachi NAS Platform only. The Intelligent File Tiering features of the Hitachi NAS Platform should be used to archive data to the Hitachi Content Platform.

When archiving from the Hitachi NAS Platform to the Hitachi Content Platform, HTTP should be used. A migration link should be configured for each node in the Hitachi Content Platform cluster.

Conclusion

Hitachi Data Systems provides a complete storage solution for GIS deployments with ESRI software. The ability to grow the image storage environment to multiple petabytes within a single global namespace reduces the management burden while also providing an enhanced experience for end users and applications. The combined solution of a Hitachi NAS Platform and Hitachi Content Platform enables nondisruptive storage growth and allows image data to be stored in the proper location, based on business value and performance requirements.
The Hitachi NAS Platform and Hitachi Content Platform have been tested with ESRI ArcGIS Server and will meet the requirements for a GIS deployment today and for years to come.
Appendix A

Additional Resources
Corporate Headquarters
750 Central Expressway
Santa Clara, California USA

Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of Hitachi, Ltd., in the United States and other countries. All other trademarks, service marks and company names in this document or website are properties of their respective owners.

Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation.

Hitachi Data Systems Corporation All Rights Reserved. DS-371-A DG April 2010