Lab Validation Report
Hitachi Data Ingestor with Hitachi Content Platform
Bottomless, Backup-free Storage for Distributed IT
By Tony Palmer, ESG Lab Senior Analyst
June 2012
Contents
Introduction
Background
Hitachi Data Ingestor and Hitachi Content Platform
ESG Lab Validation
Getting Started
Scalable Content Sharing and Performance
Resilience
ESG Lab Validation Highlights
Issues to Consider
The Bigger Truth
Appendix

ESG Lab Reports
The goal of ESG Lab reports is to educate IT professionals about data center technology products for companies of all types and sizes. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable features/functions of products, show how they can be used to solve real customer problems, and identify any areas needing improvement. ESG Lab's expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by Hitachi Data Systems.

All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations.
Introduction

This report documents hands-on evaluation and testing of Hitachi Data Systems' edge-to-core storage solution, combining the Hitachi Data Ingestor as the bottomless and backup-free cloud on-ramp with the Hitachi Content Platform object storage. The report focuses on Hitachi Data Ingestor as a distributed edge device providing users with transparent connectivity to object-based, private cloud storage, eliminating the need to perform backup, capacity planning, and other data storage management activities at remote and branch offices (ROBOs).

Background

The importance of agility in today's connected business ecosystem cannot be overstated. Organizations that can rapidly respond to competitive opportunities (or threats) across multiple geographies are likely to thrive, while those that cannot may fail. Agility is one of the key benefits of cloud computing, and as cloud services mature, interest is surging. ESG's 2012 Spending Intentions Survey offers a proof point: Almost three-quarters (74%) of surveyed organizations reported plans to increase spending on cloud services in 2012. 1 Cloud services enable organizations to provision, scale, and decommission applications and storage quickly and easily. Private clouds can be deployed through corporate IT or through a service provider, but both methods result in simple user access to data services. Most significantly, the cloud makes applications accessible from any location; in reality, the decision to place data in the cloud effectively makes every office a remote office because the data is located elsewhere. As a result, IT departments face the same challenges as cloud service providers when providing services to ROBOs: They must cost-effectively deliver acceptable application performance and availability, keep data secure and protected, enable employees to collaborate, and minimize the cost of bandwidth for data replication, backup, etc. When ESG asked IT organizations about their top priorities for supporting remote locations, improving application performance (48%) and accessibility (38%) for end-users were most often cited (see Figure 1). 2

Figure 1. Top IT Priorities for Remote Data Access
Which of the following would you consider to be your organization's top IT priorities with respect to supporting ROBO locations? (Percent of respondents, N=454, seven responses accepted)
- Improving application performance for end-users: 48%
- Improving application accessibility for end-users: 38%
- Improving information security measures: 37%
- Improving employees' abilities to share files/collaborate with other employees: 36%
- Reducing WAN connectivity expenses: 35%
Source: Enterprise Strategy Group, 2012.

1 Source: ESG Research Report, 2012 IT Spending Intentions Survey, January 2012.
2 Source: ESG Research Brief, WAN Optimization Usage at Remote/Branch Offices, October 2011.
One of IT's greatest challenges today is the raging growth of unstructured data. Continual growth of documents, video, web pages, presentations, medical images, and the like increases both complexity and risk, particularly in distributed IT environments such as ROBOs. Organizations often end up with silos of storage divided by application or workload, with few resources available to manage and protect the data. Just adding more storage capacity is not doing the job; IT needs a way to effectively control, manage, and protect this data while keeping costs, complexity, and risk to a minimum.

Hitachi Data Ingestor and Hitachi Content Platform

Hitachi Data Systems offers a solution for storing and managing unstructured data using a central object storage device and a remote on-ramp to that device for edge locations such as ROBOs and cloud service providers and consumers. Object storage allows unstructured data files to be stored as objects, essentially containers that include both data and the metadata used to define the structure and administration of the data. An intelligent object storage device can then apply management functions to each object. For example, metadata for an x-ray may include requirements for replication, file versioning, and expiration; an intelligent object storage device can replicate the file, retain versions, and expire it according to compliance requirements.

Hitachi's solution combines the object-based Hitachi Content Platform (HCP) at the core and the Hitachi Data Ingestor (HDI) at the edge. A minimal-footprint physical or virtual appliance caches data for fast retrieval at the edge and sends data to the core infrastructure where it can enjoy advanced storage and data management capabilities. This configuration provides full-featured IT services to end-users and remote deployments while minimizing costs and complexity. The edge deployments gain seamless scalability while benefiting from the centralized management and protection capabilities at the core. Remote deployments no longer need to worry about having IT staff to handle storage management or backup, and corporate IT (or cloud service providers) don't have to design and build their own edge-to-core infrastructure.

Figure 2. Hitachi Data Ingestor with Hitachi Content Platform
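The object model described above, data bundled with the policy metadata an intelligent object store acts on, can be pictured in a few lines of code. The following Python sketch is purely illustrative; the class and field names are hypothetical and do not represent HCP's actual data model or API.

```python
# Illustrative sketch: an "object" as data plus policy metadata that an object
# store can act on (replication, versioning, retention). Hypothetical names only.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class StoredObject:
    name: str                            # e.g., "chest-xray-0417.dcm"
    data: bytes                          # the file content itself
    replicas: int = 2                    # how many protected copies to keep
    versioning: bool = True              # retain prior versions on overwrite
    retain_until: Optional[date] = None  # eligible for disposition after this date

# An intelligent object store reads these attributes and applies the
# corresponding services to each object individually.
xray = StoredObject(
    name="chest-xray-0417.dcm",
    data=b"...",
    replicas=2,
    versioning=True,
    retain_until=date(2019, 6, 30),
)
```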
At the Edge: Hitachi Data Ingestor

The HDI serves as a bottomless and backup-free on-ramp for remote locations and cloud customers. The standards-based file interface requires no recoding or usage changes; storage is accessed using standard NFS or Server Message Block (SMB/CIFS) protocols. HDI is essentially a cache that provides users and applications with seemingly endless storage capacity and advanced storage management and protection. HDI retains files that are currently in use for fast retrieval, minimizing WAN traffic. All files on HDI are migrated, i.e., copied, to Content Platform for data protection, but remain local to provide instant access. Once capacity reaches the 90% threshold, HDI deletes local file data to bring usage back below that threshold, replacing each affected file with a 4KB stub. These stubs are transparent to clients and are backed up just like complete files. File recall is done simply by reading the stub. However, users can pin files of their choice so that those files stay on HDI at all times, guaranteeing the fastest possible access.

Each file system on the HDI is mapped to a specific namespace within a designated tenant on the core HCP, enabling maintenance of end-to-end access control. While namespaces can be shared by file systems on multiple HDIs, each HDI can write only to its own namespace. This relationship enables content sharing without compromising access control, and it enables multiple service levels to be managed at the core. HDI is also capable of performing file restore, a practical feature that enables users to retrieve previous versions of a file or even deleted files. HDI does this by creating historical snapshots that users can access. Hitachi Data Ingestor also provides the means for administrators to automatically migrate data from existing NAS systems and Windows servers into HDI without going through a disruptive migration process.

HDI is available in three configuration options:
- Dual node (HA) configuration with SAN-attached storage.
- Single node configuration with internal direct-attached storage (DAS).
- VMware appliance configuration running on VMware vSphere (ESXi) that is customer-installable and also supports HA.

Other features include:
- Scalability to 400 million files and thousands of users per HDI.
- Archiving, backup, compliance, data lifecycle management, e-discovery, and file tiering that all occur on the HCP.
- A management API that enables HDI integration with the HCP management interface as well as with third-party and homegrown management UIs.
- Full integration with Active Directory and LDAP.
- Support for leading WAN acceleration solutions for cost and operational efficiency.

The HDI can be co-located with the HCP, used as a standalone device, or integrated into the HCP rack.
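The caching and stubbing behavior described above amounts to a simple policy: every file is copied to the core for protection, local data is replaced by a 4KB stub once the 90% threshold is crossed, reading a stub transparently recalls the data, and pinned files are never stubbed. The Python sketch below illustrates that policy under stated assumptions (a hypothetical core store exposing migrate() and recall() operations); it is a conceptual sketch, not HDI code.

```python
# Conceptual sketch of the edge caching and stubbing policy described above; this
# is not HDI code. The core-store object and its migrate()/recall() methods are
# hypothetical placeholders.
STUB_SIZE = 4 * 1024        # each stubbed file keeps a 4KB stub locally
CACHE_THRESHOLD = 0.90      # begin stubbing once local usage exceeds 90%

class EdgeCache:
    def __init__(self, capacity_bytes, core_store):
        self.capacity = capacity_bytes
        self.core = core_store                 # stands in for the central object store
        self.files = {}                        # name -> {"size", "stubbed", "pinned"}

    def used_bytes(self):
        return sum(STUB_SIZE if f["stubbed"] else f["size"]
                   for f in self.files.values())

    def write(self, name, size, pinned=False):
        self.files[name] = {"size": size, "stubbed": False, "pinned": pinned}
        self.core.migrate(name)                # every file is also copied to the core
        self._enforce_threshold()

    def read(self, name):
        f = self.files[name]
        if f["stubbed"]:                       # reading a stub transparently recalls the data
            self.core.recall(name)
            f["stubbed"] = False
        return f

    def _enforce_threshold(self):
        # Replace local file data with stubs until usage falls back under the
        # threshold; pinned files are never stubbed, so they stay on the edge device.
        for f in self.files.values():
            if self.used_bytes() <= self.capacity * CACHE_THRESHOLD:
                break
            if not f["stubbed"] and not f["pinned"]:
                f["stubbed"] = True            # data remains safely stored on the core
```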
At the Core: Hitachi Content Platform

The centralized core infrastructure is the massively scalable, multi-tiered, multi-tenant HCP. This object store can be divided into thousands of virtual content platforms: Tenants and underlying namespaces have configurable attributes to deliver varying service levels for different users and applications (or, in the case of a cloud service provider, different organizations). NFS, CIFS, and REST protocols are supported, as well as Active Directory authentication. The HCP easily accommodates changes in both scale and storage technology so that data can reside for decades or longer with minimal disruption. It scales from a few TB to tens of PB of capacity, includes data compression and single-instancing to streamline capacity requirements, can store select objects on spin-down media, and offers optional deletion and shredding capabilities. These features enable HCP to eliminate storage sprawl and reduce the cost and complexity of storing unstructured data.
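One way to picture the single-instancing mentioned above is content-addressed storage: hashing an object's content produces its identifier, so writing identical content twice consumes capacity only once. The short Python sketch below uses SHA-256, the same hash algorithm ESG Lab later selects when creating a namespace, to illustrate the general technique; it is not HCP's internal implementation.

```python
# Illustrative sketch of content-addressed, single-instance storage: a SHA-256 hash
# of the content serves as the object identifier, so identical content is stored once.
# This shows the general technique, not HCP's internal implementation.
import hashlib

store = {}  # object identifier (content hash) -> object data

def put(data: bytes) -> str:
    object_id = hashlib.sha256(data).hexdigest()
    if object_id not in store:        # a duplicate write adds no new capacity
        store[object_id] = data
    return object_id

id1 = put(b"contents of quarterly-report.pdf")
id2 = put(b"contents of quarterly-report.pdf")   # same content written again
assert id1 == id2 and len(store) == 1
```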
Advanced storage management capabilities on HCP include data protection, preservation, distribution, retention, governance, and search. All of these capabilities are available to the data coming from the HDIs, relieving ROBOs from having to deliver these services at the edge and enabling cloud providers to offer services to multiple customers from a single infrastructure. Instead of parallel silos all requiring individual management, data at the edge and at the core is centrally stored and managed on a single shared infrastructure. This reduced IT footprint improves service levels, reduces complexity, and minimizes cost while enabling massive scalability. Data protection and security are provided with encryption, Write Once Read Many (WORM), replication, RAID-6, data integrity checks, and more. ROBOs and cloud service providers can eliminate the cost of tape infrastructure for backup because the HCP centrally replicates data and retains multiple versions. The ability to browse the environment ensures that all data is easily recoverable. Deduplication and compression are applied as new objects are written to the HCP to minimize capacity requirements; in addition, selective replication can be applied in disaster recovery cases to conserve bandwidth and offsite storage capacity. HCP can automatically apply data retention and disposition, deleting expired content and reclaiming storage. Monitoring, reporting, and audit capabilities are built in and enable chargeback.

ESG Lab Validation

ESG Lab performed hands-on evaluation and testing of HDI with HCP at a Hitachi Data Systems facility in Santa Clara, California. Testing was designed to demonstrate the ease with which HDI/HCP can be integrated into an organization's infrastructure to provide limitless unstructured data storage to a distributed IT environment. Also of interest were the scalability, secure multi-tenancy, and resilience of the solution.

Getting Started

ESG Lab began with an environment designed to simulate a branch office connecting to a central data center. In the data center environment, HCP was installed and populated with unstructured (file and object) data in multiple namespaces. One HDI was installed in a simulated remote office with Windows clients mounting the file system over the SMB protocol. Connectivity between the clients, HDI, and HCP was via 1Gb Ethernet.

Figure 3. The ESG Lab Test Bed

ESG Lab Testing

First, ESG Lab used the HCP management GUI to create a tenant for the simulated remote office. From the HCP Console, ESG Lab clicked Tenant, then Create Tenant. This action brought up the Create Tenant wizard shown in Figure 4. The Lab created a new tenant named ESG2 and enabled all four major features: Retention Mode, Replication, Search, and Versioning, which are selectable on a tenant-by-tenant basis. Once selected for a tenant, they can then be configured individually for each namespace under that tenant.
Figure 4. Creating a Tenant

Next, as shown in Figure 5, ESG Lab created a namespace named Branch1 for use by the clients at the remote office. Using a simple pull-down menu, ESG Lab selected the Dynamic data protection level, which determines how many copies of an object are stored for protection. Dynamic keeps a minimum of two copies, but it will keep additional copies as long as there is available capacity in the HCP. The hash algorithm that generates the unique identifier for each stored object was also selectable; ESG Lab chose SHA-256.

Figure 5. Creating a Namespace

ESG Lab also enabled advanced functionality during namespace creation, such as Retention mode, Replication, Versioning, and Search. Versioning enables HCP to retain previous versions of files for a specified number of days, while retention mode can be set to Enterprise, which allows data to be deleted after the retention period expires, or to Compliance, which does not allow data to be deleted. Figure 6 shows how Enterprise retention mode would be configured.
Figure 6. Setting Retention Class

To enable access to data for clients, ESG Lab logged in to the HDI GUI and created a file system (see Figure 7). Creating the file system has two components: creating local access and quotas for the file system, and attaching it to a namespace on an HCP for content storage.

Figure 7. Creating a File System

ESG Lab next looked at the Hitachi Data Ingestor Dashboard, shown in Figure 8. The HDI Dashboard gives administrators a quick but thorough overview of the file and content environment at a remote office and how it
relates to the data stored in the data center. Under Namespace information, administrators can see that the namespace arc-documents is being shared out as file system Documents. At a higher level, the Capacity Usage section shows the utilization of both the file system on the HDI and the namespace, which resides on the HCP.

Figure 8. The Hitachi Data Ingestor Dashboard

Finally, ESG Lab connected to the file system on the HDI using a Windows client in the simulated remote office. Figure 9 shows the folders in the file system Documents from the point of view of the Windows client.

Figure 9. Accessing Content on HDI
Why This Matters

ESG research found that among organizations with a distributed IT infrastructure, two of the top IT priorities for supporting ROBO locations were improving performance for users and improving file sharing and collaboration. 3 Installing, configuring, and managing traditional NAS systems can be complex and time consuming, especially when many remote offices lack dedicated IT staff. Time and money can be wasted trying to deploy legacy systems for widely distributed file and content sharing. Hitachi Data Systems' HDI and HCP offer native file system access to a hugely scalable distributed object store with advanced storage and management capabilities. In less than 10 minutes, ESG Lab was able to integrate an HDI at a simulated remote office with a central HCP and provide file services to ROBO users with native, local access.

3 Source: ESG Research Report, Remote Office/Branch Office Technology Trends, July 2011.

Scalable Content Sharing and Performance

Hitachi Data Ingestor presents a standards-based file system interface to users that is tightly integrated with Hitachi Content Platform to provide seamless data access and a wide range of advanced storage features. HDI uses HTTP/HTTPS to securely move data over a local or wide area network and into HCP. HDI provides local and remote access to an HCP for clients over CIFS and NFS, delivering effectively bottomless storage capacity (up to 400 million files per HDI), backed at the core by HCP (up to 40PB per HCP). HDI migrates inactive content to a central HCP and maintains a local link to the migrated content, referred to as a stub. Files are automatically retrieved when users open them to access their contents. Versioning is supported, allowing previous versions to be preserved for a configurable time period. Organizations can use built-in HDI tools to migrate data from traditional NAS or file servers to HDI. As Figure 10 shows, users at a remote site have full access to all their content, including content that has been migrated to the HCP.

Figure 10. Bottomless, Backup-free Storage

ESG Lab first examined a group of large files that had been copied into a folder in the HDI file system Documents and left untouched until the migration threshold was reached and the HDI moved their content to the HCP. The directory listing looked normal, and all files were present in the directory. Next, ESG Lab examined the properties of
a 1.96GB DVD ISO file containing the HDI virtual appliance image. As Figure 11 shows, the file's size on disk was 4KB, indicating that the full file had been moved to HCP with only the 4KB stub (and file metadata) left behind on HDI.

Figure 11. A Large File, Stubbed

ESG Lab opened the vSphere client and imported the image. While the installation was occurring, HDI was transparently retrieving the file data from HCP in the background. When the import was complete, the virtual machine properties were checked as shown in Figure 12, and the virtual machine was started.

Figure 12. Installing Virtual HDI
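The properties check in Figure 11 identifies a stubbed file by comparing its logical size (1.96GB) with its size on disk (4KB). A similar check can be scripted from a POSIX client that mounts the same share over NFS; the sketch below is a generic illustration with a hypothetical mount path, not an HDI tool.

```python
# Illustrative check for a stubbed file from a POSIX client: compare the file's
# logical size with the space actually allocated on disk. The mount path is a
# hypothetical example; this is not an HDI utility.
import os

def looks_stubbed(path: str) -> bool:
    st = os.stat(path)
    allocated = st.st_blocks * 512        # st_blocks is reported in 512-byte units
    return st.st_size > 0 and allocated < st.st_size

# A fully stubbed large file reports its full logical size but only a few KB on disk.
print(looks_stubbed("/mnt/documents/hdi-virtual-appliance.iso"))
```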
Next, as seen in Figure 13, the file properties were checked again, and the file's size on disk was confirmed to be 1.96GB.

Figure 13. A Large File on Local Disk

As a final exercise, the file was again left untouched, and once the migration threshold was reached, the system again left a file stub in its place.

Why This Matters

ESG Research specialists recently asked IT managers to name the most important attributes of a private cloud infrastructure. Elasticity of IT resources (the ability to add or remove capacity as needed), along with scalability, were the top two responses. 4 As the size and number of files that need to be kept online continue to grow, capital equipment and operating budgets are being stretched to their limits. Scaling capacity at remote offices can lead to lost productivity and, in some cases, lost revenue as legacy storage systems are filled to capacity, becoming increasingly difficult to back up. ESG Lab has confirmed that HDI with HCP offers effectively unlimited, backup-free file and content storage for distributed users with advanced data availability functionality that protects all file and unstructured content in the data center, while providing local file system performance to users. ESG Lab was extremely impressed by HDI with HCP's ability to transparently migrate inactive files out of the remote office while allowing users to transparently access and retrieve those files, with no IT intervention.

4 Source: ESG Research Report, 2012 IT Spending Intentions Survey, January 2012.
Resilience

HDI's tight integration with HCP enables file system replication and sharing across multiple sites, while rapid disaster recovery is enabled by storing a copy of the HDI system configuration on HCP. Users can install a new HDI and issue one command to restore the system configuration and re-establish connection to all client data. The same technique can be used to replace an entire system in a remote site.

ESG Lab Testing

ESG Lab simulated a site failure in which an entire remote site goes offline, and a new site must be brought online. First, ESG Lab wiped the configuration of the HDI used for the tests described in the previous sections of this report. At this point, the system was in the state of a new system fresh from the factory, with no configuration. ESG Lab booted the system and performed an initial installation, as shown in Figure 14, formatting the disks and ensuring that all data was deleted from the previous installation.

Figure 14. Recovering a Site to a New HDI

When the installation was complete, the Lab executed the system configuration wizard. As Figure 15 shows, basic information such as the host name and IP addresses was entered.
Figure 15. The System Configuration Wizard

Next, the Lab used the Service Configuration Wizard to configure the connection to the HCP in the data center, as shown in Figure 16.

Figure 16. Connecting to HCP
Finally, the syslurestore command was run to restore the configuration from the copy stored in the HCP in the data center. When this step was complete, all connections to namespaces on the HCP were restored, and all local file systems were rebuilt. Figure 17 shows a client accessing files from the Documents share, exactly as it was before the system was destroyed and rebuilt.

Figure 17. Users Accessing Their Files

Next, ESG Lab examined the ability of multiple HDIs to read from a single HCP namespace. As seen in Figure 18, a file system was replicated from one HDI to a clustered pair of HDIs, with all content residing in the same namespace on a single HCP in the data center.

Figure 18. File System Replication and Access with HDI and HCP
ESG Lab configured replication for the Documents file system created earlier in this report. When replication is configured, the file system may be mounted read-only at the target site.

Figure 19. Creating a Read-only File System Using the Same Namespace

Figure 20 shows the same file system mounted by two different clients, from two different HDIs. Note that the replica file system is mounted read-only, as only one copy of the file system may be writable at any time.

Figure 20. Mounting the File System and the Replica
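The read-only restriction shown in Figure 20 reflects a single-writer, multiple-reader model: any number of HDI file systems may read a shared namespace, but only one may hold write access at a time. The minimal Python sketch below illustrates that rule with hypothetical names; it is not HDI or HCP code.

```python
# Minimal sketch of the single-writer, multiple-reader rule described above: many
# HDI file systems may read a shared namespace, but only one may be writable at a
# time. Class and identifier names are hypothetical.
class SharedNamespace:
    def __init__(self, name):
        self.name = name
        self.writer = None          # at most one read-write mount
        self.readers = set()        # any number of read-only mounts

    def mount(self, hdi_id, read_write=False):
        if read_write:
            if self.writer is not None and self.writer != hdi_id:
                raise PermissionError(
                    f"{self.name} is already mounted read-write on {self.writer}")
            self.writer = hdi_id
        else:
            self.readers.add(hdi_id)

ns = SharedNamespace("arc-documents")
ns.mount("hdi-branch1", read_write=True)   # the owning HDI mounts read-write
ns.mount("hdi-cluster-node-a")             # replica sites mount read-only
ns.mount("hdi-cluster-node-b")
```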
Why This Matters

IT managers reported to ESG Research that their two most significant challenges associated with managing data at ROBO locations were keeping pace with data growth and improving backup and recovery processes. 5 Regardless of the number and types of failures that may occur during the life of digital files on disk, managers, employees, and customers expect their data to be available. ESG Lab has confirmed that HDI and HCP provide advanced data protection and system recovery capabilities that can enable remote offices to maintain access to data through hardware and software faults, thanks to a robust, integrated architecture combined with optional clustered systems at the edge. ESG Lab was able to replicate a file system in minutes, enabling access to file data from two locations simultaneously, and restore access to files after a simulated disaster using a single command.

5 Source: ESG Research Report, Remote Office/Branch Office Technology Trends, July 2011.
ESG Lab Validation Highlights

- HDI with HCP was easily integrated and deployed into a distributed IT environment, providing transparent end-user access to secure, centrally managed content storage.
- ESG Lab was able to provision secure multi-tenancy using multiple namespaces in HCP.
- HDI and HCP enabled ESG Lab to provision bottomless, on-demand capacity to the edge while providing fast, local access to files and objects in the cloud.
- HCP offers secure, robust data protection techniques that enable data to persist beyond the life of the storage system it resides on. When HDI is used to access content in HCP, data can be rapidly recovered to a new HDI in a disaster using a single command.
- The combination of HDI file stubbing at the edge with single-instance storage by HCP in the core optimizes capacity utilization across the distributed enterprise.

Issues to Consider

- While HDI provides excellent file system replication capabilities, currently only one copy of the replicated file system may be mounted read-write. A true clustered file system, allowing read-write access from multiple locations, would enhance both availability and usability.
The Bigger Truth

Private cloud implementations are gaining significant ground in corporate IT departments and with service providers; the cost and business-process benefits are compelling. They include simplified provisioning, better application availability, centralized data management and protection, and the chance to improve service levels while minimizing capital and operational costs. Private clouds can make an organization more agile and responsive to market conditions. But for many, getting started in the cloud is the challenge. They need an easy way to get up and running.

That is not the only challenge facing IT, however. The growth of data volumes, particularly unstructured data, continues unabated, driven by trends such as greater use of social media, video and photo ubiquity, increased medical imaging, and a corporate and regulatory focus on data retention. In addition, as IT delivery becomes more service focused, user expectations for application availability and data access are soaring. Providing these services to remote and branch offices is particularly difficult, and as a result, IT and cloud providers are anxiously seeking a faster, simpler way to provide a full suite of data services on demand, without increasing cost and complexity.

HDI with HCP creates an integrated offering that provides distributed consumers of IT, such as remote office workers, branch office workers, or cloud storage users, with a seamlessly scalable, backup-free storage solution. HDI serves as the easy cloud entry at the edge, caching active data for rapid access, while all data is stored, protected, retained, and governed centrally on the HCP. ROBO and cloud consumers obtain data services that were previously unavailable, while cloud providers and IT departments can consolidate service delivery and data management.

ESG Lab testing demonstrated that the HDI can be set up simply and quickly, and it integrates easily with HCP and its management interface for a smooth transition to the cloud. ESG Lab was able to transparently access content of any size on the HCP through stubs left on the HDI. Capacity available in the HDI file environment could be expanded without interruption or impact to users. ESG Lab demonstrated the ease of replicating a file system and executing HDI system recovery.

Over the years, Hitachi Data Systems has proven it has an innovative approach to solving IT challenges, and the combined HDI/HCP solution proves the point yet again. For both corporate IT organizations and cloud service providers, this solution can ease the pain of massive unstructured data growth while delivering an easy entry to a private cloud deployment.
Appendix

Table 1. ESG Lab Test Bed

Hardware/Software:
- Hitachi Data Ingestor Single Node
- Hitachi Data Ingestor Virtual Appliance
- Hitachi Content Platform
- Clients: Windows 7 Professional SP1 and Windows 7 Enterprise SP1