Springpath Data Platform

Springpath Data Platform WHITE PAPER February 2015

SUMMARY
SPRINGPATH DATA PLATFORM
  Introduction
  HALO Architecture
  Data Distribution
  Data Caching and Data Persistence
  Data Optimization
  Data Services
  Log Structured Distributed Objects
  Data Availability
  Data Rebalancing
  Autosupport
SPRINGPATH DATA PLATFORM FOR VMware
  Installation
  Expansion
  Springpath vCenter Web Client Plugin
  Native Snapshots
  ReadyClones
CONCLUSION

SUMMARY

Today's IT datacenter is plagued with inefficiencies stemming from infrastructure silos. One major contributor to the creation of these silos is the growing number of applications and their unique demands on storage and data management. Existing storage products cannot satisfy the requirements of existing enterprise applications, test-dev environments, big data applications and the new generation of web applications. These storage products may have one or more of the following issues:

1. Cannot meet the performance and capacity needs of the applications cost-effectively.
2. Are not tightly integrated with networking, servers, hypervisors and management tools.
3. Do not scale out and are difficult to grow in small increments.
4. Are available only as expensive hardware appliances, not as software that runs on a common server hardware pool.
5. Cannot provide data services like snapshots and clones without caveats.
6. Cannot provide storage efficiency features like compression and deduplication without caveats.
7. Can only support one of File, Block, Object or Big Data (Hadoop) access methods.
8. Are unable to support various workloads - Virtualized, Big Data, Web applications and containers - with one product.
9. Cannot independently scale caching and persistence tiers.

Hybrid storage arrays, which mix flash with hard disk drives, have addressed the first issue. Converged or integrated infrastructures that are preconfigured bundles of storage, servers and networking arrived on the scene soon afterwards to address the second issue, but a few challenges such as fragmented management and support remained. The advent of hyper-converged infrastructure eliminated separate network storage hardware completely by running the storage software on the same server hardware that hosts the hypervisors and applications. The storage software is able to leverage both flash and hard drives inside the server to provide a pool of storage capacity.
In addition, the ability to scale out in small increments as new servers are added is an integral property of hyper-converged solutions, which directly addresses issue three. However, most hyper-converged solutions do not address the challenges outlined in 4, 5 and 6 above. They have to be purchased as hardware appliances, just as in traditional network storage, and cannot provide granular data services like snapshots, clones and compression and deduplication without impacting performance. The vendors that do offer a software solution do not address issues 5, 6, 7, 8 and 9. The Springpath Data Platform addresses all these challenges and delivers unmatched flexibility, agility and efficiency to IT infrastructures. It minimizes, if not eliminates, the silos that have been created due to these drawbacks in storage and data management.

SPRINGPATH DATA PLATFORM

Introduction

Springpath Data Platform puts an end to expensive and inefficient storage arrays, converged stacks and hyperconverged server appliances. The solution is differentiated in five specific ways. The software scales out storage capacity and performance as servers are added to the cluster. It delivers category-leading IO performance and capacity, is extremely easy to deploy and use, and provides enterprise-grade data services without caveats. The software's built-in cloud-based monitoring and analytics help discover and address issues proactively. It is a 100% software solution that runs on the same brand-name servers, such as Cisco, Dell or HP, that customers are already buying to run their applications. The software can be purchased at low risk on an annual subscription basis.

As shown in Figure 1, Springpath Data Platform software is engineered to support a variety of applications, workloads, data sets and use cases: virtualized, containerized or bare metal workloads; test, enterprise or web applications; object or big data. It consolidates all of these disparate data sets onto a single commodity-server-based hardware and data management platform, dramatically simplifying IT infrastructure.

Springpath Data Platform is purpose built. It is based on the patent pending HALO (Hardware Agnostic Log-structured Object) architecture. Data can be accessed by the compute infrastructure via a Data Access layer using File, Block, Object or API Plugins. GUI, CLI or API based management complements the HALO architecture with seamless and tightly integrated orchestration.

FIGURE 1 SPRINGPATH DATA PLATFORM
(Diagram: applications (Enterprise Apps, Web Apps, Test & Dev, Hadoop & Big Data) run on the compute infrastructure, including bare metal; the Springpath Data Platform exposes APIs and Data Access (File, Block, Object, Hadoop Plug-in) on top of the HALO architecture, with monitoring, automation, orchestration and hybrid cloud, over a server pool.)

The Data Platform spans across three or more servers to form a highly available cluster. Each of these servers has a software controller called the Springpath Controller that takes control of the internal flash based solid state drives (SSDs) and high capacity hard disk drives (HDDs) to store data. The controllers communicate with each other over 10Gb Ethernet to present a single pool of storage that spans across all of the servers in the cluster.

FIGURE 2 A DISTRIBUTED SPRINGPATH CLUSTER ON COMMODITY SERVERS
(Diagram: four servers, each with a hypervisor and a Springpath Controller, sharing a single datastore.)

Each server in the cluster is providing both compute and data services. Compute, storage capacity and IO performance can be scaled linearly by just adding servers to the cluster. Figure 3 depicts the linear scalability of a Springpath cluster going from 4 to 8 nodes based on user profile benchmarking.

FIGURE 3 SPRINGPATH CLUSTER EXPANDS LINEARLY
(Chart: number of users and average response time in ms for 4, 6 and 8 servers, across E-Mails, OLTP, Web Access and File Services workloads.)

HALO Architecture

Springpath Data Platform is based on the patent pending HALO architecture. Springpath's HALO architecture is a distributed file system built from the ground up to deliver high performance without compromising data management (snapshots and clones), data optimization (inline global deduplication, inline compression), scalability or rebalance. HALO decouples the caching tier from the persistence tier, and a key benefit of this decoupled approach is the flexibility to scale the two tiers independently. The HALO stack is made up of several key layers (as shown in Figure 4), and each of these layers is architected with these goals in mind.

FIGURE 4 THE HALO STACK
(Diagram: incoming data passes through Data Distribution, Data Caching, Data Persistence, Data Optimization (in-line compression, in-line de-duplication) and the Log Structured Distributed Object Store, with Data Services (snapshots, clones, backup, DR) alongside.)

Data Distribution

All incoming data is distributed across the servers in the cluster, as shown in Figure 5. It is distributed to a configurable set of servers to optimize performance via the caching tier, and distributed in the back end by the Log Structured Distributed Objects layer to optimize capacity in the persistent tier. Location affinity between application cache data and persistent data is decoupled in order to achieve the best overall performance.

FIGURE 5 SPRINGPATH DISTRIBUTED CLUSTER
(Diagram: three servers, each with a hypervisor and a Springpath Controller.)

Effective data distribution is achieved by mapping all incoming data to stripe units that are configurable in width, as shown in Figure 6. Each stripe unit is in turn mapped to a specific server in the cluster, and the stripe units are distributed across all the servers. Data is sent to the appropriate cluster server based on which stripe unit the application is writing a block of data to.

FIGURE 6 A SPRINGPATH DISTRIBUTED DATA STRIPE
(Diagram: stripe unit 256 MB; 8 stripe units per stripe; 2 GB per stripe. Stripe units 1 to 8 of stripe N, then of stripe N+1, are spread across the cluster servers as the write progresses in the appended log structured file system.)

KEY BENEFITS
- Distribution eliminates performance bottlenecks and storage bin packing
- All compute resources dedicated to Springpath controllers across all servers are used to deliver maximum performance rather than being left idle

Data Caching and Data Persistence

Springpath Data Platform leverages flash media to deliver the highest levels of IO performance at low latencies. All data from the distribution layer is written to or read from a tier consisting of one or more SSDs. Both reads and writes are cached in the SSDs. Based on policies, incoming writes are acknowledged as persistent after they have been replicated to additional SSDs in other servers within the same cluster, ensuring that no data is lost due to SSD or server failures. The writes are then de-staged to inexpensive, high-density hard disk drives for persistent storage. Hot data sets that are frequently or recently read from the persistent tier are cached in both SSDs and DRAM. In every layer, the data layout is optimized for the underlying media: flash and memory for cache, and hard drives for persistence. This is illustrated in Figure 7. Flash performance combined with the lowest cost capacity enables the best possible cost for storing and retrieving application data at full speed.
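Returning to the stripe layout of Figure 6, the mapping from a write offset to a stripe unit and a server can be sketched in a few lines of Python. The 256 MB stripe unit and the 8 units per stripe are taken from the figure. The round-robin assignment of stripe units to servers, the function name `locate` and the server count used below are assumptions made for illustration; the paper does not describe the actual placement function.

```python
from typing import Tuple

STRIPE_UNIT_BYTES = 256 * 1024 * 1024   # 256 MB stripe unit (Figure 6)
UNITS_PER_STRIPE = 8                    # 8 stripe units per stripe (Figure 6)
STRIPE_BYTES = STRIPE_UNIT_BYTES * UNITS_PER_STRIPE  # 2 GB per stripe


def locate(offset: int, num_servers: int) -> Tuple[int, int, int]:
    """Return (stripe index, unit index within the stripe, server index).

    Assumption: stripe units are assigned to servers round-robin by their
    global unit number. This is a hypothetical placement rule.
    """
    if offset < 0 or num_servers < 1:
        raise ValueError("offset must be >= 0 and num_servers >= 1")
    global_unit = offset // STRIPE_UNIT_BYTES
    stripe = global_unit // UNITS_PER_STRIPE
    unit_in_stripe = global_unit % UNITS_PER_STRIPE
    server = global_unit % num_servers
    return stripe, unit_in_stripe, server
```

Under these assumptions, consecutive 256 MB regions of a large sequential write land on different servers, which is the property the text credits with eliminating bottlenecks.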

FIGURE 7 DECOUPLED CACHING AND PERSISTENCE
(Diagram: writes pass through write caching to persistence; reads are served from a deduplicated read cache made up of an L1 memory cache and an L2 SSD cache.)

The HALO architecture allows for scaling the caching tier independently from the persistent tier, which directly impacts performance and capacity. The scaling can be achieved either by adding SSDs or HDDs to existing slots in a server, or by having a tier of SSD-only servers (or blade servers) that can be scaled out independently from a tier of high-density HDD-only servers. This ability to scale IO performance and storage capacity independently, while preserving consistent and synchronized data services (such as snapshots) and optimization services (such as deduplication and compression), differentiates HALO from caching-only solutions deployed with network storage.

KEY BENEFITS
- Lowest cost IO performance and storage capacity
- Maximized flash media endurance
- Caching optimized to leverage both Flash and DRAM
- Scale performance independently from storage capacity
- Leverage data management and optimization services along with Flash performance

Data Optimization

HALO provides fine-grained inline de-duplication and variable block inline compression that is always on for all objects in the cache layer (SSD and memory) and the persistent layer (HDD). Unlike solutions in the market that require you to turn these features off to maintain a certain level of performance, deduplication and compression in HALO have been designed from the ground up to sustain and enhance performance. Deduplication is leveraged across all media: memory, flash and hard disk.

Deduplication is based on a patent pending Top-K Majority algorithm. It leverages conclusions from empirical research that, when data is sliced into small blocks, a minority of those blocks accounts for most of the deduplication potential.
By fingerprinting and indexing just these frequently used blocks, high rates of deduplication can be achieved with a minimal amount of memory, a resource that is expensive and at a premium in servers.
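The idea of indexing only frequently seen blocks within a fixed memory budget can be illustrated with a short Python sketch. The Top-K Majority algorithm itself is not disclosed in this paper, so the code below substitutes a well-known technique, the Misra-Gries frequent-items summary, purely as a stand-in. The class name `FrequentBlockIndex`, the use of SHA-256 fingerprints and the `capacity` parameter are all assumptions.

```python
import hashlib


class FrequentBlockIndex:
    """Bounded fingerprint index (Misra-Gries summary, not Springpath's algorithm)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity   # maximum number of fingerprints held in memory
        self.counters = {}         # fingerprint -> approximate frequency

    def observe(self, block):
        """Record a block; return True if it matched an indexed fingerprint."""
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint in self.counters:
            self.counters[fingerprint] += 1
            return True            # duplicate of an indexed block
        if len(self.counters) < self.capacity:
            self.counters[fingerprint] = 1   # room left: start tracking it
            return False
        # Index is full: decrement every counter and drop those reaching zero,
        # so rarely seen fingerprints are evicted and frequent ones survive.
        for key in list(self.counters):
            self.counters[key] -= 1
            if self.counters[key] == 0:
                del self.counters[key]
        return False


def dedup_hits(blocks, capacity):
    """Count how many blocks were recognized as duplicates."""
    index = FrequentBlockIndex(capacity)
    return sum(index.observe(block) for block in blocks)
```

For the stream A, B, A, C, A, D, A, an index holding only two fingerprints recognizes the same three duplicates of A as an index holding four, which is the kind of memory saving the text describes.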

The platform is flexible enough to accommodate datasets that can benefit from a higher deduplication ratio by giving deduplication processing a larger memory allocation. This ability to adjust the amount of memory for deduplication is extremely valuable when the deduplication potential of a dataset is unpredictable.

FIGURE 8 SPRINGPATH OPTIMIZES DATA STORAGE WITH NO PERFORMANCE IMPACT
(Diagram: original data made of repeated blocks A, B, C and D is reduced by in-line de-duplication to one copy of each block, for 20-30% space savings with no performance impact; in-line compression then yields 30-50% space savings.)

Data is not only deduplicated in the persistence tier, yielding space savings; it remains deduplicated when it is read into the caching tier composed of both SSDs and memory. This enables a larger working set to be stored in the caching tier, delivering higher read performance.

High performance inline compression is the other key feature of the HALO architecture. Compression-friendly data sets can yield significant space savings. While many solutions in the market claim support for compression, the right compression implementation should not impact performance. First, HALO compression uses CPU offload instructions to minimize any performance impact. Second, the Log Structured Distributed Objects layer ensures that modifications (writes) to already compressed data carry no penalty. All incoming modifications are compressed and written to a new location, and the older data is marked for deletion (unless the data needs to be retained in a snapshot). None of the data that is being modified needs to be read before writing. Avoiding read-modify-write penalties improves write performance significantly.
KEY BENEFITS
- Inline global deduplication and inline compression on all data in all tiers, all the time
- No negative performance impact
- Inline deduplication can enhance performance if the deduplication rate is high
- Inline deduplication and inline compression reduce the amount of storage space needed, lowering the cost of hardware that needs to be purchased

Data Services

The HALO architecture provides a scalable implementation of space-efficient data services such as thin provisioning, built-in space reclamation, pointer-based snapshots, and clones, without impacting performance.

Thin provisioning enables higher storage utilization by eliminating the need to forecast, purchase, and install disk capacity that may remain unused for a long time. Virtual data containers in HALO can present any amount of logical space that the applications need, but the actual physical space usage is driven by the data that the applications eventually write.

Native snapshots in HALO are pointer-based and zero-copy. These space-efficient snapshots are an excellent way to have frequent online backups of data without worrying about the storage capacity consumed by a growing number of online backup copies. Data can also be restored from these snapshots instantaneously. The log-structured data layout in HALO benefits snapshots in the same way that it benefits compression. Any modifications to data present in snapshots are written to a new location, and the metadata is updated to point to the new location. Again, there are no read-modify-write penalties. Deletion of snapshots is also very quick, because it involves deleting only a small amount of metadata on SSD, as opposed to the long snapshot consolidation process prevalent in architectures that use delta-disks. As a result, snapshots in HALO do not impact performance.

The other unique property of HALO snapshots is their granularity. Snapshots can be taken on an individual file basis. In VMware environments, these files map to drives in a VM. This flexible granularity allows for different snapshot policies on different VMs.
FIGURE 9 SPRINGPATH DATA SERVICES PROVIDE ADDED VALUE WITH NO PERFORMANCE IMPACT
(Diagram: thin provisioning; snapshots (ROW based) that are space efficient, fast and scalable, with no performance impact and flexible granularity, per VM or folder.)

Clones in HALO are implemented exactly like snapshots, except that they are writable. These writable snapshots, or clones, can be used to rapidly provision application clones such as virtual desktops or applications for test and development. Hundreds of clones can be created and deleted in minutes. Compared to full copy methods, this can save a significant amount of time and increase agility. Since HALO clones are just like their snapshot counterparts, they are space efficient. Clones are naturally 100% deduplicated at creation time. When the clones start diverging from one another, data that is common is shared between them and only unique data occupies new space. Any duplication of data in the diverged clones is eliminated by the HALO deduplication engine, further reducing the footprint of clones. HALO clones are a great way to spin up a large number of applications without worrying about space usage.
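The pointer-based behavior of snapshots and clones described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Springpath's implementation: the `Volume` class, its block map and the shared dictionary standing in for the object store are all hypothetical.

```python
class Volume:
    """A file or virtual disk modeled as a map of block number -> object key."""

    def __init__(self, store, block_map=None):
        self.store = store                      # shared dict: object key -> bytes
        self.block_map = dict(block_map or {})  # this volume's own pointers

    def write(self, block_no, data):
        # Redirect-on-write: new data goes to a new key. The old object is
        # left untouched for any snapshot or clone that still points at it,
        # so nothing is read or modified in place.
        key = len(self.store)
        self.store[key] = data
        self.block_map[block_no] = key

    def read(self, block_no):
        return self.store[self.block_map[block_no]]

    def snapshot(self):
        # Zero-copy: only the pointer map is copied, never the data.
        return dict(self.block_map)

    def clone(self):
        # A clone is a writable snapshot that shares the same object store.
        return Volume(self.store, self.block_map)
```

In this model, taking a snapshot or a clone adds no objects to the store, and a later write to either the original or the clone consumes space only for the block that changed.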

KEY BENEFITS
- Creation and deletion of Springpath native snapshots and clones are very fast
- Springpath native snapshots provide unlimited zero-copy, space-efficient backups for critical data at a fine granularity
- Springpath snapshots and clones do not impact performance negatively

Log Structured Distributed Objects

HALO's Log Structured Distributed Object Store layer groups and compresses any data that filters through the deduplication engine into self-addressable objects. These objects are written in a log-structured, or sequential, manner on disks either local to a server or on another server in the cluster. As a result, all incoming IO, even if it is random, is finally laid out sequentially on both flash (caching tier) and hard disks (persistent tier). The objects are also distributed across all the servers in the cluster to make sure the storage capacity available in these servers is utilized uniformly. This sequential layout increases flash endurance and maximizes performance with hard disks, which are ideal for sequential IO. It also eliminates any negative impact of compression and snapshots on overall performance due to read-modify-write operations. Because these objects are laid out chronologically, HALO can recover from media or server failures quickly by rewriting only the data from the point at which the log was truncated by the failure.

FIGURE 10 SPRINGPATH LOG STRUCTURED FILE SYSTEM
(Diagram: a log structured data layout of segments 1 to N plus reclaimed segments; each segment holds a segment header, self-describing variable-sized compressed objects 1 to 8, and a segment summary.)

As shown in Figure 10, all data blocks that are finally written to both SSDs and HDDs are compressed into objects and sequentially laid out into fixed size segments, which in turn are sequentially laid out in a log-structured manner.
Each of these compressed objects within the log-structured segments is uniquely addressable using a key, with each key fingerprinted and stored with a checksum to provide the highest data integrity, similar to an object storage system such as Amazon S3 or Swift. This layer can be exposed via an S3-like or Swift-like REST API to provide an enterprise-grade object storage solution.
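The append-only, self-describing object layout described above can be sketched with Python's standard `zlib` and `hashlib` modules. The record format below (a 32-byte key fingerprint, a 4-byte CRC-32 checksum, a 4-byte length and the compressed payload) and the class name `SegmentLog` are assumptions chosen for illustration; the paper does not specify the on-disk format, the compression algorithm or the checksum used.

```python
import hashlib
import zlib


class SegmentLog:
    """Append-only log of compressed, checksummed, key-addressable objects."""

    def __init__(self):
        self.log = bytearray()   # stands in for a sequentially written device
        self.index = {}          # key -> (offset, length) of the newest version

    def put(self, key, data):
        payload = zlib.compress(data)
        checksum = zlib.crc32(payload)
        fingerprint = hashlib.sha256(key.encode()).digest()   # 32 bytes
        record = (
            fingerprint
            + checksum.to_bytes(4, "big")
            + len(payload).to_bytes(4, "big")
            + payload
        )
        # Always append. An overwrite never reads or modifies the old record;
        # the index is simply repointed and the old record becomes garbage.
        self.index[key] = (len(self.log), len(record))
        self.log += record

    def get(self, key):
        offset, length = self.index[key]
        record = bytes(self.log[offset:offset + length])
        checksum = int.from_bytes(record[32:36], "big")
        payload = record[40:]
        if zlib.crc32(payload) != checksum:
            raise IOError("checksum mismatch for key %r" % key)
        return zlib.decompress(payload)
```

Because `put` only appends, random overwrites from the application become sequential writes on the device, and a modification to compressed data incurs no read-modify-write cycle. Reclaiming the space held by superseded records, which Figure 10 shows as reclaimed segments, is left out of this sketch.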

KEY BENEFITS
- Maximizes flash endurance
- Maximizes performance
- Uniform utilization of storage capacity across servers
- Ensures compression, snapshots and clones do not impact performance negatively
- Fast recovery from drive or server failures
- Provides highest data integrity

Data Availability

HALO's Log Structured Distributed Object layer provides high availability for data by replicating copies of any incoming data. First, any data that is written to the write cache on an SSD is synchronously replicated to one or two other SSDs located in different servers (the number is a policy-driven setting that users can tune) before the writes are acknowledged to the application. This enables incoming writes to be acknowledged quickly at low latency and ensures they are protected from any SSD or server failure. If an SSD or server fails, the replica is quickly recreated on surviving SSDs or servers using the other available copies.

Similarly, all data that is de-staged from the write cache to the persistent tier is replicated by the Log Structured Distributed Object layer. This replicated persistent data is likewise protected from any hard disk or server failure. With two replicas, for a total of three data copies, the cluster can survive the failure of two SSDs, two HDDs or two servers without the risk of data loss. Please see the Springpath Systems Administrator's Guide for a full table of fault tolerant configurations and settings.

In scenarios where the Springpath controller software in a server has issues, all data requests from the applications residing in that server are automatically routed to other Springpath controllers in the cluster. This same capability can be leveraged to upgrade the controller software on a rolling basis without impacting cluster or data availability. The self-healing nature of the HALO architecture is one of the key reasons why the Springpath Data Platform can be deployed for production applications.
KEY BENEFITS
- HALO protects data from disk and server failures by replicating data across servers
- Flexible replication policy allows for trading off capacity and protection level
- Springpath controller software can be patched and upgraded without taking the cluster offline
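The trade-off between capacity and protection level can be made concrete with a little arithmetic, sketched here in Python. With r replicas there are r + 1 copies of the data, so up to r copies can be lost without data loss, which matches the two-replica example in the text. The function names are hypothetical, and the capacity estimate is a simplifying assumption that ignores deduplication, compression and metadata overhead.

```python
def survives(replicas, failed_copies):
    """True if at least one copy remains after `failed_copies` copies are lost."""
    copies = replicas + 1            # the original plus its replicas
    return failed_copies < copies


def usable_capacity(raw_capacity, replicas):
    """Rough usable capacity when every block is stored `replicas + 1` times.

    Assumption: space savings from deduplication and compression, and any
    metadata overhead, are ignored.
    """
    return raw_capacity / (replicas + 1)
```

For example, under these assumptions 30 TB of raw capacity with two replicas stores about 10 TB of unique data and tolerates two failures, while one replica stores about 15 TB and tolerates one.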

Data Rebalancing

A robust data rebalancing capability is a must-have in a distributed file system. Springpath rebalancing is extremely efficient. There is no metadata access overhead. Data is moved at a fine granularity, resulting in better space utilization. Rebalancing of data occurs in both the caching tier and the persistent tier. Rebalancing is a non-disruptive online process.

HALO rebalances existing data when triggered by server or drive additions, removals or failures. When a new server is added to the cluster, its capacity and performance are made available not only to new data that will be created, but to existing data as well. HALO's rebalancing engine distributes existing data to the new server and ensures all the servers are uniformly utilized from a capacity perspective. The new server's resources are used for performance as well. When a server fails or is removed from the cluster, the rebalancing engine rebuilds and distributes copies of the data that was on the failed or removed server. The rebalancing occurs automatically after specific events like drive or server additions and failures.

KEY BENEFITS
- New hardware resources are utilized for both new and existing data, resulting in uniform capacity and performance consumption
- Hardware failures trigger automatic recreation of missing copies in the working servers
- No major performance impact during rebalancing, thanks to efficient, minimal data movement

Autosupport

For an enterprise-grade solution, it is important to have the ability to proactively discover and address any issues. HALO has a built-in, continuous health monitoring capability, also known as Autosupport, which collects and sends important health and usage metrics on a daily basis to the Springpath Support Cloud. No application or user data is collected. In addition to a daily report, lightweight heartbeats are sent every five minutes. In the event of any hardware or software failure, alerts are sent immediately.
The daily reports help address any latent issues proactively. Alerts reduce the amount of time it takes to resolve critical issues by notifying both Springpath support and the customer of any issues. In addition to providing proactive and speedy support, the metrics that are collected in the cloud are mined using Big Data analytics to provide insights on configurations, usage history, trends, and best practices. These insights are presented via dashboards in Springpath's support cloud that customers can view and act upon.

KEY BENEFITS
- Proactive prediction of failures and faster issue resolution
- Capacity and performance insights
- Dashboard for overviews of the entire installation

SPRINGPATH DATA PLATFORM FOR VMware

Springpath Data Platform tightly integrates with VMware's vSphere ESXi and its management application, vCenter, to provide a seamless data management experience to users and administrators. It supports key shared storage features like vMotion, DRS, HA and vSphere Replication. The Springpath web client plugin seamlessly extends vCenter and empowers administrators to manage their storage and data without having to learn yet another management tool. VMware's snapshot and cloning capabilities are replaced by the more scalable and performant Springpath native snapshots and clones. Springpath native compression and deduplication reduce the storage space occupied by the VMs.

Installation

Springpath software installation and configuration for VMware takes less than 30 minutes to complete. After VMware ESXi has been installed on each of the servers, the Springpath controllers are installed. The Springpath controllers use VM_DIRECT_PATH to directly access the storage inside the servers and pin a small percentage of the system resources (memory and CPU cores) to guarantee performance. Once the Springpath controllers are installed, they are immediately accessible at their IP address via a web browser. The administrator drags and drops a pre-built configuration file into the wizard. Within about 8-10 minutes, the cluster is configured and ready for use. During this configuration process, the management utilities and plugins are automatically added to the VMware vCenter server infrastructure.

Expansion

Cluster expansion is a one-button operation. Perhaps the only time you will go back to the cluster summary page after the system is built is to expand it by adding nodes. This simple and quick operation is performed by pressing the Expand Cluster button at the bottom of the cluster summary page.

FIGURE 11 SPRINGPATH CLUSTER EXPANSION

Once this is done, the cluster will search the local network for additional nodes.
Any nodes identified by SLP (Service Location Protocol) that are eligible for inclusion in your existing cluster are presented. Click the checkbox and add the node to your cluster. Within a few minutes the node is incorporated into the system. The additional available capacity is reflected on both the summary page and in the management plugin.

Springpath vCenter Web Client Plugin

Springpath utilizes the extensible plugin nature of the vSphere Web Client to enhance the familiar vCenter environment for the administrator. Management is centralized with the hypervisor by use of this plugin, so there is no separate management console or new interface to learn. Simply access your vCenter instance as you normally would and use the Springpath portlet on the cluster summary page to access cluster-specific operations like creating datastores or monitoring cluster status.

FIGURE 12 SPRINGPATH VMWARE WEB CLIENT PLUGIN SUMMARY PAGE

Native Snapshots

Springpath native snapshots can be taken at the individual VM level, or at the folder level as shown in Figure 13.

FIGURE 13 RIGHT CLICK ACCESS TO SPRINGPATH NATIVE SNAPSHOTS AND CLONES

Selecting the Snapshot Now item from the Springpath right-click menu brings up a simple snapshot wizard. You simply enter the name of the snapshot for the VM or folder and click OK.

FIGURE 14 SPRINGPATH NATIVE SNAPSHOT DIALOGUE

Once you click OK, a Springpath native snapshot is taken. All snapshots are still managed the same way that the VMware administrator is familiar with, namely through the Snapshot Manager built into the Web Client.

ReadyClones

Springpath ReadyClones enable vSphere to offload cloning to the cluster via API integration. This provides a rapid and robust mechanism for cloning large quantities of virtual machines. Batches of clones, with prebuilt specification files applied to them for customization, can be created in a few easy steps with the plugin-integrated cloning wizard.

FIGURE 15 SPRINGPATH READYCLONE WIZARD

To rapidly prepare large numbers of clones, simply enter the number of clones and specify a configuration file if you have one. You can also specify a clone name that is incremented by the count you enter. You can even select the option to power on VMs after the clone operation is complete.

CONCLUSION

Springpath Data Platform, based on its patent pending HALO architecture, delivers a new approach to store and manage data in enterprise data centers with cloud-like simplicity, scalability and economics. This 100% software based solution enables data centers to standardize on and leverage commodity servers of their choice. It eliminates storage silos that are based on proprietary hardware storage arrays or converged appliances. In addition, the solution delivers features that are superior to what is available in popular enterprise storage solutions today.