WHITEPAPER

CHOOSING THE RIGHT STORAGE PLATFORM FOR SPLUNK ENTERPRISE

INTRODUCTION

Savvy enterprises are investing in operational analytics to help manage increasing business and technological complexity. In doing so, they are able to drive greater efficiency, enhanced customer satisfaction, increased transparency, and superior resilience.

Deploying and managing operational analytics at scale is not without challenges, however, particularly with regard to data storage. To deliver the insights that operational analytics users demand, large amounts of information must be collected and stored. As a result, IT organizations commonly find that storage for enterprise-scale operational analytics is too difficult to manage, too expensive, or both.

To help businesses easily and cost-effectively realize the benefits of operational analytics, Red Hat has integrated Splunk Enterprise, an industry-leading platform for delivering real-time operational intelligence, with Red Hat Gluster Storage, a software-defined storage platform for files, objects, and machine-to-machine data. Using these products together helps enterprises solve the cost and scale problems of explosive analytic data growth.

The use of Red Hat software-defined storage with Splunk creates an important new opportunity for enterprises deploying Splunk Enterprise. Rather than relying on direct-attached storage (DAS), which is inexpensive but difficult to manage, or on network-attached storage (NAS), which is expensive and high-latency, Red Hat is pioneering a hybrid storage model that allows Splunk Enterprise to use a combination of DAS and software-defined storage to achieve a high-performance, highly manageable, and cost-effective system for operational analytics.

facebook.com/redhatinc @redhatnews linkedin.com/company/red-hat
DATA REQUIREMENTS FOR OPERATIONAL ANALYTICS

To more rapidly identify trends, patterns, and behaviors in operational data, or to facilitate regulatory compliance, enterprises retain the data indexed by Splunk Enterprise for extended periods of time. This is because Splunk's data-hungry analytical algorithms produce more insightful results when fed more data, both in terms of the number of unique data sources and the number of retained data points from each source.

Figure 1. Data retention aids pattern recognition in operational analytics. © 2013 Richard Candy

According to Splunk documentation, daily indexing volumes for medium and large enterprises are typically:

- 100 to 300GB per day for a medium enterprise with tens to low hundreds of users.
- 300GB to 1TB per day for a large enterprise with up to five hundred or more users.

Figure 2 illustrates the aggregate amount of storage required as the ingest rate and data retention period vary. As can be seen in the figure, a large enterprise ingesting a moderate 500GB of data per day will accumulate approximately 1PB of data if that data is to be retained for four years, while an enterprise indexing 1TB of data per day will require the same 1PB of storage while retaining that data for only two years.
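The retention arithmetic above can be checked with a short calculation. The following is a minimal sketch, assuming decimal units (1TB = 1,000GB), a 365-day year, and raw ingested volume only; the function name and these conventions are illustrative and do not come from Splunk or Red Hat documentation. It yields the raw accumulated volume, so the approximate figure read from Figure 2 may also reflect factors such as replication or capacity headroom that are not stated in the text.

```python
def raw_storage_tb(daily_ingest_gb, retention_years, days_per_year=365):
    """Return the raw data accumulated over the retention period, in decimal TB.

    Assumption: a constant daily ingest rate and no compression, replication,
    or index overhead. Real deployments will differ.
    """
    return daily_ingest_gb * days_per_year * retention_years / 1000.0


if __name__ == "__main__":
    # The two scenarios discussed in the text: 500GB/day for four years
    # and 1TB/day for two years.
    for daily_gb, years in [(500, 4), (1000, 2)]:
        total_tb = raw_storage_tb(daily_gb, years)
        print(f"{daily_gb} GB/day for {years} years: {total_tb:.0f} TB raw")
```

Both scenarios accumulate the same raw volume (730TB under these assumptions), which is why halving the retention period offsets a doubling of the ingest rate.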
Figure 2. Splunk storage required vs retention period and ingestion rate

With such large data storage requirements, enterprises are faced with selecting a storage architecture offering scalability, manageability, and low cost, while not compromising on performance.

STORAGE OPTIONS FOR SPLUNK ENTERPRISE

Enterprises have several options when it comes to architecting storage for Splunk Enterprise, each offering a unique combination of operating characteristics at a given price point.

LOCAL DIRECT-ATTACHED STORAGE

The default storage option for Splunk is local, direct-attached storage. Local DAS has the advantage of simplicity, allowing enterprises to get started quickly, using storage already available in their Splunk servers.

In addition to simplicity, DAS also offers high performance. Because DAS is connected locally to the Splunk Indexer via a high-bandwidth and low-latency SATA bus, operations such as indexing and search can be very fast. Local storage is also extremely cost-effective, since the drives themselves are commoditized.

In spite of its short-term advantages, local storage presents significant manageability challenges in the long term, as storage requirements grow. These include:

- Poor expandability. Upgrading local disks is time consuming and generally requires that nodes be taken out of service for the duration of the upgrade.
- Reduced efficiency. Because compute and storage must be scaled together, direct-attached storage results in lower overall resource utilization.
- Lower availability. With direct-attached storage, disk failures can result in data loss and system downtime.

Thus, while all-local storage can be effective for some Splunk deployments, enterprises with large data sets can quickly outgrow DAS.
SHARED ENTERPRISE STORAGE

Citing the manageability limitations of local storage, traditional enterprise storage vendors suggest that Splunk Enterprise users deploy a NAS cluster in an all-shared storage architecture. Deploying a NAS cluster in a shared manner means that all Splunk Indexers store indexed data on the NAS devices, where it may be accessed directly via Splunk Search Heads.

The shared nature of NAS storage does ease the manageability challenges presented by large amounts of DAS. However, because shared storage is accessed over a network, it imposes performance and latency penalties not present with DAS, resulting in reduced indexer ingest throughput and longer search times.

In addition to diminished performance, cost is a significant challenge with a NAS cluster. Due to their closed, proprietary nature, traditional NAS devices can cost many times as much as the equivalent amount of storage obtained via commodity disk drives.

Beyond performance and cost, traditional NAS is also:

- Hardware-based. Traditional NAS seeks to deliver reliability through expensive hardware redundancy and requires additional software or hardware to deliver the disaster recovery required for operational analytics projects.
- Monolithic. The monolithic nature of traditional NAS makes it difficult to expand incrementally. This presents challenges for operational analytics projects, which typically start small, but expand broadly across the enterprise as they mature.
- Proprietary. Traditional NAS locks customers in and dramatically adds to the total cost of ownership (TCO) of operational analytics projects, especially at scale.
- Rigid. NAS supports a single, on-premises deployment model and makes it difficult to deploy a cloud-based operational analytics system.

These characteristics make traditional enterprise NAS devices a weak fit for operational analytics projects.
HYBRID SOFTWARE-DEFINED STORAGE

To help enterprises overcome the manageability and scalability challenges of local storage, while avoiding the performance and cost shortcomings of NAS, Red Hat has integrated Red Hat Gluster Storage with Splunk Enterprise using a hybrid storage model.

Splunk Enterprise enables the hybrid storage model by segmenting indexed data into hot, warm, cold, and frozen repositories called buckets. Splunk's data placement policies control the distribution of data across buckets, based on the size of the indexes or the age of the data they contain. Buckets allow enterprises to maximize efficiency, performance, and value by taking a tiered approach to managing the lifecycle of ingested data.
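The idea of age-based placement across hot, warm, cold, and frozen buckets can be sketched in a few lines. This is a deliberately simplified illustration, not Splunk's actual placement algorithm: the age thresholds and every name below are hypothetical, and real policies also take index size into account, as noted above.

```python
from enum import Enum


class Stage(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"


# Hypothetical upper age limits, in days, for each stage. These values are
# invented for illustration and are not Splunk defaults.
HYPOTHETICAL_AGE_LIMITS_DAYS = [
    (1, Stage.HOT),
    (30, Stage.WARM),
    (365, Stage.COLD),
]


def stage_for_age(age_days):
    """Return the lifecycle stage for data of the given age (simplified model)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for limit_days, stage in HYPOTHETICAL_AGE_LIMITS_DAYS:
        if age_days < limit_days:
            return stage
    # Anything older than the last limit is treated as frozen.
    return Stage.FROZEN


if __name__ == "__main__":
    for age in (0.5, 10, 200, 400):
        print(f"{age} days old -> {stage_for_age(age).value}")
```

Because each stage is a distinct tier, each can be backed by a different class of storage, which is the property the hybrid model relies on.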
In the hybrid storage model, Splunk Enterprise stores recently indexed data on DAS, maximizing performance, and moves older data to a storage system selected to ensure scalability and manageability at a low total cost of ownership (TCO). Red Hat Gluster Storage is particularly well suited for housing Splunk Enterprise data in cold and frozen buckets because it is:

- Software-defined. Red Hat Gluster Storage provides reliability inexpensively, via software, and requires no additional hardware or software to ensure data protection and disaster recovery for operational analytics or other workloads.
- Cost-effective. Red Hat Gluster Storage environments are based on open-source software (the proven GlusterFS file system and Red Hat Enterprise Linux) running across industry-standard servers and disk drives, eliminating storage vendor lock-in and delivering low TCO for operational analytics projects.
- Expandable. Red Hat Gluster Storage is easily expanded, with no downtime, allowing operational analytics projects to start small and grow as needed without disruption.
- Flexible. Red Hat Gluster Storage is easily deployed wherever Linux runs, facilitating operational analytics both on-premises and in the cloud.

Figure 3. Buckets in Splunk Enterprise allow indexed data to be segmented.

The open, software-defined nature of Red Hat Gluster Storage and its tight integration with Splunk Enterprise make it an ideal choice for supporting enterprise operational analytics. Hybrid storage using DAS and Red Hat Gluster Storage addresses the shortcomings of both DAS- and NAS-based approaches.

HYBRID STORAGE IS THE BEST OF BOTH WORLDS

            DAS-ONLY                     NAS-ONLY                 HYBRID
Data        Hot/warm and cold data       Hot/warm and cold data   Hot/warm on DAS, cold on Red Hat Storage
Pros        Cost, performance            Manageability            Cost, manageability, performance, scalability
Cons        Scalability, manageability   Cost, performance        None listed
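One way to picture the hybrid layout is as a per-index configuration in which the hot/warm path points at local disk and the cold and frozen paths point at a mounted Red Hat Gluster Storage volume. The sketch below generates such a stanza as text. It assumes the standard Splunk indexes.conf settings homePath (hot and warm buckets), coldPath, thawedPath, coldToFrozenDir, and frozenTimePeriodInSecs; the index name, mount points, and retention period are hypothetical, and this is not the configuration documented for the Red Hat and Splunk integration. Verify each setting against the Splunk documentation for your version before using it.

```python
SECONDS_PER_DAY = 86400


def hybrid_index_stanza(index_name, local_root, shared_root, retention_days):
    """Build an indexes.conf-style stanza for a hybrid DAS plus shared-storage layout.

    Assumptions (all hypothetical): local_root is a directory on direct-attached
    storage, shared_root is a mounted Gluster volume, and data is archived to the
    frozen directory once it is older than retention_days.
    """
    lines = [
        f"[{index_name}]",
        # Hot and warm buckets stay on local disk for fast indexing and search.
        f"homePath = {local_root}/{index_name}/db",
        # Cold buckets move to the shared, software-defined tier.
        f"coldPath = {shared_root}/{index_name}/colddb",
        f"thawedPath = {shared_root}/{index_name}/thaweddb",
        # Frozen data is archived to the shared tier instead of being deleted.
        f"coldToFrozenDir = {shared_root}/{index_name}/frozen",
        f"frozenTimePeriodInSecs = {retention_days * SECONDS_PER_DAY}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical index name and mount points, chosen only for illustration.
    print(hybrid_index_stanza("web_logs", "/mnt/das/splunk", "/mnt/glusterfs/splunk", 730))
```

Keeping the local and shared roots as separate parameters mirrors the split in the table above: only the hot/warm path depends on the indexer's own disks, so the shared tier can grow independently.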
CONCLUSION

The Red Hat and Splunk partnership has resulted in an important new deployment alternative for enterprises deploying Splunk Enterprise for operational analytics. With the hybrid deployment model, enterprises can deploy Splunk Enterprise using direct-attached storage for hot and warm data, and Red Hat Gluster Storage for cold and frozen data. The hybrid storage configuration offers the highest-performance ingest and search on the most recent data, and strong search performance on older data, while minimizing overall cost and complexity.

For more information on Red Hat Gluster Storage and Splunk, visit /storage/.

ABOUT RED HAT

Red Hat is the world's leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux, and middleware technologies. Red Hat also offers award-winning support, training, and consulting services. Red Hat is an S&P company with more than 80 offices spanning the globe, empowering its customers' businesses.

#12350037_INC0210625_v2_0215

NORTH AMERICA 1 888 REDHAT1
EUROPE, MIDDLE EAST, AND AFRICA 00800 7334 2835 europe@
ASIA PACIFIC +65 6490 4200 apac@
LATIN AMERICA +54 11 4329 7300 info-latam@

Copyright 2015 Red Hat, Inc. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, and JBoss are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.