EMC SOLUTION FOR SPLUNK




Splunk validation using all-flash EMC XtremIO and EMC Isilon scale-out NAS

ABSTRACT

This white paper provides details on the validation of functionality and performance of Splunk technologies using EMC XtremIO and EMC Isilon.

May 2015

EMC WHITE PAPER

To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store.

Copyright 2015 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

Part Number H14184

TABLE OF CONTENTS

EXECUTIVE SUMMARY
  OBJECTIVES
  AUDIENCE
INTRODUCTION
  EMC XTREMIO FOR SPLUNK HOT AND WARM BUCKETS
  EMC ISILON SCALE-OUT NAS FOR SPLUNK COLD AND FROZEN BUCKETS
SPLUNK
  SPLUNK OVERVIEW
  SPLUNK ARCHITECTURE
SPLUNK VALIDATION OVERVIEW
  XTREMIO BONNIE++ PERFORMANCE TESTING
  CLARIFICATION ON BONNIE++ AND INLINE DATA REDUCTION
  RESULTS
THE EMC XTREMIO & ISILON SPLUNK SOLUTION
CONCLUSION

EXECUTIVE SUMMARY

The Big Data market continues to grow, with a greater than 40% year-over-year increase in 2014. The main driving force behind this growth is the use of analytics to gain valuable insight from new and existing types of data sources, resulting in increased productivity, profitability, customer satisfaction, and competitive advantage.

Splunk has become a leader in this space with over 9,000 customers in 100 countries. Splunk provides the capability to mine machine-generated data and turn it into valuable insights. Splunk can take any data, any log, from anywhere in your infrastructure and add it to a searchable, intelligent index through which you can extract meaningful data about what's happening. Splunk calls this Operational Intelligence, which is aimed at three main use-cases:

- IT Operations: Utilization, Capacity Growth
- Security: Fraud Detection, Real-time Detection of Threats, Forensics
- Internet of Things (IoT): Sensor Data, Machine-to-Machine, Machine-Human Interactions

Machine data is generated by applications, networking devices, host and server logs, mobile devices, and more. Splunk not only captures this information, but also searches and analyzes it. The data can be analyzed by examining the real-time feeds: Splunk captures and indexes the data and allows you to run searches on the live data as it streams. Splunk can analyze issues and problems and provide insight in a matter of minutes instead of hours. This data analysis can provide you with a better understanding of your operational environment, reveal patterns, correlate events from multiple sources, and reduce the time for detection of important events.

As customers adopt Splunk and take advantage of these compelling features, managing the underlying DAS infrastructure becomes challenging. Maintaining consistent performance and leveraging flash to ensure that end users get fast query and search capabilities from the Splunk Dashboard begins to involve significant time in design and troubleshooting. In addition, enabling longer retention periods increases floor space in the data center and adds management overhead.

This EMC Splunk reference architecture offers a solution which provides the capabilities and features to easily and economically support scaling Splunk within an IT infrastructure. By combining the high performance and linear scalability of XtremIO with the multi-protocol linear scalability of Isilon, customers can feel confident supporting large-scale Splunk deployments growing to TBs of ingest per day.

This white paper provides insights into a cost-effective, scalable, and flexible infrastructure that combines the value of EMC's Splunk Reference Architecture with the operational intelligence of the Splunk ecosystem.

OBJECTIVES

The key objectives of this white paper are:

- Validate Splunk high-scale throughput and IOPS with an architecture that includes XtremIO and Isilon
- Prove the Splunk scale-out capabilities with this architecture by starting at 500GB ingest/day and scaling to 1TB ingest/day

AUDIENCE

This white paper is intended for IT Program Managers, IT Architects, and IT Management interested in deploying a Splunk infrastructure.

INTRODUCTION

This white paper focuses on supporting a large-scale Splunk ecosystem (500GB-1TB+ ingest rate/day). The paper will demonstrate consistent and linear performance for a large Splunk deployment. It will also prove the ability to support growth and scale of a Splunk deployment. The intent is to prove that a customer can confidently continue to analyze more and more of their IT environment without concern about whether the underlying infrastructure can keep up. Ultimately, the paper aims to show the advantages of this reference architecture over traditional large-scale DAS deployments: consistent performance at large scale, less management overhead, data efficiencies, and data center environmental advantages (density, power, cooling, etc.).

This reference architecture is built around two EMC solutions: All-Flash EMC XtremIO and EMC Isilon Scale-out NAS. Their advantages and strengths in relation to Splunk are outlined below.

EMC XTREMIO FOR SPLUNK HOT AND WARM BUCKETS

EMC XtremIO is a scale-out all-flash array that provides predictable and consistent low-latency performance. XtremIO provides always-on inline data reduction services such as data deduplication and data compression. The simple all-flash design of XtremIO requires significantly lower administrative overhead compared to local server storage for hot and warm buckets. XtremIO inline data reduction makes it possible to use Splunk clustered indexers without the additional disk overhead, because XtremIO reduces the capacity consumed by the clustered copies. By leveraging Splunk clustered indexers with XtremIO, administrators have application protection as well as XtremIO's XDP data protection. This can avoid lengthy and performance-impacting data or index rebuilds in the event of disk failures.

Key benefits of utilizing the XtremIO scale-out all-flash array for hot/warm storage include:

- Linear and simple scalability up to 90TB all-flash in a highly available architecture.
- Enterprise-rich features such as double parity data protection, inline data reduction, inline data compression, and no-impact snapshots.
- Access via Fibre Channel or iSCSI with boot from SAN support.
- Data-at-rest encryption with self-encrypting drives.

EMC ISILON SCALE-OUT NAS FOR SPLUNK COLD AND FROZEN BUCKETS

Acting as the Data Lake Foundation, the center of an analytics ecosystem, EMC Isilon provides a highly scalable, flexible, and secure storage system that protects data and optimizes the flow of information within an organization without sacrificing application performance. The Isilon OneFS operating environment provides the specialized data protection, data security, compliant retention, and simple, massive scalability required for long-term retention.

Key benefits of utilizing Isilon scale-out NAS for Cold storage include:

- Linear and simple scalability up to 50PB in a highly available architecture. Your Splunk cold bucket can start out with a smaller footprint and easily scale to fit your Splunk environment as it grows.
- Significantly lower administrative overhead compared to local server storage, giving administrators an easy way to grow without configuring more physical servers and storage.
- Unmatched efficiency with over 80% storage utilization to reduce IT capital investment requirements.
- Enterprise-rich features such as snapshots, WORM retention, encryption, multi-tenancy, and deduplication.
- Multi-protocol access including but not limited to SMB, NFS, Object, and HDFS to leverage Hunk functionality.
- Option to leverage Isilon automated tiering to further lower the TCO of cold data retention and utilize the Splunk frozen process to automate deletions, controlling data lifecycle management.

EMC Isilon scale-out NAS is the ideal data lake platform, with the unmatched simplicity, efficiency, flexibility, and reliability that you need to maximize the value of your Splunk data storage and analytics workflow investment.

SPLUNK

SPLUNK OVERVIEW

The Splunk application provides the ability to search, analyze, and visualize data gathered from different sources in your IT infrastructure including applications, networking devices, host and server logs, mobile devices, and more. For each incoming data source, Splunk indexes the data into a series of events that you can view and search.

[Figure: Splunk Overview]

In summary, the power of Splunk is to:

- Collect data from anywhere
- Search and analyze everything
- Gain real-time operational intelligence

SPLUNK ARCHITECTURE

The main architectural features of Splunk are its Web Interface, Apps, Forwarders, Indexers, and Search Heads.

The Web Interface is called Splunk Web and provides the ability to administer and manage the Splunk deployment, create searches, and create reports. Splunk Web is the primary interface for any Splunk user.

Splunk provides extensions through the use of Apps. For instance, an organization may need more specific networking or administration views. As another example, the EMC Isilon App provides a detailed view of your EMC Isilon cluster.

The Forwarder forwards the data to either another forwarder or to an indexer.

The Indexer transforms the incoming raw data into events. These events are stored in an index. The indexer also searches these indexes in response to a search request.

Splunk implements a storage tiering concept within its indexes. An index includes a Hot and Warm set of buckets that comprise Splunk's Home Path. This tier is where the newest data is written and where the highest-performance, near real-time searches are executed. It requires high-performance, low-latency storage that can be provided either by local disks in index servers or by externally attached SAN storage, which is the focus of EMC's XtremIO solution in this paper.

As data ages in the Splunk environment, Splunk provides the ability to tier data down into a Cold tier. The Cold tier is still searchable and is often used for longer-tail searches, forensic analysis, or as a retention tier where less frequently accessed data can be kept at a lower cost but remain searchable. The Cold tier is often served by externally attached storage via NFS protocol access. NAS technologies offer an acceptable blend of performance and lower cost per TB, which is the focus of Isilon's use in this reference architecture.

Data can also tier into a Splunk Frozen tier, but this data is no longer searchable and requires manual user action to bring it back into Splunk Enterprise before it can be searched. While customers sometimes choose to leverage the Frozen tier to meet compliance retention requirements, the purpose of this paper is to show how Isilon's massive scalability and competitive cost of ownership can empower customers to retain more data in their Cold tier, so that data is both searchable and retained to meet any compliance or regulatory retention requirements. The graphic below describes Splunk concepts in more detail.

[Figure: Splunk Indexes]

The Search Head manages and directs the search functions, such as directing requests to peers. After receiving results from the different peers, it merges the results and returns them to the user.

For the purpose of this white paper, we will focus on the Indexers and the Search Heads. The following is an example of the Splunk architecture.

[Figure: Splunk Architecture Overview]
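The tiering described above maps onto per-index path settings in Splunk's indexes.conf. The following Python sketch shows one way such a stanza could be generated so that hot/warm data lands on an XtremIO-backed mount and cold data on an Isilon NFS mount. The setting names (homePath, coldPath, thawedPath, frozenTimePeriodInSecs, coldToFrozenDir) are standard indexes.conf settings as far as we are aware, but should be checked against the documentation for your Splunk version. The function name, index name, and mount points are hypothetical and were not part of the validated environment.

```python
SECONDS_PER_DAY = 86400


def render_index_stanza(name, hot_warm_root, cold_root,
                        frozen_after_days, frozen_dir=None):
    """Return an indexes.conf stanza as text.

    hot_warm_root and cold_root are hypothetical mount points, for
    example an XtremIO LUN and an Isilon NFS export respectively.
    """
    lines = [
        f"[{name}]",
        # Hot and warm buckets (Splunk's Home Path).
        f"homePath = {hot_warm_root}/{name}/db",
        # Cold buckets: still searchable, lower-cost storage.
        f"coldPath = {cold_root}/{name}/colddb",
        # Location for frozen data that is manually restored.
        f"thawedPath = {cold_root}/{name}/thaweddb",
        # Age at which cold data is frozen.
        f"frozenTimePeriodInSecs = {frozen_after_days * SECONDS_PER_DAY}",
    ]
    if frozen_dir is not None:
        # Archive frozen buckets here instead of deleting them.
        lines.append(f"coldToFrozenDir = {frozen_dir}/{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_index_stanza("example_index", "/mnt/xtremio/splunk",
                              "/mnt/isilon/splunk", frozen_after_days=365))
```

Leaving frozen_dir unset in this sketch relies on Splunk's default behavior of deleting data when it is frozen, which corresponds to the automated deletion mentioned earlier for the cold tier.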

SPLUNK VALIDATION OVERVIEW

For the validation of the Splunk ecosystem, a virtual environment was set up with a Cisco UCS Blade infrastructure. Leveraging a shared storage model demonstrates the ability to use a denser compute environment such as blade servers. This allows customers to take advantage of data center footprint cost savings, including reduced power and cooling costs.

The environment sizing for Splunk was created using the Splunk recommended guidelines according to the tech brief found at: http://docs.splunk.com/documentation/splunk/latest/capacity/referencehardware

Each Indexer VM was configured with 12 vCPUs and 12GB of RAM.

For the validation, Splunk recommends using Bonnie++ to simulate Splunk indexing and querying. The Bonnie++ tool provides an indication of disk performance to simulate Splunk indexing and of random I/O read performance to simulate Splunk searches. These values are reported below to show the capabilities of this architecture in a Splunk environment.

For each VM, Bonnie++ was installed with the latest version from http://www.coker.com.au/bonnie++/bonnie++-1.03e.tgz. The following Bonnie++ command was used for testing:

    bonnie++ -u root:root -d <destination_mount> -fb

where <destination_mount> is the mount point for the XtremIO storage.

For the overall testing strategy, the Splunk guidelines for an indexer were followed, where each indexer handles about 125 GB per day. The first series of tests was performed with 4 Indexers to simulate a throughput of 500 GB per day. The next series of validations was performed with 8 Indexers to simulate a throughput of 1 TB per day. The Bonnie++ commands were run simultaneously on the 4 Indexers and then on the 8 Indexers.
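The sizing rule above is simple arithmetic, sketched below in Python together with a helper that assembles the Bonnie++ command line quoted in this section. The function names are hypothetical, 1 TB is treated as 1000 GB (an assumption that reproduces the 8-indexer figure), and the mount point in the example is a placeholder.

```python
import math

# Per-indexer guideline quoted in this paper (GB ingested per day).
GB_PER_INDEXER_PER_DAY = 125


def indexers_needed(ingest_gb_per_day, per_indexer_gb=GB_PER_INDEXER_PER_DAY):
    """Round up to the number of indexers needed for a daily ingest rate."""
    return math.ceil(ingest_gb_per_day / per_indexer_gb)


def bonnie_command(mount_point):
    """Build the argument list for the Bonnie++ run used in this paper."""
    return ["bonnie++", "-u", "root:root", "-d", mount_point, "-fb"]


if __name__ == "__main__":
    for ingest_gb in (500, 1000):  # 1 TB assumed to be 1000 GB
        print(f"{ingest_gb} GB/day -> {indexers_needed(ingest_gb)} indexers")
    # Placeholder mount point; pass the list to subprocess.run to execute it.
    print(" ".join(bonnie_command("/mnt/xtremio/splunk")))
```

Returning the command as an argument list rather than a shell string avoids quoting problems if the mount path contains spaces.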
XTREMIO BONNIE++ PERFORMANCE TESTING

A test environment was set up with the following components, focused on hot/warm disk I/O performance on the XtremIO X-Brick.

For 500GB ingest:

- (1) XtremIO 3.0.2 10TB X-Brick
- (4) vSphere 5.5 hosts configured with CentOS Linux release 7.1.1503 (Core)

[Figure: 500 GB Ingest. An Isilon X410 cluster (Isilon Cold) and an XtremIO X-Brick (XtremIO Hot/Warm) serve the ESX Server virtual machines splunk-a-indx01, splunk-a-indx02, splunk-a-indx03, and splunk-a-indx04.]

For 1TB+ ingest:

- Added a 2nd XtremIO X-Brick online with automatic expansion
- (8) vSphere 5.5 hosts configured with CentOS Linux release 7.1.1503 (Core)

[Figure: 1 TB Ingest. The Isilon X410 cluster (Isilon Cold) and XtremIO X-Bricks (XtremIO Hot/Warm) serve the ESX Server virtual machines splunk-a-indx01 through splunk-a-indx04 and splunk-b-indx05 through splunk-b-indx08.]

CLARIFICATION ON BONNIE++ AND INLINE DATA REDUCTION

Bonnie++ is the most widely used benchmarking tool for Splunk. For its put block (write) and rewrite file tasks, it creates data using an algorithm that leads to many duplicate blocks in each of the files. XtremIO's inline data reduction engine eliminates these duplicates (and compresses what remains), which enables the array to process more data than it could with a real Splunk data set. Hence, the bandwidth seen with Bonnie++ might be artificially higher than in a production Splunk environment. This effect can multiply when running multiple instances of Bonnie++ against a single XtremIO array, which is the case in this testing.

In the absence of any other Splunk benchmarking tool, we have used Bonnie++ as a way to highlight the performance scaling of XtremIO under a high load on Splunk indexers. Readers should be aware of this caveat and perform proofs of concept against their own data. We will publish a revision of this document once a more suitable Splunk load generator is identified.
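To make the caveat above concrete, the following Python sketch measures what fraction of fixed-size blocks in a buffer are unique. It is only an illustration of why repetitive benchmark data reduces far better than varied data. It is not XtremIO's data reduction algorithm, the 8 KB block size is an arbitrary illustrative choice, and the synthetic buffers stand in for Bonnie++ output and real Splunk data without reproducing either.

```python
import hashlib
import os

BLOCK_SIZE = 8192  # illustrative block size, chosen for this sketch only


def unique_block_ratio(data, block_size=BLOCK_SIZE):
    """Return unique blocks / total blocks for a bytes buffer.

    A ratio near 1.0 means block-level deduplication would save little;
    a ratio near 0.0 means most blocks are duplicates.
    """
    if not data:
        raise ValueError("data must not be empty")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    fingerprints = {hashlib.sha256(block).digest() for block in blocks}
    return len(fingerprints) / len(blocks)


if __name__ == "__main__":
    repetitive = (b"A" * BLOCK_SIZE) * 64      # stand-in for benchmark data
    varied = os.urandom(BLOCK_SIZE * 64)       # stand-in for varied data
    print("repetitive:", unique_block_ratio(repetitive))
    print("varied:    ", unique_block_ratio(varied))
```

Running a similar measurement against a sample of your own indexed data is one inexpensive way to gauge how far benchmark results may overstate production behavior before committing to a full proof of concept.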

RESULTS

The XtremIO system easily handled write throughput of 2.2GB/s with 4 indexers, scaling out to 4.3GB/s with 8 indexers. Read throughput scaled from about 2.4GB/s to 5.1GB/s, and IOPS scaled from about 24.7K to 45.2K. This is well above the Splunk minimum requirements for disk I/O.

Bonnie++ Results for 4 Indexers and 8 Indexers

Splunk VMs    put_block (MB/s)   rewrite (MB/s)   get_block (MB/s)   seeks (IOPS)
4 indexers    2227               1190             2360               24708
8 indexers    4298               2648             5144               45151

[Figure: Bonnie++ Scaling from 4 Indexers to 8 Indexers. Throughput (MB/s) for put_block, rewrite, and get_block.]

[Figure: Bonnie++ IOPS scale from 4 Indexers to 8 Indexers. Seeks (IOPS).]

Customers can be confident that the XtremIO and Isilon platform easily handles ingesting TBs of data. In addition, customers could reduce the number of Splunk indexers required to support the necessary ingest rates, saving compute resources as well as density, power, and cooling in the data center.
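As a quick check on how close the scaling is to linear, the short Python sketch below computes the ratio of the 8-indexer result to the 4-indexer result for each metric in the table above. The numbers are copied from that table; the function and variable names are ours. A ratio of 2.0 would be perfectly linear scaling for a doubling of indexers and X-Bricks.

```python
# (4-indexer result, 8-indexer result), taken from the results table.
RESULTS = {
    "put_block (MB/s)": (2227, 4298),
    "rewrite (MB/s)": (1190, 2648),
    "get_block (MB/s)": (2360, 5144),
    "seeks (IOPS)": (24708, 45151),
}


def scaling_factor(four_indexers, eight_indexers):
    """Ratio of the 8-indexer result to the 4-indexer result."""
    return eight_indexers / four_indexers


if __name__ == "__main__":
    for metric, (four, eight) in RESULTS.items():
        factor = scaling_factor(four, eight)
        print(f"{metric:18s} {factor:.2f}x  "
              f"(per indexer: {four / 4:.0f} -> {eight / 8:.0f})")
```

Given the Bonnie++ caveat described earlier, these ratios are best read as an indication of how the array scales under synthetic load rather than as a prediction for production Splunk data.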

THE EMC XTREMIO & ISILON SPLUNK SOLUTION

The scale-out architectures of EMC XtremIO and EMC Isilon make them an ideal fit for the demanding Splunk requirements around intensive workloads for hot and warm data, along with the ever-expanding capacity requirements of cold and frozen data. Addressing these key Splunk priorities separately allows the customer to implement the solution that best fits their needs without the contention across tiers that would be found in either DAS or single-platform appliance solutions. Importantly, it also gives customers the flexibility to expand hot/warm or cold/frozen capacity independently and protects against the limitations and bottlenecks found in traditional architectures at scale.

CONCLUSION

Deep insight into new or previously ignored data sources has resulted in increased competitive advantage for corporations as they are able to improve productivity, profitability, customer experience, and retention. Splunk is a leading platform in this space that enables collection, analysis, and real-time insights into data sources. As customers take advantage of these capabilities and increase the volume of their analyzed data, supporting the performance, reliability, and security of the underlying infrastructure becomes critical.

The EMC Splunk reference architecture composed of XtremIO and Isilon not only meets these requirements, but does so with the right economic model, as key features such as data efficiencies and data-at-rest encryption are leveraged. XtremIO provides unmatched consistent performance and efficiency for hot and warm buckets, and Isilon scale-out NAS creates a long-term and powerful storage solution for cold buckets. Both platforms leverage an underlying scale-out architecture to easily support scale without added overhead. This approach enables organizations to avoid the resource-intensive complexity of traditional Splunk deployments and illustrates a simple environment for Splunk that can leverage existing and net new investments in VMware, XtremIO, and EMC Isilon.