Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution




Jonathan Halstuch, COO, RackTop Systems
JHalstuch@racktopsystems.com

Big Data Invasion

We hear so much about Big Data and its effect on business (fragmented information streams, the challenge of manipulating and using massive amounts of collected information, storage considerations, and the hardships these issues involve) that we have to wonder: isn't there a better way?

When a company looks at information generated by its own devices, it generally gets a piecemeal view of the results. Data logs are over here; billing and call records are over there; performance metrics require a totally different screen and application; and clickstream data needs yet another destination to review. And all of these live in a completely different headspace from the question of where to store all this data. What to do?

The solution is to centralize and organize this disparate machine data in one easy-to-use application, with storage at the back end, so as to enhance business continuity with mission-critical data that is easy to gather, access, analyze, secure, and store.

Today, there is an even greater need to understand your business environment in order to drive insightful decisions. Newer and better search and visualization tools are required to analyze data (question-focused data sets). Automated reporting is needed to consolidate and speed up business analysis and, most important, the overall view of this data must move from piecemeal snapshots to a comprehensive aggregate overview. All this data then has to be stored and made accessible.

With this operational intelligence, companies are realizing significant ROI through fewer servers; tool consolidation; reduced personnel costs; less troubleshooting time per transaction; faster root cause identification; fewer outages and less downtime; lower mean time to repair; infrastructure savings; and, most important, greater customer satisfaction.
Splunk software enables businesses to gather machine data into a searchable repository and to generate reports, alerts, and dashboards that provide granular analysis, problem detection, and specific, focused business intelligence. It can aid in troubleshooting application problems and detecting security incidents in short order (hours versus days or longer), thereby minimizing service issues and outages. Automated monitoring and alerting continuously watch for application, network, and host issues that might be out of sync.

Copyright 2013 RackTop Systems, ALL RIGHTS RESERVED

Indexing

As Splunk accumulates and indexes data, it creates two types of files: 1) raw data, which is a compressed version of all data; and 2) index files, which are flat files of information extracted according to user-defined, highly customizable fields.

Collecting and repurposing information is at the heart of Splunk. Today, many plug-ins exist for collecting information from disparate operating systems and applications. The collected data is indexed, which allows for fast search, retrieval, and manipulation of information. The data is further enhanced by separating the data stream into individual, searchable events; creating or identifying time stamps; extracting fields such as host, source, and source type; and performing user-defined actions, such as identifying custom fields, masking sensitive data, filtering unwanted events, and routing events to specified indexes or servers.

[Figure: What Splunk Does. Index Data; Search & Investigate; Add Knowledge; Monitor & Alert; Report & Analyze]

From Indexes to Buckets

Splunk collects and stores indexes in directories called buckets, each of which consists of the index files and the raw data. The index and raw data move through these buckets (Hot, Warm, Cold, Frozen, Thawed) based on age or capacity thresholds. The availability and purpose of the data change depending on the bucket in which it resides.

Bucket Stage  Description                                                              Searchable
Hot           Contains newly indexed data. Open for writing. One or more hot buckets   Yes
              for each index.
Warm          Data rolled from hot. There are many warm buckets.                       Yes
Cold          Data rolled from warm. There are many cold buckets.                      Yes
Frozen        Data rolled from cold. Splunk deletes frozen data by default; however,   No
              it can be archived in the frozen bucket.
Thawed        Restored archive (frozen) data. Doesn't age off.                         Yes
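As a rough illustration of the bucket lifecycle described above, the following Python sketch maps the age of a bucket to the stage it would occupy and records which stages are searchable. The age thresholds and function names are hypothetical and chosen only for illustration; actual roll criteria are configurable in Splunk and can also be driven by capacity.

```python
# Order in which data ages through the buckets.
STAGES = ["hot", "warm", "cold", "frozen"]

# Which stages can be searched, following the bucket table.
SEARCHABLE = {"hot": True, "warm": True, "cold": True, "frozen": False, "thawed": True}

# Hypothetical upper age limits in days for each stage (assumption, not Splunk defaults).
MAX_AGE_DAYS = {"hot": 1, "warm": 30, "cold": 365}


def stage_for_age(age_days: float) -> str:
    """Return the stage a bucket of the given age would occupy."""
    for stage in STAGES[:-1]:
        if age_days <= MAX_AGE_DAYS[stage]:
            return stage
    # Anything older than the cold limit is rolled to frozen.
    return "frozen"


def thaw(stage: str) -> str:
    """Restore an archived (frozen) bucket so it can be searched again."""
    if stage != "frozen":
        raise ValueError("only frozen buckets can be thawed")
    return "thawed"


if __name__ == "__main__":
    for age in (0.5, 10, 200, 400):
        stage = stage_for_age(age)
        print(f"{age:>5} days -> {stage:<6} searchable={SEARCHABLE[stage]}")
```

Keeping the thresholds in a single dictionary mirrors the idea that each stage has its own policy, which is what later allows each stage to be placed on storage suited to it.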

Lifecycle of Bucket Data

Hot and Warm Buckets require high-performance reads and writes, as this data involves a lot of random I/O. Cold Buckets require data integrity and capacity, but they still need to handle reads in the case of long search queries. Frozen Buckets also require data integrity and capacity, as the Frozen Bucket is the data archive. It can be thawed when needed for a search.

Now that you have all that info, where do you store it?

BrickStor as a Solution

Splunk makes data aggregation easy, but file counts can quickly grow into the billions, and data integrity must be protected over the long run. The storage system for all this accumulated data must scale to terabyte and petabyte capacities, offer efficient storage management, and protect against data corruption. BrickStor uses ZFS, an open source technology, to help organizations implement high-performance yet cost-effective data storage solutions by taking advantage of features such as compression, inline deduplication, unlimited snapshots and cloning, and high availability support. Additional key features include:

- ZFS technology, the most scalable and flexible 128-bit file system
- Unlimited file size
- Unlimited snapshots
- Native inline deduplication and compression
- Hybrid Storage Pools
- End-to-end data integrity (ZFS, checksumming, etc.)
- Heterogeneous block and file replication
- Block-level mirroring
- Simplified disk management

ZFS Hybrid Storage Pools

ZFS uses robust, scalable technology with features not available in other file systems today. ZFS Hybrid Storage Pools (HSP) allow you to combine DRAM, SSDs, and spinning HDDs into an accelerated storage medium. These pools optimize performance for any given working set by minimizing I/O bottlenecks. In addition, by reducing read and write latency, they give users a system that outperforms legacy storage systems while having a much lower total cost of ownership.
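To see why serving part of the working set from DRAM and SSD reduces read latency, it can help to work through the arithmetic. The sketch below computes an expected read latency as a weighted average over storage tiers. The hit fractions and per-tier latencies are hypothetical values picked purely for illustration, not measurements of BrickStor or ZFS, and the function name is an assumption.

```python
def average_read_latency_ms(tiers):
    """Expected read latency for a list of (hit_fraction, latency_ms) tiers.

    hit_fraction is the share of all reads served by that tier;
    the fractions across all tiers must sum to 1.
    """
    total = sum(fraction for fraction, _ in tiers)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(fraction * latency for fraction, latency in tiers)


# Hypothetical hybrid pool: most reads hit DRAM or SSD, the rest go to HDD.
HYBRID_POOL = [
    (0.70, 0.001),  # DRAM cache (assumed latency)
    (0.25, 0.1),    # SSD read cache (assumed latency)
    (0.05, 8.0),    # spinning HDD (assumed latency)
]

# Hypothetical disk-only pool: every read goes to HDD.
DISK_ONLY_POOL = [(1.0, 8.0)]

if __name__ == "__main__":
    print("hybrid pool   :", average_read_latency_ms(HYBRID_POOL), "ms")
    print("disk-only pool:", average_read_latency_ms(DISK_ONLY_POOL), "ms")
```

The point of the exercise is that the slowest tier dominates the average, so even a modest cache hit rate on faster media can shift the result considerably. The actual benefit depends on how well the working set fits the cache.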
You can also have multiple pools within a single appliance, further reducing overall costs and simplifying management. When this concept is applied to applications such as Splunk, it enables end users to address Big Data without worrying about storage management. Typically, Hybrid Storage Pools deliver higher IOPS; leverage a combination of SSDs and spinning SAS HDDs; use compression to save space and power; and provide the flexibility to serve both block and file protocols. In addition, ZFS, with its self-healing architecture, maintains data integrity with end-to-end checksumming.

In applications such as Splunk, this level of flexibility enables users to create hybrid storage pools designed and tuned for a specific purpose or Splunk bucket, further reducing total cost of ownership without sacrificing performance. Specifically, for high performance and low latency, a user may consider a pool of striped vdevs with a read cache sized for the expected working set. For extreme performance, an all-flash SSD tier may be worth considering. Both configurations are well suited to the Hot/Warm Bucket and index files. For the Warm and Cold Buckets, consider a RAID-Z2 configuration, which provides higher usable capacity than mirrored vdevs (RAID-10 equivalent). The ability to reconfigure disks and add caching later is the key to near- and long-term success: Hybrid Storage Pools can adapt without a forklift replacement should the environment or business needs change.

Performance

RackTop's tiered approach to storage allows different configurations to be used, depending on desired performance, capacity needs, required speed of access, and overall budgetary constraints. As the following table illustrates, Splunk users (and all others dealing with enormous volumes of data) can fine-tune their hardware configuration to meet their specific demands.

[Table: Splunk Bucket Performance in Regards to Hardware (1)]

On SSDs

SSDs deliver significant performance gains over conventional hard drives in searches that return small result sets from large data volumes ("rare" or "needle in a haystack" searches). SSDs also deliver performance gains with concurrent searches.
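The usable-capacity difference between RAID-Z2 and mirrored vdevs noted above comes down to simple arithmetic: a RAID-Z2 vdev gives up two disks' worth of space to parity, while two-way mirrors give up half of the raw capacity. The sketch below assumes a hypothetical pool of twelve 2 TB disks and ignores file system overhead, so the figures are approximate; the function names are illustrative only.

```python
def raidz2_usable_tb(disks: int, disk_tb: float) -> float:
    """Approximate usable capacity of one RAID-Z2 vdev (two disks' worth of parity)."""
    if disks <= 2:
        raise ValueError("a RAID-Z2 vdev needs more disks than its two parity disks")
    return (disks - 2) * disk_tb


def mirror_usable_tb(disks: int, disk_tb: float) -> float:
    """Approximate usable capacity of striped two-way mirrors (RAID-10 equivalent)."""
    if disks < 2 or disks % 2:
        raise ValueError("two-way mirrors need an even number of disks")
    return (disks // 2) * disk_tb


if __name__ == "__main__":
    disks, disk_tb = 12, 2.0  # hypothetical pool: twelve 2 TB disks
    print("RAID-Z2 usable TB:", raidz2_usable_tb(disks, disk_tb))  # 20.0
    print("Mirrors usable TB:", mirror_usable_tb(disks, disk_tb))  # 12.0
```

Under these assumptions the same twelve disks yield roughly 20 TB in RAID-Z2 versus 12 TB when mirrored, which is why the capacity-oriented buckets lean toward RAID-Z2 while the latency-sensitive ones lean toward striped or mirrored layouts.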

One solution, for example, is to use SSDs for the Hot and Warm Splunk indexes while employing spinning media for Cold, an approach that becomes even more cost-efficient relative to traditional disk drives as the price per GB of SSDs keeps falling.

[Figure: Benefits of Splunk with SSDs]

Legend for graph:
7200  2 4 2.40GHz, 16GB, 12x2TB 7200 RPM SATA RAID 10
10k   2 6 2.677GHz, 48GB, 4x900GB 10K RPM SAS RAID 10
15k   2 6 2.667GHz, 12GB, 6x146GB 15K RPM SAS RAID 10
SSD   2 4 2.40GHz, 16GB, 1x240GB (same as 7200 with PCIe SSD)

Hardware: Three machines were used for this benchmark, classified by disk speed. CPU and memory were not identical. (2)

(2) Source: http://blogs.splunk.com/2012/05/10/quantifying-the-benefits-of-splunk-with-ssds/

Conclusion

With applications such as Splunk generating and collecting huge volumes of data, RackTop BrickStor is uniquely positioned to address the resulting storage needs, as it provides the only product that enables a flexible and scalable tiered storage architecture across storage access protocols. BrickStor offers almost limitless scalability, with the capability to expand both capacity and performance over time. BrickStor is a ZFS storage solution based on open source technology that provides future-proofing through ease of scalability and an IT partner that acts as an extension of your internal IT team.

Each installation will differ in hardware needs, storage capacity, performance requirements, and many other elements. RackTop has solutions ranging from just a few TBs to multiple petabytes in a single rack. For more specific information concerning your installation, please feel free to contact RackTop (www.racktopsystems.com/splunk/) at info@racktopsystems.com.