Copyright 2013 Splunk Inc.
Technical Deep Dive: Hunk: Splunk Analytics for Hadoop (Beta)
Ledion Bitincka, Splunk Inc.
Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2013 Splunk Inc. All rights reserved.
About Me
- Sr. Architect
- 6+ years at Splunk
- Mainly involved in search-time functionality, like: key-value pair extraction; scheduler & alerting; transactions, eventtypes, tags, etc.
- MySQLConnect, HadoopConnect, Hunk
- @ledbit
The Problem
- Large amounts of data in Hadoop
- Relatively easy to get the data in
- Hard & time-consuming to get value out
- Splunk has solved this problem before, primarily for event time series
- Wouldn't it be great if Splunk could be used to analyze Hadoop data?
- Hadoop + Splunk = Hunk
The Goals
A viable solution must provide:
- Processing of the data in place
- Maintained support for the Splunk Processing Language (SPL)
- True schema-on-read
- Report previews
- Ease of setup & use
GOALS: Support SPL
- Naturally suitable for MapReduce
- Reduces adoption time
- Challenge: Hadoop apps are written in Java & all SPL code is in C++
  - Porting SPL to Java would be a daunting task
  - Reuse the C++ code somehow: use splunkd (the binary) to process the data (see the sketch below)
  - JNI is neither easy nor stable
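To make the "reuse the binary" idea concrete, here is a minimal, hypothetical Java sketch of the approach: spawn the native splunkd executable and exchange data over stdin/stdout rather than binding to the C++ code with JNI. The binary path and the "search-pipe" argument are invented for illustration; they are not splunkd's real command line.

    import java.io.*;

    // Illustrative only: spawn a native splunkd process and stream data
    // through it, instead of binding to the C++ code via JNI.
    public class NativePipe {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Hypothetical command line; the real splunkd arguments differ.
            ProcessBuilder pb = new ProcessBuilder("/opt/splunk/bin/splunkd", "search-pipe");
            Process proc = pb.start();

            // Feed raw events to the native process on stdin...
            try (BufferedWriter toSplunkd = new BufferedWriter(
                    new OutputStreamWriter(proc.getOutputStream()))) {
                toSplunkd.write("2013-06-10T01:00:00 host1 GET /index.html 200\n");
            }

            // ...and read processed results back from stdout.
            try (BufferedReader fromSplunkd = new BufferedReader(
                    new InputStreamReader(proc.getInputStream()))) {
                String line;
                while ((line = fromSplunkd.readLine()) != null) {
                    System.out.println(line);
                }
            }
            proc.waitFor();
        }
    }

One benefit of piping over JNI is process isolation: a crash in the native code cannot take down the JVM that hosts the MapReduce task.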
GOALS: Schema on Read
- Apply Splunk's index-time schema at search time: event breaking, timestamping, etc.
- Anything else would be brittle & a maintenance nightmare
- Extremely flexible
- Runtime overhead is acceptable (manpower $ >> computation $)
- Challenge: Hadoop apps are written in Java & all index-time schema logic is implemented in C++
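As a toy illustration of schema-on-read (not Hunk's actual props.conf machinery), the sketch below breaks raw text into events and extracts timestamps only at the moment the data is read; the regex and time format are assumptions for a simple log layout.

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.regex.*;

    // Toy schema-on-read: break raw text into events and extract a
    // timestamp at read time; nothing is parsed when the data is written.
    public class SchemaOnRead {
        private static final Pattern TS =
                Pattern.compile("^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})");

        public static void main(String[] args) throws Exception {
            String raw = "2013-06-10T01:00:00 host1 GET /a 200\n"
                       + "2013-06-10T01:00:05 host2 GET /b 404\n";
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
            for (String event : raw.split("\n")) {                 // event breaking
                Matcher m = TS.matcher(event);
                Date ts = m.find() ? fmt.parse(m.group(1)) : null; // timestamping
                System.out.println(ts + " | " + event);
            }
        }
    }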
GOALS: Previews
- No one likes to stare at a blank screen!
- Challenge: Hadoop is designed for batch-like jobs
GOALS: Ease of Setup & Use
Users should just specify:
- The Hadoop cluster they want to use
- The data within the cluster they want to process
...and immediately be able to explore & analyze their data.
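For illustration only, a Hunk provider and virtual index are declared in indexes.conf along roughly these lines; the host names, ports, and paths below are made up, and the authoritative setting names live in the Hunk documentation:

    # Hypothetical example: the Hadoop cluster to use...
    [provider:hadoop-prod]
    vix.family             = hadoop
    vix.env.JAVA_HOME      = /usr/java/latest
    vix.env.HADOOP_HOME    = /usr/lib/hadoop
    vix.fs.default.name    = hdfs://namenode.example.com:8020
    vix.mapred.job.tracker = jobtracker.example.com:8021
    vix.splunk.home.hdfs   = /user/hunk/workdir

    # ...and the data within the cluster to process (a trailing "..."
    # conventionally matches recursively).
    [hunk]
    vix.provider           = hadoop-prod
    vix.input.1.path       = /home/hunk/...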
Deployment Overview
Move Data to Computation (Stream)
- Move data from HDFS to the search head (SH)
- Process it in a streaming fashion
- Visualize the results
- Problem?
Move Computation to Data (MR)
- Create and start a MapReduce job to do the processing
- Monitor the MR job & collect its results
- Merge the results and visualize
- Problem?
Mixed Mode
- Use both computation models concurrently (a sketch follows below)
- (Diagram: the stream mode supplies previews immediately; over time, the search switches over to the MapReduce results.)
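A minimal sketch of the mixed-mode idea, with all names invented: run the MapReduce job on one thread while a cheaper streaming pass produces previews, then switch over when the MR results are ready.

    import java.util.concurrent.*;

    // Toy mixed-mode scheduler: show streamed previews until the
    // MapReduce results arrive, then switch over.
    public class MixedMode {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);

            Future<String> mrJob = pool.submit(() -> {
                Thread.sleep(3000);              // stand-in for a long MR job
                return "final MR results";
            });

            while (!mrJob.isDone()) {            // stream mode: cheap previews
                System.out.println("preview: partial results from streamed data");
                Thread.sleep(500);
            }
            System.out.println("switched over: " + mrJob.get());
            pool.shutdown();
        }
    }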
First Search Setup
1. The Hunk search head copies the splunkd package (.tgz) to hdfs://<working dir>/packages in HDFS.
2. Each TaskTracker copies the .tgz from HDFS.
3. The package is expanded in the specified location on each TaskTracker.
4. Subsequent searches reuse the package already present on each TaskTracker.
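Step 1 can be pictured with the Hadoop FileSystem API. This is a rough sketch, not Hunk's actual installer code; the namenode address and paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch of the first-search setup: push the splunkd package to a
    // working directory in HDFS so TaskTrackers can fetch and expand it.
    public class PushPackage {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(
                    new Path("/opt/splunk/splunkd-package.tgz"),   // local package
                    new Path("/user/hunk/workdir/packages/"));     // hdfs://<working dir>/packages
            fs.close();
        }
    }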
Data Flow
Search: Streaming Data Flow
(Diagram: raw data is transferred from HDFS to the ERP (External Results Provider) process on the Search Head; the ERP pipes preprocessed data over stdin to a search process, which produces the final search results.)
Search: TaskTracker Reporting Data Flow
(Diagram: on each TaskTracker, a MapReduce task pipes raw data through preprocessing into a search process, and the remote results are written to HDFS. The ERP on the Search Head collects the remote results and feeds them to a search process that merges them into the final search results.)
Search: DataNode/TaskTracker Data Processing
- Raw data (HDFS) enters a custom processing stage (MapReduce/Java); you can plug in data preprocessors, e.g. Apache Avro or format readers (see the sketch below)
- The preprocessed data is piped over stdin into splunkd (C++):
  - Indexing pipeline: event breaking, timestamping
  - Search pipeline: event typing, lookups, tagging, search processors
- Results are compressed and written over stdout to HDFS as intermediate results
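The custom-processing stage is where a preprocessor can turn a binary format into text events for splunkd's stdin. Below is a hedged sketch using the Avro Java API; treating each record's JSON rendering as one event is an assumption for illustration, not Hunk's actual preprocessor contract.

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    // Sketch of a data preprocessor: decode Avro records and emit one
    // text event per record on stdout, which would be piped into the
    // splunkd indexing pipeline via its stdin.
    public class AvroPreprocessor {
        public static void main(String[] args) throws Exception {
            File input = new File(args[0]);   // an .avro file pulled from HDFS
            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(input, new GenericDatumReader<>())) {
                for (GenericRecord rec : reader) {
                    System.out.println(rec.toString()); // JSON-ish text event
                }
            }
        }
    }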
Search: Schematization
(Diagram: raw data flows through the splunkd (C++) indexer pipeline stages of parsing, merging, and typing, then into the search pipeline, producing results.)
Search: Search Head
Responsible for:
- Orchestrating everything
- Submitting MR jobs (optionally splitting bigger jobs into smaller ones)
- Merging the results of MR jobs, potentially with results from other VIXes (virtual indexes) or native indexes (sketched below)
- Handling high-level optimizations
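The merging duty can be sketched as a k-way merge of time-sorted remote result streams. This is a simplification; the real search head also folds in previews, native indexes, and other VIXes.

    import java.util.*;

    // Simplified search-head merge: combine several time-sorted remote
    // result streams into one globally time-sorted result list.
    public class MergeResults {
        public static List<String> merge(List<Iterator<String>> remotes) {
            // Each heap entry is (current line, the remote it came from);
            // lines are assumed to start with a sortable timestamp.
            PriorityQueue<Map.Entry<String, Iterator<String>>> heap =
                new PriorityQueue<>(Map.Entry.comparingByKey());
            for (Iterator<String> it : remotes)
                if (it.hasNext()) heap.add(new AbstractMap.SimpleEntry<>(it.next(), it));

            List<String> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                Map.Entry<String, Iterator<String>> e = heap.poll();
                out.add(e.getKey());
                if (e.getValue().hasNext())
                    heap.add(new AbstractMap.SimpleEntry<>(e.getValue().next(), e.getValue()));
            }
            return out;
        }

        public static void main(String[] args) {
            List<String> a = Arrays.asList("2013-06-10T01:00 e1", "2013-06-10T03:00 e3");
            List<String> b = Arrays.asList("2013-06-10T02:00 e2");
            System.out.println(merge(Arrays.asList(a.iterator(), b.iterator())));
        }
    }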
Optimization: Partition Pruning
- Data is usually organized into hierarchical dirs, e.g. /<base_path>/<date>/<hour>/<hostname>/somefile.log
- Hunk can be instructed to extract fields and time ranges from a path
- Directories that cannot possibly contain search results are ignored
Optimization: Partition Pruning, e.g.
Paths in a VIX:
- /home/hunk/20130610/01/host1/access_combined.log
- /home/hunk/20130610/02/host1/access_combined.log
- /home/hunk/20130610/01/host2/access_combined.log
- /home/hunk/20130610/02/host2/access_combined.log
Search: index=hunk server=host1
Paths searched:
- /home/hunk/20130610/01/host1/access_combined.log
- /home/hunk/20130610/02/host1/access_combined.log
Optimization: Partition Pruning, e.g.
Paths in a VIX:
- /home/hunk/20130610/01/host1/access_combined.log
- /home/hunk/20130610/02/host1/access_combined.log
- /home/hunk/20130610/01/host2/access_combined.log
- /home/hunk/20130610/02/host2/access_combined.log
Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
Paths searched:
- /home/hunk/20130610/01/host1/access_combined.log
- /home/hunk/20130610/01/host2/access_combined.log
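In configuration terms, the extraction that drives this pruning is declared on the virtual index. The sketch below is approximate and should be checked against the Hunk documentation; the ${...} path syntax extracts fields from path segments, and the et/lt settings derive the earliest/latest event time from a path.

    [hunk]
    vix.provider          = hadoop-prod
    # ${date}/${hour}/${server} match path segments and surface them as
    # fields, so a search like server=host1 can prune directories.
    vix.input.1.path      = /home/hunk/${date}/${hour}/${server}/...
    # Derive the event time range covered by each path.
    vix.input.1.et.regex  = /home/hunk/(\d+)/(\d+)/
    vix.input.1.et.format = yyyyMMddHH
    vix.input.1.et.offset = 0
    vix.input.1.lt.regex  = /home/hunk/(\d+)/(\d+)/
    vix.input.1.lt.format = yyyyMMddHH
    vix.input.1.lt.offset = 3600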
Best Practices
- Partition data in the filesystem using fields that:
  - Are commonly used
  - Have relatively low cardinality
- For new data, use formats that are well defined, e.g. Avro, JSON, etc.
- Avoid columnar formats like CSV/TSV (hard to split)
- Use compression (gzip, snappy, etc.); I/O becomes a bottleneck at scale
Troubleshooting
- search.log is your friend!
- Log lines are annotated with ERP.<name>
- Links for spawned MR job(s); follow these links to troubleshoot MR issues
- hdfs://<base_path>/dispatch/<sid>/<num>/<dispatch_dirs> contains the dispatch dir content of searches run on the TaskTrackers
Troubleshooting: Job Inspector
Troubleshooting: Common Problems
- The user running Splunk does not have permission to write to HDFS or run MapReduce jobs
- HDFS SPLUNK_HOME not writable
- DataNode/TaskTracker SPLUNK_HOME not writable, or out of disk
- Permission issues reading the data
Demo
Helpful Resources
- Download: http://www.splunk.com/bigdata
- Help & Docs: http://docs.splunk.com/documentation/hunk/6.0/hunk/meethunk
- Resources: http://answers.splunk.com
Next Steps
1. Download the .conf2013 Mobile App. If not on iPhone, iPad, or Android, use the Web App.
2. Take the survey & WIN A PASS FOR .CONF2014, or one of these bags!
3. View the other "Using" sessions. All sessions are available on the Mobile App; videos will be available shortly.
THANK YOU