Copyright 2014 Splunk Inc.
Hunk & Elastic MapReduce: Big Data Analytics on AWS
Dritan Bitincka, BD Solutions Architecture
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
About Me
- Member of BD Solution Architecture team
- Large scale deployments
- Cloud and Big Data
- Fourth .conf
Agenda
- Hunk
- Amazon EMR
- Understanding how Hunk and EMR can work together
- Demo: Analyzing HDFS/S3 data with Hunk on EMR
Introduction to Hunk
Splunk as a single pane of glass for your machine data
[Diagram: Splunk alongside RDBMS and NoSQL data stores]
Hunk for Hadoop and NoSQL Data Stores
[Diagram: Explore, Analyze, Visualize with Splunk across RDBMS and NoSQL data stores]
Hadoop Components
STORAGE: HDFS (NameNode, DataNode)
- Distributed, replicated, massively scalable file system
COMPUTE: MapReduce (JobTracker, TaskTracker)
- Programming paradigm; two-phase processing of large datasets
  - We also use it, though a simplified version of it
- Scalable, fault tolerant, etc.
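To make the "two-phase processing" idea concrete, here is a minimal single-process sketch of the map and reduce phases as a word count. It only illustrates the paradigm: it is not how Hadoop or Hunk implement it, and the function names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework runs between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks, and the grouped keys are partitioned across reducers; the sketch keeps everything in one process so the data flow is easy to follow.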
Splunk and Hadoop Data
Splunk Hadoop Connect
- Export: Write data out to Hadoop, search based (push)
- Explore: Read data from Hadoop and analyze on the search head (pull)
[Diagram labels: STORAGE, COMPUTE]
Splunk and Hadoop Data Today
[Diagram: Explore, Analyze, Visualize, Dashboards, Share on top of Hadoop STORAGE and COMPUTE]
Splunk Stack
- Explore, Analyze, Visualize, Dashboards, Share
- splunkweb: web and application server (Python, AJAX, CSS, XSLT, XML)
- splunkd: search head, virtual indexes (C++, web services)
- Interfaces: REST API, command line, ODBC
- 64-bit Linux OS
Hunk Stack
- Explore, Analyze, Visualize, Dashboards, Share
- splunkweb: web and application server (Python, AJAX, CSS, XSLT, XML)
- splunkd: search head, virtual indexes (C++, web services)
- Interfaces: REST API, command line, ODBC
- Hadoop interface: Hadoop client libraries (Java)
- 64-bit Linux OS
Scaling with Hadoop
[Diagram: Hunk stack (splunkweb, splunkd, REST API, command line, ODBC, Hadoop interface with Hadoop client libraries in Java) on 64-bit Linux OS]
- Connect Hunk to multiple Hadoop clusters: Hadoop Cluster 1, Hadoop Cluster 2, Hadoop Cluster 3
What Makes it Stick?
To access and process data in external data stores (HDFS is supported out of the box), Hunk External Resource Providers (ERPs) carry out the store-specific file system implementation and computational semantics.
- Hunk ERP provider family: a logical grouping of data store frameworks that access the same kind of external system and share a global set of configurations. Example: Hadoop.
- Provider: a collection of specific Hunk ERP helper process implementations within the provider family that share a cluster-specific configuration. Examples: ERP1 (prod), ERP2 (test).
- Virtual index (VIX): after you set up a provider, you configure virtual indexes by giving Hunk information about the data location. Hunk then uses that information and its underlying implementation to distribute searches. Examples: VIX-1, VIX-2, VIX-3, VIX-4.
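As a rough illustration of the provider and virtual index relationship, the sketch below renders the two kinds of stanzas that Hunk reads from indexes.conf. The `vix.*` setting names are written from memory of the Hunk configuration format and should be checked against the documentation for your version; the host names, ports, and path in the usage note are hypothetical, and a working provider also needs environment settings (such as the Java and Hadoop home directories) that are omitted here.

```python
from typing import Dict


def render_stanza(name: str, settings: Dict[str, str]) -> str:
    """Render one .conf stanza: a [name] header followed by key = value lines."""
    lines = [f"[{name}]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines)


def provider_stanza(provider: str, namenode: str, jobtracker: str) -> str:
    """Cluster-specific settings shared by every virtual index on this provider."""
    return render_stanza(
        f"provider:{provider}",
        {
            "vix.family": "hadoop",  # the provider family
            "vix.fs.default.name": f"hdfs://{namenode}",
            "vix.mapred.job.tracker": jobtracker,
        },
    )


def virtual_index_stanza(index: str, provider: str, path: str) -> str:
    """A virtual index points at a data location through a named provider."""
    return render_stanza(
        index,
        {"vix.provider": provider, "vix.input.1.path": path},
    )
```

For example, `provider_stanza("ERP1", "namenode.example:8020", "jobtracker.example:8021")` followed by `virtual_index_stanza("VIX-1", "ERP1", "/data/weblogs")` yields one provider stanza and one virtual index stanza that refers to it by name, which mirrors the ERP1 and VIX-1 labels on the slide.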
Explore, Analyze, Visualize Data in Hadoop
- Unlock business value of data in Hadoop
- Fast to learn instead of scarce skills
- Integrated explore, analyze and visualize
- No fixed schema to search unstructured data
- Preview results while MapReduce jobs start
- Easier app development than in raw Hadoop
Integrated Analytics Platform for Hadoop Data
- Full-featured, Integrated Product: Explore, Analyze, Visualize, Dashboards, Share
- Insights for Everyone
- Works with What You Have Today: Hadoop (MapReduce & HDFS)
Introduction to EMR
Amazon EMR
- Amazon EMR is a Hadoop framework in the cloud offered as a managed service
- Used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics
Provisioning Hadoop on AWS
1. Log in to the AWS Console
2. Fill in a form
3. Click Create Cluster
4. Wait a few minutes for a fully operational Hadoop cluster
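The same provisioning can be scripted instead of clicked through. The sketch below builds the arguments for boto3's EMR `run_job_flow` call; the release label, instance types, and role names are placeholder assumptions to replace with values valid in your account, and `create_cluster` needs boto3 installed plus AWS credentials to actually run.

```python
from typing import Any, Dict


def build_cluster_request(name: str, core_nodes: int, keep_alive: bool) -> Dict[str, Any]:
    """Build keyword arguments for the EMR run_job_flow API call."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.36.0",  # assumption: pick a release available to you
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: placeholder size
            "SlaveInstanceType": "m5.xlarge",   # assumption: placeholder size
            "InstanceCount": core_nodes + 1,    # core nodes plus one master
            # True keeps a long-running cluster; False gives a transient one
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default role names
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: Dict[str, Any], region: str) -> str:
    """Submit the request and return the new cluster's job flow ID."""
    import boto3  # imported lazily so the request builder works without AWS access

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the API call makes it possible to review or version the cluster definition before anything is launched.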
Why is EMR Compelling?
- No Hadoop/HDFS management
- Native support for AWS S3
  - Vast amounts of data in S3
- Cluster elasticity
- Spot vs. Reserved Instances
  - Long running vs. transient
- Pay for what you use
- Thousands of customers
[Diagram labels: HDFS, Master, S3...]
Integrating Hunk with EMR
- EMR: Managed Hadoop framework in the cloud with access to vast amounts of data in HDFS and S3
- Hunk: Explore, analyze and visualize data from a central place
- Together: Full analytics solution for Big Data in the cloud
Hunk on EMR: Option 1
- Classic Hunk + Hadoop
  - Provision an EMR cluster
  - Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
  - Bring Your Own License (BYOL)
  - Configure Hunk with the EMR cluster
    - Edit Security Groups to allow access
    - Master IP addresses & ports
    - Create provider
    - Create virtual index
    - Search
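The "Edit Security Groups to allow access" step can also be scripted. The sketch below builds ingress rules for boto3's EC2 `authorize_security_group_ingress` call so that only the Hunk instance can reach the EMR master. Which ports to open depends on the Hadoop version running on the cluster, so they are passed in by the caller; the IP address, ports, and group ID used in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def hunk_ingress_rules(hunk_ip: str, ports: List[int]) -> List[Dict[str, Any]]:
    """One TCP rule per port, restricted to the single Hunk host address."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [
                {"CidrIp": f"{hunk_ip}/32", "Description": "Hunk search head"}
            ],
        }
        for port in ports
    ]


def allow_hunk(master_group_id: str, hunk_ip: str, ports: List[int], region: str) -> None:
    """Apply the rules to the EMR master's security group."""
    import boto3  # imported lazily; needs AWS credentials to run

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=master_group_id,
        IpPermissions=hunk_ingress_rules(hunk_ip, ports),
    )
```

A call such as `allow_hunk("sg-...", "10.0.0.5", [8020, 8032], "us-east-1")` would open the two given ports to that one address; the same ports then go into the provider settings when you create the provider in Hunk.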
Hunk on EMR: Option 2
- Placeholder
Demo
- Analyze ELB or S3 access logs
- Analyze CloudTrail access logs
QUESTIONS?
You may also like:
- Hunk 6.1 Technical Deep Dive
- Hunk Report Acceleration Deep Dive
- Comprehensive Security Analytics for Modern Threats with Hunk
THANK YOU feedback: dritan@splunk.com