Hunk & Elastic MapReduce: Big Data Analytics on AWS




Copyright 2014 Splunk Inc.
Hunk & Elastic MapReduce: Big Data Analytics on AWS
Dritan Bitincka, BD Solutions Architecture

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

About Me
- Member of BD Solution Architecture team
- Large scale deployments
- Cloud and Big Data
- Fourth .conf

Agenda
- Hunk
- Amazon EMR
- Understanding how Hunk and EMR can work together
- Demo: Analyzing HDFS/S3 data with Hunk on EMR

Introduction to Hunk

Splunk as a single pane of glass for your machine data

[Diagram: Splunk> alongside RDBMS and NoSQL data stores]

Hunk for Hadoop and NoSQL Data Stores
Explore, Analyze, Visualize
[Diagram: Splunk> alongside RDBMS and NoSQL data stores]

Hadoop Components
- HDFS (STORAGE): NameNode, DataNode
  - Distributed, replicated, massively scalable file system
- MapReduce (COMPUTE): JobTracker, TaskTracker
  - Programming paradigm; two-phase processing of large datasets
    - We also use it, though a simplified version of it
  - Scalable, fault tolerant, etc.
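To make the "two-phase processing" idea concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. It is not Hadoop code and not how Hunk runs its searches; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory dataset."""
    return reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data between them; the sketch only shows the shape of the computation.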

Splunk and Hadoop Data
Splunk Hadoop Connect
- Export: write data out to Hadoop, search based (push)
- Explore: read data from Hadoop and analyze on the search head (SH) (pull)
STORAGE, COMPUTE

Splunk and Hadoop Data Today
Explore, Analyze, Visualize, Dashboards, Share
STORAGE, COMPUTE

Splunk Stack
Explore, Analyze, Visualize, Dashboards, Share
- splunkweb: Web and Application server (Python, AJAX, CSS, XSLT, XML)
- REST API, COMMAND LINE, ODBC
- splunkd: Search Head, Virtual Indexes (C++, Web Services)
- 64-bit Linux OS

Hunk Stack
Explore, Analyze, Visualize, Dashboards, Share
- splunkweb: Web and Application server (Python, AJAX, CSS, XSLT, XML)
- REST API, COMMAND LINE, ODBC
- splunkd: Search Head, Virtual Indexes (C++, Web Services)
- Hadoop Interface: Hadoop Client Libraries (JAVA)
- 64-bit Linux OS

Scaling with Hadoop
Explore, Analyze, Visualize, Dashboards, Share
- splunkweb: Web and Application server (Python, AJAX, CSS, XSLT, XML)
- REST API, COMMAND LINE, ODBC
- splunkd: Search Head, Virtual Indexes (C++, Web Services)
- Hadoop Interface: Hadoop Client Libraries (JAVA)
- 64-bit Linux OS
Connect Hunk to multiple Hadoop clusters: Hadoop Cluster 1, Hadoop Cluster 2, Hadoop Cluster 3

What Makes it Stick?
To access and process data in external data stores (HDFS is supported out of the box), Hunk External Resource Providers (ERPs) carry out the store-specific file system implementation and computational semantics.

Hunk ERP
- Provider family: a logical grouping of data store frameworks that access the same kind of external system and share a global set of configurations. Example: Hadoop.
- Provider: a collection of specific Hunk ERP helper process implementations within the provider family that share a cluster-specific configuration. Examples: ERP1 (prod), ERP2 (test).
- Virtual index (VIX): after you set up a provider, you configure virtual indexes by giving Hunk information about the data location. Hunk then uses this information and its underlying implementation to distribute searches. Examples: VIX-1, VIX-2, VIX-3, VIX-4.
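The relationship between a provider family, its providers, and their virtual indexes can be pictured as a small data model. The sketch below is an illustration only: the class names, the setting keys, and the rule that provider settings override family settings are all assumptions made for the example, not Hunk's actual configuration format or API.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderFamily:
    """Settings shared by every provider of one kind (e.g. Hadoop)."""
    name: str
    settings: dict = field(default_factory=dict)


@dataclass
class Provider:
    """One cluster-specific configuration within a family (e.g. prod, test)."""
    name: str
    family: ProviderFamily
    settings: dict = field(default_factory=dict)

    def effective_settings(self):
        # Assumption: cluster-specific values take precedence over family-wide ones.
        return {**self.family.settings, **self.settings}


@dataclass
class VirtualIndex:
    """A named pointer to a data location, served by one provider."""
    name: str
    provider: Provider
    data_path: str


def search_plan(vix):
    """Collect what a search against a virtual index would need to know."""
    return {
        "index": vix.name,
        "provider": vix.provider.name,
        "path": vix.data_path,
        "settings": vix.provider.effective_settings(),
    }
```

With a `hadoop` family and two providers such as `ERP1` (prod) and `ERP2` (test), each virtual index resolves to exactly one cluster configuration, which is why the same search can be pointed at different clusters by changing only the index.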

Explore, Analyze, Visualize Data in Hadoop
- Unlock business value of data in Hadoop
- Fast to learn instead of scarce skills
- Integrated explore, analyze and visualize
- No fixed schema to search unstructured data
- Preview results while MapReduce jobs start
- Easier app development than in raw Hadoop

Integrated Analytics Platform for Hadoop Data
- Full-featured, Integrated Product: Explore, Analyze, Visualize, Dashboards, Share
- Insights for Everyone
- Works with What You Have Today: Hadoop (MapReduce & HDFS)

Introduction to EMR

Amazon EMR
- Amazon EMR is a Hadoop framework in the cloud offered as a managed service
- Used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics

Provisioning Hadoop on AWS
1. Log in to the AWS Console
2. Fill in a form
3. Click Create Cluster
4. Wait a few minutes for a fully operational Hadoop cluster
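The console steps can also be scripted. The following is a rough sketch, assuming the third-party `boto3` library and its EMR `run_job_flow` call; the cluster name, instance type, instance count, release label, and region are placeholder assumptions, and the two role names are the EMR default roles, which must already exist in the account. `build_cluster_request` only assembles the request, so it can be inspected without touching AWS.

```python
def build_cluster_request(name, instance_count=3, instance_type="m5.xlarge",
                          release_label="emr-6.15.0", keep_alive=True):
    """Assemble keyword arguments for the EMR run_job_flow call.

    All default values are illustrative assumptions; adjust them to your account.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after steps finish (long-running cluster).
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster (job flow) ID.

    Requires boto3 and valid AWS credentials; this call creates billable resources.
    """
    import boto3  # imported lazily so the request builder works without it

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `create_cluster(build_cluster_request("hunk-demo"))` would start a cluster under these assumptions; as in the console flow, it takes a few minutes before the cluster is ready.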

Why is EMR Compelling?
- No Hadoop/HDFS management
- Native support for AWS S3
  - Vast amounts of data in S3
- Cluster elasticity
- Spot vs. Reserved Instances
  - Long running vs. transient
- Pay for what you use
- Thousands of customers
[Diagram: HDFS, Master, S3...]

Integrating Hunk with EMR
- EMR: Managed Hadoop framework on the cloud with access to vast amounts of data in HDFS and S3
- Hunk: Explore, analyze and visualize data from a central place
- Full analytics solution for Big Data on the cloud

Hunk on EMR: Option 1
- Classic Hunk + Hadoop
- Provision an EMR cluster
- Provision a Hunk EC2 instance using the AWS Marketplace Hunk AMI
- Bring Your Own License (BYOL)
- Configure Hunk with EMR cluster
  - Edit Security Groups to allow access
  - Master IP addresses & ports
  - Create provider
  - Create Virtual Index
  - Search
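Before creating a provider, it can help to confirm that the security group changes actually let the Hunk instance reach the EMR master node. The sketch below uses only the Python standard library to test TCP reachability; the helper names are hypothetical, and the host and port values you pass in are whatever your own cluster exposes (the slides do not list them).

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Check several ports on one host and report reachability per port."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports(master_ip, [namenode_port, jobtracker_port])` run from the Hunk instance returns a dictionary such as `{port: True, ...}`; any `False` entry points to a security group rule or address that still needs fixing. The variable names in that call are placeholders.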

Hunk on EMR: Option 2
- Placeholder

Demo
- Analyze ELB or S3 Access Logs
- Analyze CloudTrail Access Logs

QUESTIONS?
You may also like:
- Hunk 6.1 Technical Deep Dive
- Hunk Report Acceleration Deep Dive
- Comprehensive Security Analytics for Modern Threats with Hunk

THANK YOU feedback: dritan@splunk.com