Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Similar documents
Leveraging Machine Data to Deliver New Insights for Business Analytics

Big Data Unlock the mystery and see what the future holds. Philip Sow SE Manager, SEA

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Big Data Technologies Compared June 2014

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

HDP Hadoop From concept to deployment.

So What s the Big Deal?

Tap into Hadoop and Other No SQL Sources

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

HDP Enabling the Modern Data Architecture

BIG DATA SOLUTION DATA SHEET

Oracle Big Data SQL Technical Update

DBX. SQL database extension for Splunk. Siegfried Puchbauer

#TalendSandbox for Big Data

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Native Connectivity to Big Data Sources in MSTR 10

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Big Data Analytics Nokia

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Big Data Analytics - Accelerated. stream-horizon.com

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Cost-Effective Business Intelligence with Red Hat and Open Source

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

A Scalable Data Transformation Framework using the Hadoop Ecosystem

An Oracle White Paper November Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

Parallel Data Warehouse

Constructing a Data Lake: Hadoop and Oracle Database United!

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Open source large scale distributed data management with Google s MapReduce and Bigtable

Big Data Too Big To Ignore

Upcoming Announcements

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Dominik Wagenknecht Accenture

Open source Google-style large scale data analysis with Hadoop

From Spark to Ignition:

From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication with Tungsten

Reference Architecture, Requirements, Gaps, Roles

Building Your Big Data Team

Big Data for everyone Democratizing big data with the cloud. Steffen Krause Technical

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

MapReduce with Apache Hadoop Analysing Big Data

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

Data processing goes big

BIG DATA TRENDS AND TECHNOLOGIES

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Oracle BI Roadmap & Visual Analyzer Ljiljana Perica, Oracle Business Solution Leader Ljiljana.perica@oracle.com

ITG Software Engineering

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

Architecting for the Internet of Things & Big Data

Scalable Architecture on Amazon AWS Cloud

Big Data Infrastructure at Spotify

Safe Harbor Statement

Bringing Big Data to People

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Introducing Oracle Exalytics In-Memory Machine

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Using distributed technologies to analyze Big Data

the missing log collector Treasure Data, Inc. Muga Nishizawa

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Qsoft Inc

IBM BigInsights for Apache Hadoop

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros

NextGen Infrastructure for Big DATA Analytics.

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA FOR MEDIA SIGMA DATA SCIENCE GROUP MARCH 2ND, OSLO

INVESTOR PRESENTATION. First Quarter 2014

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

The Future of Data Management

Telemetry: The Customer Experience

Unlocking Hadoop for Your Rela4onal DB. Kathleen Technical Account Manager, Cloudera Sqoop PMC Member BigData.

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.

Big Data SQL and Query Franchising

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Enabling the Business of IT Through Splunk Dashboarding

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Presenters: Luke Dougherty & Steve Crabb

Oracle Big Data Essentials

Hadoop & Spark Using Amazon EMR

An Oracle White Paper October Oracle: Big Data for the Enterprise

Cloud Big Data Architectures

Transcription:

Copyright 2015 Splunk Inc. Real World Big Data Architecture - Splunk, Hadoop, RDBMS Raanan Dagan, Big Data Specialist, Splunk

Disclaimer During the course of this presentagon, we may make forward looking statements regarding future events or the expected performance of the company. We caugon you that such statements reflect our current expectagons and esgmates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward- looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentagon are being made as of the Gme and date of its live presentagon. If reviewed arer its live presentagon, this presentagon may not contain current or accurate informagon. We do not assume any obligagon to update any forward looking statements we may make. In addigon, any informagon about our roadmap outlines our general product direcgon and is subject to change at any Gme without nogce. It is for informagonal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obligagon either to develop the features or funcgonality described or to include any such feature or funcgonality in a future release. 2

About Your Presenter Raanan Dagan - Sr. SE, Big Data specialist In Splunk for 3.5 Years Before Splunk worked for Cloudera and Oracle 3

Agenda Splunk Big Data Architecture AlternaGve Open Source Approach Real- World Customer Architecture Customer Roadmap Architecture End- to- end DemonstraGon (Splunk, Hadoop, RDBMS) 4

Big Data Technologies Rela(onal Database Structured NoSQL Semi- Structured Hadoop Semi- Structured Splunk Schema at Write Schema at Read Schema at Read Schema at Read SQL ETL Key- Value, Column, Document & Other Stores MapReduce HDFS Storage Search Real- Time Indexing RDBMS Oracle, MySQL, IBM DB2, Teradata Cassandra, Accumulo, MongoDB 5 MapReduce Distributed File System Time- Series, Unstructured, Heterogenous

Big Data Are you ge\ng the most out of your Big Data investments? 6

Big Data in Goverment How can you accelerate your Gme to value? 7

Splunk Big Data Copyright 2015 Splunk Inc.

Splunk Big Data Technologies DB Connect Hunk Splunk Schema at Write Schema at Read Schema at Read Schema at Read SQL ETL Key- Value, Column, Document & Other Stores MapReduce HDFS Storage Search Real- Time Indexing RDBMS Oracle, MySQL, IBM DB2, Teradata Cassandra, Accumulo, MongoDB 9 MapReduce Distributed File System Time- Series, Unstructured, Heterogenous

Splunk Scalability Enterprise- class Availability and Scale " AutomaGc load balancing linearly scales indexing " Distributed search and MapReduce linearly scales search and reporgng Offload search load to Splunk Search Heads Auto load- balanced forwarding to Splunk Indexers Send data from thousands of servers using any combinagon of Splunk forwarders 10

Splunk Real- Time AnalyGcs Data Monitor Input TCP/UDP Input Scripted Input Parsing Queue Parsing Pipeline Source, event typing Character set normalizagon Line breaking Timestamp idengficagon Regex transforms Index Queue Real- Gme Buffer Indexing Pipeline Raw data Index Files Real- Gme Search Process Splunk Index 11

Hunk - AnalyGcs Plaeorm for Hadoop Full- featured, Integrated Product Explore Analyze Visualize Dashboard s Share Insights for Everyone Works with What You Have Today Hadoop Clusters Hadoop Client Libraries NoSQL, EMR, S3 Buckets 12

Hunk Unique Features Virtual Index Enables seamless use of the Splunk technology stack on data wherever it rests NaGvely handles MapReduce Schema- on- the- fly Structure applied at search Gme No briile schema AutomaGcally find paierns and trends Flexibility and Fast Time to Value InteracGve search Preview results while MapReduce jobs run Drag- and- drop analygcs Security: Access Control, Pass Through Authen(ca(on, Kerberos 13

Hunk Provides Self- Service AnalyGcs for Hadoop Explore Analyze Visualize Dashboards Share Hadoop Storage 14

What About Structured Data? Customer profile Product azributes Employee details Pricing and Rate plans Asset info 15

Use cases for structured data in Splunk Index machine data from databases, such as logs or sales records Enrich machine data with high- level data, such as customer records Update structured databases with Splunk info, such as risk scores InteracGvely browse structured and unstructured data from Splunk reports

Machine Data Delivers Real- Gme Insights Media server logs (machine data) Phone Number IP Address Track ID Mar 01 19:18:50:000 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct start for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK.! 2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053-80 - 10.164.232.181 "Mozilla/5.0 (iphone; CPU iphone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680! Mar 01 19:18:50:163 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct stop for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK.! 17

Structured Data Contains Business Context Media server logs (machine data) Phone number IP address Track ID Mar 01 19:18:50:000 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct start for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK.! 2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053-80 - 10.164.232.181 "Mozilla/5.0 (iphone; CPU iphone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680! Mar 01 19:18:50:163 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct stop for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK.! Track ID ArGst Title Format ID Run Gme 01011207201000005652000000000053 Maroon 5 Moves like Jagger MP3 4:30 Customer, product databases Phone # 2172618992 53546 Subscriber ID Subscriber ID First Name Last Name Age State Customer Score 53546 Jim Morrison 25 CA 93 18

Splunk DB Connect Reliable, scalable, real- (me integra(on between Splunk and tradi(onal rela(onal databases " Enrich search results with addigonal business context " Easily import data into Splunk for deeper analysis " Integrate mulgple DBs concurrently " Simple set- up, non- evasive and secure Database lookup Oracle database Java Bridge Server ConnecGon pooling JDBC MicrosoR SQL server Database query Other databases 19

Copyright 2015 Splunk Inc. Customer Open Source AlternaGve

Hadoop Ecosystem OpGons

Hadoop Advantage / Disadvantage Advantage Cheap Storage Batch Distributed Processing Disadvantage Requires Coding for most AnalyGcs No VisualizaGon Tools No OOTB Apps / SoluGons 22

Copyright 2015 Splunk Inc. Real- World Customer Architecture

Summary Architecture 3 instances Splunk / Hunk / DB Connect Search Heads Real Time Data - 25 Indexers Historical data (VIX) 60 Hortonworks nodes Enrichment data (lookup) - MySQL DB 2000 Forwarders

Splunk Deployment Architecture indexer Web server 2,000 forwarders ~2TB per day indexer 25 indexers 3 search head ~250 Users ~30 Concurrent Users Web server forwarder 25

Hadoop Architecture WebLogic app server ~30 Flume Agents ~60 Data Nodes ~1.2 PB of storage ~2 years data retengon data node WebLogic app server data node data node data node 26

Splunk + Hunk = All the Data indexer Web server indexer app server Real Time AnalyGcs Alerts Apps Batch Compliment Splunk AnalyGcs Historical searches data node data node data node 27

DB Connect Architecture Install DB Connect on a Search Head Use DB Connect for Lookup Several Lookups coming from two different MySQL Databases Lookup Enrich log data with business insight Search Head DB- 1 DB- 2 MySQL JDBC Driver 28

DB - Architecture Performance Impact Command Connec(on Architecture Indexing Inputs - dbmon- tail ** Recommended Inputs dbmon- dump Outputs Not Indexing Medium number of connecgons (Small amount of data - only delta) Small amount of connecgons (Lots of data per connecgon) Lots of DB ConnecGons (Small amount of data) DB to Index (connecgon pooling) DB to Index (connecgon pooling) Search Head to DB (connecgon pooling) Search DBXQuery Lots of DB ConnecGons DB to Search Head Lookups ** Selected this op(on Lots of DB ConnecGons DB to Search Head

Summary Architecture 3 instances Splunk / Hunk / DB Connect Search Heads Real Time Data - 25 Indexers Historical data (VIX) 60 Hortonworks nodes Enrichment data (lookup) - MySQL DB 2000 Forwarders

Copyright 2015 Splunk Inc. Customer Roadmap Architecture

Archive Splunk Data to HDFS or AWS S3 WARM Hadoop Clusters COLD FROZEN Drive Down TCO by Archiving Historical Data to Commodity Hardware

Unified Search Intelligently Search Across Real- Time and Historical Data using the Same Splunk Interface Real- Time Data Historical Data in Hadoop

Roadmap Architecture Unified Search 3 instances Splunk / Hunk / DB Connect Search Heads Real Time Data - 25 Indexers Archive Historical data (VIX) 60 Hortonworks nodes Enrichment data (lookup) - MySQL DB 2000 Forwarders

Copyright 2015 Splunk Inc. Customer Chosen Architecture Demo

Support Our Military Kids Take our Survey! Splunk will Donate $10 Dollars to our Military Kids Plus a bonus if we hit 350 number of completed surveys onsite. Text Splunk to 878787. 36

Thank You Copyright 2015 Splunk Inc.