Exploratory AnalyAcs for Shared- service Hadoop Clusters

Size: px
Start display at page:

Download "Exploratory AnalyAcs for Shared- service Hadoop Clusters"

Transcription

1 Copyright 2014 Splunk Inc. Exploratory AnalyAcs for Shared- service Hadoop Clusters Sagi Zelnick Principal Architect, Yahoo

2 Disclaimer During the course of this presentaaon, we may make forward- looking statements regarding future events or the expected performance of the company. We cauaon you that such statements reflect our current expectaaons and esamates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward- looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentaaon are being made as of the Ame and date of its live presentaaon. If reviewed arer its live presentaaon, this presentaaon may not contain current or accurate informaaon. We do not assume any obligaaon to update any forward- looking statements we may make. In addiaon, any informaaon about our roadmap outlines our general product direcaon and is subject to change at any Ame without noace. It is for informaaonal purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligaaon either to develop the features or funcaonality described or to include any such feature or funcaonality in a future release. 2

3 Overview! Yahoo: 8+ years of innovaaon! Yahoo: organizaaon- wide investment for next 3+ years! Yahoo providing Hunk as a self- service to explore, analyze & visualize data in HDFS Hunk allows for visually browsing very complex tables (250+ fields) Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the enare job/query to finish Cuts down on the development cycles by faster interacaon with results Built- in graphs/charts makes for a powerful soluaon for many situaaons

4 History of Hadoop Yahoo

5 Over 600PB of Hadoop Storage (Over Half an Exabyte)! Very large clusters used by many groups across the enterprise! More than 35,000 individual datanodes! Hadoop is provided as a service! MulAple cluster types such as research, dev, sandbox and producaon! Services such as HBase, Hive, Oozie, etc! Users are free to run jobs, but have resource constraints! Maintained by the Grid OperaAons Group

6 Integrated AnalyAcs Plajorm for Diverse Data Stores Full- featured, Integrated Product Fast Insights for Everyone Explore Analyze Visualize Dashboard s Share Works with What You Have Today Hadoop Client Libraries Hadoop Clusters Streaming Resource Libraries NoSQL and Other Data Stores

7 Improving OperaAonal Visibility with Hunk! We pointed Hunk at many operaaonal logs and event data we already had on the grid! This includes system metrics, HDFS ops, JVM stats and YARN metrics! Created instrumentaaon to measure usage per user and job! Analyzed terabytes of NameNode audit logs! Job history leveraged for visualizing usage/growth and historical views! Custom events for HBase staasacs

8 Tracking Hadoop Performance & Metrics in Hunk Use Case Customer Benefits System metrics from 35k nodes Historical insights of resources Grid Ops / Grid Customers All Grid Customers IdenAfy slow tasks/nodes when debugging Track organic growth Job performance All Grid Customers Improved job SLAs HBase metrics Job logs in near real- Ame All Grid Customers All Grid Customers / Ops Track region/rs/table metrics Search for errors directly from the YARN logs

9 Measuring NameNode Performance Pre & Post Upgrades! Historical visualizaaons of all operaaons! Search data in Hunk from billions of NameNode events! Measure JVM and memory usage! Insights into operaaonal performance

10 10,086 events (5/15/14 1:00: AM to 5/22/14 1:36: AM) Visualization VisualizaAon Using Hunk num_blockreports num_copybl...perations num_heartbeats num_readbl...perations num_readme...perations num_replac...operations num_writeb...operations num_blockchecksumop 600,000, ,000, ,000,000 Fri May Sun May 18 Tue May 20 _time _time num_bl ockrep orts num_copy BlockOpera tions num_ HeartB eats num_read BlockOpera tions num_readme tadataoperati ons num_replac eblockoperat ions num_write BlockOpera tions num_blo ckchecks umop :

11 Visualization Sample TroubleshooAng in Hunk of 750 Million Events num_blockreports num_copybl...perations num_heartbeats num_readbl...perations num_readme...perations num_replac...operations num_writeb...operations num_blockchecksumop 1,000,000, ,000, ,000, ,000,000 12:00 PM Tue May :00 AM Wed May 21 _time 12:00 PM _time num_bl ockrep orts num_copy BlockOpera tions num_ HeartB eats num_read BlockOpera tions num_readme tadataoperati ons num_replac eblockoperat ions num_write BlockOpera tions num_blo ckchecks umop :15:

12 Big Picture Plus Granular Details

13 Analyzing NameNode RPC Calls (TroubleshooAng)! Who is making what RPC call (open, liststatus, create, etc.)! How oren are they making these RPC calls! From which IP/host are they coming from! Search and visualize historical data from billions of events! Prevent NameNode abuse/misuse

14 Visualizing 834 Million Discrete Events

15 ConAnued

16 Queue Insights (Capacity & Provisioning)! Each Hadoop job runs in a specific queue! We track every aspect of the YARN framework! Immediate queue performance and configuraaon profiling via job history server! Historical views and trends that enable beper capacity management! Improved queue ualizaaon and allocaaon management

17 Search Splunk New Search Visualizing Queues index="jobsummary_logs_all_red" cluster="dilithium*" eval total_slot_seconds=(mapslotseconds + reduceslotsec onds) eval gb_hours=((total_slot_seconds * 0.5) / 3600) eval gb_hours=round(gb_hours) timechart span=6h sum (gb_hours) as gb_hours by queue Last 7 days 1,175,726 events (5/20/14 8:00: PM to 5/27/14 8:26: PM) Visualization 600, , ,000 Wed May Thu May 22 Fri May 23 Sat May 24 Sun May 25 Mon May 26 _time _time OTH ER apg_dai lyhigh_ p3 apg_dail ymedium _p5 apg_hou rlyhigh_ p1 apg_ho urlylow_ p4 apg_hourl ymedium _p2 apg _p7 curveb all_larg e curveb all_me d sling shot sling stone :

18 Self- Service Job Reports! Each job is unique and so are the map and reduce elements! How to start analyzing jobs?! Historical job performance and profiling enables in- depth performance tuning! Long terms historical views and trending of growth

19 index="jobsummary_logs_all_blue" cluster="*" user="gmon" eval total_slot_seconds=(mapslotseconds + reduceslotseconds) eval gb_hours=((total_slot_seconds * 0.5) / 3600) eval gb_hours=round(gb_hours,2) eval runtime=(finishtime-submittime)/1000 stats sum(gb_hours) as gb-hours avg(runtime) as run_mins by cluster user queue jobname jobid status eval run_mins=round(run_mins/60,2) sort -gb-hours Yesterday 4,871 events (5/26/14 12:00: AM to 5/27/14 12:00: AM) Statistics (4,871) clu ster us er que ue jobname jobid status gb-ho urs run_ mins cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid ops distcp: job_ _ SUCCE EDED

20

21

22

23 ...It s Not Just Logs We re Looking At More data to tap into with the metastore / Hive sources! Using the metastore we can setup virtual indexes to any table(s) in Hive, without the need to define the schema up- front! Visualize very complex tables (250+ fields)! Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the enare job/query to finish! Built- in aggregates and graphs/charts! Accelerates development workflow by providing faster interacaon with data

24

25 THANK YOU

Grid CompuAng AnalyAcs with Splunk Finnbar Cunningham

Grid CompuAng AnalyAcs with Splunk Finnbar Cunningham Copyright 2014 Splunk Inc. Grid CompuAng AnalyAcs with Splunk Finnbar Cunningham Head of Grid CompuAng OperaAons & Support Credit Suisse Disclaimer During the course of this presentaaon, we may make forward-

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Mobile Big Data AnalyEcs

Mobile Big Data AnalyEcs Copyright 2014 Splunk Inc. Mobile Big Data AnalyEcs Marc Courtemanche, Sr. Director Alain Brunet, Sr. Lead Developer Vantrix CorporaEon Disclaimer During the course of this presentaeon, we may make forward-

More information

Copyright 2013 Splunk, Inc. Splunk 6 Overview. Presenter Name, Presenter Title

Copyright 2013 Splunk, Inc. Splunk 6 Overview. Presenter Name, Presenter Title Copyright 2013 Splunk, Inc. Splunk 6 Overview Presenter Name, Presenter Title Safe Harbor Statement During the course of this presentahon, we may make forward looking statements regarding future events

More information

Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Real World Big Data Architecture - Splunk, Hadoop, RDBMS Copyright 2015 Splunk Inc. Real World Big Data Architecture - Splunk, Hadoop, RDBMS Raanan Dagan, Big Data Specialist, Splunk Disclaimer During the course of this presentagon, we may make forward looking

More information

Building a cloud- based SIEM with Splunk Cloud and AWS

Building a cloud- based SIEM with Splunk Cloud and AWS Copyright 2014 Splunk Inc. Building a cloud- based SIEM with Splunk Cloud and AWS Joe Goldberg Product MarkeAng, Splunk Gary Mikula Senior Director InformaAon Security, FINRA Sivakanth Mundru Product Manager,

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS

Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements

More information

BBM467 Data Intensive ApplicaAons

BBM467 Data Intensive ApplicaAons Hace7epe Üniversitesi Bilgisayar Mühendisliği Bölümü BBM467 Data Intensive ApplicaAons Dr. Fuat Akal akal@hace7epe.edu.tr Problem How do you scale up applicaaons? Run jobs processing 100 s of terabytes

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide Hadoop: The Definitive Guide Tom White foreword by Doug Cutting O'REILLY~ Beijing Cambridge Farnham Köln Sebastopol Taipei Tokyo Table of Contents Foreword Preface xiii xv 1. Meet Hadoop 1 Da~! 1 Data

More information

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013 Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Big Data Operations Guide for Cloudera Manager v5.x Hadoop

Big Data Operations Guide for Cloudera Manager v5.x Hadoop Big Data Operations Guide for Cloudera Manager v5.x Hadoop Logging into the Enterprise Cloudera Manager 1. On the server where you have installed 'Cloudera Manager', make sure that the server is running,

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)

Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions

More information

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS ..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the

More information

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide FOURTH EDITION Hadoop: The Definitive Guide Tom White Beijing Cambridge Famham Koln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!

More information

Splunk Operational Visibility

Splunk Operational Visibility Copyright 2015 Splunk Inc. Splunk Operational Visibility Matthias Maier Sales Engineer, CISSP Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding

More information

Big Data Technology Core Hadoop: HDFS-YARN Internals

Big Data Technology Core Hadoop: HDFS-YARN Internals Big Data Technology Core Hadoop: HDFS-YARN Internals Eshcar Hillel Yahoo! Ronny Lempel Outbrain *Based on slides by Edward Bortnikov & Ronny Lempel Roadmap Previous class Map-Reduce Motivation This class

More information

This chapter introduces you to Microso2 Office Access 2013. The chapter focuses on what a database is, the components of a database, what a database

This chapter introduces you to Microso2 Office Access 2013. The chapter focuses on what a database is, the components of a database, what a database This chapter introduces you to Microso2 Office Access 2013. The chapter focuses on what a database is, the components of a database, what a database can do and how to create a database. 1 The objecaves

More information

Copyright 2014 Splunk Inc.

Copyright 2014 Splunk Inc. Copyright 2014 Splunk Inc. Extend Splunk by Visualizing Data using Tableau and the ODBC driver Sharad Kylasam Sr. Product Manager, Splunk Ashley Jaschke Product Manager, Tableau Joe Specht Sr. Director

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

IOmark Suite. Benchmarking Storage with Applica4on Workloads August, 2013. 2013 Evaluator Group, Inc.

IOmark Suite. Benchmarking Storage with Applica4on Workloads August, 2013. 2013 Evaluator Group, Inc. IOmark Suite Benchmarking Storage with Applica4on Workloads August, 2013 1 What is IOmark Suite?! A storage specific benchmark for applicaaon workloads Tests storage only Supports VDI and Virtual Machine

More information

Copyright 2013 Splunk Inc. Introducing Splunk 6

Copyright 2013 Splunk Inc. Introducing Splunk 6 Copyright 2013 Splunk Inc. Introducing Splunk 6 Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected performance

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS

More information

Like what you hear? Tweet it using: #Sec360

Like what you hear? Tweet it using: #Sec360 Like what you hear? Tweet it using: #Sec360 HADOOP SECURITY Like what you hear? Tweet it using: #Sec360 HADOOP SECURITY About Robert: School: UW Madison, U St. Thomas Programming: 15 years, C, C++, Java

More information

Hadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011

Hadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 Hadoop Scalability at Facebook Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 How Facebook uses Hadoop Hadoop Scalability Hadoop High Availability HDFS Raid How Facebook uses Hadoop Usages

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Customer Use Cases: Proactive Monitoring for PowerCenter Operations and Development Governance

Customer Use Cases: Proactive Monitoring for PowerCenter Operations and Development Governance Customer Use Cases: Proactive Monitoring for PowerCenter Operations and Development Governance Prasad Sunkara Assistant Director, Illinois State University Pankaj Mittal Manger, NBC Universal 1 Implementing

More information

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5

How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5 Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements

More information

Building a SaaS Application. ReddyRaja Annareddy CTO and Founder

Building a SaaS Application. ReddyRaja Annareddy CTO and Founder Building a SaaS Application ReddyRaja Annareddy CTO and Founder Introduction As cloud becomes more and more prevalent, many ISV s and enterprise are looking forward to move their services and offerings

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

IntroducJon to Splunk Cloud & Case Study: MindTouch. Praveen Rangnath Splunk César López- Natarén MindTouch Aaron Fulkerson MindTouch

IntroducJon to Splunk Cloud & Case Study: MindTouch. Praveen Rangnath Splunk César López- Natarén MindTouch Aaron Fulkerson MindTouch Copyright 2014 plunk Inc. Copyright @ 2 014 CSomcast IntroducJon to Splunk Cloud & Case Study: MindTouch Praveen Rangnath Splunk César López- Natarén MindTouch Aaron Fulkerson MindTouch Disclaimer During

More information

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions 11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Performance testing Hadoop based big data analytics solutions by Mustufa Batterywala, Performance Architect,

More information

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture

More information

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014 Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

System Aware Cyber Security Architecture

System Aware Cyber Security Architecture System Aware Cyber Security Architecture Rick A. Jones October, 2011 Research Topic DescripAon System Aware Cyber Security Architecture Addresses supply chain and insider threats Embedded into the system

More information

How Companies are! Using Spark

How Companies are! Using Spark How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Connecting the World Through Games Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Daniel McCaffrey General Manager, Platform and Analytics Engineering World s leading social game

More information

Cloudera Manager Training: Hands-On Exercises

Cloudera Manager Training: Hands-On Exercises 201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved. Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden

More information

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ... ..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,

More information

Proposed Central Corridor Loyalty Program

Proposed Central Corridor Loyalty Program Proposed Central Corridor Loyalty Program 1 Business Problem Retail business will lose revenue in the range of 20% to 60% due to construcaon CCLRT ADer construcaon has started, idenafying current customers

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

Getting Started & Successful with Big Data

Getting Started & Successful with Big Data Getting Started & Successful with Big Data @Pentaho #BigDataWebSeries 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Your Hosts Today Davy Nys VP EMEA & APAC Pentaho Paul

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

SAP and Hortonworks Reference Architecture

SAP and Hortonworks Reference Architecture SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson 1 A New Platform for Pervasive Analytics Multiple big data opportunities

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Cloud Databases: The Next Post- Relational World 18 April 2012 Prof. Chris Clifton Beyond RDBMS The Relational Model is too limiting! Simple data model doesn t capture semantics

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com : Ambari User's Guide Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,

More information

Customer Case Study. Sharethrough

Customer Case Study. Sharethrough Customer Case Study Customer Case Study Benefits Faster prototyping of new applications Easier debugging of complex pipelines Improved overall engineering team productivity Summary offers a robust advertising

More information

Splunk Company Overview

Splunk Company Overview Copyright 2015 Splunk Inc. Splunk Company Overview Name Title Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Agile Infrastructure Update Monitoring

Agile Infrastructure Update Monitoring Agile Infrastructure Update Monitoring Pedro Andrade IT/GT 6 th July 2012 IT Technical Forum CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Overview Introduction Motivation, Challenge,

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Mobile Device Security Risks and RemediaAon Approaches

Mobile Device Security Risks and RemediaAon Approaches Mobile Device Security Risks and RemediaAon Approaches Raj Chaudhary, Principal, Crowe Horwath LLP In- Depth Seminars D11 CRISC CGEIT CISM CISA Informal Poll What is your Atle/role? Internal Audit IT Audit

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Big Data & Data Visualiza.on: How to mine through the mess, discover what's important and put it to use

Big Data & Data Visualiza.on: How to mine through the mess, discover what's important and put it to use Big Data & Data Visualiza.on: How to mine through the mess, discover what's important and put it to use Erich Heneke Mayo Clinic Supply Chain Management My Team (SC Audit/Controls) Developed after a widespread

More information

Operations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011

Operations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Operations and Big Data: Hadoop, Hive and Scribe Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Agenda 1 Operations: Challenges and Opportunities 2 Big Data Overview 3 Operations with Big Data 4 Big Data

More information

Oracle Big Data for Dummies

Oracle Big Data for Dummies Oracle Big Data for Dummies Sai Janakiram Penumuru WW Product Expert Cloud Platforms Hewlett-Packard, India The Father of Microbiology first microbiologist Antonie Philips van Leeuwenhoek 2 Sai Janakiram

More information

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

CRITEO INTERNSHIP PROGRAM 2015/2016

CRITEO INTERNSHIP PROGRAM 2015/2016 CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER

Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Table of Contents Capacity Management Overview.... 3 CapacityIQ Information Collection.... 3 CapacityIQ Performance Metrics.... 4

More information

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc. Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction

More information

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

Introduction and Overview for Oracle 11G 4 days Weekends

Introduction and Overview for Oracle 11G 4 days Weekends Introduction and Overview for Oracle 11G 4 days Weekends The uses of SQL queries Why SQL can be both easy and difficult Recommendations for thorough testing Enhancing query performance Query optimization

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Professional Hadoop Solutions

Professional Hadoop Solutions Brochure More information from http://www.researchandmarkets.com/reports/2542488/ Professional Hadoop Solutions Description: The go-to guidebook for deploying Big Data solutions with Hadoop Today's enterprise

More information

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @ The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...

More information