Why Spark on Hadoop Matters



MC Srivas, CTO and Founder, MapR Technologies
Apache Spark Summit, July 1, 2014

MapR Overview
Top Ranked
Exponential Growth
500+ Customers
Cloud Leaders
3X bookings (Q1 13 to Q1 14)
90% software licenses
80% of accounts expand 3X
< 1% lifetime churn
> $1B in incremental revenue generated by 1 customer

Rapidly Evolving Landscape
APACHE HADOOP AND OSS ECOSYSTEM
EXECUTION ENGINES
Batch: Spark, Cascading, Pig, MR v1 & v2, Tez*
ML, Graph: GraphX, MLLib, Mahout
SQL: Drill*, Shark, Impala, Hive
NoSQL & Search: Accumulo*, Solr, HBase
Streaming: Storm*, Spark Streaming
Data Integration & Access: Hue, HttpFS, Flume, Sqoop
YARN
DATA GOVERNANCE AND OPERATIONS
Security: Knox*, Sentry*
Workflow & Data Gov.: Falcon*, Oozie
Provision: Savannah*, Juju, Whirr, ZooKeeper
Management
MapR Data Platform
* 2014 TIMELINE

The Complete Spark Stack on Hadoop
Same ecosystem diagram as the previous slide. Within it, the Spark stack covers Spark (batch), GraphX and MLLib (ML, graph), Shark (SQL), and Spark Streaming (streaming).

A Winning Combination

Spark Advantages:
EASE OF DEVELOPMENT: Easier APIs (Python, Scala, Java)
IN-MEMORY PERFORMANCE: RDDs, DAGs
COMBINE WORKFLOWS: Unify processing (Shark, ML, Streaming, GraphX)
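The RDD and DAG bullets on the Spark Advantages slide can be made concrete with a small sketch. The following is a toy model in plain Python, not Spark's API: the `ToyRDD` class, its `compute_count` field, and the sample numbers are all hypothetical and exist only for illustration. It assumes the two ideas the slide points at: transformations are recorded as a lineage graph and run only when an action asks for results, and a cached dataset is served from memory on later actions.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    Transformations only record lineage; work happens when an action
    such as collect() or count() is called.
    """

    def __init__(self,
                 source: Optional[Iterable] = None,
                 parent: Optional["ToyRDD"] = None,
                 op: Optional[Callable[[Iterator], Iterator]] = None,
                 name: str = "source") -> None:
        self._source = source
        self._parent = parent
        self._op = op
        self.name = name
        self._cache_enabled = False
        self._cached: Optional[List] = None
        self.compute_count = 0  # times this node was actually computed

    # Transformations: build a new node in the lineage, compute nothing.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda it: (fn(x) for x in it), name="map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda it: (x for x in it if pred(x)),
                      name="filter")

    def cache(self) -> "ToyRDD":
        # Mark this node so its first computed result is kept in memory.
        self._cache_enabled = True
        return self

    def lineage(self) -> List[str]:
        # Walk parent links back to the source, then report source-first.
        node: Optional["ToyRDD"] = self
        names: List[str] = []
        while node is not None:
            names.append(node.name)
            node = node._parent
        return list(reversed(names))

    def _iterate(self) -> Iterator:
        if self._cached is not None:
            return iter(self._cached)  # served from memory, no recomputation
        self.compute_count += 1
        if self._parent is None:
            data = iter(self._source)
        else:
            data = self._op(self._parent._iterate())
        if self._cache_enabled:
            self._cached = list(data)
            return iter(self._cached)
        return data

    # Actions: trigger evaluation of the recorded lineage.
    def collect(self) -> List:
        return list(self._iterate())

    def count(self) -> int:
        return sum(1 for _ in self._iterate())


if __name__ == "__main__":
    events = ToyRDD(source=[3, 8, 15, 4, 23])
    scored = events.map(lambda x: x * 2).filter(lambda x: x > 10).cache()
    print(scored.lineage())        # ['source', 'map', 'filter']
    print(scored.collect())        # [16, 30, 46]
    print(scored.count())          # 3, answered from the cached result
    print(scored.compute_count)    # 1
```

In this sketch the second action reuses the in-memory result instead of re-running the map and filter steps, which is the intuition behind the in-memory performance claim. Real Spark additionally partitions data across a cluster and can recompute lost partitions from lineage, which this toy does not model.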

Hadoop Advantages:
UNLIMITED SCALE: Multiple data sources, multiple applications, multiple users
ENTERPRISE PLATFORM: Reliability, multi-tenancy, security
WIDE RANGE OF APPLICATIONS: Files, databases, semi-structured

The Combination of Spark on Hadoop
UNLIMITED SCALE
IN-MEMORY PERFORMANCE
WIDE RANGE OF APPLICATIONS
EASE OF DEVELOPMENT
ENTERPRISE PLATFORM
COMBINE WORKFLOWS
Operational Applications Augmented by In-Memory Performance

Case Studies
2014 MapR Technologies

Industry Leading Ad-Targeting Platform
High-performance analytics over MapR M7 NoSQL
Load from M7 table into RDD to augment scoring in real time
Results fed back to M7 for other applications

Leading Pharma Company: NextGen Genomics
Existing process takes several weeks to align chemical compounds with genes
ADAM on Spark allows realignment in a few hours
Geneticists can minimize engineering dependency

Cisco: Security Intelligence Operations
Sensor data lands in M7
Spark Streaming on M7 for first check on known threats
Data next processed on GraphX and Mahout
Results queried using SQL via Shark and Impala

Insurance Giant: Addressing Health Care Regulations
Patient information in M7 combined with clinical records to compute readmittance probability
Process uses Spark with transactional data in M7
Insurance options decided in real time on online portals

In Summary

Spark on Hadoop gains traction for real-time applications

Pick the Right Tool for the Job

MapR is Unbiased Open Source (a la Linux)
Open source distribution is about providing choice
Linux includes MySQL, PostgreSQL and SQLite
Linux includes Apache httpd, nginx and Lighttpd

Feature | MapR Distribution for Hadoop | Distribution C | Distribution H
Spark | Spark (all of it) and Shark | Spark only | No
Interactive SQL | Shark, Impala, Drill, Hive/Tez | One option (Impala) | One option (Hive/Tez)
Versions | Hive 0.10, 0.11, 0.12, 0.13; Pig 0.11, 0.12; HBase 0.94, 0.98 | One version | One version

Thank you
Engage with us!
@mapr, maprtech, mapr-technologies, MapR, maprtech
srivas@mapr.com