The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro. John Berns, Solutions Architect, APAC, MapR Technologies. April 22nd, 2014.

What This Is; What This Is Not: It's not specific to IoT. It's not about any specific type of data or protocol. It's not specific to any particular industry. It's about processing big data. IoT data can be big data, and IoT might be the biggest data of the coming decade, but it's just big data: the same strategies and technologies apply.

When Does Data Become Big? When the size of the data itself becomes a problem, and the old way of processing data just doesn't work effectively. It's big when we have to rethink: how we store that much data; how we move that much data; how we extract, load & transform that much data; how we explore and analyze that much data; how we process and get meaningful insights from that much data.

C'mon! What does that mean in size? Not gigabytes. Most likely not a few terabytes. Possibly not 10s of terabytes. Probably 100s of terabytes. Definitely petabytes.

So How Do We Handle Big Data? Distribute & parallelize!
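As a rough illustration of "distribute & parallelize", the sketch below splits a set of readings into partitions and computes a small partial result for each partition independently. It then combines only those partial results. This is a single-machine Python sketch under stated assumptions. Threads stand in for cluster nodes, so it shows the structure of the computation, not a CPU speedup. `partition`, `partial_stats` and `parallel_mean` are hypothetical names, not part of Hadoop or any MPP database.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts chunks, round-robin (hypothetical helper)."""
    return [records[i::n_parts] for i in range(n_parts)]


def partial_stats(chunk):
    """Per-partition work: a local (sum, count) computed where the data lives."""
    return sum(chunk), len(chunk)


def parallel_mean(records, n_parts=4):
    chunks = partition(records, n_parts)
    # Each chunk is processed independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Combine step: only the small (sum, count) pairs are merged, never the raw data.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


readings = [20.5, 21.0, 19.5, 22.0, 23.5, 18.0, 20.0, 21.5]
print(parallel_mean(readings))
```

The combine step works because a sum and a count can be merged in any order. Statistics that cannot be merged this way, such as an exact median, need a different partial result.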

MPP Analytic Databases or Hadoop

Big Data Analytics: Bridging the classic and big data worlds.
Classic method (SQL performance and structure): structured & repeatable analysis; business determines what questions to ask; IT structures the data to answer those questions; capture only what's needed.
Big data method (Hadoop scale and flexibility): multi-structured & iterative analysis; business explores data for questions worth answering; IT delivers a platform for storing, refining, and analyzing all data sources; capture in case it's needed.

Philosophical Differences
Traditional methods | Big data
More power | More machines
Summarize data | Keep all data
Transform and store | Transform on demand
Pre-defined schema | Flexible / no schema
Move data -> compute | Move compute -> data
Less data / more complex algorithms | More data / simple algorithms

answer = f(all data): Save all raw data. Data immutability. Transform as needed. The result is based on the raw data.
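One way to read answer = f(all data) is as an append-only store of raw events. Queries are plain functions that are always evaluated over the full raw history. The sketch below uses an in-memory list in place of a distributed file system. `Event`, `RawStore`, `answer` and `max_per_sensor` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)  # frozen: a captured event cannot be modified afterwards
class Event:
    sensor_id: str
    timestamp: int
    value: float


class RawStore:
    """Append-only store: raw events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def all_data(self) -> tuple[Event, ...]:
        return tuple(self._events)  # read-only snapshot of the full history


def answer(f: Callable[[Iterable[Event]], object], store: RawStore) -> object:
    """answer = f(all data): every result is derived from the raw events."""
    return f(store.all_data())


def max_per_sensor(events: Iterable[Event]) -> dict[str, float]:
    # One possible transformation, applied on demand rather than at load time.
    result: dict[str, float] = {}
    for e in events:
        result[e.sensor_id] = max(result.get(e.sensor_id, e.value), e.value)
    return result


store = RawStore()
store.append(Event("a", 1, 2.0))
store.append(Event("a", 2, 3.0))
store.append(Event("b", 1, 5.0))
print(answer(max_per_sensor, store))
```

The raw events are never modified, so a new question asked later can be answered from the same data without re-collecting it. `answer(len, store)`, for example, counts every event captured so far.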

Q & A. Engage with us! @mapr, maprtech, mapr-technologies, MapR, jberns@mapr.com, maprtech

IoT and Big Data: Hadoop as a Data Platform. John Berns, Solutions Architect, APAC, MapR Technologies. April 22nd, 2014.

Hadoop: The Disruptive Technology at the Core of Big Data

Forces of Adoption: Hadoop's TAM (total addressable market) comes from disrupting enterprise data warehouse and storage spending. IT budgets are growing at 2.5%, while data is growing at 40%. Chart (2013 to 2017): cost per terabyte is $40,000 and $9,000 for enterprise storage and the database warehouse, versus under $1,000 for Hadoop. Sources: Gartner, "Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update"; Wall Street Journal, "Financial Services Companies Firms See Results from Big Data Push", Jan. 27, 2014.
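The gap between the two growth rates compounds quickly. This calculation assumes both rates are annual and compound over the 2013 to 2017 span of the chart; the slide does not state this explicitly. On that assumption, data grows by a factor of 1.40^4 ≈ 3.84, while budgets grow by 1.025^4 ≈ 1.10. Each budget dollar therefore has to cover roughly 3.5 times as much data.

```python
def compound(rate, years):
    """Growth factor after `years` of annual compounding at `rate` (assumed annual)."""
    return (1 + rate) ** years


years = 2017 - 2013                     # span of the slide's chart
data_factor = compound(0.40, years)     # data growing at 40% per year
budget_factor = compound(0.025, years)  # IT budgets growing at 2.5% per year

print(f"data x{data_factor:.2f}, budget x{budget_factor:.2f}, "
      f"gap x{data_factor / budget_factor:.2f}")
```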

Hadoop 101 (External Presentation), 2014 MapR Technologies

Hadoop Hardware

Typical Compute Node: Two CPUs with 4-8 cores each; 32-128 GB memory; 6-24 hard disks; 2-4 10 GbE network cards.
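A back-of-the-envelope capacity estimate relates this node profile to the petabyte scale mentioned earlier. The sketch assumes 4 TB disks and three-way replication (the HDFS default). It also assumes 25% of raw space is reserved for temporary data. All three numbers are illustrative assumptions, not recommendations from the slide, and `usable_tb_per_node` and `nodes_needed` are hypothetical helpers.

```python
import math


def usable_tb_per_node(disks, disk_tb, replication=3, overhead=0.25):
    """Usable capacity per node after replication and reserved scratch space.

    `overhead` is an assumed fraction of raw space kept free for temporary data.
    """
    raw_tb = disks * disk_tb
    return raw_tb * (1 - overhead) / replication


def nodes_needed(target_tb, disks, disk_tb, replication=3, overhead=0.25):
    """Smallest whole number of nodes whose usable capacity covers target_tb."""
    per_node = usable_tb_per_node(disks, disk_tb, replication, overhead)
    return math.ceil(target_tb / per_node)


# 1 PB (taken as 1,000 TB) on nodes with 12 disks of a hypothetical 4 TB each
print(usable_tb_per_node(12, 4))
print(nodes_needed(1000, 12, 4))
```

With 12 × 4 TB disks per node, each node holds 12 TB of replicated user data under these assumptions. About 84 such nodes would therefore be needed for 1 PB.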

Hadoop Ecosystem

Ecosystem of Projects Built on Hadoop

SQL on Hadoop

SQL on Hadoop: Generally, data has no inherent schema. The schema is defined by the user or interpreted from the data's structure, and it is applied during processing, so one file can have many schemas applied to it. This works for many kinds of data, but not all. Temperature sensor data? Sure. Video feeds? Not really.
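Schema-on-read can be sketched in a few lines. The raw file carries no schema, and each reader applies its own list of column names and types while parsing. The file contents, both schemas and the `read_with_schema` helper below are hypothetical examples. Real SQL-on-Hadoop engines such as Drill or Hive do the same job with far more machinery.

```python
import csv
import io

# Hypothetical raw file: plain comma-separated text, no schema stored with it.
RAW = """sensor-1,2014-04-22T10:00:00,21.5
sensor-2,2014-04-22T10:00:05,19.0
sensor-1,2014-04-22T10:01:00,22.0
"""


def read_with_schema(raw, schema):
    """Apply a schema at read time; schema is a list of (name, converter) pairs."""
    for row in csv.reader(io.StringIO(raw)):
        # zip() stops at the shorter sequence, so a schema may cover only leading columns.
        yield {name: convert(field) for (name, convert), field in zip(schema, row)}


# Schema A: an analyst's view with typed temperature readings.
temperature_schema = [("sensor", str), ("ts", str), ("celsius", float)]
# Schema B: an inventory view of the same file that only needs the device id.
inventory_schema = [("device_id", str)]

temps = [r["celsius"] for r in read_with_schema(RAW, temperature_schema)]
devices = {r["device_id"] for r in read_with_schema(RAW, inventory_schema)}
print(temps)
print(sorted(devices))
```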

Key Use Cases
1. Big data analysis: large-scale SQL queries on long history; well-defined schema; known value, but high cost in existing systems.
2. Big data exploration: exploratory analysis on large-scale raw data; unknown value; no defined schema; variety of data types.

What Is Driving the Need for SQL-on-Hadoop? Organizations are looking for: reuse of existing tools and skills to open Hadoop data to a broader audience; analysis on new types of data; more complete data analysis; more up-to-date and real-time data analysis (not just after the fact).

SQL on Hadoop: Many Options. Flexibility to choose which engine to use based on the use case.
Feature | Drill 1.0 | Hive 0.13 with Tez | Impala 1.x | Presto 0.56 | Shark 0.8 | Vertica
Latency | Low | Medium | Low | Low | Medium | Low
Files | Yes (all Hive file formats) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (RC, Sequence, Text) | Yes (all Hive file formats) | Yes (all Hive file formats)
HBase/M7 | Yes | Yes | Various issues | No | Yes | No
Schema | Hive or schemaless | Hive | Hive | Hive | Hive | Proprietary or Hive
SQL support | ANSI SQL | HiveQL | HiveQL (subset) | ANSI SQL | HiveQL | ANSI SQL + advanced analytics
Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC, ADO.NET, …
Large joins | Yes | Yes | No | No | No | Yes
Nested data | Yes | Limited | No | Limited | Limited | Limited
Hive UDFs | Yes | Yes | Limited | No | Yes | No
Transactions | No | No | No | No | No | Yes
Optimizer | Limited | Limited | Limited | Limited | Limited | Yes
Concurrency | Limited | Limited | Limited | Limited | Limited | Yes

Proven Hadoop Production Success
Enterprise data hub: multi-structured data staging & archive; ETL / DW optimization; mainframe optimization; data exploration.
Marketing analytics: recommendation engines & targeting; ad optimization; pricing analysis; lead scoring.
Risk analytics: network security monitoring; security information & event management; fraudulent behavioral analysis.
Operations intelligence: supply chain & logistics; system log analysis; manufacturing quality assurance; preventative maintenance; sensor analysis.

Other Tools & Frameworks of Note

Pig: Procedural language. Loops, if-then statements.

Cascading: MapReduce framework. Lingual: SQL-like operations. Pattern: machine learning applications. Scalding: Cascading for Scala. Cascalog: Cascading for Clojure.
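The frameworks above are built on the MapReduce model. The sketch below shows that model's three phases (map, shuffle, reduce) as plain Python in one process, using the classic word-count example. It illustrates the programming model only and is not the Cascading or Hadoop API. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (key, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: collapse each key's values into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data is big", "data moves"]))
```

In a real cluster, each phase runs on many machines at once. The shuffle is the step that moves data across the network.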

Spark in Python, Scala and Java: Spark powers a stack of high-level tools including Shark for SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these frameworks seamlessly in the same application.

Machine Learning / Predictive Analytics: Collaborative filtering; linear / logistic regression; naïve Bayes; random forests; k-means clustering; canopy clustering; principal component analysis.
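Of the algorithms listed, k-means clustering is compact enough to sketch directly. This is plain Lloyd's algorithm on 2-D points in pure Python, not the implementation in any particular library. The `kmeans` function and the sample points are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical sensor readings forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(sorted(centroids))
```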

Database on Hadoop: Highly scalable. Columnar. Flexible schema. Data source for MapReduce and Spark jobs.

IoT and Big Data: Architectures & Use Cases. John Berns, Solutions Architect, APAC, MapR Technologies. April 22nd, 2014.

NoSQL

NoSQL Databases: "No SQL" or "Not only SQL". They give up some of the functionality of traditional relational databases in exchange for speed and scalability. Types: key-value, columnar, document, graph. NoSQL databases favor flexible schemas.

HBase
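The HBase data model can be approximated with a sorted map of row keys, each holding a sparse set of `family:qualifier` columns. That sparseness is what "columnar" and "flexible schema" refer to here. The toy class below is a sketch under stated assumptions, not the HBase client API. The `sensor#timestamp` row-key layout is a hypothetical design, chosen to show why key order matters for time-series reads.

```python
import bisect


class WideColumnTable:
    """Toy model of a wide-column table: rows sorted by key, sparse columns per row."""

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        # Rows may carry different columns: no fixed schema is enforced.
        self._rows[row_key][column] = value

    def scan(self, start, stop):
        """Return (key, columns) for start <= key < stop, in key order (a range scan)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def row_key(sensor_id, ts):
    # Hypothetical key design: sensor id first, then a zero-padded timestamp,
    # so one sensor's readings are stored next to each other in time order.
    return f"{sensor_id}#{ts:010d}"


t = WideColumnTable()
t.put(row_key("s1", 100), "d:temp", 21.5)
t.put(row_key("s2", 120), "d:temp", 19.0)
t.put(row_key("s1", 160), "d:temp", 22.0)
t.put(row_key("s1", 160), "d:humidity", 40)  # an extra column on just this row

# '$' sorts immediately after '#', so this scan covers every key starting with "s1#".
rows = t.scan("s1#", "s1$")
print(rows)
```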

Queues

Queues: Just like a queue at an amusement park: first in, first out. Queues hold messages or events.
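A minimal sketch of first-in, first-out delivery, using Python's thread-safe `queue.Queue` as an in-process stand-in for a message queue. The producer, the consumer and the `None` end-of-stream sentinel are illustrative assumptions. A distributed message queue adds persistence and delivery across machines, which this sketch does not model.

```python
import queue
import threading

events = queue.Queue()  # thread-safe FIFO queue
received = []


def producer(n):
    for i in range(n):
        events.put(f"event-{i}")  # enqueue at the back
    events.put(None)  # sentinel: no more events will follow


def consumer():
    while True:
        item = events.get()  # dequeue from the front, blocking until an item arrives
        if item is None:
            break
        received.append(item)


t_cons = threading.Thread(target=consumer)
t_prod = threading.Thread(target=producer, args=(5,))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(received)  # events arrive in the order they were sent
```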

Message Queue

Stream Processing

Stream Processing: Handles data at high velocity. If Hadoop is the ocean, streams are the firehose. Processing in near real time.
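Near-real-time processing usually means computing results over small time windows as events arrive, rather than after all data has been stored. The sketch below computes a tumbling (fixed, non-overlapping) window average. It assumes events arrive in timestamp order. `tumbling_window_avg` is a hypothetical function, not Storm's API.

```python
from typing import Iterable, Iterator


def tumbling_window_avg(
    stream: Iterable[tuple[int, float]], width: int
) -> Iterator[tuple[int, float]]:
    """Yield (window_start, average) for fixed, non-overlapping windows.

    Assumes events arrive in timestamp order. A window's result is emitted as
    soon as the first event of a later window shows up.
    """
    current, total, count = None, 0.0, 0
    for ts, value in stream:
        window = ts - ts % width  # start of the window this event falls into
        if current is not None and window != current:
            yield current, total / count
            total, count = 0.0, 0
        current = window
        total += value
        count += 1
    if count:
        yield current, total / count  # flush the last open window


# Hypothetical (seconds, temperature) events with 60-second windows.
events = [(0, 20.0), (30, 22.0), (65, 25.0), (70, 27.0), (130, 30.0)]
results = list(tumbling_window_avg(events, 60))
print(results)
```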

Storm

Batch Processing

Combination Architectures

Lambda Architecture
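The slide's diagram did not survive extraction. As a general illustration of the lambda architecture idea, the sketch below has three parts. A batch view is recomputed from the full master dataset, and a speed layer counts recent events incrementally. A query then merges the two. All names are hypothetical, and the in-memory `Counter` objects stand in for real batch and real-time storage systems.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute per-sensor counts from the full, immutable master dataset."""
    return Counter(event["sensor"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incremental counts for events not yet covered by a batch run."""

    def __init__(self):
        self.recent = Counter()

    def on_event(self, event):
        self.recent[event["sensor"]] += 1

    def reset(self):
        # Called once a newly computed batch view already includes these events.
        self.recent.clear()


def query(batch, speed, sensor):
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[sensor] + speed.recent[sensor]


master = [{"sensor": "a"}, {"sensor": "a"}, {"sensor": "b"}]
batch = batch_view(master)
speed = SpeedLayer()
speed.on_event({"sensor": "a"})  # arrives after the last batch run
print(query(batch, speed, "a"))
```

The batch layer gives complete, recomputable answers, while the speed layer only covers the gap since the last batch run. Between batch runs, queries therefore stay current without waiting for a full recomputation.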

Complex Architectures Using Many Big Data Technologies

Wanna Play? http://www.mapr.com/products/mapr-sandbox-hadoop
