Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics





AGENDA
1. Yellow Elephant
2. Data Ingestion & Complex Event Processing
3. SQL on Hadoop
4. NoSQL
5. InMemory
6. Data Science & Machine Learning
7. Industrialization

Doug Cutting

Y2006 Hadoop Ecosystem
- MapReduce: Parallel Data Batch Processing Framework
- Hadoop Distributed File System (HDFS): Store Data Redundantly
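To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python rather than the Hadoop API, and the function names are illustrative assumptions. On a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence (a cluster would distribute them)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many servers, which is what makes the batch model scale.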

Hadoop Business Benefits
- Meet ETL Service Level Agreements (SLAs)
- Store Structured and Unstructured Data in One Place
- Storage and Batch Processing of Large Data Sets (PetaByte Scale)
- Cost-effective Storage and Processing using Low End or Commodity Servers

Data Ingestion & Complex Event Processing

Import and Export Relational Data
- Import data from relational databases into Hadoop
- Export data from Hadoop into relational databases
Database <-> Sqoop <-> Hadoop
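As an illustration of the two directions, the sketch below assembles Sqoop command lines in Python. It only builds the argument lists and does not run anything. The JDBC URL, table name and HDFS paths are hypothetical placeholders, and the helper function names are assumptions made for this example.

```python
def sqoop_import_args(jdbc_url, table, target_dir, mappers=4):
    """Arguments for copying one relational table into a Hadoop directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string of the source database
        "--table", table,               # table to read
        "--target-dir", target_dir,     # HDFS directory to write to
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, table, export_dir):
    """Arguments for copying a Hadoop directory back into a relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # existing table to write into
        "--export-dir", export_dir,  # HDFS directory to read from
    ]


# Hypothetical connection details, for illustration only.
URL = "jdbc:mysql://db.example.com/sales"
import_cmd = sqoop_import_args(URL, "orders", "/data/raw/orders")
export_cmd = sqoop_export_args(URL, "order_summary", "/data/out/order_summary")
```

On a machine with Sqoop installed, such a list could be handed to `subprocess.run`. Credentials are left out here on purpose.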

Stream Data
- Stream Log Files into Hadoop for Storage, Processing and Analysis

Stream Data
- High-Throughput Distributed Messaging Queue

Complex Event Processing (CEP)
- Filter, Transform and Process Events

Complex Event Processing (CEP)
- Filter, Transform and Process Micro-Batches of Events
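The difference between processing single events and processing micro-batches can be sketched in a few lines of plain Python. This is a simplified illustration, not the API of any streaming engine: batches are cut by event count here, whereas streaming engines usually cut them by time interval. The event fields and the threshold are made-up example values.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Cut a (possibly endless) event stream into small lists of events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch, threshold):
    """Filter: keep readings above the threshold. Transform: reshape to alerts."""
    return [
        {"sensor": event["sensor"], "alert": event["value"]}
        for event in batch
        if event["value"] > threshold
    ]


# Hypothetical sensor readings, for illustration only.
events = [{"sensor": f"s{i}", "value": v} for i, v in enumerate([10, 95, 20, 99, 30])]
alerts = [process_batch(b, threshold=90) for b in micro_batches(events, batch_size=2)]
```

Setting `batch_size=1` turns the same code into event-at-a-time processing, which lowers latency at the cost of more per-event overhead.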

Hadoop Business Benefits
- Data Warehouse Optimization
- (Near) Real-Time Recommendation Engines
- Internet of Things Enabler Through Streaming, Streaming Analytics and Storage of Large Data Sets

SQL on Hadoop

Hive
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
- Hive: SQL on Hadoop (Data Warehouse)
- MapReduce: Parallel Data Batch Processing Framework
- Hadoop Distributed File System (HDFS): Store Data Redundantly
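The idea of projecting structure onto raw data and then querying it with SQL can be sketched with Python's built-in sqlite3 module. SQLite is only a small stand-in for Hive here: it copies the rows into its own table, whereas Hive applies the table definition to files that stay in distributed storage. The table name, columns and sample data are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Raw, untyped text as it might sit in a file: day, country, page views.
RAW = "2015-01-01,nl,120\n2015-01-01,fr,80\n2015-01-02,nl,60\n"

conn = sqlite3.connect(":memory:")

# Project a structure (column names and types) onto the raw fields.
conn.execute("CREATE TABLE page_views (day TEXT, country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    csv.reader(io.StringIO(RAW)),
)

# Query the structured view with ordinary SQL.
views_per_country = conn.execute(
    "SELECT country, SUM(views) FROM page_views GROUP BY country ORDER BY country"
).fetchall()
```

In Hive the same aggregation would be written in HiveQL and executed as batch jobs over the whole data set.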

Impala

Hawq

Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
- Pig: ETL (Data Warehouse)
- MapReduce: Parallel Data Batch Processing Framework
- Hadoop Distributed File System (HDFS): Store Data Redundantly

Hadoop Business Benefits
- Query and Explore Large Data Sets (PetaByte Scale)
- Query and Transform Large Data Sets

NoSQL

NoSQL Databases
- NoSQL: Not Only SQL
- Big Data Storage and Querying
- Non-Relational
- Data Model Follows Query
- Horizontally Scalable
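Horizontal scalability rests on spreading rows over many servers by their key. The sketch below shows one simple way to do that: hash the key, pick a starting node, and place copies on the following nodes. It is a simplified assumption, not the placement algorithm of any particular database, and the node names are hypothetical.

```python
import hashlib


def owner_nodes(key, nodes, replicas=2):
    """Return the nodes that store the row with the given key."""
    # A stable hash gives the same answer on every client and every run.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Place the replicas on consecutive nodes, wrapping around the list.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Hypothetical cluster of four servers.
NODES = ["node-a", "node-b", "node-c", "node-d"]
placement = owner_nodes("customer:42", NODES)
```

Any client can compute where a key lives without asking a central server, so capacity grows by adding nodes. Plain modulo hashing moves most keys when the node count changes, which is why production systems such as Cassandra use consistent hashing instead.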

Cassandra
- NoSQL Database
- DataStax Enterprise Community Edition
- Optimized for High Availability
- Optimized for High-Throughput Writes
- Geographical Data Replication
- Integrates with Hadoop Ecosystem
- Cassandra Query Language
- Spark API for Cassandra

Hadoop Business Benefits
- Near Real-Time Query Response on Hadoop
- Store and Query Large Data Sets (PetaByte Scale) in a Database Cost-effectively on Low End or Commodity Servers
- Inserts, Deletes and Updates on Hadoop

InMemory

Spark

Pivotal GemfireXD

Hadoop Business Benefits
- Build Streaming Applications
- Build Online Analytical Applications
- Real-Time and Fastest Query Response on Hadoop

Data Science & Machine Learning

Apache Mahout

Spark MLLib & GraphX

Hadoop Business Benefits
- Business Forecasting
- Preventive Maintenance
- Profiling & Anomalous Behavior
- Build Recommendation Engines
- Segment Customers Automatically

Industrialization

Deployment Modes
- Bare-Metal
- Virtualized
- Cloud
- As a Service

Hue

SAS

Informatica PowerCenter Big Data Governance

Hadoop Business Benefits
- Easy and Enterprise-Ready Hadoop

TRENDS
- Datafication of the Enterprise
- Multidisciplinary Teams
- Data Scientists
- Data Engineers

Questions & Answers

About Capgemini
With almost 145,000 people in over 40 countries, Capgemini is one of the world's foremost providers of consulting, technology and outsourcing services. The Group reported 2014 global revenues of EUR 10.573 billion. Together with its clients, Capgemini creates and delivers business and technology solutions that fit their needs and drive the results they want. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience, and draws on Rightshore, its worldwide delivery model. Learn more about us at www.capgemini.com.

The information contained in this presentation is proprietary. Copyright 2015 Capgemini. All rights reserved. Rightshore is a trademark belonging to Capgemini.