Dominik Wagenknecht Accenture




Dominik Wagenknecht, Accenture
Improving Mainframe Performance with Hadoop
October 17, 2014

About me
Dominik Wagenknecht, Accenture Vienna
Technology Architect, Emerging Technology
Accenture Open Data Platform

Ask questions or rate the Speaker www.sli.do/openslava

The Scenario: Banking!
- Mainframe data lives in DB2 on z/OS
- Perfect for classic banking-style workloads; a robust and trusted backbone of many banks
- Limitations around flexible data processing and analytic capabilities, no model for stream processing, etc.
- Pay-per-use model is very expensive for exploding mobile (mostly read) use

So should we replace it?
- Replacement is a long-term strategy
- It takes a long time until you get new capabilities
- The real question is: what is the right architecture for what we're trying to achieve?

The aim of a modern data architecture
- Horizontal scalability
- Low cost per TB of data (to keep all data)
- Processing over all data (explorative)
- Cheap data access (for simple reads)
- Enhanced real-time capabilities for analytics, streams and e.g. search
So let's look for an appropriate datastore.

OpenSlava 2013: Why RDBMS don't seem to work
[Figure: four concerns (limited scalability, latency, rigid schema, expensive hardware); three are marked "NOT an option" and one "probably OK".]

Latency is a real-world issue as long as users are distributed world-wide
- San Francisco to Frankfurt = 9 132 km (air distance)
- Speed of light = 299 792 458 meters/second
- Light travel time (in vacuum) = 30 milliseconds
- Light travel time (in fiber) = 45 milliseconds
- Round trip time (RTT) = 91 milliseconds
- Calculation: 9132 / 299792.458 × 1.5 × 2 ≈ 0.091 seconds
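The arithmetic on this slide can be reproduced with a short sketch. It uses only the slide's own figures; the factor 1.5 for fiber is the slide's assumption, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope latency estimate using the figures from the slide.
DISTANCE_KM = 9132                  # San Francisco to Frankfurt, air distance
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum, km per second
FIBER_SLOWDOWN = 1.5                # slide's factor for light travelling in fiber


def one_way_ms(distance_km: float, slowdown: float = 1.0) -> float:
    """Return the one-way travel time in milliseconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S * slowdown * 1000


vacuum_ms = one_way_ms(DISTANCE_KM)
fiber_ms = one_way_ms(DISTANCE_KM, FIBER_SLOWDOWN)
rtt_ms = 2 * fiber_ms  # round trip: there and back through fiber

print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms, RTT: {rtt_ms:.1f} ms")
```

The slide truncates the results to whole milliseconds. This is a physical lower bound only; real networks add routing, queuing and protocol overhead on top.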

OpenSlava 2013: NoSQL!
[Figure: CAP triangle (C, A, P) with the BigTable and Dynamo families placed on it. Illustrations: wiki.basho.com]

OpenSlava 2013: NoSQL!
[Figure: CAP triangle (C, A, P) with the BigTable and Dynamo families placed on it, annotated "Consistency & Scalability" and "Flexible Schema (and scalability)". Illustrations: wiki.basho.com]

Popular BigTable data stores: HBase and Cassandra
- In short: HBase is the original open-source implementation of BigTable; Cassandra is a Dynamo-based BigTable implementation
- Scalability: HBase, datacenter; Cassandra, datacenter/global
- Replication: HBase, master/slave; Cassandra, master/master
- Consistency: HBase, consistent; Cassandra, tunable
- Interfaces: HBase, HTTP, Avro, Thrift, native; Cassandra, custom binary, Thrift
- Why cool: HBase, very large scale and integration in Hadoop (Map/Reduce); Cassandra, perfect for high write rates in tabular data, some query ability
What does that mean?
- Consistency enables familiar programming and data modeling patterns
- Full Hadoop ecosystem integration rounds it off with batch and OLTP features
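"Tunable" consistency in Dynamo-style stores is commonly explained with the quorum rule: with N replicas, a read that waits for R replicas is guaranteed to overlap the latest acknowledged write on W replicas when R + W > N. The sketch below only illustrates that rule; the function name is hypothetical and is not part of any Cassandra API.

```python
def reads_see_latest_write(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """Quorum rule: read and write replica sets must overlap (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_acks <= n_replicas):
        raise ValueError("acks must be between 1 and the number of replicas")
    return read_acks + write_acks > n_replicas


# Illustrative settings for a replication factor of 3.
quorum_both = reads_see_latest_write(3, write_acks=2, read_acks=2)  # overlap guaranteed
one_and_one = reads_see_latest_write(3, write_acks=1, read_acks=1)  # fast, may read stale data
write_all = reads_see_latest_write(3, write_acks=3, read_acks=1)    # slow writes, cheap reads
```

Lower acknowledgement counts trade consistency for latency and availability, which is the knob HBase's fixed "consistent" model does not expose.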

Enter Hadoop ca. 2010/11 (simplified)
- Hadoop core: HDFS (distributed file system), MapReduce (distributed batch processing)
- Closely related: Zookeeper (coordination), HBase (BigTable), Pig (dataflow), Hive (SQL), Sqoop (data integration)

Enter Hadoop 2014 (even more simplified)
- Foundation: HDFS (distributed file system), YARN (distributed processing framework)
- Coordination: Zookeeper
- BigTable: HBase, Accumulo
- Data integration: Sqoop, Flume
- Interactive / in-memory: Spark, Tez
- Stream processing: Storm, Spark Streaming
- Dataflow scripting: Pig
- Machine learning: Spark MLlib, Mahout
- SQL: Hive, Drill, Spark SQL
- Search: Solr, Elasticsearch

High-Level Architecture: Mainframe only
- Application(s) use a single read and write path (copybooks) to DB2 on the mainframe

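High-Level Architecture: Introducing Hadoop/HBase
- Write path (copybooks): Application(s) to DB2 on the mainframe
- Read path (REST/JSON): Application(s) to a load balancer in front of the Hadoop cluster
- Hadoop cluster: primary master, secondary master and worker nodes
- An agent on z/OS (mainframe side) and an agent on Linux (cluster side) sit between DB2 and the cluster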
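The split between write path and read path can be pictured with a toy in-memory model. This is only a sketch under assumptions: the slide does not say how the agents move data, so the change log and the `replicate` step are hypothetical, and plain dictionaries stand in for DB2 and HBase.

```python
from collections import deque


class HybridDataPath:
    """Toy model: writes hit the system of record, reads hit a replica."""

    def __init__(self):
        self.mainframe = {}        # stand-in for DB2 on z/OS (system of record)
        self.read_store = {}       # stand-in for HBase on the Hadoop cluster
        self.change_log = deque()  # hypothetical: changes captured by the agent

    def write(self, key, value):
        """Write path: update the mainframe and record the change."""
        self.mainframe[key] = value
        self.change_log.append((key, value))

    def replicate(self):
        """Agent step (assumed): ship captured changes to the read store."""
        shipped = 0
        while self.change_log:
            key, value = self.change_log.popleft()
            self.read_store[key] = value
            shipped += 1
        return shipped

    def read(self, key):
        """Read path: served from the replica, never from the mainframe."""
        return self.read_store.get(key)


path = HybridDataPath()
path.write("account-1", {"balance": 100})
before = path.read("account-1")   # replica has not caught up yet
shipped = path.replicate()
after = path.read("account-1")
```

The point of the design is that the mostly-read mobile traffic never touches the pay-per-use mainframe; the price is that reads can lag behind writes until replication catches up.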

Introducing Storm
- Hadoop cluster: the Linux agent feeds a pub/sub system (e.g. Kafka)
- 1st subscriber (full stream): operational NoSQL datastore
- 2nd subscriber (subset): low-latency stream processing that gets data, updates stats and triggers e.g. push notifications
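The fan-out on this slide, one full subscriber and one subscriber on a subset, can be sketched with a minimal in-process topic. This is not Kafka or Storm code; the `Topic` class, the event fields and the notification threshold are all hypothetical and chosen only to illustrate the data flow.

```python
from collections import Counter


class Topic:
    """Minimal in-process pub/sub: every matching subscriber sees each event."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler, predicate=None):
        # A subscriber without a predicate receives the full stream.
        self.subscribers.append((handler, predicate or (lambda event: True)))

    def publish(self, event):
        for handler, predicate in self.subscribers:
            if predicate(event):
                handler(event)


datastore = {}       # stand-in for the operational NoSQL datastore
stats = Counter()    # running statistics kept by the stream processor
notifications = []   # stand-in for push notifications


def store_event(event):
    """1st subscriber (full): persist every event."""
    datastore.setdefault(event["account"], []).append(event)


def process_stream(event):
    """2nd subscriber (subset): update stats and notify on large payments."""
    stats[event["account"]] += 1
    if event["amount"] >= 1000:  # hypothetical threshold
        notifications.append(f"large payment on {event['account']}")


topic = Topic()
topic.subscribe(store_event)
topic.subscribe(process_stream, predicate=lambda e: e["type"] == "payment")

for event in [
    {"account": "A1", "type": "payment", "amount": 250},
    {"account": "A1", "type": "login", "amount": 0},
    {"account": "B7", "type": "payment", "amount": 5000},
]:
    topic.publish(event)
```

Because both consumers read from the same topic, the low-latency path can be added without changing how data reaches the datastore.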

Introducing Analytics
- SQL: Hive (via MR or Tez)
- Log ingestion: Flume
- Linux agent
- Storage: plain HDFS (including HBase data)

Is a multi-workload Hadoop ready for the Enterprise?
- Concern: How about support?
  Response: Enterprise-level support is available (Cloudera, Hortonworks, MapR Technologies, etc.). Openness: you can switch vendor.
- Concern: Is it secure?
  Response: Yes, integration with Kerberos and LDAP. Encryption in transit is fully supported in open source. Encryption at rest is coming, but easy with Linux.
- Concern: Is it just a hype?
  Response: NO. Huge ecosystem. The adoption rate is steadily increasing, filling a real gap. All vendors are major contributors to the open source community. Comparable to Linux in takeup.
- Concern: Should I use it everywhere?
  Response: NO. Just like all of NoSQL, it's not for everything. Be thoughtful about what to adopt: the core is very stable, newer things may not be.

Thanks.