Big Data Architectures. Lessons Learned from Industrializing Big Data. Kenan Mujkic, PhD 23 June 2016


Deloitte
Making an impact that matters for clients, for our people, and for society. We serve clients distinctively, bringing innovative insights, solving complex challenges and unlocking sustainable growth. We inspire our 220,000 talented professionals to deliver outstanding value. Deloitte provides audit, tax, financial advisory and consulting services to public and private clients spanning multiple industries; legal advisory services in Germany are provided by Deloitte Legal. With a globally connected network of member firms in more than 150 countries, Deloitte brings world-class capabilities and high-quality service to clients.

2016 Deloitte Consulting GmbH

Contents
1 Scope of Big Data Projects
2 Deloitte Selected Projects
3 Lessons Learned from Industrializing Big Data

Scope of Big Data Projects

Scope of Big Data Projects
What do clients expect from Big Data?

1. Business Insights: Leverage business value from advanced analytics capabilities, e.g. enhanced online marketing, branding and (re)targeting using customer segmentation.
2. Business Models: Novel product lines originating from the digital transformation of the economy, e.g. insurance rates based on the scoring originating from telematics data.
3. Reduced Storage Cost: Reduction of cost from storage of business relevant data in HDFS, and enhanced computing performance for data pre-processing using parallel computing.
4. Industry 4.0, Internet of Things: Enhanced productivity and reduced defects in production lines, e.g. for consumer products produced based on data driven decision making, predictive maintenance.
5. Enhanced Service and Product Quality: Enhanced service and product quality due to predictive maintenance solutions in automotive, railway, aerospace and other industries.

Scope of Big Data Projects
What are typical Big Data architectures?

Kappa Architecture
  Architecture: Streaming layer (Spark, Storm) for real-time analytics and decisioning.
  Purpose: Real-time analytics, scoring results stored in serving layer for visualisation and analytical purposes.
  Workload: Large workloads, relatively simple analytics use cases.

Lambda Architecture
  Architecture: Real-time layer (Spark, Storm), batch layer (MapReduce/Hive) and serving layer (HBase, MapR DB, Cassandra, etc.).
  Purpose: Combination of historical and transactional data in serving layer.
  Workload: Large workloads, complex real-time & batch processing use cases.

Batch Architecture
  Architecture: Batch layer (Hive/Parquet) for storage of large data sets.
  Purpose: Classical storage of historical data for analytical purposes using batch processing.
  Workload: Large workloads, complex analytics use cases.
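The Lambda pattern described above combines a batch view over historical data with a real-time view over recent events in the serving layer. The following is a minimal sketch of that merge step in plain Python, assuming both views hold additive per-key counts. The names `merge_views`, `batch_view` and `speed_view` and the sample data are hypothetical; in a real platform the views would live in a key-value store such as HBase or Cassandra rather than in dictionaries.

```python
from collections import Counter


def merge_views(batch_view, speed_view):
    """Combine a precomputed batch view with a real-time (speed) view.

    Assumes both views map a key to an additive count, so a query can be
    answered by summing the two values for each key.
    """
    merged = Counter(batch_view)
    merged.update(speed_view)  # adds the counts key by key
    return dict(merged)


# Hypothetical sample data: number of trips per customer.
batch_view = {"customer_a": 120, "customer_b": 45}  # recomputed by the batch layer
speed_view = {"customer_b": 3, "customer_c": 1}     # events since the last batch run

serving_view = merge_views(batch_view, speed_view)
print(serving_view)
```

In a Kappa setup the batch view would be absent and the streaming layer alone would maintain the serving view, which is one reason the slide associates Kappa with simpler analytics use cases.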

Scope of Big Data Projects
What are typical Big Data architectures?

[Diagram: storage components per architecture. Kappa: Data Lake (DL), Key-Value Store. Lambda: Data Lake, Hive/Parquet, Key-Value Store. Batch: Data Lake, Hive/Parquet.]

Deloitte - Selected Projects

Big Data Project Examples
Selection of some of our current Big Data Platform projects at large companies in Germany

Finance: Financial Service Provider, European BI Platform
  Scope: European platform for regulatory reporting
  Purpose: Perform regulatory reporting according to European standards, leverage and combine current as well as historical data from a variety of sources, develop new use cases

Insurance: Insurance Company, Global Telematics Platform
  Scope: Scalable platform for real-time trip processing and scoring
  Purpose: Provide customers with individual insurance rates, based on their driving behavior, real-time ingestion of trip data, crash data and harsh events, mobile app integration

Automotive: Automotive Manufacturer, Global Big Data Platform
  Scope: Ingestion, storage and analysis of data from vehicle and production devices
  Purpose: Identify location and speed to improve traffic information in cars, collect & analyze sensor information from production robots and workstations to decrease failure rates

Financial Service Provider
European BI & Reporting Platform

Issue
- New European reporting and regulation structure
- Lack of standardization of risk data across sources, single stop for all risk related data does not exist
- Governance processes around the risk data are insufficient

Solution
- Migration of SAP BW to HANA
- Leverage the storage capabilities and performance of Hadoop and the analytical power of SAP Bank Analyzer on HANA
- Migration of ETL from SAP BW Tool to Spark/Hadoop
- Standardized source systems for all subsidiaries

Impact
- Standardized regulatory reporting and risk data from multiple sources
- Consolidated business data stored in a unified corporate memory
- Enable reporting and visualization of risk data across the functions Investment Risk, Counterparty Risk and Operational Risk

Financial Service Provider
European BI Platform Architecture

[Architecture diagram: Internal and External Sources (Source Systems, External Data; BATCH and STREAMING) feed the Data Platform, which consists of an Inbound Layer, a Processing and Storage Layer (Data Lake, Corporate Memory, SAP HANA) and an Outbound Layer (REST API, XS, UI5). Consumers on the Analytics / Service Delivery Platform / Reports side: SAP Bank Analyzer, SAP Design Studio, Analysis for OLAP, R, WebI, SAS, Other.]

Insurance Company
Global Telematics Platform

Issue
- Changes on agreement level and tariffs are important requirements for the near future.
- No detailed and standardized information about the customer's master data and driving behavior is available.
- Manual consolidation processes and management reports with limited ad-hoc reporting.

Solution
- Leverage the real-time processing capabilities of Apache Storm.
- Take advantage of the storage capabilities and high performance of Hadoop and the analytical and visualization capabilities of R and Tableau.
- Scope: Real-time analytics and scalability for trip processing and batch processing for scoring to calculate individual insurance rates, based on the driving behavior of the customer.

Impact
- New offerings for insurance leads and customers.
- Standardized trip data processing to assess the driving behavior of customers, relating to consolidated and standardized master data.
- Enable enhanced reporting and visualization of the customer's driving behavior, and one consolidated view of risk data to the business user.

Insurance Company
Telematics Platform Architecture

[Architecture diagram: Data Sources -> Big Data Platform (Ingestion, Staging, Data Transformation, Analytics / Scoring) -> Data Targets (Data Usage). Batch path: Contract Data, Sqoop/CDC, Hive / Pig, MapReduce, Reports. Real-time path: App, Flume / REST API, ActiveMQ, Spark Streaming, Map Info Services (REST), Customer Self Service, Customers. Annotation: Migration to Kafka and Spark Streaming.]

Automotive Manufacturer
Global Big Data Platform

Issue
- Customer satisfaction and sales offers are a future improvement goal.
- Vehicle data is essential for analysis and prediction in near real-time, but contract, finance and vehicle data are not standardized across sources.
- Connected car data needs faster central analytics for more valuable services and proactive offers and announcements.

Solution
- Leverage the analytics capabilities of Spark, and the storage and processing capabilities of Hadoop.
- Take advantage of a structured data warehousing approach on HDFS, with visualization, analytics and reporting based on QlikView, Tableau and R.
- Enhance the capabilities of Hadoop by enabling streaming, real-time processing, time series and predictive analytics.

Impact
- Standardized data from multiple sources.
- Enable real-time streaming, replication and analytics of vehicle data using Spark, Kafka and Hadoop.
- Enhanced reporting & visualization of customer, vehicle and trip related data.
- A shared data delivery ecosystem for every involved department and production plant.

Automotive Manufacturer
Big Data Platform Architecture

[Architecture diagram: Data Sources -> Big Data Platform (Ingestion, Staging, Data Transformation, Analytics / Scoring) -> Data Targets (Data Usage). Sources: Factory (Factory Sensor Data), Cloud Services, Fleet. Components: Sqoop / Flume / CDC, Flume / REST API, Pig / Hive, Spark, Spark MLlib, Spark Streaming. Targets: Reports, Cloud Services, Fleet, Fleet Maps.]

Industrializing Big Data - Lessons Learned

Industrializing Big Data - Lessons Learned
What are the main challenges in Big Data projects?

[Bar chart: Project Challenges by difficulty. Scale of difficulty: 5 = difficult, 0 = easy. Categories: Budget Constraints, Client Expectations, Project Time Lines, Availability of Data, Architectural Decisions.]

Industrializing Big Data - Lessons Learned
What are the main technical challenges in Big Data projects?

Configuration of Hadoop Cluster
- YARN Container Size
- Oozie / Hive-Impersonation
- Spark Distributed Cluster Mode

Multi-Tenancy, Authorisation and Security
- Security: Kerberos Authentication
- Authorisation Policies and Groups

Development / Deployment of Applications
- Jira / Confluence / VMs
- Jenkins, Maven, SVN, Git
- Eclipse, IntelliJ IDEA

Hive-HBase Dichotomy
- Hadoop is not an RDBMS
- Small Files Problem / Compaction
- HBase Hashing and Distribution
- Versioning

Automated Testing of Applications
- JUnit Tests
- System & Integration Tests

Automation of Applications
- Scheduling & orchestration: Oozie, Spark, Cron, etc.
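The slide names "HBase Hashing and Distribution" without further detail. One widely used way to address it is to prefix each row key with a hash-derived salt, so that sequential keys (for example timestamps) spread across regions instead of hitting a single one. The sketch below shows this idea in plain Python without an HBase client. The bucket count, the `|` separator and the function name `salted_row_key` are assumptions for illustration, not the approach taken in the projects above.

```python
import hashlib

NUM_BUCKETS = 16  # assumption: e.g. one salt bucket per pre-split region


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the natural key with a deterministic, hash-derived bucket id."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}|{natural_key}"


# Hypothetical keys: vehicle id plus event timestamp.
for key in ["vehicle42_20160623T100000", "vehicle42_20160623T100001"]:
    print(salted_row_key(key))
```

Because the salt is derived from the key itself, a point lookup can recompute the prefix. The trade-off is that range scans over the natural key have to fan out across all buckets.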

Industrializing Big Data - Lessons Learned
What are the main pitfalls in Big Data projects?

Business Requirements
  Missing or frequently changing business requirements lead to flawed architectures.

Technical Skills of Project Management
  Project managers often underestimate the effort required to configure Hadoop clusters, which extends project time lines.

Data Quality and Consistency
  Data samples that are not representative, or data sets that are missing entirely, lead to extended project time lines.

Management of Client Expectations
  Many clients expect Big Data platforms to be more efficient, less expensive and easier to handle than classical Enterprise Data Warehouses (EDW). In reality, Hadoop clusters are usually more difficult to configure and administer, among other reasons because IT departments lack the required skill set. For this reason it is crucial to manage client expectations at the beginning of a Big Data project in order to achieve the project goals and meet important deadlines.

Scalability and Small Files
  Performance and scalability tests with large data sets on the Hadoop cluster, small files compaction.
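Small files compaction, mentioned in the last pitfall, usually means merging many small files into fewer files close to the HDFS block size. The following is a minimal planning sketch in plain Python, assuming a 128 MB target (the default block size in Hadoop 2) and a simple greedy grouping. The function name `plan_compaction` and the file names are hypothetical, and the actual merge (for example a Hive or Spark rewrite job) is not shown.

```python
MB = 1024 * 1024


def plan_compaction(file_sizes, target_bytes=128 * MB):
    """Group small files into merge batches of at most target_bytes each.

    file_sizes maps a file name to its size in bytes. Files that already
    reach the target size are left alone, and a batch containing a single
    file is dropped because merging it would change nothing.
    """
    groups = []
    current, current_size = [], 0
    for name, size in sorted(file_sizes.items()):
        if size >= target_bytes:
            continue  # already large enough
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return [group for group in groups if len(group) > 1]


# Hypothetical directory listing.
files = {
    "part-0001": 60 * MB,
    "part-0002": 60 * MB,
    "part-0003": 60 * MB,
    "part-0004": 200 * MB,
    "part-0005": 10 * MB,
}
print(plan_compaction(files))
```

The greedy pass keeps the sketch short; a production job would typically also consider partition boundaries and file formats before merging.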

Thank you very much for your attention!

Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about or www.deloitte.com/de/ueberuns for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms.

Deloitte provides audit, tax, consulting, and financial advisory services to public and private clients spanning multiple industries. With a globally connected network of member firms in more than 150 countries, Deloitte brings world-class capabilities and high-quality service to clients, delivering the insights they need to address their most complex business challenges. Deloitte has in the region of 200,000 professionals, all committed to becoming the standard of excellence.

This presentation contains general information only, and none of Deloitte Consulting GmbH or Deloitte Touche Tohmatsu Limited ("DTTL"), any of DTTL's member firms, or any of the foregoing's affiliates (collectively the "Deloitte Network") are, by means of this presentation, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. In particular this presentation cannot be used as a substitute for such professional advice. No entity in the Deloitte Network shall be responsible for any loss whatsoever sustained by any person who relies on this presentation.