BIG DATA ARCHITECTURE AND ANALYTICS BIG DATA STRATEGIES FOR BUSINESS GROWTH



Similar documents
HDP Hadoop From concept to deployment.

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Hadoop Ecosystem B Y R A H I M A.

The Future of Data Management

Upcoming Announcements

Comprehensive Analytics on the Hortonworks Data Platform

How To Create A Data Visualization With Apache Spark And Zeppelin

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Dominik Wagenknecht Accenture

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Reference Architecture, Requirements, Gaps, Roles

ITG Software Engineering

Big Data Use Case: Business Analytics

Building Scalable Big Data Pipelines

Talend Big Data. Delivering instant value from all your data. Talend

Putting Apache Kafka to Use!

Big Data Web Analytics Platform on AWS for Yottaa

HDP Enabling the Modern Data Architecture

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Ali Ghodsi Head of PM and Engineering Databricks

Big Data Course Highlights

Designing Agile Data Pipelines. Ashish Singh Software Engineer, Cloudera

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Business Intelligence Competency Partners Untangling the Confusion What SAP BW powered by HANA and HANA LIVE mean to your organization

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

The Future of Data Management with Hadoop and the Enterprise Data Hub

Analyzing Big Data with AWS

A Case Study of Hadoop in Healthcare

How Companies are! Using Spark

Building a data analytics platform with Hadoop, Python and R

Moving From Hadoop to Spark

QUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES

Scaling Agile Is Hard, Here s How You Do It!

Getting Real Real Time Data Integration Patterns and Architectures

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Cisco IT Hadoop Journey

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer,

Information Builders Mission & Value Proposition

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Analytics Nokia

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex

Large scale processing using Hadoop. Ján Vaňo

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing

Data Science in the Travel Industry

Real Time Data Processing using Spark Streaming

Kafka & Redis for Big Data Solutions

Big Data Realities Hadoop in the Enterprise Architecture

Analytics on Spark &

Data Virtualization A Potential Antidote for Big Data Growing Pains

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

The Principles of the Business Data Lake

Ten Cornerstones of a Modern Data Warehouse Environment

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

Making Sense of Big Data in Insurance

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

Brent Atkins, Cris Hadjez An Agile BI Approach: Mead Johnson Uses Better Data to Push Boundaries and Increase Customer Value Session # 3544

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Informatica Data Replication: Maximize Return on Data in Real Time Chai Pydimukkala Principal Product Manager Informatica

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Making Leaders Successful Every Day

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Apache Flink Next-gen data analysis. Kostas

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

SAS Data Management Technologies Supporting a Data Governance Process. Dave Smith, SAS UK & I

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Trend Micro Big Data Platform and Apache Bigtop. 葉 祐 欣 (Evans Ye) Big Data Conference 2015

Introduction to Apache Kafka And Real-Time ETL. for Oracle DBAs and Data Analysts

Hadoop in the Enterprise

The Role of Data & Analytics in Your Digital Transformation. Michael Corcoran Sr. Vice President & CMO

Big Data Success Step 1: Get the Technology Right

Cloudera Enterprise Data Hub in Telecom:

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

ITG Software Engineering

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Solving performance and data protection problems with active-active Hadoop SOLUTIONS BRIEF

Transcription:

BIG DATA ARCHITECTURE AND ANALYTICS BIG DATA STRATEGIES FOR BUSINESS GROWTH Aaron Werman aaron.werman@firstdata.com or aaron.werman@gmail.com if you are in the payments industry, LinkedIn group Big Data Payments Processing and Credit Cards

Agenda Fast and slow data and fast and slow thinking - understanding your analytic goals The two big data successes Big data for data delivery, separated from product and application Big data in a product, as core for the product Minimum Viable Product, agile, and the curse of bad data Getting data out of deliverables Data characteristics Summary TOPICS The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts. Bertrand Russell

Setting up a big data organization in an established business My recent experience My current role is leading data innovation (primarily big data) in a Fortune 500, and building an enterprise data hub. Been here almost 2 years, initiated MDM, DG, DQ, DS as well It turns out that using big data techniques, we can deliver that data much better/faster to market/cheaper than existing teams We are now delivering substantial parts of many new systems. Our typical time to market is 6-8 weeks, vs. 6+ months for the company historically we are morphing into a data delivery team, rather than just data intermediators In many cases, RDBMS is a better solution and we use that a lot Lots of streaming data, lots of dashboards BIG DATA ROLE IN COMPANY If any of you are interested in big data issues, especially Hadoop, Kafka, Spark, Hive lets discuss

Setting up a big data organization in an established business My recent experience My current role is leading data innovation (primarily big data) in a Fortune 500, and building an enterprise data hub. Been here almost 2 years, initiated MDM, DG, DQ, DS as well It turns out that using big data techniques, we can deliver that data much better/faster to market/cheaper than existing teams We are now delivering substantial parts of many new systems. Our typical time to market is 6-8 weeks, vs. 6+ months for the company historically we are morphing into a data delivery team, rather than just data intermediators In many cases, RDBMS is a better solution and we use that a lot Lots of streaming data, lots of dashboards BIG DATA ROLE IN COMPANY If any of you are interested in big data issues, especially Hadoop, Kafka, Spark, Hive lets discuss

Classic big data: the old school is batch and Hadoop centric TOPICS Sqoop JDBC Sqoop JDBC Hadoop fs -get Hadoop fs -put Hadoop Flume HDFS Hive Map Reduce Flume

Newer Big Data BIG DATA live data stream from Kafka Spark Streaming Microbatches Kafka, Spark, Flink, Storm, Samza, Yarn, Mesos Spark Scala program with analytic output

Big data as an agent of change Big data is becoming a mix of: data for new apps (as in big data is central to the app) old data (a/k/a transactional data) everything that was left on the floor Big data is more about techniques that support horizontal scaling than about data size or volume/velocity/variety And sometimes public clouds (private clouds are generally jokes) The big wins in big data are generally new capabilities/apps not warehousing in big data Keeping everything is power later on BIG DATA

MVP - Minimum Viable Product Key experience is Waterfall generally doesn t align with requirements agile and devops work better if staff is very able (your staff probably is not!) There is always conflict between structure and innovation innovators dilemma says controlled enterprises cannot innovate as well as disruptors (are you part of the problem?) Data products generally are layered Core data that is clearly defined, and is business critical Layers around that are product based, not business function based Focusing on the core data for innovation is critical to allow applications to get real feedback Always try to push out the core data part first and reviewed even if the rest of the data is not yet ready FOCUS ON DATA

Getting data out of deliverables Roadmaps to data landscapes generally fail Consider a set of tactical plans that make data deliverables part of other projects In relatively short time, critical data will be identified Roadmap into strategy works better than top down strategic initiatives for most But most things underfunded or with log deliverables fail DELIVERING DATA In effect, this becomes a data sub-project to each major development initiative This is one of the rare cases where a data tax is justifiable

The two big data successes Established data areas generally fail at big data Often branding it a success (but still) Wrong projects, people, skills, attitude, SDLC, management. Support and conflict with established power Success 1: Data lake And related ILM Especially if it uses as-raw-as-possible data Big data has not solved the ETL question, just confuses people about it UNDERSTANDING YOUR DATA Success 2: Product for-purpose use of big data technology Better, faster, cheaper In many cases, old-school will work for large company IT But big data (as a brand) sometimes gets you out of paralyzing paradigms: cost, process controls meant for maintenance that block innovation

Summary Choose your big data paradigms: fast data vs. batch Data is often used by what you can get, not by what is best Data delivery should be separated from product and application delivery. Winners have the most valuable data use of big data make data more valuable is a clear win Big data is mostly a lever for applications. From a data perspective, it is mostly a secondary EDW/ODS Minimum Viable Product works from a data perspective Roadmap and strategy work for big data, but strategy is politically hard with high risk REVIEW