Big Data Can Drive the Business and IT to Evolve and Adapt

Similar documents
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

NEWLY EMERGING BEST PRACTICES FOR BIG DATA

The Future of Data Management

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

The Enterprise Data Hub and The Modern Information Architecture

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

Getting Started Practical Input For Your Roadmap

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

The Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics. A Kimball Group White Paper By Ralph Kimball

Big Data Analytics Best Practices

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Big Data Analytics Nokia

How To Handle Big Data With A Data Scientist

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA What it is and how to use?

Are You Ready for Big Data?

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Design tip #156 containing the drill- across macro:

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

I/O Considerations in Big Data Analytics

Advanced Big Data Analytics with R and Hadoop

The Future of Data Management with Hadoop and the Enterprise Data Hub

Are You Ready for Big Data?

Transforming the Telecoms Business using Big Data and Analytics

Trafodion Operational SQL-on-Hadoop

Cisco IT Hadoop Journey

Native Connectivity to Big Data Sources in MSTR 10

Architectures for Big Data Analytics A database perspective

How To Scale Out Of A Nosql Database

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Extend your analytic capabilities with SAP Predictive Analysis

Ramesh Bhashyam Teradata Fellow Teradata Corporation

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Information Builders Mission & Value Proposition

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Oracle Big Data SQL Technical Update

Ganzheitliches Datenmanagement

MapR: Best Solution for Customer Success

Cost-Effective Business Intelligence with Red Hat and Open Source

Virtualizing Apache Hadoop. June, 2012

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Modernizing Your Data Warehouse for Hadoop

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Please give me your feedback

This Symposium brought to you by

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Ten Cornerstones of a Modern Data Warehouse Environment

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

Big Data for Investment Research Management

HDP Hadoop From concept to deployment.

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

Tap into Hadoop and Other No SQL Sources

Luncheon Webinar Series May 13, 2013

Teradata Unified Big Data Architecture

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Talend Big Data. Delivering instant value from all your data. Talend

Big Data and Your Data Warehouse Philip Russom

Teradata s Big Data Technology Strategy & Roadmap

Apache Hadoop Patterns of Use

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Analytics on Spark &

Modern Data Architecture for Predictive Analytics

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe May, 2013

Big Data Er Big Data bare en døgnflue? Lasse Bache-Mathiesen CTO BIM Norway

Data Virtualization A Potential Antidote for Big Data Growing Pains

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

More Data in Less Time

Traditional BI vs. Business Data Lake A comparison

A New Era Of Analytic

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS)

An Oracle White Paper October Oracle: Big Data for the Enterprise

Applications for Big Data Analytics

UNIFY YOUR (BIG) DATA

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Moving From Hadoop to Spark

HADOOP AND MAINFRAMES CRAZY OR CRAZY LIKE A FOX? Mike Combs, VP of Marketing mcombs@veristorm.com

The Potential of Big Data in the Cloud. Juan Madera Technology Consultant

VIEWPOINT. High Performance Analytics. Industry Context and Trends

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

Big Data Success Step 1: Get the Technology Right

Parallel Data Warehouse

Big Data Technologies Compared June 2014

In-Memory Analytics for Big Data

SAP and Hortonworks Reference Architecture

Best Practices for Hadoop Data Analysis with Tableau

How To Use Big Data For Business

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Transcription:

Big Data Can Drive the Business and IT to Evolve and Adapt Ralph Kimball Associates 2013 Ralph Kimball Brussels 2013

Big Data Itself is Being Monetized Executives see the short path from data insights to revenue and profit Big data provides new insights, especially behavior Analytic sandbox results taken directly to management Data is becoming an asset on the balance sheet Value by Cost to produce the data Cost to replace the data if it is lost Revenue or profit opportunity provided by the data Revenue or profit loss if data falls into competitors hands Legal exposure from fines and lawsuits if data is exposed to the wrong parties

The DW: An Increasingly Sophisticated Platform Low latency operational data mixed with history New sources, new users, mixed workloads Customer behavior data Web traffic, session analysis, significant integration Analysis of big data Real time intervention into on-going processes Non-relational programming: complex branching & procedural processing, advanced analytics Many new data formats and data types: social media; device sensors, unstructured and hyper-structured Mission unchanged: Publish the Right Data

Big Data Analytic Use Cases Behavior tracking Search ranking Ad tracking Location and proximity tracking Causal factor discovery Social CRM Share of voice, audience engagement, conversation reach, active advocates, advocate influence, advocacy impact, resolution rate, resolution time, satisfaction score, topic trends, sentiment ratio, and idea impact Financial account fraud detection/intervention System hacking detection/intervention On line game gesture tracking

More Big Data Use Cases Non-numeric data and unique algorithms Document similarity testing Genomics analysis Cohort group discovery Satellite image comparison CAT scan comparisons Big science data collection Complex numeric data Smart utility meters Building sensors In flight aircraft status Data bags name/value pairs with ad hoc content

Data Bags: Extremely Sparse With Unpredictable Content E.g., financial disclosure information on loan application: Payload is arbitrary list of name-value pairs No limit to number of pairs or value data types Data bags may contain data bags recursively

Big Data Requires Integration: Eliminating Cisco s Data Silos From:

Houston: We Have a Problem The traditional pure relational data warehouse architecture can t handle ANY of these use cases. We need: Non-scalar data: vectors, arrays, data bags, structured text, free text, images, waveforms Iterative logic, complex branching, advanced statistics Petabyte data sources loaded at gigabytes/second Analysis in place across thousands of distributed processors, data often not in database format, full data scans often needed Schema on read Analysis while loading

Two Architectures to the Rescue: Zero Sum or Hybrid Coexistence? Extended relational Extend the current formidable RDBMS legacy Add features/functions to address big data analytics MapReduce/Hadoop Build new architecture for big data analytics Open source top level Apache project Thousands of participants

Existing RDBMS Based Data Warehouse

Extended Relational Data Warehouse with Big Data Additions

MapReduce/Hadoop Figure 2.3 from Tom White's book, Hadoop, The Definitive Guide, 2nd Edition, (O'Reilly, 2010)

Hadoop Distributed File System HDFS is the distributed file system that supports a number of Apache Hadoop projects: Native MapReduce programming (Java, Ruby, C++) High level MapReduce compilers Pig \procedural interface, leverages DBMS skills Hive SQL interface, described as a data warehouse Hbase open source, non-relational, column oriented DB running directly on HDFS Cassandra open source, distributed database based on BigTable data model

Comparing the Two Architectures Relational DBMSs Proprietary, mostly Expensive Data requires structuring Great for speedy indexed lookups Deep support for relational semantics Indirect support for complex data structures Indirect support for iteration, complex branching Deep support for transaction processing MapReduce/Hadoop Open source Less expensive Data does not require structuring Great for massive full data scans Indirect support for relational semantics, e.g. Hive Deep support for complex data structures Deep support for iteration, complex branching Little or no support for transaction processing

Hybrid Possibilities

Data Warehouse Oriented Architecture Alternatives MapReduce/Hadoop as an ETL step Extract row/column results from Big Data HDFS as an ultra low cost archive $10,000 per TB vs. $300 per TB! Really? SQL compatibility proliferating in BD apps Hive of course, but even tools like Splunk HDFS nodes as actual MPP RDBMS nodes! Clever possibilities for Teradata, Greenplum, etc Pseudo-HDFS and Pseudo-Apache apps Proprietary HW and SW configurations

Returning to the Integration Challenge Big data integration is familiar drilling across: Establish conformed attributes (e.g., Customer Category) in each database Fetch separate answer sets from different platforms grouped on the conformed attributes Sort-merge the answer sets at the BI layer

Implementing Caches of Increasing Latency Each cache supports different class of applications Data quality improves left to right High value facts and dimensions pushed right to left

Data Warehouse Disruptions The rise of the data scientist "newly discovered patterns have the most disruptive potential, and insights from them lead to the highest ROIs." Sandboxes Demand for low latency Thirst for exquisite detail Light touch data waits for relevance to be exposed Simple analysis of all the data trumps sophisticated analysis of some of the data Dozens of special purpose programming languages Have we killed the golden goose of DW integration?

Whither the Data Warehouse? Corral and embrace the data scientists Scientists and IT must meet each other half way Insist on using shared data warehouse resources Conformed dimensions integrate now, avoid data silos! Virtualized data sets quick prototypes, migrate to prod. Build a cross department analytic community with IT partnership Ditch the waterfall approach, go agile Led by business but sophistication required Closely spaced deliverables, midcourse corrections

The Kimball Group Resource www.kimballgroup.com Best selling data warehouse books NEW BOOK! DW Toolkit 3 rd Edition In depth data warehouse classes taught by primary authors Dimensional modeling (Ralph/Margy) ETL architecture (Ralph/Bob) Dimensional design reviews and consulting by Kimball Group principals Kimball Group White Papers on Integration, Data Quality, and Big Data Analytics