What is a Data Lake, anyway? Alec Gardner, GM Advanced Analytics, Teradata ANZ Wednesday 10 th June 2015



Similar documents
Bringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox Director Big Data Centre of Excellence (Teradata

Big Data, Start Small! Dr. Frank Säuberlich, Director Advanced Analytics (Teradata International) 26 th May 2015

Teradata s Big Data Technology Strategy & Roadmap

VIEWPOINT. High Performance Analytics. Industry Context and Trends

Big Data for Banking. Kaleem Chaudhry Senior Director, Sales Consulting, ASEAN. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Big Data Integration: A Buyer's Guide

UNIFY YOUR (BIG) DATA

The Future of Data Management

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

The Celebrus v8 Big Data Engine. Powering real-time personalisation, one-to-one data-driven marketing & advanced customer analytics.

Unifying the Enterprise Data Hub and the Integrated Data Warehouse

End Small Thinking about Big Data

INTELLIGENT BUSINESS STRATEGIES WHITE PAPER

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

Are You Big Data Ready?

The Future of Data Management with Hadoop and the Enterprise Data Hub

What to Look for When Selecting a Master Data Management Solution

INVESTOR PRESENTATION. First Quarter 2014

Big Data overview. Livio Ventura. SICS Software week, Sept Cloud and Big Data Day

Agile Business Intelligence Data Lake Architecture

The Data Reservoir as an enabler of differentiating Analytics initiatives

FROM DATA STORE TO DATA SERVICES - DEVELOPING SCALABLE DATA ARCHITECTURE AT SURS. Summary

Achieving Business Value through Big Data Analytics Philip Russom

Exploding Online Sales Using an Effective Order Management System

CrossPoint for Managed Collaboration and Data Quality Analytics

TopBraid Insight for Life Sciences

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator

Ganzheitliches Datenmanagement

Big Data Can Drive the Business and IT to Evolve and Adapt

THE JOURNEY TO A DATA LAKE

Safe Harbor Statement

Design Approach for a Data Sharing Environment. Presented by Gene Boomer CNO Financial

INVESTOR PRESENTATION. Third Quarter 2014

Beyond the Data Lake

CREATING PACKAGED IP FOR BUSINESS ANALYTICS PROJECTS

The Enterprise Data Hub and The Modern Information Architecture

Engage your customers

Big Data Patterns. Ron Bodkin Founder and President, Think Big

Driving Value From Big Data

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

IBM Data Warehousing and Analytics Portfolio Summary

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

Investor Presentation. Second Quarter 2015

Big Data Discovery: Five Easy Steps to Value

Getting Started Practical Input For Your Roadmap

TopBraid Life Sciences Insight

HDP Hadoop From concept to deployment.

The Next Wave of Data Management. Is Big Data The New Normal?

How To Learn To Use Big Data

Evolution to Revolution: Big Data 2.0

Advanced In-Database Analytics

HDP Enabling the Modern Data Architecture

Analance Data Integration Technical Whitepaper

JOURNAL OF OBJECT TECHNOLOGY

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Data Lake-based Approaches to Regulatory- Driven Technology Challenges

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Cloudera Enterprise Data Hub in Telecom:

ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE

White Paper. Unified Data Integration Across Big Data Platforms

Unified Data Integration Across Big Data Platforms

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Data Warehouse Overview. Srini Rengarajan

Information-Driven Transformation in Retail with the Enterprise Data Hub Accelerator

Mastering Big Data. Steve Hoskin, VP and Chief Architect INFORMATICA MDM. October 2015

The Top Challenges in Big Data and Analytics

Artur Borycki. Director International Solutions Marketing

Evolving Data Warehouse Architectures

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK

Data warehouse and Business Intelligence Collateral

Analance Data Integration Technical Whitepaper

IBM Information Management

RED HAT AND HORTONWORKS: OPEN MODERN DATA ARCHITECTURE FOR THE ENTERPRISE

Effecting Data Quality Improvement through Data Virtualization

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal

IBM Big Data in Government

Ramesh Bhashyam Teradata Fellow Teradata Corporation

Enterprise Information Management Capability Maturity Survey for Higher Education Institutions

Unlock the business value of enterprise data with in-database analytics

Using Tableau Software with Hortonworks Data Platform

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

White Paper. Thirsting for Insight? Quench It With 5 Data Management for Analytics Best Practices.

Data Virtualization: Achieve Better Business Outcomes, Faster

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Apache Hadoop Patterns of Use

!!!!! BIG DATA IN A DAY!

Data Integration for the Real Time Enterprise

Transcription:

What is a Data Lake, anyway? Alec Gardner, GM Advanced Analytics, Teradata ANZ Wednesday 10 th June 2015

A large objectbased repository that holds data in its native format store all data present and future and create a centralised data archive location. One of the Big Data labels that we risk over-loading to complete abstraction is the idea of a "Data Lake Sometimes called the bit bucket or the landing zone As more and more applications are created that derive value from new types of data the Data Lake forms All Water and Little Substance 3 2014 Teradata

One of the Big Data labels that we risk over-loading to complete abstraction is the idea of a "Data Lake 4 2014 Teradata

Big Idea #1: store all data (whatever all means) Big Idea #2: un-washed, raw data (NoETL / late-binding) and at least part of the discussion sounds familiar Big Idea #3: resolve the nagging problem of accessibility and data integration Data accessibility and integration? Isn t that what the Data Warehouse is for? 5 2014 Teradata

Is the Data Lake a new architectural construct? Or are we building next-generation data silos? Simple, single subject area Dimensional Data Marts with all of the dimensions pre-joined to the fact table? One-perworkload / application? Is this really the future of Enterprise Analytics? Or circa 1995 silo, departmental Decision Support Systems warmed-over and re-platformed? 6 2014 Teradata

7 2014 Teradata Up-front data modelling and data integration are difficult, time-consuming and expensive; maybe we just shouldn t bother with them any more?

but there are no free lunches in Information Management merely more and different options Explicit, or implicit, there is always, always, always (at least one) schema; pay me now, or pay me later (and over and over) 8 2014 Teradata

Gartner Logical Data Warehouse Forrester Enterprise Data Hub Teradata Unified Data Architecture Big Data Are Plural For the foreseeable future, we will need multiple Information Management strategies - and multiple Information Management technologies Integration becomes a critical concern 9 2014 Teradata

The new Big Data The big 5 challenges of making the transactions, to interactions journey #1: The requirement to manage multistructured data and data whose structure changes continuously means that there is no single Information Management strategy that works equally well across the entire Big Data space. #2: Understanding Interactions requires path / graph / time-series Analytics in addition to traditional set-based Analytics, so that there isn t a single parallel processing framework or technology that works equally well across the entire Big Data space. #3: The economic challenge of capturing, storing, managing and exploiting Big Data sets that may be large; getting larger quickly; noisy; of (as yet) unproven value; and infrequently accessed. #4: There might be a needle in one of these haystacks - but if it takes 6-12 months and $1M just to go look, I ll never know. #5: Getting past so what to drive real business value (because old business process + expensive new technology = expensive, old business process) 10

The Logical Data Warehouse is the industry s adaptation to Big Data How will you deploy? How many / which platforms will you need? How will you integrate them? And which data need to be centralised and integrated? The Enterprise Data Warehouse Era The Logical Data Warehouse (a.k.a.: Unified Data Architecture) Era 1 Multi-structured data 2 Interaction / observation Analytics 5 3 4 Flat / falling IT budgets, exploding data volumes Agile Exploration & Discovery 1 3 2 4 Give me integrated, high quality data. 5 Operationalisation Centralise and integrate the data that are widely reused and shared, but integrate all of the analytics. 11

12 2014 Teradata

The Data Lake as it should be: a centralised, consolidated store of raw data from multiple sources Agile acquisition of raw, multi-structured data efficient non-relational computation and cost-effective storage of large and noisy data-sets Now that is new, interesting and potentially very, very useful 13 2014 Teradata

Repeatable process and reproducible results are the basis of experimental science Data. Science. successful, real-world laboratories are not ungoverned environments. 14 2014 Teradata

Any chain is only as strong as its weakest link The Enterprise Information Value Chain Where did this data come from and when was it created? What did its creator believe that it represented? Full copy or delta file? Part of a collection? Etc., etc., etc. 15 2014 Teradata

STOP PRESS: Laws of Physics Unchanged by Big Data! Left to its own devices, does nature tend to give us a single, beautiful lake? Or a messy patchwork of lakes, plural? The new information management strategies and technologies are not a cure for information entropy; a well-ordered Data Lake will not just spontaneously emerge from a collection of consolidated Hadoop applications. 16 2014 Teradata

Architecture and Technology are only enablers Web / clickstream Who navigates to the website, what do they do in each session and then afterwards within other channels? Sentiment What are customers saying about the company / products / services on social media sites? Voice/text Who is complaining to the call center & about what? E-mail / Graph Which brokers are colluding to rig markets and with whom? Process / Path Analytics What s the optimal process for claims or collections activity? 21 2014 Teradata

22 2014 Teradata All Data And The Big Bit Bucket Myth

Operationalising Exploration & Discovery Insight Type 1 Discovery Analytics identifies a broken process; Fix the business process and you are done; E.g.: Large US Telco used Path Analytics to understand pathto-call-centre journey. Teradata Unified Data Architecture Type 2 Discovery Analytics enables us to identify / extend an existing model that is already being scored in the IDW; E.g.: a leading US Retail Bank was able to enhance its existing churn models by understanding the path to churn. Type 3 New model is dependant on data that is not yet available in the IDW and/or on Analytics (e.g.: path, graph, time-series, etc.) that cannot easily be supported there; Operationalisation requires run-time integration of multiple, distributed components. QueryGrid Integration Technology 23 2014 Teradata

24 2014 Teradata Summary and Conclusions

25 2014 Teradata