Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop


Hadoop Data Hubs and BI
Supporting the migration from siloed reporting and BI to centralized services with Hadoop
John Allen, October 2014

Introduction
John Allen; computer scientist
Background in data analytics, distributed systems, enterprise architecture and hacking things at home.
Consultancy, start-ups and big businesses; government, gaming, telco and finance
John.howard.allen@gmail.com

Objectives
Outline the benefits of Hadoop for data staging (Data Lake)
Explain why Hadoop is an effective part of an analytics service (Data Lake)
Talk a little about how Hadoop can play a role in transforming an organization's data architecture (Enterprise Data Hub)

Disclaimer
Conceptually we're not trying to do anything different; the goals of data management and business intelligence remain the same.
  Data Standards
  Data masking
  Data governance
  Data Profiling
  Metadata Management
  Master Data Management
  Model Driven Data Quality
What's different is the trade-offs and compromises we can now make
  ELT or ETL
  Cloud or Private
  Scale up or Scale out
  Early or Late Optimization
  Commodity or Specialist
  Schema on Write or Schema on Need
And the cost of managing change

What is new
Unrivalled volume of data
Increased variety of sources and formats
Need for improved approach to change
  Agility
  Cost
  Flexibility
  Improved ROI
Demand for increased data-led decision making
Demand for predictive, not just descriptive
User expectations around
  Timeliness
  Accuracy
  Ease of use
Concerns: Technology Centric
Needs: Business Centric

Changes in Data Management
2010: Hadoop was a build-it-yourself batch processing system
2014: Hadoop is an off-the-shelf, extensible data processing platform with a range of streaming, in-memory and batch processing
80% of all Informatica products to run natively on Hadoop - Ori Lev Ran, Senior Director
Big Data and Hadoop: more than just volume and cost
  Analysts recognise that agility and flexibility are key components of the BigData story (schema on read, data model flexibility) - Gartner, 2014
Hadoop 2 and YARN: game-changing data-processing platform
  Vendors seamlessly deploy their own applications into our cluster and data
  Run machine learning, streaming, ETL applications and batch jobs all on the same platform

Changes in Data Integration
80% of Informatica suite will run natively on Hadoop by 2014 - Ori Lev Ran, Sr. Director, Strategic Business Development
Vendors promoting BigData as compute platforms
Data analysis now possible without traditional ETL (NoETL) ("schema on read")
Industry analysts recognising value of BigData for structured and unstructured data
Industry analysts talk about the rise of Data Hubs serving the Logical Data Warehouse
NoETL, Data Hubs and semantic data services are key components of DI in 2014 - Ted Friedman, VP Distinguished Analyst, Data Integration, 2013
Analysts recommend building a Logical Data Warehouse using a blend of traditional and BigData technologies
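
To make the "schema on read" idea concrete, here is a minimal Python sketch. It is not taken from the talk: the sample records, the field names and the helper `read_with_schema` are all hypothetical. Raw records are landed untouched, and each consumer applies its own schema only when it reads them.

```python
import json

# Raw events landed as-is (hypothetical sample data); no schema is enforced on write.
RAW_LINES = [
    '{"user": "a1", "amount": "12.50", "channel": "web"}',
    '{"user": "b2", "amount": "7", "country": "SE"}',
    '{"user": "c3"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields, cast types, use None when a field is absent."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two consumers read the same raw data through different (hypothetical) schemas.
billing_schema = {"user": str, "amount": float}
marketing_schema = {"user": str, "channel": str, "country": str}

billing_rows = list(read_with_schema(RAW_LINES, billing_schema))
marketing_rows = list(read_with_schema(RAW_LINES, marketing_schema))
```

A new source field does not require reloading the data in this sketch; only the reading schema changes. The price is that casting and missing-value handling are repeated on every read.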

Data Data Everywhere
Data Management
  Heterogeneous IT
  Complex integration (P2P)
  Multiple LOB systems, ODS, warehouses
  Range of models and standards
Usage and Analytics
  Slow BI responsiveness
  Range of tooling and approaches
  Departmental approaches
Driver: Rapid Growth / Acquisition
Driver: Traditional IT Models

Data Hub - Analytics Enablement
Data Management
  Centralized data platform
  Strategic data sourcing
  Common data interface
  Enhanced retention & fidelity
  Increased compute (MPP)
  Reduce cost of ETL
Usage and Analytics
  Centralized analytics
  Self-service BI
  Advanced data analysis
Benefit: Lower Complexity
Benefit: Improved TTM
Benefit: Improved Insight

Data Hub - Reporting Consolidation
Data Management
  Data standards enabled
  De-duplication and consolidation
  Reporting focused
Usage and Analytics
  Migration of ad-hoc reporting (Excel)
  Report consolidation
  Increased automation of reports
Benefit: Reduced Cost
Benefit: Improved Accuracy
Benefit: Improved Timeliness

Data Hub - Data Bus Integration
Data Management
  Removal of P2P
  Pub/Sub communication
  Distribution of data products
Usage and Analytics
  Service oriented
  Event based
Benefit: Reduced Cost
Benefit: Reduced Duplication
Benefit: Improved Timeliness
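
The move from point-to-point (P2P) integration to Pub/Sub can be sketched in a few lines of Python. This is an illustration only, not something from the deck: the `DataBus` class, the topic name and the link-counting helpers are hypothetical, and a real data bus would add persistence, delivery guarantees and access control.

```python
from collections import defaultdict

def p2p_links(n_systems):
    """Interfaces needed if every pair of systems integrates directly."""
    return n_systems * (n_systems - 1) // 2

def hub_links(n_systems):
    """Interfaces needed if every system connects only to the hub."""
    return n_systems

class DataBus:
    """Tiny in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable that receives every message published on this topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to each subscriber; return how many received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)

bus = DataBus()
warehouse_inbox, reporting_inbox = [], []
bus.subscribe("customer.updated", warehouse_inbox.append)
bus.subscribe("customer.updated", reporting_inbox.append)

# The publisher does not know who consumes the event.
delivered = bus.publish("customer.updated", {"id": 42, "segment": "gold"})
```

With ten systems, full pairwise integration needs 45 interfaces in this model, while the hub needs 10. That difference is one way to read the "Reduced Cost" and "Reduced Duplication" benefits.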

Functional View
Discover
  Find data by taxonomy, metadata, keys
  Self-service, data entitlement gateway
On-boarding
  Dedicated managed service
  All sources and data types
Process / Transform
  Batch and on-demand transform
  Enrichment and heavy lifting
Store
  Polystructure, elastic, fault-tolerant, MPP
  Secure, governed, many interfaces
Access and Entitlement
  Controlled audited access
  SQL, file, API, web interfaces
Manage
  ITIL services
Analysis
  Descriptive, reactive and predictive
  Self-service and business led
Visualisation
  Dashboard development
  Interactive data exploration

Putting it all Together

Next Steps
Improved Data Discovery and Data Cleansing
  Trifacta, Paxata, Established Players
Improved On-Cluster Complex Analytics
  Actian (KNIME), RapidMiner (Radoop), SAS
Improved Unstructured Support and Search
  Squirro
Improved SQL and (H)OLAP support
  Splice Machine, Impala 2.0
Improved Metadata Management
  Informatica, Navigator, Others

Challenges
Feed Management and Reconciliation
Metadata Management and Lineage
Globalized Data Management
  Sovereign data, Regulatory constraints
Security (i.e. Row/Cell-Level)
Avoiding Silos and Vendor Lock-in

Thank You