Evolving Data Warehouse Architectures




Evolving Data Warehouse Architectures in the Age of Big Data
Philip Russom
April 15, 2014

TDWI would like to thank the following companies for sponsoring the 2014 TDWI Best Practices research report, Evolving Data Warehouse Architectures.
[Sponsor logos]
This presentation is based on the findings of that report.
STAY TUNED: At the end of this webinar, learn how to download a free copy of the report.

Agenda
- Definitions of Data Warehouse Architectures
- Drivers of Change
- Benefits & Barriers
- From EDWs to DWEs
- Role of Hadoop
- Analytics versus Reporting
- Trends among Architectural Components and Practices
- Top Ten Priorities
PLEASE TWEET: @prussom, #TDWI, #EDW, #DataWarehouse, #DataArchitecture, #Analytics, #Hadoop

Upcoming Points
- There isn't one, single architecture for all data warehouses (DWs)
  - Each org is different
  - Expect multiple architectures
- A well-designed DW has multiple architectural layers
- Architectural approaches get mixed together into hybrids
- A DW architecture interacts with architectures for data integration, reporting, analytics, operational applications, etc.
- The warehouse is still vital, even central
  - But it's evolving into a multiple platform environment
- Architecture is more important than ever, but now as a logical design that's deployed over multiple physical platforms
- Please don't ask me to draw a Reference Architecture for DWs
  - Given the current diversity, there isn't just one. But I'll describe many.

What do you think data warehouse architecture is? Select all that apply.
Source: TDWI survey run in late 2013. Based on 1197 responses from 538 respondents (2.2 responses per respondent, on average).

Logical versus Physical DW Architectures
And Other Architectural Components that Coexist
Today's Focus
- Logical architecture: mostly about data models and their relationships, with a focus on how these represent organizational entities and processes
- Data standards: including standards for data modeling, data quality metrics, interfaces for data integration, programming style, format standards, etc.
- Physical architecture: mostly a plan for deploying data and data structures based on the workload and platform requirements of each
- System architecture: a topology of hardware servers and software servers, plus the interfaces and networks that tie them together

Drivers of Change

Does your primary enterprise data warehouse have an architectural design?
  Yes: 79%
  No: 18%
  Don't know: 3%

Is the architecture of your data warehouse environment evolving?
  Yes, moderately: 54%
  Yes, dramatically: 22%
  No, except with DW updates: 22%
  Don't know: 2%

What technical issues or practices are driving change in your DW architecture?
  Advanced analytics: 57%
  Increasing data volumes: 56%
  Real-time operations: 41%
  Business performance mgt: 38%
  OLAP: 30%
  Non-relational data: 25%
  Virtualization of data: 23%
  Cloud adoption: 21%
  Streaming data: 15%

What business issues or practices are driving change in your DW architecture?
  Competitiveness: 45%
  Fast-paced business processes: 43%
  Compliance: 29%
  Funding: 29%
  Sponsorship: 26%
  Reorganizations: 25%
  Centralizing business control: 30%
  Departmental power struggles: 19%
  Mergers and acquisitions: 18%

Source: TDWI survey run in late 2013. Based on 538 respondents.

Benefits of Multi-Platform Architecture
In priority order, based on survey responses
- All data analytics, in general (61%)
  Many new platforms are built for analytics: DW appliances, columnar databases, NoSQL databases, Hadoop. With a multi-platform portfolio, users can match an analytic workload to the best platform.
- A diverse platform portfolio can handle a diverse range of data types.
  This is key to embracing the unstructured and schema-free data types found in most big data. It also enables broad data exploration and discovery (43%).
- A more diverse platform portfolio can aid a business.
  Additional platforms are key to addressing new business requirements (36%), especially data-oriented ones like analytics (61%), more numerous business insights (34%), and business optimization (30%).
- Handling data in real time usually requires an additional purpose-built system.
  Traditional relational databases and batch-oriented Hadoop systems were not built for real-time operations (33%), though many organizations need faster business processes (26%).
- Adding low-cost platforms to a DW environment makes big data more affordable.
  DW appliances, columnar RDBMSs, Hadoop, and NoSQL all lower the cost of data staging for data warehousing (20%) and data archiving (16%).
Source: TDWI survey run in late 2013. Based on 538 respondents.

Barriers to Multi-Platform Architecture
In priority order, based on survey responses
- Inadequate staffing or skills (47%) is the most prominent barrier.
  Immaturity with new data types and sources (23%), plus new technologies for Hadoop, event processing, and so on, leaves organizations unprepared for the complexity of multi-platform designs (25%).
- As usual, organizational and business issues should be settled first.
  Data ownership and other politics (43%), a lack of business sponsorship (38%), and a lack of a compelling business case (25%).
- A number of data management issues should be addressed.
  Data integration complexity (36%), poor data quality (34%), lack of data architecture (29%), and data security, privacy, and governance issues (25%).
- As with any new IT initiative, proper funding is key.
  Account for the cost of acquiring multiple platforms (25%) and the cost of administering multiple platforms (27%).
Source: TDWI survey run in late 2013. Based on 538 respondents.

WHY CAN'T A DATA WAREHOUSE DO EVERYTHING?
Square Peg Workloads may not fit Round Hole DW Architectures
- Most data warehouses were designed and optimized for common deliverables and methods:
  - Standard reports, dashboards, performance mgt, online analytic processing (OLAP)
  - This is a design and architectural decision made by users, not a failing of vendor platforms
- Can/should all DW & analytic workloads run on your EDW?
  - If your EDW can handle multiple mixed concurrent workloads with performance and without impeding other workloads, then run all workloads (including analytics) on the EDW, for simplicity's sake
  - If not, you may need additional data platforms for some workloads
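The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration, not something taken from the TDWI report: the `WorkloadResult` fields and the sample workloads are hypothetical, and how an organization actually measures "performance" or "impeding other workloads" is left open.

```python
from dataclasses import dataclass


@dataclass
class WorkloadResult:
    """Hypothetical outcome of running one workload on the EDW under mixed load."""
    name: str
    meets_performance_target: bool
    impedes_other_workloads: bool


def workloads_to_offload(results):
    """Return names of workloads that fail the slide's test for staying on the EDW."""
    return [
        r.name
        for r in results
        if not r.meets_performance_target or r.impedes_other_workloads
    ]


def edw_can_do_everything(results):
    """True when every workload passes, so all can run on the EDW for simplicity."""
    return not workloads_to_offload(results)


# Illustrative (made-up) test results, not survey data.
sample = [
    WorkloadResult("standard reports", True, False),
    WorkloadResult("OLAP", True, False),
    WorkloadResult("advanced analytics", True, True),
]
```

With the made-up sample above, only the analytics workload would be a candidate for an additional platform; the reporting and OLAP workloads stay on the EDW.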

Multi-Platform Data Warehouse Environments
Many enterprise data warehouses (EDWs) are evolving into multi-platform data warehouse environments (DWEs). Users continue to add standalone data platforms to their warehouse tool and platform portfolios.
The new platforms don't replace the core warehouse, because it is still the best platform for the data that goes into standard reports, dashboards, performance management, and OLAP. Instead, the new platforms complement the warehouse, because they are optimized for workloads that manage, process, and analyze new forms of big data, non-structured data, and real-time data.

Ramifications of a Multi-Platform DW Environment
- Workload-centric DW architecture
  - Assumes that some workloads and their data are best offloaded from the core DW and taken to a platform more suited to them
  - Workloads and data for advanced analytics (not OLAP), SQL-based analytics, unstructured data, massive big data, real time
- Distributed DW architecture
  - This simply means that data and data structures (as defined in a logical architectural layer) are distributed across multiple physical data platforms
  - Again, the logical layer is the big picture needed with many platforms
- A distributed DW architecture is both good and bad
  - Good if it serves the unique requirements of multiple workloads and the users that depend on them
  - Bad if platforms proliferate like the dreaded data marts of yore
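One way to picture a workload-centric, distributed architecture is as a mapping from workloads to physical platforms, with the core DW as the default. The Python sketch below is illustrative only: the workload keys follow the slide's list, but the platform names and the mapping itself are assumptions, not recommendations from the report.

```python
# Default home for workloads that are not explicitly offloaded.
CORE_DW = "core_dw"

# Hypothetical assignment of offloaded workloads to platforms (illustrative names).
OFFLOAD_PLATFORMS = {
    "advanced_analytics": "analytic_dbms",
    "sql_based_analytics": "dw_appliance",
    "unstructured_data": "hadoop",
    "massive_big_data": "hadoop",
    "real_time": "event_processing",
}


def route_workload(workload: str) -> str:
    """Return the platform for a workload; anything unlisted stays on the core DW."""
    return OFFLOAD_PLATFORMS.get(workload, CORE_DW)


def platforms_in_use(workloads) -> list:
    """List the distinct physical platforms a set of workloads is spread across."""
    return sorted({route_workload(w) for w in workloads})
```

Counting the distinct platforms, as `platforms_in_use` does, is a simple way to watch for the proliferation the slide warns about.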

Growing Complexity in DW System Architectures
The technology stack for DW, BI, analytics, and data integration has always been a multi-platform environment.
What's new? The trend toward a portfolio of many data platforms has accelerated.
[Diagram: platforms accumulating around the data warehouse over the passage of time. Components shown: federated data marts; customer mart or ODS; real-time ODS; DW from a merger; columnar DBMS; MapReduce; complex event processing; data warehouse (star or snowflake scheme, multidimensional data models); data staging areas; metrics for performance mgt; OLAP cubes; OLAP DBMSs; detailed source data; analytic sandbox; data federation & virtualization; DW appliances; Hadoop Distributed File System; NoSQL database; streaming data tools]

Which of the following best describes your extended data warehouse environment today?
- Pure, central, monolithic EDWs are relatively rare (15%, far left)
- Environments without a DW are equally rare (15%, far right)
- EDWs mix well in hybrid environments (68%, middle three)

Responses, on a spectrum from EDW (left) to DWE (right):
  Central monolithic EDW with no other data platforms: 15%
  Central EDW with a few additional data platforms: 37%
  Central EDW with many additional data platforms: 16%
  Many workload-specific data platforms; EDW is present but not the center: 15%
  No true EDW, but many workload-specific data platforms instead: 15%
  Other: 2%

Source: TDWI survey run in late 2013. Based on 538 respondents.
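The 68% figure for hybrid environments is the sum of the three middle response categories. The short Python check below reproduces that arithmetic. The percentages come from the slide, but the pairing of each percentage with its label is an assumption, inferred from the left-to-right order of the flattened chart.

```python
# Percentages as reported on the slide (538 respondents). The label-to-value
# pairing is an assumption, since the original chart layout was lost.
shares = [
    ("Central monolithic EDW with no other data platforms", 15),
    ("Central EDW with a few additional data platforms", 37),
    ("Central EDW with many additional data platforms", 16),
    ("Many workload-specific data platforms; EDW present but not the center", 15),
    ("No true EDW, but many workload-specific data platforms instead", 15),
    ("Other", 2),
]

# "Middle three" = the hybrid options between the pure EDW and the no-EDW extremes.
hybrid_share = sum(pct for _, pct in shares[1:4])
total = sum(pct for _, pct in shares)

print(hybrid_share)  # 68
print(total)         # 100
```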

Which of the following best describes your organization's strategy for evolving your DW environment and its architecture, relative to big data?
- Most survey respondents plan to extend an existing DW (41%, far left)
- Few will deploy new data platforms (25%)
- 29% have no strategy for DW evolution or addressing big data

Responses:
  Extend existing core DW to accommodate big data and other new requirements: 41%
  Deploy new data management systems specifically for big data, analytics, real time, etc.: 25%
  No strategy for DW architecture, though we need one: 23%
  No strategy for DW architecture, because we don't need one: 6%
  Other: 5%

Source: TDWI survey run in late 2013. Based on 538 respondents.

Hadoop is a Useful Addition to DW Architectures
IT COMPLEMENTS AND EXTENDS DATA WAREHOUSES

HDFS extends DW architectures
- Managing multi-structured data
- Repository for detailed source data
- Processing big data for analytics
- Advanced forms of algorithmic analytics
- Data staging on steroids
- ELT push-down processing
- Inexpensive compared to average DW

Hadoop also contributes outside DWs
- Imagine HDFS as shared infrastructure, similar to SAN & NAS
- Imagine a huge, live archive
- Imagine content mgt on steroids

Reporting and Analytics have Different Requirements for Data and DW Architecture
Reporting is mostly about entities and facts you know well, represented by highly polished data that you know well. Carefully modeled and cleansed data with rich metadata and master data that's managed in a data warehouse. Most users designed their DWs first and foremost as a repository for reporting and similar practices such as OLAP, performance management, dashboards, and operational BI.
Advanced analytics enables the discovery of new facts you didn't know, based on the exploration and analysis of data that's probably new to you. Unlike the pristine data that reports operate on, advanced analytics works best with detailed source data in its original (even messy) form, using discovery-oriented technologies, such as ad hoc queries, search, mining, statistics, predictive algorithms, and natural language processing.

Commitment & Growth
Components relative to DW Architecture
Some components are poised for aggressive adoption by users.
- Analytics is driving most adoption of new platforms & features.
  In-memory analytics (36%), analytic sandboxes (29%)
- Managing non-relational big data is also a pressing need for many organizations.
  HDFS (34%), open-source MapReduce (32%), vendor-built MapReduce (25%), NoSQL databases (24%)
- Real-time is just as important as analytics and big data.
  In-memory database (34%), in-database analytics (29%), solid-state drives (25%), real-time data (24%)
- Relational technology is more relevant than ever, but in updated forms.
  Columnar DBMSs (27%), DW appliances (23%)

Top Ten Priorities for DW Architecture
These are recommendations, requirements, or rules that can guide you.
1. Recognize that successful data warehouse architectures have integrated logical and physical layers, plus other components.
2. Determine the business and technical drivers in your organization, and let those determine the evolution of your DW architecture.
3. Beware that the leading barrier to successful DW architecture is inadequate staffing and skills.
4. Address other barriers for sponsorship, funding, and improvements to data management infrastructure.
5. Turn on unused features in existing platforms.
6. Establish DW architectures and standards, but be open to exceptions.
7. Be open to hybrids and alternate standards.
8. Consider Hadoop as a DW complement.
9. Remember that analytics and reporting have different data and DW architectural requirements.
10. Don't expect the new stuff to replace the old stuff.

Download a free copy of the report that this webinar is based on:
EVOLVING DATA WAREHOUSE ARCHITECTURES IN THE AGE OF BIG DATA
Download the report in a PDF file at: tdwi.org/bpreports
Feel free to distribute the PDF file of any TDWI Best Practices Report.

Q & A
Philip Russom
Research Director for Data Mgt, TDWI
prussom@tdwi.org
www.bit.ly/philiprussom
@prussom on Twitter
linkedin.com/in/philiprussom