The Pros and Cons of Data Warehouse Appliances



Similar documents
BIG DATA APPLIANCES. July 23, TDWI. R Sathyanarayana. Enterprise Information Management & Analytics Practice EMC Consulting

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

Big Data and Its Impact on the Data Warehousing Architecture

James Serra Sr BI Architect

EMC/Greenplum Driving the Future of Data Warehousing and Analytics

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

Next Generation Data Warehousing Appliances

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

SQL Server PDW. Artur Vieira Premier Field Engineer

Big Data Technologies Compared June 2014

Data Warehouse Appliances: The Next Wave of IT Delivery. Private Cloud (Revocable Access and Support) Applications Appliance. (License/Maintenance)

HP Enterprise Data Warehouse Deep Dive. Steve Tramack, Sr. Engineering Manager, I2A Solutions, HP

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Evolving Solutions Disruptive Technology Series Modern Data Warehouse

Bringing Big Data into the Enterprise

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

DEPLOYING IBM DB2 FOR LINUX, UNIX, AND WINDOWS DATA WAREHOUSES ON EMC STORAGE ARRAYS

Barbarians at the Gate Data Warehouse Appliances Challenge Existing Storage Paradigm

Innovative technology for big data analytics

BIG DATA TECHNOLOGY. Hadoop Ecosystem

OWB Users, Enter The New ODI World

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

2009 Oracle Corporation 1

The Growing Practice of Operational Data Integration. Philip Russom Senior Manager, TDWI Research April 14, 2010

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Main Memory Data Warehouses

SMB Direct for SQL Server and Private Cloud

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

High performance ETL Benchmark

Using Hadoop to Expand Data Warehousing

Integrated Application and Data Protection. NEC ExpressCluster White Paper

Inge Os Sales Consulting Manager Oracle Norway

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Cost-Effective Business Intelligence with Red Hat and Open Source

Building an Effective Data Warehouse Architecture James Serra

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

IBM Data Warehousing and Analytics Portfolio Summary

EMC BACKUP MEETS BIG DATA

Business Intelligence in a Hybrid Cloud Environment

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Aaron Werman.

Big Data and the Cloud Trends, Applications, and Training

Virtual Data Warehouse Appliances

Practical Considerations for Real-Time Business Intelligence. Donovan Schneider Yahoo! September 11, 2006

The HP Neoview data warehousing platform for business intelligence Die clevere Alternative

Driving Peak Performance IBM Corporation

Luncheon Webinar Series May 13, 2013

Real Life Performance of In-Memory Database Systems for BI

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

I/O Considerations in Big Data Analytics

SQL Server Parallel Data Warehouse: Architecture Overview. José Blakeley Database Systems Group, Microsoft Corporation

SAP Real-time Data Platform. April 2013

Advanced In-Database Analytics

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

Big Data Processing: Past, Present and Future

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:

HP Oracle Database Platform / Exadata Appliance Extreme Data Warehousing

Transforming the Economics of Data Warehousing with Cloud Computing

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

Modernizing Your Data Warehouse for Hadoop

2010 Ingres Corporation. Interactive BI for Large Data Volumes Silicon India BI Conference, 2011, Mumbai Vivek Bhatnagar, Ingres Corporation

Big Data With Hadoop

NoSQL for SQL Professionals William McKnight

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Please give me your feedback

Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database

PARALLELS CLOUD STORAGE

An Overview of SAP BW Powered by HANA. Al Weedman

Integrating Netezza into your existing IT landscape

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Microsoft Analytics Platform System. Solution Brief

Netezza and Business Analytics Synergy

Informatica Data Replication: Maximize Return on Data in Real Time Chai Pydimukkala Principal Product Manager Informatica

Innovate and Grow: SAP and Teradata

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Introduction to Database as a Service

5 Signs You Might Be Outgrowing Your MySQL Data Warehouse*


P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland

Big Data & Cloud Computing. Faysal Shaarani

Using distributed technologies to analyze Big Data

The HP Neoview data warehousing platform for business intelligence

The Selection of Netezza for a Data Warehouse Platform

Five Technology Trends for Improved Business Intelligence Performance

Parallel Data Warehouse

Storage Spaces. Storage Spaces

Oracle Exadata: The World s Fastest Database Machine Exadata Database Machine Architecture

Enable BI, Reporting, and ETL Integration with Your App

How To Use Exadata

Evolving Data Warehouse Architectures

Integrated Big Data: Hadoop + DBMS + Discovery for SAS High Performance Analytics

EMC GREENPLUM DATABASE

Greenplum Database. Getting Started with Big Data Analytics. Ofir Manor Pre Sales Technical Architect, EMC Greenplum

Integrating Ingres in the Information System: An Open Source Approach

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Transcription:

TDWI WEBINAR SERIES The Pros and Cons of Data Warehouse Appliances Philip Russom Senior Manager of Research and Services TDWI: The Data Warehousing Institute prussom@tdwi.org www.tdwi.org

Agenda Data Warehouse Appliances Definitions Strategies Pros Cons Conclusions Recommendations 2

Data Warehouse Appliance Definitions What do you think a data warehouse appliance is? 19% = Don't know 13% = Either definition 14% = Any server hardware and database software bundled to create a data warehouse platform 53% = Server hardware and database software built specifically to be a data warehouse platform Source: TDWI Tech Survey, August 2005. 139 respondents. Respondents feel they know what a data warehouse appliance is: Most think the hardware and software are built for data warehousing But a minority think it could be any hardware-software bundle 3

Appliance and Bundle Vendors Data warehouse appliances Server hardware and database software built specifically to be a data warehouse platform Netezza, DATAllegro, Calpont, and Teradata (maybe) Data warehouse bundles Any server hardware and database software bundled to create a data warehouse platform IBM DB2 Integrated Cluster Environment (ICE) for Linux, Sun-Sybase iforce Solutions, Unisys ES7000 BI Solutions Miscellaneous appliances (outside data warehousing) Network Appliance (storage), Google Search Appliance, Thunderstone Search Appliance 4

Data Warehouse Appliance Strategies What s your group s strategy for data warehouse appliances? 8% = Currently evaluating 13% = Plan to evaluate soon 16% = Don't know 44% = No plans 18% = Deployed Source: TDWI Tech Survey, August 2005. 122 respondents. Most respondents don t plan to evaluate a DWA. A surprisingly high number (18%) say they ve already deployed one. One-fifth are evaluating one now or in the near future. 5

Sweet Spot Large Data Marts Most appliances/bundles support large data marts : Vendors describe their customers/prospects this way. Users describe their implementations this way. Mart enables an analytic app, typically focused on analysis of call-level detail, customers, shopping baskets, etc. The large data mart is a sweet spot for appliances/bundles: Users have succeeded with this kind of project. Strategy: Isolated project to prove appliance before expanding. Other sweet spots are coming. A few users with appliances (and many with similar bundles) have deployed an enterprise data warehouse. Eventually, the EDW will be another sweet spot. 6

Data Warehouse Appliance Pros What do you think is the leading benefit of a data warehouse appliance? 6% = Other 7% = Easy incremental expansion Source: TDWI Tech Survey, August 2005. 119 respondents. 8% = Low cost 13% = Fast installation 38% = Pre-tuned for data warehousing 13% = Reduced system integration 15% = Fast query performance Users want guaranteed performance: Users felt pre-tuning and fast queries were most compelling benefits. Few users expect low cost, though vendors compete on this point. 7

Sweet Spot 1TB to 10TB of Data Most appliances today manage between 1TB and 10TB of data: This is another way to describe the sweet spot, whether large data mart or enterprise data warehouse. Sweet spot will shift with data growth: TDWI data: mid/large marts/dws ~15% annual growth. Appliance users start with 1-3TB, grow toward 10TB. Some appliances now deployed with >10TB. Vendors offer appliances of great capacity: Netezza 8650 (27TB), 10400 (>50TB), 10800 (>100TB) DATAllegro C25 (25TB) At this rate of growth, very large data warehouses (>10TB) will be common on data warehouse appliances within 2 years. 8

Data Warehouse Appliance Cons What do you think is the leading problem with a data warehouse appliance? 7% = Don't Know 9% = New training for the new platform 3% = Other 10% = Black box that resists tuning 44% = Proprietary platform that makes migration off it difficult 27% = Single-use hardware that cannot be re-allocated to non-warehouse use Source: TDWI Tech Survey, August 2005. 110 respondents. Users main concern was migrating off an appliance. My perspective: No harder than other relational migrations. Single-use hardware is the reality of all appliances. 9

Open Source in Data Warehousing Adoption of open source in data warehousing is low. Data warehouses on open source operating systems: 41% = No plans 9% = Deployed Source: TDWI Tech Survey, February 2004. 164 respondents. Data warehouses on open source databases: 64% = No plans 8% = Deployed Source: TDWI Tech Survey, February 2004. 167 respondents. Open source databases and operating systems are typical components of DW appliances and similar bundles. Lack of interest in open source among data warehouse professionals could be a barrier to appliance adoption. 10

Conclusions Definition of a data warehouse appliance: Server hardware and database software built specifically to be a data warehouse platform But similar bundles offer similar advantages. Pros: Pre-tuned for DW, fast queries, less system integration Cons: Proprietary platform, single-use hardware Strategy Succeed with Sweet Spots: 1-10 terabyte databases (but growing) Terabyte-size data marts (but enterprise DWs are possible) Expect 10TB+ enterprise data warehouses to be common on data warehouse appliances in 2 years or so. 11

Recommendations Consider a data warehouse appliance for: Analytic app with terabyte-size data mart Intense queries not appropriate to EDW Short dev/deployment time Sponsor wanting low price per TB Budget needing minimal SI cost Apps without a FTE as administrator Other users have succeeded in these situations you can, too. 12

Stuart Frost, CEO

What is a DATAllegro DW Appliance? Complete DW solution From SQL to storage Modular, rack-based appliances True commodity hardware platform Standard motherboard Western Digital enterprise disks Patent-pending architecture Linear scaling Fault tolerant Leverage open source Linux & Ingres DATAllegro P2 & P5 Very high performance 2TB user data - $250k 5TB user data - $450k DATAllegro C25 Good performance 25TB user data - $450k $18k per TB Encrypted versions available 2

Commodity HW & RDBMS Open Source DBMS highly tuned for DSS Commodity components chosen for speed & reliability Standard 2U chassis 800MBps sustained read/write speed per node Flexible /disk balance rides multi-core wave 3

Redundant Array of Inexpensive DW (RAIDW TM ) - MPP with FT & Low Cost 20Gbps redundant Infiniband network Failover pair Master Slave array 4

Ultra Shared Nothing (USN) Multi-level level Partitioning + Replication Multi-level level hash and/or range partitioning within nodes Hash partitioning across nodes FAST LOAD ETL flat file >300GB per hour via dual GbE Tables can be partitioned and/or replicated - Speeds joins & reduces net traffic Partition locking allows real-time updates with minimal impact on queries 5

Direct Data Streaming (DDS) Sequential I/O, no tuning or indexes Master breaks query into steps that run efficiently on Ingres with minimal/no tuning or indexes ODBC/JDBC SQL92 with many SQL99/Oracle/Teradata extensions Individual steps run on slaves with 98% sequential I/O - Faster, more reliable 6

Commodity / Proprietary COMMODITY DBMS COMMODITY OS COMMODITY HW COMMODITY DBMS PROPRIETARY MPP COMMODITY OS CUSTOM ASSEMBLED COMMODITY HW PROPRIETARY DBMS & MPP COMMODITY OS SOME PROPRIETARY HW PROPRIETARY DBMS & MPP COMMODITY OS PROPRIETARY HW PROPRIETARY DBMS PROPRIETARY OS PROPRIETARY HW 7

DW Appliance Pros 38% Pre-tuned 15% Fast query performance 13% Reduced system integration 13% Fast installation 8% Low cost 7% Easy expansion 8

Performance Comparison 300.00 Teradata vs. DATAllegro P3 Benchmark Results 270 DATAllegro Timings Legacy Platform Timings 250.00 Query Performance (minutes) 200.00 150.00 100.00 150 120 80 50.00 55 48 0.00 1.80 3.82 5.62 4.17 4.20 4.22 1 2 3 4 5 6 Query Type 9

DW Appliance Cons 44% Proprietary = lock-in 27% Single use HW 10% Resists tuning 9% New training Standard interfaces (ODBC etc.) DA uses commodity HW Typ. used for new DM etc. Low cost mitigates DA allows indexes, some tuning Performance limits need for tuning Some training required, but far less than traditional platforms 10

Summary DW appliances are here to stay: Price Performance Scalability Ease of installation and use Two successful vendors DATAllegro addresses most concerns raised by TDWI community 11