Real Life Performance of In-Memory Database Systems for BI




Real Life Performance of In-Memory Database Systems for BI
D1 Solutions AG, a Netcetera Company
10th European TDWI Conference, Munich, June 2010

Authors: Dr. Andreas Hauenstein, Dr. Simon Hefti, Dr. Andrej Vckovski

In-Memory Database Systems
- Buzzwords: column orientation, in-memory, shared nothing
- Meaning: looks like Oracle/DB2/SQL Server from the outside, just much faster
- We are talking about relational systems, queryable in SQL
- We are not talking about client-side caching (MicroStrategy or QlikView do this)
- There is a new generation of DB systems, for example MonetDB, Exasol, Greenplum, LucidDB

Business Intelligence Data Warehouse
- We are not looking at transactional systems
- Any DB of an online shop or any DB driving a web site is transactional
- Typically, BI applications are driven by a non-transactional data store that is bulk loaded at intervals by an ETL process. This is called a data warehouse.
- Next-generation DB systems also exist for transactional systems. An example is Oracle TimesTen. This is a different subject.
- Three classes of DB systems:
  - Specialized for transactions (e.g. TimesTen)
  - Specialized for analytics (e.g. Teradata)
  - General purpose (e.g. Oracle, SQL Server)

Business Intelligence Generated SQL
- Tools with a GUI that generate SQL statements
- Examples: Business Objects, OBIEE, MicroStrategy, Cognos
- No SQL tuning possible
- Bad SQL
- Non-technical users
- Frequently changing queries
- Lots of averages and sums, groupings, consolidation

Real Life Problem (1)
- Consolidation of numbers along a hierarchy
- Use a parent-child table with a bridge table to do this in a relational DB

Real Life Problem (2)
- Every company has this sort of problem
- The most important people (CEO) experience the worst performance
- OLAP tools exist because this sort of query is traditionally slow on relational systems
- At a customer, 6 GB of data resulted in a 20 minute wait for the CEO
- Even precalculating all reports overnight became difficult

The Data Model
[Data model diagram. Labels: Bridge Table; 400 K rows; 300 K rows; 500 K rows; hierarchy with 8191 nodes, 12 levels, 4096 leaves]

Size of the Data

Table                 Blocks          Rows
DIM_ACCOUNTING         9 780       532 067
DIM_BUSINESSTYPE          10           181
DIM_CLIENT            29 819       453 392
DIM_MEASURE                6            81
DIM_ORG                  123         8 916
DIM_ORG_FLAT             118        53 248
DIM_PRODUCT           11 875       344 380
DIM_TIME                  11           501
DIM_TRANS                 77         3 001
DIM_UNIT                   5            81
T_FACTS              723 739    16 019 518
Total                775 561    17 415 366

775 561 blocks * 8 192 bytes = 6 GB

- Quite small data volume
- Bad performance on several platforms
- Realistic scenario
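The size calculation on this slide can be checked with a few lines of arithmetic. The sketch below uses only the block counts and the 8 192-byte block size from the slide; the variable names are chosen for illustration.

```python
BLOCK_SIZE_BYTES = 8192     # block size given on the slide
TOTAL_BLOCKS = 775_561      # total block count given on the slide
FACT_BLOCKS = 723_739       # blocks used by T_FACTS

total_bytes = TOTAL_BLOCKS * BLOCK_SIZE_BYTES
size_gb = total_bytes / 1e9        # decimal gigabytes
size_gib = total_bytes / 2**30     # binary gibibytes
fact_share = FACT_BLOCKS / TOTAL_BLOCKS

print(f"{total_bytes} bytes = {size_gb:.2f} GB = {size_gib:.2f} GiB")
print(f"fact table share of all blocks: {fact_share:.1%}")
```

Either way of counting rounds to the 6 GB quoted on the slide, and the fact table accounts for almost all of the volume.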

Data Generation

    create_dim(
      p_bf    => 2,
      p_depth => 12,
      p_name  => 'org',
      p_cols  => 'org01,org02,org03,org04,org05,org06,org07,org08,org09,org10',
      p_types => 't10,t10,t10,t10,t10,t10,t10,t10,t10,t10'
    );

- One function call creates the complete dimension table dim_org
- Generates the id column, the parent pointer, and the bridge table dim_org_flat
- Generated from a helper table with just integers and random numbers
- A similar function generates the fact table
- Started out as PL/SQL, now a Perl script that works with any DB
- It is easy to model any scenario with this tool
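What such a generator does can be sketched in Python. This is not the authors' PL/SQL or Perl tool: build_dimension is a hypothetical stand-in that only mimics the p_bf and p_depth parameters, and the choice to store one bridge row per (ancestor, leaf) pair is an assumption. With a branching factor of 2 and a depth of 12 it produces 8191 nodes and 4096 leaves, the figures shown on the data model slide.

```python
from itertools import count


def build_dimension(bf: int, depth: int):
    """Build a complete tree with branching factor `bf` and `depth` levels below the root.

    Returns (nodes, bridge):
      nodes  - list of (id, parent_id, level) rows, i.e. the parent-child table
      bridge - list of (ancestor_id, leaf_id) rows, including (leaf, leaf)
    """
    ids = count(1)
    nodes, bridge = [], []

    def build(parent_id, level, ancestors):
        node_id = next(ids)
        nodes.append((node_id, parent_id, level))
        path = ancestors + [node_id]
        if level == depth:
            # Leaf: link it to every node on the path from the root (assumed layout).
            bridge.extend((ancestor, node_id) for ancestor in path)
        else:
            for _ in range(bf):
                build(node_id, level + 1, path)

    build(None, 0, [])
    return nodes, bridge


nodes, bridge = build_dimension(bf=2, depth=12)
leaves = [n for n in nodes if n[2] == 12]
print(len(nodes), "nodes,", len(leaves), "leaves,", len(bridge), "bridge rows")
```

Changing bf and depth gives wider or deeper hierarchies, which is presumably what makes a generator of this kind convenient for modelling different scenarios.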

The Test Query
- Generated by BI tool

Initial Tests on Oracle and SQL Server

Run time by number of aggregated fact rows:

Machine                                            | OS                  | DBMS               | 16 Mio   | 1 Mio   | 3500    | Description
---------------------------------------------------+---------------------+--------------------+----------+---------+---------+----------------------------
IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs            | AIX                 | Oracle 10g         | 1200 sec | 168 sec | 167 sec | Expensive production server
Dell Dimension E521, 4 GB RAM                      | Windows 2003 Server | Oracle 10g         | 1023 sec | 205 sec | 159 sec | Home PC
Dell Dimension E521, 4 GB RAM                      | Windows 2003 Server | MS SQL Server 2005 |  741 sec | 699 sec | 293 sec |
HP DL 380 ProLiant, 0.5 GB RAM, Intel Xeon 3.2 GHz | Red Hat Linux       | Oracle 10g         | 1432 sec | 413 sec | 386 sec | Linux with little RAM

- All the same order of magnitude
- Adding RAM does not help a traditional DB
- PCs are better than you think

A New Generation DB System

Run time by number of aggregated fact rows:

Machine                                            | OS                             | DBMS               | 16 Mio   | 1 Mio   | 3500    | Description
---------------------------------------------------+--------------------------------+--------------------+----------+---------+---------+----------------------------
IBM 9117-570, 8 GB RAM, 1.9 GHz, 4 CPUs            | AIX                            | Oracle 10g         | 1200 sec | 168 sec | 167 sec | Expensive production server
Dell Dimension E521, 4 GB RAM                      | Windows 2003 Server            | Oracle 10g         | 1023 sec | 205 sec | 159 sec | Home PC
Dell Dimension E521, 4 GB RAM                      | Windows 2003 Server            | MS SQL Server 2005 |  741 sec | 699 sec | 293 sec |
HP DL 380 ProLiant, 0.5 GB RAM, Intel Xeon 3.2 GHz | Red Hat Linux                  | Oracle 10g         | 1432 sec | 413 sec | 386 sec | Linux with little RAM
Exasol test system, 2 quad-core Intel CPUs,        | Exacluster (Linux microkernel) | Exasol             |   22 sec |   2 sec |   0 sec | In-memory DB
  32 GB RAM, 2 nodes                               |                                |                    |          |         |         |

- The in-memory DB is faster by a factor of 30 to 50
- That's the speed of sound relative to a bicycle
- With generic Intel hardware
- Worth looking at several of these new systems
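The speed-up factor can be recomputed from the 16 Mio row column of the table. The sketch below uses only the run times from the slide; the dictionary keys are shortened labels chosen for this example.

```python
# Run times in seconds for the query over 16 Mio fact rows, taken from the table.
EXASOL_SEC = 22
traditional_sec = {
    "IBM 9117-570 / Oracle 10g": 1200,
    "Dell E521 / Oracle 10g": 1023,
    "Dell E521 / SQL Server 2005": 741,
    "HP DL 380 / Oracle 10g": 1432,
}

# Speed-up of the in-memory system relative to each traditional setup.
speedup = {name: sec / EXASOL_SEC for name, sec in traditional_sec.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.0f}x")
```

For this column the ratios run from the mid thirties into the sixties, so the quoted factor of 30 to 50 is a fair summary, if anything a slightly cautious one.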

A New Generation DB System
[Bar chart of the run times, axis 0 to 1600. Bar labels: DD SQL, DD CRA, HP, IBM, Exa]

The Contenders
- Oracle 11g
- MySQL
- MonetDB
- LucidDB
- Greenplum (their own hardware)
- Exasol (their own hardware)

The Test Server
- Intel Dual Xeon E5205
- 16 GB RAM
- 2 x 250 GB SATA disk
- 64-bit Debian Linux

Interesting DB Systems That Were Not Tested
- Teradata
- Oracle Exadata
- Netezza
- Vertica
- Infobright
- Kognitio

The field is very active, and new products and approaches keep entering the market.

MonetDB
- Origin: Result of research at CWI in the Netherlands
- Open Source: Yes
- Free of Charge: Yes
- Remarks:
  - Recent publicity through a paper in Communications of the ACM: "Breaking the Memory Wall in MonetDB"
  - Constantly changing as research progresses
  - Easy to get into direct contact with the developers
- Quote from the website: "MonetDB is a open-source database system for high-performance applications in data mining, OLAP, GIS, XMLQuery, text and multimedia retrieval."

LucidDB
- Origin: Formerly part of LucidEra in San Mateo, California
- Open Source: Yes
- Free of Charge: Yes
- Remarks:
  - Emphasizes ease of configuration and maintenance
  - Mostly written in Java
- Quote from the website: "LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning."

Greenplum
- Origin: Located in San Mateo, California. Postgres based.
- Open Source: Based on open source technology
- Free of Charge: No
- Remarks:
  - Based on a similar hardware architecture to Exasol
  - Highly configurable and tunable, lots of features
  - Column store is an option, default is row store
- Quote from the website: "Greenplum Database utilizes a shared-nothing MPP (massively parallel processing) architecture that has been designed from the ground up for BI and analytical processing using commodity hardware. In this architecture, data is automatically partitioned across multiple 'segment' servers, and each 'segment' owns and manages a distinct portion of the overall data. All communication is via a network interconnect -- there is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture)."
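The shared-nothing idea in the quote can be illustrated with a toy aggregation. This is a conceptual sketch only, not Greenplum's implementation: the segment count, the modulo distribution rule, and the generated rows are all assumptions for the example. Each segment aggregates its own rows, and only the small partial results are merged.

```python
from collections import defaultdict

NUM_SEGMENTS = 4  # assumed number of segment servers


def segment_for(row_id: int) -> int:
    """Pick the segment that owns a row (modulo as a stand-in for a hash function)."""
    return row_id % NUM_SEGMENTS


def aggregate(rows):
    """SUM(amount) GROUP BY group_key over one list of rows."""
    sums = defaultdict(float)
    for _row_id, group_key, amount in rows:
        sums[group_key] += amount
    return dict(sums)


def merge(partials):
    """Combine the per-segment partial sums into the final result."""
    total = defaultdict(float)
    for partial in partials:
        for group_key, amount in partial.items():
            total[group_key] += amount
    return dict(total)


# Made-up fact rows: (row_id, group_key, amount)
rows = [(i, i % 3, float(i)) for i in range(1, 101)]

# Distribute the rows; each segment owns a distinct portion of the data.
segments = [[] for _ in range(NUM_SEGMENTS)]
for row in rows:
    segments[segment_for(row[0])].append(row)

# Each segment works on its own data; only partial sums cross the "network".
distributed_result = merge(aggregate(segment) for segment in segments)
single_node_result = aggregate(rows)
print(distributed_result)
```

Because sums can be merged from partial sums, adding segments shrinks the work per node without changing the result, which is the basis of the scaling claims made for this architecture.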

Exasol
- Origin: Developed from scratch in Nürnberg, Germany
- Open Source: No
- Free of Charge: No
- Remarks:
  - Based on a similar hardware architecture to Greenplum
  - Pure column store DB
  - Emphasizes ease of administration
  - No need to create indexes or gather statistics
  - Imitates some Oracle-isms for compatibility
- Quote from the website: "The database has been specially developed for analysis and is being used successfully for data warehousing, Web analytics, data mining applications and more. In contrast with universal databases, this specialization means that the data to be analyzed can be made available to analysis tools virtually in real time."
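Why a column store suits the sums and groupings typical of BI queries can be shown with a toy comparison. This is a conceptual sketch, not a description of Exasol internals: the three-column fact table and its values are made up, and Python lists and arrays merely stand in for storage layouts.

```python
from array import array

# Row store: each record keeps all of its fields together.
row_store = [
    (4, 10, 100.0),   # (org_id, product_id, amount)
    (5, 11, 200.0),
    (3, 10, 50.0),
    (4, 12, 25.0),
]

# Column store: one contiguous array per column.
column_store = {
    "org_id": array("q", (r[0] for r in row_store)),
    "product_id": array("q", (r[1] for r in row_store)),
    "amount": array("d", (r[2] for r in row_store)),
}

# SUM(amount) on the row store walks every record and skips two of three fields.
total_from_rows = sum(record[2] for record in row_store)
fields_touched_rows = len(row_store) * len(row_store[0])

# SUM(amount) on the column store reads only the one array it needs.
total_from_columns = sum(column_store["amount"])
fields_touched_columns = len(column_store["amount"])

print(total_from_rows, total_from_columns)
print(fields_touched_rows, "fields scanned vs", fields_touched_columns)
```

On wide fact tables the gap grows with the number of columns the query does not reference, and a single-typed column is also easier to compress and to keep in memory.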

Typical Shared Nothing Node
- Combine many of these, connected by Gigabit Ethernet

Results With 16 Mio Rows in the Fact Table
[Bar chart, axis 0 to 2500. Bar labels: Oracle, MySQL, LucidDB, MonetDB, Greenplum, Exasol. Data labels: 2280, 460, 226, 31, 13, 10]
- Oracle on a new 64-bit box is 4 times faster than on an average 32-bit box
- Both Oracle and LucidDB were twice as fast after dropping all indexes on the fact table (those are the times in the chart)
- We did not manage to tune MySQL to get acceptable performance
- For a free system, LucidDB has good performance and little hassle
- MonetDB needed a fix in the optimizer before coping with the query
- Next-generation in-memory DBs are at least one order of magnitude faster

Performance Scaling
[Chart of run time in sec, axis 0 to 400, at 16, 160 and 320. Series: Exasol (public demo system); Exasol (untuned, comparable hardware); Exasol (local dimensions, comparable hardware); Greenplum. Data labels: 364, 288, 183, 133, 210, 105, 97, 26, 54, 13, 6, 3]
- Both systems scale linearly
- It is possible to query at least ten times the data volume efficiently
- The vendors claim unlimited linear scaling by adding commodity hardware

Conclusion

Big Lessons
- Database technology is in upheaval at the moment
- By adopting the new technologies, you can totally revolutionize the way you access your data
- Prices will fall rapidly. This is like the PC revolution.

Small Lessons
- If you run Oracle on a 32-bit system, move to a 64-bit architecture. It will give you a factor of 4 without any pain
- If your table scans are slow, drop all indexes
- If you move to a new technology, you will get a factor of 50
- The commercial systems are worth their money. Their SQL is more compatible, and they are more stable