Whitepaper. Innovations in Business Intelligence Database Technology. www.sisense.com


The State of Database Technology in 2015

Database technology has seen rapid developments in the past two decades. Online Analytical Processing (OLAP) and its derivatives (MOLAP, ROLAP and HOLAP), which gained prominence in the 1990s, gradually lost ground to in-memory databases at the start of the 21st century. However, the requirements of modern business intelligence pose a challenge that in-memory databases will have a very difficult time meeting. This, in turn, has brought on the next generation of databases and querying: in-chip analytics. This newly developed technology uses the CPU, RAM and disk storage in innovative ways to tackle the complexity and size of the data sets that current BI software must handle in order to provide effective insights to end users within a reasonable timeframe.

This guide will cover:

- OLAP cubes: history and overview
- In-memory databases: advantages and shortcomings
- In-chip technology: development, overview and promise

OLAP Cubes

Summary

OLAP technology provided a great basis for business intelligence 20 years ago, but it suffers from several limitations that make it a less than ideal fit for most modern BI projects. It gives users quick answers to specific pre-defined queries, but it is resource intensive and problematic when it comes to larger data sets and ad-hoc querying.

Leading Provider: Oracle

Pros:
- Centralized data integration
- Fast data retrieval for specific queries

Cons:
- Resource intensive
- Inflexibility, limited support for ad-hoc queries
- Long build times

Overview

History

OLAP is a database technology that was first developed in the late 1960s, but it only gained widespread commercial use in the 1990s with Microsoft's first release of its OLAP Services product (now Analysis Services), based on technology acquired from Panorama Software. At that point in time, when computer hardware wasn't nearly as powerful as it is today, OLAP was groundbreaking. It introduced a spectacular way for business users (typically analysts) to easily perform multidimensional analysis of large volumes of business data. As Microsoft's Multidimensional Expressions language (MDX) came closer to becoming a standard, more and more client tools (e.g., Panorama NovaView, ProClarity) appeared to provide even more power to these users.

How it Works

An OLAP database converts table-based datasets into multidimensional arrays called cubes in order to optimize querying and data retrieval. Users can then access specific dimensions of the data for analysis purposes. For a simplified example, let's think of a chain of pet stores that tracks sales of various items across cities and over time. It might track these figures in a series of spreadsheets such as these:

Whereas in an OLAP cube, the same information would be stored multi-dimensionally:

Note that this illustration is somewhat over-simplified. In reality there can be a virtually endless number of dimensions, which are not necessarily symmetrical. To answer queries, an OLAP cube typically includes roll-up cells, which contain data aggregated according to certain parameters (in our example, sales over time, or item sales by location). These aggregations are pre-calculated when the system is at rest (i.e., not being used by end users).

Thus, once a query is made, the answer is already within the data cube and is retrieved instantaneously. However, OLAP cubes have their drawbacks, the main ones being:

- Each additional query requires a new dimension to be added to the cube, which means duplicating the entire cube in terms of data storage. As a result, OLAP databases quickly become resource intensive when it comes to data storage and management.
- Aggregating data requires the CPU to process every cell of the data, which means that each new build (such as when additional data is added) takes a relatively long time to produce.
- OLAP cubes are very fast when it comes to specific, pre-designed queries. However, if a user wants to make a new query (e.g., average sales of hamsters per year), this data is not pre-calculated and will require additional dimensions to be added to the cube, which is a lengthy process.
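The roll-up idea described above can be illustrated with a small sketch. It is not how any particular OLAP product is implemented; the sales rows, the dimension names and the helper functions `build_cube` and `total_units` are all hypothetical, and the cube is simply a dictionary that stores a pre-computed sum for every combination of dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows for the pet-store example: (city, item, year, units_sold)
SALES = [
    ("Boston", "hamster", 2013, 10),
    ("Boston", "hamster", 2014, 14),
    ("Boston", "leash", 2014, 30),
    ("Denver", "hamster", 2014, 6),
    ("Denver", "leash", 2013, 20),
]
DIMENSIONS = ("city", "item", "year")


def build_cube(rows):
    """Pre-compute SUM(units) for every combination of dimensions (the build step)."""
    cube = defaultdict(int)
    for *dims, units in rows:
        # For each subset of dimensions, add this row to the matching roll-up cell.
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(range(len(DIMENSIONS)), size):
                # None marks a dimension that has been rolled up ("all values").
                key = tuple(dims[i] if i in kept else None
                            for i in range(len(DIMENSIONS)))
                cube[key] += units
    return dict(cube)


def total_units(cube, city=None, item=None, year=None):
    """Answer a pre-defined query with a single lookup; SALES is not scanned."""
    return cube.get((city, item, year), 0)


cube = build_cube(SALES)
print(total_units(cube, item="hamster"))            # hamsters, all cities and years
print(total_units(cube, city="Boston", year=2014))  # Boston in 2014, all items
print(len(SALES), "fact rows ->", len(cube), "cube cells")
```

Even in this toy case the cube holds several times more cells than there are fact rows, and every row is visited once per dimension combination during the build. A measure that was not planned for, such as an average, cannot be read from these cells and would require the cube to be rebuilt with additional data.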

In-Memory Databases

Summary

In-memory technology (i.e., loading the entire database into RAM and from there transferring it to the CPU to perform calculations) has become a leading solution for business intelligence, as it gives users fast answers to their queries without the need for lengthy builds and pre-calculations. However, the size and complexity of modern data are forcing in-memory databases to face their limitations.

Leading Provider: Qlik

Pros:
- Fast data retrieval
- Support for ad-hoc queries

Cons:
- Expensive to implement and maintain
- Scalability issues

Overview

History

In-memory databases became popular at the start of the 21st century with the proliferation of cheap and widely available 64-bit PCs and the adoption of columnar databases as an alternative to the row-based systems which were the basis for OLAP cubes. More RAM on a PC meant that more data could be queried quickly. If crunching a million rows of data on a machine with only 2GB of RAM was a drag, users could now add more gigabytes of RAM to their PCs and store data in relational databases which could be queried much faster than before. In-memory databases have become much more prominent in recent years. However, OLAP-based solutions can still be found in massive organization-wide implementations.

How it Works

Generally speaking, a computer has two types of data storage mechanisms: disk (often called a hard disk) and RAM (random access memory). The important differences between them are outlined in the following table:

DISK          RAM
Abundant      Scarce
Slow          Fast
Cheap         Expensive
Long term     Short term

Most modern computers have 15-100 times more available disk storage than they do RAM.

However, reading data from disk is much slower than reading the same data from RAM. This is one of the reasons why 1GB of RAM costs approximately 320 times as much as 1GB of disk space.

In a disk-based RDBMS, there are two things that cause heavy disk operations and therefore poor performance:

1. Table Scans: Loading of an entire table from disk to RAM (for calculations)
2. Complex Data: Querying data scattered across many tables and/or fields (joins)

In-memory technology aims to address both these issues by preloading the entire database into RAM, and loading data from RAM to the CPU to perform calculations and data retrieval. All in-memory technologies share the same premise: it is simply much faster to perform calculations over data that is stored in RAM than over the same data stored in a table on a disk. These technologies also benefit from the fact that 64-bit computers are now considered commodity hardware. Additionally, it is cheaper to add more RAM to both commodity and proprietary hardware today than it used to be.

Illustration: Disk/RAM utilization when querying 2 fields

This technology enables a much faster time to value and significantly less effort and money invested in developing, setting up and maintaining analytics infrastructure.

The Problem

In-memory technology performs beautifully at small scales. When datasets are simple and small, it enables speedy development compared to a solution built on top of an RDBMS. However, its main inhibitor to wide enterprise adoption has been scalability. The challenge it continues to face is that RAM, when used to store and analyze raw business data, tends to run out quickly and unexpectedly. As storage sizes go, RAM is tiny, and many data sets these days are too large to fit. Moreover, each query to the database uses up additional RAM for intermediate calculations. This can happen when data sets are complex and/or when there are many users querying the database simultaneously and repeatedly. Complex scenarios still require that data be extensively modified, or even loaded into an RDBMS data warehouse, prior to being loaded into the memory-based storage. In such cases, the added value of the technology is debatable and the cost-saving benefits of using it become less significant.

The fact of the matter is that data sets are getting bigger and bigger, with companies generating more information than ever, both from internal sources and from external ones which business executives look to in order to gain a competitive advantage. This exponential growth in the size of data has not been mirrored by a similar reduction in RAM prices: while RAM is indeed cheaper than it was 15 years ago, it's still relatively expensive storage that cannot be scaled indefinitely without incurring significant costs. And so, at this point in time, it seems that in-memory technology might just have hit its glass ceiling, and can no longer promise reasonable performance considering the amount and complexity of the data that is currently being gathered, aggregated and analyzed by modern businesses.
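The scalability argument above comes down to simple arithmetic, which the following sketch makes explicit. The formula and every figure in it (data size, compression ratio, per-query scratch memory, server RAM) are assumptions chosen purely for illustration, not measurements of any product.

```python
def ram_needed_gb(raw_data_gb, compression_ratio, concurrent_queries, scratch_gb_per_query):
    """Estimate RAM required when the entire dataset must stay resident in memory."""
    resident_gb = raw_data_gb / compression_ratio            # in-memory copy of the data
    scratch_gb = concurrent_queries * scratch_gb_per_query   # intermediate results
    return resident_gb + scratch_gb


def fits_in_ram(ram_gb, **workload):
    """True if the estimated requirement fits in the server's RAM."""
    return ram_needed_gb(**workload) <= ram_gb


# All figures below are made up for illustration.
workload = dict(raw_data_gb=200, compression_ratio=4, scratch_gb_per_query=0.5)

for users in (10, 40):
    needed = ram_needed_gb(concurrent_queries=users, **workload)
    verdict = "fits" if fits_in_ram(64, concurrent_queries=users, **workload) else "does not fit"
    print(f"{users} concurrent queries: {needed:.0f} GB needed, {verdict} in 64 GB")
```

With these assumed numbers the same data set fits comfortably for a handful of users and overflows the server once concurrency grows, which is the "quickly and unexpectedly" behavior the text describes.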

ElastiCubes and In-Chip Analytics

Summary

In-Chip Technology is the latest development in database technology. It combines the flexibility of in-memory querying with the speed and robustness of OLAP cubes, without the hardware costs and difficult implementation of traditional solutions. Although only recently developed and released, In-Chip is quickly gaining popularity due to its increased performance and its ability to tackle complex and large data sets.

Leading Provider: Sisense

Advantages:
- Fastest data retrieval
- Does not require proprietary hardware or extensive RAM
- Full support for ad-hoc queries

Overview

History

You might not have heard of ElastiCubes In-Chip Technology yet, as it was only released for commercial use a few short years ago. However, it has already become the data analytics platform of choice for such companies as eBay, Samsung and NASA, and is growing rapidly as an alternative to traditional OLAP database technologies and a solution to the limitations they impose. ElastiCube is a unique form of database developed by Sisense, the result of thoroughly analyzing the strengths and weaknesses of both OLAP and in-memory technologies, while taking into consideration the off-the-shelf hardware of today and tomorrow. The vision was to provide a true alternative to OLAP technology, without compromising the speed of the development cycle and the query response times for which in-memory technologies are lauded. This would allow a single technology to be used in BI solutions of any scale, in any industry.

How it Works

In-Chip Analytics is the latest generation of in-memory technology for business analytics and sets itself apart by being fast as well as scalable. The name ElastiCube comes from the database's unique ability to stretch beyond the hard limitations imposed by older generation technologies. This technology employs a disk-based columnar database for storage to provide fast disk reads, and is able to load data from disk to RAM (and vice versa) when needed. The queries themselves are processed entirely in memory, without any disk reads throughout. Most importantly, only a subset of the data is physically stored in RAM at any given time, leaving more space for other operations to take place in parallel. In other words, RAM limitations are not as big an issue as with previous in-memory technologies, as there is no need to keep the entire data set in RAM on a permanent basis.
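The general idea of keeping columns on disk and holding only a subset in RAM can be sketched as follows. This is a minimal illustration and not Sisense's implementation: the `ColumnStore` class, the one-file-per-column layout and the least-recently-used eviction policy are all assumptions made for the example.

```python
import os
import tempfile
from array import array
from collections import OrderedDict


class ColumnStore:
    """Keeps every column on disk and at most `max_cached` columns in RAM."""

    def __init__(self, directory, max_cached):
        self.directory = directory
        self.max_cached = max_cached
        self.cache = OrderedDict()   # column name -> array, least recently used first
        self.disk_reads = 0

    def _path(self, name):
        return os.path.join(self.directory, name + ".col")

    def write_column(self, name, values):
        with open(self._path(name), "wb") as f:
            array("q", values).tofile(f)   # one file per column of 64-bit integers

    def column(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)   # mark as most recently used
            return self.cache[name]
        path = self._path(name)
        col = array("q")
        with open(path, "rb") as f:
            col.fromfile(f, os.path.getsize(path) // col.itemsize)
        self.disk_reads += 1
        self.cache[name] = col
        if len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)  # evict the least recently used column
        return col


def revenue_cents(store):
    """A query over two fields touches only those two columns."""
    return sum(u * p for u, p in zip(store.column("units"), store.column("price_cents")))


with tempfile.TemporaryDirectory() as tmp:
    store = ColumnStore(tmp, max_cached=2)
    store.write_column("units", [10, 14, 30, 6, 20])
    store.write_column("price_cents", [500, 500, 1200, 500, 1200])
    store.write_column("store_id", [1, 1, 1, 2, 2])

    revenue = revenue_cents(store)
    reads_after_first = store.disk_reads    # two column files were loaded
    revenue_cents(store)
    reads_after_repeat = store.disk_reads   # served from RAM, no new disk reads
    store.column("store_id")                # a third column pushes out the oldest one
    cached = list(store.cache)

print(revenue, reads_after_first, reads_after_repeat, cached)
```

The first query reads two column files from disk, the repeated query is answered entirely from RAM, and loading a third column evicts the least recently used one, so memory use stays bounded regardless of how many columns exist on disk.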

This is achieved via advanced compression, as well as by identifying the parts of the dataset which are not being used on a regular basis and can be left at rest. Typically this is around 80 percent of the data businesses collect. In-Chip Technology also has a unique way of handling joins. Instead of joining tables, it uses columnar algebra to merge fields. This way, the join operation can be processed entirely in the CPU cache.

Illustration: Disk/RAM utilization when querying 2 fields

The table below compares RDBMS technology, In-Memory technology and Sisense's In-Chip Technology across several technical aspects:

- Columnar Storage: whether the technology supports storage of columns rather than tables.
- In-Memory Query Processing: whether the technology typically processes queries in memory, without requiring reads from disk during query execution.
- Performance Upon Installation: fast response to queries involving joining, grouping and aggregating data, without lengthy preparation work or specialized configuration.
- Data Capacity: whether there is a cap on data capacity apart from what can be stored on a single hard disk (TBs of data).
- Scalability Level: the ability of the technology to support growing data volumes and concurrent usage without having to significantly modify or rebuild the solution.

Feature                         RDBMS         In-Memory Associative Technology   In-Chip
Columnar Storage                Some          No                                 Yes
In-Memory Query Processing      No            Yes                                Yes
Performance Upon Installation   Slow          Fast                               Fast
Data Capacity                   Unlimited     Limited (by size and RAM)          Unlimited
Scalability Level               Large scale   Small scale                        Small / Large scale

In-Chip technology further optimizes data processing by making the most of the built-in components of today's 64-bit commodity hardware. Using algorithms that run beneath the OS and replace its set of instructions, In-Chip manages to utilize the CPU to its fullest, thus achieving unparalleled performance rates even on huge, complex data sets that would previously have required massive hardware upgrades to even consider handling.
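The source does not spell out what its "columnar algebra" consists of, so the sketch below shows only a generic column-at-a-time join, offered as an assumption about the kind of operation meant by merging fields rather than joining tables. The column names and the helpers `join_positions` and `gather` are hypothetical.

```python
# Hypothetical columns; each "table" is just a set of equal-length lists.
sales_item_id = [2, 1, 2, 3, 1]
sales_units = [10, 30, 14, 5, 20]

items_id = [1, 2, 3]
items_name = ["leash", "hamster", "cage"]


def join_positions(foreign_keys, primary_keys):
    """Map each fact row to the row position of its match in the key column."""
    position_of = {key: pos for pos, key in enumerate(primary_keys)}
    return [position_of[k] for k in foreign_keys]


def gather(column, positions):
    """Fetch values from one column by position; other columns are never touched."""
    return [column[p] for p in positions]


# Only the two key columns take part in the join itself.
positions = join_positions(sales_item_id, items_id)
# Only the one column the query needs is then materialized.
names = gather(items_name, positions)

units_by_item = {}
for name, units in zip(names, sales_units):
    units_by_item[name] = units_by_item.get(name, 0) + units
print(units_by_item)
```

Because the join works on a few compact arrays instead of whole rows, the working set is small, which is the property that makes it plausible for such an operation to stay within the CPU cache.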

Illustration: Latencies of CPU cache, RAM and disk storage

Summary: The Future of Databases?

We've reviewed three major database technologies employed by BI software in the past few decades: OLAP cubes, in-memory databases, and up-and-coming In-Chip Analytics. As we have seen, both OLAP and in-memory technology suffer from scalability issues, and there are significant doubts as to their ability to provide a reasonable solution for the requirements of 21st century business intelligence in terms of data size, complexity, and cost to implement. In-Chip Technology is currently the most advanced way to store and query data in rapidly changing business environments, and is expected to be adopted by more and more companies in the coming years.

Want to learn more about In-Chip technology?

- Visit sisense.com
- Join a Sisense Analytics Expert for a Weekly Live Demo of In-Chip technology at work

Questions, notes, or comments on the contents of this document? We'd love to hear them! Contact us