Tips and Tricks to Partition In-Memory Cubes for Faster Performance

Similar documents
SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Driving Peak Performance IBM Corporation

Native Connectivity to Big Data Sources in MSTR 10

CitusDB Architecture for Real-Time Big Data

Oracle BI Suite Enterprise Edition

Optimize Oracle Business Intelligence Analytics with Oracle 12c In-Memory Database Option

OLAP Services. MicroStrategy Products. MicroStrategy OLAP Services Delivers Economic Savings, Analytical Insight, and up to 50x Faster Performance

An Architectural Review Of Integrating MicroStrategy With SAP BW

Online Courses. Version 9 Comprehensive Series. What's New Series

SQL Server Business Intelligence on HP ProLiant DL785 Server

Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center

Tips and Tricks for Using Oracle TimesTen In-Memory Database in the Application Tier

Performance and Scalability Overview

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.

Data Integrator Performance Optimization Guide

SQL Server 2012 Performance White Paper

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

Using distributed technologies to analyze Big Data

An Overview of SAP BW Powered by HANA. Al Weedman

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

AtScale Intelligence Platform

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

Top 10 Performance Tips for OBI-EE

MicroStrategy Course Catalog

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Creating a universe on Hive with Hortonworks HDP 2.0

Introducing Oracle Exalytics In-Memory Machine

The IBM Cognos Platform for Enterprise Business Intelligence

Tuning Tableau Server for High Performance

Tableau Visual Intelligence Platform Rapid Fire Analytics for Everyone Everywhere

Scalability and Performance Report - Analyzer 2007

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Creating Hybrid Relational-Multidimensional Data Models using OBIEE and Essbase by Mark Rittman and Venkatakrishnan J

Tips and tricks for using SAP BusinessObjects Web Intelligence with SAP BW

BI4Dynamics provides rich business intelligence capabilities to companies of all sizes and industries. From the first day on you can analyse your

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

PUBLIC Performance Optimization Guide

TIBCO Live Datamart: Push-Based Real-Time Analytics

SQL Server 2012 Business Intelligence Boot Camp

HP Vertica and MicroStrategy 10: a functional overview including recommendations for performance optimization. Presented by: Ritika Rahate

SAP BO 4.1 COURSE CONTENT

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55%

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

Sisense. Product Highlights.

Business Intelligence & Product Analytics

August 2014 San Antonio Texas The Power of Embedded Analytics with SAP BusinessObjects

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

How To Handle Big Data With A Data Scientist

QlikView 11.2 SR5 DIRECT DISCOVERY

In-Memory Analytics for Big Data

From Spark to Ignition:

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Understanding and Evaluating the BI Platform by Cindi Howson

Application-Tier In-Memory Analytics Best Practices and Use Cases

White Paper November Technical Comparison of Perspectium Replicator vs Traditional Enterprise Service Buses

Using Microsoft Business Intelligence Dashboards and Reports in the Federal Government

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

Apache Kylin Introduction Dec 8,

Understanding the Value of In-Memory in the IT Landscape

Oracle Database In-Memory The Next Big Thing

Dell Microsoft Business Intelligence and Data Warehousing Reference Configuration Performance Results Phase III

LEARNING SOLUTIONS website milner.com/learning phone

Cost-Effective Business Intelligence with Red Hat and Open Source

Business Intelligence and Healthcare

#mstrworld. No Data Left behind: 20+ new data sources with new data preparation in MicroStrategy 10

SQL Server Administrator Introduction - 3 Days Objectives

How to Create Dashboards. Published

Oracle OLAP 11g and Oracle Essbase

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

MOC 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Tap into Hadoop and Other No SQL Sources

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

An Accenture Point of View. Oracle Exalytics brings speed and unparalleled flexibility to business analytics

Database Scalability and Oracle 12c

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

OLAP Theory-English version

When to consider OLAP?

Budgeting and Planning with Microsoft Excel and Oracle OLAP

Whitepaper. Innovations in Business Intelligence Database Technology.

Data warehousing/dimensional modeling/ SAP BW 7.3 Concepts

Performance and Scalability Overview

Exadata Database Machine

Oracle BI 11g R1: Build Repositories

Toronto 26 th SAP BI. Leap Forward with SAP

Oracle MulBtenant Customer Success Stories

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

How To Use Big Data For Telco (For A Telco)

A Comparison of Oracle Performance on Physical and VMware Servers

Mobile Application Performance

Exploring the Synergistic Relationships Between BPC, BW and HANA

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

Transcription:

Tips and Tricks to Partition In-Memory Cubes for Faster Performance Presented by: Trishla Maru

Agenda Understanding MicroStrategy s In-Memory Architecture Tips and Tricks: Enabling Parallel Data Fetch Enabling the Correct Number of Partitions Enabling Partition on the Appropriate Attribute Using Cubes vs View Reports Performance Improvements and Customer Success Stories Summary and Q&A 2

Architecture of MicroStrategy s In-Memory Capability Web and mobile output API Parallel query execution VISUALIZATION Application Engines Analytics Engines API Tightly coupled for minimal computational distance Data partitioning within cores Optimized in-memory data structure DATA DATA DATA DATA Parallel data loading SOURCE DATA

MicroStrategy s In-Memory Offering Co-exists With Existing Databases Does not replace databases Data Warehouse In-Memory Layer Functions as Hot data layer for apps requiring high performance Drill through to databases for detail Load from databases or directly from files and Hadoop DATA SOURCE

MicroStrategy s Next Generation In-Memory Analytics PARALLEL PARTITIONED IN-MEMORY CUBES Ability to generate parallel queries and fetch it in parallel from the underlying source. Ability to partition the data in the cube Cubes with flexible schema. No pre-joins. Improves the speed of cube publication No 2B row limit per cube. Each cube can be divided into partitions, each partition can contain up to 2B rows Higher Data Throughput Higher Capacity/Data Scalability More Efficient, Optimized, Scalable Cubes for Building Fast Performing Dashboards

Tips and Tricks 6

Enabling Parallel Queries with MicroStrategy s In-Memory Analytics Improve the speed of cube publication by generating maximum parallel queries per report Project level setting to configure the Maximum Parallel Queries Per Report. Network Transfer Rate depends on the theoretical limit between data source and Intelligence Server. Each table being imported would be executed over a single thread. In order to parallelize a big table, user may want to build multiple views representing slices to be fetched over independent connection. 7

Enabling Parallel Data Fetch Option This option is specifically available for OLAP cubes built using the Developer tool This is a different option and not to be confused with Parallel Query Execution option. Allows for SQL Select Pass for Metrics (typically the last pass) to be fetched over multiple ODBC connections. Users are allowed to switch between Permanent Table (for Generic) and Derived Table syntax (optimal for Single Select). Number of maximum parallel queries for Parallel Data Fetch would be governed by Number of Partitions. 8

Parallel Data Fetch Compatibility with VLDB Settings Parallel Data Fetch currently does not take effect for certain VLDB settings Parallel Data Fetch for OLAP Cubes would NOT take effect if any of the following are set: Insert Mid/Pre/Post Statements Table Pre/Post Statements We do plan to overcome this limitation in the coming releases. 9

Parallel Data Fetch Compatibility with VLDB Settings Parallel Data Fetch currently does not take effect for certain VLDB settings Parallel Data Fetch for OLAP Cubes would NOT take effect if any of the following options are chosen for Data Population for Intelligent Cubes. 10

Enabling Partitioning of In-Memory Cubes Key Facts Ability to partition the cubes is one of the biggest advantage of the new in-memory analytics introduced in v10. It helps to increase the capacity/data scalability of cubes. Although, entirely optional. If the user is not partitioning the data, the published cube consists of only one table. How Many Partitions Can the Cube Have? Depends on the number of cores used by Intelligence Server. If Intelligence Server is restricted to certain number of cores, through CPU affinity, number of partitions will also be restricted to the limit. Currently, can only partition on a single attribute. 11

Where to Define the Partition? Using MicroStrategy Developer go to Intelligent Cube editor-> Data menu-> Configure Intelligent Cube-> Options-> Data Partition 12

Where to Define the Partition? Using MicroStrategy Web go to Data Import Cube editor-> Edit -> All Objects View on Preview screen 13

Analytical Functions Support with Partitioning Partitioning does limit the types of aggregations that can be performed really fast on the raw data. A list of functions that can be handled include distributive functions such as SUM, MIN, MAX, COUNT, PRODUCT Semi-distributive functions such as STD DEV, VARIANCE that can be re-written using distributive functions. Scalar functions such as Add, Greatest, Date/Time Functions, String manipulation functions, etc. are also supported. DISTINCT COUNTs on the partition attribute are also supported. Derived metrics using any of the MicroStrategy 250+ functions are supported Non-Distributive functions, may witness high CPU and memory consumption. 14

Recommendations to Select the Partition Key Some of the largest fact tables in the application are typically good candidates for partitioning and thus influence the choice of the partition attribute. Attributes that are frequently used for filtering or selections don t make for good partition attributes, as they tend to push the analysis towards specific sets of partitions thus minimizing the benefits of parallel processing. Partition attribute should also allow for near uniform distribution of data across the partitions, so that the workload on each partition is more evenly distributed. Columns on which some of the larger tables in the application are joined also make for good partition attributes To support best dashboard execution and concurrency performance, we have chosen to limit the number of logical CPUs engaged for any single grid evaluation to 4. 15

Recommendations to Make How Many Number of Partitions Each partition can hold a maximum of 2 billion rows, so the number of partitions should be picked accordingly. Typically, the number of partitions should be set to be equal to half the number of logical cores available to Intelligence server. This maximizes CPU usage to offer the best possible performance during cube publishing. Lower cap on the number of partitions would be dictated by the number of rows in the largest table divided by 2 billion, since each partition can hold up to 2 billion records. Higher cap would be dictated by the number of cores on the box. The number of partitions should typically be in-between these two numbers and closer to Half the number of logical cores. 16

General Cube Sizing Ensure that the Intelligence Server has the capacity to support all cubes in memory. In estimation of memory consumption, the RAM can consume up to 3 times the table size. In the case where the cube has multiple tables, each table size can be added up to estimate the peak memory requirement. In-memory Partitioning generally results in more memory requirement as compared to No Partitioning. For understanding PRIME Cube structure better, one can enable Engine -> CSI logs before publishing the cubes. 17

Cube as Dataset Vs View Report as Dataset Working Set/View/Normal Reports, used as datasets in documents, currently cannot support multi pass analytics for document grids. View reports allow customers to drag and drop more out of box derived metrics into different dashboards that can be built and saved in view reports. We are looking to reduce that gap for Cubes. View reports are less optimal than cubes when using as datasets. 18

Performance Improvements and Case Studies 19

Benefits with Next Generation In-Memory Analytics Access the Database With Higher Throughout Create and Publish the Cube With Higher Data Scalability Analyze the Data With Faster Response Time OLAP Services Data: 5M rows Fetch Rate: 5074kb/sec OLAP Services Data: 2.35B rows Failed due to 2 billion row limit OLAP Services Data: 8M rows Response Time: 0:06:33 Prime 8Partitions Data: 5M rows Fetch Rate: 22454kb/sec Prime 8Partitions Data: 2.35B rows Publish time: 5:14:23 Cube size: 265GB Prime 8Partitions Data: 8M rows Response Time: 0:04:25 Upload data 4 times faster Increase the data scalability up to 80 times 50% Faster Data Interactions

Response Time (sec) Response Time (sec) Significant Performance Improvements with New In-Memory Engine 35 30 490 390 25 20 15 10 5 290 190 90 0 R1 R3 R5 R7 R9 R11 R13 R15 R17 R19 R21 R23 R25 Reports of Online Travel Industry -10 R1 R3 R5 R7 R9 R11 R13 R15 R17 R19 R21 R23 R25 R27 R29 Reports of Largest ecommerce Industry OLAP Services Prime Non Partition Prime with Partition

Leading Financial Services Key BI Characteristics: INDUSTRY: Finance BI COMPONENTS: Reports, Dashboards, VI DATABASE: Oracle, Essbase HARDWARE 40-core CPU, 2TB RAM CUBE DETAILS 850 million row fact table APPLICATIONS: Financial Data Analysis PRODUCT VERSION: 10.2.0 Business Use and Benefits Ability to provide an ad-hoc controlled dashboard to their end users, that are able to analyze their $ metrics across a variety of dimensions. Dashboard performance improved from 6 minutes (on 9.4.1) to 12 seconds with v10.2.0. This is also helping the company to move away from a cube database and leverage MicroStrategy s in-memory to give users the ad-hoc query performance within seconds.

Multi National Technology Company Key BI Characteristics: INDUSTRY: Technology BI COMPONENTS: Reports, Dashboards, VI DATABASE: Teradata APPLICATIONS: CEO Dashboard PRODUCT VERSION: 10.2.0 Business Use and Benefits Ability to provide a sophisticated high performant dashboard used by the CEO Dashboard previously took 40 hours to run and now is published every day.

Summary BI Applications Need to be Highly Performant and Scalable MicroStrategy is ready to support these with its next generation in-memory analytics Data Loads Need to be Faster MicroStrategy offers it with parallel data fetch capability Increasing Data Volumes That Need to be Analyzed MicroStrategy s data partitioning achieves higher data scalability Need for Interactive Faster Running Dashboards MicroStrategy s visualization capabilities are tightly coupled with the underlying data fetching layer Customers Say it All Numerous success stories proves the quality and functionality of MicroStrategy s robust BI platform

Questions? Q&A