ROLAP with Column Store Index Deep Dive. Alexei Khalyako SQL CAT Program Manager alexeik@microsoft.com

Similar documents
Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

The Cubetree Storage Organization

DATA WAREHOUSING - OLAP

SQL Server 2012 Performance White Paper

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Business Intelligence, Data warehousing Concept and artifacts

Multi-dimensional index structures Part I: motivation

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Oracle Database In-Memory The Next Big Thing

SQL Server Administrator Introduction - 3 Days Objectives

Apache Kylin Introduction Dec 8,

Week 13: Data Warehousing. Warehousing

SQL Server Analysis Services Complete Practical & Real-time Training

IST722 Data Warehousing

SQL Server 2012 Optimization, Performance Tuning and Troubleshooting

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Presented by: Jose Chinchilla, MCITP

SAS BI Course Content; Introduction to DWH / BI Concepts

Data Warehousing and OLAP

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Week 3 lecture slides

When to consider OLAP?

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Using distributed technologies to analyze Big Data

Performance Tuning and Optimizing SQL Databases 2016

Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

<Insert Picture Here> Best Practices for Extreme Performance with Data Warehousing on Oracle Database

Performance Verbesserung von SAP BW mit SQL Server Columnstore

Business Intelligence & Product Analytics

Monitoring Genebanks using Datamarts based in an Open Source Tool

Part 22. Data Warehousing

Columnstore in SQL Server 2016

OLAP Systems and Multidimensional Expressions I

Business Intelligence in SharePoint 2013

SQL Server Business Intelligence on HP ProLiant DL785 Server

Data Warehouse: Introduction

Learning Objectives. Definition of OLAP Data cubes OLAP operations MDX OLAP servers

Designing a Dimensional Model

The Arts & Science of Tuning HANA models for Performance. Abani Pattanayak, SAP HANA CoE Nov 12, 2015

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Microsoft Data Warehouse in Depth

University of Gaziantep, Department of Business Administration

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

SQL Server 2008 Performance and Scale

DATA WAREHOUSING AND OLAP TECHNOLOGY

Execution Plans: The Secret to Query Tuning Success. MagicPASS January 2015

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Data Warehousing. Paper

Creating BI solutions with BISM Tabular. Written By: Dan Clark

Retail POS Data Analytics Using MS Bi Tools. Business Intelligence White Paper

Microsoft Services Exceed your business with Microsoft SharePoint Server 2010

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

SQL 2016 and SQL Azure

OLAP Theory-English version

Building Data Cubes and Mining Them. Jelena Jovanovic

Whitepaper. Innovations in Business Intelligence Database Technology.

LEARNING SOLUTIONS website milner.com/learning phone

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

The Database is Slow

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

MS SQL Performance (Tuning) Best Practices:

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

Using SQL Server 2014 In-Memory Optimized Columnstore with SAP BW

AV-005: Administering and Implementing a Data Warehouse with SQL Server 2014

Introducing Oracle Exalytics In-Memory Machine

SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016

Who am I? Copyright 2014, Oracle and/or its affiliates. All rights reserved. 3

Safe Harbor Statement

SQL SERVER BUSINESS INTELLIGENCE (BI) - INTRODUCTION

How, What, and Where of Data Warehouses for MySQL

MS SQL Server 2014 New Features and Database Administration

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

JOURNEY TO VERTICA+ROLAP SOLUTION. Ramūnas Balukonis, Senior Developer, adform

Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

OLAP Services. MicroStrategy Products. MicroStrategy OLAP Services Delivers Economic Savings, Analytical Insight, and up to 50x Faster Performance

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Data Warehousing and Data Mining

Beyond Plateaux: Optimize SSAS via Best Practices

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Oracle Database 11g: SQL Tuning Workshop Release 2

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

SAP BUSINESS OBJECTS BO BI 4.1 amron

Data warehousing/dimensional modeling/ SAP BW 7.3 Concepts

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

Transcription:

ROLAP with Column Store Index Deep Dive Alexei Khalyako SQL CAT Program Manager alexeik@microsoft.com

What are we doing now? 1000 of concurrent users 4TB Cube High speed loading

Current design 14 th Jan 2012 15 th Jan 2012 16 th Jan 2012 17 th Jan 2012 15 th Jan US Europe Asia

Physical implementation Date Region VIEW

MOLAP ROLAP Move data 2x times Partitioning complex Bitmap Index Data loaded one time Partitioning simple Column Store Index

Good to know: Compression 450,00 Size,GB 400,00 350,00 300,00 250,00 200,00 150,00 100,00 50,00 0,00 RAW PAGE CSI

Good to know: Scalability of Index Build Index building time scales well up to 32 cores Reached 2,5 million rows/sec 3000000 Rows/sec vs DOP 2500000 2000000 1500000 1000000 Rows/sec 500000 0 4 8 16 24 32 40

It should just work Let s do the dumb thing: one-to-one conversion

Duh!

IS there a better way? MOLAP 1 Table Per Day... Per region. Per Slice 55 K fact tables ROLAP 1 Partition Per day 8 hash tables for the fact 8 Partitioned Fact tables Remove table to delete SWITCH to delete

New partitioning Before After One table partitioned by day Day Slice Hash value used to support fast load

List of qualifying rows All we need is.. BATCH.. Or vector processing object Column vectors Process a batch of rows at a time stored in vector form Vector mode operators implemented Filter, hash join, hash aggregation Greatly reduced CPU time

Indexing strategy We build a column store index on the fact table Dimensions stay HEAP

Indexing Fact table Include all columns Do not forget to align with partition function

Small mistakes matter MOLAP developer : It is not very accurate, BUT! It is not big problem, that sometimes join columns data types do not match if this gives right results Relational guy: Well, this is not Kimball, this burns CPU, query will suffer from that..

Join Key This is how we want to see the join Key to look a like, instead of Fact Dim_SK (int) Column Dimension Dim_SK numeric(25,0)

INT/ NUMERIC(18,0) Runtime: 20 sec Execution mode: BATCH INT/INT Runtime:1 sec Execution mode: BATCH

Beware of row execution Row execution indicator! ROW

Recap: Hash Joins Build Probe

Spill on the Hash Build Side SELECT session_id,requested_memory_kb,granted_memory_kb,used_memory_kb from sys.dm_exec_query_memory_grants session_id requested_memory_kb granted_memory_kb used_memory_kb 51 438536 438536 42527248

Spill on the Dim Search Side Spill in TempDB

Young people do not have cars Own Car SELECT <columns> From Customer_Dim WHERE [Distince_To_Work] BETWEEN 10 and 20 AND [Age] BETWEEN 25 and 30 And [Private_Car_Owns] ='Yes' 50% 40% 30% 20% 10% 0% 20-30 30-40 40-50 50-60 60-70 Distance to work Own Car 10 25 50 50+

We can help SQL Create filtered composite statistics Create statistics YoungCarOwners On Customer_Dim ([Distince_To_Work], [Age], [Private_Car_Owns]) Where [Age]> 25 and [Age] <30 And [Private_Car_Owns] ='Yes'

Composite statistics in action Added two filtered stats on one dim query_id test_name time/sec test_name time/sec % Improvemnt 15 ROLAP_NO_CUSTOM_STAT 103,78 ROLAP_WITH_CUSTOM_STAT 48,69 53,08% 99 ROLAP_NO_CUSTOM_STAT 84,09 ROLAP_WITH_CUSTOM_STAT 40,36 52,00% 24 ROLAP_NO_CUSTOM_STAT 94,93 ROLAP_WITH_CUSTOM_STAT 51,87 45,36% 72 ROLAP_NO_CUSTOM_STAT 6901,83 ROLAP_WITH_CUSTOM_STAT 4401,19 36,23% 9 ROLAP_NO_CUSTOM_STAT 4623,60 ROLAP_WITH_CUSTOM_STAT 3095,50 33,05% 6 ROLAP_NO_CUSTOM_STAT 157,78 ROLAP_WITH_CUSTOM_STAT 122,13 22,59% 45 ROLAP_NO_CUSTOM_STAT 7882,40 ROLAP_WITH_CUSTOM_STAT 6186,38 21,52% 93 ROLAP_NO_CUSTOM_STAT 944,48 ROLAP_WITH_CUSTOM_STAT 753,84 20,18%

Estimated Plan Type Row Row PROBE Pushdown Execution Mode Bitmap Filters No Probe No Bitmap

SELECT column1, column2 FROM Fact F JOIN Dim A JOIN Dim B JOIN Dim C-- (this is big) WHERE DimA.Col1 = value1 And DimB.Col = Val1 Group By DimC.Col1 Query pattern? If JOIN result small Loop into C If JOIN result big Build Hash on C Can we make this fast? C B A F

Expensive Loop If JOIN result small Loop into C Dim C

CSI on the Dimension All great as long as the estimates are correct Dim C

Also good plan As long as there are no misestimates.. Probe pushed down to the Dimension

Misestimate of the JOIN result Dim C

Composite statistics on the fact table 80% 20% My Restaurant Chain Germany UK 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% SELECT <columns> FROM Fact.MyRestaurants Join on Dim.Country Join on Dim.Products WHERE Fact.Country = 'Germany' AND Fact.Product = 'Sausage' Sausage Pie UK Germany

Intelligent HINTS OPTION (HASH JOIN, Nested LOOP) We know how to do that in relational, but query sent from MOLAP

Hacking SSAS <!-- Area of CORE parametrizations: These are externally checked --> <mssqlcrt:provider type="prefix" managed="yes" native="yes">microsoft SQL Server.09</mssqlcrt:provider> <mssqlcrt:provider type="prefix" managed="yes" native="yes">microsoft SQL Server.10</mssqlcrt:provider> <mssqlcrt:provider type="prefix" managed="yes" native="yes">microsoft SQL Server.11</mssqlcrt:provider> <mssqlcrt:parameter-style native="unnamed" managed="named"/> <xsl:param name="post-select-query-hint"> OPTION (HASH JOIN, NESTED LOOP JOIN) </xsl:param>

Before Hinting Runtime: 29sec After Runtime: 14 sec

Title, ms Impact of the Degree of Parallelism 30000 25000 20000 15000 10000 5000 0 DOP 0 DOP4 DOP8 DOP16 DOP32 DOP64 Query1 17761,75 9704 3885 4337 6349 10343 Query2 7382,75 3260,5 3514,25 4220 5784 7578

Compare with MOLAP Before Optimizations After Optimizations 500,0% 400,0% 300,0% 200,0% 100,0% 0,0% -100,0% -200,0% -300,0% -400,0% 500,00% 400,00% 300,00% 200,00% 100,00% 0,00% -100,00% -200,00% -300,00% -400,00% ROLAP is Ok

Check List Dimensional Modeling Join columns data types matching Up-to-date or custom statistics No spills on the disk mode column store index scan Column store on the big dimensions

?!

Additional Resources Resource list in columnstore FAQ: http://social.technet.microsoft.com/wiki/contents/articles/s ql-server-columnstore-index-faq.aspx Includes tuning guide, white paper, video talk, video demo, and in-depth technical paper Column Store tuning guide Great resource for SIs & consultants

VERY BIG customer Motivation Estimates 4 TB of data in the SSAS cube Near real time (XX TB/sec, which is..million rows/sec) Demand: Store more Fresh data

Why ROLAP? ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP (Multidimensional OLAP) technology. While both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the pre-computation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions. While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database which was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate the database.

Thoughts Relational engine scales better Loaded once we don t need to do anything else query will need simply go n grab data Column store index: This new index, combined with enhanced query optimization and execution features, improves data warehouse query performance by hundreds to thousands of times in some cases, and can routinely give a tenfold speedup for a broad range of queries fitting the scenario for which it was designed. - source: Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0

exec sp_executesql N' SELECT SUM ( [dbo_fct_riskvalue_master_03].[value_usd] ) AS [dbo_fct_riskvalue_master_03value_usd0_0], SUM ( [dbo_fct_riskvalue_master_03].[value_local_ccy] ) AS [dbo_fct_riskvalue_master_03value_local_ccy0_1],[dbo_fct_riskvalue_master_03].[value_date] AS [dbo_fct_riskvalue_master_03value_date0_5],[dbo_uvw_bm_business_hierarchy_vw_15].[sub_desk_id] AS [dbo_uvw_bm_business_hierarchy_vwsub_desk_id2_0],[dbo_uvw_market_dataset_vw_16].[curve_quality] AS [dbo_uvw_market_dataset_vwcurve_quality3_0] FROM [dbo].[fct_riskvalue_master_03] AS [dbo_fct_riskvalue_master_03],[dbo].[uvw_bm_business_hierarchy_vw] AS [dbo_uvw_bm_business_hierarchy_vw_15],[dbo].[uvw_market_dataset_vw] AS [dbo_uvw_market_dataset_vw_16],[dbo].[uvw_market_segment_vw] AS [dbo_uvw_market_segment_vw_1],[dbo].[uvw_risk_measure_vw] AS [dbo_uvw_risk_measure_vw_2],[dbo].[uvw_scenario_vw] AS [dbo_uvw_scenario_vw_3] WHERE ( ( [dbo_fct_riskvalue_master_03].[market_segment1_id]=[dbo_uvw_market_segment_vw_1].[ms_id] ) AND ( [dbo_fct_riskvalue_master_03].[risk_measure_id]=[dbo_uvw_risk_measure_vw_2].[risk_measure_id] ) AND ( [dbo_fct_riskvalue_master_03].[scenario_id]=[dbo_uvw_scenario_vw_3].[scenario_id] ) AND ( [dbo_fct_riskvalue_master_03].[book_id]=[dbo_uvw_bm_business_hierarchy_vw_15].[book_id] ) AND ( [dbo_fct_riskvalue_master_03].[market_dataset_id]=[dbo_uvw_market_dataset_vw_16].[mds_id] ) AND ( [dbo_uvw_risk_measure_vw_2].[risk_measure_name]= @P1

( [dbo_fct_riskvalue_master_03].[market_segment1_id]=[dbo_uvw_market_segment_vw_1].[ms_id] ) AND ( [dbo_fct_riskvalue_master_03].[risk_measure_id]=[dbo_uvw_risk_measure_vw_2].[risk_measure_id] ) AND ( [dbo_fct_riskvalue_master_03].[scenario_id]=[dbo_uvw_scenario_vw_3].[scenario_id] ) AND ( [dbo_fct_riskvalue_master_03].[position_id]=[dbo_uvw_position_vw_5].[position_id] ) AND ( [dbo_fct_riskvalue_master_03].[issuer_id]=[dbo_uvw_issuer_vw_14].[curve_id] ) AND ( [dbo_fct_riskvalue_master_03].[book_id]=[dbo_uvw_bm_business_hierarchy_vw_15].[book_id] Nested Loop Join query exec sp_executesql N' SELECT SUM ( [dbo_fct_riskvalue_master_03].[value_usd] ) AS [dbo_fct_riskvalue_master_03value_usd0_0], SUM ( [dbo_fct_riskvalue_master_03].[value_local_ccy] ) AS [dbo_fct_riskvalue_master_03value_local_ccy0_1],[dbo_fct_riskvalue_master_03].[value_date] AS [dbo_fct_riskvalue_master_03value_date0_6],[dbo_uvw_position_vw_5].[product_name] AS [dbo_uvw_position_vwproduct_name1_0],[dbo_uvw_bm_business_hierarchy_vw_15].[sub_desk_id] AS [dbo_uvw_bm_business_hierarchy_vwsub_desk_id3_0] FROM [dbo].[fct_riskvalue_master_03] AS [dbo_fct_riskvalue_master_03],[dbo].[uvw_position_vw] AS [dbo_uvw_position_vw_5],[dbo].[uvw_bm_business_hierarchy_vw] AS [dbo_uvw_bm_business_hierarchy_vw_15],[dbo].[uvw_market_segment_vw] AS [dbo_uvw_market_segment_vw_1],[dbo].[uvw_risk_measure_vw] AS [dbo_uvw_risk_measure_vw_2],[dbo].[uvw_scenario_vw] AS [dbo_uvw_scenario_vw_3],[dbo].[uvw_issuer_vw] AS [dbo_uvw_issuer_vw_14],[dbo].[uvw_market_dataset_vw] AS [dbo_uvw_market_dataset_vw_16] WHERE (

First look all is nice Plan

Zooming out

) AS [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQL], [dbo].market_segment_vw AS [dbo_uvw_market_segment_vw_1], [dbo].risk_measure_vw AS [dbo_uvw_risk_measure_vw_2], [dbo].scenario_vw AS [dbo_uvw_scenario_vw_3] WHERE ( ( [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQL].[MARKET_SEGMENT1_ID] = [dbo_uvw_market_segment_vw_1].[ms_id] ) AND ( [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQL].[RISK_MEASURE_ID] = [dbo_uvw_risk_measure_vw_2].[risk_measure_id] ) AND ( Query to show the benefits of the data type matches --SET STATISTICS TIME ON --UPDATE STATISTICS dbo.fct_riskvalue_amer_01_hashed Declare @Param1 nvarchar(max) = 'TV', @Param2 nvarchar(max) = 'CentreState', @Param3 nvarchar(max) = 'FO' SELECT SUM ( [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQL].[VALUE_USD] )AS [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQLVALUE_USD0_0], SUM ( [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQL].[VALUE_LOCAL_CCY] )AS [ARISK_RUFF_OLAP_G2_RISK_VALUE_MODEL_SQLVALUE_LOCAL_CCY0_2] FROM ( SELECT * FROM dbo.fct_riskvalue_master_00

All looks good from the first look Execution time is 2 min 33 sec Instead 30 sec

But something look strange

DOP vs Execution time 600 Query Time vs DOP 500 400 300 200 DOP32 DOP16 DOP8 DOP4 DOP0 100 0 6 15 24 42 51 84