HP Vertica and MicroStrategy 10: A Functional Overview Including Recommendations for Performance Optimization
Presented by: Ritika Rahate
MicroStrategy Data Access Workflows
There are numerous ways for MicroStrategy to interact with HP Vertica:
- Ad-hoc Schema
- Live Connect
- In-Memory Cube
- Modeled Schema
MicroStrategy Is Most Commonly Used to Send Analytical Queries to HP Vertica
Analytical queries have specific characteristics that differentiate them from operational queries.
Analytical queries involve processing massive quantities of data:
- Accessing large data volumes
- Processing massive data volumes
Typical challenge: achieve interactive response time.
Vertica offers the following key features:
- Columnar orientation
- Compression
- Projections
- Massively Parallel Processing
Schema Design Is Essential for Analytical Query Performance
All key features are implemented as part of schema design.
Pro Tip: Use the Database Designer tool offered by Vertica.
MicroStrategy Unique Optimizations for HP Vertica
Vertica-specific SQL syntax
- Analytical functions (OLAP functions)
- CASE expressions
- Full outer joins
- Set operators
- Subqueries
Multi-pass SQL for analytical sophistication
- Use of temporary tables
- Use Read Optimized Storage
- Analyze statistics on temporary tables
- Middle-tier computation of calculations not available in Vertica
Support for key Vertica features
- Massively Parallel Processing
- Label queries for simplified analysis
- High Availability and Load Balancing
- Secure connectivity
Extensions to Vertica functionality
- Aggregate awareness with physical summary tables
- Middle-tier caching via In-Memory Cubes
- Report caching
- Application-level partitioning
MicroStrategy Generates Vertica-Specific SQL Syntax
MicroStrategy integrates with HP Vertica's broad list of database functions and SQL functionality to improve analytical performance:
- 99 function patterns pushed down
- 35 unique VLDB properties
- 37 data types supported
MicroStrategy Generates Multi-Pass SQL Queries for Analytical Richness
By default, MicroStrategy creates temporary tables to hold intermediate result sets.

    create local temporary table ZZSP00 on commit preserve rows as
    select a13.year_id YEAR_ID,
           a12.subcat_id SUBCAT_ID,
           sum(a11.tot_unit_sales) WJXBFS1
    from ITEM_MNTH_SLS a11,
         LU_ITEM a12,
         LU_MONTH a13
    where a11.item_id = a12.item_id
      and a11.month_id = a13.month_id
    group by a13.year_id,
             a12.subcat_id
    unsegmented all nodes

- local temporary: user-session visibility
- on commit preserve rows: session-scoped data
- unsegmented all nodes: data replication and distribution
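The multi-pass pattern above can be tried out locally with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Vertica, which is an assumption made purely so the example runs anywhere. SQLite's CREATE TEMP TABLE replaces the Vertica-specific "local temporary ... on commit preserve rows ... unsegmented all nodes" clauses, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM_MNTH_SLS (item_id INTEGER, month_id INTEGER, tot_unit_sales REAL);
    CREATE TABLE LU_ITEM (item_id INTEGER, subcat_id INTEGER);
    CREATE TABLE LU_MONTH (month_id INTEGER, year_id INTEGER);
    -- Invented sample data
    INSERT INTO LU_ITEM VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO LU_MONTH VALUES (201501, 2015), (201502, 2015), (201601, 2016);
    INSERT INTO ITEM_MNTH_SLS VALUES
        (1, 201501, 5), (2, 201502, 7), (3, 201501, 4), (1, 201601, 9);
""")

# Pass 1: store the intermediate result set in a session-scoped temporary table.
conn.execute("""
    CREATE TEMP TABLE ZZSP00 AS
    SELECT a13.year_id AS YEAR_ID,
           a12.subcat_id AS SUBCAT_ID,
           SUM(a11.tot_unit_sales) AS WJXBFS1
    FROM ITEM_MNTH_SLS a11
    JOIN LU_ITEM a12 ON a11.item_id = a12.item_id
    JOIN LU_MONTH a13 ON a11.month_id = a13.month_id
    GROUP BY a13.year_id, a12.subcat_id
""")

# Pass 2: a later pass reads the temporary table instead of the fact table.
yearly_totals = conn.execute(
    "SELECT YEAR_ID, SUM(WJXBFS1) FROM ZZSP00 GROUP BY YEAR_ID ORDER BY YEAR_ID"
).fetchall()
print(yearly_totals)
```

The second pass never touches ITEM_MNTH_SLS again, which is the point of materializing the intermediate result.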
Large Intermediate Result Sets Can Bypass Write Store for Better Performance
A query hint forces storage directly to Read Optimized Storage (ROS).
Data stored in ROS is:
- Highly organized by compression
- Indexed
Example:

    create local temporary table ZZMD00 on commit preserve rows as
    select /*+ DIRECT */
           a11.year_id YEAR_ID,
           sum(a11.tot_unit_sales) WJXBFS1
    from ITEM_MNTH_SLS a11
    group by a11.year_id
Analyzing Large Intermediate Result Sets Improves Query Execution
MicroStrategy can instruct Vertica to generate statistics on temporary tables.
Running Analyze_statistics on a temporary table gives the Vertica Query Optimizer the information it needs to choose an optimal query plan.
MicroStrategy Provides Middle-tier Computations for Analytical Sophistication
Combining multiple insert statements removes the overhead of repeatedly parsing structurally identical statements.
Row-by-row inserts are slow:
- They require time-consuming locking and unlocking of the table.
Bulk inserts are fast:
- They use parameterized statements to insert blocks of data all at once.
[Diagram: four separate row inserts vs. one bulk insert]
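The difference between the two insert styles can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module rather than MicroStrategy or Vertica, and the table names t_row and t_bulk and the generated data are hypothetical. The first loop builds a new statement string per row, so each one is parsed separately; the second uses one parameterized statement bound to the whole block of rows.

```python
import sqlite3

# Hypothetical block of middle-tier results to write back: (id, amount) pairs.
rows = [(i, i * 1.5) for i in range(1000)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the warehouse (assumption)
conn.execute("CREATE TABLE t_row (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE t_bulk (id INTEGER, amount REAL)")

# Row-by-row: every row produces a distinct SQL string that must be parsed again.
for item_id, amount in rows:
    conn.execute(f"INSERT INTO t_row VALUES ({item_id}, {amount})")

# Bulk: one parameterized statement, prepared once and bound to all rows.
conn.executemany("INSERT INTO t_bulk VALUES (?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM t_row").fetchone()[0]
bulk_count = conn.execute("SELECT COUNT(*) FROM t_bulk").fetchone()[0]
print(row_count, bulk_count)
```

Both tables end up with the same contents; only the number of statements the database has to parse differs.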
Enabling Parameterized Inserts in MicroStrategy Significantly Improves Response Time
Parameterized inserts are enabled at the DB Instance level in MicroStrategy:
1. Navigate to DB instance > DB connection.
2. Click Modify to edit the Database Connection.
3. Check "Use parameterized queries", which is located on the Advanced tab.
MicroStrategy Avoids Unnecessary Workload on Vertica
Enabling SQL Global Optimization reduces the number of SQL passes, improving query performance.
Global Optimization Level 1
- Before: the report SQL contains a redundant SQL pass (the same pass, for example Category ... HAVING Sum(Sales) > 50000, appears more than once).
- After: the redundant SQL pass is automatically removed.
Global Optimization Level 2
- Before: metric definitions force different SQL passes (for example, one pass for Sum(Revenue) and another for Count(Item), both FROM ITEM_MTH_SLS ... GROUP BY ...).
- After: the SQL Engine automatically combines the different SQL passes into a single SQL pass (Sum(Revenue), Count(Item) FROM ITEM_MTH_SLS).
[Diagram: SQL pass sequences before and after optimization, with metrics Units Sold and Units Received]
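The Level 2 idea, merging passes that read the same table at the same grouping level, can be illustrated with a minimal sketch. It runs against SQLite through Python's sqlite3 module instead of Vertica, and the column names (category, item, revenue) and sample rows are assumptions chosen for illustration; it does not reproduce the SQL that MicroStrategy actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Vertica (assumption)
conn.executescript("""
    CREATE TABLE ITEM_MTH_SLS (category TEXT, item TEXT, revenue REAL);
    -- Invented sample data
    INSERT INTO ITEM_MTH_SLS VALUES
        ('Books', 'b1', 100), ('Books', 'b2', 50), ('Music', 'm1', 30);
""")

# Before: each metric gets its own pass over the same table and grouping.
pass1 = dict(conn.execute(
    "SELECT category, SUM(revenue) FROM ITEM_MTH_SLS GROUP BY category"))
pass2 = dict(conn.execute(
    "SELECT category, COUNT(item) FROM ITEM_MTH_SLS GROUP BY category"))
separate = {c: (pass1[c], pass2[c]) for c in pass1}

# After: both metrics are computed in a single pass.
combined = {
    c: (rev, cnt)
    for c, rev, cnt in conn.execute(
        "SELECT category, SUM(revenue), COUNT(item) "
        "FROM ITEM_MTH_SLS GROUP BY category")
}
print(separate == combined)
```

The combined query scans the fact table once instead of twice and returns the same values per category.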
MicroStrategy Pushes Smart SQL to Vertica
SQL Global Optimization is enabled by default for Vertica.
Global Optimization Level 3
- Before: filter conditions force four separate SQL passes, for example:
    SQL Pass #1: FROM ... SLS.REGION = LU_REGION.REGION AND REGION = Northeast
    SQL Pass #2: FROM ... SLS.REGION = LU_REGION.REGION AND REGION = Central
    SQL Pass #3: ...
    Result: Northeast Revenue, Central Revenue, Southeast Revenue
- After: filter conditions are resolved into a single SQL pass:
    Sum(SLS.Revenue (IF LU_REGION.REGION = Northeast, 0)),
    Sum(SLS.Revenue (IF LU_REGION.REGION = Central, 0)),
    Sum(SLS.Revenue (IF LU_REGION.REGION = Southeast, 0))
    FROM ... SLS.REGION = LU_REGION.REGION
Global Optimization Level 4
- Before: intermediate results are stored in multiple temp tables:
    CREATE TABLE ZZMD01 AS Category ... FROM ... Region = Northeast AND ...
    CREATE TABLE ZZMD02 AS Category ... FROM ... Region = Central AND ...
  Intermediate table ZZMD01 stores Northeast categories; intermediate table ZZMD02 stores Central categories.
- After: intermediate results are stored in one temp table:
    CREATE TABLE ZZMD01 AS Category, Sum(Revenue (IF REGION = Northeast, 0)), Sum(Revenue (IF REGION = Central, 0)) FROM ...
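The Level 3 rewrite amounts to conditional aggregation: each filter moves from the WHERE clause into a CASE expression inside the aggregate. The sketch below shows the equivalence using Python's sqlite3 module as a stand-in for Vertica. The two-column layouts for SLS and LU_REGION and the sample rows are assumptions, and the CASE form is a standard-SQL rendering of the slide's IF pseudocode rather than the exact SQL MicroStrategy emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Vertica (assumption)
conn.executescript("""
    CREATE TABLE SLS (region TEXT, revenue REAL);
    CREATE TABLE LU_REGION (region TEXT);
    -- Invented sample data
    INSERT INTO LU_REGION VALUES ('Northeast'), ('Central'), ('Southeast');
    INSERT INTO SLS VALUES
        ('Northeast', 100), ('Northeast', 20), ('Central', 70), ('Southeast', 5);
""")

regions = ["Northeast", "Central", "Southeast"]

# Before: one filtered SQL pass per region.
separate = tuple(
    conn.execute(
        "SELECT COALESCE(SUM(s.revenue), 0) "
        "FROM SLS s JOIN LU_REGION r ON s.region = r.region "
        "WHERE r.region = ?",
        (name,),
    ).fetchone()[0]
    for name in regions
)

# After: the filters become CASE expressions inside one pass.
combined = conn.execute("""
    SELECT SUM(CASE WHEN r.region = 'Northeast' THEN s.revenue ELSE 0 END),
           SUM(CASE WHEN r.region = 'Central'   THEN s.revenue ELSE 0 END),
           SUM(CASE WHEN r.region = 'Southeast' THEN s.revenue ELSE 0 END)
    FROM SLS s JOIN LU_REGION r ON s.region = r.region
""").fetchone()
print(separate, combined)
```

Three scans and joins collapse into one, at the cost of a slightly wider aggregate expression per metric.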
Easily Identify MicroStrategy Workloads in Vertica for Profiling and Debugging Purposes
Monitor MicroStrategy workloads in Vertica using query labels to ensure efficient utilization of available resources.
- MicroStrategy queries can be easily identified in Vertica using the /*+label(label-name)*/ hint.
- These query label hints can be passed as a prefix to statements such as INSERT.

    select identifier,
           projections_used,
           query_duration_us,
           query_start,
           user_name,
           processed_row_count
    from QUERY_PROFILES
    where identifier = 'MSTR_39C90CD7475A6AAFEFB125BE4FB8B0D7';
Configuring High Availability and Load Balancing in an MPP Environment
Leverage Vertica-specific features.
Vertica cluster failover scenarios:
- Initiator node goes down: the session ends and the query is lost.
- Executor node goes down:
  - If the node goes down in the middle of processing a multi-pass SQL job, a query failure error message is sent to the end user and the report has to be rerun.
  - If no results have been sent back yet, the end-user query will be processed by another node in the Vertica cluster.
Connect Securely to HP Vertica
MicroStrategy recommends using encrypted data connections.
Security features in Vertica:
- Client Authentication
- Connection Encryption
- Client Authorization
Summary
- MicroStrategy and HP Vertica continue to have a strong partnership.
- Multi-faceted technical integration of products.
- Continued optimization provides a seamless reporting experience.
Resources
Link to integration paper: TN47683
Contact: Ritika Rahate, rrahate@microstrategy.com
Questions