Beyond Plateaux: Optimize SSAS via Best Practices Bill Pearson Island Technologies Inc. wep3@islandtechnologies.com @Bill_Pearson
Beyond Plateaux: Optimize SSAS via Best Practices Introduction and Overview
Introduction and Overview
- About Me
- Who This Presentation Is For
- Presentation Components
- Presentation Objectives
- Questions?
About Me
- Submariner in the U.S. Navy (SSBN-632, USS Von Steuben)
- Recovering CPA
- Accounting & Finance to Financial Databases to Business Intelligence
- Founded Island Technologies in 1997
- Implementing Microsoft BI Solution
- Consultant, Author and MSSQL Server MVP
- Passions: BI and Classical Literature
Who This Presentation Is For
- Administrators
- Architects
- Designers
- Developers (including Reporting)
working within the integrated Microsoft BI solution, with a need to optimize Analysis Services database and query processing.
Presentation Objectives
- Explore Best Practice Objectives
- Examine Categorized Best Practices
  - Design and Structure
  - Query Processing Optimization
  - Analysis Services Database Processing
  - Server Settings and Resource Tuning
- References and Discussion
Beyond Plateaux: Optimize SSAS via Best Practices Best Practices Objectives
Best Practices Objectives
The primary objectives of best practices for Analysis Services:
- Enhanced Query Performance
- Enhanced Processing Performance
We often must consider tradeoffs.
Enhanced Query Performance
- Query processing efficiency is highly visible to consumers and reflects upon the overall BI solution
- Analysis Services (SSAS) offers a variety of ways to accelerate queries, including:
  - Aggregations
  - Caching
  - Indexed Data Retrieval
- We can also optimize via several options, including:
  - General Cube design
  - Dimensional (and particularly Attribute) design
  - Multidimensional Expressions (MDX) Query design
- Query performance is relevant to Reporting Services and other client applications
- Editor-generated MDX typically requires optimization and maintenance
Enhanced Processing Performance
- Processing is the operation that refreshes the data in an Analysis Services database
- The faster the processing performance, the sooner consumers can access refreshed data
- Analysis Services provides a variety of mechanisms to influence processing performance, including:
  - Efficient Dimension design
  - Effective Aggregations
  - Partitions
  - Economical processing strategies (for example, incremental vs. full refresh vs. proactive caching)
Beyond Plateaux: Optimize SSAS via Best Practices Categorized Best Practices
Categorized Best Practices
- Design and Structure
- Query Processing Optimization
- Analysis Services Database Processing and Tuning Server Resources
Design and Structure
- General
- Underlying Data
- Data Sources
- Data Source Views (DSVs)
- Dimensions
- Measure Groups
- Aggregations
- Partitions
Design and Structure: General
- Seek expert guidance in complicated areas:
  - Load Balancing
  - Storage media (NAS, SAN, etc.)
  - Other
- In general, smaller cubes process and query faster than larger cubes
- Share dimensions as much as possible between cubes
- Role-playing dimensions also help us to minimize redundancies and cube size
Design and Structure: Underlying Data
- Prefer a Data Mart / Warehouse over an Online Transaction Processing (OLTP) database as the source
- Build the Unified Dimensional Model (UDM) atop Mart / Warehouse database views
  - Can apply relational tuning (indexing, etc.)
  - Can use the NOLOCK hint in the view definition
  - Views can help to make debugging easier: we can issue SQL queries directly against the views to compare relational data with that in the cube (we cannot issue SQL statements against the UDM)
Design and Structure: Data Sources
- Use Windows Authentication within SQL Server connections, with Impersonation of a domain account created for this purpose
- The account should have conservative privileges (the least number of capabilities required to access the necessary data)
- Providers for SQL Server:
  - Microsoft OLE DB Provider for SQL Server
  - SQL Server Native Client
  - Avoid the .NET Data Provider
Design and Structure: Data Source Views (DSVs)
- Use a separate diagram for each fact table
- Use Named Calculations in the DSV when the columns cannot be added to the underlying views / tables, but don't overdo it!
Design and Structure: Dimensions
- Remove extraneous attributes, then apply techniques to optimize processing of the remaining attributes
- Fewer attributes mean more efficient processing and a smaller cube
- Using NameColumn and ValueColumn eliminates the need for at least two extraneous attributes
Design and Structure: Dimensions (cont'd)
Attribute relationships
- Direct the Analysis Services engine in building aggregations
- Are bushy by default
- Need to be created to optimize query processing via correct and complete rollups
- Set Relationship Type to Rigid (vs. Flexible) when appropriate
Design and Structure: Dimensions (cont'd)
Attribute columns
- Key
  - Unique values
  - Composite Key where necessary to make unique
  - Optimal SQL Server data types: Tinyint, Smallint, Int, Bigint
- Name
  - User-friendly and clearly distinct
- Value
  - Further information about the attribute
  - Typically used for calculations
  - Unlike member properties, this property of an attribute is strongly typed, providing increased performance when used in calculations
  - Contents can be accessed via the MemberValue MDX function
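The Key guidance above (unique values, with a composite key where necessary) can be checked against the source rows before a dimension is built. What follows is a minimal Python sketch, assuming the dimension rows have already been fetched as dictionaries; the `duplicate_keys` helper and the sample Year / Month columns are hypothetical names used only for illustration.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key values that occur more than once for the given key columns."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows from a Month-level dimension view.
rows = [
    {"Year": 2007, "Month": 1},
    {"Year": 2007, "Month": 2},
    {"Year": 2008, "Month": 1},
]

# Month alone repeats across years, so it is not a valid key on its own.
month_only = duplicate_keys(rows, ["Month"])

# The composite key (Year, Month) has no duplicates.
composite = duplicate_keys(rows, ["Year", "Month"])
```

An empty result for a candidate set of key columns suggests that set is safe to use as the attribute key; a non-empty result points to the need for a composite key.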
Design and Structure: Dimensions (cont'd)
Attribute Hierarchies
- Hide attribute hierarchies for attributes added to a User Hierarchy by setting AttributeHierarchyVisible to False
- Make attributes that are not added to User Hierarchies member properties by setting AttributeHierarchyEnabled to False (reduces processing time for the dimension)
- For attributes expected to be infrequently used in queries, set AttributeHierarchyOptimizedState to NotOptimized to keep Analysis Services from building indexes (reduces dimension processing time)
Design and Structure: Dimensions (cont'd)
Attribute Hierarchies (cont'd)
- For attributes with the AttributeHierarchyEnabled property set to True, leave the IsAggregatable property set to True, except in unusual circumstances, to retain the All member; dropping All will likely confuse consumers
Design and Structure: Dimensions (cont'd)
Attribute Hierarchies (cont'd)
- Natural vs. Unnatural Hierarchies
Design and Structure: Dimensions (cont'd)
Natural vs. Unnatural Hierarchies (cont'd)
- Natural hierarchy: all attributes participating as levels in the hierarchy have direct or indirect attribute relationships from the bottom of the hierarchy to the top of the hierarchy.
- Performance considerations:
  - In natural hierarchies, the hierarchy tree is materialized on disk in hierarchy stores.
  - All attributes participating in natural hierarchies are automatically considered to be aggregation candidates.
Design and Structure: Dimensions (cont'd)
Natural vs. Unnatural Hierarchies (cont'd)
- Unnatural hierarchy: the hierarchy contains at least two consecutive levels that have no attribute relationship between them.
  - Typically used to create drill-down paths of commonly viewed attributes that have no natural hierarchy.
  - Allows use of a variety of MDX navigation functions to easily perform calculations (like percent of parent, etc.)
- Performance considerations:
  - Unnatural hierarchies are not materialized on disk, and attributes participating in unnatural hierarchies are not automatically considered as aggregation candidates.
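The natural / unnatural distinction above comes down to whether each lower level reaches the level above it through attribute relationships, directly or indirectly. Here is a minimal Python sketch of that test, assuming the relationships are available as a simple mapping from an attribute to its related attributes; the function names and the sample Date dimension are hypothetical, and this is not how Analysis Services itself performs the check.

```python
def reaches(source, target, relationships):
    """Return True if target is reachable from source via attribute relationships."""
    seen, stack = set(), [source]
    while stack:
        attr = stack.pop()
        if attr == target:
            return True
        if attr in seen:
            continue
        seen.add(attr)
        stack.extend(relationships.get(attr, ()))
    return False


def is_natural(levels_top_down, relationships):
    """A hierarchy is natural when every level relates (directly or
    indirectly) to the level immediately above it."""
    pairs = zip(levels_top_down, levels_top_down[1:])
    return all(reaches(lower, upper, relationships) for upper, lower in pairs)


# Hypothetical attribute relationships: attribute -> related attributes.
relationships = {
    "Date": ["Month", "Day Of Week"],
    "Month": ["Quarter"],
    "Quarter": ["Year"],
}

calendar = is_natural(["Year", "Quarter", "Month", "Date"], relationships)
skip_level = is_natural(["Year", "Month", "Date"], relationships)  # indirect path
weekday_drill = is_natural(["Day Of Week", "Month", "Date"], relationships)
```

In this sample, the two calendar hierarchies qualify as natural (the second via the indirect Month to Quarter to Year path), while the Day Of Week drill-down does not, because Month has no relationship to Day Of Week.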
Design and Structure: Dimensions (cont'd)
Other Dimensional Considerations
- Parent-Child dimensions, if they must be used, should be limited to fewer than 500,000 members
- Set the Date dimension's Type to Time to enable shortcut MDX date functions
Design and Structure: Measure Groups
- Avoid multiple measure groups with the same granularity and dimensionality
- Set IgnoreUnrelatedDimensions to False when possible, to avoid repeating a value for all unrelated members
- Avoid many-to-many relationships if the dimensions or the intermediate fact table (bridge table) have more than one million members, unless a compression strategy is in place
Design and Structure: Aggregations
Analysis Services Tools to Assist with Aggregations:
- Aggregation Design wizard at approx. 20% (general rule of thumb)
- Usage Based Optimization wizard to define better aggregations
  - Log queries during periods of normal activity as a basis
  - Set approximate counts if there is too much data to let the wizard count the rows
  - Repeat on a regular basis
Design and Structure: Aggregations
Analysis Services Add-In Tool:
- BIDS Helper Aggregation Manager marries individual tools
Design and Structure: Aggregations (cont'd)
Custom Aggregations
- When creating custom aggregations, do not create aggregations below the measure group's granularity
- Avoid overlapping aggregations
- Consider partition-specific aggregations (more in the next section)
- Tools:
  - Aggregation Wizard
  - BIDS Helper Aggregation Manager
Design and Structure: Partitions
- To optimize Cube Processing
  - Partition by commonalities of usage
  - This is why Time / Date is almost always a no-brainer
- To optimize Query Processing
  - Speed queries by creating partitions for larger data sets
  - Partition by common usage, such as by Time / Date
NOTE: Multiple partitions can only be deployed to a server running Microsoft SQL Server Enterprise Edition or Developer Edition.
Design and Structure: Partitions (cont'd)
Some Basic Rules of Thumb (to be taken with a grain of salt; environments are unique):
- Max of 2,000 partitions per cube, with a maximum of 20,000,000 records each (SSAS 2008 quote)
- Ensure facts are not duplicated in multiple partitions
- Partitioning by more than one criterion is acceptable
- Strive to partition the relational warehouse fact table the same way as the cube
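The rules of thumb above lend themselves to a quick sanity check of a partitioning plan. The Python sketch below is one hedged way to do this, assuming each partition is described by a date slice (start inclusive, end exclusive) and an estimated row count; the `check_partitions` helper, the tuple layout and the sample partitions are all hypothetical, and the two limits are simply the figures quoted on the slide.

```python
from datetime import date

MAX_PARTITIONS = 2_000                 # rule of thumb quoted above
MAX_ROWS_PER_PARTITION = 20_000_000    # rule of thumb quoted above


def check_partitions(partitions):
    """partitions: list of (name, start_date, end_date_exclusive, row_count).
    Returns a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(partitions) > MAX_PARTITIONS:
        problems.append(f"{len(partitions)} partitions exceeds {MAX_PARTITIONS}")
    ordered = sorted(partitions, key=lambda p: p[1])
    for name, _start, _end, rows in ordered:
        if rows > MAX_ROWS_PER_PARTITION:
            problems.append(f"{name}: {rows} rows exceeds {MAX_ROWS_PER_PARTITION}")
    # After sorting by start date, any overlap shows up between neighbours;
    # overlapping slices mean the same facts could be loaded twice.
    for (a_name, _, a_end, _), (b_name, b_start, _, _) in zip(ordered, ordered[1:]):
        if b_start < a_end:
            problems.append(f"{a_name} overlaps {b_name}")
    return problems


plan = [
    ("Sales_2007", date(2007, 1, 1), date(2008, 1, 1), 12_000_000),
    ("Sales_2008", date(2007, 12, 1), date(2009, 1, 1), 25_000_000),
]
problems = check_partitions(plan)
```

For the sample plan, the check reports both an oversized partition and an overlap between the 2007 and 2008 slices. The same date boundaries could then be used for the relational fact table partitions, in line with the last rule of thumb.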
Query Processing Optimization
- General and Monitoring
- MDX Optimization
Query Processing Optimization: General and Monitoring
General
- Examine server-wide settings for opportunities to increase parallelism, general capacity, etc. (for example, MemoryHeapType)
- The SQL Server 2008 Analysis Services Performance Guide covers many opportunities to adjust settings.
- A deep understanding of these settings, and of the possible consequences / tradeoffs involved in their adjustment, is critical.
- Concurrent skill with Monitoring Tools is also a must.
- Have a strategy to scale up / scale out
Query Processing Optimization: General and Monitoring
Monitoring
- Become proficient in the use of Monitoring Tools:
  - SQL Server Profiler
  - Performance Monitor
- See Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
Query Processing Optimization: MDX Optimization
- MDX optimization represents a presentation in itself, if not a full day's examination (I take some of this up in a subset presentation)
- Practical, detailed references:
  - Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
  - SQL Server 2008 Analysis Services Performance Guide
Query Processing Optimization: MDX Optimization (cont'd)
General Strategy
- Debugging calculation performance issues across a cube can be difficult if there are many calculations. Try to narrow down the problem expression, and then apply best practices to the MDX.
- With some client applications, the query itself can be the problem, should it:
  - demand large data volumes,
  - push down to unnecessarily low granularities (bypassing aggregations), or
  - contain query calculations that bypass the global and session query processor caches.
- Try to reduce the query to the simplest expression possible that continues to reproduce the performance issue.
Query Processing Optimization: MDX Optimization (cont'd)
General Strategy (cont'd)
- If the issue is confirmed to be in the cube itself, remove or comment out all calculations from the cube. This includes the following:
  - Custom member formulas
  - Unary operators
  - MDX scripts (except the CALCULATE statement, which should be left intact)
- Rerun the query (perhaps altering it to account for missing members).
- Bring back the calculations until the problem is reproduced.
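The "bring back the calculations until the problem is reproduced" step above is essentially a one-at-a-time search. The following Python sketch illustrates only the control flow, under loud assumptions: `run_query` is a hypothetical callable that redeploys the cube with just the listed calculations enabled, reruns the test query, and returns the elapsed seconds. Nothing here is an Analysis Services API, and the timings in the example are simulated.

```python
def find_problem_calculation(calculations, run_query, threshold_seconds):
    """Re-enable calculations one at a time; return the first one whose
    addition pushes the query past the threshold, or None if none does."""
    enabled = []
    for calc in calculations:
        enabled.append(calc)
        elapsed = run_query(list(enabled))
        if elapsed > threshold_seconds:
            return calc
    return None


# Simulated stand-in for redeploying and timing the query (hypothetical).
def fake_run_query(enabled):
    slow = "Rolling 12 Month Avg"
    return 0.2 + (5.0 if slow in enabled else 0.0)


calcs = ["Gross Margin", "Rolling 12 Month Avg", "Percent of Parent"]
culprit = find_problem_calculation(calcs, fake_run_query, threshold_seconds=1.0)
```

Re-enabling one calculation per run keeps the result easy to interpret; with a long MDX script, re-enabling in halves (a bisection) would reduce the number of reruns at the cost of a little more bookkeeping.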
Query Processing Optimization: MDX Optimization (cont'd)
A Few Tips:
- Follow the step-by-step guides I've suggested
- Move simple calculations (such as [Measures].[Internet Sales Amt] * [Measures].[Sales Tax Ratio]) from Calculated Members to the Mart / Warehouse, to Views between the Mart / Warehouse and the DSV, or to the DSV itself
- Explicitly reference cells when possible (Example: [Products].[Short Sleeve Jersey, XL] instead of [Product].CurrentMember)
Query Processing Optimization: MDX Optimization (cont'd)
A Few Tips (cont'd):
- Understand what functions do: cell-by-cell versus subspace computations (great explanation in the referenced documents)
- Learn how to use the SCOPE statement
- SCOPE vs. nested IIF (practical examples in the referenced documents)
Analysis Services Database Processing and Tuning Server Resources
Best Practices in Processing Surround and Support:
- Understanding and Measuring Processing
- Enhancement of Dimension Processing Performance
- Enhancement of Partition Processing Performance
- Tuning of Server Resources
Understanding and Measuring Processing
- Vital in tuning the processing of cubes
- Processing: loading data from one or more data sources into one or more Analysis Services objects
- Processing performance impacts how quickly new data is available for querying
- Data refresh requirements range from monthly updates to near real-time data refreshes, but the faster the processing performance, the sooner users can query refreshed data
- Analysis Services provides several processing options, allowing granular control over the data loading and refresh frequency of cubes
Enhancement of Dimension Processing Performance
- Performance Goal of Dimension Processing: to refresh dimension data in an efficient manner that does not negatively impact the query performance of dependent partitions.
- Best Practices include support of the following techniques:
  - Optimizing SQL source queries
  - Reducing attribute overhead
  - Overall dimension processing architecture
Enhancement of Partition Processing Performance
- Performance Goal of Partition Processing: to refresh fact data and aggregations in an efficient manner that satisfies overall data refresh requirements
- Best Practices include support of the following techniques:
  - Optimizing SQL source queries
  - Using multiple partitions
  - Tuning I/O
  - Optimizing networking speeds
  - Tuning thread and concurrency settings
Tuning of Server Resources
- While many best practices support the designing and tuning of Analysis Services, there are times when we need to add more hardware or tune the server itself
- Best Practices include support of the following techniques:
  - Using PreAllocate (msmdsrv.ini) to reserve physical memory for Analysis Services
  - Disabling Flight Recorder
  - Monitoring and Adjusting Server Memory
Beyond Plateaux: Optimize SSAS via Best Practices References and Discussion
References and Discussion
- General Optimization:
  - SQL Server 2008 Analysis Services Performance Guide
- Many-to-Many Optimization:
  - Marco Russo, et al.
  - Erik Veerman http://blogs.solidq.com/en/erik
- MDX Optimization:
  - Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services
  - SQL Server 2008 Analysis Services Performance Guide
References and Discussion
My own monthly series:
- www.sqlservercentral.com
  - Stairway to MDX (New!)
  - Stairway to DAX (Coming Soon)
- www.databasejournal.com
  - MDX Essentials
  - Introduction to Analysis Services (2000 through 2008R2)
  - Introduction to Reporting Services (2000 through 2008R2)