Application Scalability in Proactive Performance & Capacity Management




Application Scalability in Proactive Performance & Capacity Management. Bernhard Brinkmoeller, SAP AGS IT Planning. Work in progress.

What is Scalability? How would you define scalability? In the context of PPCM, is scalability a characteristic of the load or of the hardware? How would you define scalable load? How would you define scalable hardware? 2013 SAP AG. All rights reserved. 2

What is Scalability? Definition from Wikipedia: In electronics (including hardware, communication and software), scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. Scalability, as a property of systems, is generally difficult to define [2] and in any particular case it is necessary to define the specific requirements for scalability on those dimensions that are deemed important. It is a highly significant issue in electronics systems, databases, routers, and networking. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system. An algorithm, design, networking protocol, program, or other system is said to scale if it is suitably efficient and practical when applied to large situations. The general definition of scalability depends strongly on the context in which it is used. Even in a given context it is questionable whether it is precise enough to form the basis for defining concrete work packages for a scalability analysis. It is very important that we reach a common understanding of what we want to achieve in Proactive Performance & Capacity Management before we start.

Content
- Definition of Scalability
- Proactive Performance & Capacity Management according to ITIL
- Scalability of load with
  - the amount of business data processed in a step
  - the number of parallel processes
  - the size of the DB
- Scalability of service time with
  - the number of CPUs available
  - the capacity of the I/O subsystem
  - database locks
- Non-scalability introduced by application server buffering
- Consequences for risk assessment and quality control
- Consequences for monitoring

Capacity Management Service According To ITIL (ITIL Service Delivery v2.1, published for OGC by TSO). According to the Information Technology Infrastructure Library (ITIL), a capacity management service consists of three sub-processes. The output with the highest value is obtained when the results of the sub-processes are brought together. While the entry points of the sub-processes are different, all of them aim at establishing a connection from the business requirement via the services (reports and transactions) to the resource (CPU, memory, disk) consumption.

What is Scalability? Definition of Scalability in PPCM 1/3. Business Capacity Management: business volume for process X. Load is scalable when the resource consumption of the services necessary to run the business process depends linearly on the (business) volume and there are no unexpected load drivers. Service Capacity Management: resource consumption for service Y. Hardware is scalable when it is capable of providing the necessary resources for the required number of services in a given time interval without a degradation of service times. Resource Capacity Management: service time for resource Z.

What is Scalability? Definition of Scalability in PPCM 2/3. Load is scalable when the consumption of expected resources depends linearly on the (business) volume and there are no unexpected load drivers. Hardware is scalable when it is capable of providing the necessary resources for the required number of services in a given time interval without a degradation of service times.

Examples: The response time of processing an order should depend linearly on the number of items in the order. Signs of non-scalability are:
- Quadratic dependence on the number of line items in the order (remedy: use sorted tables and READ ... BINARY SEARCH in ABAP).
- Dependence on the network latency between the front end and the server: the number of communication steps has to be so small that the network latency can be neglected.
- Dependence on the amount of data stored in the DB (remedy: read only new data, via chronologically sorted indices).

The throughput of order processing should depend linearly on the CPU capacity provided by the infrastructure. Signs of non-scalability are:
- Dependence on the length of the critical path of DB locks (remedy: avoid long critical paths for updates with a large likelihood of lock collision).
- I/O bottlenecks caused by high redo volume (remedy: avoid unproductive database changes, e.g. using SET UPDATE TASK LOCAL).
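The linear-vs-quadratic distinction above can be checked from single-user measurements by fitting the scaling exponent of response time against order size. A minimal sketch, with invented measurement data (a real check would use times taken from an SQL trace or STAD records):

```python
# Sketch: estimate the scaling exponent of response time vs. order size.
# The data points below are hypothetical, for illustration only.
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) vs. log(size).

    A slope near 1 indicates linear (scalable) behaviour; a slope near 2
    indicates quadratic load, a typical sign of non-scalability.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Example: response times growing quadratically with the item count
items = [10, 20, 40, 80, 160]
times = [0.01 * n * n for n in items]            # quadratic load
print(round(scaling_exponent(items, times), 2))  # → 2.0
```

In practice a handful of measurements at doubling order sizes is enough to separate the two regimes.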

What is Scalability? Definition of Scalability in PPCM 3/3. On a detailed level, scalability describes the relationships between: the volumes of all business processes supported by a system; the consumption of the various resources provided by the system; and the service request times the system is capable of providing. A system is scalable up to the required limit when, even under high load, the contribution of non-scalable load to the overall resource consumption and service times remains below an acceptable limit, and the hardware can provide the required resources at peak time without unacceptable degradation of service request times. For very large systems the acceptable contribution of non-scalable load to the resource consumption is typically set at about 20%. For smaller systems it is much higher, as it is cheaper to provide more hardware. A system is scalable when the load accounting for about 80% of the resource consumption is proven to be scalable and the hardware can provide the required resources at peak time with degradations of service request times of less than 20%. (Limits are debatable.) A scalability analysis is always restricted to the (expected) top load contributors.
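The 80%/20% acceptance rule above can be stated as a trivial check. A sketch, assuming the load has already been split into contributions with a proven/not-proven scalability verdict for each:

```python
# Sketch of the PPCM acceptance rule: the non-scalable share of the
# total resource consumption must stay below ~20% (the debatable limit
# from the slide). Input shares are assumed to sum to 1.0.

def system_scalable(contributions, max_nonscalable_share=0.20):
    """contributions: list of (resource_share, is_scalable) pairs."""
    nonscalable = sum(share for share, ok in contributions if not ok)
    return nonscalable <= max_nonscalable_share


load = [(0.50, True), (0.25, True), (0.15, False), (0.10, True)]
print(system_scalable(load))  # → True: only 15% of the load is non-scalable
```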


The size of the DB: Principal Scalability Patterns

[Chart: "Scaling behaviour with DB size" — factor of load increase vs. DB table growth, for three cases: independent of table size; buffer hit ratio depends on table size; amount of data read depends on table size]

The following scaling behaviour of the load with the DB size can be observed:
1. Constant resource consumption, independent of table size. Independent of the table size are all fully indexed accesses to data (the small dependency on the depth of the B-tree of the index can be neglected) which have a high likelihood of touching only data blocks in the buffer.
2. Decrease of the buffer hit ratio with table size. In case the chance that a data block is found in the buffer decreases with the index or table size, a weak linear dependency of the resource consumption on the table size can be observed.
3. Resource consumption directly proportional to the table size. In case the number of data blocks that need to be read increases with the table size, a strong linear dependency of the resource consumption can be observed.

The size of the DB: Amount of Data Read Depends on Table Size 1/3. In the cursor cache, statements that create a load proportional to the size of the DB can be identified by a large (and growing) number of Bgets/row or Rproc/exec. In case Rproc/exec is large, the most common technical issue is a SELECT FOR ALL ENTRIES with an empty selection table; this always needs to be checked first. In case this is not the cause, it has to be checked how the processes can be changed to reduce the number of records read. In case Bgets/row is large, the index layout has to be checked. In case the access is to a single table, correct indexing will always allow reducing Bgets/row to < 6.
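The two ratio checks above are easy to automate over a cursor-cache export. A sketch, with invented field names (`bgets`, `rows_proc`, `execs`) and an invented sample statement; a real analysis would use the figures exported from the shared cursor cache:

```python
# Sketch: flag cursor-cache statements whose ratios point to load that
# grows with the DB size. Thresholds follow the rules of thumb above.

BGETS_PER_ROW_LIMIT = 6      # reachable with correct single-table indexing
RPROC_PER_EXEC_LIMIT = 1000  # illustrative threshold for "Rproc/exec large"


def flag_statements(stats):
    findings = []
    for stmt in stats:
        bgets_per_row = stmt["bgets"] / max(stmt["rows_proc"], 1)
        rproc_per_exec = stmt["rows_proc"] / max(stmt["execs"], 1)
        if bgets_per_row > BGETS_PER_ROW_LIMIT:
            findings.append((stmt["id"], "check index layout",
                             round(bgets_per_row, 1)))
        if rproc_per_exec > RPROC_PER_EXEC_LIMIT:
            findings.append((stmt["id"],
                             "check FOR ALL ENTRIES with empty selection table",
                             round(rproc_per_exec, 1)))
    return findings


stats = [{"id": "SELECT ... FROM MSEG", "bgets": 5_000_000,
          "rows_proc": 40_000, "execs": 10}]
for finding in flag_statements(stats):
    print(finding)
```

The sample statement trips both checks (125 Bgets/row and 4000 Rproc/exec), so both findings are printed.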

The size of the DB: Amount of Data Read Depends on Table Size 2/3. In case the large number of buffer gets is seen for a join with distributed selectivity, it is not always possible to improve the situation with technical means. The most prominent and frequently seen example of such a join is the selection of material movements, either in the standard or, as seen here, in customer coding. The main issue here is the distributed selectivity of the date on MKPF and all other fields on MSEG. In this special case the only stable solution is described in SAP Note 1598760: FAQ: MSEG extension and MB51/MB5B redesign. The changes necessary to avoid such non-scalability are very complex, as it is necessary to change not only the coding but also the table layout. In many cases it is therefore not possible to implement a solution, and most customers refuse to implement the changes. In that case, knowledge of the non-scalability can be used to estimate the largest allowed residence time for archiving in order to stay within acceptable performance limits.

The size of the DB: Amount of Data Read Depends on Table Size 3/3. A nice example of non-scalable load with an ever increasing amount of data read can be found in customer systems with long-running delivery contracts, typical for the automotive industry. Such a statement is the select from EKBE, which is the second most time-consuming in the snapshot of the cursor cache, with already more than a billion recorded disk reads. It is a select with specified EBELN and POSNR, so it looks quite harmless. Rproc/exec is not that high, as it is diluted by many accesses caused by simple purchase orders, but the huge number of disk reads triggered is suspicious. EKBE is the order history containing all deliveries made on behalf of a contract. With JIT this might be one delivery every 3 minutes, for more than a year, for each position of a contract. As more and more old data is touched, this drives the I/O load for this access dramatically. A solution for this special issue is the use of transaction ME87, which needs to be run regularly to summarize the order history (see SAP Note 417933 for details). The example shows once again that it is more important to understand the business processes associated with the top resource-consuming statements than to rely on purely technical analysis in order to find relevant performance improvements.

Example: VAPMA-VBUK, non-scalable runtime increases with the number of orders in the DB. The database uses index VAPMA~Z01 to access the data; this way, each time all entries belonging to one plant will be read. The runtime of this statement will increase with the number of orders in the system. It is necessary to change the access so that the number of open orders determines the runtime. This is most securely done by selecting from VBUK first, or by introducing Oracle hints to use index VBUK~Z02.

The size of the DB: Buffer hit ratio depends on table size 1/2. Less obvious than the dependencies discussed before are cases of fully indexed access reading only the necessary data. However, there are statements among the most expensive in the cursor cache which are executed in huge but justified numbers and which have a rather bad ratio of disk reads to buffer gets compared to the overall buffer quality. Very often this is caused by access via a non-chronologically sorted index. The theory behind this is elaborated in more detail in: Data Archiving Improves Performance Myth or Reality? http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/d0b0de48-0701-0010-fcb1- fb99d43920e3?quicklink=index&overridelayout=true&5003637390373 or in more detail in Performance Aspects of Data Archiving https://websmp108.sap-ag.de/~sapidb/011000358700005070382005e/da_and_performance_11_en.pdf The main principle is easy to understand: if the data accessed is randomly distributed over the full width of an index or even a DB table, the buffer hit ratio will depend heavily on the ratio of index/table size vs. buffer size. Assuming fixed buffer sizes and growing index/table sizes, the buffer hit ratio will go down. This is not the case when the access is concentrated on a small part of the index/table that does not grow with the DB size.
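The main principle above can be stated as a simple model: for accesses spread uniformly over an index, the expected buffer hit ratio is roughly capped by the ratio of buffer size to index size. A sketch of the model (illustrative numbers, not a DB measurement):

```python
# Sketch: under uniform random access, the buffer hit ratio is roughly
# buffer_blocks / index_blocks, capped at 1. With a fixed buffer and a
# growing index, the hit ratio therefore degrades with DB growth.

def expected_hit_ratio(buffer_blocks, index_blocks):
    return min(1.0, buffer_blocks / index_blocks)


buffer_blocks = 10_000
for index_blocks in (5_000, 10_000, 20_000, 40_000):
    print(index_blocks, expected_hit_ratio(buffer_blocks, index_blocks))
```

Once the index outgrows the buffer, every further doubling of the table halves the hit ratio; access concentrated on a non-growing part of the index escapes this effect.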

The size of the DB: Buffer hit ratio depends on table size 2/2. Data is touched by the DB when it resides in the same data block as data that is needed to fulfill a request. Therefore, the DB load can only be scalable when old data that is no longer needed does not reside in data blocks that also contain new data needed to fulfill a request. The pictures below show the insertion points of new data in a chronologically sorted index compared to those of an index that is not chronologically sorted. In a chronologically sorted index the amount of data touched for all business transactions remains constant and is independent of the number of entries in the table. If the index is not chronologically sorted, this is not the case: the number of data blocks that are touched increases as the fraction of new data per index block gets smaller and smaller, until the growth of old data is stopped, for instance by data archiving. Classifying the indices used for data access with respect to chronology allows a very good estimate of the scalability of an application, even from single-user measurements.
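The insertion-point argument above can be illustrated with a small simulation (block size and row counts are invented for illustration): count how many index blocks must be touched to read the newest entries, for a chronologically sorted index versus one where new entries are scattered over the whole range.

```python
# Sketch: blocks touched when reading the newest `new_rows` entries of
# an index with `total_rows` entries. Chronological = new entries are
# appended at the end; otherwise they are scattered randomly.
import random

BLOCK_SIZE = 100  # entries per index block (illustrative)


def blocks_touched(total_rows, new_rows, chronological, seed=0):
    if chronological:
        # new entries occupy only the last few blocks
        return -(-new_rows // BLOCK_SIZE)  # ceiling division
    rng = random.Random(seed)
    positions = rng.sample(range(total_rows), new_rows)
    return len({p // BLOCK_SIZE for p in positions})


total = 1_000_000
print(blocks_touched(total, 500, chronological=True))   # → 5 blocks
print(blocks_touched(total, 500, chronological=False))  # roughly 500 blocks
```

The chronological case stays constant as the table grows, while the scattered case touches nearly one block per new entry, and that cost keeps growing with the table.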

The size of the DB: Tools to check. The most important tools to check this are the SQL trace in single-user measurements and the cursor cache after go-live. There may be several reasons for this kind of non-scalability. In the cursor cache, statements need to be checked for a large number of Bgets/execution, a large number of Rproc/execution, and even for a worse-than-average buffer hit ratio. In a single-user trace it is necessary to check indexing and table design, with special attention given to the explicit or implicit time constraints in the WHERE clause of each statement and how these are handled in the index. Especially the importance of considering the different buffer quality for old and new data is neglected in many table and index designs and very often makes the decisive difference between scalable and non-scalable load.

Praxis Check: Indices of DFKKOP. Table DFKKOP (Items in contract account document) is typically the largest and most important table of FI-CA, with billions of entries in customer systems. In the standard, the primary key and six secondary indices are defined for this table:

dfkkop~0: MANDT, OPBEL (Number of Contract Accts Rec. & Payable Doc.), OPUPW (Repetition Item in Contract Account Document), OPUPK (Item number in contract account document), OPUPZ (Subitem for a Partial Clearing in Document)
dfkkop~1: MANDT, AUGST (Clearing status), GPART (Business partner), BUKRS (Company Code), XMANL (Exclude Item from Dunning Run)
dfkkop~2: MANDT, AUGBL (Clearing Document or Printed Document)
dfkkop~3: MANDT, ABWBL (Number of the substitute FI-CA document), AUGST (Clearing status), ABWKT (Alternative contract account for collective bills)
dfkkop~4: MANDT, AUGST (Clearing status), WHGRP (Repetition group)
dfkkop~5: MANDT, AUGST (Clearing status), VKONT (Contract Account Number), BUKRS (Company Code), AUGDT (Clearing date)
dfkkop~6: MANDT, AUGST (Clearing status), VTREF (Reference Specifications from Contract), BUKRS (Company Code), AUGDT (Clearing date)

None of the indices is explicitly chronologically sorted. Specifying a time as the last field of an index (AUGDT in indices ~5 and ~6) only creates a chronological order for entries with equal VTREF and BUKRS, which does not prevent the mixture of new and old data in one block. All of the indices are implicitly chronologically sorted, by the use of either a document number (ascending with time) or the clearing status (open = new; closed = old). Note: The clearing status was explicitly chosen as the second field of all indices that do not contain a document number, to achieve a separation between old and new data and enhance the scalability of access to new data with status open. Note also: there is no chronological order among the closed records for indices ~1, ~4, ~5 and ~6. Any access to the closed records via one of the indices ~1, ~4, ~5 or ~6 creates a non-scalable load.

Praxis Check: Access to DFKKOP.
Insert of new records into DFKKOP: When new records are inserted into DFKKOP, the status is open. Insertion points are concentrated locally for new items.
Access to open items: All indices guarantee local access to new items only.
Clearing run: The change of the clearing status distributes the entry points equally over the complete range of indices ~1, ~4, ~5 and ~6, forcing access to the complete range of these indices.
Open item list for settlement day: To determine recently closed items it is necessary to access all of index ~1 and all of the table.

Expected Performance Impact of HANA Migration.
Insert of new records into DFKKOP: Any insert goes just into the L1-delta. While the merge will be very resource-intensive, the insert itself should be fast.
Access to open items: Being column-based, HANA has a principal disadvantage here, which will result in higher access times.
Clearing run: The update of the records is again just an insert into the L1-delta.
Open item list for settlement day: While this touches only recent data (as long as the report is executed a short time after the settlement day), the amount of data to be read is large enough that the column store's disadvantage is offset by its efficient access to the data.


Example: DB Cursor Cache Analysis

[Chart: "Resource Consumption of Top 20 SQL Statements" — duration, disk reads, buffer reads and rows read per statement, as a ratio to the top contributor in %]

Importance of Top 20 Resource Consumers

[Chart: "Relative cost of top resource consuming statements" — number of buffer gets, normalized to the top statement, for the top 20 statements sorted by buffer gets; series: SYS1 (time 1), SYS1 (time 2), Z2L SYS2 Jun 2012 (time 1), Z2L SYS2 Dec 2012 (time 2)]

Efficient optimization concentrates on the largest resource consumers. The longer and the more extensively this approach is followed, the smaller the relative importance of the top n resource consumers becomes compared to the rest. The effect of each optimization becomes smaller and less significant for overall sizing. SYS2 has reached a state where this approach does not show any significant potential for improvement any more. This can be seen very clearly using the example of the shared cursor cache analysis: shown above are the top 20 statements with respect to the number of buffer gets from SYS2 at time 1 and time 2, together with an example of another customer system before and after optimization. The slopes of the curves for SYS2 are so small that the top 20 are virtually meaningless for the overall resource consumption.
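The "is the top-20 approach exhausted?" question above can be quantified as the cumulative share of total buffer gets covered by the top n statements. A sketch with invented buffer-get figures; a real check would use the exported cursor-cache statistics:

```python
# Sketch: how concentrated is the load on the top n statements?
# A small top-20 share (as for SYS2 above) means optimizing individual
# statements has little potential left for overall sizing.

def top_n_share(buffer_gets, n=20):
    """Fraction of total buffer gets consumed by the n largest statements."""
    total = sum(buffer_gets)
    top = sorted(buffer_gets, reverse=True)[:n]
    return sum(top) / total


# A skewed cache (a few dominant statements) vs. a flat, optimized one
skewed = [1000, 500, 250] + [10] * 200
flat = [12, 11, 10] + [9] * 200
print(round(top_n_share(skewed), 2))  # top 20 dominate the load
print(round(top_n_share(flat), 2))    # top 20 nearly meaningless
```

A falling top-n share over successive snapshots is exactly the flattening of the curves described for SYS2.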