Best practices for setup and loading an SAP Business Warehouse using HP ProLiant servers and HP StorageWorks EVA8000 storage array
White paper

Environment: SAP ERP 2004 on Microsoft Windows Server 2003 R2 Enterprise x64 Edition with a SQL Server 2005 database, HP ProLiant DL585 servers and an HP StorageWorks EVA8000 storage array, SAP BW 3.5 data warehouse workload

Contents
Executive summary
  Key findings
  Objectives
Overview
SAP BW workload analysis
  Extraction
  Infocube/ODS Load
  ODS activation
Configuration and test procedures
  Server configuration
  SAP configuration
  Array configuration
  Test procedures
    Extraction
    Infocube/ODS Load
    ODS activation
Test results
  BW Load: Infocube and ODS data targets, Server Tuning
    Manage data package size
    Manage parallel work processes
    Manage the number of application servers
    Manage SQL Server 2005 memory and the database server
    Manage SQL Server 2005 setting: max degree of parallelism
  BW Load: Infocube and ODS data targets, Storage Tuning
    Manage EVA I/O load balancing policy
    Manage the number of EVA disk groups
    Manage the number of disk drives in SAPDATA disk group
    Manage SAPDATA virtual disks VRAID type
  BW Load: Manage PSA partition size
  ODS activation
    Manage parallel work processes and application servers
    Manage BEx Reporting option
    Manage SAPDATA virtual disks VRAID type
Bottom line performance considerations
Best practices
  Storage administrators
  Server administrators
  SQL Server 2005 administrators
  SAP administrators
Conclusions
Other findings
Appendix A Bill of Materials (BOM)
Appendix B SAP Notes
Appendix C Reference materials
For more information
Executive summary

SAP Business Warehouse (BW) is a comprehensive data warehousing solution used for business data extraction, data storage, and data analysis. Though SAP BW started out small in scale in 1998, it has grown to become one of the world's most widespread business intelligence platforms, capable of handling a business's most demanding data warehousing needs. As SAP BW has grown in complexity, flexibility, and capability, so has its demand for high-performance storage and computing. HP storage and servers form a solid foundation on which to deploy an SAP BW solution, as they provide the performance and reliability needed to run SAP BW effectively.

This document continues a series of white papers related to SAP solutions utilizing HP storage and servers sized for a typical medium-sized enterprise-class business. Each white paper in the series focuses on a different SAP solution-based workload. This paper addresses the SAP BW workload that performs a load procedure from a BW data source to a BW data target. The BW data targets can then be analyzed by way of a query or report.

Key findings

- The Data Package Size in transactions SBIW and RSCUSTV6 must be set appropriately.
- On the database server, the Lock Pages in Memory local security option must be set in Microsoft Windows Server 2003. Failure to set this option may result in decreased BW performance and an unresponsive database server.
- Divide a BW Load into at least as many background work processes as there are application server processor cores available.
- For an Operational Data Store (ODS) activation, use the parallel processing options in transaction RSCUSTA2 for best performance.
- For best ODS activation performance, do not enable the BEx Reporting option.
- Use the SQR or SQST load balancing algorithms provided by MPIO 2.00.00 or higher for best performance.
- For an Infocube index rebuild, the Microsoft SQL Server 2005 setting Max Degree of Parallelism should be set to 0.

Objectives

The objective of this white paper is to present best practices for storage administrators, server administrators, SQL Server 2005 administrators, and SAP administrators running SAP ERP 2004 and a BW 3.5 Load workload with the HP StorageWorks 8000 Enterprise Virtual Array (EVA8000), HP ProLiant servers, and the SQL Server 2005 database as part of an SAP production system. A further objective is to demonstrate the performance possible for a BW Load workload with this combination of HP storage and servers. With this knowledge, it is straightforward to determine whether a given BW Load workload can complete within a given time specification. Tuning and sizing the servers and storage to handle the workload is a vital step in providing a solution that meets performance expectations. This goal was accomplished by varying server and storage configurations and tuning the parameters of SAP and SQL Server 2005 to optimize performance for this SAP BW Load workload.
Overview

The HP StorageWorks Customer Focused Testing team put together a mid-sized enterprise system using the SAP ERP 2004 application with a SQL Server 2005 database on four HP ProLiant DL585 servers and an EVA8000 storage array. SAP provided scripts and programs to generate BW transactional data. The BW system could then process the transactional data by way of extraction from a data source through the data analysis phase of the BW data cycle.

SAP also provided the associated master data for the transactional data. Master data is data that is expected to remain relatively static; an example is a customer's name and address. Transactional data, on the other hand, is expected to change and will probably need to be processed by the SAP BW system periodically; an example is the number of widgets a customer buys during a given time period.

SAP BW consists of several distinct workloads that are characterized by unique performance profiles. These workloads can be generally described as:

- Extraction
- Infocube and ODS Load
- ODS Activation
- Querying and Reporting

Infocube and ODS load performance, as well as ODS activation performance, is the focus of this white paper's findings. Extraction is also examined. The Infocube/ODS load plus the ensuing ODS activation is defined to be a BW Load workload in this white paper. A BW Load workload is generally characterized as time and resource intensive, and is typically run in off-hours and during off-peak resource usage times. The purpose of the BW Load is to store collected data for later analysis, either by a person running a query against a BW InfoProvider (also known as a data target) or by a report that is run against the BW InfoProvider. The queries or reports are then used in analyzing the data stored in the BW InfoProvider.

For an end user, BW Load performance is critical because analysis of the BW data (by way of a query or report) cannot occur unless the InfoProvider exists and is in an error-free state. If data is collected and then analyzed on a daily basis, then there is only a limited amount of time to ensure that the BW Load is performed correctly and completely. Understanding the characteristics of the BW Load workload and then optimizing its performance saves time, frustration, money, and worry: Can I ensure that my data will be ready for analysis on time?

SAP BW workload analysis

To better understand the way storage and server tuning can influence BW Load workload performance, some of the characteristics of a BW Load workload must be presented. Further, the Input/Output (I/O) characteristics of the workload from the servers to the storage for each virtual disk must be understood before storage tuning can be optimized. A virtual disk is a logical unit of storage on the EVA8000 storage array: the size of the virtual disk is specified, but the physical storage space for that virtual disk can span a number of disk drives used by the EVA8000. With the SAP BW Load workload, the virtual disks that must be proactively managed are the SAPDATA virtual disks (where all the SAP BW database files are located) and the database transaction log virtual disk. Almost 100% of all storage activity in terms of I/Os per second (IOPS) and data throughput for the BW Load workload occurs on the SAPDATA and transaction log virtual disks. Storage activity to the transaction log virtual
disk will always be 100% write, with a predominant write size of 64 KB per I/O. Storage activity to the SAPDATA virtual disks will vary depending on the workload type.

Extraction

Extraction is the term used for the process of moving data from a data source (an SAP R/3 system, a database, flat files of data, and so on) into an SAP BW system. After extraction from a data source is complete, the data may then be manipulated by the SAP BW system. The extraction portion of a BW system's workload is a necessary precursor to the BW Load workload, which is examined in detail in this white paper. The storage activity for the extraction workload can be characterized as follows:

For the SAPDATA virtual disks:
- The ratio of read-to-write host IOPS was 3:1.
- The ratio of read-to-write host throughput was 1:1.

For the SAPDATA and transaction logs:
- The ratio of read-to-write host IOPS was 1:3.
- The ratio of read-to-write host throughput was 1:2.
- The read I/O size was almost exclusively 64 KB.

The extraction workload for the SAPDATA virtual disks is read-I/O dominated. The overall workload becomes write-I/O dominated due to the transaction log activity. The process of extraction can be simplified into the following steps:

1. SAP reads source data from files located on an EVA virtual disk.
2. SAP processes the data, and the database then inserts it into the Persistent Staging Area (also known as the PSA, a database table in SAP BW), which is part of the SAPDATA files located on EVA virtual disks.

Infocube/ODS Load

Loading into an Infocube or ODS is the process of moving and transforming data from the PSA (or directly from a data source) into a data target. A data target must exist before any data analysis can be performed on the BW data. The process of loading data into an Infocube or ODS object is therefore always a necessary part of a BW system's administration. An Infocube differs from an ODS object in terms of structure: an Infocube is a set of relational database tables structured around a fact table, where the fact table contains the transactional data extracted from the data source. An ODS object, by contrast, is a set of flat, transparent database tables that store data at the document (detailed) level. Further details of the differences and use cases of an Infocube versus an ODS object are beyond the scope of this white paper. Whether a business uses Infocubes, ODS objects, or a mix of both ultimately depends on the reporting and analysis requirements of that business. The storage activity observed for the Infocube/ODS Load workload can be characterized as follows:

For the SAPDATA virtual disks:
- The ratio of read-to-write host IOPS was 15:1.
- The ratio of read-to-write host throughput was 1:1.

For the SAPDATA and transaction logs:
- The ratio of read-to-write host IOPS was 3:1.
- The ratio of read-to-write host throughput was 1:2.
- The read I/O size was almost exclusively 8 KB.

This workload is read-I/O dominated. Comparing the host IOPS and host throughput read/write ratios also shows that it has small reads and large writes: the writes to SAPDATA average 15x the size of the reads. The process of Infocube/ODS load can be simplified into the following steps:

1. The database reads data from the PSA located on EVA virtual disks.
2. SAP processes the data from the PSA and organizes it into the appropriate data target structure.
3. The database writes the data target's structure and data to the SAPDATA files on their EVA virtual disks.

Previously, the terms data target and InfoProvider have been used to label an Infocube or ODS object. In fact, the terms are nearly interchangeable when referring to either an Infocube or ODS object. An InfoProvider is an SAP BW object that can be analyzed or reported on. A data target is any object that results from the manipulation of another SAP BW object or table. An Infocube or ODS object is a data target of the PSA in the testing described in this white paper. However, the ODS object is not truly an InfoProvider until it is transformed by way of a process called ODS activation.

ODS activation

Though an ODS object can be viewed as an InfoProvider in SAP (transaction RSA1), ODS activation is needed if reporting or analysis is to be done on the ODS object. Until the ODS object is activated, it primarily functions as a data storage mechanism in SAP BW. An ODS activation can essentially be broken down into two parts, with each part having a different workload profile. The first part is Surrogate ID (SID) determination. During SID determination, SAP BW generates or confirms the existence of a 4-byte integer that will be used for reporting and analysis of the ODS by way of the SAP BW Business Explorer (BEx). SID determination is essential to enable standard BEx reporting. The second part of ODS activation involves the transformation of the ODS object's data. The storage activity for SID determination and the subsequent ODS data transformation can be summarized as follows:

SID Determination
- No storage activity was observed. The workload cannot be characterized in terms of IOPS or storage throughput.

ODS Data Transformation
For the SAPDATA virtual disks:
- The ratio of read-to-write host IOPS was 1:5.
- The ratio of read-to-write host throughput was 1:21.

For the SAPDATA and transaction logs:
- The ratio of read-to-write host IOPS was 1:6.
- The ratio of read-to-write host throughput was 1:30.
- The read I/O size was almost exclusively 8 KB.

This workload is write-I/O dominated. Comparing the host IOPS and host throughput read/write ratios also shows small reads and large writes: the writes to SAPDATA are roughly 4x the size of the reads.
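The 15x and 4x write-size figures above follow directly from the published IOPS and throughput ratios, since throughput equals IOPS times average I/O size. A minimal sketch of that arithmetic, using the ratio values quoted in this section (the helper name is illustrative):

```python
def avg_write_to_read_size(read_write_iops, read_write_throughput):
    """Derive (avg write size / avg read size) from read:write ratios.

    throughput = IOPS * io_size, therefore:
    write_size / read_size = (write_tp / read_tp) / (write_iops / read_iops)
    """
    r_iops, w_iops = read_write_iops
    r_tp, w_tp = read_write_throughput
    return (w_tp / r_tp) / (w_iops / r_iops)

# Infocube/ODS load, SAPDATA: IOPS 15:1, throughput 1:1 -> writes ~15x reads
print(avg_write_to_read_size((15, 1), (1, 1)))   # 15.0
# ODS data transformation, SAPDATA: IOPS 1:5, throughput 1:21 -> writes ~4x reads
print(avg_write_to_read_size((1, 5), (1, 21)))   # 4.2
```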
The overall process of ODS activation observed can be summarized as follows:

1. The database reads data from the ODS object located on EVA virtual disks.
2. SAP processes the data from the ODS activation queue table, and then determines SIDs for BEx reporting if necessary.
3. The database writes a flat transparent structure to the ODS active data table located on EVA virtual disks.
4. The database deletes the original data from the ODS activation queue table located on EVA virtual disks.

Configuration and test procedures

Server configuration

An overview of our mid-sized enterprise-class configuration is illustrated in Figure 1. The SAP ERP 2004 based system consisted of a central instance, a database instance, and two dialog instances, each installed on a separate HP ProLiant server, with the SAP Central Instance (CI) and SQL Server 2005 database clustered. Storage was SAN-based, utilizing an EVA8000 with two HP Fibre Channel (FC) switches for redundancy. The SAP system in Figure 1 represents a production system, but could apply to a development or quality assurance system as well.

SAP solution-based servers
Four ProLiant DL585 servers running Microsoft Windows Server 2003 R2 Enterprise x64 Edition SP1, each with four 2.6-GHz dual-core processors and 32 GB of memory, formed the basis of the hardware used for the SAP solution-based servers. Each server was equipped with a dual-port HP StorageWorks 2-Gb PCI-X 64-bit FC host bus adapter (HBA) utilizing a Storport driver. HP MPIO DSM for EVA was used for managing the pathing from the HBAs to the EVA8000.

Cluster
Two of the four ProLiant DL585 servers running Microsoft Windows Server 2003 R2 Enterprise x64 Edition SP1 were clustered using MSCS. A cluster group was created using the database server and the CI server in the SAP system landscape. The testing was executed with the SQL Server 2005 database and the SAP CI running on separate servers. The effect of clustering the two servers was to add a level of availability in case a server should fail: the entire SAP system would still be available due to the automatic failover of either the database or the SAP CI to the other cluster node.

Storage array
An EVA8000 2C12D array running XCS 5031 firmware was fully populated with 168 146-GB 15K RPM FC disk drives. This array was used to store SAP and SQL Server 2005 executables, SAPDATA, and database log files. Three different disk group configurations (see Figure 3) were tested during this project.

Storage management
A ProLiant DL585 server running Microsoft Windows Server 2003 R2 Enterprise Edition SP1 was used for storage management, including HP StorageWorks Command View EVA 5.0 with EVAPerf for collecting EVA performance information.

Domain controller
A ProLiant DL585 server running Microsoft Windows Server 2003 R2 Enterprise Edition SP1 was used as a domain controller for the testing environment. While a backup domain controller is advisable for production environments, it was not deemed necessary in this testing environment.
SAN infrastructure
Two unzoned HP StorageWorks Edge Switch 4/32 switches running 5.0.3b firmware were used to create two independent 2-Gb fabrics.

Figure 1. Storage and server configuration diagram
SAP configuration

The two dedicated application servers and the CI server (which also served as a third application server) were each configured with 40 dialog work processes and eight background work processes (see Figure 2). On tests where only one application server was used, 24 background work processes were configured on that application server instead of eight. An appropriate number of background work processes is needed for a BW Load; dialog work processes were not used for the BW Load during testing. The dialog work processes are needed to perform the ODS activation.

Figure 2. SAP work processes
Array configuration

Three different EVA disk group configurations were tested for performance comparison. Each subsequent configuration utilized eight more 146-GB 15K RPM disk drives than the previous one.

In the first configuration, 24 disk drives were used and all virtual disks were placed in a single disk group. This configuration presents the easiest administration option due to its simplicity of design.

In the second configuration, two disk groups were used: one for SQL Server 2005 transaction logs, and one for SAPDATA files. With this configuration the sequential I/O (log files) is separated from the random I/O (SAPDATA files). Availability is also improved in the event of a disk group failure: if only the disk group with SAPDATA fails, the transaction logs are safe; if the other disk group fails, then SAPDATA is safe.

The third configuration consisted of three disk groups: one for transaction logs, one for TempDB (the temporary database construct for SQL Server 2005), and one for SAPDATA files. This configuration separates I/O types, as in the second configuration, but goes a step further by isolating TempDB and dedicating a separate disk group to its function. For more detail, see Figure 3.

All three configurations use disk groups populated with disk drives in multiples of eight. This is an EVA best practice that ensures that the EVA's internal administration of the disk groups is optimized. VRAID5 was chosen for the SAPDATA files because it is more space-efficient (fewer disks required to store data) than VRAID1, and for many workloads VRAID5 performs similarly to VRAID1. In all three configurations, SAP executables, SQL Server 2005 executables, the cluster quorum, and the Microsoft Distributed Transaction Coordinator (MSDTC) were placed on separate virtual disks in the 24-disk (largest) disk group.
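The space-efficiency argument for VRAID5 can be made concrete with a rough capacity estimate. A minimal sketch, assuming VRAID1 mirrors data (about 50% usable) and VRAID5 uses roughly a 4+1 data-to-parity layout (about 80% usable); the exact usable figures vary with EVA sparing and metadata overhead, so these are sizing approximations only:

```python
def usable_gb(drives, drive_gb, vraid):
    # Rough usable-capacity estimate; ignores EVA sparing and metadata overhead.
    efficiency = {"VRAID1": 0.5, "VRAID5": 0.8}[vraid]  # VRAID5 assumes a 4+1 layout
    return drives * drive_gb * efficiency

# 24-drive SAPDATA disk group with 146-GB drives:
print(usable_gb(24, 146, "VRAID5"))  # ~2803 GB usable
print(usable_gb(24, 146, "VRAID1"))  # ~1752 GB usable
```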
Figure 3. EVA8000 setup diagram
Notes: Boxes with dotted-line borders represent EVA virtual disks. Virtual disks for SAP executables, SQL Server 2005 executables, cluster quorum, and MSDTC are not shown.

Configuration 1: one 24-disk disk group containing the transaction log (VRAID1), TempDB (VRAID1), and SAPDATA (VRAID5) virtual disks.
Configuration 2: one 8-disk disk group for the transaction logs (VRAID1), and one 24-disk disk group for TempDB (VRAID1) and the SAPDATA virtual disks (VRAID5).
Configuration 3: one 8-disk disk group for the transaction logs (VRAID1), one 8-disk disk group for TempDB (VRAID1), and one 24-disk disk group for the SAPDATA virtual disks (VRAID5).
Test procedures

Using the SAP BW Load workload, real-time user loads were run and performance information was captured for SAP, SQL Server 2005, the servers, and the storage to characterize the performance of the configuration. Each test included:

1. Extraction of BW data from a data source.
2. Loading of the BW data into a data target (InfoProvider), either an Infocube or an ODS object.
3. For a load into an ODS object, activation of the ODS.

More information on the Extraction, Infocube and ODS Load, and ODS Activation portions of the testing is detailed in the following sections.

Extraction

Extraction for testing purposes was performed using a data structure provided by SAP contained in a flat file. The data structure was exactly 334 bytes per row of data to be inserted into the BW system. To optimize extraction performance, the following steps were taken (a sketch of the file-splitting step follows Figure 4):

1. The flat file was broken into a set of equally sized (+/-2%) flat files.
2. The number of flat files used was always a multiple of the number of processor cores available to the SAP application servers.
3. The data was extracted from the set of flat files into the Persistent Staging Area (PSA) (see Figure 4).

The PSA is effectively an entry point into the SAP BW system. The PSA is a database table that stores extracted data and then references that data using an SAP BW generated Request ID. The PSA allows the user to check the extracted data for errors before updating the data into the Infocube or ODS data target. This ensures that no time is wasted during the load of the data into the data targets, because the accuracy and consistency of the data can be validated in the PSA. Finally, the PSA offers a clear separation point between the extraction and BW Load workload processes. Without this separation point, if a BW Load failed, then both the BW Load and extraction processes would have to be retried.

Figure 4. Extraction data flow (source flat files to PSA tables)
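A minimal sketch of the file-splitting preparation described in steps 1 and 2 above: a line-oriented source file is divided into a number of pieces equal to a multiple of the available processor cores. File paths, names, and the line-based record format are assumptions for illustration; the pieces come out equal to within one record, matching the +/-2% tolerance used in testing.

```python
import os

def split_flat_file(src_path, out_dir, cores, multiple=1):
    """Split a line-oriented flat file into cores*multiple roughly equal pieces."""
    n_files = cores * multiple
    with open(src_path, "r") as src:
        rows = src.readlines()
    per_file = -(-len(rows) // n_files)  # ceiling division
    os.makedirs(out_dir, exist_ok=True)
    for i in range(n_files):
        chunk = rows[i * per_file:(i + 1) * per_file]
        with open(os.path.join(out_dir, f"bw_source_{i:02d}.txt"), "w") as out:
            out.writelines(chunk)

# Eight processor cores per application server -> eight files per server
split_flat_file("bw_source.txt", "split", cores=8)
```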
Infocube/ODS Load

Once the data is in the PSA, the next step is to load the data into a data target. Regardless of the data target, the procedure for loading the data is largely the same. To optimize load performance, the following steps were taken:

1. All associated master data was uploaded before the loading of the transactional data. If master data is not loaded before the corresponding transactional data, then SAP BW will attempt to reconstruct the master data structure. This process is time-consuming and was avoided.
2. For an Infocube load, any existing database indexes for the Infocube were dropped. Indexes improve query performance but negatively impact load performance. If the load size is greater than 1,000,000 rows of data, it is more effective to drop the indexes, load the data, and then reconstruct the indexes after the load is complete.
3. Transactional data was loaded into only one data target at a time. SAP BW loads data more efficiently when only one data target is loaded at a time compared to loading multiple data targets simultaneously. Loading only one data target at a time also leaves less chance for loading errors.
4. Transactional data was loaded into a data target in sizes of 30M, 120M, or 240M rows (the symbol M stands for 1,000,000 in this document). This was done to determine the effects of different load sizes.
5. Transactional data was loaded using either one or three SAP application servers simultaneously. This was done to determine the effects of adding more application servers to the BW Load process.
6. Transactional data was loaded using one SAP background work process for every flat file that was extracted (either 8 or 24 during testing). Using dialog work processes for loading would effectively lock the user out of performing other BW-related actions in the current user session. It is more convenient and unobtrusive to schedule a background process to do the load at a predetermined time.

The data flow of this workload is illustrated in Figure 5.

Figure 5. BW Load data flow (PSA tables to Infocube tables or the ODS activation queue table)
ODS activation

If the ODS is used for reporting, then it will need to be activated. For ODS activation, the following steps were taken:

1. The ODS load was allowed to complete.
2. The ODS was activated using 8, 16, 24, or 40 parallel dialog work processes. This was done to determine the effects of adding more parallel work processes on ODS activation runtime.
3. The ODS was activated using either one or three SAP application servers simultaneously. This was done to determine the effects of adding more application servers on ODS activation runtime.

The data flow of this workload is illustrated in Figure 6.

Figure 6. ODS activation data flow (ODS activation queue table to the ODS active data table and ODS change log table)

Test results

BW Load: Infocube and ODS data targets, Server Tuning

The first step taken in improving and optimizing the BW Load workload was to determine the SAP parameters and settings that impact BW Load performance. This section of the white paper describes the most important findings when tuning SAP, the SAP application servers, the SQL Server 2005 database server, or the SQL Server 2005 database parameters.

Manage data package size

A data package in SAP BW refers to a structure by which data is transferred into SAP BW from a data source. For each data package that is created, SAP BW transfers that data package by way of one transactional RFC (tRFC) process. tRFC is a type of asynchronous RFC transfer method, and is the preferred method for transferring data into an SAP BW system. For extraction, the best practice is to set the data package size as large as possible for best performance. The data package size can be set by way of SAP transaction RSCUSTV6 for a flat file source, or by way of transaction SBIW for other types of data sources. The trade-off with larger data packages is that they place a larger main memory requirement on the application server that is extracting the BW data from the data source. In theory, the larger the data package, the better the performance, so long as main memory is available for processing the data packages from the extraction source. However, this "larger is better" rule for data package size has an upper bound for the BW Load workload. In testing, setting the data package size (by way of RSCUSTV6) to 100,000 rows resulted in the use of 12 GB of memory on each application server during extraction.
However, during the BW Load workload, memory usage peaked at only 9 GB per application server, with an average of 6 GB, at data package sizes of 50,000, 100,000, and 200,000 rows. The average memory usage on the application servers did not increase across the tested range of data package sizes during the BW Load workload.

When the data package size was varied, BW Load workload performance changed dramatically. The default BW data package size in RSCUSTV6 of 1,000 rows per data package was compared to data package sizes of 50,000, 100,000, and 200,000 rows. Figure 7 displays the BW Load workload performance (in terms of completion time) and the processor utilization on the application servers. The result is clear: changing the data package size resulted in significantly better BW Load wall-clock time (runtime) needed to complete the BW data load process. There was a 20-fold improvement in BW Load runtime between 1,000 rows and 100,000 rows. Also, the processor utilization for 100,000 rows was less than half of the utilization for 1,000 rows. This is a welcome result: significantly better performance is achieved at no extra cost (in fact, less cost) in terms of server resources.

Figure 7. Comparison of BW Load times based on data package size

Though Figure 7 compares 1,000 rows directly to 100,000 rows, BW Load workload performance was equivalent for data package sizes of 50,000, 100,000, and 200,000 rows. The SAP recommended data package size for SQL Server 2005 is 50,000 rows (SAP Note 130253).
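Because every in-flight data package must be buffered on the application server, the package size sets a floor on memory demand. A back-of-the-envelope estimator using the 334-byte row size from this testing; note this is raw payload only, an assumption for illustration, since the observed usage (6-9 GB) is far higher once SAP buffers, tRFC queues, and parallel processes are added on top:

```python
def package_payload_mb(rows_per_package, bytes_per_row=334):
    # Raw payload of one data package, before SAP-side structure overhead.
    return rows_per_package * bytes_per_row / 2**20

for size in (1_000, 50_000, 100_000, 200_000):
    print(f"{size:>7} rows -> ~{package_payload_mb(size):6.1f} MB per package")

# 100,000-row packages carry ~32 MB of payload each; with 24 parallel
# background processes, that is under 1 GB of payload in flight, so the
# measured 6-9 GB reflects SAP's per-package structures, not raw data.
```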
Manage parallel work processes

A BW Load should be run using background work processes. This allows the load to be scheduled for a convenient time and prevents the current user session from being locked while the load is running. An SAP best practice is to use a number of background processes for a BW Load that is a multiple of the number of processor cores available on the SAP application servers used. The ProLiant DL585 servers used for testing each have eight processor cores, so each SAP application server should use at least eight parallel background processes for a BW Load workload. During testing, one application server was used with either eight or 24 parallel background work processes simultaneously performing a BW Load. The results are presented in Figure 8.

Figure 8. Comparison of BW Load times based on the number of background work processes allocated

Figure 8 shows that tripling the number of parallel background work processes resulted in a 94% improvement in BW Load runtime. This was accomplished by a greater percentage utilization of the application server's processor cores. The processor cores were 50% more utilized, which helped achieve a significant improvement in BW Load runtime.
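The sizing rule above (at least one background work process per available processor core, more when a single server carries the whole load) can be captured in a one-line helper. A sketch only; the 3x factor for the single-server case simply mirrors the 24-process configuration used in this testing:

```python
def bw_load_btc_processes(app_servers, cores_per_server, factor=1):
    """Background work processes for a BW Load: a multiple of the total cores."""
    return app_servers * cores_per_server * factor

print(bw_load_btc_processes(3, 8))            # 24 processes across three servers
print(bw_load_btc_processes(1, 8, factor=3))  # 24 processes on a single server
```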
Manage the number of application servers

Adding more application servers to process the BW Load increases the number of processor cores available to perform the BW Load workload. In the testing, the number of application servers was increased from one to three (thereby increasing the number of processor cores from eight to 24). The number of simultaneous background processes was held constant at 24. The results are presented in Figure 9. Tripling the number of application servers resulted in an 11% improvement in BW Load runtime. In the three application server case, the processor utilization of each application server was a little more than one third of the utilization in the one application server case.

Figure 9. Comparison of BW Load times based on the number of application servers used
Manage SQL Server 2005 memory and the database server

Assigning memory to SQL Server 2005 is somewhat of a trade-off. On a dedicated database server (as used in this testing), it might be tempting to give an unlimited amount of memory to SQL Server 2005: if all the physical memory is used up, then paging will occur, but that is also evidence that more physical memory on the database server may be needed, and SQL Server 2005 is never constrained by physical memory. However, a better practice is to dedicate 0.5 GB to 2 GB of physical memory to the operating system. With the BW Loads performed in this testing, the database server always paged to disk when the database was given the entire database server's physical memory to use. The database server will begin to page/swap when physical memory is used up by SQL Server 2005, and this will negatively affect BW Load performance. In addition, the database server will become unresponsive to the user. To mitigate these side effects, 31 GB of physical memory was assigned to SQL Server 2005, and the remaining 1 GB of physical memory was left dedicated to the OS during testing.

With all BW Loads consisting of 30M or more rows of data, SQL Server 2005 used all the memory allocated to it. During those BW Loads, an occurrence that can be labeled "memory pressure" regularly arose. When memory pressure occurs, the database server will page/swap to disk even though there is still physical memory space available. This has the same negative impact on BW Load performance as described above. Once the paging has begun, the only way to stop it is to reboot the database server. The effects of the paging permeate beyond just a BW Load: during ODS activation, the activation may run for hours before failing with a database deadlock error (Status 9 System Error). The effects of the memory pressure are long-lasting and obtrusive.

The memory pressure is caused by a cache flush due to a large read operation. The cache flush is easy to spot by monitoring the database server with MS Perfmon: it is recognized by its accompanying side effect of a sudden, dramatic decrease in memory page life expectancy. The cache flush is a routine occurrence with the BW Load sizes tested (30M rows and greater).

How can the negative impact of the associated paging/swapping to disk be avoided? The answer is to maintain the Windows local security policy Lock Pages in Memory. This setting is not specific to SQL Server 2005, but rather applies to the entire database server. The process of setting this policy is detailed in the online article SQL Server 2005 Books Online: How to: Enable the Lock Pages in Memory Option (Windows). After the process described in the online article is applied, SQL Server 2005 must be restarted, and the setting can then be verified by finding the "Using locked pages for buffer pool" message in the current SQL Server 2005 log file. As seen in Figure 10, locking pages in memory for the memory used by SQL Server 2005 has a significant impact on BW Load performance. Of course, the impacts are greater than just on the current BW Load: subsequent BW Loads or ODS activations will also be impacted if the Lock Pages in Memory policy is not maintained correctly.
Figure 10. Comparison of BW Load times based on Windows local security setting Lock Pages in Memory on the database server
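The verification step above (confirming the "Using locked pages for buffer pool" message after the restart) can be scripted. A small check sketch; the ERRORLOG path is an assumption and must be adjusted for the actual instance directory, and SQL Server error logs are typically Unicode, hence the UTF-16 read:

```python
# Verify the Lock Pages in Memory setting took effect by scanning the
# current SQL Server 2005 error log for the confirmation message.
LOG = r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG"  # assumed path

with open(LOG, "r", encoding="utf-16", errors="ignore") as f:
    locked = any("Using locked pages for buffer pool" in line for line in f)

print("Locked pages in use" if locked
      else "WARNING: buffer pool is not using locked pages")
```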
Manage SQL Server 2005 setting: max degree of parallelism

For an Infocube/ODS Load, the SQL Server 2005 setting max degree of parallelism can be set to 0 or 1 with no impact on performance. Microsoft recommends that this parallelism setting be set to 0 for Infocube aggregate rollup, and to 1 for BW querying. However, no specific recommendation is given for rebuilding an Infocube's indexes. Recall that the best practice for loading more than 1,000,000 rows is to drop the Infocube's indexes to improve BW Load performance. The Infocube's indexes are needed to improve BW query performance, so they must be rebuilt. Figure 11 shows the effect of managing the SQL Server 2005 max degree of parallelism setting for the process of rebuilding an Infocube's indexes. Setting the parallelism to 0 improves the performance of the Infocube index rebuild by 377%. The better performance is primarily attributable to the database server's higher processor utilization; in this case, correctly maintaining the parallelism setting enabled the database server to work harder on the task of rebuilding indexes.

Figure 11. Comparison of Infocube index rebuild times based on SQL Server 2005 setting: max degree of parallelism
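max degree of parallelism is an advanced sp_configure option, so "show advanced options" must be enabled before it can be changed. A hedged sketch using pyodbc; the connection string is an assumption, and the same sp_configure statements can equally be run from a query window:

```python
import pyodbc

# Connection string is illustrative; point it at the actual BW database server.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=bwdbserver;Trusted_Connection=yes",
    autocommit=True)
cur = conn.cursor()

# 'max degree of parallelism' is an advanced option, so expose it first.
cur.execute("EXEC sp_configure 'show advanced options', 1")
cur.execute("RECONFIGURE")
# 0 = let SQL Server use all available processors, as recommended here
# for Infocube index rebuilds.
cur.execute("EXEC sp_configure 'max degree of parallelism', 0")
cur.execute("RECONFIGURE")
```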
BW Load: Infocube and ODS data targets, Storage Tuning

Another process used to improve and optimize the BW Load workload was to find settings on the EVA8000 that impact BW Load performance. This section of the white paper describes the most important findings for tuning the EVA8000 for the BW Load workload.

Manage EVA I/O load balancing policy

The EVA8000 has a total of eight host ports. In general, it is a best practice to ensure all the host ports are used equally in terms of I/O and data throughput. This keeps the load to the array evenly balanced. With Windows, the way to balance the host port load from the servers to the storage is by way of the HP Multi-pathing I/O Device Specific Module (MPIO DSM). During the testing, MPIO 2.00.00 was used to balance the load to the EVA8000.

There are several MPIO load balancing policies that can be used. One method used during testing was manually load balancing the virtual disks using the No Load Balancing (NLB) option and assigning each virtual disk its own path. The other method was to use the remaining load balancing policies: Shortest Queue Requests (SQR), Shortest Queue Service Time (SQST), Shortest Queue Bytes (SQB), and Round Robin (RR). More technical definitions of these load balancing policies are found in the MPIO DSM documentation.

All of the host port load balancing policies were used during testing. SQST performed similarly to SQR. SQB performed slightly worse than SQST and SQR. RR performed worse than SQB, but better than NLB. In Figure 12, the best performing and worst performing load balancing policies are compared: SQR and NLB. There is a 13% improvement in BW Load runtime when switching from NLB to SQR. One of the prime reasons for the improvement is shown by the differences in HBA queue depths between NLB and SQR. The queue depths differ because the SAPDATA virtual disks receive the majority of the I/O for the BW Load workload; only the transaction log virtual disk experiences a similar amount of I/O activity. Those three virtual disks process almost 100% of the I/O workload for a BW Load, and under NLB can only be manually balanced onto three of the EVA8000's eight available host ports. The result is a higher queue depth on those three ports and a workload that is not optimally balanced.
Figure 12. Comparison of BW Load times based on the MPIO 2.00.00 load balancing policy used
Manage the number of EVA disk groups

Three EVA disk group configurations were examined during testing. The primary difference between the first and second configurations is that the SQL Server 2005 transaction log virtual disk is separated from the other virtual disks and placed in its own EVA disk group. The primary difference between the second and third configurations is that SQL Server 2005's TempDB virtual disk is separated into its own disk group. The separation of TempDB into its own disk group had no impact on performance. This result was validated by observing the rarity of I/O to the TempDB virtual disk during an SAP BW Load workload.

It is expected that better performance would result from placing the transaction logs in their own disk group: the transaction logs generate a lot of I/O, the I/O profile is exclusively sequential write, and isolating that workload in its own disk group should prove beneficial to BW Load workload performance. The effects of placing the SQL Server 2005 transaction logs in their own disk group are shown in Figure 13. This action resulted in a modest 2% improvement in BW Load workload performance. Though the write latency to the transaction logs improved significantly in percentage terms, the write latency was already superb (<2 ms) in the one disk group configuration.

Figure 13. Comparison of BW Load times based on number of disk groups used
Manage the number of disk drives in the SAPDATA disk group

In general, adding more disk drives to a disk group will provide better I/O performance. This is because each disk drive has a specific limit to the number of IOPS it can handle, and it is much more likely that this limit will be hit before the drive's throughput limit is reached. More drives in a disk group effectively increase the maximum number of IOPS that disk group can handle. In Figure 14, the effect of adding eight more disk drives (a 33% increase in the number of drives) to the disk group containing the SAPDATA virtual disks is examined. There is a minimal improvement of 2% in BW Load runtime. This result indicates that disk drives are not the primary bottleneck of the BW Load workload; if disk drive IOPS were the sole bottleneck, a greater improvement in BW Load performance would have been observed.

Figure 14. Comparison of BW Load times based on the addition of eight more disk drives to the disk group
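The reasoning above (per-drive IOPS limits cap a disk group's random I/O rate) lends itself to a quick ceiling estimate. A sketch, assuming roughly 180 random IOPS per 15K RPM FC drive, which is a common rule of thumb rather than a value measured in this testing, and ignoring controller cache effects:

```python
def disk_group_iops_ceiling(drives, iops_per_drive=180):
    # Rough back-end random-IOPS ceiling; cache hits and sequential I/O exceed it.
    return drives * iops_per_drive

print(disk_group_iops_ceiling(24))  # ~4320 IOPS
print(disk_group_iops_ceiling(32))  # ~5760 IOPS (+33% drives -> +33% ceiling)
```

If the workload were drive-limited, the 33% higher ceiling would show up as a much larger runtime gain than the 2% observed, which is exactly the argument made above.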
Manage SAPDATA virtual disks VRAID type

For a random workload, VRAID1 will usually provide better performance than VRAID5. The degree of performance improvement between VRAID1 and VRAID5 depends directly on the percentage of read I/O versus write I/O the workload exhibits: the greater the percentage of write I/O, the greater the expected storage performance improvement from switching from VRAID5 to VRAID1. The SAP BW Load workload is predominately a read I/O workload, but it is not exclusively one. Therefore, it is important to understand the impact of VRAID1 on the performance of the BW Load workload. In Figure 15, the disk latencies of VRAID1 and VRAID5 are compared for 32 disk drives in the SAPDATA disk group. For both read and write latencies, switching from VRAID5 to VRAID1 results in a 100% improvement. With VRAID1, the read latency of 16 ms and write latency of 7 ms indicate a storage system performing at a very high level for SAP BW.

Figure 15. Comparison of BW Load disk latencies based on VRAID type of SAPDATA virtual disks
In Figure 16, the BW Load runtimes for VRAID1 and VRAID5 are compared. VRAID1 demonstrates a 7% performance improvement in BW Load runtime over VRAID5 given the same number of disk drives. The performance improvements shown in Figure 15 for the storage system are not fully realized in the overall workload runtime. For this workload, VRAID5 provides very satisfactory performance.

Figure 16. Comparison of BW Load times based on VRAID type of SAPDATA virtual disks
BW Load: Manage PSA partition size

Deleting the contents of the PSA after the transactional data has been updated and verified in the data targets is a necessary and essential part of BW Load management. New transactional data will periodically arrive in the PSA, and old data left in the PSA provides unnecessary clutter and continues to consume valuable storage space. The performance (in terms of runtime) of deleting data from the PSA can be easily optimized by way of SAP transaction RSCUSTV6. In Figure 17, the Partition size setting in RSCUSTV6 was changed from the default of 1,000,000 rows per partition to 10,000, 50,000, and 100,000 rows, and the PSA delete performance was compared. The best delete performance was consistently at the 50,000-row PSA partition size. The 50,000-row setting provided a 113% performance improvement compared to the default setting. The best performing PSA partition size had no relation to the chosen data package size in RSCUSTV6. PSA partition size did not affect BW Load performance, only PSA delete performance.

Figure 17. Comparison of PSA delete times based on PSA partition size
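One way to see why the partition size matters for delete performance is to count the partitions a given load spans; dropping whole partitions is commonly described as cheaper than row-level deletes, though that mechanism was not measured separately here. A sketch of the counting only:

```python
import math

def psa_partitions(rows_loaded, partition_size):
    # Number of PSA partitions a load of this size spreads across.
    return math.ceil(rows_loaded / partition_size)

for size in (10_000, 50_000, 100_000, 1_000_000):
    n = psa_partitions(30_000_000, size)
    print(f"partition size {size:>9}: {n:>5} partitions for a 30M-row load")
```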
ODS activation

ODS activation is the most time-consuming part of the BW Load workload, and it has a very different I/O profile than an Infocube/ODS load. Compared to an Infocube/ODS load, there are fewer performance optimization options for ODS activation. The methods for improving ODS activation performance are discussed next.

Manage parallel work processes and application servers

The ODS can be activated more quickly by using parallel dialog work processes. The number of parallel work processes can be specified in SAP transaction RSCUSTA2. Also in RSCUSTA2, the SAP application servers used during activation can be specified as a server group: a collection of application servers to be used for activation. The server group used in RSCUSTA2 must first be created in transaction RZ12. For the SID determination portion of ODS activation, parallel processing can only occur on one application server, even if more than one application server is specified in the server group of RSCUSTA2. The options for optimizing SID determination are therefore very limited.

In Figure 18, the performance of three combinations of application servers and parallel processes is displayed for ODS activation without SID determination. SID determination is not included because it does not benefit from the use of more than one application server. A 40-parallel-dialog-process case was also tested, but is not shown in Figure 18; it gave no performance benefit over the 24-process case. The use of 24 parallel dialog processes and three application servers provided a 29% performance improvement over the eight-process, one application server case.

Figure 18. Comparison of ODS activation times based on number of parallel work processes and application servers used
Manage BEx Reporting option

If the ODS is to be used as a data storage mechanism only, or BEx reporting is not required, then the BEx Reporting option should be unchecked in RSA1 for best ODS activation performance. In RSA1, navigate to the ODS object, select Change, and then expand the Settings dropdown menu to find the BEx Reporting option. Even if the BEx Reporting option is unselected, the ODS can still be analyzed by way of the InfoSet Query technique. The ODS will still need to be activated before it can be analyzed, but the SID determination portion of the activation will be skipped when using the InfoSet Query technique. The impact of unselecting the BEx Reporting option and bypassing SID determination is demonstrated in Figure 19: bypassing SID determination results in a 77% improvement in ODS activation performance.

Figure 19. Comparison of ODS activation times based on maintaining the BEx Reporting option in RSA1
Manage SAPDATA virtual disks VRAID type

The ODS activation workload is predominately a write I/O workload, but it is not exclusively one. Therefore, it is important to understand the impact of VRAID1 on the performance of the ODS activation workload. In Figure 20, the disk latencies of VRAID1 and VRAID5 are compared for 32 disk drives in the SAPDATA disk group. For both read and write latencies, switching from VRAID5 to VRAID1 results in a >50% improvement. When using VRAID1, the read latency of 44 ms may still be of some concern; the best method for further improving that latency is to add more disk drives to the SAPDATA disk group.

Figure 20. Comparison of ODS activation disk latencies based on VRAID type
In Figure 21, the ODS activation runtimes for VRAID1 and VRAID5 are compared. VRAID1 demonstrates an 11% performance improvement in ODS activation runtime over VRAID5 given the same number of disk drives. The performance improvements shown in Figure 20 for the storage system are not fully realized in the overall workload runtime. For this workload, VRAID5 provides very satisfactory performance.

Figure 21. Comparison of ODS activation times based on VRAID type

Bottom line performance considerations

In the end, whether doing a BW Load or an ODS activation, the user must perform all the needed tasks of a BW Load workload in time for the analysis portion of the SAP BW data cycle. The final question should be: Can my BW system complete its BW Load workload in time, given the amount of BW transactional data collected from my data source? The tested combination of ProLiant DL585 servers and the EVA8000 storage array provided the following levels of performance for the BW Load and ODS activation workloads:

- Infocube Load: 272M rows/hour
- ODS Load: 317M rows/hour
- ODS Activation with SID determination: 38M rows/hour
- ODS Activation without SID determination: 68M rows/hour

The performance numbers represent the best (shortest) runtimes for that particular BW Load activity at any number of rows that was tested. All rows consisted of exactly 334 bytes of data.
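These throughput figures answer the bottom-line question directly: divide the expected row count by the relevant rate. A small estimator using the measured rates above (rows per hour, valid for this test environment only):

```python
RATES = {  # measured on the tested configuration, rows per hour
    "infocube_load": 272_000_000,
    "ods_load": 317_000_000,
    "ods_activation_with_sid": 38_000_000,
    "ods_activation_no_sid": 68_000_000,
}

def hours(rows, activity):
    return rows / RATES[activity]

# A 240M-row ODS load followed by activation with SID determination:
total = (hours(240_000_000, "ods_load")
         + hours(240_000_000, "ods_activation_with_sid"))
print(f"~{total:.1f} hours")  # ~7.1 hours on the tested configuration
```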
Best practices

Storage administrators
- Design around an IOPS and latency goal, not just storage capacity.
- Size all disk groups in multiples of eight disk drives.
- Use the MPIO load balancing SQR or SQST algorithms for SAPDATA virtual disks. Avoid using RR or NLB.
- Know that VRAID5 is acceptable for SAPDATA virtual disks, though VRAID1 will provide marginally better performance.
- Use VRAID1 for transaction logs due to its greater disk redundancy and greater fault tolerance.
- Know that the two disk group EVA8000 configuration will give the best performance, but the one disk group configuration gives nearly identical performance and uses fewer disk drives.
- Do not separate SQL Server 2005's TempDB files into a dedicated disk group. This provides no performance advantage and potentially uses more disk drives.

Server administrators
- Do not give equal memory amounts to application servers and database servers; put more memory in the database servers. Maximum memory usage on the application servers during a BW Load was 9 GB each (12 GB each during extraction).
- Maintain the Windows security setting Lock Pages in Memory. The default setting is No, and it must be changed on the database server.
- Use more application servers to achieve better overall performance.

SQL Server 2005 administrators
- Set the memory available to SQL Server 2005 to 0.5 GB to 2 GB less than total physical memory, so the OS has some memory dedicated to it. Otherwise, SQL Server 2005 will use up all the database server's memory and OS paging/swapping will occur.
- For an Infocube index rebuild, set the SQL Server 2005 parameter Max Degree of Parallelism to 0 for best performance.
- Know that TempDB warrants no special attention or consideration for a BW Load workload. TempDB management has no impact on BW Load performance.

SAP administrators
- Reference the SAP Notes in Appendix B of this white paper.
- Use transactions SBIW and RSCUSTV6 to size data packages correctly for a BW Load.
- Use transaction RSCUSTA2 and the ODS Settings in RSA1 to minimize ODS activation time.
- Use at least as many parallel processes as there are processor cores available on the application servers for an Infocube/ODS load or ODS activation.
Conclusions

This white paper focused on running an SAP BW 3.5 Load workload on HP servers and storage. Performance was compared while adjusting tuning parameters in SQL Server 2005 and the SAP application, as well as on the storage and servers. The conclusions are as follows:

- The ProLiant DL585 servers performed very well with this workload. CPUs were never a bottleneck. The application servers also had enough memory to ensure that the SAP buffers never swapped to disk.
- There are clear performance benefits to parallel processing by way of SAP settings and the multiple processor cores available in the ProLiant DL585 server.
- The HP StorageWorks EVA8000 was able to run this workload successfully. Both read and write latencies were excellent (<20 ms) for the BW Load, and only the ODS activation read latency poses the need for further optimization by adding more disk drives. The EVA's controller processor utilization was not a concern.
- To improve BW Load performance, disk drives should be added before HBAs unless loading 120M rows or more.
- In terms of disk latency and overall performance, VRAID1 will generally outperform VRAID5. For this workload, however, VRAID5 does provide very acceptable performance.
- Load balancing the EVA by way of MPIO 2.00.00 or higher is an important step in achieving optimal performance.
- The Data Package Size in transactions SBIW and RSCUSTV6 must be set appropriately. The default setting is not adequate for the load sizes examined. Failure to set the Data Package Size correctly results in poor BW Load performance and the unnecessary consumption of additional application server processor resources.
- On the database server, the Lock Pages in Memory local security option must be set in Windows 2003 for BW Loads of 30M rows or greater. Failure to set this option may result in decreased BW Load performance and an unresponsive database server.
- For BW Loads of 30M rows or greater, SQL Server 2005 will rapidly use all the memory it is allocated. As much memory as possible should be devoted to SQL Server 2005 on the database server.
- For the best performance during an Infocube index rebuild, the SQL Server 2005 setting Max Degree of Parallelism should be set to 0.
Other findings

During the SID determination portion of ODS activation, only four to seven parallel work processes are ever used, even if more are specified in SAP transaction RSCUSTA2. All other work processes show the status "stopped CPIC". The result is that SID determination takes a significant amount of time and cannot be further optimized. SID determination is not a resource-intensive operation in terms of processor utilization or I/O performance; better performance would be expected from a higher degree of parallel processing. The stopped CPIC condition is generally associated with a Remote Function Call (RFC) issue. However, SID determination can only occur on one application server, so this status is unexpected.

SID determination can only occur on one application server, but SAP BW tries to reserve the same number of dialog work processes on that application server as the number of parallel processes specified in transaction RSCUSTA2. If the number of parallel processes specified in RSCUSTA2 is greater than the number of dialog work processes available on the one application server used for SID determination, then ODS activation will fail during SID determination. Activation will have to be restarted after either the number of parallel processes is decreased in RSCUSTA2 or more dialog processes are allocated (using transaction RZ10) on the one application server used for SID determination. If SID determination is needed, then the number of parallel processes specified in RSCUSTA2 cannot exceed the number of dialog processes available on any one application server.

ODS activation uses fewer than the number of parallel work processes specified in RSCUSTA2. The result is that ODS activation takes longer than expected.
Appendix A Bill of Materials (BOM)

The following BOM describes the specific environment utilized in testing and is included for reference purposes.

Production and Application Server Configuration (x4 servers)
Qty  Part No.     Description
1    407659-001   HP DL585 O2.6 DC 2P PC3200 US Svr
2    407661-B21   HP O885 2.6 PC3200 DC DL585 Opt Kit
8    379300-B21   HP 4-GB Reg PC3200 2x2-GB Memory
4    347708-B22   HP 146-GB 15K RPM U320 Univ Hard Drive
1    383975-B21   HP 8X Slim DVD+RW Drive
1    273915-B21   HP Smart Array 6402/128-MB Controller
1    313881-B21   HP NC7170 DP PCI-X 1000T Gb Svr Adapter
1    A7387A       2-GB PCI-X 64-bit 133-MHz Dual Channel

Load/Test and Storage Server Configuration (x2 servers)
Qty  Part No.     Description
1    397299-001   HP DL585R01 O2.8 2P PC3200 Server
4    379300-B21   HP 4-GB Reg PC3200 2x2-GB Memory
2    347708-B22   HP 146-GB 15K RPM U320 Univ Hard Drive
1    383975-B21   HP 8X Slim DVD+RW Drive
1    A7387A       2-GB PCI-X 64-bit 133-MHz Dual Channel
1    U2426A       Microsoft and Novell OE

EVA8000 Configuration
Qty  Part No.     Description
1    AD522A       HP EVA8000 2C12D 60-Hz 42U Cabinet
168  364621-B23   HP StorageWorks 146-GB 15K FC HDD
8    221692-B21   StorageWorks LC/LC 2m Cable
1    T4256C       HP EVA4000/6000/8000 5.1 Controller Media Kit
1    T3724C       HP Command View EVA v5.0 Media Kit
1    T3732A       HP Command View EVA5000/8000 Unlim use per EVA LTU

Network Configuration
Qty  Part No.     Description
2    A7393A       HP StorageWorks SAN Switch 4/32
32   A6515A       HP Short Wave Optical Transceivers
1    J4904A       ProCurve 2800 series switch (2848)
Appendix B SAP Notes

The following SAP Notes may prove helpful in setting up SAP BW 3.5 or SQL Server 2005.

130253  General tips on uploading transaction data to BW
192658  Setting basis parameters for BW Systems
74141   Resource Management for tRFC and aRFC
88416   Zero administration memory management as of 4.0A/Windows
799058  Setting up Microsoft SQL Server 2005
567745  Composite note BW 3.x performance: DB-specific settings
567747  Oracle Statistics for RFC Tables with Oracle 10g
417307  Extractor package size: Collective note for applications
603050  Activation terminates as not all requests are GREEN
634458  ODS object: Activation fails DEADLOCK
790249  ODS: Activation of data in an ODS takes a long time
Appendix C Reference materials

Schroder, Thomas. SAP BW Performance Optimization Guide. Bonn: SAP Press, 2006.

Thomas, Juergen. SAP with Microsoft SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalability. SQL Server Technical Article. June 2006.

How to: Enable the Lock Pages in Memory Option (Windows). SQL Server 2005 Books Online. April 2006.

HP StorageWorks 4000/6000/8000 Enterprise Virtual Array configuration best practices white paper. Hewlett-Packard Company. October 2005.

HP StorageWorks Enterprise Virtual Array configuration guide for mySAP Business Suite white paper. For the EVA3000/5000 and EVA4000/6000/8000, 5th Edition. Hewlett-Packard Company. August 2005.
For more information

HP StorageWorks Customer Focused Testing: http://www.hp.com/go/hpcft
HP SAP Solutions: http://www.hp.com/go/sap
HP StorageWorks: http://www.hp.com/go/storage
HP ProLiant servers: http://www.hp.com/go/proliant

(c) 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.

4AA1-3428ENW, June 2007