An Introduction to System Sizing for Data Warehousing Workloads


An Example Sizing Method for Optimal System Performance

Tony Petrossian, pSeries Performance, IBM, Austin
Ann Matzou, Cluster Software Performance, IBM, Poughkeepsie
Kwai Wong, DB2 Performance, IBM, Toronto

First Edition, March 31, 2004

Note

Copyright 2004 International Business Machines Corporation. All rights reserved.

TPC Benchmark, TPC-D, TPC-H, TPC-R, QppD, QppH, QppR, QthD, QthH, QthR and QphD, QphH, QphR are trademarks of the Transaction Processing Performance Council.

The following terms are trademarks or registered trademarks of IBM in the United States and/or other countries: IBM, AIX, DB2, DB2 Universal Database, PowerPC Architecture, POWER, eServer, pSeries.

Performance results described in this paper were obtained under controlled conditions and may not be achievable under different conditions. All information is provided AS IS and no warranties or guarantees are expressed or implied by IBM. Actual system performance may vary and is dependent upon many factors including system hardware configuration and software design and configuration. All TPC-H results referenced are as of March 30, 2004.

Special Notices

This publication was produced in the United States. IBM may not offer the products, programs, services, or features discussed herein in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the products, programs, services, and features available in your area. Any reference to an IBM product, program, service, or feature is not intended to state or imply that only IBM's product, program, service, or feature may be used. Any functionally equivalent product, program, service, or feature that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program, service, or feature. Information in this presentation concerning non-IBM products was obtained from the suppliers of these products, published announcement material, or other publicly available sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM may have patents or pending patent applications covering subject matter in this presentation.
The furnishing of this presentation does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY USA.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your IBM local Branch Office or IBM Authorized Reseller for the full text of a specific Statement of General Direction.

The information contained in this presentation has not been submitted to any formal IBM test and is distributed AS IS. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. The use of this information or the implementation of any techniques described herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. Customers attempting to adapt these techniques to their own environments do so at their own risk.

The information contained in this document represents the current views of IBM on the issues discussed as of the date of publication. IBM cannot guarantee the accuracy of any information presented after the date of publication. IBM products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. Any performance data contained in this document was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Users of this presentation should verify the applicable data for their specific environment.

UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Limited. Other company, product and service names, which may be denoted by a double asterisk (**), may be trademarks or service marks of others.
This publication was developed for products and services offered in the United States. IBM may not offer the products, services, or features discussed in this document in other countries. Information is subject to change without notice. Consult your local IBM representative for information on offerings available in your area.

2004 International Business Machines Corporation, all rights reserved.

Contents

1 SIZING THE HARDWARE FOR A DATA WAREHOUSE
  Introduction
  The Common Misconception about Sizing
  Understanding the Sizing Problems for a Data Warehouse
  Accuracy Goal in Sizing Estimates
  Optimally Balanced System
  Overview of the Sizing Process
2 SIZING METHODOLOGY
  Choosing a System
  Understanding System Capabilities
    The eServer pSeries 655 System
    FAStT900 Storage Server
  Understanding the Workload
    Storage Space Requirements of the Workload
    Data Processing Characteristics of the Workload
    Operational Characteristics of the Workload
    Business Requirements
  Sizing the System for Effective Utilization
    Sizing the Storage
    Sizing for CPU and Memory
    Sizing for the Network
    The Overall System
  Conclusion
APPENDIX - 1 TPC-H BENCHMARK OVERVIEW
  TPC Council
  TPC-H Overview
  TPC-H Metrics
  Benchmark Evolution
  Performance Evolution

1 Sizing the Hardware for a Data Warehouse

1.1 Introduction

Today's successful enterprises have a significant interest in Business Intelligence (BI) and data warehousing (DW) because they rely on these tools to gain a better understanding of their business and to establish a competitive position within their marketplaces. When reviewing analyst reports and market research papers related to data warehousing and Business Intelligence applications, we can easily see the following common theme: data volumes are growing at unprecedented rates. To accommodate huge volumes of data, we must learn to build large data warehouses that can function successfully and provide an ever increasing return on investment. The need to effectively process large volumes of data requires abilities beyond just storing the data in a warehouse.

This paper introduces the reader to the various aspects of sizing a system for data warehousing. To bring awareness to the critical issues that are frequently ignored in sizing projects, a sizing example is provided that estimates the hardware requirements of a recently published data warehousing benchmark.

The Common Misconception about Sizing

One of the key elements contributing to the success of a data warehouse project is the hardware that stores, processes, and facilitates the movement of the data. Obviously, a large warehouse requires a large storage capacity, but the challenges of building a successful data warehouse are not limited to amassing a huge storage complex. Unfortunately, many system sizing exercises put too much emphasis on the capacity and function of the storage without considering the overall IO subsystem and the balance of system resources needed to make efficient use of the storage investment.
The ability to attach a large storage complex to a system does not mean that the system is appropriately equipped to process the large volumes of data within a reasonable window of time.

Understanding the Sizing Problems for a Data Warehouse

Sizing a system for a new data warehouse project without any experimental data can be a daunting task. Unlike more traditional OLTP workloads, only a small portion of common performance information can be used to size different data warehouse systems and applications. The majority of OLTP workloads tend to have well understood units of work per transaction. As a result, resource requirements can be scaled using transaction rates and the number of users. In contrast, a unit of work in a data warehouse application is variable and mostly unrelated to the data size. This variability makes it difficult to compare resource utilizations of different DW applications when estimating system requirements.

Many existing DW installations have created a science of capacity planning and measuring the resource utilization of their workloads. Unfortunately, most of this information is unavailable or inapplicable to a new installation. The ad hoc nature of a DW workload makes it difficult to compare different systems.

Estimating CPU requirements for data processing in a data warehouse is a complex task. The CPU required to process 100MB of data can vary widely depending on the complexity of the queries. In order to build a knowledge base from which to estimate the processing requirements for a specific warehouse workload, be prepared to experiment, use benchmarks, seek expert opinions, and even guess. Understanding these sizing problems helps in building flexible configurations that allow future refinements.

Accuracy Goal in Sizing Estimates

An overly complex sizing methodology that requires massive amounts of estimated input will most likely produce a false sense of accuracy without necessarily producing a better system sizing estimate. The goal should be to produce a sizing estimate for a flexible system configuration with room for minor adjustments in the future. The system should have a good balance of resources that scale proportionally.

It is important to remember that the outcome of any sizing methodology is an estimate; although the accuracy can be improved, it will never reach one hundred percent. It is critical to recognize the point of diminishing returns when going through a sizing process. Somewhere between knowing only the size of the data and understanding the resource requirements of all possible queries lies the point at which a reasonable sizing estimate can be made. Each sizing effort should include an accuracy goal and a margin of error based on the level of existing knowledge of the application. Like any other business decision, this task requires risk calculation and contingency planning.
An alternative to a sizing estimate is to run a custom benchmark using the specific application and data. Even when feasible, such efforts are usually very expensive. In most cases, an application is built after the hardware infrastructure is installed, so benchmarking the application before buying hardware is not possible.

Optimally Balanced System

Regardless of the methodology used to establish sizing estimates for data warehouse workloads, the outcome should always be a system with balanced resources that can be used efficiently. A well balanced configuration should be capable of maximizing one or more of the most expensive system resources at any time. Quite often, poorly configured systems leave expensive processing power idle due to an inadequate IO subsystem.

Data warehousing workloads present additional challenges that are not seen in traditional OLTP systems. The volume of data moved between storage and CPU for any given OLTP transaction

is very small. The aggregate data movement for a large OLTP system is minuscule when compared with data warehousing systems. The balance between system CPU power, storage capacity, and the IO subsystem is critical when building data warehousing systems. The IO subsystem connects the storage to the CPU and accommodates the movement of data between the two components.

Figure 1: Data Movement in Systems (CPU and memory connect through the IO bus to the IO subsystem, which links to networks via the network interconnect and to storage via the storage interconnect)

Due to the high volume of data moving through data warehousing systems, special consideration should be given to the IO subsystem for this type of workload.

Overview of the Sizing Process

To effectively size a system for optimal performance, architecting a solution requires the following steps:

- Understand system performance capabilities and limits
- Understand the workload and resource usage characteristics
- Establish business requirements and system expectations
- Size the system for optimal use of its resources

With some analytical work, a reasonable configuration that meets the business requirements can be estimated. As the quality of the data increases, sizing estimates become more accurate.
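The notion of a balanced system can be made concrete with a back-of-the-envelope check: during scan-heavy queries, the IO subsystem should be able to feed data to the processors at least as fast as they can consume it. The sketch below is illustrative only; the per-CPU scan rate and the IO bandwidth figures are hypothetical assumptions, not measured values for any system in this paper.

```python
# Sanity check for system balance: sustained IO bandwidth should keep
# every CPU busy during scan-bound queries. All figures below are
# illustrative assumptions.

def required_io_bandwidth_mb(num_cpus, mb_per_sec_per_cpu):
    """IO bandwidth (MB/s) needed so scan-bound queries keep all CPUs fed."""
    return num_cpus * mb_per_sec_per_cpu

def is_balanced(io_bandwidth_mb, num_cpus, mb_per_sec_per_cpu):
    """True if the IO subsystem can sustain the CPUs' aggregate scan rate."""
    return io_bandwidth_mb >= required_io_bandwidth_mb(num_cpus, mb_per_sec_per_cpu)

# Example: 8 CPUs, each assumed to filter ~90MB/s of table data,
# attached to storage sustaining 600MB/s.
need = required_io_bandwidth_mb(8, 90)
print(need, is_balanced(600, 8, 90))  # 720 False -> the IO subsystem is the bottleneck
```

The same check run the other way (more IO bandwidth than the CPUs can absorb) flags a system that leaves expensive storage bandwidth idle instead.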

2 Sizing Methodology

This section introduces the sizing methodology using a sample sizing effort. The workload used is a benchmark from the Transaction Processing Performance Council (TPC) for data warehousing. The TPC Benchmark H (TPC-H) is a well recognized data warehousing benchmark and its detailed description is publicly available. Appendix - 1 contains more information about the TPC and the TPC-H benchmark. The following diagram illustrates the methodology used.

Figure 2: System Sizing Methodology (data collection of workload characteristics, product data, and business requirements feeds a sizing knowledge base of assumptions, facts, and requirements, which drives the sizing process and produces the sizing estimate)

This example characterizes the behavior of a data warehouse workload and sets specific performance goals for achieving the business objective. Performance data sheets on the IBM eServer pSeries 655 (p655) and the IBM TotalStorage FAStT900 Storage Server were used to establish a system sizing estimate to meet the goals. Although not part of this example, the data was used to run and publish a TPC-H benchmark that validated our work. More details on this benchmark result can be found on the TPC web site.

The following sections describe the steps required to collect data and size the system.

2.1 Choosing a System

Selecting a vendor, a product line, and a system for a new project is a complicated process beyond the scope of this paper. It should be noted that most selection processes are influenced by organizational preferences, historical purchasing patterns, and other non-technical issues.

Regardless of the reasoning, the selection team is responsible for ensuring that a selected system is capable of meeting the technical requirements of the workload and providing the return on investment sought by the business. For this project the following products were selected:

- Clustered configuration of IBM eServer pSeries 655 systems
- IBM FAStT900 Storage Server
- IBM DB2 UDB

The choice of products was influenced by the project requirements, as well as the desire to highlight these products.

2.2 Understanding System Capabilities

The eServer pSeries 655 System

The pSeries 655 Central Electronics Complex (CEC) is a 4U tall, 24-inch half drawer, rack-mounted device. It houses the system processors, memory, system support processor, I/O drawer connection capability, and associated components. The p655 server includes the latest IBM POWER4+ chip technology in a building-block approach to the requirements of high-performance, clustered technical and commercial computing. With the speed advantages provided by the powerful 1.7GHz POWER4+ processor and its associated system architecture, a fast system bus, extremely high memory bandwidth, and robust input/output (I/O) subsystems, the pSeries 655 (p655) provides a versatile solution to the most demanding client requirements.

Figure 3: p655 CEC (the MCM and its four memory slots connect through GX buses to a GX to RIO-2 bridge; one RIO-2 bus serves an external I/O drawer and the other, through a RIO-2 to PCI-X bridge, serves internal PCI devices and PCI slots)

The following sections describe the major components of the p655. For general description and configuration information about the p655, refer to the IBM web site.

Processing Power

The p655 system is powered by a single multi-chip processor module. A Multi-Chip Module (MCM) has either four or eight 1.7GHz POWER4+ processor cores. Each processor core contains 32KB of data cache and 64KB of instruction cache. Each processor chip has a 1.5MB L2 cache onboard that operates at chip frequency. On the 8-way MCM, the two cores on each processor chip share that chip's L2 cache, while on the 4-way MCM each core has a dedicated L2 cache. A 32MB L3 cache is located between each processor chip and main memory and operates at one-third of the chip frequency. For more detailed information on the p655 configuration, refer to the IBM white paper.

Memory Configuration

The p655 system has four memory slots which allow from 4GB to 64GB of memory to be installed. Memory cards are available in 4GB, 8GB, and 16GB sizes. The following table shows possible memory configurations.

Total memory   Slot 1   Slot 2   Slot 3   Slot 4
4GB            4GB      -        -        -
8GB            4GB      4GB      -        -
16GB           4GB      4GB      4GB      4GB
16GB           8GB      8GB      -        -
32GB           8GB      8GB      8GB      8GB
32GB           16GB     16GB     -        -
64GB           16GB     16GB     16GB     16GB

Table 1: System Memory Configuration Options

IO Subsystem

The p655 has two RIO-2 (Remote I/O) buses. The first RIO-2 bus supports the service processor, two Ethernet ports, an integrated SCSI adapter, and three hot-plug/blind-swap PCI-X slots on the system board (see Figure 3). The second RIO-2 bus can be connected to an I/O drawer for additional IO adapter slots and performance. The p655 supports a maximum of one I/O drawer with two RIO-2 ports. The IO drawer contains two PCI I/O planars. Each planar has three PCI Host Buses (PHB); the first PHB has four 64-bit (133MHz) PCI slots, and the second and third each have three 64-bit (133MHz) PCI slots. Figure 4 shows the detailed configuration of the I/O drawer connected to the RIO-2 bus.

Figure 4: IO Drawer Configuration (the RIO-2 hub sustains 2100MB/s duplex toward the CEC and 1050MB/s duplex on each active RIO-2 link to the drawer's two RIO-2 to PCI-X bridges, with one link in passive/failover mode; each 64-bit PHB sustains 600MB/s to its PCI-PCI bridges)

FAStT900 Storage Server

The FAStT900 Storage Server is a member of the IBM FAStT family of disk storage products. The FAStT900 is an enterprise-class storage server designed to provide performance and flexibility for data-intensive computing environments. The FAStT900 Storage Server has four host-side FC interfaces which provide an extremely high IO bandwidth, and four drive-side interfaces which accommodate a very large storage capacity. It offers up to 32TB of Fibre Channel disk capacity using 18.2, 36.4, 73.4, and 146.8GB drives with EXP700 disk drive enclosures. Dual controllers with mirrored cache in the FAStT900 provide the RAID functions necessary to protect data from disk failures.

A FAStT900 can be connected through SAN switches or attached directly to the host. The FAStT900 Storage Server sustains an enormous IO rate with a mixture of read and write operations. When performing sequential IO operations, a FAStT900 can saturate the four 2Gb FC host interfaces and deliver more than 720MB per second of IO to the system. For more information about the FAStT900 features and performance, refer to the IBM web site.
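The 720MB/s sustained figure above makes it straightforward to estimate how many FAStT900 servers a target aggregate scan rate implies. A minimal sketch; the 2,800MB/s target used in the example is a hypothetical assumption, not a requirement from this paper.

```python
import math

# Number of FAStT900 servers needed to sustain a target sequential scan
# rate. The 720MB/s per-server figure comes from the text above; the
# target rate is an illustrative assumption.

FASTT900_SEQ_MB_PER_SEC = 720  # sustained sequential IO per FAStT900

def storage_servers_needed(target_mb_per_sec):
    """Round up, since a fraction of a storage server cannot be installed."""
    return math.ceil(target_mb_per_sec / FASTT900_SEQ_MB_PER_SEC)

print(storage_servers_needed(2800))  # a 2,800MB/s scan target -> 4 servers
```

In practice the count would also be checked against capacity requirements and host-side adapter slots, so bandwidth alone rarely settles the configuration.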

2.3 Understanding the Workload

It should be mentioned that the authors have experience with the TPC-H workload from previous projects. To characterize a workload for sizing a system, it is useful to have experience in data warehousing, with a good understanding of the specific database products and the targeted business environment. There are two major workload related areas of concern in sizing a system for data warehousing projects:

- The storage of the data warehouse
- The processing of the data

Both storage and processing requirements have an impact on all system components. Each will be considered separately.

Storage Space Requirements of the Workload

Estimating the disk space to store data is the simpler aspect of system sizing, so many sizing efforts put most of the emphasis on this task. For this example, the assumption was to have 1,000GB of raw data. Most DW projects can easily calculate the raw data size based on information provided by the data sources. For example, when the data is extracted from an existing transactional system, its size is either known or easy to estimate. The various components requiring storage space are:

- Table data storage
- Index data storage
- Database log space
- Temporary space required by the database
- Staging space required to store raw data

Once the raw data size is established, it is necessary to estimate the database space requirement for storing the data in tables using the internal format of the database. For this, the schema, page size, and row density per page are required. A database administrator and a data architect must be involved in this process. Most database vendors provide accurate estimates once the schema and data size are known. For this example, the information provided in the DB2 Administration Guide: Planning document was used.
This manual has a specific section that can help estimate table, index, log, and temporary space requirements for the schema.

Table Data Storage

For each table in the database, the following information was used to calculate the space requirements for the base tables that hold the warehouse data:

- Raw data size
- Data row size
- Number of rows

- Page size
- Rows per page, including page overhead
- Free slots per page
- Free space for future page additions

Considering the above items, the space requirement for storing all the base tables was estimated to be 1,350GB. Most database vendors provide ample documentation to help calculate the database table space requirement for any given schema. The database product documentation should be consulted for information on estimating table space requirements.

Index Data Storage

For each index, the following information was used to calculate the space requirements for the indices in the schema:

- Index size per row of the table
- Page size
- Index page overhead
- Rows per page, including the page overhead
- Free slots per page
- Free space for future page additions

Considering the above information, the space requirement for storing all the indices was estimated to be 258GB. Once again, the database product documentation should be consulted for information on estimating index space requirements.

Database Log Space

Most data warehouses have infrequent update activity in comparison with OLTP workloads. The data warehouse maintenance strategy will generally dictate the database log requirements. Some data warehouses are loaded periodically from operational data and are never modified between loads; these configurations have insignificant logging requirements. Other strategies involve regular maintenance of data that requires inserts and deletes from the database. In this configuration, regular updates to the data had to be accommodated, but the volume of data being changed was only 0.1% of the total warehouse, which adds up to 1GB per update cycle (see the TPC-H specification in Appendix - 1 for details). The following was taken into consideration:

- Data warehouse update and maintenance strategy
- Frequency of updates
- Volume of changing data per cycle
- Data recovery requirements
- Update cycles between log backups

- Transactional characteristics of the workload

The log space requirements were estimated to be insignificant in size relative to the data size of the warehouse. 36GB of space was allocated to the database logs to satisfy logging needs for at least twenty update cycles between log backups. Consult the DB2 Administration Guide: Planning for more details on sizing log requirements.

Temporary Database Space

When databases execute queries that process large join, sort, or aggregation operations, they require memory to hold intermediate or partially processed data. As the database reaches the limits of its allotted memory, it uses disk space to temporarily store the partially processed data and free up memory for further processing. Most data warehouse systems process more data than can be held in memory, and therefore need temporary database storage space. For example, a query that attempts to sort 300GB of data on a system with 16GB of memory will require significant temporary storage.

Estimating the temporary storage space requirements for a DW workload is difficult because several external factors, such as system memory size and concurrency levels, impact the need for space. It usually takes an experienced data warehousing architect, with help from database vendors, to estimate temporary space needs. Underestimating the temporary space requirements of a workload can prevent the proper execution of large queries or limit concurrency levels of query execution.
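The basic arithmetic behind such an estimate can be sketched simply: reserve the worst-case per-query spill space for every query stream that may run concurrently. This is a deliberate simplification of the considerations that follow; the 140GB and seven-stream figures mirror the example used in this paper.

```python
# Rough temporary-space estimate: worst-case spill per query times the
# number of query streams that may run concurrently. Real estimates also
# weigh workload mix, memory size, and future growth.

def temp_space_gb(worst_case_query_gb, concurrent_streams):
    """Minimum temporary space so the worst query can spill in every stream."""
    return worst_case_query_gb * concurrent_streams

print(temp_space_gb(140, 7))  # 980 -> rounded up to about 1,000GB
```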
The following information was considered when estimating our temporary storage needs:

- Percentage of data used in large sort and join queries
- Number of concurrent queries that can be running at any one time
- Number of concurrent query segments per query
- Previous experience with the workload's memory usage
- Expert guesses and rules of thumb available from product vendors
- Comparative estimates provided by the database vendor
- Future growth of data and increase of concurrency levels

Based on the above information, 140GB of temporary space was estimated for the worst case query, and since seven query streams could run concurrently, about 1,000GB of temporary space was needed.

Staging Space

Most data warehouse projects require some space for staging raw data for loading or maintaining the warehouse. Depending on the operational procedures, the space requirement can vary drastically. The following information was considered when estimating the staging space requirement:

- Data warehouse loading procedures

- Location and storage needs for the raw data
- Data warehouse maintenance procedures
- Location and storage needs for the maintenance data
- Future growth of data

Based on the operational needs to store update data and some load data, it was estimated that 1,500GB of space was sufficient for the project.

Minimum Storage Space Requirement

The storage space estimate represents the minimum amount of space required. Several factors impact the overall storage configuration, for instance:

- RAID requirements to protect the data
- Number of disks required to meet the performance requirements
- Number of disks needed to balance the IO performance

For example, suppose that the storage requirement can be satisfied with 11 disk drives, but the system has two disk controllers; it might be better to use six disks per controller to evenly distribute the IO across both controllers. Adjustments to the number of disks for performance reasons may result in having more space than the minimum required, but as always one must balance the performance needs and cost based on the project priorities. The following table shows the overall storage requirements for the configuration.

1,000GB TPC-H Warehouse Space Requirement
Data                          1,350GB
Index                           258GB
Database log                     36GB
Database temporary storage    1,000GB
Staging space                 1,500GB
Total Storage Space           4,144GB

Table 2: Minimum Storage Space Requirements

Data Processing Characteristics of the Workload

The TPC-H workload consists of a number of queries executed on the data. Although an infinite number of queries can be formulated in an ad hoc environment, most of these queries can be put into a few general categories. For example, queries with a significant number of arithmetic operations per row of data are CPU bound, while other queries that require simple filtering operations on large volumes of data become IO bound.
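Summing the individual estimates reproduces the minimum storage requirement in Table 2. The values below are the estimates developed in the preceding sections for the 1,000GB TPC-H warehouse.

```python
# Minimum storage requirement as the sum of the component estimates
# from the preceding sections (values in GB).

storage_gb = {
    "table data": 1350,
    "index data": 258,
    "database log": 36,
    "temporary space": 1000,
    "staging space": 1500,
}

total = sum(storage_gb.values())
print(total)  # 4144GB, before RAID overhead and per-controller disk balancing
```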
It is important to understand the resource needs of the different categories of queries and to size the system to accommodate as many categories as possible. The three major system resources that are stressed by a data warehousing workload are:

- CPU resources

- Memory resources
- IO resources
  - Network
  - Disk (sequential IO scans and random IO scans)

A well balanced data warehouse system maximizes one or more of the resources in the system. Unlike OLTP workloads, a single unit of DW work (a query) can be parallelized to maximize the utilization of the system and return results in a minimum amount of time. When additional concurrent queries are added, the database starts to distribute system resources among the active queries. The time to complete a query in a DW workload will vary from one execution to another depending on the mix of queries running at the same time.

Only an expert in data warehousing workloads, with help from the database vendor and a data architect, can analyze a schema and anticipate the resource needs of potential queries. Without the ability to experiment and run test queries, even experts can have a hard time gauging the CPU needs of a query. The process of categorizing the various queries starts with a careful examination of the database schema, data characteristics, and queries. The goal is to:

1. Estimate the size of the data accessed by each group of queries
2. Categorize the queries based on the dominant resource needs
3. Select some larger queries to represent these categories
4. Use these queries as a guide for system sizing

Estimate the Size of Each Query

Based on the predicates used in a query and knowledge of the data, the minimum amount of data each query would need to produce the query result can be anticipated. For example, the following query should access the entire LINEITEM table (see Appendix - 1 for details):

select sum(l_extendedprice*l_discount) as revenue
from lineitem

For this workload, the various queries were organized in three groups:

1. Small: Less than 15% of the data is needed to produce results
2. Medium: Between 15% and 50% of the data
3.
Large: More than 50% of the data

In estimating the data set size for queries, it was assumed that the database can be optimized using indices and other schemes to minimize data access to what is necessary to calculate and produce the query results. Different databases and schema definitions may behave differently for the same queries. For example, a missing index may force the database to execute a full table scan and result in significantly more access to data.
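The three size buckets defined above can be expressed as a small helper: the fraction of the warehouse a query must read determines its category. The 15% and 50% thresholds are the ones used in this paper; the example figures are hypothetical.

```python
# Classify a query by the fraction of the warehouse it must read,
# using the Small / Medium / Large thresholds defined in the text.

def query_size_category(data_read_gb, warehouse_gb):
    fraction = data_read_gb / warehouse_gb
    if fraction < 0.15:
        return "Small"
    elif fraction <= 0.50:
        return "Medium"
    return "Large"

# Example: against a 1,000GB warehouse.
print(query_size_category(600, 1000))  # Large
print(query_size_category(100, 1000))  # Small
```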

Since it was intended to use the largest queries for this characterization work, there was less concern about the ability of the database to optimize data access. The following chart shows the approximate minimum data size required to complete the six largest queries in the workload.

Figure 5: Query Size Estimates (approximate minimum data size, in GB, needed by each of the six largest queries)

The details of the size categorization for all the queries are gathered in Table 3, in the section Categorizing Queries.

Categorizing Queries

TPC-H queries are categorized with respect to the most dominant resource they use. To do this, the intensity of the processing applied to the data being read for the query is estimated. For example, the following segment of SQL code shows a query with a significant number of calculations for every row of data to be processed:

select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc
from lineitem
group by l_returnflag, l_linestatus

In contrast, the following query has modest processing requirements for each row being scanned. This query will run as fast as the data can be read by the database, and therefore it is IO bound.

select sum(l_extendedprice*l_discount) as revenue
from lineitem

A few of the diverse queries were compared and contrasted with each other and, based on the team's experience with DB2 query processing, categorized in the following table:

Query   Data Set Size   Memory   CPU      Sequential IO   Random IO   Network
21      Large           High     High     Low             Low         Low
9       Large           High     High     Low             Medium      High
17      Large           Low      Low      High            None        Low
19      Large           Low      Low      High            None        Low
1       Large           Low      Medium   Medium          None        Low
18      Large           High     High     Low             Medium      Low
7       Medium          High     High     Medium          Medium      High
5       Medium          High     High     Low             None        High
13      Medium          Medium   High     Low             Low         Low
11      Small           Medium   Low      High            None        Low
6       Small           Low      Low      High            None        Low
2       Small           Low      Medium   Medium          None        Low
22      Small           Low      High     Low             Low         Medium
16      Small           High     High     Low             Low         Medium
14      Small           Low      High     Low             Low         High
15      Small           Medium   High     Low             Low         High

Table 3: Query Categorization

Once the characteristics of the various queries are established, it can be seen that most of the queries have high CPU or sequential IO requirements. To balance the system, sufficient IO performance is needed to keep the CPU resources maximally utilized. As the table shows, the six largest queries can easily represent the entire query set with respect to resource needs: all the major system resource categories are maximized by one or more of them.

Selecting a Representative Set of Queries

For this characterization work, the assumption was that if the system is configured to meet the resource needs of the six largest queries, it can also provide for the smaller queries with similar characteristics. The following chart was built from the analysis of the six largest queries and shows the relation between the CPU and IO requirements of each query. Each bubble on the chart represents a query, and the size of the bubble is proportional to the volume of data required to obtain the query result.
The location of the bubble on the chart shows the IO versus CPU requirements of the query. When a query is CPU intensive, its IO requirements are lower than those of a query that requires little CPU power to process the same amount of data.

Figure 6: Relative Resource Requirements (relative CPU resource requirements plotted against relative IO throughput requirements)

Figure 6 can be used to make system sizing decisions based on this relative information. For example, if a system is configured with just enough IO bandwidth to maximize CPU utilization during Query 1, then all queries to the right of Query 1 will be IO bound. It can also be concluded that configuring the system with more IO than is necessary for Query 17 will not provide any benefit. Since it is not possible to configure the system optimally for every possible query, this type of categorization can be used to optimize for the largest class of queries within a given budget.

The Reference Queries

The IO Sizing Query

Based on the information in the preceding sections and Figure 5, Query 17 reads 80% of the data to produce a result. Assuming this query is IO bound, the time to complete it equals the time it takes to read the data: a system with 1GB per second of IO bandwidth would require 800 seconds to read the 800GB of data. Based on the business requirements, a reasonable response time for this class of queries can be set and the IO subsystem configured accordingly. To complete a given query in a fixed period of time, the system needs enough IO bandwidth to read the required data in that time and enough CPU power to keep up with the processing of the data.
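The arithmetic above generalizes to a one-line estimate. This sketch assumes the query is purely IO bound; the function name is illustrative:

```python
def io_bandwidth_needed(data_gb: float, target_seconds: float) -> float:
    """Minimum sustained scan bandwidth (GB/s) so that an IO-bound
    query reading data_gb completes within target_seconds."""
    return data_gb / target_seconds

# Query 17 reads roughly 80% of the 1000GB warehouse, i.e. 800GB.
# With the 200-second scan response-time target from the business
# requirements, the system needs:
print(io_bandwidth_needed(800, 200))  # 4.0 (GB/s)
```

The paper's eventual 5,000MB/sec scan-rate guideline sits comfortably above this bare minimum.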

The point representing Query 17 on the far right edge of the chart in Figure 6 indicates that this query has an insatiable appetite for reading data when compared to the other queries in the workload.

The CPU Sizing Query

Based on the preceding assessments and the information in Figure 6, Query 18 is the most CPU intensive query. Figure 6 also shows that if the IO requirements of Query 17 can be met, there will be no shortage of IO bandwidth for Query 18; Query 18 will be limited by CPU during its execution. The point representing Query 18 at the top left edge of the chart in Figure 6 indicates that this query has massive data processing requirements relative to all other queries in the workload.

Operational Characteristics of the Workload

The TPC-H benchmark has two execution models, the Power Test and the Throughput Test. These two modes of operation have very different system resource requirements, so the needs of the two must be balanced. This multi-mode operational requirement is also common in many DW installations, where, depending on the time of day or the class of users, system requirements can be very different. When sizing a system, all modes of operation must be considered and conflicting requirements prioritized.

The Power Test (Single-user Batch Jobs)

For this exercise the goal was to optimize single-user query processing (the Power Test) because of the benchmark requirements. The decision to optimize for single-stream query processing makes sense depending on the business needs for the system. If users are scheduled to have dedicated system time with the expectation of the fastest possible response time for the queries they submit, then the system has to be optimized for single-stream processing.
On the other hand, if the system is mostly used by multiple concurrent users who submit queries and are not sensitive to small differences in response time, then single-stream performance is less critical.

The Throughput Test (Multi-user Operation)

Considering that the queries that access large amounts of data are CPU intensive and have low IO rate requirements (the large bubbles near the top left of the chart in Figure 6), it can be assumed that running multiple queries at the same time would result in a CPU bound system. If a single stream of queries takes 1 CPU hour to complete, then 7 streams of similar queries could be expected to take about 7 hours of CPU time. This rule of thumb provides reasonable estimates; however, simultaneously running query streams sometimes benefit from sharing data caches, while at other times the reduced availability of memory results in resource conflicts. To be safe, a margin of error should be anticipated.
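The rule of thumb above can be written down directly. The 15% safety margin below is an assumed figure for illustration, not one taken from this paper:

```python
def multi_stream_cpu_hours(single_stream_hours: float, streams: int,
                           margin: float = 0.15) -> float:
    """Rule-of-thumb CPU time for concurrent streams of similar queries:
    roughly linear in the number of streams, padded by a safety margin
    to cover cache contention (the 15% default is an assumption)."""
    return single_stream_hours * streams * (1 + margin)

# One stream takes 1 CPU hour; seven concurrent streams:
print(multi_stream_cpu_hours(1.0, 7))  # roughly 8 CPU hours with margin
```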

If a system is mostly IO bound during the execution of single-stream queries, then more than one stream of queries is required to fully utilize the system resources. To get the best return on investment, the system should be configured to utilize CPU resources as fully as possible.

2.4 Business Requirements

The primary purpose of the data warehouse infrastructure is, obviously, to solve business problems, so it is critical to collect the business requirements applicable to the sizing effort and translate them into system requirements. For example, a business unit may require that the warehouse be reloaded from transactional data once per week and that the task complete in a six-hour window. This requirement must be translated into the read, write and processing rates needed to ensure the system has the appropriate capacity.

The following table lists the business requirements and expectations addressed in this sizing example. The list is by no means exhaustive, but it captures the most critical elements for this purpose.

Raw data size: 1000GB of raw text data. The raw data size is only the base number for calculating storage space requirements.

Annual growth rate of data: Less than 2%. Normal growth must be accommodated without major architectural changes to the system.

Tolerance for performance decline due to growth: Less than 2%. Slower response time for any operation can be tolerated if the percentage of degradation is less than or equal to that of the growth in data.

Service life expectancy: 3 years. The system is expected to operate for at least 3 years without major changes.

Raw data load rate: 145MB/sec. The 145MB per second load rate is derived from the need to load the 1000GB of data in less than two hours.
At this rate, a DBA will have enough time to rebuild the warehouse from scratch and prepare it for query processing in less than 4 hours.
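The 145MB/sec target can be sanity-checked against the 1000GB/two-hour requirement. This sketch (names illustrative) treats 1GB as 1000MB; the paper's 145MB/sec target includes headroom above the bare minimum:

```python
def load_rate_mb_per_sec(raw_data_gb: float, window_hours: float) -> float:
    """Sustained load rate needed to ingest raw_data_gb within the
    load window, treating 1GB as 1000MB."""
    return raw_data_gb * 1000 / (window_hours * 3600)

# 1000GB of raw data in a two-hour window:
print(round(load_rate_mb_per_sec(1000, 2)))  # 139 (MB/s); the paper
# rounds this requirement up to a 145MB/sec target
```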

Scan query response time (based on total data): Less than 200 seconds. This requirement is critical because a significant number of ad hoc queries in the workload perform scan and filter operations on all or part of the data. In addition, extract operations from the warehouse are bound by the scan rate of the system, and the response time of several large and small queries with simple calculations is governed by the scan rate. Query 17 is our guide for this criterion.

Reporting and computational query response time (based on total data): Less than 900 seconds. The workload has several queries that run frequently to generate reports. These queries require several complicated computations and sort operations on all or part of the data. The intent of this requirement is to ensure the worst-case report will complete in a time the business considers reasonable. We also intend to run multiple simultaneous queries and want that work to complete in a reasonable time. Query 18 is our guide for sizing the system to meet this requirement.

Query concurrency rate: 7 query streams. The workload requires that at least 7 query streams can operate concurrently at any one time. In addition, query response times must not degrade by more than a factor of 7 to 8 when compared to single-stream execution.

Performance versus cost optimization priority: Performance. In this exercise, reaching the specific performance targets had the highest priority. Although the overall hardware cost had a budget ceiling, price/performance was secondary to overall performance. The goal was to achieve the performance target with an optimal configuration and price within the budget guidelines.

Data protection: RAID level 5. The workload requires RAID protection to prevent a single disk failure from disabling the system. RAID level 5 is most appropriate here because it provides protection with reasonable overhead: a 5+P RAID level 5 configuration requires 5 disks for data and one disk for parity, a 17% disk overhead.

Table 4: Business Requirements

The above table maps the business requirements to a set of checklist items for the system. When sizing the system, this table should be used as a boundary guide to ensure the business needs are met.

2.5 Sizing the System for Effective Utilization

In data warehouse workloads, it is difficult to always maximize the utilization of CPU, IO or both. In many situations, factors such as limited memory or network resources can leave the CPU or IO subsystems idle. In addition, all databases occasionally have execution paths that fail to maximize system utilization. The goal is to configure a system that meets the workload requirements and runs as close to the system utilization limit as possible. Saturating the processors and maximizing the IO subsystem during most of the operational hours of the system will provide the best return on investment.

Sizing the Storage

After collecting all the relevant data from the workload characterization, business requirements, and system data sheets, the information can be compiled into a single set of guidelines for sizing the storage configuration:

- RAID level 5 configuration using 5+P settings
- 145MB/sec load rate
- 5,000MB/sec scan rate, based on the Query 17 IO profile and the 200-second limit (see Table 4) on scan query response time
- Approximately 700MB/sec IO rate per FAStT900 (see section 2.2.2)
- 4 Fibre Channel interfaces per FAStT900
- 14-disk capacity per EXP700 expansion unit
- Disk size of choice: 36GB
- Minimum storage space needed: 4,144GB
- Minimum number of disks: 122 (4,144 / 34 = 122)
- Number of FAStT900 needed to meet the 5,000MB/sec IO rate: 8 (5,000 / 700 = 8)
- Number of FAStT900 needed to meet the 145MB/sec load rate: 1

Number of EXP700 needed to fully utilize all FAStT900 disk-side interfaces: 16 (2 per FAStT900).

Since 122 disks do not distribute evenly among 16 EXP700, the count of 7.6 disks per EXP700 is rounded up to 8. Additionally, for a 5+P RAID level 5 configuration the number of data disks in each EXP700 should be divisible by 5, so the disk count per EXP700 is rounded up again to 10. For every 5 data disks a parity disk is added, bringing the total to 12 disks per EXP700. The following diagram shows the logical configuration of a FAStT900 with the enclosures and disks attached:

Figure 7: FAStT900 Disk Configuration (a dual-controller FAStT900 connected through Fibre Channel HBAs to two EXP700 enclosures holding the 5+P RAID-5 arrays)

With 12 disks per EXP700, 2 EXP700 per FAStT900 and 8 FAStT900, the total number of disks is 192. This is significantly more than the initial requirement of 122 disks, but the configuration provides a balanced, evenly distributed load with RAID level 5 protection that meets the performance requirement.

The random read and write requirements of this workload are relatively low compared to the rest of its IO needs. Given the number of disks and FAStT900 servers, the random IO requirements of the workload are easily met.

Although a single FAStT900 could easily accommodate the space needs of the workload, it cannot possibly meet the performance requirements. Space should never be the only determining factor in storage configuration.

Based on the storage sizing, the system has to accommodate 8 storage servers, each with 4 Fibre Channel interfaces. A total of 32 Fibre Channel host bus adapters is required to match the performance of the storage subsystem.
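The disk and enclosure arithmetic above can be reproduced as a short sketch. Function and parameter names are illustrative; the figures are the ones worked through in this section:

```python
import math

def storage_layout(space_gb, usable_per_disk_gb, scan_mb_s, mb_s_per_array,
                   exp_per_array=2, raid_group_data_disks=5):
    """Reproduce this section's storage sizing arithmetic.

    Returns (storage servers, enclosures, disks per enclosure, total disks).
    """
    min_disks = math.ceil(space_gb / usable_per_disk_gb)     # 122
    arrays = math.ceil(scan_mb_s / mb_s_per_array)           # FAStT900 count
    enclosures = arrays * exp_per_array                      # EXP700 count
    per_enc = math.ceil(min_disks / enclosures)              # 7.6 -> 8
    # Round data disks up to a multiple of the 5+P group size,
    # then add one parity disk per group of five:
    data = math.ceil(per_enc / raid_group_data_disks) * raid_group_data_disks
    total_per_enc = data + data // raid_group_data_disks     # 10 + 2 = 12
    return arrays, enclosures, total_per_enc, total_per_enc * enclosures

# 4,144GB usable space on 34GB-usable disks, 5,000MB/s scan target,
# 700MB/s per FAStT900:
print(storage_layout(4144, 34, 5000, 700))  # (8, 16, 12, 192)
```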

Each p655 can easily provide 1400MB/sec of IO (see the system descriptions earlier in this paper). From this, at least four p655 nodes are required to satisfy the 5000MB/sec IO needs of the workload. Depending on the CPU and memory needs of the workload, the storage can be connected to 8 or 16 nodes to satisfy the IO bandwidth requirements. Given the flexibility of the FAStT900, any size of system can be connected that evenly distributes access to the 32 HBAs across all processors, providing 175MB/sec of bandwidth per HBA. This can be a single node with any number of CPUs or a multi-node cluster with an aggregate IO bandwidth of 5000MB/sec. The following table sums up the storage configuration:

Storage Configuration
- FAStT900 storage servers: 8
- EXP700 enclosures: 16
- 36GB disk drives: 192
- 2Gb Fibre Channel host bus adapters: 32

Table 5: Storage Configuration

Sizing for CPU and Memory

Sizing for CPU and memory requires extensive experience and significant knowledge of the workload. For this project, the experience of past testing with specific queries, and knowledge of the relative performance of older systems compared to the targeted systems, was beneficial. In addition to drawing on past experience, the sizing team needs to run experiments and make educated guesses.

To estimate processor requirements for CPU intensive queries, a small one-processor system can be set up with a fraction of the data to measure query performance. The test system can use any processor as long as its performance can be related to the targeted system; for example, if it is known that the test system's processor is ten times slower than the targeted system's, reasonable estimates can be made. The test system is used to measure the time it takes the worst-case query to process a fixed amount of data. If the test system completes a 1GB query in 50 seconds, then a processor that is ten times faster can complete the work in 5 seconds.
To process 1000GB in 500 seconds, 10 of the new processors are needed. This type of estimate can only be applied to simple compute-intensive queries that are known to scale and that have stable query plans regardless of data size. Complex multi-table join queries are not good candidates for simple CPU testing because most databases apply query optimizations that may behave differently depending on data size.

The memory available to the test system must also be limited proportionally. It would be unreasonable to allow the 1GB test query to consume 1GB of memory unless the intent was to configure the target system with 1000GB of memory. To simplify the estimates, it is preferable to establish a fixed ratio between memory size and data size during testing. For this testing, it was established that at least 100MB of database memory was needed for each 1GB of data.
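The processor and memory estimates can be sketched as follows. Function names are illustrative; the 10x speedup, 50-second test result, and 100MB-per-GB memory ratio are the figures used in this section:

```python
import math

def cpus_needed(test_secs_per_gb: float, speedup: float,
                data_gb: float, target_secs: float) -> int:
    """Scale a single-processor test result to the target system:
    a processor `speedup` times faster handles each GB of data in
    test_secs_per_gb / speedup seconds; divide the total work by
    the response-time window to get the processor count."""
    target_secs_per_gb = test_secs_per_gb / speedup
    return math.ceil(data_gb * target_secs_per_gb / target_secs)

def db_memory_gb(data_gb: float, mb_per_gb_of_data: float = 100) -> float:
    """Database memory using the fixed 100MB-per-1GB-of-data ratio."""
    return data_gb * mb_per_gb_of_data / 1000

# 1GB in 50 seconds on the test box, target CPU 10x faster,
# 1000GB to process within 500 seconds:
print(cpus_needed(50, 10, 1000, 500))  # 10 processors
print(db_memory_gb(1000))              # 100.0 GB of database memory
```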


More information

Setting a new standard

Setting a new standard Changing the UNIX landscape IBM pseries 690 nearly doubling the power of the pseries 680, previously the most powerful pseries server available. 2 The powerful IBM ^ pseries 690 Highlights Datacenter-class

More information

EMC Unified Storage for Microsoft SQL Server 2008

EMC Unified Storage for Microsoft SQL Server 2008 EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information

More information

enabling Ultra-High Bandwidth Scalable SSDs with HLnand

enabling Ultra-High Bandwidth Scalable SSDs with HLnand www.hlnand.com enabling Ultra-High Bandwidth Scalable SSDs with HLnand May 2013 2 Enabling Ultra-High Bandwidth Scalable SSDs with HLNAND INTRODUCTION Solid State Drives (SSDs) are available in a wide

More information

Express5800 Scalable Enterprise Server Reference Architecture. For NEC PCIe SSD Appliance for Microsoft SQL Server

Express5800 Scalable Enterprise Server Reference Architecture. For NEC PCIe SSD Appliance for Microsoft SQL Server Express5800 Scalable Enterprise Server Reference Architecture For NEC PCIe SSD Appliance for Microsoft SQL Server An appliance that significantly improves performance of enterprise systems and large-scale

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

What is RAID--BASICS? Mylex RAID Primer. A simple guide to understanding RAID

What is RAID--BASICS? Mylex RAID Primer. A simple guide to understanding RAID What is RAID--BASICS? Mylex RAID Primer A simple guide to understanding RAID Let's look at a hard disk... Several platters stacked on top of each other with a little space in between. One to n platters

More information

DEPLOYING IBM DB2 FOR LINUX, UNIX, AND WINDOWS DATA WAREHOUSES ON EMC STORAGE ARRAYS

DEPLOYING IBM DB2 FOR LINUX, UNIX, AND WINDOWS DATA WAREHOUSES ON EMC STORAGE ARRAYS White Paper DEPLOYING IBM DB2 FOR LINUX, UNIX, AND WINDOWS DATA WAREHOUSES ON EMC STORAGE ARRAYS Abstract This white paper provides an overview of key components, criteria, and requirements for deploying

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Q & A From Hitachi Data Systems WebTech Presentation:

Q & A From Hitachi Data Systems WebTech Presentation: Q & A From Hitachi Data Systems WebTech Presentation: RAID Concepts 1. Is the chunk size the same for all Hitachi Data Systems storage systems, i.e., Adaptable Modular Systems, Network Storage Controller,

More information

BACKUP BENCHMARKING OF VERY L ARGE MICROSOFT SQL S ERVER 7.0 D ATABASES DURING ACTIVE ONLINE T RANSACTION PERFORMANCE L OADING ON COMPAQ HARDWARE

BACKUP BENCHMARKING OF VERY L ARGE MICROSOFT SQL S ERVER 7.0 D ATABASES DURING ACTIVE ONLINE T RANSACTION PERFORMANCE L OADING ON COMPAQ HARDWARE BACKUP BENCHMARKING OF VERY L ARGE MICROSOFT SQL S ERVER 7.0 D ATABASES DURING ACTIVE ONLINE T RANSACTION PERFORMANCE L OADING ON COMPAQ HARDWARE By Torrey Russell Backup & Disaster Recovery for Windows

More information

Minimize cost and risk for data warehousing

Minimize cost and risk for data warehousing SYSTEM X SERVERS SOLUTION BRIEF Minimize cost and risk for data warehousing Microsoft Data Warehouse Fast Track for SQL Server 2014 on System x3850 X6 (55TB) Highlights Improve time to value for your data

More information

An Oracle White Paper September 2011. Oracle Exadata Database Machine - Backup & Recovery Sizing: Tape Backups

An Oracle White Paper September 2011. Oracle Exadata Database Machine - Backup & Recovery Sizing: Tape Backups An Oracle White Paper September 2011 Oracle Exadata Database Machine - Backup & Recovery Sizing: Tape Backups Table of Contents Introduction... 3 Tape Backup Infrastructure Components... 4 Requirements...

More information

Accelerating Business Intelligence with Large-Scale System Memory

Accelerating Business Intelligence with Large-Scale System Memory Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness

More information

Emulex 8Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide

Emulex 8Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide Emulex 8Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide The Emulex 8Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter enables high-performance connection

More information

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation 1. Overview of NEC PCIe SSD Appliance for Microsoft SQL Server Page 2 NEC Corporation

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

QLogic 2500 Series FC HBAs Accelerate Application Performance

QLogic 2500 Series FC HBAs Accelerate Application Performance White Paper QLogic 2500 Series FC HBAs Accelerate Application Performance QLogic 8Gb HBAs: Planning for Future Requirements 8Gb Performance Meets the Needs of Next Generation Data Centers Key Findings

More information

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs

Dependable Systems. 9. Redundant arrays of. Prof. Dr. Miroslaw Malek. Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs Dependable Systems 9. Redundant arrays of inexpensive disks (RAID) Prof. Dr. Miroslaw Malek Wintersemester 2004/05 www.informatik.hu-berlin.de/rok/zs Redundant Arrays of Inexpensive Disks (RAID) RAID is

More information

High Performance Tier Implementation Guideline

High Performance Tier Implementation Guideline High Performance Tier Implementation Guideline A Dell Technical White Paper PowerVault MD32 and MD32i Storage Arrays THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS

More information

Best Practices RAID Implementations for Snap Servers and JBOD Expansion

Best Practices RAID Implementations for Snap Servers and JBOD Expansion STORAGE SOLUTIONS WHITE PAPER Best Practices RAID Implementations for Snap Servers and JBOD Expansion Contents Introduction...1 Planning for the End Result...1 Availability Considerations...1 Drive Reliability...2

More information

DELL TM PowerEdge TM T610 500 Mailbox Resiliency Exchange 2010 Storage Solution

DELL TM PowerEdge TM T610 500 Mailbox Resiliency Exchange 2010 Storage Solution DELL TM PowerEdge TM T610 500 Mailbox Resiliency Exchange 2010 Storage Solution Tested with: ESRP Storage Version 3.0 Tested Date: Content DELL TM PowerEdge TM T610... 1 500 Mailbox Resiliency

More information

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication September 2002 IBM Storage Products Division Raleigh, NC http://www.storage.ibm.com Table of contents Introduction... 3 Key

More information

HP Smart Array Controllers and basic RAID performance factors

HP Smart Array Controllers and basic RAID performance factors Technical white paper HP Smart Array Controllers and basic RAID performance factors Technology brief Table of contents Abstract 2 Benefits of drive arrays 2 Factors that affect performance 2 HP Smart Array

More information

The Methodology Behind the Dell SQL Server Advisor Tool

The Methodology Behind the Dell SQL Server Advisor Tool The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity

More information

Innovative technology for big data analytics

Innovative technology for big data analytics Technical white paper Innovative technology for big data analytics The HP Vertica Analytics Platform database provides price/performance, scalability, availability, and ease of administration Table of

More information

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER STORAGE CENTER DATASHEET STORAGE CENTER Go Beyond the Boundaries of Traditional Storage Systems Today s storage vendors promise to reduce the amount of time and money companies spend on storage but instead

More information

Windows 8 SMB 2.2 File Sharing Performance

Windows 8 SMB 2.2 File Sharing Performance Windows 8 SMB 2.2 File Sharing Performance Abstract This paper provides a preliminary analysis of the performance capabilities of the Server Message Block (SMB) 2.2 file sharing protocol with 10 gigabit

More information

MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

More information

RAID 5 rebuild performance in ProLiant

RAID 5 rebuild performance in ProLiant RAID 5 rebuild performance in ProLiant technology brief Abstract... 2 Overview of the RAID 5 rebuild process... 2 Estimating the mean-time-to-failure (MTTF)... 3 Factors affecting RAID 5 array rebuild

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

WHITE PAPER BRENT WELCH NOVEMBER

WHITE PAPER BRENT WELCH NOVEMBER BACKUP WHITE PAPER BRENT WELCH NOVEMBER 2006 WHITE PAPER: BACKUP TABLE OF CONTENTS Backup Overview 3 Background on Backup Applications 3 Backup Illustration 4 Media Agents & Keeping Tape Drives Busy 5

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

QLogic 4Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide

QLogic 4Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide QLogic 4Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter IBM BladeCenter at-a-glance guide The QLogic 4Gb Fibre Channel Expansion Card (CIOv) for BladeCenter enables you to quickly and simply

More information

Deploying Microsoft SQL Server 2005 Business Intelligence and Data Warehousing Solutions on Dell PowerEdge Servers and Dell PowerVault Storage

Deploying Microsoft SQL Server 2005 Business Intelligence and Data Warehousing Solutions on Dell PowerEdge Servers and Dell PowerVault Storage White Paper Dell Microsoft - Reference Configurations Deploying Microsoft SQL Server 2005 Business Intelligence and Data Warehousing Solutions on Dell PowerEdge Servers and Dell PowerVault Storage Abstract

More information

Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820

Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820 Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820 This white paper discusses the SQL server workload consolidation capabilities of Dell PowerEdge R820 using Virtualization.

More information

Redbooks Paper. Local versus Remote Database Access: A Performance Test. Victor Chao Leticia Cruz Nin Lei

Redbooks Paper. Local versus Remote Database Access: A Performance Test. Victor Chao Leticia Cruz Nin Lei Redbooks Paper Victor Chao Leticia Cruz Nin Lei Local versus Remote Database Access: A Performance Test When tuning a database for better performance, one area to examine is the proximity of the database

More information

James Serra Sr BI Architect JamesSerra3@gmail.com http://jamesserra.com/

James Serra Sr BI Architect JamesSerra3@gmail.com http://jamesserra.com/ James Serra Sr BI Architect JamesSerra3@gmail.com http://jamesserra.com/ Our Focus: Microsoft Pure-Play Data Warehousing & Business Intelligence Partner Our Customers: Our Reputation: "B.I. Voyage came

More information

QLogic 8Gb FC Single-port and Dual-port HBAs for IBM System x IBM System x at-a-glance guide

QLogic 8Gb FC Single-port and Dual-port HBAs for IBM System x IBM System x at-a-glance guide QLogic 8Gb FC Single-port and Dual-port HBAs for IBM System x IBM System x at-a-glance guide The QLogic 8Gb FC Single-port and Dual-port HBA for IBM System x are PCI Express 2.0 x8 8Gb Fibre Channel adapters

More information

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that

More information

Unprecedented Performance and Scalability Demonstrated For Meter Data Management:

Unprecedented Performance and Scalability Demonstrated For Meter Data Management: Unprecedented Performance and Scalability Demonstrated For Meter Data Management: Ten Million Meters Scalable to One Hundred Million Meters For Five Billion Daily Meter Readings Performance testing results

More information

Configuring RAID for Optimal Performance

Configuring RAID for Optimal Performance Configuring RAID for Optimal Performance Intel RAID Controller SRCSASJV Intel RAID Controller SRCSASRB Intel RAID Controller SRCSASBB8I Intel RAID Controller SRCSASLS4I Intel RAID Controller SRCSATAWB

More information

Accelerating Business Intelligence with Large-Scale System Memory

Accelerating Business Intelligence with Large-Scale System Memory Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness

More information

Accelerating Database Applications on Linux Servers

Accelerating Database Applications on Linux Servers White Paper Accelerating Database Applications on Linux Servers Introducing OCZ s LXL Software - Delivering a Data-Path Optimized Solution for Flash Acceleration Allon Cohen, PhD Yaron Klein Eli Ben Namer

More information

What s the best disk storage for my i5/os workload?

What s the best disk storage for my i5/os workload? What s the best disk storage for my i5/os workload? Sue Baker IBM System i Advanced Technical Support Agenda Storage management styles Storage technologies for i5/os Considerations for implementing and

More information

Evaluation Report: Database Acceleration with HP 3PAR StoreServ 7450 All-flash Storage

Evaluation Report: Database Acceleration with HP 3PAR StoreServ 7450 All-flash Storage Evaluation Report: Database Acceleration with HP 3PAR StoreServ 7450 All-flash Storage Evaluation report prepared under contract with HP Executive Summary Solid state storage is transforming the entire

More information

The Benefits of Virtualizing

The Benefits of Virtualizing T E C H N I C A L B R I E F The Benefits of Virtualizing Aciduisismodo Microsoft SQL Dolore Server Eolore in Dionseq Hitachi Storage Uatummy Environments Odolorem Vel Leveraging Microsoft Hyper-V By Heidi

More information

Microsoft SQL Server 2000 Index Defragmentation Best Practices

Microsoft SQL Server 2000 Index Defragmentation Best Practices Microsoft SQL Server 2000 Index Defragmentation Best Practices Author: Mike Ruthruff Microsoft Corporation February 2003 Summary: As Microsoft SQL Server 2000 maintains indexes to reflect updates to their

More information

IOmark-VM. DotHill AssuredSAN Pro 5000. Test Report: VM- 130816-a Test Report Date: 16, August 2013. www.iomark.org

IOmark-VM. DotHill AssuredSAN Pro 5000. Test Report: VM- 130816-a Test Report Date: 16, August 2013. www.iomark.org IOmark-VM DotHill AssuredSAN Pro 5000 Test Report: VM- 130816-a Test Report Date: 16, August 2013 Copyright 2010-2013 Evaluator Group, Inc. All rights reserved. IOmark-VM, IOmark-VDI, VDI-IOmark, and IOmark

More information

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays

The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays Executive Summary Microsoft SQL has evolved beyond serving simple workgroups to a platform delivering sophisticated

More information

Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments

Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments Applied Technology Abstract This white paper introduces EMC s latest groundbreaking technologies,

More information

Laserfiche Hardware Planning and Specifications. White Paper

Laserfiche Hardware Planning and Specifications. White Paper Laserfiche Hardware Planning and Specifications White Paper September 2012 Table of Contents Introduction... 3 Gathering System Requirements... 3 System Storage Calculations... 4 Evaluate Current State...

More information

BrightStor ARCserve Backup for Windows

BrightStor ARCserve Backup for Windows BrightStor ARCserve Backup for Windows Tape RAID Option Guide r11.5 D01183-1E This documentation and related computer software program (hereinafter referred to as the "Documentation") is for the end user's

More information

IOmark- VDI. HP HP ConvergedSystem 242- HC StoreVirtual Test Report: VDI- HC- 150427- b Test Report Date: 27, April 2015. www.iomark.

IOmark- VDI. HP HP ConvergedSystem 242- HC StoreVirtual Test Report: VDI- HC- 150427- b Test Report Date: 27, April 2015. www.iomark. IOmark- VDI HP HP ConvergedSystem 242- HC StoreVirtual Test Report: VDI- HC- 150427- b Test Copyright 2010-2014 Evaluator Group, Inc. All rights reserved. IOmark- VDI, IOmark- VM, VDI- IOmark, and IOmark

More information

Capacity Planning Process Estimating the load Initial configuration

Capacity Planning Process Estimating the load Initial configuration Capacity Planning Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting

More information