Capacity Planning Process Estimating the load Initial configuration


Capacity Planning

Any data warehouse solution will grow over time, sometimes quite dramatically. It is essential that the components of the solution (hardware, software, and database) are capable of supporting the extended sizes without unacceptable performance loss, or growth of the load window to a point where it affects the use of the system.

Process

The capacity plan for a data warehouse is defined within the technical blueprint stage of the process. The business requirements stage should have identified the approximate sizes for data, users, and any other issues that constrain system performance.

One of the most difficult decisions you will have to make about a data warehouse is the capacity required by the hardware. It is important to have a clear understanding of the usage profiles of all users of the data warehouse. For each user or group of users you need to know the following:

- the number of users in the group;
- whether they use ad hoc queries frequently;
- whether they use ad hoc queries occasionally at unknown intervals;
- whether they use ad hoc queries occasionally at regular and predictable times;
- the average size of query they tend to run;
- the maximum size of query they tend to run;
- the elapsed login time per day;
- the peak time of daily usage;
- the number of queries they run per peak hour;
- the number of queries they run per day.

These usage profiles will probably change over time, and need to be kept up to date. They are useful for growth predictions and capacity planning. The profiles in themselves are not enough; you also require an understanding of the business.

Estimating the load

When choosing the hardware for the data warehouse there are many things to consider, such as hardware architecture, resilience, and so on. The data warehouse will probably grow rapidly from its initial configuration, so it is not sufficient to consider the initial size of the data warehouse.
There are a number of different elements that need to be considered, but the decisions all come down to how much CPU, how much memory and how much disk you will need. If your sizing calls for more than the budget can afford, do not allow the required capacity to be chopped back. Instead, pare back some of the functionality and then re-estimate the capacity. In the following pages we shall attempt to outline the rules and guidelines that we follow when sizing a system for a data warehouse.

Initial configuration

When sizing the initial configuration you will have no history information or statistics to work with, and the sizing will need to be done on the predicted load. Estimating this load is difficult, because there is an ad hoc element to it.

All you can do is estimate the configuration based on the known requirements. This is why the business requirements phase is so important. When deciding on the initial configuration you will need to allow some contingency. This is particularly important in a data warehouse project, because the requirements are often difficult to pin down accurately, and the load can quickly vary from the expected. Otherwise, the sizing exercise is the same irrespective of the phase of the data warehouse that you are trying to size.

How much CPU bandwidth?

To start with you need to consider the distinct loads that will be placed on the system. There are many aspects to this, such as query load, data load, backup and so on, but essentially the load can be divided into two distinct phases:

- Daily processing
  - user query processing
- Overnight processing
  - data transformation and load
  - aggregation and index creation
  - backup

Daily processing

The daily processing is centered on the user queries. To estimate the CPU requirements you need to estimate the time that each query will take. As much of the query load will be ad hoc, it is impossible to estimate the requirement of every query; therefore another approach has to be found.

The first thing to do is estimate the size of the largest likely common query. It is possible that some user will want to query across every piece of data in the data warehouse, but this will probably not be a common requirement. It is more likely that the users will want to query the most recent week's or month's worth of data. Having established the likely period that will be queried, you will know the volume of data that will be involved. As you cannot assume that a relevant index will be in place, you must assume the query will perform a full table scan of the fact data for that period. So we now have a measure of the volume of data, let us say F megabytes, that will be accessed.
To progress any further we need to know the I/O characteristics of the devices that the data will reside on. This allows us to calculate the scan rate S, in megabytes per second, at which the fact data can be read. This will depend on the disk speeds and on the throughput ratings of the controllers. Clearly this also depends on the size of F itself. If F is many gigabytes then it will definitely be spread across multiple disks and probably across multiple controllers. You can now calculate S assuming a reasonable spread of the data. Remember that if the database is designed correctly and the large queries are controlled properly you should not get much contention for the disks, so you should get a reasonable throughput.

Using S and F you can calculate T, the time in seconds to perform a full table scan of the period in question:

$T = F / S$    (1.1)

In fact you should calculate a number of times, $T_1, \ldots, T_n$, which depend on the degree of parallelism that you are using to perform the scan. Therefore we get

$T_1 = F / S_1, \quad \ldots, \quad T_n = F / S_n$    (1.2)
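As a rough illustration of equations 1.1 and 1.2, the following Python sketch computes the scan time for each degree of parallelism and the smallest degree of parallelism that meets a target response time. The 50 GB fact volume, the 40 MB/s scan rate per disk set, the 300-second target, and above all the assumption that the scan rate grows linearly with the number of disk sets are hypothetical choices for the example, not figures from the text; real scan rates should be measured.

```python
import math


def scan_times(fact_mb, disk_set_rate_mb_s, max_parallelism):
    """Return {k: T_k} with T_k = F / S_k for k = 1..max_parallelism.

    Assumption (not from the text): S_k = k * S_1, i.e. the scan rate
    scales linearly with the number of disk sets scanned in parallel.
    """
    if fact_mb < 0 or disk_set_rate_mb_s <= 0 or max_parallelism < 1:
        raise ValueError("volume must be >= 0, rate > 0, parallelism >= 1")
    return {
        k: fact_mb / (k * disk_set_rate_mb_s)
        for k in range(1, max_parallelism + 1)
    }


def required_parallelism(fact_mb, disk_set_rate_mb_s, target_seconds):
    """Smallest degree of parallelism whose scan time meets the target."""
    if target_seconds <= 0 or disk_set_rate_mb_s <= 0:
        raise ValueError("target time and scan rate must be positive")
    required_rate = fact_mb / target_seconds  # the scan rate the target demands
    return max(1, math.ceil(required_rate / disk_set_rate_mb_s))


# Hypothetical inputs: 50 GB of fact data, 40 MB/s per disk set.
times = scan_times(fact_mb=51200, disk_set_rate_mb_s=40, max_parallelism=8)
p = required_parallelism(fact_mb=51200, disk_set_rate_mb_s=40, target_seconds=300)
print(times[1], times[8], p)  # 1280.0 160.0 5
```

With these made-up inputs a single disk set needs 1280 seconds, and five disk sets are the minimum that brings the scan under 300 seconds.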

In equation 1.2, $S_1$ is the scan speed of a disk or striped disk set, and $S_n$ is the scan speed of all the disks or disk sets that F is spread across. You may be able to get slightly higher throughput than $S_n$ with higher degrees of parallelism, but you will bottleneck on I/O at degrees of parallelism much above n.

Now you can take the query response time requirements specified in the service level agreement and pick the appropriate T value, $T_p$ say. This will give you $S_p$, the required scan rate, and hence the number of disks or disk sets you will need to spread the data across. It also gives you P, the required degree of parallelism to meet query response times for a single query.

Now you need to estimate $P_c$, the number of parallel scanning threads that a single CPU will support. This will vary from processor to processor. The processors currently on the market will support from two to four scanners, but chip technology is moving on quickly, and this number will change over time. If possible, you should establish this by experiment. Now you can estimate your CPU requirement to support a single query with:

$C_s = \mathrm{Roundup}(2P / P_c)$    (1.3)

You need to use 2P to allow for other query overheads, and for queries that involve sorts. To calculate the minimum number of CPUs required overall use the following formula:

$C_t = n C_s + 1$    (1.4)

where n is the number of concurrent queries that will be allowed to run. This should not be confused with the number of concurrent users, because unless users are running a query they are not likely to be doing anything heavier than editing. Note that the additional 1 is added to the total to allow for the operating system overheads and for all other user processing.

Overnight processing

The first point to note about the nightly processing is that the operations listed at the beginning of this section are, for the most part, serialized. This is because each operation usually relies on the previous operation completing before it can begin.
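Returning briefly to the daily-processing estimate, the two CPU formulas given before this section can be sketched in a few lines of Python. The total is read here as n times the single-query figure plus one, following the note about the additional 1 for operating system overheads. The input values (P = 5, four scanners per CPU, four concurrent queries) are hypothetical and only illustrate the arithmetic.

```python
import math


def cpus_for_single_query(parallelism, scanners_per_cpu):
    """C_s = roundup(2P / P_c).

    The factor of 2 allows for sorts and other query overheads.
    """
    if parallelism < 1 or scanners_per_cpu < 1:
        raise ValueError("parallelism and scanners_per_cpu must be >= 1")
    return math.ceil(2 * parallelism / scanners_per_cpu)


def total_cpus(concurrent_queries, parallelism, scanners_per_cpu):
    """C_t = n * C_s + 1.

    The extra CPU covers the operating system and all other user processing.
    """
    if concurrent_queries < 1:
        raise ValueError("concurrent_queries must be >= 1")
    per_query = cpus_for_single_query(parallelism, scanners_per_cpu)
    return concurrent_queries * per_query + 1


# Hypothetical inputs: P = 5, P_c = 4 scanners per CPU, n = 4 concurrent queries.
print(cpus_for_single_query(5, 4))  # 3
print(total_cpus(4, 5, 4))          # 13
```

Because the result is rounded up per query before multiplying by n, small changes in P or in the measured scanners per CPU can move the total by several CPUs, which is one reason to establish the scanner figure by experiment.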
The CPU bandwidth required for the data transformation will depend on the amount of data processing that is involved. Unless there is an enormous amount of data transformation, it is unlikely that this operation will require more CPU bandwidth than the aggregation and index creation operations. The same applies to the backup, although you should bear in mind that backing up large quantities of data in a short period of time will place a heavy load on the CPU. If the backup is spread over more hours, the amount of parallelism will come down, and its CPU bandwidth requirement will drop.

The data load is another task that can use massive parallelism to speed up its operation. As with backup, if you use fewer parallel streams it will use less CPU bandwidth and will take longer to run.

Having established which operation you are going to use as your baseline, you then need to estimate how much CPU capacity that operation requires to complete in the allowed time. It is not safe to assume that you will have more than 10 hours overnight to achieve all the processing, even if the user day is only 8 or 9 hours long. Delays in data arrival can cause you significant problems, and you must make sure that you can complete the overnight processing without running over into the business day.

As every data warehouse is different, it is impossible here to give explicit estimates for these operations. It is not even possible to give firm guidelines, because each aggregation is a different complex query and/or update plus an intensive write operation.

How much memory?

There are a number of things that affect the amount of memory required. First, there are the database requirements. The database will need memory to cache data blocks as they are used; it will also need memory to cache parsed SQL statements and so on. You will need memory for sort space. Secondly, each user connected to the system will use an amount of memory: how much will depend on how they are connected to the system and what software they are running. Finally, the operating system will require an amount of memory.

How much disk?

The disk requirement can be broken down into the following categories:

- Database requirements
  - administration
  - fact and dimension data
  - aggregation
- Non-database requirements
  - operating system requirements
  - other software requirements
  - user requirements

Database sizing

There are a number of aspects to the database sizing that need to be considered. First, there are the system administration requirements of the database. There is the data dictionary and the journal files, plus any rollback space that is required. These will all be small by comparison with the temporary or sort area. If you can gauge the size of the largest transaction that will realistically be run, you can use this to size the temporary requirements. If not, the best you can do is tie it to the size of a partition. If you do this, make allowance for multiple queries running at a given time, and set the temporary space to

$T = (2n + 1)P$    (1.5)

where n is the number of concurrent queries allowed, and P is the size of a partition. If you use different-sized partitions for current data and for older data, then use the largest partition size in the calculation above.

Then there is the fact and dimension data.
This is the one piece of data that you can actually size; everything else will be sized off this. To do this sizing exercise, you will need to know the database schema. Clearly, when performing the original size estimates for a business case, much of this information will be missing. In this situation, the sizing will need to be based purely on original estimates of the base data size.

When sizing the fact or dimension data you will have the record definitions, with each field type and size specified. Note, however, that the size specified for a field will be the maximum size. When calculating the actual size you will need to know:

- the average size of the data in each field;

- the percentage occupancy of each field;
- the position of each data field in the table;
- the RDBMS storage format of each data type;
- any table, row and column overhead.

Another factor that may affect our calculations is the database block or page size. A database block will normally have some header information at the top of the block. This is space that cannot be used by data. The difference between using a 2 KB block size and using a 16 KB block size will mean something of the order of 100 bytes of extra data space in every 16 KB.

You will also need to estimate the size of the index space required for the fact and dimension data. Fact data should generally be only lightly indexed, with indexes occupying between 15% and 30% of the space occupied by the fact data; the cost in terms of index maintenance would be extremely heavy otherwise.

The final determinant of the ultimate size of the fact data is the amount of data that you intend to keep online. When this is known, you can decide on your partitioning strategy. This needs to be taken into account in the sizing, because it is unlikely that you will get data to load exactly into every partition.

You also need to size the aggregations. For the initial system you will probably be able to size the actual aggregations that are planned. All the factors discussed above apply equally to the aggregations. As a rule of thumb, you should allow the same amount of space for aggregations as you will have fact data online. You will also need to allow space for indexes on the aggregations. These summarized tables are likely to be heavily indexed, and it is usual to assume 100% indexation. In other words, allow as much space again for indexing as for the aggregates themselves.
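The field-level factors listed earlier in this section (average field size, occupancy, row overhead and block header) can be combined into a rough table-size estimate. The Python sketch below is one possible way to do that and is not any particular RDBMS's storage algorithm: the field list, the 4-byte row overhead, the 8192-byte block with a 192-byte header, and the row count are all hypothetical, and the sketch ignores field position and type-specific storage formats, which a real estimate must take from the database documentation.

```python
import math
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    avg_bytes: float  # average stored size when the field is populated
    occupancy: float  # fraction of rows in which the field is populated (0..1)


def avg_row_bytes(fields, row_overhead_bytes):
    """Average row size: fixed row overhead plus occupancy-weighted field sizes."""
    return row_overhead_bytes + sum(f.avg_bytes * f.occupancy for f in fields)


def table_size_bytes(fields, rows, block_bytes, block_header_bytes,
                     row_overhead_bytes):
    """Estimate table size by packing average-sized rows into whole blocks."""
    usable = block_bytes - block_header_bytes  # header space holds no data
    row = avg_row_bytes(fields, row_overhead_bytes)
    rows_per_block = math.floor(usable / row)
    if rows_per_block < 1:
        raise ValueError("average row does not fit in one block")
    blocks = math.ceil(rows / rows_per_block)
    return blocks * block_bytes


# Hypothetical fact table definition, for illustration only.
fields = [
    Field("sale_id", avg_bytes=8, occupancy=1.0),
    Field("amount", avg_bytes=6, occupancy=1.0),
    Field("promo_code", avg_bytes=8, occupancy=0.25),
]
size = table_size_bytes(fields, rows=1_000_000, block_bytes=8192,
                        block_header_bytes=192, row_overhead_bytes=4)
print(size)  # 20480000
```

In this made-up case the average row is 20 bytes, 400 rows fit in each block, and one million rows occupy 2500 blocks.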
So, to summarize, the space required by the database will be

$\text{Space required} = F + F_i + D + D_i + A + A_i + T + S$    (1.6)

where

- F is the size of the fact data (all the fact data that will be kept online);
- $F_i$ is the size of the fact data indexation;
- D is the size of the dimension data;
- $D_i$ is the size of the dimension data indexation;
- A is the size of the aggregations;
- $A_i$ is the size of the aggregations indexation;
- T is the size of the database temporary or sort area; and
- S is the database system administration overhead.

If you want to get a quick upper bound on the database size, equation 1.6 can be reduced as follows:

$\text{Space required} = F + F_i + D + D_i + A + A_i + T + S$
$= 3F + F_i + D + D_i + T + S$, as $A = A_i = F$
$\le 3.3F + D + D_i + T + S$, as $F_i \le 0.3F$
$\le 3.5F + T + S$, as $D \le 0.1F$ and $D = D_i$
$\approx 3.5F + T$, as $S \ll T$ and $S \ll F$    (1.7)

If F is sized accurately this formula will give a reasonable estimate of the ultimate system size. To show a worked example, suppose the fact data is calculated to be 36 GB of data per year, and 4 years' worth of data are to be kept online. This means that F would be 144 GB. Then using eq. 1.7 we get

$\text{Space required} = 3.5F + T = (3.5 \times 144) + T = 504 + T$ GB

Now suppose the data is to be partitioned by month; that would give a partition size P of 3 GB. If four concurrent queries are to be allowed, using eq. 1.5 we can now estimate the size of the temporary space T:

$T = (2n + 1)P = [(2 \times 4) + 1] \times 3 = 27$ GB

This gives a total database size of 531 GB for the full-sized system. If it is intended to keep n years' worth of data online (4 years in this example), the formula above will represent the size of the database after n years' worth of data has been loaded. To size the initial system for 6 months' worth of data you can say

$\text{Initial space} = (3.5F + T) / \{[(n \times 12)/6] + 1\}$    (1.9)

where n is the number of years' data you intend to keep online. Note the +1 under the line: this will account for the dimension data's being a bigger percentage of the fact data initially. This means that eq. 1.9 reduces to

$\text{Initial space} = (3.5F + T)\,/$

which is your initial sizing.

One final word: remember that every data warehouse is different. These figures are only guidelines.
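The worked example above can be reproduced with a short Python sketch. It uses only figures from the text (36 GB per year, 4 years online, monthly partitions, four concurrent queries); the function names are illustrative. The detailed formula is evaluated with each rule-of-thumb ratio at its stated maximum and with S taken as negligible, which is an assumption made here to show that the detailed sum and the quick bound of eq. 1.7 agree.

```python
import math


def temp_space(concurrent_queries, partition_size):
    """Temporary or sort space, T = (2n + 1) * P (eq. 1.5)."""
    return (2 * concurrent_queries + 1) * partition_size


def space_upper_bound(fact_size, temp):
    """Quick upper bound on database size, 3.5F + T (eq. 1.7)."""
    return 3.5 * fact_size + temp


def space_required(f, f_i, d, d_i, a, a_i, t, s):
    """Detailed sum: F + F_i + D + D_i + A + A_i + T + S."""
    return f + f_i + d + d_i + a + a_i + t + s


fact_gb = 36 * 4        # 4 years at 36 GB per year -> 144 GB
partition_gb = 36 / 12  # monthly partitions -> 3 GB each
temp_gb = temp_space(concurrent_queries=4, partition_size=partition_gb)
total_gb = space_upper_bound(fact_gb, temp_gb)
print(temp_gb, total_gb)  # 27.0 531.0

# Same total from the detailed formula with every ratio at its maximum:
# F_i = 30% of F, D = D_i = 10% of F, A = A_i = F, and S treated as 0.
detailed_gb = space_required(
    f=fact_gb, f_i=0.3 * fact_gb, d=0.1 * fact_gb, d_i=0.1 * fact_gb,
    a=fact_gb, a_i=fact_gb, t=temp_gb, s=0.0,
)
print(math.isclose(detailed_gb, total_gb))  # True
```

Keeping the inputs as named parameters makes it easy to re-run the estimate when the retention period, partitioning scheme, or number of concurrent queries changes.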


More information

Communicating with devices

Communicating with devices Introduction to I/O Where does the data for our CPU and memory come from or go to? Computers communicate with the outside world via I/O devices. Input devices supply computers with data to operate on.

More information

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

Navigating Big Data with High-Throughput, Energy-Efficient Data Partitioning

Navigating Big Data with High-Throughput, Energy-Efficient Data Partitioning Application-Specific Architecture Navigating Big Data with High-Throughput, Energy-Efficient Data Partitioning Lisa Wu, R.J. Barker, Martha Kim, and Ken Ross Columbia University Xiaowei Wang Rui Chen Outline

More information

White Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario

White Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario White Paper February 2010 IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario 2 Contents 5 Overview of InfoSphere DataStage 7 Benchmark Scenario Main Workload

More information

Load Testing Analysis Services Gerhard Brückl

Load Testing Analysis Services Gerhard Brückl Load Testing Analysis Services Gerhard Brückl About Me Gerhard Brückl Working with Microsoft BI since 2006 Mainly focused on Analytics and Reporting Analysis Services / Reporting Services Power BI / O365

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

Initial Hardware Estimation Guidelines. AgilePoint BPMS v5.0 SP1

Initial Hardware Estimation Guidelines. AgilePoint BPMS v5.0 SP1 Initial Hardware Estimation Guidelines Document Revision r5.2.3 November 2011 Contents 2 Contents Preface...3 Disclaimer of Warranty...3 Copyright...3 Trademarks...3 Government Rights Legend...3 Virus-free

More information

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE

QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QLIKVIEW ARCHITECTURE AND SYSTEM RESOURCE USAGE QlikView Technical Brief April 2011 www.qlikview.com Introduction This technical brief covers an overview of the QlikView product components and architecture

More information

ISTANBUL AYDIN UNIVERSITY

ISTANBUL AYDIN UNIVERSITY ISTANBUL AYDIN UNIVERSITY 2013-2014 Academic Year Fall Semester Department of Software Engineering SEN361 COMPUTER ORGANIZATION HOMEWORK REPORT STUDENT S NAME : GÖKHAN TAYMAZ STUDENT S NUMBER : B1105.090068

More information

low-level storage structures e.g. partitions underpinning the warehouse logical table structures

low-level storage structures e.g. partitions underpinning the warehouse logical table structures DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures

More information

CribMaster Database and Client Requirements

CribMaster Database and Client Requirements FREQUENTLY ASKED QUESTIONS CribMaster Database and Client Requirements GENERAL 1. WHAT TYPE OF APPLICATION IS CRIBMASTER? ARE THERE ANY SPECIAL APPLICATION SERVER OR USER INTERFACE REQUIREMENTS? CribMaster

More information

Administração e Optimização de BDs

Administração e Optimização de BDs Departamento de Engenharia Informática 2010/2011 Administração e Optimização de BDs Aula de Laboratório 1 2º semestre In this lab class we will address the following topics: 1. General Workplan for the

More information

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence SAP HANA SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence SAP HANA Performance Table of Contents 3 Introduction 4 The Test Environment Database Schema Test Data System

More information

Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem

Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem October, 2006 Introduction Data de-duplication has recently gained significant industry attention,

More information

A Guide to Getting Started with Successful Load Testing

A Guide to Getting Started with Successful Load Testing Ingenieurbüro David Fischer AG A Company of the Apica Group http://www.proxy-sniffer.com A Guide to Getting Started with Successful Load Testing English Edition 2007 All Rights Reserved Table of Contents

More information

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to

More information

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS ..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the

More information

Configuring Apache Derby for Performance and Durability Olav Sandstå

Configuring Apache Derby for Performance and Durability Olav Sandstå Configuring Apache Derby for Performance and Durability Olav Sandstå Database Technology Group Sun Microsystems Trondheim, Norway Overview Background > Transactions, Failure Classes, Derby Architecture

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

System Requirements Table of contents

System Requirements Table of contents Table of contents 1 Introduction... 2 2 Knoa Agent... 2 2.1 System Requirements...2 2.2 Environment Requirements...4 3 Knoa Server Architecture...4 3.1 Knoa Server Components... 4 3.2 Server Hardware Setup...5

More information

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

q for Gods Whitepaper Series (Edition 1) Multi-Partitioned kdb+ Databases: An Equity Options Case Study

q for Gods Whitepaper Series (Edition 1) Multi-Partitioned kdb+ Databases: An Equity Options Case Study Series (Edition 1) Multi-Partitioned kdb+ Databases: An Equity Options Case Study October 2012 Author: James Hanna, who joined First Derivatives in 2004, has helped design and develop kdb+ implementations

More information

Performance Optimization Guide

Performance Optimization Guide Performance Optimization Guide Publication Date: July 06, 2016 Copyright Metalogix International GmbH, 2001-2016. All Rights Reserved. This software is protected by copyright law and international treaties.

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

ence confident powerful experienced The Teradata Scalability Story A Teradata White Paper

ence confident powerful experienced The Teradata Scalability Story A Teradata White Paper powerful simplicity mple confident ence experienced Contributor: Carrie Ballinger, Senior Technical Consultant, Teradata Development Teradata, a division of NCR A Teradata White Paper EB-3031 0801 PAGE

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

MailEnable Scalability White Paper Version 1.2

MailEnable Scalability White Paper Version 1.2 MailEnable Scalability White Paper Version 1.2 Table of Contents 1 Overview...2 2 Core architecture...3 2.1 Configuration repository...3 2.2 Storage repository...3 2.3 Connectors...3 2.3.1 SMTP Connector...3

More information

Exploring RAID Configurations

Exploring RAID Configurations Exploring RAID Configurations J. Ryan Fishel Florida State University August 6, 2008 Abstract To address the limits of today s slow mechanical disks, we explored a number of data layouts to improve RAID

More information

Azure VM Performance Considerations Running SQL Server

Azure VM Performance Considerations Running SQL Server Azure VM Performance Considerations Running SQL Server Your company logo here Vinod Kumar M @vinodk_sql http://blogs.extremeexperts.com Session Objectives And Takeaways Session Objective(s): Learn the

More information

Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014

Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014 Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014 Topics Auditing a Geospatial Server Solution Web Server Strategies and Configuration Database Server Strategy and Configuration

More information

Optimizing Your Data Warehouse Design for Superior Performance

Optimizing Your Data Warehouse Design for Superior Performance Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex

More information

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations A Dell Technical White Paper Database Solutions Engineering By Sudhansu Sekhar and Raghunatha

More information

An Architecture for Using Tertiary Storage in a Data Warehouse

An Architecture for Using Tertiary Storage in a Data Warehouse An Architecture for Using Tertiary Storage in a Data Warehouse Theodore Johnson Database Research Dept. AT&T Labs - Research johnsont@research.att.com Motivation AT&T has huge data warehouses. Data from

More information

Oracle Database Concepts

Oracle Database Concepts Oracle Database Concepts Database Structure The database has logical structures and physical structures. Because the physical and logical structures are separate, the physical storage of data can be managed

More information

Data Integrator Performance Optimization Guide

Data Integrator Performance Optimization Guide Data Integrator Performance Optimization Guide Data Integrator 11.7.2 for Windows and UNIX Patents Trademarks Copyright Third-party contributors Business Objects owns the following

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

SharePoint Capacity Planning Balancing Organiza,onal Requirements with Performance and Cost

SharePoint Capacity Planning Balancing Organiza,onal Requirements with Performance and Cost SharePoint Capacity Planning Balancing Organiza,onal Requirements with Performance and Cost Kirk Devore / J.D. Wade SharePoint Consultants Horizons Consul;ng Agenda Expecta;ons Defining SharePoint Capacity

More information

THE NEAL NELSON DATABASE BENCHMARK : A BENCHMARK BASED ON THE REALITIES OF BUSINESS

THE NEAL NELSON DATABASE BENCHMARK : A BENCHMARK BASED ON THE REALITIES OF BUSINESS THE NEAL NELSON DATABASE BENCHMARK : A BENCHMARK BASED ON THE REALITIES OF BUSINESS Neal Nelson & Associates is an independent benchmarking firm based in Chicago. They create and market benchmarks as well

More information

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12

XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines. A.Zydroń 18 April 2009. Page 1 of 12 XTM Web 2.0 Enterprise Architecture Hardware Implementation Guidelines A.Zydroń 18 April 2009 Page 1 of 12 1. Introduction...3 2. XTM Database...4 3. JVM and Tomcat considerations...5 4. XTM Engine...5

More information

Directions for VMware Ready Testing for Application Software

Directions for VMware Ready Testing for Application Software Directions for VMware Ready Testing for Application Software Introduction To be awarded the VMware ready logo for your product requires a modest amount of engineering work, assuming that the pre-requisites

More information