PARALLEL PROCESSING AND THE DATA WAREHOUSE


BY W. H. Inmon

One of the essences of the data warehouse environment is the accumulation and management of large amounts of data. Indeed, it is said that if you manage large amounts of data well in the data warehouse environment, all other aspects of data warehouse design and usage come easily. And if you do not manage large amounts of data well, nothing else really matters, because you will fail. The management of large amounts of data is the first and most critical success factor in the building and using of the data warehouse.

There are many design approaches and techniques for the management of large amounts of data in the warehouse environment, such as:

- storing data on multiple storage media,
- summarizing data when detail becomes obsolete,
- storing data relationships in terms of artifacts (see the tech topic on data relationships in the data warehouse for an in-depth discussion of this topic),
- encoding and referencing data where appropriate,
- partitioning data for independent management of the different partitions,
- choosing levels of granularity and summarization properly for the data warehouse,
- and so forth.

While all of these design and architecture techniques are valid and should be employed in any hardware environment, there is another approach to the management of large volumes of data for the data warehouse: selecting technology that can manage data in parallel. Parallel technology is sometimes known as data base machine technology.

THE TOPIC OF DISCUSSION

This discussion concerns the management of large amounts of data warehouse data in a parallel environment. At this point the reader should be very cautious about one aspect of this discussion: it is for the data warehouse only. Occasionally a developer will try to use data base machine or parallel technology for operational processing. This discussion is not about that kind of environment.
Also, occasionally a designer will try to use data base machine technology for an environment that attempts to do both operational transaction processing and data warehouse processing at the same time, on the same machine, on the same data. This discussion is not about that environment either. (For an in-depth discussion of the mixed operational and data warehouse environment, refer to the tech topic on doing both operational and DSS processing on a single database.) This discussion is for the data warehouse environment only, where parallel technology has been selected as the (or one of the) primary storage and access methods.

THE APPEAL OF PARALLEL TECHNOLOGY

Parallel technology is technology in which different machines are tightly coupled together but work independently. Each machine manages its own collection of data independently; the spread of data across the machines in the data warehouse is disjoint, with no overlap between the data owned by different processors.

Copyright 2000 by William H. Inmon, all rights reserved

Figure 1 shows the basic configuration of processors working together and managing data independently. In Figure 1 there are five basic components of interest: the transaction (or request), the queue the transaction goes into, the network connecting the processors, the processor, and the data controlled by the processor. A request enters the system, and the processor to which the request needs to be channeled determines the queue the request is routed to. In the case of a large transaction, the transaction may be broken into a series of requests for data and processing, which in turn are routed to the appropriate processors. The request enters the queue, and the processor starts the execution of the request. Upon the completion of the work done for the request, the results are sent to the requestor.

While one processor is servicing the requests that operate on data that belongs to it, another processor can be servicing the requests that have been channeled to it, independently of the work done by other processors. It is this independence of processing that has great appeal to the data warehouse architect, because it means that managing large amounts of data is technologically possible. To manage large amounts of data merely requires harnessing multiple processors together. Said another way, in order to manage more data warehouse data, the data warehouse architect merely needs to add more processors to the tightly networked configuration, as shown by Figure 2.
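The routing just described, where a request is channeled to the queue of the processor that owns the data it touches, can be sketched as follows. This is a minimal illustration only; the per-processor queues, the key names, and the processor count are assumptions for the sketch, not details from any particular data base machine.

```python
# A minimal sketch of request routing in a parallel configuration:
# each processor owns a queue, and a request is routed to the queue of
# the processor that owns the data it touches. A large request is first
# broken into per-processor sub-requests.
from collections import deque

NUM_PROCESSORS = 4
queues = [deque() for _ in range(NUM_PROCESSORS)]

def owner(key: str) -> int:
    """Which processor owns the data for this key (hash spread)."""
    return hash(key) % NUM_PROCESSORS

def route(request_keys):
    """Split a request by owning processor and enqueue each piece."""
    for key in request_keys:
        queues[owner(key)].append(key)

# A large request touching three records becomes up to three
# sub-requests, each queued for its owning processor.
route(["acct-17", "acct-18", "acct-19"])
```

Each processor can then drain its own queue without coordinating with the others, which is the independence of processing the text describes.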

In nonparallel environments, adding large amounts of data to an already large environment can cause tremendous difficulties at the basic operating system level. Nonparallel environments have a threshold of data beyond which they operate inefficiently. Once this threshold is reached, there is nothing that can be done except to go to another technology. And transferring data and processing to another technology can be a disruptive, expensive, complex experience, to be avoided if at all possible. Parallel processing offers the possibility that the technology itself can be extended almost ad infinitum, avoiding the conversion of the data warehouse from one technology to another.

The independence of processing in the parallel environment leads to the observation that the speed of access of data is proportional to the number of processors the data is spread over. Suppose a parallel configuration has m independent processors with data spread evenly and optimally over the processors. Now suppose it takes a single large (nonparallel) processor n units of time to service a request. The elapsed time required for the parallel configuration to execute the same service is n/m. Figure 3 shows this difference.

The savings in elapsed time gained by going to a parallel environment can be expressed:

n - (n/m) = elapsed time differential

It is worthy of note that the work done by the systems, either parallel or nonparallel, is the same in terms of I/O. The real difference is not in the total amount of work done, but in the elapsed time required to do that work.

Another observation about the parallel environment is that the marginal improvement in elapsed time decreases as processors are added. In other words, when the number of processors in the parallel environment increases from one to two, there is an enormous improvement in elapsed time. When the number of processors increases from two to three, there is a significant improvement in elapsed time. But as the number of processors continues to increase, the improvement grows smaller. For example, increasing the number of processors from twenty to twenty-one may make no noticeable improvement in elapsed time at all.

The choice between a parallel approach to technology in the data warehouse and a standard centralized approach usually revolves around volumes of data. For small to modest-sized data warehouses, a centralized approach makes economic and technological sense. But after a point, when the data warehouse starts to contain a very large volume of data, a parallel approach becomes economically and technologically advantageous. The choice always boils down to both technological and economic considerations.
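The n/m elapsed time and its diminishing marginal improvement can be illustrated with a quick calculation. The figure of 1200 time units is hypothetical, chosen only to make the arithmetic visible:

```python
# Elapsed time for an evenly spread workload over m processors is n/m,
# so the gain from adding one more processor shrinks as m grows.

def elapsed(n, m):
    """Elapsed time when n units of work are spread evenly over m processors."""
    return n / m

def marginal_gain(n, m):
    """Improvement in elapsed time from adding the m-th processor."""
    return elapsed(n, m - 1) - elapsed(n, m)

n = 1200  # hypothetical single-processor elapsed time

gain_1_to_2 = marginal_gain(n, 2)    # 600.0 -- an enormous improvement
gain_20_to_21 = marginal_gain(n, 21) # about 2.9 -- barely noticeable
```

The total I/O is the same in every case; only the elapsed time changes, exactly as the text observes.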

PHYSICAL ORGANIZATION

There are many ways that the components of parallel technology can be arranged. The following is a discussion of the most common ways, but it is hardly intended to describe all the possibilities. It is worth noting that each configuration and arrangement of the components of parallel technology has its associated tradeoffs.

Figure 4 illustrates the (typical) dynamics of the inner workings of the components of the parallel environment. The request can be a singular request for data that is routed to the appropriate processor, or it can be a general request that is broken down into a series of specific requests that are individually routed to individual processors. The queue that the request goes into can be a single large queue that has access to all the processors, or it can be a series of queues, each of which is unique to a different processor. When the queues are unique to different processors, the request must be assigned to a specific processor prior to execution.

The designation of data to a processor can be done by means of a hashing algorithm or an index (or ostensibly by both means). When a hashing algorithm is used, the data is divided across the different processors in a random manner based on the primary key of the record. When data is assigned to a processor by means of an index, data is usually (although not necessarily) assigned to a processor in groups. Once the data arrives at the processor to which it is assigned, it is placed on disk storage and an index keeps track of its assignment. The data is stored in physical blocks, which hold tables, which are made up of rows (or records), which contain columns (or fields).
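The hashing approach to data placement can be sketched in a few lines: the primary key alone determines which processor owns a record, which spreads records across processors in an effectively random manner. The key values, hash function, and processor count below are illustrative assumptions, not taken from any particular product.

```python
# A minimal sketch of hash-based data placement: the primary key of a
# record is hashed, and the hash picks the owning processor. MD5 is
# used here only because it gives a stable, well-spread value.
import hashlib

NUM_PROCESSORS = 4

def assign_processor(primary_key: str) -> int:
    """Hash the primary key to pick the processor that owns the record."""
    digest = hashlib.md5(primary_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PROCESSORS

# Spread a handful of hypothetical records across the processors.
partitions = {p: [] for p in range(NUM_PROCESSORS)}
for key in ("cust-0001", "cust-0002", "cust-0003", "cust-0004"):
    partitions[assign_processor(key)].append(key)
```

Because the placement is a pure function of the key, any processor (or router) can later recompute where a record lives without consulting an index.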

As stated, there are many variations of the arrangement of the components of a parallel environment, each with its own strengths and weaknesses.

HOT SPOTS

The appeal of parallel technology is quite strong, given the need to manage large amounts of data and the ability to add processing resources in an incremental, nondisruptive fashion. At first glance it appears that the parallel approach is the answer to the data architect's prayers insofar as managing the volumes of data found in the data warehouse. However, there are occasions where the parallel approach to the management of large amounts of data yields worse results in terms of performance than the traditional single processor approach.

The performance and the efficient utilization of the parallel approach depend on the data being spread evenly across the different processors, so that the corresponding workload is likewise spread evenly over the processors. When there is an even and equitable spread of data and processing, the parallel approach to the management of large amounts of data works quite well. However, if there ever is an imbalance in the spread of data across the parallel processors, and there is a corresponding imbalance in the workload spread across the processors, then what is known as a "hot spot" develops, and the effectiveness of the parallel environment is compromised.

Figure 5 shows a hot spot. The workload in Figure 5 is imbalanced: some processors have no work at all, and one processor has the majority of the work piled on it. In this case there might as well be central management of data. (Indeed, in this case a central approach to the management of data is much more effective and efficient than a parallel approach.)
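A hot spot of the kind Figure 5 depicts can be detected by comparing each processor's share of the workload against a perfectly even spread. The workload figures and the threshold below are hypothetical, chosen only to illustrate the idea:

```python
# A small sketch of hot spot detection: flag any processor carrying far
# more than its fair share of the total workload.

def hot_spots(work_per_processor, threshold=2.0):
    """Return indices of processors carrying more than `threshold`
    times their fair share of the total workload."""
    total = sum(work_per_processor)
    fair_share = total / len(work_per_processor)
    return [i for i, w in enumerate(work_per_processor)
            if w > threshold * fair_share]

# One processor has most of the work; the other three are nearly idle.
skewed = hot_spots([900, 50, 30, 20])   # -> [0]
balanced = hot_spots([250, 250, 250, 250])  # -> []
```

In the skewed case the configuration behaves like a single central processor, which is exactly the text's point.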

One of the problems associated with the data warehouse environment (and the world of DSS in general) is that the patterns of access of data in a warehouse are unpredictable. Both the rate of access and the specific records to be accessed are highly variable. This implies that for DSS processing, hot spots are the norm.

Of course hot spots can be remedied. Remedying a hot spot requires the redistribution of data to other or to more processors. Figure 6 shows the remedying of hot spots. The problem with remedying hot spots in the data warehouse environment is that any remedy depends on a foreknowledge of the usage of data. Just because data has shown a pattern of usage in the past does not mean that it will exhibit the same pattern of usage in the future. Therefore trying to identify and remedy hot spots in the parallel environment is a difficult task.

OPERATIONAL/DSS DIFFERENCES

The parallel environment can be used for the purposes of operational processing or data warehouse (DSS) processing, but not both at the same time. There are several reasons why the two environments do not mix in the parallel environment. Figure 7.1, Figure 7.2, Figure 7.3 and Figure 7.4 illustrate some of those differences.
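The redistribution step in Figure 6 can be sketched as repeatedly moving a partition of data from the busiest processor to the least busy one. The partition names and workload figures are hypothetical, and a real data base machine would also have to move the data itself, which this sketch ignores:

```python
# A sketch of remedying a hot spot by redistributing data: move one
# partition from the busiest processor to the least busy one. Note that
# this rebalances based on PAST workload, which is exactly the
# foreknowledge problem the text describes.

def rebalance(assignments):
    """assignments: {processor: [(partition, workload), ...]}.
    Move one partition from the busiest to the least busy processor."""
    load = {p: sum(w for _, w in parts) for p, parts in assignments.items()}
    busiest = max(load, key=load.get)
    idlest = min(load, key=load.get)
    moved = assignments[busiest].pop()  # pick a partition to relocate
    assignments[idlest].append(moved)
    return assignments

# Processor 0 is a hot spot; processors 1 and 2 are idle.
assignments = {0: [("part-a", 900), ("part-b", 50)], 1: [], 2: []}
rebalance(assignments)
```

If tomorrow's queries hit a different processor, the rebalanced spread is no better than the original one, which is why the text calls hot spot remediation a difficult task.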


The first difference between the two environments is in the type of transaction that is being run. The operational environment runs many small transactions, each of which attaches its processing to a single processor. The data warehouse environment, on the other hand, has a very different transaction operating in it: it contains a few very large transactions, which operate on data spread all over the parallel environment.

The second major difference between the two environments lies in the internal structure of the data. The data warehouse environment contains data whose structure is optimized for massive sequential, non-update processing. The operational environment is structured for access of a limited amount of data, where the data can be updated. In addition, the operational environment typically groups together data of different types so that a transaction does not have to look into different locations in order to find the data. Data warehouse data, on the other hand, is stored homogeneously.

Another important difference between the operational environment and the data warehouse environment is in the logging of data. Since operational processing involves the update (or potential update) of data, a certain amount of overhead is required. Logging is one type of overhead that comes with update processing. But data warehouse data does not require a log, because no update is done. Therefore the basic system characteristics of the operating system are quite different. These, then, are the basic reasons why the operational environment and the data warehouse environment do not mix, even in the face of a parallel management of data.

THE LEVELS OF THE WAREHOUSE

The main benefit of the data warehouse residing on a parallel technology is that data can be accessed quickly and randomly. The details of data managed this way are (relatively!) easy to get to. But parallel management of data is expensive. Most organizations try to position the current detail level of the data warehouse in the parallel environment, and let other levels of the data warehouse reside on other technologies. Figure 8 shows this arrangement.

For a variety of reasons, economic and technological, the management of current detailed data by parallel technology and of other data by other technologies is a very good solution.

METADATA IN THE PARALLEL DATA WAREHOUSE ENVIRONMENT

Metadata is one of the most important aspects of the data warehouse environment. The fact that the data warehouse resides on a parallel technology neither diminishes nor enhances the role of metadata. Typically the metadata stored with a data warehouse includes: data content, data structure, the mapping of data from the operational environment, the history of extracts, versioning, and so forth.

PHYSICAL DESIGN IN THE PARALLEL DATA WAREHOUSE ENVIRONMENT

In its early phases, the design of a data warehouse in the parallel environment proceeds exactly the same as the design of a data warehouse in the non-parallel environment. The activities of defining the data model, defining the major subject areas, defining the system of record, and so forth are the same for both environments. The major difference in the design of a data warehouse in the parallel environment comes when the physical design is created.

The spread of the data over the different processors is a major design issue. The first issue is how many processors there will be. The next issue is how the data will be spread over the processors. Some of the relevant factors affecting this decision are: what the pattern of growth of the data will be, how much data there will be initially, what the pattern of access of the data is, and so forth.

Another important design issue is what the primary key of the data ought to be. The primary key affects the physical spread of the data over the different parallel processors, in that the primary key is the discriminator that allows the data to be spread in the first place. A related design issue is the placement of the secondary key of the data, for units of data not directly related to the primary key. Secondary data may be placed randomly over the parallel processors, or it may be forced into the same physical location as the data relating directly to the primary key. The usage of the data dictates which is the better choice.

Partitioning of data is as important in the parallel environment as it is in the centralized data warehouse environment. Partitioning allows you to index data independently, restructure data independently, manage data independently, and so forth. The assignment of data to a parallel processor by means of the definition of keys is a very important design aspect in the data warehouse environment, because the physical placement of data profoundly affects the pattern of access of data, which in turn has a profound effect on the effectiveness of the parallel management of data. Said another way, if the data is not spread properly over the parallel processors, the benefit of parallel processing is lost, and the data may as well be managed by a single, central processor.

A second important physical design aspect is the identification and support of derived (i.e., summary) data in the data warehouse environment.
The storage of summary data makes sense when that data is used often, and/or when an "official" calculation of data needs to be done and there is concern that if the calculation is done more than once it will not be consistent. Under these circumstances, summarization and storage of data in the parallel environment make sense.

An important physical design technique in the parallel data warehouse environment is the prejoining of data when it is known that the data will be joined as a regular matter of course. If it is known that data will be joined, it is much more efficient to join the data at the moment of load than it is to join the data dynamically.

Another physical design technique is to create artifacts of relationships in the data warehouse. Data relationships are important in the data warehouse. However, their implementation is quite different from that found in the operational environment.
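Prejoining at load time can be sketched as follows: instead of joining customer and order rows at query time, the join is materialized once while the warehouse is loaded, and the joined rows are stored as-is. The table and column names are illustrative assumptions, not from the paper.

```python
# A sketch of prejoining at load time: the customer/order join is
# performed once, during the load, rather than dynamically at query
# time. The data shown is hypothetical.

customers = {"C1": {"name": "Acme"}, "C2": {"name": "Globex"}}
orders = [
    {"order_id": "O1", "cust_id": "C1", "amount": 100},
    {"order_id": "O2", "cust_id": "C2", "amount": 250},
]

def prejoin(orders, customers):
    """Materialize the join once; each stored row carries the joined
    customer attributes alongside the order attributes."""
    return [dict(o, customer_name=customers[o["cust_id"]]["name"])
            for o in orders]

prejoined = prejoin(orders, customers)  # loaded into the warehouse as-is
```

The tradeoff is storage for elapsed time: every stored row is wider, but no query ever pays the cost of the join, which matters most when the joined data is spread across many processors.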

Because data is much more quickly accessible in the parallel data warehouse environment, there is a temptation to store as much detailed data as possible, on the theory that you can never tell when you will need a scrap of data. There is, however, a cost of storing data in the warehouse, even in the face of a parallel technology. The following rules of thumb for the management of data hold true:

- if the data will not be used for DSS processing, it has no place in the data warehouse,
- if the data is very old, it should be considered for placement in "deep freeze" bulk storage, and
- if the level of detail is so granular that it is unlikely to be used, the data should be summarized.

SUMMARY

The data warehouse can be managed by a parallel approach to technology. The parallel approach uses multiple processors which operate on data independently and manage data independently. Because of the independent management of data, processors can be added linearly and independently. The question of whether to use a parallel approach or a centralized approach depends on the volume of data to be managed and the access to that data. Even though the parallel approach offers a powerful alternative for the management of data, physical design issues are still very important in the design of the data warehouse.


More information

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely

More information

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server

Technical White Paper. Symantec Backup Exec 10d System Sizing. Best Practices For Optimizing Performance of the Continuous Protection Server Symantec Backup Exec 10d System Sizing Best Practices For Optimizing Performance of the Continuous Protection Server Table of Contents Table of Contents...2 Executive Summary...3 System Sizing and Performance

More information

Application of Predictive Analytics for Better Alignment of Business and IT

Application of Predictive Analytics for Better Alignment of Business and IT Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD [email protected] July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker

More information

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group

Evaluator s Guide. McKnight. Consulting Group. McKnight Consulting Group NoSQL Evaluator s Guide McKnight Consulting Group William McKnight is the former IT VP of a Fortune 50 company and the author of Information Management: Strategies for Gaining a Competitive Advantage with

More information

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771 ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced

More information

The Teradata Scalability Story

The Teradata Scalability Story Data Warehousing The Teradata Scalability Story By: Carrie Ballinger, Senior Technical Advisor, Teradata Development Table of Contents Executive Summary 2 Introduction 4 Scalability in the Data Warehouse

More information

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files

More information

Capacity Plan. Template. Version X.x October 11, 2012

Capacity Plan. Template. Version X.x October 11, 2012 Template Version X.x October 11, 2012 This is an integral part of infrastructure and deployment planning. It supports the goal of optimum provisioning of resources and services by aligning them to business

More information

WHAT IS ENTERPRISE OPEN SOURCE?

WHAT IS ENTERPRISE OPEN SOURCE? WHITEPAPER WHAT IS ENTERPRISE OPEN SOURCE? ENSURING YOUR IT INFRASTRUCTURE CAN SUPPPORT YOUR BUSINESS BY DEB WOODS, INGRES CORPORATION TABLE OF CONTENTS: 3 Introduction 4 Developing a Plan 4 High Availability

More information

Database Schema Management

Database Schema Management Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie, Maryland 20716 Tele: 301-249-1142 Email: [email protected] Web: www.wiscorp.com Table of Contents 1. Objective...1 2. Topics Covered...2

More information

V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System

V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System V:Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System André Brinkmann, Michael Heidebuer, Friedhelm Meyer auf der Heide, Ulrich Rückert, Kay Salzwedel, and Mario Vodisek Paderborn

More information

HP Smart Array Controllers and basic RAID performance factors

HP Smart Array Controllers and basic RAID performance factors Technical white paper HP Smart Array Controllers and basic RAID performance factors Technology brief Table of contents Abstract 2 Benefits of drive arrays 2 Factors that affect performance 2 HP Smart Array

More information

Speed and Persistence for Real-Time Transactions

Speed and Persistence for Real-Time Transactions Speed and Persistence for Real-Time Transactions by TimesTen and Solid Data Systems July 2002 Table of Contents Abstract 1 Who Needs Speed and Persistence 2 The Reference Architecture 3 Benchmark Results

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

TPCalc : a throughput calculator for computer architecture studies

TPCalc : a throughput calculator for computer architecture studies TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University [email protected] [email protected]

More information

Oracle Database In-Memory The Next Big Thing

Oracle Database In-Memory The Next Big Thing Oracle Database In-Memory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database In-Memory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes

More information

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

More information

Module 14: Scalability and High Availability

Module 14: Scalability and High Availability Module 14: Scalability and High Availability Overview Key high availability features available in Oracle and SQL Server Key scalability features available in Oracle and SQL Server High Availability High

More information

The Top 20 VMware Performance Metrics You Should Care About

The Top 20 VMware Performance Metrics You Should Care About The Top 20 VMware Performance Metrics You Should Care About Why you can t ignore them and how they can help you find and avoid problems. WHITEPAPER BY ALEX ROSEMBLAT Table of Contents Introduction... 3

More information

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems*

A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems* A Content-Based Load Balancing Algorithm for Metadata Servers in Cluster File Systems* Junho Jang, Saeyoung Han, Sungyong Park, and Jihoon Yang Department of Computer Science and Interdisciplinary Program

More information

Nimble Storage Best Practices for Microsoft SQL Server

Nimble Storage Best Practices for Microsoft SQL Server BEST PRACTICES GUIDE: Nimble Storage Best Practices for Microsoft SQL Server Summary Microsoft SQL Server databases provide the data storage back end for mission-critical applications. Therefore, it s

More information

Notes on Factoring. MA 206 Kurt Bryan

Notes on Factoring. MA 206 Kurt Bryan The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

More information

In-House vs. Software as as Service (SaaS)

In-House vs. Software as as Service (SaaS) In-House vs. Software as as Service (SaaS) A Lifestyle Cost of Ownership Comparison Ensenta Corporation Copyright 2011 Ensenta Corporation 2 In-House vs. SaaS A common decision facing users of mission-critical

More information

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP selects SAP HANA to improve the speed of business analytics with IBM and SAP Founded in 1806, is a global consumer products company which sells nearly $17 billion annually in personal care, home care,

More information

EMC Unified Storage for Microsoft SQL Server 2008

EMC Unified Storage for Microsoft SQL Server 2008 EMC Unified Storage for Microsoft SQL Server 2008 Enabled by EMC CLARiiON and EMC FAST Cache Reference Copyright 2010 EMC Corporation. All rights reserved. Published October, 2010 EMC believes the information

More information

SUN ORACLE EXADATA STORAGE SERVER

SUN ORACLE EXADATA STORAGE SERVER SUN ORACLE EXADATA STORAGE SERVER KEY FEATURES AND BENEFITS FEATURES 12 x 3.5 inch SAS or SATA disks 384 GB of Exadata Smart Flash Cache 2 Intel 2.53 Ghz quad-core processors 24 GB memory Dual InfiniBand

More information

Parallel Scalable Algorithms- Performance Parameters

Parallel Scalable Algorithms- Performance Parameters www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

More information

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server White Paper EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server Abstract This white paper addresses the challenges currently facing business executives to store and process the growing

More information

arxiv:1112.0829v1 [math.pr] 5 Dec 2011

arxiv:1112.0829v1 [math.pr] 5 Dec 2011 How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

More information

Contributions to Gang Scheduling

Contributions to Gang Scheduling CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,

More information

SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform

SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform INTRODUCTION Grid computing offers optimization of applications that analyze enormous amounts of data as well as load

More information

Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER

Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Table of Contents Capacity Management Overview.... 3 CapacityIQ Information Collection.... 3 CapacityIQ Performance Metrics.... 4

More information

Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment. by Bill Inmon. INTEGRITY IN All Your INformation

Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment. by Bill Inmon. INTEGRITY IN All Your INformation INTEGRITY IN All Your INformation R TECHNOLOGY INCORPORATED Enterprise Intelligence - Enabling High Quality in the Data Warehouse/DSS Environment by Bill Inmon WPS.INM.E.399.1.e Introduction In a few short

More information

Moving Virtual Storage to the Cloud

Moving Virtual Storage to the Cloud Moving Virtual Storage to the Cloud White Paper Guidelines for Hosters Who Want to Enhance Their Cloud Offerings with Cloud Storage www.parallels.com Table of Contents Overview... 3 Understanding the Storage

More information

Public Cloud Partition Balancing and the Game Theory

Public Cloud Partition Balancing and the Game Theory Statistics Analysis for Cloud Partitioning using Load Balancing Model in Public Cloud V. DIVYASRI 1, M.THANIGAVEL 2, T. SUJILATHA 3 1, 2 M. Tech (CSE) GKCE, SULLURPETA, INDIA [email protected] [email protected]

More information

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations A Dell Technical White Paper Database Solutions Engineering By Sudhansu Sekhar and Raghunatha

More information

Whitepaper: performance of SqlBulkCopy

Whitepaper: performance of SqlBulkCopy We SOLVE COMPLEX PROBLEMS of DATA MODELING and DEVELOP TOOLS and solutions to let business perform best through data analysis Whitepaper: performance of SqlBulkCopy This whitepaper provides an analysis

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER STORAGE CENTER DATASHEET STORAGE CENTER Go Beyond the Boundaries of Traditional Storage Systems Today s storage vendors promise to reduce the amount of time and money companies spend on storage but instead

More information

Windows Server 2008 R2 Hyper-V Live Migration

Windows Server 2008 R2 Hyper-V Live Migration Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...

More information

Scaling Microsoft SQL Server

Scaling Microsoft SQL Server Recommendations and Techniques for Scaling Microsoft SQL To support many more users, a database must easily scale out as well as up. This article describes techniques and strategies for scaling out the

More information

The Benefits of POWER7+ and PowerVM over Intel and an x86 Hypervisor

The Benefits of POWER7+ and PowerVM over Intel and an x86 Hypervisor The Benefits of POWER7+ and PowerVM over Intel and an x86 Hypervisor Howard Anglin [email protected] IBM Competitive Project Office May 2013 Abstract...3 Virtualization and Why It Is Important...3 Resiliency

More information

The Curious Case of Database Deduplication. PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle

The Curious Case of Database Deduplication. PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle The Curious Case of Database Deduplication PRESENTATION TITLE GOES HERE Gurmeet Goindi Oracle Agenda Introduction Deduplication Databases and Deduplication All Flash Arrays and Deduplication 2 Quick Show

More information

How to analyse your business sales 80/20 rule

How to analyse your business sales 80/20 rule 10 Minute Guide How to analyse your business sales 80/20 rule Membership Services Moor Hall, Cookham Maidenhead Berkshire, SL6 9QH, UK Telephone: 01628 427500 www.cim.co.uk/marketingresources The Chartered

More information

FAWN - a Fast Array of Wimpy Nodes

FAWN - a Fast Array of Wimpy Nodes University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed

More information

Deploying and Optimizing SQL Server for Virtual Machines

Deploying and Optimizing SQL Server for Virtual Machines Deploying and Optimizing SQL Server for Virtual Machines Deploying and Optimizing SQL Server for Virtual Machines Much has been written over the years regarding best practices for deploying Microsoft SQL

More information

Azure VM Performance Considerations Running SQL Server

Azure VM Performance Considerations Running SQL Server Azure VM Performance Considerations Running SQL Server Your company logo here Vinod Kumar M @vinodk_sql http://blogs.extremeexperts.com Session Objectives And Takeaways Session Objective(s): Learn the

More information

FAS6200 Cluster Delivers Exceptional Block I/O Performance with Low Latency

FAS6200 Cluster Delivers Exceptional Block I/O Performance with Low Latency FAS6200 Cluster Delivers Exceptional Block I/O Performance with Low Latency Dimitris Krekoukias Systems Engineer NetApp Data ONTAP 8 software operating in Cluster-Mode is the industry's only unified, scale-out

More information

RAID HARDWARE. On board SATA RAID controller. RAID drive caddy (hot swappable) SATA RAID controller card. Anne Watson 1

RAID HARDWARE. On board SATA RAID controller. RAID drive caddy (hot swappable) SATA RAID controller card. Anne Watson 1 RAID HARDWARE On board SATA RAID controller SATA RAID controller card RAID drive caddy (hot swappable) Anne Watson 1 RAID The word redundant means an unnecessary repetition. The word array means a lineup.

More information

W H I T E P A P E R E X E C U T I V E S U M M AR Y S I T U AT I O N O V E R V I E W. Sponsored by: EMC Corporation. Laura DuBois May 2010

W H I T E P A P E R E X E C U T I V E S U M M AR Y S I T U AT I O N O V E R V I E W. Sponsored by: EMC Corporation. Laura DuBois May 2010 W H I T E P A P E R E n a b l i n g S h a r e P o i n t O p e r a t i o n a l E f f i c i e n c y a n d I n f o r m a t i o n G o v e r n a n c e w i t h E M C S o u r c e O n e Sponsored by: EMC Corporation

More information

Intelligent Log Analyzer. André Restivo <[email protected]>

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt> Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.

More information

NoSQL Database Options

NoSQL Database Options NoSQL Database Options Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

More information