Data Warehousing Concepts




Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013.

DATA WAREHOUSING

What is a Data Warehouse?

Decision Support Systems (DSS) provide an analysis of the data that is maintained by the various systems within the organization. A DSS generally refers to a system that is geared towards a specific area of the business, such as product sales/distribution or labor/productivity. These systems maintain information about the business and are used by mid-level managers to obtain the information necessary to make decisions.

A Data Warehouse is a read-only database that is used as a foundation for decision support. It is a relational database designed to store an enormous amount of historical data from a variety of different sources so that the data can be analyzed more easily. A Data Warehouse stores historical data, which is static, rather than transactional data, which is dynamic; it therefore segregates the analytical records from the transactional records. With a Data Warehouse, an organization can access information from various sources efficiently and make better analytical decisions.

Need for a Data Warehouse

To survive in a competitive market, an organization needs to make decisions very quickly. However, the business centers of an organization are often spread across many locations, and data needs to be accessed effectively from these various sources. Many organizations look for an efficient decision-making tool that will help them access their large volumes of data globally. The Data Warehouse is that solution.

What is Online Analytical Processing (OLAP)?

An OLAP system stores static historical data. It does not allow users to modify the data (DML operations); it serves only the purpose of analyzing the data. A Data Warehouse is generally an OLAP server.

What is Online Transaction Processing (OLTP)?

An OLTP system stores dynamic data and allows the user to perform DML operations on it. It is mainly beneficial in situations where transactions on current data are performed frequently.

Difference between OLAP and OLTP

The fundamental concept behind data warehousing is to incorporate data from multiple databases into a single database. While the Data Warehouse may incorporate all of the data used by an organization, it differs from an Enterprise Database in three significant areas:

- The existing systems remain in operation.
- The common data is replicated in the warehouse.
- The warehouse is not updated in real time.

OLAP

- An OLAP system stores historical data.
- An OLAP query can retrieve millions of records.
- An OLAP system supports ad hoc queries.
- In an OLAP system, the database is updated on a regular basis by the ETL process. The user does not have rights to update the database directly.
- An OLAP system uses denormalized schemas to optimize query performance.

OLTP

- An OLTP system stores current data.
- An OLTP query typically retrieves only a handful of records.
- An OLTP system supports only predefined queries.
- In an OLTP system, the database is updated by the user as and when the user performs a transaction.
- An OLTP system uses normalized schemas to optimize the performance of DML operations.
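The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0), (4, "West", 75.0)],
)

# OLTP style: a predefined statement changes one current row (DML),
# and a keyed lookup reads that single row back.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP style: a read-only, ad hoc aggregate that scans every row.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The OLTP statements touch one row identified by its key, while the OLAP query reads the whole table and never modifies it.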

Data Warehouse Architecture

The Data Warehouse architecture varies and is specific to the needs of the organization. The common Data Warehouse architectures are:

1. Data Warehouse Architecture (Basics)
2. Data Warehouse Architecture (with a Staging Area)
3. Data Warehouse Architecture (with the Staging Area and the Data Marts)

Data Warehouse Architecture (Basics)

The Data Warehouse acts as a centralized source for data access. It stores historical data from all the OLTP (source) systems. The users access the static data without knowing where the data is actually retrieved from. Thus, the Data Warehouse acts as a mediator between the user and the data sources, both for retrieving the data and for analyzing it better.

Data Warehouse Architecture (with a Staging Area)

Before data is inserted into the Data Warehouse, the data from the data sources should be cleaned and processed. This can be done either programmatically or through a Staging Area. The staging area acts as a cleanup mediator between the data sources and the Data Warehouse: data coming from a data source is cleansed in the staging area and then sent to the Data Warehouse.
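The cleanup role of the staging area can be sketched in Python as follows. The field names (`id`, `customer`, `amount`) and the cleaning rules (trim and title-case names, reject unparseable amounts, missing names, and duplicates) are assumptions chosen for illustration; the rules in a real warehouse depend on its sources.

```python
def clean_staged_rows(raw_rows):
    """Split raw source rows into (clean, rejected) lists.

    The rules here are illustrative assumptions, not a standard.
    """
    clean, rejected, seen = [], [], set()
    for row in raw_rows:
        # Normalize the customer name: trim whitespace, title-case.
        customer = (row.get("customer") or "").strip().title()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            rejected.append(row)  # amount is missing or not numeric
            continue
        key = (row.get("id"), customer)
        if not customer or key in seen:
            rejected.append(row)  # missing name or duplicate record
            continue
        seen.add(key)
        clean.append({"id": row.get("id"), "customer": customer, "amount": amount})
    return clean, rejected


raw = [
    {"id": 1, "customer": "  alice smith ", "amount": "10.50"},
    {"id": 1, "customer": "Alice Smith", "amount": "10.50"},  # duplicate
    {"id": 2, "customer": "bob", "amount": "n/a"},            # bad amount
    {"id": 3, "customer": "", "amount": "5"},                 # missing name
]
clean, rejected = clean_staged_rows(raw)
print(clean)
print(len(rejected))
```

Only the rows in `clean` would be loaded into the warehouse; the rejected rows would typically be logged for review.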

Data Warehouse Architecture (with the Staging Area and the Data Marts)

A DSS may read data directly from the warehouse, or data may be replicated into other relational or multidimensional databases, referred to as data marts. The Data Mart used by the DSS may be a relational or a multidimensional hierarchical database.

When data marts are employed, the data mart databases are extremely denormalized, while the data warehouse database may be somewhat normalized. This configuration minimizes the size of the centralized warehouse database by reducing data redundancy, while maximizing the query performance of the DSS by allowing redundant data in the data mart(s). Historical data from legacy systems is generally included in both the warehouse and the data mart.

An organization may wish to customize the Data Warehouse for different departments within the organization. When a user of a specific department requests data, the data is retrieved from the Data Warehouse and stored in the respective Data Mart. As shown in the above figure, Data Marts are classified as accounts, marketing and computers. A user might wish to analyze the historical data of the accounts, marketing or computers department.
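A department-specific data mart derived from a somewhat normalized warehouse might be sketched like this, again using sqlite3. The `department` and `expense` tables, the `mart_accounts` table, and the sample rows are hypothetical; in practice the mart would usually live in a separate database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Somewhat normalized warehouse tables (hypothetical).
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE expense (
    expense_id INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department(dept_id),
    year INTEGER,
    amount REAL
);
INSERT INTO department VALUES (1, 'accounts'), (2, 'marketing');
INSERT INTO expense VALUES
    (1, 1, 2006, 400.0), (2, 1, 2007, 600.0), (3, 2, 2007, 900.0);

-- Data mart for the accounts department: the join is done once at load
-- time, so the department name is stored redundantly on every row.
CREATE TABLE mart_accounts AS
SELECT d.dept_name, e.year, e.amount
FROM expense AS e
JOIN department AS d ON d.dept_id = e.dept_id
WHERE d.dept_name = 'accounts';
""")

# Queries against the mart need no join at all.
mart_rows = conn.execute(
    "SELECT dept_name, year, amount FROM mart_accounts ORDER BY year"
).fetchall()
print(mart_rows)
```

The redundancy in `mart_accounts` is the trade-off described above: more storage in the mart in exchange for simpler, faster queries by the DSS.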

How is data/information maintained in the Data Warehouse?

The data warehouse environment is, by definition, redundant. Multiple production and legacy database systems feed data into a common database, which, in turn, may replicate the data into one or more data marts. The mapping of data from production and legacy systems may be written as a custom routine, or a Data Warehouse management tool (an ETL tool) may be employed.

The level of decisions made from a DSS usually does not require up-to-the-minute data. The data in the warehouse and data marts is not kept in sync with the production databases. Rather, the information contained in the warehouse and data marts is updated at regular scheduled intervals. The fact that the data mart is read-only, and is not updated on the fly, enables it to have a very high level of denormalization.

Note: It generally takes several hours to update the data warehouse and data marts from the production systems.

Star Schema

Star Schema and Snowflake Schema have become key buzzwords in data warehousing. Both schema types are relational database schemas. What sets them apart from other schemas is that they represent highly denormalized, multi-related tables within the relational database.

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables, as shown below. The Star Schema consists of a normalized fact table and several denormalized dimension tables.

The Fact Table has a concatenated key, made up of the Primary Keys from each of the Dimension Tables. This table includes facts about individual objects or transactions. The Dimension Tables include summarizations and derived information. With this structure, the user of the relational data mart can perform quick ad hoc queries of the database.

The most natural way to model a data warehouse is as a star schema, in which only one join establishes the relationship between the fact table and any one of the dimension tables. A star schema optimizes performance by keeping queries simple and providing fast response time. All the information about each level is stored in one row.

Other Schemas

Some schemas in data warehousing environments use third normal form rather than star schemas. Another schema that is sometimes useful is the snowflake schema, which is a star schema with normalized dimensions in a tree structure.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product category table, and a product manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure below presents a graphical representation of a snowflake schema.
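The difference in join depth between the two schemas can be sketched with sqlite3. All table and column names (`fact_sales`, `dim_product`, `dim_date`, `sf_product`, `sf_category`) and the sample rows are hypothetical, and only the category level of the product dimension is normalized out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one denormalized product dimension.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT
);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table key is the concatenation of the dimension keys.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL,
    PRIMARY KEY (product_id, date_id)
);

-- Snowflake variant: the category is normalized into its own table.
CREATE TABLE sf_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE sf_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES sf_category(category_id)
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Furniture');
INSERT INTO dim_date VALUES (10, 2006), (11, 2007);
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 11, 40.0);
INSERT INTO sf_category VALUES (100, 'Stationery'), (200, 'Furniture');
INSERT INTO sf_product VALUES (1, 'Pen', 100), (2, 'Lamp', 200);
""")

# Star: a single join from the fact table reaches the category.
STAR_QUERY = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake: the same question needs an extra foreign key join.
SNOWFLAKE_QUERY = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales AS f
JOIN sf_product AS p ON p.product_id = f.product_id
JOIN sf_category AS c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

Both queries return the same totals per category; the snowflake form stores each category name once but pays for it with the additional join.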


Overview of Hardware and I/O Considerations in Data Warehouses

I/O performance is normally a primary concern in data warehouses. This is in contrast to OLTP systems, where the potential bottleneck depends on user workload and application access patterns. When a system is constrained by its I/O capabilities, it is I/O bound, or has an I/O bottleneck. When a system is constrained by limited CPU resources, it is CPU bound, or has a CPU bottleneck.

Database architects frequently use RAID (Redundant Arrays of Inexpensive Disks) systems to overcome I/O bottlenecks and to provide higher availability. RAID can be implemented in several levels, ranging from 0 to 7. Many hardware vendors have enhanced these basic levels to lessen the impact of some of the original restrictions at a given RAID level.