Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach



Similar documents
SIZE & ESTIMATION OF DATA WAREHOUSE SYSTEMS

Data Warehouse Overview. Srini Rengarajan

IST722 Data Warehousing

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Lection 3-4 WAREHOUSING

Dimensional Modeling for Data Warehouse

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

CHAPTER 5: BUSINESS ANALYTICS

Methodology Framework for Analysis and Design of Business Intelligence Systems

CHAPTER 4: BUSINESS ANALYTICS

Speeding ETL Processing in Data Warehouses White Paper

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

Data Warehouse design

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Week 3 lecture slides

Data Warehousing Systems: Foundations and Architectures

SAS BI Course Content; Introduction to DWH / BI Concepts

Data Testing on Business Intelligence & Data Warehouse Projects

APPLYING FUNCTION POINTS WITHIN A SOA ENVIRONMENT

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Why Business Intelligence

Data Vault at work. Does Data Vault fulfill its promise? GDF SUEZ Energie Nederland

Understanding Data Warehousing. [by Alex Kriegel]

CHAPTER 3. Data Warehouses and OLAP

Mario Guarracino. Data warehousing

Data Vault and The Truth about the Enterprise Data Warehouse

COURSE OUTLINE. Track 1 Advanced Data Modeling, Analysis and Design

Part 22. Data Warehousing

A Design and implementation of a data warehouse for research administration universities

OLAP Theory-English version

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Introduction to Function Points

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

DATA WAREHOUSING AND OLAP TECHNOLOGY

The COSMIC Functional Size Measurement Method Version 3.0

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

The Data Warehouse ETL Toolkit

Data Warehousing and OLAP Technology for Knowledge Discovery

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Data Warehouse: Introduction

LEARNING SOLUTIONS website milner.com/learning phone

THE DATA WAREHOUSE ETL TOOLKIT CDT803 Three Days

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence

The Benefits of Data Modeling in Business Intelligence

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

Advanced Data Management Technologies

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

Chapter 5. Learning Objectives. DW Development and ETL

Whitepaper. Data Warehouse/BI Testing Offering YOUR SUCCESS IS OUR FOCUS. Published on: January 2009 Author: BIBA PRACTICE

Implementing a Data Warehouse with Microsoft SQL Server

14. Data Warehousing & Data Mining

Business Intelligence: Effective Decision Making

Requirements are elicited from users and represented either informally by means of proper glossaries or formally (e.g., by means of goal-oriented

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

MDM for the Enterprise: Complementing and extending your Active Data Warehousing strategy. Satish Krishnaswamy VP MDM Solutions - Teradata

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Implementing a Data Warehouse with Microsoft SQL Server

How To Model Data For Business Intelligence (Bi)

Modeling: Operational, Data Warehousing & Data Marts

MDM and Data Warehousing Complement Each Other

Data Warehousing & Business Intelligence

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Microsoft Data Warehouse in Depth

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Warehouse Snowflake Design and Performance Considerations in Business Analytics

A Survey on Data Warehouse Architecture

Presented by: Jose Chinchilla, MCITP

Original Research Articles

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

East Asia Network Sdn Bhd

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

FUNCTION POINT ANALYSIS: Sizing The Software Deliverable. BEYOND FUNCTION POINTS So you ve got the count, Now what?

Data Warehousing and Data Mining

資 料 倉 儲 (Data Warehousing)

3/17/2009. Knowledge Management BIKM eclassifier Integrated BIKM Tools

With business intelligence, we create a learning organization that adapts quickly to market changes and stays one step ahead of the competition.

Data Warehousing. Yeow Wei Choong Anne Laurent

When to consider OLAP?

IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY. Maria Kowal, Galina Setlak

Business Intelligence Systems: a Comparative Analysis

Lecture Data Warehouse Systems

Mastering Data Warehouse Aggregates. Solutions for Star Schema Performance

New Approach of Computing Data Cubes in Data Warehousing

Master Data Management and Data Warehousing. Zahra Mansoori

Data Warehouse - Basic Concepts

Making Business Intelligence Easy. Whitepaper Measuring data quality for successful Master Data Management

Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Reflections on Agile DW by a Business Analytics Practitioner. Werner Engelen Principal Business Analytics Architect

CHAPTER 4 Data Warehouse Architecture

Transcription:

2006 ISMA Conference 1 Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach Priya Lobo CFPS Satyam Computer Services Ltd. 69, Railway Parallel Road, Kumarapark West, Bangalore 560020, INDIA Priya_Lobo@satyam.com www.satyam.com 14 th September 2006

2006 ISMA Conference 2 Agenda Why Domain-specific guidelines for Data Warehouses? Unique Data Warehouse Characteristics Quick Review of Related IFPUG 4.2 rules Sizing Data in a Data Warehouse, using IFPUG 4.2 rules Conclusion Questions

Why Domain Specific Guidelines for Data Warehouses? 2006 ISMA Conference 3 IFPUG s CPM 4.2 provides a generic set of Software Sizing practices intended for sizing every domain or application Domain-specific counting guidelines are necessary to size functionally unique applications Data Warehouse (DW) systems are a special context for the application of software functional measurement and hence require domain specific sizing guidelines

Why Domain Specific Guidelines for Data Warehouses? 2006 ISMA Conference 4 The Data Component within a DW is unique (different from traditional, operational systems) Existence of different Data Structures, Data Types and Data Redundancy in a DW make it difficult to force a logical view of the Data A Consistent and Auditable approach is important to sizing logical data in a DW This approach, based on IFPUG 4.2 has been successfully adopted at multiple sites both for development and enhancement DW projects

Why Domain Specific Guidelines for Data Warehouses? 2006 ISMA Conference 5 Prior to Sizing Logical Data in a DW, using IFPUG 4.2 rules, lets look at some Unique Data Warehouse Characteristics and Related IFPUG 4.2 rules

2006 ISMA Conference 6 Unique Data Warehouse Characteristics A Data Warehouse is a logical collection of information gathered from many different operational databases, to create a business intelligent application Contains cleansed and organized data to support business analysis activities and decision-making tasks Periodically, one imports data from enterprise resource planning (ERP) systems, mainframe systems and other related business software systems into the data warehouse for further processing

Unique Data Warehouse Characteristics DATA WAREHOUSE STRUCTURE PRESENTATION ODS: Operational Data Store EDW: Enterprise DW 2006 ISMA Conference 7

2006 ISMA Conference 8 Unique Data Warehouse Characteristics Enterprise Data Warehouse (EDW) Contains data captured from operational systems, cleaned, transformed, integrated and loaded into a separate Subject Oriented database Data is accumulated to show a historical record Data Mart (DM) Subset of corporate data (historical, summarized) that is of value to a specific business unit or set of users Contains multi dimensional data (can be hybrid as well) Dependent Data Mart - data loaded from EDW Independent Data Mart - data loaded directly from ODS

2006 ISMA Conference 9 Unique Data Warehouse Characteristics Source Systems Core operational systems that feed data into the DW Operational Data Store (ODS) Stores data received from source systems Is an extract of the operational data Presentation Area Area where the organized data is available for direct querying by users & data access tools Data is dimensional and atomic Extraction, Transformation and Loading (ETL) Set of processes by which the operational source data is prepared for the DW

2006 ISMA Conference 10 Quick Review of Related IFPUG 4.2 rules FP Counting Process Data Data Functions Unadjusted Function Point Count Application to to be be Counted Type of of Count & Appln. Boundary Transaction Functions Final Adjusted FP Count 14 GSC Value Adjustment Factor

2006 ISMA Conference 11 Quick Review of Related IFPUG 4.2 rules Data Functions Represents the functionality provided to the user to meet internal and external data requirements Defined as Internal Logical Files (ILF) and External Interface Files (EIF) ILF User identifiable group of logically related data or control information maintained within the boundary of the application EIF User identifiable group of logically related data or control information referenced by the application, but maintained within the boundary of another application

2006 ISMA Conference 12 Quick Review of Related IFPUG 4.2 rules Logical Data Types Business Data Core User data; Business Objects Satisfies functional user requirements Eg: Customer file, Job File, Invoice File Reference Data Supports business rules for the maintenance of business data Satisfies functional user requirements Eg: Tax Rates, Threshold Settings Code Data (List Data) Usually represents technical requirements Eg: State Code, Payment Type code

2006 ISMA Conference 13 SUGGESTED SIZING APPROACH [1] Basic Philosophy [2] Table (Data) Types found in a Data Warehouse [3] Multidimensional Data Structures [4] Lessons Learnt

2006 ISMA Conference 14 [1] Basic Philosophy For a DW table to be included as a Logical Data File it must display Persistence, have at least one load function which processes the input data or be referenced by at least one output function The Count should be Inclusive rather than Exclusive Philosophy of sizing data is not whether a table is to be counted but whether or not the data in the table, when interpreted logically can be viewed as business data, and if so has not been included elsewhere in the count

2006 ISMA Conference 15 Data in a DW is (generally) defined as belonging to : Operational Data Store (ODS) Enterprise Data Warehouse (EDW) Data Mart (DM) Presentation Layer (Cubes may be present here) If the data in any of these areas show persistence and have at least 1 Process loading it, then count as logical files, using the respective approach for the table type or multidimensional schema as suggested in the following slides

2006 ISMA Conference 16 [2] Table (Data) Types found in a Data Warehouse Metadata Translation tables Staging tables Temporary tables Aggregates Logs/Event Files Release or Publish tables View tables Existence tables Facts Dimensions

2006 ISMA Conference 17 Metadata Data about Data Keeps track of what is where in a DW Examples - Data Dictionary Files, Event Files, User Profiles Files Reference Data as defined in CPM 4.2 Count metadata that is recognizable to the user (including administrator) as Logical files. If maintained within the application count as ILF If referenced (not maintained) by the application count as EIF Do not count technical metadata like update frequency, physical-logical file mapping

2006 ISMA Conference 18 Translation Tables Holds translation information used during Data Load process Usually exists only for implementation purposes Code Data as defined in CPM 4.2 Do not count, as it does not contribute to functional size As with Code data, at the maximum only one translation table may be counted per application

2006 ISMA Conference 19 Staging Tables, Temporary tables Contains Temporary Data Just another instance of same data counted elsewhere in the application Not User recognizable Do not count Staging and Temporary tables as Logical Files

2006 ISMA Conference 20 Aggregates Contains Aggregated data created for use by other processes or for performance reasons Usually exists as Aggregate Fact table Count each Aggregated fact group as 1 Internal Logical file, if used by other processes Do not count Aggregate tables, if created for performance reasons

2006 ISMA Conference 21 Views - Query statements that create logical copies of tables Do not count Views as Logical Files Existence (Exception) Tables - Defines business rules (Eg.- which products will be offered in which regions during which promotions) - Reference Data as defined in CPM 4.2 Count as Logical files. If maintained within the application count as ILF If referenced (not maintained) by the application count as EIF

2006 ISMA Conference 22 Release/Publish tables Data is assembled for further use Usually data stored here is temporary (not persistent) Do not Count Release/Publish tables as Logical files (unless it contains persistent data) Logs/Event Files Metrics data about the DW may be logged Maintains information about the currency of the data within the DW. Count as Internal Logical Files

2006 ISMA Conference 23 Fact tables (Measures) Factual or quantitative data Contain detailed or summarized data values Links to dimensional data Is usually meaningful to the user only when linked to dimensional data Examples - sales, profits, counts, claims Business Data as defined in CPM 4.2 Count each fact table as an RET in a logical star schema/cube

2006 ISMA Conference 24 Dimension tables (Attributes) Describes business data in a Fact table Usually organized into hierarchies (to roll up and drill down) Is usually meaningful to the user only when linked with fact data Examples - products, regions, time period (months roll up to quarters & quarters to years) Count each dimension table as an RET in a logical star schema/cube (which is an ILF) If it serves additional business purpose, Count as a separate ILF

2006 ISMA Conference 25 [3] Multi-dimensional Data Structures Categorizes data in order to enable business analysis Has two components - Fact and Dimensions Common Design Schemas Star Snowflake Cube

2006 ISMA Conference 26 Star Schema Generic representation of a dimensional model in a relational database Multiple Dimensions provide descriptive information on data in the central Fact table Count each star schema as an ILF. Complexity : Product Region Count each Fact subgroup as 1 RET Count each Dimension subgroup as 1 RET Sales If a Dimension is used in multiple star schemas, count each as 1 RET Do not count redundant DETs that occur in the same star Unit DIMENSION FACT Time

2006 ISMA Conference 27 Snowflake Schema Expanded version of (Adds Hierarchy to) the Star Schema Count each snowflake schema as an ILF. Product Group Complexity : Count each Fact subgroup as 1 RET Count each Dimension subgroup as 1 RET Where the hierarchical dimensions are exploded to their levels, the second order tables do not represent other RETs Do not count redundant DETs that occur in the same snowflake Product Unit DIMENSION Sales FACT Region Time

2006 ISMA Conference 28 Cube A multidimensional cube represents a similar structure with the cube axes representing Dimensions and the cube the Facts Count each cube as an ILF. PRODUCT REGION Complexity : Count each Fact subgroup as 1 RET Count each Dimension subgroup as 1 RET Do not count redundant DETs that occur in the same cube TIME

2006 ISMA Conference 29 [4] Lessons Learnt Document a local DW Sizing guideline (as each DW has its own uniqueness) to ensure consistency across the organization Clearly segregate Code Data from Business and Reference data as it may often be confused with Reference data in the DW Use an inclusive approach Consider all data groups (unless no justification may be found in terms of rules for doing so) Maintain a list of Logical-to-Physical file (temporary and persistent) Mapping as part of the count documentation. This helps identify the logical data groups impacted by an enhancement to the DW

2006 ISMA Conference 30 Conclusion Suggested domain-specific approach is consistent with IFPUG s 4.2 counting guidelines Suitable for both development and enhancement projects this approach has been successfully adopted at multiple sites in both scenarios This inclusive approach ensures all logical data is addressed The Size obtained through this approach is in line with Industry data - this is indicated by the productivity indicator.

2006 ISMA Conference 31 References IFPUG, Function Point Counting Practices Manual Release 4.2 W. H. Inmon, Building the Data Warehouse Ralph Kimball, The Data Warehouse Toolkit (Second Edition) Luca Santillo, Size & Estimation of Data Warehouse Systems

Questions??? 2006 ISMA Conference 32

Thank you 2006 ISMA Conference 33