Whitepaper. Function Points Based Estimation Model for Data Warehouses. Published on: March 2010 Author: Karthikeyan Sankaran


Table of Contents
1. Introduction
2. Estimation Challenges
3. Boundary, Scope and Estimation Approach

Author Bio
Karthikeyan Sankaran (Karthik) is currently working as a Senior Consultant in the Business Intelligence practice at Hexaware Technologies, a global provider of Information Technology solutions based in India. Karthik has over 10 years of experience in the Business Intelligence domain, having worked as an architect, consultant and project manager on data warehousing projects. Karthik can be reached at karthikeyans@hexaware.com.

Abstract
Data warehousing applications differ from traditional application development in that they are predominantly subject oriented rather than process oriented, and analytics centric rather than transaction centric. Evolving a structured estimation method for such applications is especially challenging because the data is pooled from a plethora of application sources and implemented across a myriad of solution domains. The transformation, source qualification and data cleansing that this data undergoes, and its transition into a format that supports business analytics, add further complexity to the estimates. The user-centric nature of the domain lends itself to Function Points, a methodology based on logical, user-oriented terms. This paper attempts to evolve a structured estimation model based on size measured in function points. The model is adapted and mapped to the various components of the data warehousing domain, whose data model is architected as a dimensional model rather than a relational model. Complexity factors other than size, and their interrelations, have also been factored into the model.

1. Introduction
Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about company operations. Business intelligence systems can help companies gain a more comprehensive understanding of the factors affecting their business and thus make better business decisions. Data warehousing can be considered the technology domain that facilitates business intelligence in an organization. This technology realm of BI has three major components:

Back-Room Architecture: Technology components used to extract data from source transactional systems, integrate it, transform it using business rules, and load it into target data repositories that aid decision making.

Front-Room Architecture: Technology components that help business users analyze the information through pre-built and ad-hoc reports and the whole range of analytical solutions.

Data Repository: Typically called a Data Warehouse, Data Mart or Operational Data Store, this layer models the data in a subject-oriented, integrated, non-volatile, time-variant fashion that enables the back-room and front-room architectures to work seamlessly.

2. Estimation Challenges
Business intelligence systems consist of many tools and technologies, each with its own advantages and constraints for handling a business problem. The nuances of each tool have to be taken into account for effort estimation.

The back-end architecture of DW systems, typically called the ETL layer, has many interlinked processes that are run to fulfil the information requirements. The interlinking and sequencing of processes, which dictate how business rules are applied in a particular situation, pose difficulties for estimation.

The front-end reporting / analytical layer has many user-centric, softer aspects such as report performance and the clarity of the semantic layer for ad-hoc analysis. All of these factors need to be taken into account to arrive at proper estimates.

As the DW/BI system evolves over time, new functionality is added on a daily basis. For each new requirement, the process and data should conform to what is already present in the data warehouse, which means the effort for regression testing must also be factored in.

Data warehouses process huge volumes of data on a daily basis, so gauging the effort required to perform load testing for each new requirement is also a difficult task.

3. Boundary, Scope and Estimation Approach
Data warehousing applications typically have a staging area that pools data from source applications; an Extraction, Transformation and Loading (ETL) process implemented through data integration tools; validations and transformations that enforce the business logic; process alerts that provide status notification for the ETL process; and target tables where the transformed data is loaded.

Figure 1: DWH Application Boundary

This article is restricted to analyzing the estimation model for the back-room architecture, which involves the data integration (ETL) process. The following approach can be adopted for framing the estimation model:

Step 1: Gauging the size of the application using Function Points
Step 2: Assessing the ETL complexity factors
Step 3: Performing regression analysis
Step 4: Implementing the effort estimation model

Step 1 - Size Estimation Using Function Points
The Function Point model has five basic function types, divided into data functions and transaction functions. Data functions have two constituents: Internal Logical Files (ILF) and External Interface Files (EIF). Transaction functions have three types: External Inputs (EI), External Outputs (EO) and External Inquiries (EQ). Applying Function Point types to a DWH application is not as straightforward as it is for web or GUI applications. The function types were mapped to DWH components as per the following guidelines.

Table 1: DWH Components vs FP Types

DWH Component    FP Type
Staging Tables   External Interface File (EIF); ILF if additional processing precedes the update
Target Tables    Internal Logical File (ILF)
ETL Mapping      External Input (EI)
Look-Up Tables   Counted as ILF (reference data); also counted in the FTRs of the ETL mapping
Process Alerts   External Output (EO)

Staging Tables: Staging tables hold the data aggregated from different application sources. Even though the staging tables reside within the boundary of the data warehouse, they are counted as External Interface Files (EIF) because they primarily hold application data belonging to external systems and are maintained within the staging area for performance reasons. If any additional processing takes place before these tables are updated, they can instead be counted as Internal Logical Files (ILF).

Target Tables: These destination tables are counted as Internal Logical Files (ILF) since they are updated with the transformed data after the various transformation and exception-processing steps, in accordance with the business logic, within the boundary of the data warehouse.

Mapping: The mapping component in a data warehouse primarily implements the Extraction, Transformation and Loading logic by pulling data from a source table, which undergoes transformations before being loaded into a target table. Mappings are counted as External Inputs (EI) even where a mapping does not necessarily update the target tables, because the system's behavior is altered by the transformation expressions that implement the business logic. In such instances the mapping is treated as control data that alters the system's behavior.

Look-Up Tables: A look-up table generally maps an identifier field to a description field. Since the description is usually not available in the source data, and the business requires it, the look-up tables are referenced by the ETL mapping before the data is loaded into the target tables. Even though a look-up table resembles code data providing an explanatory description for an identifier, it does not serve a purely technical implementation purpose. Hence look-up tables are treated as reference data: they are included in the ILF count and are also counted in the File Types Referenced (FTR) of the ETL mapping process.

Process Alerts: These alerts are counted as External Outputs (EO) since they notify the status of the ETL process after the target table has been updated.
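To make the counting rules above concrete, the following minimal sketch totals unadjusted function points for a hypothetical inventory of DWH components. The component-to-FP-type mapping follows Table 1, and the complexity weights are the standard IFPUG values; the component inventory, complexity ratings and Python helper names are illustrative assumptions and not part of the original model.

# Minimal sketch of Step 1: sizing a DWH application in unadjusted function
# points (UFP) using the component-to-FP-type mapping from Table 1.
# The IFPUG weights are the standard published values; the example component
# inventory and complexity ratings are purely hypothetical.

# Standard IFPUG complexity weights per function type.
WEIGHTS = {
    "ILF": {"low": 7, "average": 10, "high": 15},
    "EIF": {"low": 5, "average": 7, "high": 10},
    "EI":  {"low": 3, "average": 4,  "high": 6},
    "EO":  {"low": 4, "average": 5,  "high": 7},
    "EQ":  {"low": 3, "average": 4,  "high": 6},
}

# Mapping of DWH components to FP types, as per Table 1.
DWH_TO_FP = {
    "staging_table": "EIF",   # ILF instead if extra processing precedes updates
    "target_table":  "ILF",
    "etl_mapping":   "EI",
    "lookup_table":  "ILF",   # also contributes to the mapping's FTR count
    "process_alert": "EO",
}

def unadjusted_fp(components):
    """Sum IFPUG weights for a list of (dwh_component, complexity) tuples."""
    total = 0
    for component, complexity in components:
        fp_type = DWH_TO_FP[component]
        total += WEIGHTS[fp_type][complexity]
    return total

if __name__ == "__main__":
    # Hypothetical inventory for one ETL subject area.
    inventory = [
        ("staging_table", "low"),
        ("staging_table", "average"),
        ("target_table",  "high"),
        ("etl_mapping",   "high"),
        ("etl_mapping",   "average"),
        ("lookup_table",  "low"),
        ("process_alert", "low"),
    ]
    print("Unadjusted FP:", unadjusted_fp(inventory))

The resulting unadjusted size is what feeds into the complexity assessment and regression steps described next.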

Step 2 - Assessing the ETL Complexity Factors
Since the 14 general system characteristics given in the IFPUG Counting Practices Manual were inadequate for estimating effort given the complexity of a data warehousing application, a separate brainstorming session was conducted and a cause-and-effect analysis was performed to identify the complexity factors, depicted below.

Figure 2: Cause and Effect Diagram for ETL Complexity Factors

Based on the initial analysis, 27 complexity factors were identified and a correlation analysis was performed. By eliminating some of the insignificant factors and introducing a few more, the list was later reduced to 19 complexity factors, and a fresh set of data was collected for the regression analysis presented below.

Step 3 - Regression Analysis
Based on the data points gathered, a stepwise forward regression analysis was performed to filter out the most influential factors correlated with actual effort. A regression equation relating these factors to actual effort was developed for this specific project. The R-Sq value of 98% indicates a significant correlation between the 7 factors retained (out of the 19) and actual effort; in other words, these 7 factors explain 98% of the variation in the output.

Step 4 - Implementation of the Effort Estimation Model
The estimation model was implemented, and there was a significant improvement in the process capability of the estimation process.
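As an illustration of how Steps 2 and 3 could be operationalized, the sketch below runs a stepwise forward selection of complexity factors against recorded effort. The whitepaper does not publish its 19 factors, its project data or the final regression equation, so the factor names and generated data here are purely hypothetical, and statsmodels ordinary least squares is used only as one convenient fitting tool; this is not the author's actual model.

# Illustrative sketch of Steps 2-3: stepwise forward selection of ETL
# complexity factors against actual effort. Factor names and data are
# hypothetical; the whitepaper's own factors and equation are not published.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical candidate complexity factors rated per ETL work item.
factors = ["num_sources", "transformation_rules", "lookup_count",
           "data_volume_gb", "reusability", "scheduling_dependencies"]
X = pd.DataFrame(rng.integers(1, 10, size=(40, len(factors))), columns=factors)
# Synthetic "actual effort" driven mostly by three of the factors plus noise.
y = (5 * X["transformation_rules"] + 3 * X["num_sources"]
     + 2 * X["lookup_count"] + rng.normal(0, 4, size=40))

def forward_select(X, y, r2_gain_threshold=0.01):
    """Greedily add the factor that most improves R-squared each round."""
    selected, remaining = [], list(X.columns)
    best_r2 = 0.0
    while remaining:
        scores = []
        for factor in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [factor]])).fit()
            scores.append((model.rsquared, factor))
        r2, factor = max(scores)
        if r2 - best_r2 < r2_gain_threshold:   # stop when gains become marginal
            break
        selected.append(factor)
        remaining.remove(factor)
        best_r2 = r2
    return selected, best_r2

chosen, r2 = forward_select(X, y)
print("Selected factors:", chosen)
print("R-Sq:", round(r2, 3))
final = sm.OLS(y, sm.add_constant(X[chosen])).fit()
print(final.params)   # intercept and coefficients of the fitted effort model

In the project described above, this kind of selection retained 7 of the 19 candidate factors; the fitted coefficients, together with the function point size from Step 1, form the effort estimation model implemented in Step 4.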

Figure 3: Process Capability before Implementation
Figure 4: Process Capability after Implementation
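The before/after comparison in Figures 3 and 4 is a process capability study of estimation variance. As a hedged illustration of how such an improvement might be quantified, the sketch below computes the standard Cp and Cpk indices for hypothetical samples of estimation variance; the specification limits and data are assumptions, not the project's actual figures.

# Illustrative sketch: quantifying estimation process capability before and
# after adopting the FP-based model. Uses the standard formulas
# Cp = (USL - LSL) / (6 * sigma) and Cpk = min(USL - mu, mu - LSL) / (3 * sigma).
# The variance samples and +/-10% specification limits are hypothetical.
import statistics

def capability(variances, lsl, usl):
    """Return (Cp, Cpk) for a sample of estimation variance percentages."""
    mu = statistics.mean(variances)
    sigma = statistics.stdev(variances)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

# Hypothetical estimation variance (actual vs estimated effort, in %).
before = [22, -18, 30, 15, -25, 28, 12, -20]
after = [6, -4, 8, 3, -5, 7, 2, -6]

for label, sample in (("before", before), ("after", after)):
    cp, cpk = capability(sample, lsl=-10, usl=10)
    print(f"{label}: Cp = {cp:.2f}, Cpk = {cpk:.2f}")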

Address
1095 Cranbury South River Road, Suite 10, Jamesburg, NJ 08831. Main: 609-409-6950 Fax: 609-409-6910

Safe Harbor
Certain statements in this whitepaper concerning our future growth prospects are forward-looking statements, which involve a number of risks and uncertainties that could cause actual results to differ materially from those in such forward-looking statements. The risks and uncertainties relating to these statements include, but are not limited to, risks and uncertainties regarding fluctuations in earnings, our ability to manage growth, intense competition in IT services including those factors which may affect our cost advantage, wage increases in India, our ability to attract and retain highly skilled professionals, time and cost overruns on fixed-price, fixed-timeframe contracts, client concentration, restrictions on immigration, our ability to manage our international operations, reduced demand for technology in our key focus areas, disruptions in telecommunication networks, our ability to successfully complete and integrate potential acquisitions, liability for damages on our service contracts, the success of the companies in which Hexaware has made strategic investments, withdrawal of governmental fiscal incentives, political instability, legal restrictions on raising capital or acquiring companies outside India, unauthorized use of our intellectual property, and general economic conditions affecting our industry.

Hexaware Technologies. All rights reserved.