An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of



Similar documents
The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Lection 3-4 WAREHOUSING

Part 22. Data Warehousing

Data Warehouse for Interactive Decision Support for Addis Ababa City Administration

14. Data Warehousing & Data Mining

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Data Warehousing Systems: Foundations and Architectures

A Survey on Data Warehouse Architecture

Data Warehousing and Data Mining in Business Applications

Data Warehousing: A Technology Review and Update Vernon Hoffner, Ph.D., CCP EntreSoft Resouces, Inc.

Master Data Management and Data Warehousing. Zahra Mansoori

Data Warehousing and Data Mining

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

SENG 520, Experience with a high-level programming language. (304) , Jeff.Edgell@comcast.net

Fluency With Information Technology CSE100/IMT100

Data Warehousing and OLAP Technology for Knowledge Discovery

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Methodology Framework for Analysis and Design of Business Intelligence Systems

Data Warehouse Overview. Srini Rengarajan

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Dimensional Modeling for Data Warehouse

Advanced Data Management Technologies

Data Warehousing. Jens Teubner, TU Dortmund Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

OLAP Theory-English version

Dr. Osama E.Sheta Department of Mathematics (Computer Science) Faculty of Science, Zagazig University Zagazig, Elsharkia, Egypt

Sales and Operations Planning in Company Supply Chain Based on Heuristics and Data Warehousing Technology

THE TECHNOLOGY OF USING A DATA WAREHOUSE TO SUPPORT DECISION-MAKING IN HEALTH CARE

CHAPTER SIX DATA. Business Intelligence The McGraw-Hill Companies, All Rights Reserved

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

BUSINESS INTELLIGENCE AS SUPPORT TO KNOWLEDGE MANAGEMENT

Deriving Business Intelligence from Unstructured Data

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Technology-Driven Demand and e- Customer Relationship Management e-crm

Hybrid Support Systems: a Business Intelligence Approach

A Knowledge Management Framework Using Business Intelligence Solutions

B. 3 essay questions. Samples of potential questions are available in part IV. This list is not exhaustive it is just a sample.

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

Turkish Journal of Engineering, Science and Technology

What is Management Reporting from a Data Warehouse and What Does It Have to Do with Institutional Research?

Master Data Management. Zahra Mansoori

Data Mining for Successful Healthcare Organizations

A Review of Data Warehousing and Business Intelligence in different perspective

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

DATA MINING AND WAREHOUSING CONCEPTS

Business Intelligence: Effective Decision Making

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

Data Mart/Warehouse: Progress and Vision

Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998

Datawarehousing and Business Intelligence

Student Performance Analytics using Data Warehouse in E-Governance System

Data Warehouses in the Path from Databases to Archives

Research on Airport Data Warehouse Architecture

Meta-data and Data Mart solutions for better understanding for data and information in E-government Monitoring

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Class News. Basic Elements of the Data Warehouse" 1/22/13. CSPP 53017: Data Warehousing Winter 2013" Lecture 2" Svetlozar Nestorov" "

BUILDING A HEALTH CARE DATA WAREHOUSE FOR CANCER DISEASES

Understanding Data Warehousing. [by Alex Kriegel]

CHAPTER 1 INTRODUCTION

IMPROVING THE QUALITY OF THE DECISION MAKING BY USING BUSINESS INTELLIGENCE SOLUTIONS

Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling

CHAPTER 3. Data Warehouses and OLAP

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

IST722 Data Warehousing

Implementing Oracle BI Applications during an ERP Upgrade

An Overview of Database management System, Data warehousing and Data Mining

DATA QUALITY IN BUSINESS INTELLIGENCE APPLICATIONS

Framework for Data warehouse architectural components

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

THE QUALITY OF DATA AND METADATA IN A DATAWAREHOUSE

TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended.

University Data Warehouse Design Issues: A Case Study

A Data Warehouse Design for A Typical University Information System

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Metadata Technique with E-government for Malaysian Universities

An Instructional Design for Data Warehousing: Using Design Science Research and Project-based Learning

A Design and implementation of a data warehouse for research administration universities

Foundations of Business Intelligence: Databases and Information Management

MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus

Transcription:

An Introduction to Data Warehousing An organization manages information in two dominant forms: operational systems of record and data warehouses. Operational systems are designed to support online transaction processing (OLTP) whereas data warehousing systems are designed to support online analytical processing (OLAP). Operational systems concentrate on high-volume transaction processing on a day-to-day basis using real-time data. These systems are generally process-oriented and usually focus on specific business tasks such as registering students, updating financial transactions, and managing employee timesheets. They are optimized for simplicity and speed of modification, allowing for efficient and effortless data entry and retrieval. Such systems also track historical and transactional data, but not to the degree required by research queries. While the operational systems primarily focus on current data management, the data warehouses update and store historical data. They are generally subject-specific and usually carry data from multiple operational systems to support organizational decision-making. Data warehouses can be used to address issues in academic institutions regarding student satisfaction, the effectiveness of new instructional techniques, and the attrition rate. In response to a concern, relevant data can be extracted, or mined, and utilized for data analysis and report generation. Definition: According to William H. Inmon, a data warehouse is a subject-oriented, integrated, timevarying, non-volatile collection of data in support of the management s decision-making process (Inmon, 2005, p. 32). A data warehouse is a centralized repository that stores data from multiple Prepared by Priya Chaplot 1 January 9, 2007

information sources and transforms them into a common, multidimensional data model for efficient querying and analysis. A data warehouse has the ability to address a wide variety of phenomena. A faculty member may ask, How can I modify my instruction to help my students learn to write more effective essays? A manager in the Department of Human Resources may ask, What kind of training or orientation is necessary for new employees? Individuals from administration may ask, Are students who attend classes full-time more likely to succeed academically than those who take classes on a part-time basis? Each of these questions requires more information about the situation in order to conduct research. This information originates from the data warehouse. A data warehouse is a storehouse of an organization s historical data. Information from operational systems is extracted and imported into the data warehouse on a regular basis. As a result, complex inquiries, or queries, can be conducted through the data warehouse with minimal interruptions to the operational systems. The imported data is read-only and only adds to the data existing in the data warehouse. With greater amounts of data, the value of the data warehouse to the user increases, since analyses looking over a longer period of time become possible. When a user query is submitted to the warehouse, all relevant historical data addressing that query is readily available and current to support in the decision-making. Goals: The fundamental goal of the data warehouse is to support strategic planning, modeling and forecasting at the organizational level. It must fulfill the need for knowledge for an area of uncertainty or growth in the organization. In order to accomplish this task, it must provide a single, comprehensive and consistent view of the organization. Prepared by Priya Chaplot 2 January 9, 2007

The data must be easily accessible and understandable to the user. It must be simple yet intuitive, quick, and easy to use. Also, the data warehouse must present the information consistently and securely to its users. When data is collected from the source systems, it must complete various measures of quality assurance to confirm its accuracy. The data must be verified, appropriately labeled, and fully accounted for before it can be made available to the users. Also, the data must be resilient and able to seamlessly adapt to change without discrediting the existing data. Effective data warehousing can help create a meaningful relationship between information technology and business, facilitating enterprise-level strategic planning and growth (Cohen, 2006). Components: A data warehouse has four main components: operational systems of record, the data staging area, the data presentation area, and data access tools (Kimball & Ross, 2002, p. 7). Each component of the data warehouse serves a unique function in preparing data for manipulation and examination. Operational Source Systems Data Staging Area Data Presentation Area Data Access Tools Extract Extract Extract Services: Clean, combine, and standardize Conform dimensions NO USER QUERY SERVICES Data Store: Flat files and relational tables Processing: Sorting and sequential processing Load Load Data Mart #1 DIMENSIONAL Atomic and Summary data Based on a single Business process DW Bus: Conformed facts & dimensions Data Mart #2 (Similarly designed) Access Access Ad Hoc Query Tools Report Writers Analytic Applications Modeling: Forecasting Scoring Data mining Figure 1. Basic elements of the data warehouse (Kimball & Ross, 2002, p. 7). Prepared by Priya Chaplot 3 January 9, 2007

As aforementioned, the operational systems of records, or source systems, capture and process the organization s day-to-day transactions. They concentrate heavily on efficient processing performance, since they are dealing with a high volume of transactions. They function in isolation and do not typically share common data with other source systems. The data that is acquired through these systems is uploaded into the data staging area. The data staging area acts doubly as a storage area for the captured data and as a platform for the set of processes called extract-transformation-load (ETL). This set of processes occurs to standardize the raw data and incorporate it into the data warehouse environment. First, the data is extracted from various source systems and copied into the data staging area. There, the data is combined, cleansed and transformed into a standard format and structure. Missing elements, incorrect labels, duplicate data, misspellings, and other errors are manipulated and corrected in this phase. Once the data is standardized, it is loaded into the data presentation area, where it is finally accessible to users. The formatted data is organized, located and available for user queries in the data presentation area. The data presentation area is considered to be a set of integrated data marts. A data mart is a subset of the data warehouse and represents select data regarding a specific business function (Inmon, 1999). An organization can have multiple data marts, each one relevant to the department for which it was designed. For example, the English department may have a data mart reflecting historical student data including demographics, placement scores, academic performance, and class schedules. The data contained in the data presentation area must be detailed and logically organized. Once the data presentation area contains the formatted data, users can utilize various data access tools to perform queries. Some data access tools include ad hoc query tools, data mining Prepared by Priya Chaplot 4 January 9, 2007

applications and sophisticated forecasting tools. Users can use these tools to customize queries to search specific segments of the data presentation area. In addition to these components of a data warehouse, it is imperative to discuss the importance of a strong metadata structure. Metadata, or data about the data, contains vital information that guides the process of converting the raw data from the operational systems of record into accessible data in the data presentation area (Kimball & Ross, 2002, p. 14). Due to its value, the metadata resources must be as carefully categorized, protected, and accessible as the data itself. Analysis: Data warehouses are effective in the transformation from intuitive information gathering to systematic and objective investigation (Zikmund, 2003, p. 5). They provide users access and control to a wide variety of centralized and formatted data to choose the best course of action and support business decisions. Users can manipulate and customize the data to support specific queries that will enable positive changes at various business levels. Since the various stages increase data accuracy and integrity, complex queries can be conducted with a strong sense of confidence. Although there are many benefits to data warehousing, there are several challenges and drawbacks as well. Depending on their design, data warehouses can be highly risky since they have complex architectures, long development cycles, poor information quality, and an incapability to adapt as quickly as business conditions change. Furthermore, since the operational source systems provide the data that eventually makes its way into the data presentation area, data warehouses are limited by these source systems. Thus, each organization should focus on Prepared by Priya Chaplot 5 January 9, 2007

continuous evaluation and improvement of its data warehouse, as well as its source systems, to ensure its effectiveness in conducting research and supporting business decisions. Conclusion: Since the primary task of management is effective decision making, the primary task of research, and subsequently data warehouses, is to generate accurate information for use in that decision making. It is imperative that an organization s data warehousing strategies reflect changes in the internal and external business environment in addition to the direction in which the business is traveling. Playing an integral role in the growth, development and success of an organization, data warehouses facilitate meaningful research which facilitates effective management. References: [1] Cohen, Rich (2006). Business Intelligence Strategy: Seven Principles for Enterprise Data Warehouse Design. DM Review. Retrieved December 18, 2006, from http://www.dmreview.com/article_sub.cfm?articleid=1045818. [2] Inmon, William H., Building the Data Warehouse, 4th Edition, Wiley Publishing, Indianapolis, 2005. [3] Inmon, William H. (1999). Data Mart Does Not Equal Data Warehouse. DM Review. Retrieved January 2, 2007 from http://www.dmreview.com/article_sub.cfm?articleid=1675. [4] Kimball, Ralph; Ross, Margy. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, John Wiley and Sons, Inc., Chichester, 2002. [5] Zikmund, William G., Business Research Methods, 7th Edition, Dryden Press, New York, 2003. Prepared by Priya Chaplot 6 January 9, 2007