DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS



Similar documents
OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

14. Data Warehousing & Data Mining

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Part 22. Data Warehousing

Fluency With Information Technology CSE100/IMT100

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Understanding Data Warehousing. [by Alex Kriegel]

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

Data Mart/Warehouse: Progress and Vision

IST722 Data Warehousing

Enterprise Data Warehouse (EDW) UC Berkeley Peter Cava Manager Data Warehouse Services October 5, 2006

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

Data Warehousing Systems: Foundations and Architectures

Data Warehousing, OLAP, and Data Mining

Data Warehousing and OLAP Technology for Knowledge Discovery

DATA WAREHOUSING AND OLAP TECHNOLOGY

Data Warehousing Concepts

CHAPTER 4 Data Warehouse Architecture

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Overview of Data Warehousing and OLAP

An Introduction to Data Warehousing. An organization manages information in two dominant forms: operational systems of

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Advanced Data Management Technologies

Turning your Warehouse Data into Business Intelligence: Reporting Trends and Visibility Michael Armanious; Vice President Sales and Marketing Datex,

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Data Warehousing and Data Mining

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Lection 3-4 WAREHOUSING

Technology-Driven Demand and e- Customer Relationship Management e-crm

SAS BI Course Content; Introduction to DWH / BI Concepts

When to consider OLAP?

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

Business Intelligence, Analytics & Reporting: Glossary of Terms

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

The Role of Data Warehousing Concept for Improved Organizations Performance and Decision Making

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Dimensional Modeling for Data Warehouse

Turkish Journal of Engineering, Science and Technology

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

Paper DM10 SAS & Clinical Data Repository Karthikeyan Chidambaram

Data Warehouse Overview. Srini Rengarajan

Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

Designing a Dimensional Model

B.Sc (Computer Science) Database Management Systems UNIT-V

Enterprise Solutions. Data Warehouse & Business Intelligence Chapter-8

3.1 Data Warehouse Methodology. Literature Review on Data Warehouse

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Week 13: Data Warehousing. Warehousing

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

SQL Server 2012 Business Intelligence Boot Camp

A Critical Review of Data Warehouse

Justice Data Warehousing and Court Business Intelligence. Technical Introduction. Harris County Courts

Top 10 Business Intelligence (BI) Requirements Analysis Questions

Data Warehousing and Data Mining in Business Applications

STRATEGIC AND FINANCIAL PERFORMANCE USING BUSINESS INTELLIGENCE SOLUTIONS

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

Concepts of Database Management Seventh Edition. Chapter 9 Database Management Approaches

A Service-oriented Architecture for Business Intelligence

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Module 1: Introduction to Data Warehousing and OLAP

Data Testing on Business Intelligence & Data Warehouse Projects

Business Benefits From Microsoft SQL Server Business Intelligence Solutions How Can Business Intelligence Help You? PTR Associates Limited

TRANSFORMING YOUR BUSINESS

LEARNING SOLUTIONS website milner.com/learning phone

BUILDING OLAP TOOLS OVER LARGE DATABASES

Prophix and Business Intelligence. A white paper prepared by Prophix Software 2012

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Data Warehouse Center Administration Guide

OLAP Theory-English version

A Design and implementation of a data warehouse for research administration universities

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

8. Business Intelligence Reference Architectures and Patterns

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Data W a Ware r house house and and OLAP II Week 6 1

Week 3 lecture slides

Business Intelligence Solutions. Cognos BI 8. by Adis Terzić

CHAPTER 3. Data Warehouses and OLAP

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

Driving Peak Performance IBM Corporation

IAF Business Intelligence Solutions Make the Most of Your Business Intelligence. White Paper November 2002

Implementing Business Intelligence at Indiana University Using Microsoft BI Tools

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

M Designing and Implementing OLAP Solutions Using Microsoft SQL Server Day Course

The difference between. BI and CPM. A white paper prepared by Prophix Software

Data Warehouse: Introduction

Transcription:

DATA WAREHOUSE CONCEPTS A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. At Rutgers, these systems include the registrar s data on students (widely known as the SRDB), human resource and payroll databases, course scheduling data, and data on financial aid. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decisionmaking. For example, an academic unit needs to have diachronic information about its extent of instructional output of its different faculty members to gauge if it is becoming more or less reliant on part-time faculty. DATA WAREHOUSE DEFINITIONS The data warehouse is that portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information. The data warehouse has specific characteristics that include the following: Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject. For example, the SRDB is not simply made accessible to end-users, but is provided structure and organized according to the specific needs. Integrated: A single source of information for and about understanding multiple areas of interest. The data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and student outcomes. Non-Volatile: Stable information that doesn t change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed. Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse. Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users. Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

Other Definitions Data Warehouse: A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. It may also provide end-user access to support enterprise views of data. Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers. Staging Area: Any data store that is designed primarily to receive data into a warehousing environment. Operational Data Store: A collection of data that addresses operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs. OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs. Multidimensional Analysis: The ability to manipulate information by a variety of relevant categories or dimensions to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as drilling-down, drilling-across and slicing and dicing Hypercube: A means of visually representing multidimensional data. Star Schema: A means of aggregating data based on a set of known dimensions. It stores data multidimensionally in a two dimensional Relational Database Management System (RDBMS), such as Oracle. Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the dimensions of a star schema in a relational environment. Multidimensional Database: Also known as MDDB or MDDBS. A class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems. OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. Can incorporate data acquisition, data access, data manipulation, or any combination thereof.

COMPARISON OF DATA WAREHOUSE AND OPERATIONAL DATA HOW IS THE WAREHOUSE DIFFERENT? The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an access wrapper for operational data, where data is simply dumped into tables for direct access. Among the differences: OPERATIONAL DATA application oriented detailed accurate, as of the moment of access serves the clerical community can be updated run repetitively and nonreflectively requirements for processing understood before initial development compatible with the Software Development Life Cycle performance sensitive (immediate response required when entering a transaction) accessed a unit at a time (limited number of data elements for a single record) transaction driven control of update a major concern in terms of ownership high availability managed in its entirety nonredundancy static structure; variable contents small amount of data used in a process DW DATA subject oriented summarized, otherwise refined represents values over time, snapshots serves the managerial community is not updated run heuristically requirements for processing not completely understood before development completely different life cycle performance relaxed (immediacy not required) accessed a set at a time (many records of many data elements) analysis driven control of update no issue relaxed availability managed by subsets redundancy is a fact of life flexible structure large amount of data used in a process

The Data Warehousing Process Part 1 Determine Informational Requirements Identify and analyze existing informational capabilities. Identify from key users the significant business questions and key metrics that the target user. group regards as their most important requirements for information. Decompose these metrics into their component parts with specific definitions. Map the component parts to the informational model and systems of record. The Data Warehousing Process Part 2 Evolutionary and Iterative Development Process When you begin to develop your first data warehouse increment, the architecture is new and fresh. With the second and subsequent increments, the following is true: Start with one subject area (or subset or superset) and one target user group. Continue and add subject areas, user groups and informational capabilities to the architecture based on the organization s requirements for information, not technology. Improvements are made from what was learned from previous increments. Improvements are made from what was learned about warehouse operation and support. The technical environment may have changed. Results are seen very quickly after each iteration. The end user requirements are refined after each iteration. Data Warehousing is an evolutionary/iterative process that follows a spiral pattern The warehouse architecture is initially developed at the start. The first increment is developed based on the architecture. Building the first increment causes architectural changes. Operation of the warehouse brings architectural changes. Each additional increment extends the warehouse. Each new increment may cause architectural adjustments. Continued operation may cause architectural adjustments.

Current Status Learn and modify initial requirements Identify Need Plan for next phase Deploy, and Use Start of project Expanding Scope Design Build Test

Office of Institutional Research and Academic Planning Data Warehouse Technical Architecture Design Data Sources (see attached) Budget System ETL Services Database Services OLAP Services Access Methods OLAP Developer: build OLAP views of data cubes Personnel CARS CAS FAMS Cubes built from data marts Multi-dimensional cubes created include: Enrollment Degrees Conferred Time to degree Retention Rates Graduation Rates Instructional Workload Faculty Analysis OLAP Tool User OLAP Tool User OLAP User: Access pre-defined views of data cubes NJAS Course Scheduling SRDB Faculty Survey Data Marts: SRDB Course Anaysis System Human Resources Course Scheduling SURE Report/Cube Distribution Server Web Browser Report Users Graduate School Programs Reports built from data marts Web Browser SURE Financial Accounting System Reports created include: Fact Book Instructional Workload Power Users - develop ad-hoc queries and reports. using Access Payroll System Reference Data Exploration Data Warehouse ODBC And Create Data Files Prepared by Michael J. Cullinan, Application Developer, OIRAP

THE WAREHOUSE POPULATING PROCESS A data warehouse is populated through a series of steps that 1) Remove data from the source environment (extract). 2) Change the data to have desired warehouse characteristics like subject-orientation and time-variance (transform). 3) Place the data into a target environment (load). This process is represented by the acronym ETL for Extract, Transform and Load. Complexity of Transformation and Integration The extraction of data from the operational environment to the data warehouse environment requires a change in technology. The selection of data from the operational environment may be very complex. Data is reformatted. Data is cleansed. Multiple input sources of data exist. Default values need to be supplied. Summarization of data often needs to be done. The input records that must be read have exotic or nonstandard formats. Data format conversion must be done. Massive volumes of input must be accounted for. Perhaps the worst of all: Data relationships that have been built into old legacy program logic must be understood and unraveled before those files can be used as input.

Data Warehouse Tools/Software Component Product used by Institutional Research Component Description Reporting Crystal Reports Querying Access 2000 OLAP Data Mining/Statistical Analysis Crystal Analysis Professional SAS Create presentation style reports with chart and graphs. Can be used to access any type of data source. Create complex ad-hoc queries against a variety of data sources using ODBC access to DW databases. Able to export to other types of formats such as text files, Access data cubes for designing views to pivot, filter and aggregate facts on pre-defined dimensions for specific subject areas such as enrollment, degrees conferred, etc. Statistical Analysis using ODBC access to IR DW databases. Recommended System Requirements For Web Access Internet Explorer for using OLAP web based ActiveX control Expand video to 1024 X 768 for more viewing space