1960s 1970s 1980s 1990s. Slow access to



Similar documents
Data Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina

OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA

Week 13: Data Warehousing. Warehousing

BUILDING BLOCKS OF DATAWAREHOUSE. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Overview of Data Warehousing and OLAP

Outline. Data Warehousing. What is a Warehouse? What is a Warehouse?

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

DATA WAREHOUSING AND OLAP TECHNOLOGY

Integrate multiple, heterogeneous data sources. Data cleaning and data integration techniques are applied

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Data Warehousing & OLAP

DATA WAREHOUSING - OLAP

Data Warehousing and OLAP

Introduction to Data Warehousing. Ms Swapnil Shrivastava

Lecture Data Warehouse Systems

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 28

LITERATURE SURVEY ON DATA WAREHOUSE AND ITS TECHNIQUES

Data Warehousing. Overview, Terminology, and Research Issues. Joachim Hammer. Joachim Hammer

Part 22. Data Warehousing

Week 3 lecture slides

Lection 3-4 WAREHOUSING

Data Warehousing. Outline. From OLTP to the Data Warehouse. Overview of data warehousing Dimensional Modeling Online Analytical Processing

Database Applications. Advanced Querying. Transaction Processing. Transaction Processing. Data Warehouse. Decision Support. Transaction processing

Data warehousing. Han, J. and M. Kamber. Data Mining: Concepts and Techniques Morgan Kaufmann.

This tutorial will help computer science graduates to understand the basic-toadvanced concepts related to data warehousing.

Data Warehousing Systems: Foundations and Architectures

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Data W a Ware r house house and and OLAP Week 5 1

Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi

When to consider OLAP?

Data Warehousing and OLAP Technology for Knowledge Discovery

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Data Warehousing & OLAP

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

CS54100: Database Systems

Data W a Ware r house house and and OLAP II Week 6 1

14. Data Warehousing & Data Mining

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

What is OLAP - On-line analytical processing

Chapter 3, Data Warehouse and OLAP Operations

DATA CUBES E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES

Building Data Cubes and Mining Them. Jelena Jovanovic

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

BUILDING OLAP TOOLS OVER LARGE DATABASES

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

BUSINESS ANALYTICS AND DATA VISUALIZATION. ITM-761 Business Intelligence ดร. สล ล บ ญพราหมณ

DATA WAREHOUSING APPLICATIONS: AN ANALYTICAL TOOL FOR DECISION SUPPORT SYSTEM

Data Warehouse: Introduction

Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar

CS2032 Data warehousing and Data Mining Unit II Page 1

Data Warehousing and Online Analytical Processing

OLAP. Business Intelligence OLAP definition & application Multidimensional data representation

Overview. Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Data Warehousing. An Example: The Store (e.g.

Lecture 2: Introduction to Business Intelligence. Introduction to Business Intelligence

Fluency With Information Technology CSE100/IMT100

Data Warehousing and Data Mining

Why Business Intelligence


Mario Guarracino. Data warehousing

2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000

OLAP and Data Warehousing! Introduction!

B.Sc (Computer Science) Database Management Systems UNIT-V

An Overview of Data Warehousing, Data mining, OLAP and OLTP Technologies

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

New Approach of Computing Data Cubes in Data Warehousing

Data Warehousing and OLAP Technology

PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.

Data Mining for Knowledge Management. Data Warehouses

Data Warehousing & OLAP

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Designing a Dimensional Model

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

CHAPTER 5: BUSINESS ANALYTICS

META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING

Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

DATA WAREHOUSE CONCEPTS DATA WAREHOUSE DEFINITIONS

CHAPTER 3. Data Warehouses and OLAP

Data Warehousing, OLAP, and Data Mining

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

CHAPTER 4 Data Warehouse Architecture

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Module 1: Introduction to Data Warehousing and OLAP

IST722 Data Warehousing

IT0457 Data Warehousing. G.Lakshmi Priya & Razia Sultana.A Assistant Professor/IT

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

(Week 10) A04. Information System for CRM. Electronic Commerce Marketing

ETL-EXTRACT, TRANSFORM & LOAD TESTING

Dimensional Modeling for Data Warehouse

Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach

Lecture 2 Data warehousing

Data Warehouse. MIT-652 Data Mining Applications. Thimaporn Phetkaew. School of Informatics, Walailak University. MIT-652: DM 2: Data Warehouse 1

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Data Warehouse. The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

Microsoft Business Intelligence

CHAPTER 4: BUSINESS ANALYTICS

Data Warehousing and Decision Support. Torben Bach Pedersen Department of Computer Science Aalborg University

HYPERION MASTER DATA MANAGEMENT SOLUTIONS FOR IT

Transcription:

Principles of Knowledge Discovery in Fall 2002 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 1 Course Content Introduction to Mining warehousing and cleaning operations summarization Association analysis Classification and prediction Clustering Web Mining Similarity Search Other topics if time permits Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 3 Summary of Last Chapter What kind of information are we collecting? What are Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? How do we categorize data systems? What are the issues in Mining? Are there application examples? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 2 Chapter 2 Objectives Realize the purpose of data warehousing. Comprehend the data structures behind data warehouses and understand the technology. Get an overview of the schemas used for multi-dimensional data. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 4

and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 5 Evolution of Decision Support Systems 1960s 1970s 1980s 1990s Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 7 Incentive for a Businesses have a lot of data, operational data and facts. This data is usually in different databases and in different physical places. is available (or archived), but in different formats and locations. (heterogeneous and distributed). Decision makers need to access information (data that has been summarized) virtually on one single site. This access needs to be fast regardless of the size of the data, and how old the data is. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 6 Warehousing and On-Line Analytical Processing Desktop Tools Terminal-based Decision Support Systems Batch and Manual Reporting Statistician Computer scientist Difficult and limited queries highly specific to some distinctive needs Analyst Inflexible and non-integrated tools Analyst Flexible integrated spreadsheets. Slow access to operational data Executive Integrated tools Mining What Is? A data warehouse consolidates different data. A data warehouse is a database that is different and maintained separately from an operational database. A data warehouse combines and merges information in a consistent database (not necessarily up-to-date) to help decision support. Decision support systems access data warehouse and do not need to access operational databases do not unnecessarily over-load operational databases. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 8

Definitions is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management s decision making process. (W.H. Inmon) Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model. Integrated: data collected in a data warehouse originates from different heterogeneous data. Time-variant: The dimension time is all-pervading in a data warehouse. The data stored is not the current value, but an evolution of the value in time. Non-volatile: update of data does not occur frequently in the data warehouse. The data is loaded and accessed. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 9 Building a Option 1: Consolidate Marts Corporate Mart Mart Mart Mart Option 2: Build from scratch Corporate data Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 11 Definitions (con t) Warehousing is the process of constructing and using data warehouses. A corporate data warehouse collects data about subjects spanning the whole organization. Marts are specialized, single-line of business warehouses. They collect data for a department or a specific group of people. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 10 and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 12

Markets Describing the Organization We sell products in various markets, and we measure our performance over time Business Manager Time We sell Products in various Markets, and we measure our performance over Time Designer Products Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 13 Concept-Hierarchies Dimensions are hierarchical by nature: total orders or partial orders Example: Location(continent country province city) Time(year quarter (month,week) day) Industry Country Year Dimensions: Product, Region, week Hierarchical summarization paths Category Region Quarter Product City Month Week Office Day Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 15 Construction of Based on Multi-dimensional Model Think of it as a cube with labels on each edge of the cube. The cube doesn t just have 3 dimensions, but may have many dimensions (N). Any point inside the cube is at the intersection of the coordinates defined by the edge of the cube. A point in the cube may store values (measurements) relative to the combination of the labeled dimensions. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 14 and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 16

On-Line Transaction Processing base management systems are typically used for on-line transaction processing (OLTP) OLTP applications normally automate clerical data processing tasks of an organization, like data entry and enquiry, transaction handling, etc. (access, read, update) base is current, and consistency and recoverability are critical. Records are accessed one at a time. OLTP operations are structured and repetitive OLTP operations require detailed and up-to-date data OLTP operations are short, atomic and isolated transactions bases tend to be hundreds of Mb to Gb. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 17 Example Location (city, AB) Edmonton Calgary Lethbridge Red Deer Time (Quarters) Q1 Q2 Q3 Q4 Compact Mid-size Family Minivan. Location (city, AB) Category Red Deer Edmonton Calgary Lethbridge Drill down on Q3 Time (Months, Q3) Jul Aug Sep Compact Mid-size Family Minivan. Category Slice on Category=mid-size Location (city, AB) Edmonton Calgary Lethbridge Red Deer Time (Quarters) Q1 Q2 Q3 Q4 Mid-size Category Roll-up on Location Location (province, Maritimes Quebec Ontario Time (Quarters) Q1 Q2 Q3 Q4 Compact Prairies Mid-size Canada) Category Western Pr Family Minivan. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 19 On-Line Analytical Processing On-line analytical processing () is essential for decision support. is supported by data warehouses. warehouse consolidation of operational databases. The key structure of the data warehouse always contains some element of time. Owing to the hierarchical nature of the dimensions, operations view the data flexibly from different perspectives (different levels of abstractions). operations: DW tend to be in the order of Tb roll-up (increase the level of abstraction) drill-down (decrease the level of abstraction) slice and dice (selection and projection) pivot (re-orient the multi-dimensional view) drill-through (links to the raw data) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 18 E Red Deer Time (Quarters) Q1 Q2 Q3 Q4 Location (city, AB) Lethbridge Pivot Calgary dmonton Mid-size Category OLTP vs OLTP users Clerk, IT professional Knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated usage repetitive ad-hoc access read/write index/hash on prim. key historical, summarized, multidimensional integrated, consolidated lots of scans unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response (Source: JH) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 20

Why Do We Separate DW From DB? Performance reasons: necessitates special data organization that supports multidimensional views. queries would degrade operational DB. is read only. No concurrency control and recovery. Decision support requires historical data. Decision support requires consolidated data. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 21 Transform Marts Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 23 and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 22 Sources Transform Marts Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 are often the operational systems, providing the lowest level of data. are designed for operational use, not for decision support, and the data reflect this fact. Multiple data are often from different systems run on a wide range of hardware and much of the software is built in-house or highly customized. Multiple data introduce a large number of issues -- semantic conflicts. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 24

Cleaning Transform Marts Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 cleaning is important to warehouse. data from multiple are often noisy (may contain data that is unnecessary for DS). Three classes of tools. migration: allows simple data transformation. scrubbing: uses domain-specific knowledge to scrub data. auditing: discovers rules and relationships by scanning data (detect outliers). Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 25 Transform Marts Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 Detect changes to an information source that are of interest to the warehouse. Define triggers in a full-functionality DBMS. Examine the updates in the log file. Write programs for legacy systems. Propagate the change in a generic form to the integrator. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 27 and Transform Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 ing the warehouse includes some other processing tasks: Checking integrity constraints, sorting, summarizing, build indices, etc. ing a warehouse means propagating updates on source data to the data stored in the warehouse. When to refresh. Determined by usage, types of data source, etc. How to refresh. shipping: using triggers to update snapshot log table and propagate the updated data to the warehouse. Transaction shipping: shipping the updates in the transaction log. Marts Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 26 Transform Marts Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 Receive changes from the monitors Make the data conform to the conceptual schema used by the warehouse Integrate the changes into the warehouse Merge the data with existing data already present Resolve possible update anomalies Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 28

Metadata Repository Transform Administrative Source database and their contents Gateway descriptions schema, view and derived data definitions Dimensions and hierarchies Pre-defined queries and reports mart locations and contents partitions extraction, cleansing, transformation rules, defaults refresh and purge rules User profiles, user groups Security: user authorization, access control Marts Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 29 and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 31 Metadata Repository Transform Marts Business data business terms and definitions ownership of data charging policies data lineage: history of migrated data and sequence of transformations applied currency of data: active, archived, purged monitoring information: warehouse usage statistics, error reports, audit trails Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University of Alberta 23 Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 30 Design Most data warehouses use a star schema to represent the multidimensional model. Each dimension is represented by a dimension-table that describes it. A fact-table connects to all dimension-tables with a multiple join. Each tuple in the fact-table consists of a pointer to each of the dimension-tables that provide its multi-dimensional coordinates and stores measures for those coordinates. The links between the fact-table in the centre and the dimensiontables in the extremities form a shape like a star. (Star Schema) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 32

Example of Star Schema Date Date Month Year Store StoreID City State Country Region Sales Fact Table Date Product Store Customer unit_sales dollar_sales Product ProductNo ProdName ProdDesc Category Cust CustId CustName CustCity CustCountry (Source: JH) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 33 s Design (con t) Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables. Fact constellations: Multiple fact tables share dimension tables. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 35 s Design (con t) Modeling data warehouses: dimensions measurements Star schema: A single object (fact table) in the middle connected to a number of objects (dimension tables) Each dimension is represented by one table Un-normalized (introduces redundancy). Ex: (Edmonton, Alberta, Canada, North America) (Calgary, Alberta, Canada, North America) Normalize dimension tables Snowflake schema Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 34 Example of Snowflake Schema Year Year Month Month Year Date Date Month Sales Fact Table Date Product Product ProductNo ProdName ProdDesc Category Country Country Region State State Country City City State Store StoreID City Store Customer unit_sales dollar_sales Cust CustId CustName CustCity CustCountry (Source: JH) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 36

Red Deer What Is the Best Design? Performance benchmarking can be used to determine what is the best design. Snowflake schema: Easier to maintain dimension tables when dimension table are very large (reduces overall space). Star schema: More effective for data cube browsing (less joins): can affect performance. Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 37 Aggregation in s The Cube and The Sub-Space Aggregates Q1 Q2 Q3 Q4 Lethbridge Edmonton Calgary Construction of Multi-dimensional Cube Province sum B.C. Prairies Ontario Amount 0-20K20-40K 40-60K 60K- sum All Amount Algorithms, B.C. Algorithms base... Discipline sum Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 39 Multidimensional view of data in the warehouse: Stress on aggregation of measures by one or more dimensions Aggregate Sum Drama Comedy Horror Group By Category Sum Drama Comedy Horror By Time Cross Tab Q1 Q2Q3 Q4 By Category Sum By City By Time City By Category City Sum By Category By Time Drama Comedy Horror By Time Category Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 38 A Star-Net Model Shipping Method Air-Express Customer Orders Customer Contracts Time Annually Quarterly City Country Region Geography Order Truck Product Line Daily Product Item Sales Person District Promotion Product Product Group Division Organization Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 40

Implementation of the R: Relational - data is stored in tables in relational database or extended-relational databases. They use an RDBMS to manage the warehouse data and aggregations using often a star schema. They support extensions to SQL A cell in the multi-dimensional structure is represented by a tuple. Advantage: Scalable (no empty cells for sparse cube). Disadvantage: no direct access to cells. Ex: Microstrategy Metacube (Informix) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 41 View of s and Hierarchies with DBMiner Importing data Table Browsing Dimension creation Dimension browsing Cube building Cube browsing Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 43 Implementation of the M: Multidimensional implements the multidimensional view by storing data in special multidimensional data structures (MDDS) Advantage: Fast indexing to pre-computed aggregations. Only values are stored. Disadvantage: Not very scalable and sparse Ex: Essbase of Arbor H: Hybrid - combines R and M technology. (Scalability of R and faster computation of M) Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 42 DBMiner Cube Visualization Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 44

Dr. Osmar R. Zaïane, 1999-2002 ORB ORB Engine Engine DBMS DBMS Source 1 Schema Schema Principles of Knowledge Discovery in University of Alberta ORB ORBClient Client SOAP SOAPClient Client Distributor Distributor DCC DCC Principles of Knowledge Discovery in Schema Schema SOAP SOAP Interface Interface(Java (JavaAPI) API) ORB ORB DCC Shell DBMS DBMS University of Alberta SOAP SOAP Executer Executer Schema Schema Source 2 Example DIVE-ON Project Dr. Osmar R. Zaïane, 1999-2002 Example DIVE-ON Project 47 45 1 Visualization Federated N-dimensional data warehouses Principles of Knowledge Discovery in VCU Local 3-dimensional data cube 3 UIM Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta What is the difference between and OLTP? 48 46 Immersed user 4 University of Alberta CAVE Interaction and Dr. Osmar R. Zaïane, 1999-2002 2 CORBA+XML Communication Virtual Distributed databases CORBA+XML Construction Example DIVE-ON Project

Issues Scalability Sparseness Curse of dimensionality Materialization of the multidimensional data cube (total, virtual, partial) Efficient computation of aggregations Indexing Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 49 Mining requires integrated, consistent and cleaned data which data warehouses can provide. tools can interface with the engine to take advantage of the integrated and aggregated data, as well as the navigation power. Interactive and exploratory. -based is referred to as or OLAM (on-line analytical ). Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 51 and What is the difference between and OLTP? Dr. Osmar R. Zaïane, 1999-2002 Principles of Knowledge Discovery in University of Alberta 50