DATA WAREHOUSE DESIGN




ICDE 2001 Tutorial
Stefano Rizzi, Matteo Golfarelli
DEIS - University of Bologna, Italy

Motivation
Building a data warehouse for an enterprise is a huge and complex task, which requires accurate planning aimed at devising satisfactory answers to organizational and architectural questions. Despite the pressing demand for working solutions coming from enterprises and the wide range of advanced technologies offered by vendors, few attempts have been made towards devising a specific methodology for data warehouse design. On the other hand, statistical reports on DW project failures state that a major cause lies in the absence of a global view of the design process: in other terms, in the absence of a design methodology.

Summary
- Introduction to Data Warehousing
- Conceptual design of Data Warehouses
- Workload-based logical design for ROLAP
- Indexes for physical design

Introduction to Data Warehousing
Stefano Rizzi

Information systems: profile and role
Information systems are rooted in the relationship between information, decision and control. An IS should collect and classify information, by means of integrated and suitable procedures, in order to produce, in time and at the right levels, the synthesis used to support the decision process, as well as to administer and globally control the enterprise activity.

Information as a resource
Information is a resource of increasing value, required by managers to schedule and monitor enterprise activities effectively. Information is the raw material that information systems transform, just as manufacturing systems transform unfinished products.
[Figure: a manufacturing system produces a finished product; an information system produces information]

Value of information
Information is an enterprise resource like capital, raw materials, plants and people; thus, it has a cost. Hence, understanding the value of information is important.
[Figure: information pyramid with the levels primary information sources, selected information, reports and strategic directions; axes: amount, value]

Different kinds of information systems
[Figure: kinds of information systems (ESS, MIS, DSS, KWS, OAS, TPS) across the strategic, management, knowledge and operational levels (senior managers, middle managers, knowledge and data workers, operational managers) and the functional areas sales and marketing, manufacturing, finance, accounting and human resources]

The Data Warehouse phenomenon
Usual complaints:
- "We have tons of data but we cannot access them!"
- "How can people playing the same role produce substantially different results?"
- "We want to slice and dice data in any possible way!"
- "Show me only what is important!"
- "Everyone knows some data are incorrect..."
(R. Kimball, The Data Warehouse Toolkit)

Data Warehousing
A collection of technologies and tools supporting the knowledge worker (executive, manager, analyst) in analysing data, aimed at decision making and at improving the knowledge assets of the enterprise.

Data Warehouse
At the core of the architecture of modern information systems, the data warehouse is a data repository that is:
- oriented to subjects;
- integrated and consistent;
- representative of temporal evolution;
- non-volatile.
The data warehouse is regularly refreshed, permanently growing, logically centralised, easily accessed by users, and essentially read-only.
[Figure: DW architecture: operational data (relational, legacy) and external data are fed through ETL tools into the data warehouse, which also holds summary data; users access it through analysis tools (OLAP), data mining, what-if analysis and reporting tools]

Data Marts
[Figure: the data warehouse is replicated and broadcast to data marts, e.g. for marketing, finance, geographical regions, client management and supplier management]

Subject vs. process
[Figure: operational systems place the emphasis on applications (e.g. medical reports, admissions, reservations), whereas the DW places it on subjects (e.g. region, patient, charge, consumption)]

Integration and consistency
[Figure: data from DBs, text files and external sources are accessed through wrappers and mediators and go through schema integration, extraction, transformation, cleaning, validation, filtering and loading (loaders) into the DW]

Temporal evolution
OLTP | DW
Current values | Snapshots
Restricted historical content | Rich historical content
Time is often not included in keys | Time is included in keys
Data are updated | Snapshots cannot be updated
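The integration chain listed above (extraction, transformation, cleaning, validation, filtering, loading) can be sketched as a pipeline of small functions. This is a minimal illustration, not the tutorial's method: the two source layouts, their field names, the cleaning rules and the in-memory `warehouse` list are all hypothetical. Loading is append-only and stamps each row with a snapshot date, reflecting the point that in a DW time is included in keys and snapshots are not updated.

```python
from datetime import date

# Hypothetical records from two heterogeneous operational sources.
SOURCE_A = [{"prod": " coke ", "qty": "10", "store": "N1"},
            {"prod": "pepsi", "qty": "n/a", "store": "N1"}]
SOURCE_B = [{"product_name": "Fanta", "quantity": 5, "shop": "N2"}]


def extract():
    """Extraction: read raw rows from every source, tagging their origin."""
    for row in SOURCE_A:
        yield "A", row
    for row in SOURCE_B:
        yield "B", row


def transform(origin, row):
    """Transformation: map each source schema onto one reconciled schema."""
    if origin == "A":
        return {"product": row["prod"], "qty": row["qty"], "store": row["store"]}
    return {"product": row["product_name"], "qty": row["quantity"],
            "store": row["shop"]}


def clean(row):
    """Cleaning: normalise spelling and types; unparsable quantities -> None."""
    row = dict(row)
    row["product"] = row["product"].strip().title()
    try:
        row["qty"] = int(row["qty"])
    except (TypeError, ValueError):
        row["qty"] = None
    return row


def is_valid(row):
    """Validation and filtering: keep only rows with a usable quantity."""
    return row["qty"] is not None and row["qty"] >= 0


def load(rows, warehouse, snapshot):
    """Loading: append-only; the snapshot date becomes part of each key."""
    for row in rows:
        warehouse.append({**row, "date": snapshot})


warehouse = []
cleaned = (clean(transform(origin, row)) for origin, row in extract())
load((row for row in cleaned if is_valid(row)), warehouse, date(1999, 10, 10))
for row in warehouse:
    print(row)
```

The Pepsi row is dropped at the validation step because its quantity cannot be parsed; a real ETL tool would typically log such rows for inspection rather than discard them silently.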

Non-volatility
[Figure: OLTP data are subject to insert, update and delete operations; DW data are only loaded and accessed]
Huge data volumes: from 20 GB to some TB in a few years. In a DW, unlike in OLTP systems, no advanced techniques for transaction management are required; the key issues are query throughput and resilience.

DW vs. OLTP
DW | OLTP
90% ad hoc queries | 90% predefined transactions
Mostly read access | Read/write access
Hundreds of users | Thousands of users
Denormalised | Normalised
Supports historical versions | Does not support historical versions
Optimised for accesses involving most of the database | Optimised for accesses involving a small fraction of the database
Based on summary data | Based on elemental data

ROLAP (Relational OLAP)
An intermediate-level server between a relational back-end server and the front-end client:
- specialised middleware;
- generation of multi-statement SQL for the back-end server;
- query scheduling.

MOLAP (Multidimensional OLAP)
- Direct support of multidimensional views;
- special data structures (e.g., multidimensional arrays);
- compression techniques;
- intelligent disk/memory caching;
- pre-computation;
- complex analysis.

The technological progress
[Figure: technological progress from 1970 to 2000, refining data into knowledge: statistics and reporting, data warehousing, OLAP, data mining, pattern warehousing (source: Information Discovery)]

The Data Warehouse market
[Figure: two charts of the DW market from 1998 to 2002, one for RDBMS and OLAP, the other for data marts, ETL, data quality and metadata (source: Shilakes, Tylman, Enterprise Information Portals)]

The DW life-cycle
- Objective definition and planning: clearly determine the scope, define the borders, estimate dimensions, choose the design approach, evaluate the benefits.
- Infrastructure design: choose the technologies and tools, analyse the architectural solutions, solve the management problems.
- Design and implementation of applications: iteratively add new data marts and applications to the warehouse.

Bibliography
R. Barquin, S. Edelstein. Planning and Designing the Data Warehouse. Prentice Hall (1996).
S. Chaudhuri, U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record 26, 1 (1997).
G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record 25, 3 (1996).
M. Demarest. The politics of data warehousing. http://www.hevanet.com/demarest/marc/dwpol.html
U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. Data mining and knowledge discovery in databases: an overview. Comm. of the ACM 39, 11 (1996).
W.H. Inmon. Building the data warehouse. John Wiley & Sons (1996).
S. Kelly. Data Warehousing in Action. John Wiley & Sons (1997).
R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).
R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The Data Warehouse Lifecycle Toolkit. John Wiley & Sons (1998).
C. Shilakes, J. Tylman. Enterprise Information Portals. http://www.sagemaker.com/company/downloads/eip/indepth.pdf
P. Vassiliadis. Gulliver in the land of data warehousing: practical experiences and observations of a researcher. Proc. DMDW 2000 (2000).
J. Widom. Research Problems in Data Warehousing. Proc. CIKM (1995).

Conceptual modelling for Data Warehousing
Stefano Rizzi

Why a new conceptual model?
While it is universally recognised that a DW leans on a multidimensional model, there is no agreement on the approach to conceptual modelling. On the other hand, accurate conceptual design is the necessary foundation for building a good information system. The Entity/Relationship model is widespread in enterprises, but:
"Entity relation data models [...] cannot be understood by users and they cannot be navigated usefully by DBMS software. Entity relation models cannot be used as the basis for enterprise data warehouses." (Kimball, 96)

The multidimensional data model
[Figure: a Sales cube with dimensions Store, Product and Time. Sample questions it answers: the number of Coke cans sold at BIGSTORES in London on 10/10/99; the number of Pepsi cans sold at all BIGSTORES on 10/10/99; the number of Fanta cans sold globally]
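The three sample questions on the Sales cube differ only in how many coordinates are fixed: the first reads a single cell, the second aggregates over all BIGSTORES cities, the third over every store and day. A minimal sketch, assuming a dictionary-based cube whose cells and values are invented for illustration (a real MOLAP server would use multidimensional arrays, a ROLAP server SQL):

```python
# Hypothetical cube cells: (product, chain, city, day) -> number of cans sold.
SALES = {
    ("Coke", "BIGSTORES", "London", "1999-10-10"): 120,
    ("Coke", "BIGSTORES", "Paris", "1999-10-10"): 80,
    ("Pepsi", "BIGSTORES", "London", "1999-10-10"): 50,
    ("Pepsi", "BIGSTORES", "Paris", "1999-10-10"): 30,
    ("Fanta", "BIGSTORES", "London", "1999-10-10"): 10,
    ("Fanta", "BIGSTORES", "London", "1999-10-11"): 15,
}


def query(cube, product=None, chain=None, city=None, day=None):
    """Sum the measure over every cell matching the given coordinates.

    A coordinate left as None is not constrained, so the measure is
    aggregated (summed) over all of its values.
    """
    wanted = (product, chain, city, day)
    return sum(value for cell, value in cube.items()
               if all(w is None or w == c for w, c in zip(wanted, cell)))


print(query(SALES, "Coke", "BIGSTORES", "London", "1999-10-10"))  # 120
print(query(SALES, "Pepsi", "BIGSTORES", day="1999-10-10"))       # 80
print(query(SALES, "Fanta"))                                      # 25
```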

Basic terminology
- Fact (cube, target). A focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (sales, shipments, purchases). It is essential for a fact to have some dynamic aspects, i.e., to evolve somehow across time.
- Measures (attributes, variables, metrics, properties). Continuously valued (typically numerical) attributes which describe a fact from different points of view. For instance, each sale is measured by its revenue.
- Dimensions. Discrete attributes which determine the minimum granularity adopted to represent facts. Typical dimensions for the sale fact are product, store and date.
- Hierarchies (dimensions). They contain dimension attributes (levels, parameters) connected in a tree-like structure by many-to-one relationships (functional dependencies).

DW modelling in the literature
[Figure: DW models proposed in the literature: Agrawal et al. 95; Li, Wang 96; Gyssens, Lakshmanan 97; Datta, Thomas 97; Golfarelli et al. 98; Vassiliadis 98; Sapia et al. 98; Cabibbo, Torlone 98; Tryfona et al. 99; Franconi, Sattler 99; Hüsemann et al. 00]
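To make the notion of hierarchy from the basic terminology concrete: because each level is linked to the next by a many-to-one relationship, an additive measure can be rolled up one level at a time by summing all values that share a parent. A minimal sketch, assuming a hypothetical product -> type -> category hierarchy and invented revenue values:

```python
from collections import defaultdict

# Hypothetical hierarchy: each mapping is a many-to-one relationship
# (a functional dependency), e.g. product -> type -> category.
PRODUCT_TYPE = {"Coke": "soft drink", "Pepsi": "soft drink", "Penne": "pasta"}
TYPE_CATEGORY = {"soft drink": "food", "pasta": "food"}

# Measure "revenue" at the finest granularity, one value per product.
REVENUE = {"Coke": 100.0, "Pepsi": 80.0, "Penne": 40.0}


def roll_up(measure, mapping):
    """Aggregate an additive measure one level up a hierarchy by summing
    all values that share the same parent attribute value."""
    result = defaultdict(float)
    for member, value in measure.items():
        result[mapping[member]] += value
    return dict(result)


by_type = roll_up(REVENUE, PRODUCT_TYPE)
by_category = roll_up(by_type, TYPE_CATEGORY)
print(by_type)      # {'soft drink': 180.0, 'pasta': 40.0}
print(by_category)  # {'food': 220.0}
```

Summing is only correct here because revenue is additive; measures that are not (such as a unit price) need a different operator.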

[Figure: the same models classified as conceptual or logical]
[Figure: the same models classified as formal or graphical]

[Figure: the same models classified with respect to the definition of an algebra]
[Figure: the same models classified with respect to design]

Conceptual models
[Figure: the model by Sapia, Blaschka, Höfling, Dinter (1998), with constructs fact relationship, dimension level, attribute and roll-up relationship]

Conceptual models (2)
[Figure: the model by Franconi, Sattler (1999), with constructs target, dimension, level, property and aggregated entity]

Conceptual models (3)
[Figure: the model by Hüsemann, Lechtenbörger, Vossen (2000), with constructs fact, measure, dimension, optional dimension, dimension level, property attribute, optional property attribute and aggregation path]

The Dimensional Fact Model
The Dimensional Fact Model (DFM) is a graphical conceptual model for DWs, aimed at:
- effectively supporting conceptual design;
- providing an environment where user queries can be formulated intuitively;
- enabling communication between the designer and the final user in order to refine the requirement specification;
- supplying a stable platform for logical design;
- providing expressive and unambiguous documentation.
The DFM is independent of the target logical model (multidimensional or relational).

The Dimensional Fact Model (2)
Three levels of conceptual documentation are provided:
- Fact scheme: represents a fact of interest together with the associated measures, dimensions and hierarchies.
- Data mart scheme: summarizes the fact schemes that constitute each data mart and emphasizes the feasible connections between them.
- Data warehouse scheme: shows the different data marts, emphasizing their overlaps, the different profiles of the users accessing them, and the operational sources that feed them.
Each documentation level is complemented by glossaries which explain the names adopted within the schemes, define the connection between the DW data and the operational sources, and express data volumes. Data mart schemes are associated with the workload specification.

Fact schemes
[Figure: the SALE fact scheme. Measures: qty sold, revenue, unit price, no. of customers. Dimensions: product, date, store. Dimension attributes: type, category, department, marketing group, brand, brand city (product hierarchy); day of week, holiday, week, month, quarter, year (date hierarchy); store city, county, state, sale district (store hierarchy)]
A fact expresses a many-to-many relationship between its dimensions.

Fact schemes (2)
A non-dimension attribute contains additional information about a dimension attribute, and is typically connected to it by a one-to-one relationship. It cannot be used for aggregation. Some links between attributes can be optional.
[Figure: the SALE fact scheme extended with a promotion dimension (price reduction, begin, end, cost, ad type), the optional attribute diet, and the non-dimension attributes address and phone of store]

Fact schemes (3)
Convergence; cross-dimension attributes; additivity, non-additivity, non-aggregability; overlap.
[Figure: the SALE fact scheme extended with a fiscal hierarchy on date (fiscal week, fiscal month, fiscal quarter, fiscal year), the cross-dimension attribute V.A.T., a convergence in the store hierarchy, and a non-aggregability mark]

The SHIPMENTS fact scheme
[Figure: fact scheme SHIPMENT TO STORES. Measures: qty shipped, shipping cost. Dimensions include product (same hierarchy as in SALE), date (calendar and fiscal hierarchies), warehouse (warehouse city, warehouse state), store (store city, store state) and mode (type, carrier)]

The INVENTORY fact scheme
[Figure: fact scheme INVENTORY. Measure: level, annotated with the aggregation operators AVG and MIN. Dimensions include product (with package type, package size, units per pallet and weight in addition to the SALE hierarchy), date (calendar and fiscal hierarchies) and warehouse (warehouse city, warehouse nation)]
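The AVG, MIN annotation on the inventory level illustrates non-additivity: stock held in different warehouses in the same week can be summed, but summing the levels of consecutive weeks yields a meaningless number, so along time the level is averaged or minimised instead. A minimal sketch under that reading; the warehouses, weeks and levels are hypothetical.

```python
from statistics import mean

# Hypothetical inventory levels: (warehouse, week) -> units in stock.
LEVEL = {("W1", 1): 100, ("W1", 2): 60, ("W2", 1): 40, ("W2", 2): 20}


def total_per_week(levels):
    """Along the warehouse dimension the level is additive: sum it."""
    out = {}
    for (_warehouse, week), value in levels.items():
        out[week] = out.get(week, 0) + value
    return out


def over_time(levels, warehouse, op):
    """Along time the level is non-additive: aggregate with AVG or MIN."""
    return op(value for (wh, _week), value in levels.items() if wh == warehouse)


print(total_per_week(LEVEL))         # {1: 140, 2: 80}
print(over_time(LEVEL, "W1", mean))  # 80
print(over_time(LEVEL, "W1", min))   # 60
```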

The supply chain
[Figure: the supply chain as a sequence of facts (PRODUCTION OF COMPONENTS, COMPONENT DELIVERY, COMPONENT INVENTORY, MANUFACTURING, PACKAGING, SHIPMENT TO WAREHOUSE, WAREHOUSE INVENTORY, SHIPMENT TO STORES, SALES) linked by shared dimensions such as component, factory, product, package type, warehouse, mode, promotion and store]

Glossaries
ATTRIBUTE GLOSSARY: SHIPMENT TO STORES
name | description | domain | card. | query
product | | products | 5000 | select prodname, brandname, cityname, ... from PRODUCTS P, BRANDS B, CITIES C where P.brandId = B.brandId and B.cityId = C.cityId and ...
brand | | brands | 800 |
brand city | Where brands are manufactured | cities | 50 |
type | (pasta, soft drink, ...) | pr. types | 200 |
category | (food, clothing, music, ...) | pr. categories | 10 |
department | Deps. managing categories | deps. | 5 |
marketing group | Responsible for product types | groups | 20 |
...
store | | stores | 100 | select storename, cityname, statename from STORES S, CITIES C where S.cityId = C.cityId ...
store city | | cities | 80 |
store state | | states | 5 |
...

MEASURE GLOSSARY: SHIPMENT TO STORES (sparsity = 0.01)
name | description | type | query
qty shipped | Quantity of each product being shipped | INTEGER | select SUM(PS.qty) from PRODUCTS P, SHIP S, PRODSHIP PS, ... where P.prodId = PS.prodId and PS.shipId = S.shipId and ... group by P.prodId, S.[...], ...
shipping cost | Cost of the shipment | MONEY | ...
Refresh frequency: once per week; refresh technique: periodic complete.
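The measure glossary pairs each measure with the query that feeds it from the operational database. The sketch below runs a query of that shape against an in-memory SQLite database. The table names PRODUCTS, SHIP and PRODSHIP and the columns prodId, shipId and qty come from the glossary; the `shipdate` column, the `prodname` column and every row are hypothetical, since the glossary's own query is elided.

```python
import sqlite3

# Hypothetical operational tables echoing the glossary's table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCTS (prodId INTEGER PRIMARY KEY, prodname TEXT);
CREATE TABLE SHIP     (shipId INTEGER PRIMARY KEY, shipdate TEXT);
CREATE TABLE PRODSHIP (prodId INTEGER, shipId INTEGER, qty INTEGER);
INSERT INTO PRODUCTS VALUES (1, 'Coke'), (2, 'Penne');
INSERT INTO SHIP VALUES (10, '1999-10-04'), (11, '1999-10-04'), (12, '1999-10-11');
INSERT INTO PRODSHIP VALUES (1, 10, 30), (1, 11, 20), (2, 11, 5), (1, 12, 7);
""")

# Feeding query for the "qty shipped" measure: one row per (product, date).
rows = conn.execute("""
    SELECT P.prodId, S.shipdate, SUM(PS.qty)
    FROM PRODUCTS P, SHIP S, PRODSHIP PS
    WHERE P.prodId = PS.prodId AND PS.shipId = S.shipId
    GROUP BY P.prodId, S.shipdate
    ORDER BY P.prodId, S.shipdate
""").fetchall()
print(rows)  # [(1, '1999-10-04', 50), (1, '1999-10-11', 7), (2, '1999-10-04', 5)]
```

The GROUP BY clause fixes the granularity of the loaded fact: here one row per product and shipping date, with two shipments of product 1 on the same date merged into a single fact.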

Data mart schemes
The data mart scheme is used to summarize the fact schemes which constitute the data mart and to show the drill-across connections between them. It is a graph whose nodes are elemental and overlapped fact schemes; the arcs are directed to each overlapped scheme from its component schemes, which may in turn be overlapped.
[Figure: DATA MART SCHEME: SUPPLY CHAIN. Elemental fact schemes: PRODUCTION OF COMPONENTS, COMPONENT DELIVERY, COMPONENT INVENTORY, MANUFACTURING, PACKAGING, SHIPMENT TO WAREHOUSE, WAREHOUSE INVENTORY, SHIPMENT TO STORES, SALE. Overlapped fact schemes: PRODUCTION AND DELIVERY, DELIVERY AND INVENTORY, MANUFACTURING AND PACKAGING, DISTRIBUTION CYCLE, PRODUCT CYCLE, SHIPMENT AND SALE]

The workload
In principle, the workload for a data mart is dynamic and unpredictable. In some commercial tools, the actual workload is monitored while the DW is operating, and the logical and physical schemes are tuned dynamically. We claim that a core workload can, and should, be determined a priori:
- the user typically knows in advance which kinds of data analysis (s)he will carry out most often for decisional or statistical purposes;
- a substantial number of queries are aimed at extracting summary data to fill standard reports.
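An overlapped scheme such as SHIPMENT AND SALE supports drill-across: two facts, aggregated to the dimensions they share, are joined cell by cell so their measures can be compared. A minimal sketch, assuming the shared dimensions are product, store and week, with invented values; treating a missing cell as 0 is itself an assumption (no shipments or sales recorded for that combination).

```python
# Hypothetical aggregated facts sharing the dimensions (product, store, week).
SHIPPED = {("Coke", "S1", 40): 100, ("Coke", "S2", 40): 50, ("Penne", "S1", 40): 30}
SOLD = {("Coke", "S1", 40): 90, ("Penne", "S1", 40): 30, ("Penne", "S2", 40): 4}


def drill_across(left, right):
    """Outer-join two facts on their shared dimension coordinates.

    Cells missing from one fact are reported as 0, assuming an absent
    cell means that no events occurred for that combination.
    """
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k, 0), right.get(k, 0)) for k in keys}


for key, (shipped, sold) in drill_across(SHIPPED, SOLD).items():
    print(key, "shipped", shipped, "sold", sold)
```

The join is only meaningful because both facts are expressed at the same granularity on the shared dimensions, which is what the arcs of the data mart scheme record.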

The workload (2)
[Figure: the SHIPMENT TO STORES fact scheme]

Data warehouse schemes
At the highest abstraction level, the data warehouse scheme shows the different data marts, emphasizing the fact schemes duplicated on two or more of them, the different profiles of the users accessing them, and the operational sources which feed them.
[Figure: a data warehouse scheme with data marts SALES, SUPPLY CHAIN, PERSONNEL, RENOVATION and DEMAND CHAIN; user profiles (e.g. buyer, administrative, sale executive); operational sources (e.g. personnel database, product database, purchases, orders, claims, incentives, restoration works) fed by file transfer or manual input; the legend distinguishes data mart, user, fact scheme and operational db]

Conceptual design of Data Warehouses
Stefano Rizzi

Designing the DW
Within a successful approach to DW design, top-down and bottom-up strategies should be mixed:
- when planning a DW, a bottom-up approach should be followed: one data mart at a time is identified and prototyped;
- each data mart is designed in a top-down fashion by building a conceptual scheme for each fact of interest.

Data Mart prototyping
Prototype first the data mart which:
- plays the most strategic role for the enterprise;
- can convince the final users of the potential benefits;
- leans on available and consistent data sources.
[Figure: data marts DM1 to DM5 fed by operational sources 1 to 3]

Reference architecture
[Figure: heterogeneous operational dbs feed a layer of reconciled data, which in turn feeds the DW]
The problem here is designing the reconciled data, i.e. integrating heterogeneous sources.

Example: store dimension table
store key | store | city | region | address | sales manager
N1 | ...
N2 | ...

Example: sales fact table
time key | store key | product key | qty sold | revenue | no. of customers
T1 | N1 | P1 | 10 | 1000000 | 2
T1 | N1 | P2 | 8 | 1200000 | 8
T1 | N2 | P5 | 15 | 1500000 | 5
...

Methodological framework
DWs are based on a pre-existing information system.
[Figure: the design phases: analysis of the operational db, requirement specification, conceptual design, workload refinement, logical design and physical design, involving the final user, the designer and the db administrator]

Methodological framework (2)
[Figure: CONCEPTUAL DESIGN takes the E/R scheme or relational scheme, the facts and the preliminary workload, and produces the conceptual scheme; LOGICAL DESIGN takes the conceptual scheme, the workload and the target logical model, and produces the logical scheme; PHYSICAL DESIGN takes the logical scheme, the workload and the target DBMS, and produces the physical scheme]
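The example store dimension and sales fact tables are the relational (ROLAP) form of a fact scheme: a typical query joins the fact table with a dimension table and groups by a dimension attribute. A minimal sketch in SQLite using the three fact rows of the example; the store names and cities are hypothetical placeholders, since the example elides them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table; store names and cities are hypothetical placeholders.
CREATE TABLE store (store_key TEXT PRIMARY KEY, store TEXT, city TEXT);
INSERT INTO store VALUES ('N1', 'Store 1', 'Bologna'), ('N2', 'Store 2', 'Milan');
-- Fact table rows as in the example.
CREATE TABLE sales (time_key TEXT, store_key TEXT, product_key TEXT,
                    qty_sold INTEGER, revenue INTEGER, no_of_customers INTEGER);
INSERT INTO sales VALUES ('T1', 'N1', 'P1', 10, 1000000, 2),
                         ('T1', 'N1', 'P2', 8, 1200000, 8),
                         ('T1', 'N2', 'P5', 15, 1500000, 5);
""")

# Star join: aggregate the fact table, grouping by a dimension attribute.
rows = conn.execute("""
    SELECT s.city, SUM(f.qty_sold), SUM(f.revenue)
    FROM sales f JOIN store s ON f.store_key = s.store_key
    GROUP BY s.city ORDER BY s.city
""").fetchall()
print(rows)  # [('Bologna', 18, 2200000), ('Milan', 15, 1500000)]
```

The no. of customers column is deliberately left out of the sums: a customer who buys two different products would be counted once per product, so that measure is not additive along the product dimension.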

Conceptual design of the data mart
Design is based on the documentation of the underlying operational information system: E/R schemes, relational schemes (Golfarelli, Maio, Rizzi 98; Cabibbo, Torlone 98; Moody, Kortink 00; Hüsemann, Lechtenbörger, Vossen 00).
Steps:
1. Find facts.
2. For each fact:
   a. navigate functional dependencies;
   b. drop useless attributes;
   c. define dimensions and measures.

Finding facts
Within an E/R scheme, a fact is represented either by an entity F or by an n-ary relationship between entities E1, ..., En. Within a relational scheme, a fact is represented by a relation F. The entities and relationships representing frequently updated archives are good candidates to define facts; those representing nearly static archives are not.

Navigating functional dependencies
Build a tree in which each vertex corresponds to an attribute of the scheme:
- the root corresponds to the identifier (key) of F;
- for each vertex v, the corresponding attribute functionally determines all the attributes corresponding to the descendants of v.

Example (from the E/R scheme)
[Figure: E/R scheme of the sales example with entities SALE, PURCHASE TICKET, PRODUCT, TYPE, MARKETING GROUP, CATEGORY, DEPARTMENT, BRAND, CITY, COUNTY, STATE, STORE, SALE DISTRICT and WAREHOUSE, attributes such as qty, unit price, ticket number, diet, size, weight, address and phone, and their cardinality constraints]
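The tree-building step of "Navigating functional dependencies" can be sketched as a depth-first visit of the functional dependencies starting from the fact's identifier. A minimal sketch, assuming the dependencies are given as a hypothetical dictionary loosely following the SALE example; note that, unlike the DFM, this simplified version attaches an attribute reachable along two paths only once (the `seen` set also guards against cycles).

```python
# Hypothetical direct functional dependencies, loosely following the SALE
# example: each attribute maps to the attributes it directly determines.
FDS = {
    "sale": ["product", "store", "date", "qty", "unit price"],
    "product": ["type", "brand", "weight", "diet"],
    "type": ["marketing group", "category"],
    "category": ["department"],
    "store": ["address", "phone", "city"],
    "city": ["county"],
    "county": ["state"],
}


def build_tree(root, fds, seen=None):
    """Return the attribute tree rooted at `root` as nested dicts.

    Each vertex's subtree holds the attributes it functionally determines.
    `seen` prevents cycles; in this sketch an attribute reachable along
    two paths is attached only the first time it is met.
    """
    seen = set() if seen is None else seen
    seen.add(root)
    children = {}
    for attr in fds.get(root, []):
        if attr not in seen:
            children[attr] = build_tree(attr, fds, seen)
    return children


def show(tree, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for attr, subtree in tree.items():
        print("  " * depth + attr)
        show(subtree, depth + 1)


tree = build_tree("sale", FDS)
show(tree)
```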

Example (from the E/R scheme, continued)
[Figure: the attribute tree rooted in sale, with vertices product, type, mark. grp., category, dept., brand, city, county, state, diet, weight, size, qty, unit price, ticket number, store, sales address, phone, district no+state and district no]

Dropping useless attributes
Some attributes in the tree may be uninteresting for the DW. In order to drop useless levels of detail, the following operators can be applied:
- Pruning: delete a vertex and its subtree.
- Grafting: delete a vertex and attach its subtree to the vertex's parent. It is useful when an attribute is not interesting but the attributes it determines must be preserved.
[Figure: pruning and grafting applied to the vertices ticket number, store, city, state and sales address]
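The two operators for dropping useless attributes can be sketched on an attribute tree stored as nested dictionaries (vertex name -> subtree). A minimal sketch on a small hypothetical tree: grafting removes ticket number but keeps store and its descendants by attaching them to the root, and pruning then drops address together with its (empty) subtree.

```python
import copy


def prune(tree, path):
    """Pruning: delete the vertex at `path` together with its whole subtree."""
    parent = tree
    for attr in path[:-1]:
        parent = parent[attr]
    del parent[path[-1]]


def graft(tree, path):
    """Grafting: delete the vertex at `path` but attach its children to the
    vertex's parent, so the attributes it determines are preserved.
    Assumes no child name clashes with an existing sibling of the vertex."""
    parent = tree
    for attr in path[:-1]:
        parent = parent[attr]
    children = parent.pop(path[-1])
    parent.update(children)


# Hypothetical subtree of the root "sale".
tree = {
    "qty": {},
    "ticket number": {"store": {"city": {"state": {}}, "address": {}}},
}
result = copy.deepcopy(tree)
graft(result, ["ticket number"])     # store now hangs directly from the root
prune(result, ["store", "address"])  # drop the uninteresting address
print(result)  # {'qty': {}, 'store': {'city': {'state': {}}}}
```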

Defining dimensions
The choice of dimensions determines the fact granularity. Dimensions must be chosen among the children of the root in the attribute tree. Time should always be a dimension.
[Figure: the attribute tree after the choice of dimensions]

Defining measures
Measures must be chosen among the children of the root. Typically, measures are computed either by counting the number of instances of F, or by summing (averaging, ...) expressions which involve numerical attributes. An attribute cannot be both a measure and a dimension. A fact may have no measures.
[Figure: the attribute tree after the choice of measures]
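The rules for defining dimensions and measures are simple enough to be checked mechanically, as a CASE tool might do. A minimal sketch; the function name `check_fact`, the set of root children and the assumption that the time attribute is called "date" are all hypothetical.

```python
def check_fact(root_children, dimensions, measures, time_attrs=("date",)):
    """Check the rules for one fact; returns a list of problems.

    Rules: dimensions and measures are children of the root, no attribute
    is both a dimension and a measure, and time is among the dimensions.
    An empty list of measures is allowed (a fact may have no measures).
    """
    problems = []
    for name in sorted(set(dimensions) | set(measures)):
        if name not in root_children:
            problems.append(f"{name!r} is not a child of the root")
    both = sorted(set(dimensions) & set(measures))
    if both:
        problems.append(f"both dimension and measure: {both}")
    if not any(t in dimensions for t in time_attrs):
        problems.append("no time dimension")
    return problems


ROOT_CHILDREN = {"product", "store", "date", "qty", "unit price", "ticket number"}
print(check_fact(ROOT_CHILDREN, ["product", "store", "date"], ["qty", "unit price"]))
print(check_fact(ROOT_CHILDREN, ["product", "store"], ["qty", "product", "type"]))
```

The first call satisfies every rule and returns an empty list; the second reports that type is not a child of the root, that product is used both ways, and that time is missing.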

Granularity
Defining the granularity of data is a primary issue in determining performance. Granularity depends on the queries users are interested in, and represents a trade-off between query response time and the detail of the information to be stored.
- It may be worth adopting a finer granularity than that required by users, provided that this does not slow down the system too much.
- Granularity is constrained by the maximum time frame available for loading.
- Choosing the granularity includes defining the refresh interval.
Issues to be considered: availability of operational data; workload characteristics; the total time period to be analysed.

WAND: a CASE tool for data warehouse design
A design methodology is almost useless if no CASE tool is provided to support it. WAND can:
- acquire the relational db scheme via ODBC;
- carry out conceptual design;
- define the workload;
- calculate the data volume;
- carry out logical design;
- create the documentation (including loading/feeding queries).
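The granularity trade-off, and the data-volume calculation listed among the tool's features, can be illustrated with a simple estimate: the number of fact rows is the size of the cross product of the dimension domains times the sparsity (the fraction of cells that actually hold an event). A minimal sketch; the product and store cardinalities (5000 and 100) are taken from the attribute glossary, while the 3-year period, the daily sparsity of 0.01 (borrowed from the measure glossary) and the weekly sparsity of 0.05 are hypothetical.

```python
from math import prod


def fact_rows(cardinalities, sparsity):
    """Estimated number of fact rows: size of the full cross product of the
    dimension domains times the fraction of cells that hold events."""
    return round(prod(cardinalities.values()) * sparsity)


YEARS = 3  # hypothetical total period to be analysed
daily = fact_rows({"product": 5000, "store": 100, "date": 365 * YEARS}, 0.01)
# A coarser grain merges cells, so sparsity is assumed to grow (0.05).
weekly = fact_rows({"product": 5000, "store": 100, "week": 52 * YEARS}, 0.05)
print(daily, weekly)  # 5475000 3900000
```

How much a coarser grain saves depends heavily on how sparsity changes with it, which is why both values must be estimated from the operational data rather than assumed.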

Bibliography (1)
K. Aberer, K. Hemm. A methodology for building a data warehouse in a scientific environment. Proc. 1st Int. Conf. on Cooperative Inf. Systems, Brussels (1996).
R. Agrawal, A. Gupta, S. Sarawagi. Modeling multidimensional databases. IBM Research Report, IBM Almaden Research Center (1995).
M. Blaschka et al. Finding your way through multidimensional data models. Proc. DEXA 98 (1998).
L. Cabibbo, R. Torlone. A logical approach to multidimensional databases. Proc. EDBT 98 (1998).
A. Datta, H. Thomas. A conceptual model and algebra for on-line analytical processing in data warehouses. Proc. WITS 97 (1997).
E. Franconi, U. Sattler. A data warehouse conceptual model for multidimensional aggregation. Proc. DMDW 99 (1999).
M. Golfarelli, D. Maio, S. Rizzi. The Dimensional Fact Model: a conceptual model for data warehouses. Int. Jour. of Cooperative Inf. Systems 7, 2&3 (1998).
M. Golfarelli, S. Rizzi. Designing the data warehouse: key steps and crucial issues. Jour. of Computer Science and Information Management 2, 3 (1999).

Bibliography (2)
M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional databases. Proc. 23rd VLDB, Athens, Greece (1997).
B. Hüsemann, J. Lechtenbörger, G. Vossen. Conceptual data warehouse design. Proc. DMDW 00 (2000).
R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).
D. Moody, M. Kortink. From enterprise models to dimensional models: a methodology for data warehouse and data mart design. Proc. DMDW 00 (2000).
T. Bach Pedersen, C. Jensen. Multidimensional data modelling for complex data. Proc. 15th ICDE, Sydney (1999).
C. Sapia et al. Extending the E/R model for the multidimensional paradigm. Proc. ER 98 (1998).
N. Tryfona, F. Busborg, J. Christiansen. starER: A Conceptual Model for Data Warehouse Design. Proc. DOLAP 99 (1999).
P. Vassiliadis. Modeling multidimensional databases, cubes and cube operations. Proc. 10th SSDBM Conf., Capri, Italy (1998).